MyArxiv
Robotics
AwareVLN: Reasoning with Self-awareness for Vision-Language Navigation CVPR 2026
Vision-and-Language Navigation (VLN) requires an agent to ground language instructions to its own movement within a visual environment. While state-of-the-art methods leverage the reasoning capabilities of Vision-Language Models (VLMs) for end-to-end action prediction, they often lack an explicit and explainable understanding of the relationships between the agent, the instruction, and the scene. Conversely, explicitly building a scene map for heuristic planning is intuitively appealing but relies on additional 3D sensors and hinders large-scale vision-language pre-training. To bridge this gap, we propose AwareVLN, a novel framework that equips the navigation model with a self-aware reasoning mechanism, enabling it to understand the agent's state and task progress in a fully end-to-end and data-driven manner. Our approach features two key innovations: (1) a structural reasoning module that fosters spatial and task-oriented self-awareness, and (2) an automatic data engine with progress division for effective training. Extensive experiments on various datasets in the Habitat simulator show that AwareVLN significantly outperforms previous state-of-the-art vision-language navigation methods. Project page: https://gwxuan.github.io/AwareVLN/.
comment: Accepted to CVPR 2026. Project page: https://gwxuan.github.io/AwareVLN/
GesVLA: Gesture-Aware Vision-Language-Action Model Embedded Representations
Vision-Language-Action (VLA) models have shown strong potential for general-purpose robot manipulation by unifying perception and action. However, existing VLA systems primarily rely on textual instructions and struggle to resolve spatial ambiguity in complex scenes with multiple similar objects. To address this limitation, we introduce gesture as a parallel instruction modality and propose a Gesture-aware Vision-Language-Action model (GesVLA). Our approach encodes gesture features directly into the latent space, enabling them to participate in both high-level reasoning and low-level action generation, and adopts a dual-VLM architecture to achieve tight coupling between gesture representations and action policies. At the data level, we construct a scalable gesture data generation pipeline by rendering hand models onto real-world scene images. This reduces the sim-to-real visual gap while producing rich data with diverse motion patterns and corresponding pointing annotations. In addition, we employ a two-stage training strategy to equip the model with both gesture perception and action prediction capabilities. We evaluate our approach on multiple real-world robotic tasks, including a controlled block manipulation task for validation and more practical scenarios such as product and produce selection. Experimental results show that incorporating gesture consistently improves target grounding accuracy and human-robot interaction efficiency, especially in complex and cluttered environments. Project page: https://gwxuan.github.io/GesVLA/.
comment: Project page: https://gwxuan.github.io/GesVLA/
Superhuman Safe and Agile Racing through Multi-Agent Reinforcement Learning
Autonomous systems have achieved superhuman performance in isolation or simulation, yet they remain brittle in shared, dynamic real-world spaces. This failure stems from the dominant single-agent paradigm for physical applications, where other actors are ignored or treated as environmental noise, preventing effective coordination. Here we show that multi-agent reinforcement learning provides the essential safety scaffolding required for real-world interaction. Using high-speed quadrotor racing as a high-stakes testbed, we train agents to navigate complex aerodynamic interactions and strategic maneuvering with a variable number of racers. Through league-based self-play, agents evolve sophisticated anticipatory behaviors such as proactive collision avoidance, overtaking, and handling of multi-agent physical interactions, including aerodynamic downwash. Our agents outperform a champion-level human pilot in multi-player races at speeds exceeding 22 m/s, while simultaneously reducing collision rates by 50% compared to state-of-the-art single-agent baselines. Crucially, training with diverse artificial agents enables zero-shot generalization to safer human interaction. These results suggest that the path to robust robotic co-existence lies not in isolated safety constraints, but in the rigorous demands of multi-agent interaction. Multimedia materials are available at: https://rpg.ifi.uzh.ch/marl
comment: 12 pages (+4 supplementary). Website: https://rpg.ifi.uzh.ch/marl
N3P: Accelerated Automated Parking via a Learning-Based Naturalistic Three-Stage Scheme ITSC 2026
Autonomous parking requires efficient path planning that ensures kinematic feasibility and collision avoidance in constrained environments. Hybrid A* is widely used but computationally expensive, while reinforcement learning (RL) methods lack reliability and often struggle with long-horizon geometric constraints, leading to suboptimal trajectories. We present N3P, a fast learning-based three-stage framework for automated parking. By introducing an intermediate preparatory pose and using a learning module to predict it, N3P decomposes the maneuver into simpler subproblems, thereby reducing computational complexity and accelerating path generation. We validate the framework by integrating it with Hybrid A* algorithms. Experiments in perpendicular and parallel parking scenarios show that N3P-enhanced Hybrid A* speeds up planning by more than 80%. It also outperforms RL baselines in success rate and trajectory quality, producing shorter trajectories with fewer gear changes, while achieving comparable or lower planning time in most cases.
comment: Accepted at IEEE Intelligent Transportation Systems Conference (ITSC 2026)
TriSweep: A Four-Drone Swarm Framework for Electromagnetic Side-Channel Analysis
Electromagnetic (EM) side-channel analysis traditionally assumes a stationary, close-proximity probe, a threat model that underestimates aerial adversaries. TriSweep is a simulation framework that designs and evaluates a four-drone swarm architecture for autonomous standoff EM-SCA of embedded microcontrollers at 0.25-1.5 m. Three spatially specialized collector drones, Anchor (full-spectrum), Mask Probe (mask-register loading leakage), and Cipher Probe (masked SubBytes output leakage), feed a stationary Accumulator drone that performs coherent combining (+4.8 dB SNR gain) and second-order mask cancellation via a centered product of the two spatially separated leakage streams. Evaluated against three real ANSSI ASCAD datasets (ATmega8515 masked AES-128 and 50/100-sample desynchronized variants), the framework achieves a simulated key rank of 18 ± 1.7 (five seeds) at 0.25 m on the primary masked dataset. Profiling-trace cross-correlation alignment reduces single-drone rank from 89 to 21 on the 100-sample-jitter variant, demonstrating compensation for drone hover vibration. A two-channel CNN in the Accumulator converges to a loss of 0.454 (vs. random baseline 5.545) and improves rank on desynchronized datasets. No physical hardware has been fabricated; prototype construction is the planned next step.
comment: Simulation framework + systems design for a four-drone swarm performing standoff electromagnetic side-channel analysis. No hardware fabricated yet
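The centered-product step mentioned in the TriSweep abstract can be illustrated on synthetic data. The sketch below is not the authors' pipeline and does not use ASCAD traces: it assumes a hypothetical Hamming-weight leakage model with Gaussian noise, where one stream leaks the mask and the other leaks the masked value. All function names and parameters are illustrative. Under these assumptions, a single stream is uncorrelated with the unmasked value, while the product of the two mean-removed streams is (negatively) correlated with it.

```python
import math
import random


def hamming_weight(x: int) -> int:
    """Number of set bits in a byte."""
    return bin(x).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def centered_product(a, b):
    """Combine two leakage streams sample by sample after removing their means."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return [(x - ma) * (y - mb) for x, y in zip(a, b)]


def simulate(n_traces=20000, noise_std=0.5, seed=0):
    """Synthetic first-order-masked leakage (hypothetical model, not ASCAD data)."""
    rng = random.Random(seed)
    values = [rng.randrange(256) for _ in range(n_traces)]  # sensitive byte v
    masks = [rng.randrange(256) for _ in range(n_traces)]   # fresh random mask m
    # Stream 1 leaks the mask, stream 2 leaks the masked value v XOR m.
    mask_leak = [hamming_weight(m) + rng.gauss(0, noise_std) for m in masks]
    masked_leak = [hamming_weight(v ^ m) + rng.gauss(0, noise_std)
                   for v, m in zip(values, masks)]
    target = [hamming_weight(v) for v in values]
    return mask_leak, masked_leak, target


mask_leak, masked_leak, target = simulate()
corr_single = pearson(masked_leak, target)
corr_combined = pearson(centered_product(mask_leak, masked_leak), target)
print(f"single stream: {corr_single:+.3f}, centered product: {corr_combined:+.3f}")
```

The sign of the combined correlation is negative in this model because each bit where the unmasked value is 1 flips the relationship between the corresponding mask bit and masked bit.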
Scout-Assisted Planning for Heterogeneous Robot Teams under Partially Known Environments
Autonomous robot teams navigating partially known environments face costly backtracking when ground robots encounter blocked roads that are only revealed upon physical traversal. We address this with Scout-Assisted Planning (SAP), a heterogeneous planning framework in which scouting Unmanned Aerial Vehicles proactively gather environmental information to improve Unmanned Ground Vehicle navigation. To focus scouting on the most consequential edges, we propose Information Gain-based Action Pruning, which scores candidate scouting actions by their expected impact on ground robot behavior. Since exact Information Gain-based Action Pruning computation is prohibitively expensive, we develop a Graph Neural Network-based model that predicts information gain values directly from graph structure and belief state, reducing planning time to real-time levels without sacrificing solution quality. Experiments across three environment types show that SAP with Information Gain-based Action Pruning reduces ground robot travel cost by 31.9–37.7% over the Canadian Traveler Problem baseline, and outperforms proximity-based scouting guidance by an additional 8–14%, confirming that principled information-gain-guided scouting is both more effective and computationally feasible for real-world deployment.
Symmetries Here and There, Combined Everywhere: Cross-space Symmetry Compositions in Robotics
Robots exhibit a rich variety of symmetries arising from their mechanical structure and the properties of their tasks. Although many robotics problems exhibit several symmetries simultaneously, existing approaches typically treat them in isolation, failing to exploit their combined potential. This paper introduces cross-space symmetry compositions, a framework for learning robot policies that are jointly equivariant to multiple symmetries across configuration and task spaces. Leveraging the differential-geometric structure of the forward kinematics map, we both descend symmetries from configuration to task space and lift symmetries from task to configuration space, enabling their composition within a unified representation space. We validate our framework on simulated and real-world experiments on a dual-arm robot, demonstrating that jointly leveraging multiple symmetries yields improved generalization.
comment: 8 pages, 8 figures, 1 table
SE3Kit: A Lightweight Python Library for Specialized Geometric Primitives in Robotics
The Python robotics ecosystem faces a challenge: while many libraries exist for rigid body transformations, few are both lightweight and mathematically strict. This paper introduces SE3Kit, a lightweight Python library for efficient operations on the Special Euclidean Group SE(3) and the Special Orthogonal Group SO(3). Unlike established frameworks that require heavy dependencies (e.g., SpatialMath, PyPose) or general tools that lack robotics-specific features (e.g., SciPy), SE3Kit targets the gap between these extremes. It is designed for embedded deployment, rapid prototyping, and education while providing rigorous mathematical implementation. It provides a pure-Python, NumPy-only implementation of Lie Group operations, without the overhead of deep learning or other visualization software.
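For readers unfamiliar with SE(3) operations, the sketch below shows the kind of primitive such a library provides: composing rigid transforms and inverting one in closed form. It is not SE3Kit's API. The function names are hypothetical, and plain Python lists are used instead of NumPy only to keep the example free of dependencies.

```python
import math


def make_transform(yaw, translation):
    """4x4 homogeneous transform: rotation about z by yaw (rad), then translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def compose(a, b):
    """Matrix product a @ b of two 4x4 transforms."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def inverse(t):
    """Closed-form SE(3) inverse: [R, p]^-1 = [R^T, -R^T p]."""
    rt = [[t[j][i] for j in range(3)] for i in range(3)]  # R transposed
    p = [t[i][3] for i in range(3)]
    q = [-sum(rt[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [rt[0] + [q[0]], rt[1] + [q[1]], rt[2] + [q[2]], [0.0, 0.0, 0.0, 1.0]]


T = make_transform(math.pi / 3, (1.0, -2.0, 0.5))
identity = compose(T, inverse(T))  # should be the 4x4 identity up to rounding
```

The closed-form inverse avoids a general matrix inversion, since the inverse of a rotation matrix is its transpose.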
Decoupling Ego-Motion from Target Dynamics via Dual-Interval Motion Cues for UAV Detection
Object detection from Unmanned Aerial Vehicles (UAVs) is challenged by severe ego-motion, camera jitter, and large scale variations. While modern detectors perform well on static images, their direct application to UAV video often fails, particularly for small objects in dynamic scenes. Existing motion-based methods either rely on computationally expensive optical flow or use single-interval differencing, which is sensitive to jitter and limited in capturing diverse motion patterns. We propose a vision-only motion-guided detection framework that decouples target motion from camera-induced disturbances. A homography-based Global Motion Compensation (GMC) first aligns adjacent frames. We then introduce a Dual-Interval Motion Extraction strategy that captures both short-term and long-term motion cues. To integrate these cues, a lightweight Motion-Guided Attention (MGA) module enhances feature representations within a Feature Pyramid Network. Experiments on the VisDrone-VID dataset demonstrate consistent improvements over a strong YOLOv8 baseline under severe ego-motion. Ablation studies further confirm the effectiveness of the dual-interval design and the proposed motion-guided attention mechanism.
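The idea of dual-interval motion extraction can be sketched with simple frame differencing at two temporal offsets. This is only an illustration under stated assumptions: the frames are taken to be already aligned (the homography-based compensation step is not implemented), the interval lengths 1 and 3 are hypothetical, and the paper's actual extraction strategy may differ.

```python
def abs_diff(frame_a, frame_b):
    """Per-pixel absolute difference of two equally sized grayscale frames."""
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


def dual_interval_cues(frames, t, short=1, long=3):
    """Return (short-term, long-term) motion maps for frame index t.

    Assumes the frames are already aligned to frame t; alignment is not done here.
    """
    if t - long < 0:
        raise ValueError("not enough history for the long interval")
    return (abs_diff(frames[t], frames[t - short]),
            abs_diff(frames[t], frames[t - long]))


def make_frame(width, slow_x, fast_x):
    """One-row toy frame containing a slow and a fast bright target."""
    row = [0] * width
    row[slow_x] = 255
    row[fast_x] = 255
    return [row]


# Slow target sits at x=2 for t=0,1 and at x=3 for t=2,3; fast target moves 2 px per frame.
frames = [make_frame(16, slow_x=2 + (t + 1) // 3, fast_x=6 + 2 * t) for t in range(4)]
short_map, long_map = dual_interval_cues(frames, t=3)
```

In this toy sequence the slow target leaves no trace in the short-interval map but appears in the long-interval map, while the fast target is visible in both.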
Branch-Stochastic Model Predictive Control for Motion Planning under Multi-Modal Uncertainty with Scenario Clustering
Motion planning for autonomous driving must account for multi-modal uncertainty in both the intentions and trajectories of surrounding vehicles. Handling uncertainty in a worst-case manner guarantees robustness but often leads to excessive conservatism. Stochastic Model Predictive Control (SMPC) reduces trajectory-level conservatism through chance constraints, yet remains conservative with respect to intention uncertainty since constraints must hold across all intentions. We present a novel combination of SMPC and the branching structure, enabling the planner to generate distinct trajectories for different possible intentions while maintaining safety under trajectory uncertainty. A novel scenario clustering is proposed to merge prediction scenarios based on high-level decision similarity, thereby ensuring real-time tractability. Furthermore, an adaptive branching-time computation postpones commitment to separate plans until intention uncertainty is sufficiently reduced. Simulation studies in challenging highway scenarios demonstrate that the proposed method improves safety, reduces conservatism, and achieves real-time computational performance.
comment: This work has been accepted for presentation at IFAC World Congress 2026
MoSA: Motion-constrained Stress Adaptation for Mitigating Real-to-Sim Gap in Continuum Dynamics via Learning Residual Anisotropy
Learning real-world dynamics from visual observations is crucial for various domains. A common strategy is to calibrate simulators by estimating physical parameters, yet accuracy is ultimately bounded by the underlying physical models, which often assume materials are homogeneous and isotropic. Although this assumption is reasonable, real-world objects typically exhibit mild anisotropy and heterogeneity. After the near-isotropic backbone is well calibrated, these residual effects become the key bottleneck for further closing the real-to-sim gap. Although neural networks can fit dynamics end-to-end, such black-box modeling discards strong physical priors, leading to poor data efficiency and overfitting. Therefore, we propose MoSA, a motion-constrained stress adaptation framework that targets these residual effects to further improve real-to-sim dynamics learning. MoSA uses an isotropic model as a physics prior and learns residual stress operators to capture mild anisotropy and heterogeneity. It progressively adapts stresses via microplane-constrained redistribution in a physics-informed cascaded network. We further impose motion constraints by supervising temporal and spatial derivatives of the deformation field. Experimentally, our learned dynamics achieves superior accuracy, generalization, and robustness, while learning physically meaningful residual anisotropy. Finally, we validate MoSA in a robot manipulation setting, showing that better real-to-sim dynamics modeling translates into more reliable sim-to-real transfer. Project page: https://mercerai.github.io/MoSA/.
Quantifying Full-Body Immersion
Humanity is at the forefront of yet another digital revolution, where the lines between real and virtual worlds are dissolving, reshaping how we perceive and interact with our surroundings. In this context, we introduce a transformative paradigm for immersive virtual experiences centered around whole-body kinetic interactions. Our approach redefines immersion through three distinct levels: audio-visual immersion, capturing sensory realism; physical immersion, delivering haptic feedback; and full-body immersion (FBI), where dynamic bodily interaction integrates seamlessly with virtual environments. At the core of this innovation lies a scalable, distributable platform based on modular robotic surface units inspired by the adaptive designs of nature. These units enable the rendering of immersive environments at any scale, from intimate personal experiences to expansive multi-user settings, dynamically adapting to interactions in real-time. The modular system distributes force, shape, and motion feedback throughout entire spaces, replicating the physical characteristics of the environment and enabling new depth of engagement through FBI. By combining scalability, adaptability, and dynamic physical engagement, this framework bridges the gap between real and virtual worlds. It offers an unprecedented level of immersion where users can engage their entire bodies in symbiotic interactions with the virtual space. This work not only advances immersive technology but also redefines how humans and virtual environments coexist, setting a foundation for a new era of human-environment synthesis.
comment: This manuscript is under consideration for possible publication in Nature. Copyright may be transferred to Nature if the manuscript is accepted for publication, without further notice
Understanding Multimodal Failure in Action-Chunking Behavioral Cloning
Behavioral cloning becomes difficult when the same observation admits several valid actions. We study this problem for action-chunking policies and show that different multimodal parameterizations fail in different ways. For latent-variable policies, posterior-prior regularization makes deployment-time sampling more reliable, but excessive regularization removes the action-conditioned information needed to distinguish demonstrated modes. Reducing this regularization can preserve mode information, but then success depends on whether the prior covers the relevant latent regions. For action-space generative policies, multimodality is constrained by the smoothness of the base-to-action transport: a map with small Lipschitz constant cannot assign substantial probability to many well-separated modes. Covering many modes therefore requires either sharp transitions in base space or off-support bridge regions in action space. Experiments on synthetic multimodal tasks and robotic simulation benchmarks support these mechanisms.
Steins;Gate Drive: Semantic Safety Arbitration over Structured Futures for Latency-Decoupled LLM Planning
Cloud-hosted LLM driver agents provide useful semantic judgments, but their inference latency exceeds stepwise vehicle-control windows. Learned world models predict futures, but they usually keep future generation and action selection inside large coupled loops. We present SteinsGateDrive, a latency-decoupled planner-runtime architecture in which the worldline metaphor from the eponymous story names one plausible consequence of an intervention: the LLM selects counterfactual driving futures before the final control instant, and a runtime reuses the selected forecast only while safety contracts remain valid. The generator builds three world-line roles: alpha nominal ego-conditioned futures, beta interaction counterfactuals around nearby vehicles, and gamma hazard-stress futures such as braking, cut-ins, or blocked corridors. The selected branch becomes a typed StrategicForecast with horizon, validity/abort conditions, fallback, and authority. On a within-subject, matched-seed normal-highway protocol with 10 seeds and 20 steps, GPT-5.4 mini reduces effective lag from +3.07 s at 1-second horizon to -0.01 s at 4-second horizon while preserving the measured no-collision safety boundary. The architecture's safety contribution comes from the atom-predicate runtime check, not from the drift score, which functions as a refresh-frequency knob.
comment: 10 pages, 2 figures, 5 tables, submitted to IEEE Transactions on Intelligent Vehicles
Pre-VLA: Preemptive Runtime Verification for Reliable Vision-Language-Action and World-Model Rollouts
While large vision-language-action (VLA) models and generative world models (WM) have advanced long-horizon embodied intelligence, their practical deployment remains challenged by uncertainty in learning-based action generation. Low-quality actions may cause physical failures during execution or lead to misleading world-model rollouts with redundant rendering costs. To address this issue, we propose Pre-VLA, a unified runtime verification architecture that performs preemptive action validity assessment before physical execution or world-model imagination. Pre-VLA leverages an efficient multimodal backbone with modality-aware pooling and a lightweight dual-branch head to predict both safety confidence and critic-derived advantage scores for candidate action chunks. To handle severe class imbalance and unstable boundary decisions, we train Pre-VLA with a multi-task objective combining Focal classification, advantage regression, and soft-threshold calibration. During deployment, a dual-mode preemptive resampling scheduler filters low-quality actions and triggers adaptive resampling under a limited computation budget. Experiments on the LIBERO benchmark show that Pre-VLA improves the average closed-loop success rate across four suites from 30.79% to 37.62% over RynnVLA-002, reduces task execution steps, achieves 183.9 ms average forward verification time per action chunk, and mitigates error accumulation in world-model rollouts.
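The "Focal classification" term presumably refers to the standard focal loss used for imbalanced classification. A minimal sketch of the binary form follows. The values gamma = 2.0 and alpha = 0.25 are commonly used defaults, not values reported for Pre-VLA, and the function name is illustrative.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one prediction p = P(y = 1) and label y in {0, 1}."""
    p = min(max(p, 1e-7), 1.0 - 1e-7)    # clamp to keep log() finite
    p_t = p if y == 1 else 1.0 - p        # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.95, 1)  # confident and correct: strongly down-weighted
hard = binary_focal_loss(0.30, 1)  # poorly classified positive: keeps most of its loss
```

The factor (1 - p_t)^gamma shrinks the contribution of well-classified examples, so the rare, hard class dominates the gradient. With gamma = 0 the expression reduces to alpha-weighted cross-entropy.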
Terminal Constraint Model Predictive Control for Image-Based Visual Servoing of UAVs with Kalman Filter-Based Moment Loss Compensation
Image-Based Visual Servoing (IBVS) provides an efficient vision-guided control paradigm for unmanned aerial vehicles (UAVs) by directly regulating image-space errors. However, conventional IBVS controllers are vulnerable to two critical issues: loss of closed-loop stability near the target due to input and state constraints, and control failure caused by intermittent loss of moment-based visual features under aggressive motion. To address these challenges, this paper proposes a terminal-constraint model predictive control (TC-MPC) framework for IBVS, integrated with a Kalman filter (KF)-based state-prediction mechanism. The TC-MPC explicitly incorporates terminal-state constraints and a terminal cost into the IBVS error dynamics, ensuring recursive feasibility, improved convergence behavior, and closed-loop stability under control and state constraints. In parallel, the Kalman filter predicts the temporal evolution of image moments during short-term visual degradation, enabling the controller to preserve control continuity when moment measurements are partially unavailable. The proposed approach is validated through real-time UAV visual servoing experiments.
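How a Kalman filter can bridge short measurement dropouts is sketched below for a single scalar feature. The constant-velocity model, the noise values, and all names are assumptions made for illustration; the abstract does not specify the paper's state model for image moments.

```python
def kf_predict(x, P, dt, q):
    """Constant-velocity prediction for state x = [value, rate] with covariance P."""
    s, v = x
    x_pred = [s + dt * v, v]
    # P <- F P F^T + Q with F = [[1, dt], [0, 1]] and Q = diag(0, q)
    p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1]
    p01 = P[0][1] + dt * P[1][1]
    p10 = P[1][0] + dt * P[1][1]
    p11 = P[1][1] + q
    return x_pred, [[p00, p01], [p10, p11]]


def kf_update(x, P, z, r):
    """Measurement update for a direct observation z of the value (H = [1, 0])."""
    innovation = z - x[0]
    s = P[0][0] + r                    # innovation variance
    k0, k1 = P[0][0] / s, P[1][0] / s  # Kalman gain
    x_new = [x[0] + k0 * innovation, x[1] + k1 * innovation]
    # P <- (I - K H) P
    P_new = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
             [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
    return x_new, P_new


def track(measurements, dt=1.0, q=1e-6, r=1e-4):
    """Filter a sequence in which None marks frames where the feature is lost."""
    x, P = [measurements[0], 0.0], [[1.0, 0.0], [0.0, 1.0]]
    estimates = []
    for z in measurements[1:]:
        x, P = kf_predict(x, P, dt, q)
        if z is not None:              # skip the update when no measurement exists
            x, P = kf_update(x, P, z, r)
        estimates.append(x[0])
    return estimates


# A feature drifting at a constant rate, with three consecutive dropouts.
truth = [1.0 + 0.02 * k for k in range(30)]
measured = [None if 20 <= k < 23 else v for k, v in enumerate(truth)]
estimates = track(measured)            # estimates[i] corresponds to truth[i + 1]
```

During the dropout the filter runs prediction only, so the controller still receives a feature estimate that follows the learned rate of change.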
Real-Time Auto-Optimization in Unknown Environments via Structure-Exploiting Dual Control for Exploration and Exploitation
This paper develops a fast numerical dual control for exploration and exploitation (DCEE) method to address auto-optimization problems in unknown environments. In auto-optimization problems, the optimal operating condition is unknown a priori and may vary with the environment. As in classical dual control techniques, computational burden remains a major concern in DCEE for active learning. Existing DCEE methods provide a principled exploration-exploitation objective, but they are mainly realized through standard optimization packages or explicit gradient-type update laws, where the numerical structure of the DCEE has not been fully exploited. This paper shows that the reward function in DCEE has an inherent convex-over-nonlinear structure, where the exploitation and exploration terms form a unified nonlinear residual map equipped with a convex outer loss. Benefiting from this structure, a structure-exploiting numerical method is developed by linearizing only the nonlinear residual map while preserving the convex outer loss. Thus, each subproblem is transformed into a structured convex form that can be solved reliably. The resulting generalized Gauss-Newton Hessian approximation is positive semidefinite and depends only on first-order derivatives, thereby supporting fast online computation. The proposed method is evaluated on a vehicle cruising auto-optimization problem and compared with existing methods. Simulation and hardware-in-the-loop experimental results show that the proposed method improves control performance and achieves a speedup of approximately one order of magnitude, with a microsecond-level maximum computation time of only 83 μs on a typical vehicle embedded CPU.
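The convex-over-nonlinear structure and the positive semidefiniteness claim can be written out generically. The symbols below (objective f, convex outer loss ℓ, residual map r, Jacobian J_k, step d) are notation chosen for this sketch and are not taken from the paper.

```latex
% Objective with convex outer loss \ell and nonlinear residual map r
f(u) = \ell\bigl(r(u)\bigr), \qquad \ell \text{ convex}, \quad r:\mathbb{R}^n \to \mathbb{R}^m .

% Linearize only the residual at the current iterate u_k, keep \ell intact
r(u_k + d) \approx r(u_k) + J_k d, \qquad J_k = \frac{\partial r}{\partial u}(u_k),
\qquad
\min_{d}\; \ell\bigl(r(u_k) + J_k d\bigr) \quad \text{(convex in } d\text{)} .

% Generalized Gauss-Newton Hessian versus the exact Hessian
H_k^{\mathrm{GGN}} = J_k^{\top}\, \nabla^2 \ell\bigl(r(u_k)\bigr)\, J_k ,
\qquad
\nabla^2 f(u_k) = H_k^{\mathrm{GGN}}
  + \sum_{i=1}^{m} \frac{\partial \ell}{\partial r_i}\bigl(r(u_k)\bigr)\, \nabla^2 r_i(u_k) .

% Positive semidefiniteness follows from convexity of \ell
d^{\top} H_k^{\mathrm{GGN}} d
  = (J_k d)^{\top}\, \nabla^2 \ell\bigl(r(u_k)\bigr)\, (J_k d) \;\ge\; 0
  \quad \text{for all } d .
```

The approximation drops the second sum, which is the only place where second derivatives of r appear, so only the Jacobian of r is needed.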
Diffusion-guided Generalizable Enhancer for Urban Scene Reconstruction ICRA 2026
Urban scene reconstruction from real-world observations has emerged as a powerful tool for self-driving development and testing. While current neural rendering approaches achieve high-fidelity rendering along the recorded trajectories, their quality degrades significantly under large viewpoint shifts, limiting the applicability for closed-loop simulation. Recent works have shown promising results in using diffusion models to enhance quality at these challenging viewpoints and distill improvements back into 3D representations. However, they often require costly per-scene optimization, and the distilled representations remain fragile and fail to generalize beyond limited synthesized views. To address these limitations, we propose GenRe, a novel diffusion-guided generalizable enhancer for urban scene reconstruction. GenRe takes as input any pretrained 3D Gaussian representation and fixes the deficiencies within a few minutes. By learning to distill generative priors across diverse scenes, GenRe efficiently produces a robust, high-fidelity representation that generalizes reliably to challenging unseen viewpoints (e.g., lane change). Experiments show that GenRe outperforms existing methods in both quality and efficiency and benefits various downstream tasks, enabling robust and scalable sensor simulation for autonomous driving.
comment: ICRA 2026. Project page: https://waabi.ai/genre
How can reasoning capability empower the AI copilot robot in endoscopic surgery
Reasoning capability has significantly advanced complex logical inference and robotic decision-making in general domains. However, its potential in the Artificial Intelligence (AI) copilot robot, particularly one implemented on a Vision-Language-Action (VLA) model, remains unexplored in endoscopic surgery. Effective reasoning should enable AI copilot robots to integrate multimodal cues, interpret surgical intent, and infer hidden tissue dynamics, thereby alleviating intraoperative uncertainty and cognitive burden on surgeons. Properly implemented, reasoning-driven autonomy can transform AI copilot robots from reactive executors into cognitive collaborators, enhancing precision, safety, and sustainability in clinical practice.
comment: Accepted by npj Digital Medicine
Spatial Memory for Out-of-Vision Manipulation in Vision-Language-Action ICML 2026
We introduce SOMA, the Spatial Memory framework for Out-of-Vision Manipulation in Vision-Language-Action (VLA) models. Most existing VLAs implicitly assume that task-relevant objects are always visible, leading to brittle and reactive behaviors when targets fall outside the camera's field of view. SOMA addresses this limitation by equipping VLAs with a persistent spatial memory constructed from multi-view observations acquired via a movable head camera, enabling reasoning beyond the current visual frustum. The framework consists of three components: Spatial Memory Construction, which aggregates angular-wise observations into a unified spatial-semantic representation through scanning; Dynamic Memory Refinement, which maintains global consistency over time; and Contextual Memory Retrieval, which activates instruction-relevant spatial cues during manipulation. We evaluate SOMA on five challenging real-world out-of-vision manipulation tasks, including multi-step and dual-arm scenarios where target objects are initially invisible. Experimental results show that SOMA not only improves task success rates, but also induces qualitatively different manipulation behaviors, with faster target localization, reduced viewpoint search, and near one-shot grasping under partial observability. Additional experiments on RoboCasa GR1 and SimplerEnv further validate the effectiveness of SOMA's memory design under conventional fully observable settings. Code will be released soon.
comment: Accepted by ICML 2026
Imagine2Real: Towards Zero-shot Humanoid-Object Interaction via Video Generative Priors
Whole-body Humanoid-Object Interaction (HOI) is bottlenecked by the scarcity of high-fidelity 3D data. While video generative priors offer a promising alternative, existing methods suffer from Representation Misalignment due to their reliance on geometric priors (e.g., explicit CAD models), and Retargeting Complexity arising from intensive morphing and morphological mismatch. We propose Imagine2Real, a zero-shot HOI framework for flexible, geometry-free interaction. To resolve misalignment, we formulate robot and object motions as unified 4D point trajectories. To overcome retargeting complexity, our Keypoints Tracker tracks only sparse critical points (base, hands, and object), entirely bypassing the error-amplifying retargeting process. To maintain natural gaits despite these sparse signals, we utilize the latent space of a Behavior Foundation Model (BFM) as the tracker's search domain. Using a progressive training strategy, Imagine2Real learns robust behaviors with simple tracking rewards, enabling zero-shot physical deployment within a motion capture (mocap) system.
An Evidence Hierarchy for Bayesian Object Classification via OSINT-Aided Heterogeneous Sensor Fusion
Heterogeneous sensor fusion is vital for detecting, localizing, and classifying CBRNE threats. However, individual sensors are often only capable of detecting a subset of relevant threats with varying reliability or can even provide only indirect threat indications, making threat classification challenging. Furthermore, high clutter rates on the sensor side present a great challenge for fusion systems. Additionally, the limited availability of high quality datasets hinders the advancement of learning-based detection and classification models in smart sensors. To mitigate these sensor related shortcomings, a context-aware and domain knowledge-enhanced fusion process is proposed. First, a novel evidence hierarchy is established that enables modeling of direct, indicative, and contextual information. Second, contextual information about the environment is introduced into the fusion process, by collecting, processing, and exploiting OSINT inputs. Third, all levels of the evidence hierarchy are used to craft a Bayesian threat type classification mechanism with domain knowledge-informed priors. The proposed methodology is evaluated in simulated scenarios, and the results demonstrate the benefit of the proposed fusion approach in terms of robustness to clutter and prior mismatch, with an overall classification accuracy of up to 95%.
comment: 6 pages, 1 figure; © 2026 The Authors. Submitted to the 2026 IEEE International Conference on Multisensor Fusion and Integration (MFI 2026). Under review
Temporal Coding as a Substrate for Sensorimotor Object Inference: A Spiking Reinterpretation of Thousand Brains Architecture
The Thousand Brains Theory (TBT) and its open-source Monty framework model object recognition through sensorimotor inference: identifying objects by actively moving a sensor across their surface and building evidence contact by contact. The current implementation encodes each contact as a dense floating-point vector. While Monty tracks inter-step displacement and accumulates evidence across contacts, it treats the feature activation pattern at each contact as an unordered set: the directional sequence in which features are encountered carries no representational weight. In TBT, the sequence of contacts carries spatial meaning: knowing that feature A was felt before feature B during a left-to-right sweep tells you something about where A and B sit on the object. Dense vectors discard this ordering. We propose replacing dense vectors with rank-order spike packets: each contact produces a brief burst of neural events where the most strongly activated neuron fires first. The time gap between successive bursts implicitly encodes sensor displacement without explicit coordinate calculations. A biologically motivated learning rule (STDP) encodes traversal direction into synaptic weights. A learnable parameter lambda adjusts reliance on earlier versus recent contacts, adapting to each object's geometry. We derive three testable predictions and specify an implementation of four components in approximately 450 lines of NumPy. Three synthetic experiments confirm the core claims: temporal coding achieves perfect discrimination accuracy on objects with identical features in different spatial arrangements, where dense accumulation performs at chance; temporal coding maintains a 30-50 percentage point advantage across all tested noise levels; the adaptive lambda converges to distinct values, reflecting object geometric complexity. End-to-end evaluation on Monty's YCB benchmark is left for future work.
comment: 18 pages, 5 figures
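The rank-order spike packets described in the temporal-coding abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `rank_order_packet`, the `top_k` parameter, the tie-breaking rule, and the toy activations are all assumptions made for illustration. It shows only the ordering step, in which neurons are emitted in order of decreasing activation so that the strongest fires first.

```python
def rank_order_packet(activations, top_k=None):
    """Return neuron indices ordered by decreasing activation (strongest first).

    Hypothetical helper: ties are broken by lower index, and `top_k`
    optionally truncates the packet to the k strongest neurons.
    """
    order = sorted(range(len(activations)), key=lambda i: (-activations[i], i))
    return order if top_k is None else order[:top_k]


# Toy dense feature vector for one sensor contact (made-up values).
contact = [0.1, 0.9, 0.4, 0.7]
packet = rank_order_packet(contact, top_k=3)
print(packet)  # [1, 3, 2]
```

Unlike the dense vector, the packet is a sequence, so two contacts with the same set of active neurons but different activation rankings yield different packets.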
Learning A Unified Risk Map for Autonomous Driving in Partially Observable Environments
Occlusion-aware prediction remains a critical challenge in autonomous driving due to the inherent uncertainty of unobserved regions. Existing approaches either overestimate risk based on reachable states or struggle to predict accurate trajectories under high occlusion uncertainty. To address these limitations, we propose a unified risk map modeling and learning framework for partially observable environments. Our method integrates traffic flow risk and collision risk through spatiotemporal modeling, enabling fine-grained assessment of occlusion-induced hazards. To address the scarcity of scenarios involving occluded interactions, we introduce a diffusion-based scenario generation framework that produces realistic yet adversarial scenarios. We integrate the modeling and learning of a unified risk map into a framework that supports risk-aware planning under partial observability. Experiments on the Waymo Open Motion Dataset show that our method significantly outperforms the state-of-the-art occlusion-aware baseline, improving minimum time-to-collision by 0.78 times and average time-to-collision by 1.67 times. The proposed framework offers a comprehensive and practical solution for risk-aware planning in partially observable environments.
comment: Published in IEEE Robotics and Automation Letters
Action with Visual Primitives
Vision-Language-Action (VLA) models have emerged as a promising paradigm for generalist robotic manipulation. A common design in current architectures maps language instructions and visual observations to actions in a single forward pass. While conceptually simple, this formulation entangles instruction comprehension, spatial scene understanding, and motor control within a single learning objective. As a result, the action expert must implicitly relearn cognitive and perceptual capabilities already present in the pretrained VLM, which can limit both learning efficiency and generalization. We introduce AVP (Action with Visual Primitives), an end-to-end architecture that implements a visual-primitive-centric interface: the VLM infers the next-stage target and emits visual-primitive tokens that condition a flow-matching action expert, with supervision derived from end-effector kinematics. Real-robot experiments on general pick-and-place tasks show that AVP improves the success rate by 27.61% over pi_0.5 and outperforms other recent methods, with consistent gains in data efficiency, spatial-compositional generalization, and object-level transfer.
comment: 9 pages, 6 figures. Project page: https://kingdroper.github.io/AVP/
Beyond Euclidean Proximity: Repairing Latent World Models with Horizon-Matched Trajectory Reachability Metrics
Latent world models can contain the state needed for control, yet their terminal-cost interface can expose the planner to the wrong decision-relevant information. In common latent MPC, candidate sequences are ranked by Euclidean distance between predicted terminal and goal latent states; this assumes that raw latent distance weights reachability-relevant variables correctly. We propose trajectory reachability metrics (TRM), a post-hoc terminal-ranking method for fixed latent world models. TRM trains a small pairwise head from logged trajectory structure and uses it as a replacement or hybrid cost; the encoder, dynamics, sampler, optimizer, and evaluation manifests remain fixed. The key design choice is horizon-aware supervision: the metric is trained on broad, balanced temporal separations to match the long-horizon terminal candidate ranking problem. On a hard TwoRoom benchmark, raw latent planning with LeWorldModel (LeWM) reaches 7.0% success, while full-horizon TRM reaches 97.0%; shuffled temporal-label controls stay at 0.0%. The same recipe improves a PLDM baseline from 32.7% to 84.0% across three seeds, and a short-horizon TRM variant reaches only 35.0% with the 100,000 pair budget. In TwoRoom, we provide mechanistic evidence for why TRM works: XY position is linearly decodable (R^2=0.998), yet raw latent MSE misranks candidates; the XY-probe rowspace accounts for less than 1% of terminal-goal latent MSE but carries most candidate-quality signal; and SCSA audits show that TRM improves the ordering and selected endpoint seen by the planner. On PushT go50/go75, TRM-style task-state metrics improve SCSA ranking and selected final distance more cleanly than closed-loop success, motivating auxiliary hybrid costs in continuous manipulation. TRM is the planner-facing repair, and audits explain when terminal reachability metrics should replace or augment raw latent proximity.
comment: 26 pages, 7 figures
Efficient Agentic Reasoning Through Self-Regulated Simulative Planning
How should an agent decide when and how to plan? A dominant approach builds agents as reactive policies with adaptive computation (e.g., chain-of-thought), trained end-to-end expecting planning to emerge implicitly. Without control over the presence, structure, or horizon of planning, these systems dramatically increase reasoning length, yielding inefficient token use without reliable accuracy gains. We argue efficient agentic reasoning benefits from decomposing decision-making into three systems: simulative reasoning (System II) grounding deliberation in future-state prediction via a world model; self-regulation (System III) deciding when and how deeply to plan via a learned configurator; and reactive execution (System I) handling fine-grained action. Simulative reasoning provides unified planning across diverse tasks without per-domain engineering, while self-regulation ensures the planner is invoked only when needed. To test this, we develop SR$^2$AM (Self-Regulated Simulative Reasoning Agentic LLM), realizing both as distinct stages within an LLM's chain-of-thought, with the LLM as world model. We explore two instantiations: recording decisions from a prompted multi-module system (v0.1) and reconstructing structured plans from traces of pretrained reasoning LLMs (v1.0), trained via supervised then reinforcement learning (RL). Across math, science, tabular analysis, and web information seeking, v0.1-8B and v1.0-30B achieve Pass@1 competitive with 120-355B and 685B-1T parameter systems respectively, while v1.0-30B uses 25.8-95.3% fewer reasoning tokens than comparable agentic LLMs. RL increases average planning horizon by 22.8% while planning frequency grows only 2.0%, showing it learns to plan further ahead rather than more often. More broadly, learned self-regulation instantiates a principle we expect to extend beyond planning to how agents govern their own learning and adaptation.
comment: Code and model artifacts are available at https://github.com/sailing-lab/sr2am
Beyond Pixels: Learning Invariant Rewards for Real-World Robotics From a Few Demonstrations
Designing reward functions that generalize beyond controlled laboratory settings remains a fundamental challenge in reinforcement learning for robotics. In open-world manipulation problems, a single task can appear in numerous variants through different object instances, positions, and camera viewpoints. Recent vision-based reward models tend to memorize specific pixel distributions and fail to generalize beyond their training conditions. To address this, we propose a framework that learns invariant symbolic reward functions from as few as five demonstrations. The insight is to shift from visual feature-fitting to the discovery of behavioral invariants: task-level properties that remain constant across diverse visual instantiations. The framework has two coupled components: a structural reward formulation that encodes task-level strategies and physical constraints while preserving optimal policy invariance, and a hybrid symbolic-numerical procedure that distills these invariants from demonstrations without online interaction. Experiments on eight Meta-World tasks and three Franka manipulation tasks demonstrate that our method achieves stronger process alignment and policy rollout ranking abilities compared to baselines, accelerating downstream policy learning. Three real-world out-of-distribution experiments further show that the same learned reward generalizes zero-shot to position, viewpoint, and object variations, enabling a single reward representation to be reused across diverse task variants in practice.
CoRMA: Contrastive RMA for Contact-Rich Meta-Adaptation
We present CoRMA (Contrastive Robotic Motor Adaptation), a context-based meta-adaptation framework that modifies RMA for force-dominant assembly. CoRMA replaces raw simulator-parameter adaptation with a compact 6D simulator-only semantic contact context describing contact onset, lateral engagement, guided transition, contact direction, and jamming. A deployable causal Transformer adapter infers this context online from force, proprioceptive, and action histories using semantic regression and a force-regime contrastive objective. At deployment, oracle context is removed and replaced by the inferred context, enabling within-episode adaptation without demonstrations, privileged inputs, or gradient updates. We evaluate CoRMA on PegInsert, GearMesh, and NutThread in Isaac Lab / Isaac Sim 5.0 and on a real Marvin arm. Compared with FORGE baselines that achieve high simulation success but degrade substantially on hardware, CoRMA retains higher verified real success under controlled target-pose noise. These results support semantic contact inference as a reusable adaptation interface within a related assembly task family, while broader unseen-task generalization and Real2Sim calibration remain future work.
Industrial Dual-Arm Box Handling via Online Inertial Estimation and Convex Wrench Optimization
Industrial robotic object handling often involves boxes and packages whose mass and center of mass are not known in advance. These uncertainties affect the force--moment balance required for stable lifting, and improper regulation of contact wrenches can lead to slip, object drop, orientation deviation, or excessive squeezing. This paper presents a friction-aware dual-arm box-handling framework for objects with unknown inertial properties. The proposed approach estimates the object mass and center of mass online from measured contact wrenches, and computes friction-feasible contact forces and torsional moments through a second-order cone program (SOCP) under ellipsoidal friction-limit-surface constraints. An offline trajectory refinement stage is also included to reduce undesired object--environment contact when geometric constraints are present. By enforcing friction feasibility as a hard constraint and minimizing contact effort within the feasible region, the framework achieves stable lifting without treating slip avoidance and excessive squeezing as separately tuned objectives. Experiments on a real dual-arm robotic system under different center-of-mass configurations demonstrate that the method lifts objects with unknown inertial properties while maintaining stable frictional contact.
comment: 14 pages, submitted to Robotics and Computer-Integrated Manufacturing (RCIM) Journal
FRED: A Multi-Modal Autonomous Driving Dataset for Flooded Road Environments
The Flooded Road Environments Dataset (FRED) is, to our knowledge, the first multi-modal autonomous driving dataset specifically targeting the collection of data from scenarios involving water hazards on the road. The dataset contains images from a 2.3 MP FLIR Blackfly USB3 camera, 64-beam 360$^\circ$ point clouds from an Ouster OS1-64 LiDAR, and data from an iXblue ATLANS-C IMU corrected by a Geoflex RTK GNSS, from five separate locations captured both during and after flooding events. The data has been released in two formats: a KITTI-style format for easy integration with existing data tools, and the RTMaps format for direct replay of the vehicle's data capture. We provide semantic labels to enable the training and evaluation of both single-sensor and sensor-fusion methods for water hazard detection. Position and velocity, as well as data captured under dry conditions, are provided to enable the development of location-based detection methods that may incorporate maps, and to evaluate other tasks such as localisation and SLAM.
TacO: Benchmarking Tactile Sensors for Object Manipulation
Vision-based learning from demonstrations has achieved remarkable success in enabling robots to perform manipulation tasks and high-level semantic reasoning, yet it remains insufficient for complex, contact-rich manipulation. While there is broad agreement that tactile sensing improves manipulation, there is no empirical guidance on which tactile sensors are best suited for which manipulation tasks. In this paper, we provide a systematic, task-driven evaluation of tactile sensors for robot manipulation and propose a framework for selecting and evaluating sensors based on manipulation policy performance. Separate manipulation policies are trained for tactile sensors of four distinct modalities (visual, acoustic, magnetic, and resistive) across three tasks: pick-and-place with unknown mass, object reorientation, and plug insertion. For each task, we analyze how sensor properties such as spatial resolution, shear sensing, and tactile representation, together with inherent material friction, affect task performance. Rather than tactile sensing being universally beneficial in the same way, our results show that the usefulness of tactile information depends strongly on sensor modality, material properties, and the specific manipulation task. All of the tactile sensors, code, data, and hardware setup will be publicly available on the project website.
A Visitation Grid for Complete Coverage Foraging in Robot Swarms
The complete collection of sparse resources in large, unknown environments remains a challenging problem for autonomous robot swarms. Previous studies have shown that a substantial portion of total mission time is consumed during the final stage of collection, where only a small fraction of randomly scattered resources remain. Consequently, many existing swarm foraging algorithms (search and collection) focus on collecting most resources within a limited time window, rather than improving end-stage efficiency for collecting all resources. We propose a grid-based stochastic foraging strategy that explicitly reduces redundant visits and accelerates late-stage collection. The unknown search area is partitioned into a grid map, which is maintained by a lightweight central server. To maintain scalability, both robots and the server operate within limited memory and computational constraints. The server updates the grid-level visitation counts based on robot-reported locations, producing a global estimate of the exploration density. For each new foraging trip, a robot probabilistically selects its next search area from a local 3 × 3 neighborhood of grid cells, favoring cells with the lowest visitation counts, thus biasing exploration toward under-visited regions while maintaining stochasticity. Extensive simulation experiments demonstrate that the proposed strategy consistently outperforms the canonical central-place foraging algorithm (CPFA) baseline. Compared to CPFA, the proposed method reduces the total collection time by up to 33% and improves collection efficiency by more than 48% during the final stage of the mission. These results indicate that the proposed strategy is robust, flexible, and scalable for near-complete and complete resource collection in robot swarms and can serve as a general enhancement for stochastic swarm foraging methods under limited onboard resources.
comment: The 23rd International Conference on Ubiquitous Robots, 10 figures, 3 tables
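The neighborhood selection step in the visitation-grid abstract above can be sketched as follows. The abstract does not give the exact probability rule, so the weighting `1 / (1 + visits)`, the inclusion of the robot's current cell, the clipping at the grid boundary, and the name `choose_next_cell` are assumptions for illustration only.

```python
import random


def choose_next_cell(counts, row, col, rng=random):
    """Sample a cell from the 3 x 3 neighborhood of (row, col).

    Assumption: a cell's weight is 1 / (1 + visits), so rarely visited
    cells are preferred but every neighbor keeps a nonzero probability.
    """
    n_rows, n_cols = len(counts), len(counts[0])
    # Neighborhood is clipped at the grid boundary and includes (row, col).
    cells = [
        (r, c)
        for r in range(max(0, row - 1), min(n_rows, row + 2))
        for c in range(max(0, col - 1), min(n_cols, col + 2))
    ]
    weights = [1.0 / (1.0 + counts[r][c]) for r, c in cells]
    return rng.choices(cells, weights=weights, k=1)[0]


# Toy visitation counts as the server might hold them (made-up values).
grid = [
    [5, 5, 5],
    [5, 9, 0],
    [5, 5, 5],
]
rng = random.Random(0)
picks = [choose_next_cell(grid, 1, 1, rng) for _ in range(1000)]
print(picks.count((1, 2)) / len(picks))  # share of picks on the unvisited cell
```

Sampling rather than always taking the minimum keeps the stochasticity the abstract mentions, and keeps several robots that read the same counts from all heading to the same cell.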
Learning to Evolve: Multi-modal Interactive Fields for Robust Humanoid Navigation in Dynamic Environments
Safe manipulation-oriented navigation for humanoid robots requires scene memory that remains reliable under locomotion-induced perceptual distortion, environmental changes, and interaction-level geometric safety constraints. Existing semantic mapping and scene-graph systems are difficult to deploy directly in this setting because they often assume stable camera trajectories, static environments, or coarse object geometry. We introduce the Multi-modal Interactive Field (MIF), a humanoid-oriented system that integrates confidence-aware semantic 3D Gaussian Splatting, discrepancy-triggered spatial memory updates, and task-driven geometric reconstruction within a closed-loop perception-adaptation pipeline. MIF couples three fields: an uncertainty-aware 3DGS Appearance Field that suppresses gait-induced blur, a Spatial Field that maintains topological memory, and a Geometry Field that supports Interaction Pose Safety (IPS) before manipulation. A discrepancy detection score is introduced to separate locomotion-induced false-positive changes from persistent changes and updates only locally inconsistent regions. On a Unitree-G1 humanoid in a real dynamic office, MIF improves relocation success in non-static environments from 12% to 94% compared with static scene-graph memory, while reducing semantic memory footprint by 91.4% through feature distillation for practical online operation. Project page and code: https://ziya-jiang.github.io/MIF-homepage/
comment: Accepted by Robotics: Science and Systems 2026
Auction-Consensus Algorithm with Learned Bidding Scheme for Multi-Robot Systems
Multi-Robot Task Allocation (MRTA) is a central challenge in decentralized multi-agent systems, where teams of robots must cooperatively assign and execute tasks under limited communication while optimizing global performance objectives. Auction-consensus algorithms, such as the Consensus-Based Bundle Algorithm (CBBA), provide scalable decentralized coordination with provable convergence, but rely on hand-crafted greedy scoring functions that often lead to suboptimal task allocations. This paper proposes a learning-enhanced auction-consensus framework in which CBBA's deterministic bidding mechanism is replaced by a neural bidding policy trained using reinforcement learning. Under a centralized training and decentralized execution paradigm, agents learn to compute task bids from partial local observations while retaining the standard auction and consensus phases for decentralized coordination. The learned bidding policy is trained using Proximal Policy Optimization with rewards shaped by proximity to globally optimal solutions obtained via mixed-integer linear programming. Multiple neural architectures are evaluated, including a Neural Additive Model, the Long Short-Term Memory (LSTM) model, and the Set Transformer Model. Experimental results across varying swarm sizes demonstrate that learned bidding policies can improve solution quality over classical CBBA while preserving decentralized execution. The proposed approach highlights the effectiveness of integrating reinforcement learning with classical distributed coordination algorithms, offering a scalable pathway toward higher-quality decentralized multi-robot task allocation.
comment: The 23rd International Conference on Ubiquitous Robots, 9 figures, 6 pages
Non-Contact Vibration-Based Damage Detection of Civil Structures Using a Cost-Effective Autonomous UAV
This paper presents a non-contact approach for vibration-based structural damage detection using an autonomous and customized cost-effective unmanned aerial vehicle (UAV). Vibration signals are extracted from video recordings through vision-based motion tracking to identify shifts in natural frequencies indicative of structural degradation. A laboratory-scale frame structure is evaluated under healthy and simulated-damage conditions. The proposed system is validated through an experimental study involving two smartphones, a USB camera, and a custom-built low-cost UAV equipped with an onboard camera and an autonomous alignment system for operation in GPS-denied environments. The displacement time history is extracted, analyzed in the frequency domain, and compared to reference measurements from contact accelerometers and a finite element model. Experimental results show that all platforms successfully capture the fundamental frequency and its shift due to damage. Although the UAV exhibits slightly higher errors (up to 5.7%) due to platform-induced disturbances and sensing limitations, it reliably detects damage-induced frequency changes. Compared to commercial UAV systems, the proposed platform achieves comparable inspection performance at significantly lower cost. These results demonstrate that low-cost autonomous UAVs provide a practical, flexible, and scalable solution for structural health monitoring, particularly in scenarios where contact-based sensing is impractical. The findings also support the potential for the deployment of multiple cooperative UAVs to further enhance inspection coverage and robustness.
comment: 8 pages, 8 figures, The 2026 International Conference on Unmanned Aircraft Systems, ICUAS 2026
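The frequency-domain step in the UAV damage-detection abstract above, finding the fundamental frequency of a tracked displacement signal and comparing it between healthy and damaged states, can be sketched as below. The 30 Hz sampling rate, the 5.0 Hz and 4.5 Hz test frequencies, and the helper names are made-up values and names for illustration, not figures from the paper. A direct DFT keeps the sketch dependency-free; an FFT routine such as `numpy.fft.rfft` would be the usual choice in practice.

```python
import cmath
import math


def dominant_frequency(signal, sample_rate_hz):
    """Return the frequency (Hz) of the largest non-DC spectral peak.

    Uses a direct DFT for self-containment; an FFT routine would be the
    usual choice for long recordings.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the static offset
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate_hz / n


def synthetic_displacement(freq_hz, sample_rate_hz=30.0, duration_s=4.0):
    """Hypothetical tracked displacement: a single sinusoid at freq_hz."""
    n = int(sample_rate_hz * duration_s)
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate_hz) for t in range(n)]


healthy = dominant_frequency(synthetic_displacement(5.0), 30.0)
damaged = dominant_frequency(synthetic_displacement(4.5), 30.0)
print(healthy, damaged, healthy - damaged)
```

The frequency resolution is the sampling rate divided by the number of samples (0.25 Hz here), so short recordings limit how small a damage-induced shift can be resolved.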
Higher Order Reasoning for Collaborative Communicationless Mobile Robot Operations
In communicationless environments, multi-robot systems must operate without the constant information exchange that many coordination strategies typically assume. This paper presents a novel dynamic epistemic planning framework that enables implicit coordination and long-horizon planning through higher-order reasoning among robots. With our approach, robots form and propagate higher-order belief particles, update world beliefs using Bayesian inference, and select actions via a behavior tree that anticipates teammates' likely decisions. A temporally aware Model Predictive Path Integral (MPPI) controller integrates this reasoning into low-level execution, allowing robots to plan intercepts and adapt trajectories under partial observability. The proposed framework is evaluated in both simulations and physical experiments, where it consistently reduces task completion time compared to a first-order baseline, demonstrating that epistemic logic can serve as a robust foundation for resilient coordination in communication-restricted domains.
OCELOT: Odometry and Contact Estimation for Legged Robots
One of the significant challenges in legged robotics is achieving accurate odometry using only onboard proprioceptive sensors. In this study, we present a complete leg odometry pipeline based on an Error-State EKF (ESEKF) that relies exclusively on proprioceptive data: a body-fixed IMU, joint encoders, and force sensors, where the filter's state is corrected by feet determined to be in a stationary stance. The core of our contribution is a fused contact detection and uncertainty quantification module designed to explicitly identify and reject slippage. This module runs two detectors in parallel for each foot: 1) a debounced, force-based Gaussian Mixture Model (GMM) guided Finite State Machine (FSM) to confirm physical contact, and 2) a kinematic-based Generalized Likelihood Ratio Test (GLRT) on the estimated velocity of the foot. The continuous quality scores from both estimators are fused to detect whether the foot is both physically loaded and kinematically stationary, and serve as an uncertainty signal for each contact. To validate our approach, we collected a multi-modal dataset of 29 sequences spanning diverse indoor and outdoor terrains (e.g., concrete, grass, pebble, and rock), totaling 2.4 km. We benchmarked our approach against both proprioceptive and exteroceptive methods. The results demonstrate our method's efficacy in providing accurate odometry estimates and robustly handling slippage-prone environments. We also share our code and real-time ROS2 package as open source.
comment: 8 pages
EvoScene-VLA: Evolving Scene Beliefs Inside the Action Decoder for Chunked Robot Control
Chunked vision-language-action (VLA) policies predict multi-step robot controls, conditioning each update on the current visual observation alone. Yet robot actions cause contact, occlusion, and object motion, and the geometry that later decisions depend on can change before the next visual update arrives. Spatial VLAs improve current-frame geometry. Temporal VLAs aggregate past frames. Neither maintains an action-updated scene prior across chunks. We argue for a persistent action-updated scene state across control calls, and introduce EvoScene-VLA. Its recurrent scene prefix carries a geometry-aware scene state across chunks. At each vision-language model (VLM) call, the VLM combines scene information from the current observation with the action-updated prior from the previous chunk; the action decoder outputs both the next action chunk and a compact scene update. This update becomes the next prior, which the VLM corrects against the new observation when the next call arrives. Each control call therefore starts from a scene prior that reflects both recent actions and fresh visual evidence. During training, Scene Predictor supplies future scene-token targets, and Geometric Anchor aligns scene slots with frozen depth and 3D teachers. We discard both modules at deployment. On 31 RoboTwin tasks, EvoScene-VLA raises average success from 87.2% to 89.1% in fixed evaluation and from 86.1% to 88.5% in randomized evaluation. On the Galaxea R1-Lite real robot, EvoScene-VLA outperforms all baselines.
Analytical and Experimental Force Analysis of a Soft Linear Pneumatic Actuator
Soft sleeve actuators (SSAs) have recently been developed as a pneumatic actuation approach for wearable and assistive robotic systems. By integrating the actuation structure into a sleeve-like geometry, these actuators can reduce reliance on external attachment layers and transmission mechanisms while maintaining compliance with limb-shaped surfaces. However, the force-generation behavior of SSAs remains insufficiently explained, particularly with respect to the variation of output force during extension, the influence of external loading, and the mechanical role of axial stiffness. This paper presents an analytical and experimental force analysis of a linear soft sleeve actuator (LSSA). A quasi-static analytical model was developed by expressing the net axial force as the pressure-generated contribution from the cap and folded walls, reduced by the force associated with axial stiffness. The model incorporates internal pressure, projected pressure areas, folded wall geometry, axial displacement, and an experimentally fitted axial stiffness relation. Prescribed-extension and static-load experiments were conducted to evaluate the actuator response. At 125 kPa, the generated force decreased from approximately 112 N at zero extension to nearly zero at 40 mm. Static loading delayed measurable force generation and reduced force output, particularly at low and intermediate pressures. The results show that LSSA force generation is governed by coupled effects of pressure, geometry, displacement, loading, and axial stiffness.
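The force balance stated in words in the LSSA abstract above can be written compactly. The symbols are assumptions chosen for illustration, not the paper's notation: $P$ is the internal (gauge) pressure, $A_{\mathrm{cap}}$ and $A_{\mathrm{wall}}(x)$ are the projected pressure areas of the cap and folded walls at axial displacement $x$, and $F_k(x)$ is the force from the experimentally fitted axial stiffness relation.

```latex
F_{\mathrm{net}}(P, x) = P\,\bigl[A_{\mathrm{cap}} + A_{\mathrm{wall}}(x)\bigr] - F_k(x)
```

Read this way, the reported drop from about 112 N at zero extension to nearly zero at 40 mm (at 125 kPa) would correspond to the stiffness term, together with any change in projected wall area, growing until it cancels the pressure term.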
SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control
Despite the rise of billion-parameter foundation models trained across thousands of GPUs, similar scaling gains have not been shown for humanoid control. Current neural controllers for humanoids remain modest in size, target a limited set of behaviors, and are trained on a handful of GPUs. We show that scaling model capacity, data, and compute yields a generalist humanoid controller capable of natural, robust whole-body movements. We position motion tracking as a scalable task for humanoid control, leveraging dense supervision from diverse motion-capture data to acquire human motion priors without manual reward engineering. We build a foundation model for motion tracking by scaling along three axes: network size (1.2M to 42M parameters), dataset volume (100M+ frames from 700 hours of motion capture), and compute (21k GPU hours). Beyond demonstrating the benefits of scale, we further show downstream utility through: (1) a real-time kinematic planner bridging motion tracking to tasks such as navigation, enabling natural and interactive control, and (2) a unified token space supporting VR teleoperation and vision-language-action (VLA) models with a single policy. Through this interface, we demonstrate autonomous VLA-driven whole-body loco-manipulation requiring coordinated hand and foot placement. Scaling motion tracking exhibits favorable properties: performance improves steadily with compute and data diversity, and learned policies generalize to unseen motions, establishing motion tracking at scale as a practical foundation for humanoid control.
comment: Project page: https://nvlabs.github.io/SONIC/
Pelican-Unify 1.0: A Unified Embodied Intelligence Model for Understanding, Reasoning, Imagination and Action
We present Pelican-Unify 1.0, the first embodied foundation model trained according to the principle of unification. Pelican-Unify 1.0 uses a single VLM as a unified understanding module, mapping scenes, instructions, visual contexts, and action histories into a shared semantic space. The same VLM also serves as a unified reasoning module, autoregressively producing task-, action-, and future-oriented chains of thought in a single forward pass and projecting the final hidden state into a dense latent variable. A Unified Future Generator (UFG) then conditions on this latent variable and jointly generates future videos and future actions through two modality-specific output heads within the same denoising process. The language, video, and action losses are all backpropagated into the shared representation, enabling the model to jointly optimize understanding, reasoning, imagination, and action during training, rather than training three isolated expert systems. Experiments demonstrate that unification does not imply compromise. With a single checkpoint, Pelican-Unify 1.0 achieves strong performance across all three capabilities: 64.7 on eight VLM benchmarks, the best among comparable-scale models; 66.03 on WorldArena, ranking first; and 93.5 on RoboTwin, the second-best average among compared action methods. These results show that the unified paradigm succeeds in preserving specialist strength while bringing understanding, reasoning, imagination, and action into one model.
VRA: Grounding Discrete-Time Joint Acceleration in Voltage-Constrained Actuation
Discrete-time joint acceleration constraints are widely used to enforce position and velocity limits. However, under voltage-constrained electric actuators, kinematically admissible accelerations may be physically unrealizable, exposing a missing execution-level abstraction. We propose Voltage-Realizable Acceleration (VRA), a joint-level acceleration interface that grounds kinematic acceleration in voltage-constrained actuator physics by restricting commanded accelerations to voltage-realizable constraints. Hardware experiments on electric actuators and a wheel-legged quadruped show that VRA removes unrealizable accelerations, restores consistent near-constraint execution, and reduces constraint-induced oscillations.
comment: 10 pages, Accepted by RSS 2026
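To illustrate why a kinematically admissible acceleration may be unrealizable under a voltage limit, as the VRA abstract above argues, here is a minimal sketch under a textbook actuator model. The abstract does not give the paper's formulation, so everything here is an assumption: a steady-state brushed-DC model (V = R·i + k_e·ω, torque = k_t·i) with no friction, load torque, or inductance, plus the function name and all numeric values.

```python
def voltage_realizable_accel(accel_cmd, omega, v_max, resistance, k_e, k_t, inertia):
    """Clamp a commanded joint acceleration to what the voltage limit allows.

    Assumed model (not from the paper): steady-state brushed-DC motor,
    V = R * i + k_e * omega, torque = k_t * i, no friction, load, or
    inductance, so acceleration = k_t * i / inertia.
    """
    # Current range reachable with |V| <= v_max at the present speed.
    i_min = (-v_max - k_e * omega) / resistance
    i_max = (v_max - k_e * omega) / resistance
    a_min = k_t * i_min / inertia
    a_max = k_t * i_max / inertia
    return min(max(accel_cmd, a_min), a_max)


# Hypothetical actuator parameters (SI units), chosen only for illustration.
params = dict(v_max=24.0, resistance=1.0, k_e=0.1, k_t=0.1, inertia=0.01)

at_rest = voltage_realizable_accel(100.0, omega=0.0, **params)
at_speed = voltage_realizable_accel(100.0, omega=200.0, **params)
print(at_rest, at_speed)
```

In this toy model back-EMF consumes more of the voltage budget as speed rises, so the same commanded acceleration passes unchanged at rest but is clamped at high speed.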
Harnessing Embodied Agents: Runtime Governance for Policy-Constrained Execution
Embodied agents are evolving from passive reasoning systems into active executors that interact with tools, robots, and physical environments. Once granted execution authority, the central challenge becomes how to keep actions governable at runtime. Existing approaches embed safety and recovery logic inside the agent loop, making execution control difficult to standardize, audit, and adapt. This paper argues that embodied intelligence requires not only stronger agents, but stronger runtime governance. We propose a framework for policy-constrained execution that separates agent cognition from execution oversight. Governance is externalized into a dedicated runtime layer performing policy checking, capability admission, execution monitoring, rollback handling, and human override. We formalize the control boundary among the embodied agent, Embodied Capability Modules (ECMs), and the runtime governance layer, and validate the framework through 1000 randomized simulation trials across three governance dimensions. Results show 96.2% interception of unauthorized actions, reduction of unsafe continuation from 100% to 22.2% under runtime drift, and 91.4% recovery success with full policy compliance, substantially outperforming all baselines (p<0.001). By reframing runtime governance as a first-class systems problem, this paper positions policy-constrained execution as a key design principle for embodied agent systems.
comment: 36 pages, 3 figures, 10 tables
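The separation of agent cognition from execution oversight described above can be sketched as a thin wrapper that every capability call must pass through. This is a minimal sketch under stated assumptions: the `GovernanceRuntime` class, its fields, and the dictionary-based state are hypothetical and are not the paper's framework or API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class GovernanceRuntime:
    """Hypothetical runtime layer sitting between an agent and its capabilities."""
    allowed: Set[str]                       # capabilities admitted by policy
    monitors: List[Callable[[dict], bool]]  # each returns True while the state is safe
    human_override: bool = False            # when True, every request is blocked
    audit_log: List[str] = field(default_factory=list)

    def execute(self, name, capability, state):
        """Run `capability` on a copy of `state` only if governance permits it."""
        if self.human_override:
            self.audit_log.append(f"blocked:{name}:override")
            return state
        if name not in self.allowed:  # policy check / capability admission
            self.audit_log.append(f"blocked:{name}:policy")
            return state
        snapshot = dict(state)               # kept for rollback
        new_state = capability(dict(state))  # the capability never mutates `state`
        if not all(monitor(new_state) for monitor in self.monitors):
            self.audit_log.append(f"rolled_back:{name}")
            return snapshot
        self.audit_log.append(f"ok:{name}")
        return new_state
```

Because the checks live outside the agent loop, the audit log records every admitted, blocked, and rolled-back call in one place, whatever the agent itself does.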
Learning Without Losing Identity: Capability Evolution for Embodied Agents
Embodied agents are expected to operate persistently in dynamic physical environments, continuously acquiring new capabilities over time. Existing approaches to improving agent performance often rely on modifying the agent itself -- through prompt engineering, policy updates, or structural redesign -- leading to instability and loss of identity in long-lived systems. In this work, we propose a capability-centric evolution paradigm for embodied agents. We argue that a robot should maintain a persistent agent as its cognitive identity, while enabling continuous improvement through the evolution of its capabilities. Specifically, we introduce the concept of Embodied Capability Modules (ECMs), which represent modular, versioned units of embodied functionality that can be learned, refined, and composed over time. We present a unified framework in which capability evolution is decoupled from agent identity. Capabilities evolve through a closed-loop process involving task execution, experience collection, model refinement, and module updating, while all executions are governed by a runtime layer that enforces safety and policy constraints. We demonstrate through simulated embodied tasks that capability evolution improves task success rates from 32.4% to 91.3% over 20 iterations, outperforming both agent-modification baselines and established skill-learning methods (SPiRL, SkiMo), while preserving zero policy drift and zero safety violations. Our results suggest that separating agent identity from capability evolution provides a scalable and safe foundation for long-term embodied intelligence.
comment: 12 pages, 2 figures, 7 tables
Federated Single-Agent Robotics: Multi-Robot Coordination Without Intra-Robot Multi-Agent Fragmentation
As embodied robots move toward fleet-scale operation, multi-robot coordination is becoming a central systems challenge. Existing approaches often treat this as motivation for increasing internal multi-agent decomposition within each robot. We argue for a different principle: multi-robot coordination does not require intra-robot multi-agent fragmentation. Each robot should remain a single embodied agent with its own persistent runtime, local policy scope, capability state, and recovery authority, while coordination emerges through federation across robots at the fleet level. We present Federated Single-Agent Robotics (FSAR), a runtime architecture for multi-robot coordination built on single-agent robot runtimes. Each robot exposes a governed capability surface rather than an internally fragmented agent society. Fleet coordination is achieved through shared capability registries, cross-robot task delegation, policy-aware authority assignment, trust-scoped interaction, and layered recovery protocols. We formalize key coordination relations including authority delegation, inter-robot capability requests, local-versus-fleet recovery boundaries, and hierarchical human supervision, and describe a fleet runtime architecture supporting shared Embodied Capability Module (ECM) discovery, contract-aware cross-robot coordination, and fleet-level governance. We evaluate FSAR on representative multi-robot coordination scenarios against decomposition-heavy baselines. Results show statistically significant gains in governance locality (d=2.91, p<.001 vs. centralized control) and recovery containment (d=4.88, p<.001 vs. decomposition-heavy), while reducing authority conflicts and policy violations across all scenarios. Our results support the view that the path from embodied agents to embodied fleets is better served by federation across coherent robot runtimes than by fragmentation within them.
comment: 30 pages, 10 figures, 9 tables. Code: https://github.com/s20sc/fsar-fleet-coordination
4D Radar Semantic Segmentation of People in Field Conditions Using Temporal Multi-View Networks
Reliable people detection is crucial for the safe autonomy of mobile robots and heavy vehicles, both on roads and in industrial settings like mining and construction. However, common sensors like cameras or lidars are prone to failure in adverse conditions such as dust, fog, or smoke, which limits their use in real-world robotic systems. Radar, on the other hand, delivers robust measurements in a wide range of environmental conditions. In particular, modern high-resolution 4D imaging radars provide 4D point clouds across range, azimuth, and elevation, as well as per-point Doppler velocity data, well suited for robot perception. We propose TMVA4D, a family of artificial neural network architectures based on CNN and ConvLSTM encoders that leverage the 4D radar modality for semantic segmentation. The architectures are trained to distinguish between background and person classes using a series of 2D projections of the 4D radar data, encompassing elevation, azimuth, range, and Doppler velocity dimensions. Evaluated across several operational sites, our models achieve promising performance (Dice 75.9%, IoU 61.2% for class person) even in low-visibility conditions. The data and code will be made publicly available upon publication.
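For a single class, Dice and IoU computed from the same confusion counts are tied by Dice = 2·IoU / (1 + IoU), so the two reported figures can be checked against each other. The helper names below are illustrative, and the check assumes both metrics come from the same pooled counts, which the abstract does not state.

```python
def iou_from_counts(tp, fp, fn):
    """Intersection over union for one class from pixel/point counts."""
    return tp / (tp + fp + fn)


def dice_from_counts(tp, fp, fn):
    """Dice coefficient (F1 score) for one class from the same counts."""
    return 2 * tp / (2 * tp + fp + fn)


def dice_from_iou(iou):
    """Convert IoU to Dice; valid when both use identical counts."""
    return 2 * iou / (1 + iou)


# The reported person-class IoU of 61.2% maps to a Dice of about 75.9%.
print(round(100 * dice_from_iou(0.612), 1))
```

Under that assumption, the reported Dice of 75.9% and IoU of 61.2% are mutually consistent.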
When Simultaneous Localization and Mapping Meets Wireless Communications: A Survey
This paper surveys the state of the art at the nexus of SLAM and wireless communications, examining the bidirectional impact of each on the other with a focus on visual SLAM (V-SLAM) integration. We provide an overview of key concepts related to wireless signal propagation, geometric channel modeling, and radio frequency (RF)-based localization and sensing. In addition, we show image processing techniques that can detect landmarks, proactively predicting optimal paths for wireless channels. Several dimensions are considered, including the prerequisites, techniques, background, and future directions and challenges of the intersection between SLAM and wireless communications. We analyze estimation and control approaches such as Bayesian filters, feature-based pose estimation, perception-aware motion control, spatial methods for signal processing such as vector fields, and key technological aspects. We expose techniques and items towards enabling a highly effective retrieval of the autonomous robot state. Among other interesting findings, we observe that monocular V-SLAM would benefit from RF-relevant information, as the latter can serve as a proxy for resolving scale ambiguity. Conversely, we find that wireless communications in the context of 5G and beyond can potentially benefit from visual odometry, which is central to SLAM. Moreover, we examine other sources besides the camera for SLAM and describe the twofold relation with wireless communications. Finally, integrated solutions performing joint communications and SLAM appear to be in their infancy: theoretical and practical advancements are required to add higher-level localization and semantic perception capabilities to RF and multi-antenna technologies.
General Agentic Planning Through Simulative Reasoning with World Models
What does it mean to plan? Current agentic systems, whether scaffolded workflows or end-to-end policies, rely on reactive decision-making: selecting the next action via a fixed procedure with at most undifferentiated adaptive computation (e.g., chain-of-thought) lacking explicit modeling of future outcomes. This limits generalizability, as each new task demands re-engineering rather than transfer of shared reasoning capacity. Humans, by contrast, plan by mentally simulating consequences of candidate actions within an internal world model, a capacity known as simulative reasoning (System II) that supports flexible, goal-directed behavior across diverse contexts. We argue that simulative reasoning through a world model provides a general-purpose planning mechanism for agentic systems, improving upon reactive policies (System I) by grounding decisions in predicted future states rather than pattern-matched responses. To verify this, we introduce SiRA (Simulative Reasoning Architecture), a goal-oriented architecture instantiating simulative reasoning using an LLM-based world model with natural-language belief states, while remaining model-agnostic. We evaluate across three qualitatively distinct task categories: constrained navigation, multi-hop information aggregation, and general instruction following, in a web-browser environment. Across all categories, simulative reasoning achieves up to 124% higher task completion rates than a matched reactive baseline, and increases constrained navigation success from 0% to 32.2% compared to a representative open-web agent. The persistent advantage across distinct task types suggests the benefit stems from generalizable counterfactual evaluation rather than task-specific tuning.
comment: Winner of Berkeley LLM Agents Hackathon (Fundamentals Track); code available at https://github.com/sailing-lab/sira
Active Defense Against False Data Injection Attacks in Robotic Manipulators
Robotic systems are vulnerable to False Data Injection Attacks (FDIAs), where adversaries corrupt sensor signals to gain malicious control. Feedback linearization exposes robotic systems to integrator vulnerability, making them susceptible to stealthy attacks that can cause significant deviations in end-effector behavior without raising alarms. This paper addresses the resilience of manipulators against finite-horizon FDIAs by formalizing two defense methods, namely anomaly-aware virtual damping and manipulability reduction, with probabilistic guarantees on nominal task execution. Simulations on a 7-DOF redundant manipulator show that the proposed defenses substantially reduce the impact of FDIA compared to using solely a threshold-based ADS like the Chi-squared, while preserving nominal task performance in the absence of attack.
comment: Extended 8-page version containing full proofs. An abridged 6-page version has been accepted for publication in the Proceedings of the 23rd IFAC World Congress (2026). v3: Minor typographical fixes and updated reference formatting
FUSE: A Framework for Unified State Estimation in Vehicular and Robotic SLAM Systems
Tightly coupled SLAM formulations under mixed-rate sensing often bind temporal processing, local geometric association, estimator formulation, and map-update policy into method-specific designs. Such binding makes it difficult to vary one design choice without re-engineering the rest of the state-estimation process. This paper presents FUSE, a framework for unified state estimation in vehicular and robotic SLAM systems. FUSE organizes the state-estimation interface around observation ingestion, propagation, update, and state query, and uses this interface to separate temporal processing, residual-ready local geometric association, estimator formulation, and map-update policy. A LiDAR-IMU instantiation is developed to examine the framework under mixed-rate sensing and directional degeneracy, where high-rate inertial propagation, LiDAR-triggered geometric update, residual screening, and degeneracy-aware correction operate through the same interface boundaries. On a 418 m loop-corridor sequence, the instantiation reports a 1.626 m end-to-end trajectory error, corresponding to a 7.9% relative error reduction compared with Faster-LIO, the lowest-error baseline on this sequence. The results support FUSE as a framework for organizing state-estimation design choices and show how the evaluated instantiation regularizes updates along weakly observable directions.
HUSKY: Humanoid Skateboarding System via Physics-Aware Whole-Body Control
While current humanoid whole-body control frameworks predominantly rely on static-environment assumptions, tasks characterized by high dynamism and complex interactions remain a formidable challenge. In this paper, we address humanoid skateboarding, a highly challenging task requiring stable dynamic maneuvering on an underactuated wheeled platform. This integrated system is governed by non-holonomic constraints and tightly coupled human-object interactions. Successfully executing this task requires simultaneous mastery of hybrid contact dynamics and robust balance control on a mechanically coupled, dynamically unstable skateboard. To overcome these challenges, we propose HUSKY, a learning-based framework that integrates humanoid-skateboard system modeling and physics-aware whole-body control. We first model the coupling relationship between board tilt and truck steering angles, enabling a principled analysis of system dynamics. Building upon this, HUSKY leverages Adversarial Motion Priors (AMP) to learn human-like pushing motions and employs a physics-guided, heading-oriented strategy for lean-to-steer behaviors. Moreover, a trajectory-guided mechanism ensures smooth and stable transitions between pushing and steering. Experimental results on the Unitree G1 humanoid platform demonstrate that our framework enables stable and agile maneuvering on skateboards in real-world scenarios. The project page is available at https://husky-humanoid.github.io/.
comment: Accepted to RSS 2026
Learning Human-Intention Priors from Large-Scale Human Demonstrations for Robotic Manipulation
Human videos contain rich manipulation priors, but using them for robot learning remains difficult because raw observations entangle scene understanding, human motion, and embodiment-specific action. We introduce MoT-HRA, a hierarchical vision-language-action framework that learns human-intention priors from large-scale human demonstrations. We first curate HA-2.2M, a 2.2M-episode action-language dataset reconstructed from heterogeneous human videos through hand-centric filtering, spatial reconstruction, temporal segmentation, and language alignment. On top of this dataset, MoT-HRA factorizes manipulation into three coupled experts: a vision-language expert predicts an embodiment-agnostic 3D trajectory, an intention expert models MANO-style hand motion as a latent human-motion prior, and a fine expert maps the intention-aware representation to robot action chunks. A shared-attention trunk and read-only key-value transfer allow downstream control to use human priors while limiting interference with upstream representations. Experiments on hand motion generation, simulated manipulation, and real-world robot tasks show that MoT-HRA improves motion plausibility and robust control under distribution shift.
comment: 13 pages, 5 figures
LFX: Towards Unified Light Field Dense Semantic Segmentation and Salient Object Detection
Light field cameras capture multi-view observations within a single exposure. However, existing studies are typically tailored to specific LF representations, leaving the field without a unified learning framework. To bridge this gap, we present LFX, the first unified framework for LF perception. LFX establishes a representation-invariant feature modulation space, enabling it to adapt to heterogeneous LF representations and diverse perception tasks. Specifically, we propose Field-of-Parallax Angular Subspace Modeling (FoP-ASM), which assigns an independent angular marker to each auxiliary view, enabling view-wise independent modeling. Meanwhile, shared manifold subspace constraints and regularization losses enforce globally consistent semantic modulation across views. Extensive evaluations across three LF benchmarks show that LFX achieves state-of-the-art results across distinct LF representations, outperforming representation-specific methods by up to 12% and 20% with 0.029/0.027 MAE for salient object detection, and achieving 84.37 mIoU for semantic segmentation. The source code will be made publicly available at https://github.com/FeiT-FeiTeng/LFX.
comment: The source code will be made publicly available at https://github.com/FeiT-FeiTeng/LFX
IVGT: Implicit Visual Geometry Transformer for Neural Scene Representation
Reconstructing coherent 3D geometry and appearance from unposed multi-view images is a fundamental yet challenging problem in computer vision. Most existing visual geometry foundation models predict explicit geometry by regressing pixel-aligned pointmaps, often suffering from redundancy and limited geometric continuity. We propose IVGT, an Implicit Visual Geometry Transformer that implicitly models continuous and coherent geometry from pose-free multi-view images. This formulation learns a continuous neural scene representation in a canonical coordinate system and supports continuous spatial queries at any 3D positions, retrieving local features to predict signed distance (SDF) values and colors using lightweight decoders. It allows direct extraction of continuous and coherent surface geometry, enabling rendering of RGB images, depth maps, and surface normal maps from arbitrary viewpoints. We train IVGT via multi-dataset joint optimization with 2D supervision and 3D geometric regularization. IVGT demonstrates generalization across scenes and achieves strong performance on various tasks, including mesh and point cloud reconstruction, novel view synthesis, depth and surface normal estimation, and camera pose estimation.
comment: Code: https://github.com/wzzheng/IVGT/
DSSP: Diffusion State Space Policy with Full-History Encoding
Diffusion-based imitation learning has shown strong promise for robot manipulation. However, most existing policies condition only on the current observation or a short window of recent observations, limiting their ability to resolve history-dependent ambiguities in long-horizon tasks. To address this, we introduce DSSP, a history-conditioned Diffusion State Space Policy that enables efficient, full-history conditioning for robot manipulation. Leveraging the continuous sequence modeling properties of State Space Models (SSMs), our history encoder effectively compresses the entire observation stream into a compact context representation. To ensure this context preserves critical information regarding future state evolution, the encoder is optimized with a dynamics-aware auxiliary training objective. This high-level context representation is then seamlessly fused with recent state observations to form a hierarchical conditioning mechanism for action generation. Furthermore, to maintain architectural consistency and minimize GPU memory overhead, we also instantiate the diffusion backbone itself using an SSM. Extensive experiments across simulation benchmarks and real-world manipulation tasks show that DSSP achieves state-of-the-art performance with a significantly smaller model size, demonstrating superior efficiency of the hierarchical conditioning in capturing crucial information as the history length increases.
SENIOR: Efficient Query Selection and Preference-Guided Exploration in Preference-based Reinforcement Learning IROS 2025
Preference-based Reinforcement Learning (PbRL) methods avoid reward engineering by learning reward models from human preferences. However, poor feedback efficiency and sample efficiency remain problems that hinder the application of PbRL. In this paper, we present a novel efficient query selection and preference-guided exploration method, called SENIOR, which selects meaningful and easy-to-compare behavior segment pairs to improve human feedback efficiency and accelerates policy learning with designed preference-guided intrinsic rewards. Our key idea is twofold: (1) We design a Motion-Distinction-based Selection scheme (MDS). It selects segment pairs with apparent motion and different directions through kernel density estimation of states, which makes them more task-related and easier for humans to label; (2) We propose a novel preference-guided exploration method (PGE). It encourages exploration towards states with high preference and low visit counts and continuously guides the agent towards valuable samples. The synergy between the two mechanisms significantly accelerates reward and policy learning. Our experiments show that SENIOR outperforms five existing methods in both human feedback efficiency and policy convergence speed on six complex simulated robot manipulation tasks and four real-world tasks. Videos can be found on our project website: https://2025senior.github.io/
comment: 8 pages, 8 figures, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)
Dissecting Embodied Abilities in Multimodal Language Models through Skill-level Evaluation and Diagnosis ICML 2026
Understanding the capability bottlenecks of embodied multimodal large language models (MLLMs) is crucial for improving embodied agents. However, existing embodied benchmarks mainly focus on task-level evaluation and fail to provide actionable insights into the underlying causes of model failures. To address this limitation, we introduce BEAR, a benchmark that decomposes embodied tasks into 14 atomic skills for fine-grained skill-level evaluation. BEAR comprises 4,469 interleaved image-video-text samples spanning 14 skills across 6 categories, ranging from low-level perception to high-level planning. We evaluate 20 MLLMs on BEAR under a hierarchical skill-level diagnosis framework and uncover two key findings: (1) perceptual capabilities are major bottlenecks behind reasoning failures, and (2) current models suffer from unstable spatiotemporal modeling that remains largely unexposed in prior benchmarks. Motivated by these findings, we further propose BEAR-Agent, a multimodal conversational agent that augments MLLMs with visual and spatial reasoning tools. BEAR-Agent substantially improves performance across embodied skills, achieving a relative improvement of 17.5% on GPT-5 over the base model on BEAR, while also outperforming strong baselines in both simulation and real-world robotic experiments. Project page: https://bear-official66.github.io/
comment: Accepted to ICML 2026
Multiagent Systems
LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems
Large language model (LLM)-based multi-agent systems increasingly rely on intermediate communication to coordinate complex tasks. While most existing systems communicate through natural language, recent work shows that latent communication, particularly through transformer key-value (KV) caches, can improve efficiency and preserve richer task-relevant information. However, KV caches also encode contextual inputs, intermediate reasoning states, and agent-specific information, creating an opaque channel through which sensitive content may propagate across agents without explicit textual disclosure. To address this, we introduce LCGuard (Latent Communication Guard), a framework for safe KV-based latent communication in multi-agent LLM systems. LCGuard treats shared KV caches as latent working memory and learns representation-level transformations before cache artifacts are transmitted across agents. We formalize representation-level sensitive information leakage operationally through reconstruction: a shared cache artifact is unsafe if an adversarial decoder can recover agent-specific sensitive inputs from it. This leads to an adversarial training formulation in which the adversary learns to reconstruct sensitive inputs, while LCGuard learns transformations that preserve task-relevant semantics and reduce reconstructable information. Empirical evaluations across multiple model families and multi-agent benchmarks show that LCGuard consistently reduces reconstruction-based leakage and attack success rates while maintaining competitive task performance compared to standard KV-sharing baselines.
Superhuman Safe and Agile Racing through Multi-Agent Reinforcement Learning
Autonomous systems have achieved superhuman performance in isolation or simulation, yet they remain brittle in shared, dynamic real-world spaces. This failure stems from the dominant single-agent paradigm for physical applications, where other actors are ignored or treated as environmental noise, preventing effective coordination. Here we show that multi-agent reinforcement learning provides the essential safety scaffolding required for real-world interaction. Using high-speed quadrotor racing as a high-stakes testbed, we train agents to navigate complex aerodynamic interactions and strategic maneuvering with a variable number of racers. Through league-based self-play, agents evolve sophisticated anticipatory behaviors, including proactive collision avoidance, overtaking, and handling multi-agent physical interactions, including aerodynamic downwash. Our agents outperform a champion-level human pilot in multi-player races at speeds exceeding 22 m/s, while simultaneously reducing collision rates by 50% compared to state-of-the-art single-agent baselines. Crucially, training with diverse artificial agents enables zero-shot generalization to safer human interaction. These results suggest that the path to robust robotic co-existence lies not in isolated safety constraints, but in the rigorous demands of multi-agent interaction. Multimedia materials are available at: https://rpg.ifi.uzh.ch/marl
comment: 12 pages (+4 supplementary). Website: https://rpg.ifi.uzh.ch/marl
Self-Evolving Multi-Agent Systems via Decentralized Memory
Self-evolving multi-agent systems (MAS) have emerged as a promising route to LLM agents that continually improve from experience, with persistent memory at their foundation. However, existing designs almost exclusively adopt a centralized repository shared across agents, incurring communication and coordination overhead, raising privacy concerns, and collapsing agent diversity. We propose DecentMem, a decentralized memory framework in which each agent maintains its own dual-pool memory -- an exploitation pool of consolidated past trajectories and an exploration pool of LLM-generated candidates for unseen contexts. The two pools are reweighted online based on stage-wise feedback from an LLM-as-a-judge. Theoretically, we prove that this design guarantees global reachability of the solution space and achieves $O(\log T)$ cumulative regret, matching the stochastic bandit lower bound up to constants. In practice, across three MAS frameworks (AutoGen, DyLAN, AgentNet), three Qwen3 backbones (4B/8B/14B), two Gemma4 backbones (E2B/E4B) and five benchmarks spanning math, code, QA, and embodied tasks, DecentMem improves average accuracy by up to 23.8% over the strongest centralized memory baseline and by up to 52.5% over the no-memory baseline, while reducing token usage by up to 49%.
A Generalized Nash Equilibrium-Seeking Scheme for Trauma Resuscitation
Trauma resuscitation is a clinical process for treating life-threatening physiological disorders in safety-critical environments, driven by the experience of healthcare workers (HCWs). Designing and optimizing quantifiable metrics that accurately capture HCW decisions may augment current resuscitation procedures with the potential to improve patient outcomes. This motivates our socio-technical formulation of trauma resuscitation as a distributed generalized Nash equilibrium (GNE)-seeking game with coupled inequality constraints. This method is optimized over a time-varying communication graph. We introduce novel insights from clinical experience to model HCW behavior. This work facilitates the best possible resuscitation outcome given HCW workloads, schedules, competencies, and limited resources.
Sibyl-AutoResearch: Autonomous Research Needs Self-Evolving Trial-and-Error Harnesses, Not Paper Generators
Autonomous research systems increasingly make the scientific workflow executable: agents can propose ideas, run code, inspect results, and draft papers. But executable workflows do not by themselves produce research judgment. We analyze where current systems lose trial experience: weak evidence becomes prose, pilot signals become broad claims, memory remains textual, and recurring process failures do not change later behavior. We introduce Sibyl-AutoResearch, a self-evolving AutoResearch framework built around Scientific Trial-and-Error Harnesses. A harness lets agents run bounded trials, preserve positive and negative outcomes, and route lessons into later planning, validation, claim scope, scheduling, critique, writing, and harness repair. We formalize this through two auditable conversion units: trial-to-behavior conversion, which links trial signals to later research actions, and trial-to-harness-behavior conversion, which links recurring process failures to system updates. We implement the framework in SIBYL, a file-backed autonomous research system that exposes the state, roles, memory, gates, and artifact traces needed to inspect these conversion paths. A retrospective audit identifies eight high-confidence conversion events, with a median latency of one iteration and a maximum latency of three iterations. A recovered-failure registry further shows how five naturally occurring failure classes, including duplicate results, stale numbers, and unsupported statistics, were blocked, downgraded, or routed into later repair. These traces do not establish a comparative performance claim; they show that the proposed conversion units are recoverable from realistic autonomous-research workspaces. The SIBYL framework and system are available at https://github.com/Sibyl-Research-Team/AutoResearch-SibylSystem.
ACCoRD: Actor-Critic Conflict Resolution with Deep learning for O-RAN xApps
Conflict Mitigation (ConMit) is a crucial part of intelligent network control in Open Radio Access Networks (O-RAN). In this paper, we propose a method named ACCoRD to resolve detected control conflicts in the Near-Real-Time RAN Intelligent Controller using a Conflict Resolution (CR) Agent with an Artificial Neural Network (ANN) trained with the PPO-Clip reinforcement learning algorithm. The implemented ANN analyzes data about the network and conflicting control decisions to infer optimal CR actions. The CR Agent gathers feedback from the network after each resolved conflict to assess its efficiency and adjust the ANN's weights during batch training. The evaluation of the proposed approach is based on simulation data. A new methodology for evaluating CR solutions is proposed. Results show that the proposed ANN-based method improves on the efficiency of rule-based approaches by significantly reducing negative network events caused by conflicting control decisions in medium and high traffic scenarios.
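The abstract names PPO-Clip as the training algorithm. For readers unfamiliar with it, the sketch below evaluates the standard PPO clipped surrogate objective on a batch of samples in plain Python. The function name, argument layout, and the clip range of 0.2 are illustrative assumptions; this is not the authors' implementation.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # probability ratio r
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum removes any incentive to push r outside the clip range.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

For example, a sample whose probability ratio has doubled with advantage 1 contributes only 1.2 rather than 2.0, which is what keeps each policy update close to the data-collecting policy.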
Cross-domain benchmarks reveal when coordinated AI agents improve scientific inference from partial evidence
Scientific evidence often spans instruments, databases, and disciplines, so no single source records the full phenomenon. This makes it difficult to determine when coordinated AI agents add value over simpler scientific workflows. We evaluate this question with a cross-domain benchmark spanning four scientific tasks: mapping molecular structure into musical representations, detecting historical paradigm shifts in science, identifying vector-borne disease emergence, and vetting transiting-exoplanet candidates. Each case uses a frozen evaluation panel, predefined scoring protocols, explicit baselines, ablations or null controls, and stated limitations. The results define three operating regimes. When different disciplines each capture only part of the phenomenon, cross-channel composites improve over single-channel baselines: climate-vector emergence reaches AUROC 0.944 and exoplanet vetting reaches AUROC 0.955. However, the exoplanet workflow is effectively tied with a strong combined-summary baseline, showing that decomposition does not always improve top-line performance. When one signal dominates, as in paradigm-shift detection, coordination mainly improves interpretation and traceability. For molecular sonification, the gain is representational rather than predictive. ScienceClaw x Infinite provides the auditable artifact and provenance layer for this evaluation. The benchmark therefore assigns value to coordination only when the corresponding performance, provenance, or representation claim is supported by explicit comparators.
Emergence of agriculture in an artificial society of reinforcement learning agents
The origin of agriculture represents a major evolutionary transition and a paradigmatic example of how complex collective behaviors emerge from simple interactions. Here we introduce an artificial society of reinforcement learning agents embedded in a dynamic ecological environment to identify general principles underlying this transition. Within this system, agricultural practices emerge spontaneously - without explicit instruction - through the coupled dynamics of learning and environmental modification. We show that this transition is governed by four key ingredients: individual planning through the valuation of delayed rewards, social vulnerability to cheaters, stabilization via social learning, and an emergent lock-in effect that renders agriculture effectively irreversible once established. In particular, we demonstrate that social learning acts as a "firewall" that suppresses cheater invasion and enables the propagation of successful strategies, leading to sustained population growth and nonlinear amplification of domesticated resources. Together, these results reveal universal mechanisms linking individual decision-making, social interactions, and ecological feedbacks. More broadly, they highlight the potential of artificial societies as experimental platforms to study the emergence of cultural innovations and major evolutionary transitions.
The Log is the Agent: Event-Sourced Reactive Graphs for Auditable, Forkable Agentic Systems
Most agent frameworks are built around the language model: a conversation loop comes first, then tools, then rules, and finally a logging layer bolted on for observability, with state persisted as retrievable "memory." We describe ActiveGraph, a runtime that inverts this arrangement. The append-only event log is the source of truth; the working graph is a deterministic projection of that log; and behaviors--ordinary functions, classes, LLM-backed routines, or logic attached to typed edges--react to changes in the graph and emit new events. No component instructs another; coordination happens entirely through the shared graph. This single design decision yields three properties that retrieval-and-summarization memory systems do not provide: deterministic replay of any run from its log, cheap forking that branches a run at any event without re-executing the shared prefix, and end-to-end lineage from a high-level goal down to the individual model call that produced each artifact. We present the architecture, a determinism contract that makes replay sound, and a worked diligence example whose full causal structure is reconstructable from the log alone. We discuss--without claiming to demonstrate--why this substrate is unusually well suited to self-improving agents, and how it extends the BabyAGI lineage and prior graph-memory research.
comment: 11 pages, 1 figure. Open-source Apache-2.0 implementation with reproducible quickstart demo, deterministic replay, fork-and-diff, and lineage tracing
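The log-as-source-of-truth idea described above can be illustrated with a toy sketch. This is not the ActiveGraph API: the `Event` type, the event kinds `add_node` and `add_edge`, and the functions `project` and `fork` are hypothetical names chosen for illustration, and behaviors that react to graph changes are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str       # hypothetical event kinds: "add_node" or "add_edge"
    payload: tuple  # (node_id, label) or (src, dst, edge_type)


def project(log):
    """Fold the append-only log into a graph; the same log always yields the same graph."""
    nodes, edges = {}, set()
    for ev in log:
        if ev.kind == "add_node":
            node_id, label = ev.payload
            nodes[node_id] = label
        elif ev.kind == "add_edge":
            edges.add(ev.payload)
    return nodes, edges


def fork(log, at):
    """Branch a run at event index `at` by copying the shared prefix of recorded events."""
    return list(log[:at])


log = [
    Event("add_node", ("goal", "run diligence")),
    Event("add_node", ("doc", "filing")),
    Event("add_edge", ("goal", "doc", "needs")),
]

# Replay: projecting the same log twice gives identical graphs.
replayed = project(log)

# Fork before the edge was added and take the branch in another direction.
branch = fork(log, 2)
branch.append(Event("add_node", ("memo", "draft")))
branch_nodes, branch_edges = project(branch)
```

In this sketch the fork reuses recorded events rather than re-running whatever produced them, which is the sense in which branching avoids re-executing the shared prefix.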
AI-Enabled Serious Games: Integrating Intelligence and Adaptivity in Training Systems
Serious games are widely used for learning and training across domains such as healthcare, defense, and education. Persistent challenges remain, however, including static scenario design, authoring bottlenecks, limited learner modeling, and difficulty implementing meaningful real-time instructional adaptation. Recent advances in artificial intelligence (AI) introduce novel capabilities such as dynamic scenario variation, contextual feedback, adaptive pacing, and learner-state modeling that may help address some of these limitations. At the same time, integrating AI into serious games raises important questions related to validity, transparency, system control, and learner trust. This chapter examines how contemporary AI approaches may support real-time instructional adaptation in serious games. It distinguishes between instructional intelligence, defined as a system's capacity to infer learner knowledge and reason about pedagogically appropriate responses, and adaptivity, defined as the ability to modify instructional actions during interaction. A historical synthesis of adaptive learning systems is presented, tracing developments from early computer-assisted instruction through intelligent tutoring systems (ITS), dynamic difficulty adjustment (DDA), authoring platforms, learning analytics, and recent AI-enabled architectures. Building on this perspective, the chapter discusses how large language models (LLMs), reinforcement learning (RL), and agent-based architectures may contribute to more integrated forms of intelligence and adaptivity in serious games. It also highlights practical and research challenges associated with AI-enabled systems, including explainability, validation, computational cost, and the limited empirical evidence regarding long-term learning outcomes in AI-enabled serious games.
comment: Book chapter, 1 figure. To appear in "Advances in Global Applied Artificial Intelligence," G. A. Tsihrintzis, M. Virvou, N. G. Bourbakis, and L. C. Jain (Eds.), Springer, Learning and Analytics in Intelligent Systems book series, 2026
Toward Goal-Oriented Communication in Multi-Agent Systems: An overview
As multi-agent systems (MAS) become increasingly prevalent in autonomous systems, distributed control, and edge intelligence, efficient communication under resource constraints has emerged as a critical challenge. Traditional communication paradigms often emphasize message fidelity or bandwidth optimization, overlooking the task relevance of the exchanged information. In contrast, goal-oriented communication prioritizes the importance of information with respect to the agents' shared objectives. This review provides a comprehensive survey of goal-oriented communication in MAS, bridging perspectives from information theory, communication theory, and machine learning. We examine foundational concepts alongside learning-based approaches and emergent protocols. Special attention is given to coordination under communication constraints, as well as applications in domains such as swarm robotics, federated learning, and edge computing. The paper concludes with a discussion of open challenges and future research directions at the intersection of communication theory, machine learning, and multi-agent decision making.
comment: 37 pages
High-Probability Convergence Guarantees of Decentralized SGD
Convergence in high-probability (HP) has attracted increasing interest, due to implying exponentially decaying tail bounds and strong guarantees for individual runs of an algorithm. While many works study HP guarantees in centralized settings, much less is understood in the decentralized setup, where existing works require strong assumptions, like uniformly bounded gradients, or asymptotically vanishing noise. This results in a significant gap between the assumptions used to establish convergence in the HP and the mean-squared error (MSE) sense, and is also contrary to centralized settings, where it is known that $\mathtt{SGD}$ converges in HP under the same conditions on the cost function as needed for MSE convergence. Motivated by these observations, we study the HP convergence of Decentralized $\mathtt{SGD}$ ($\mathtt{DSGD}$) in the presence of light-tailed noise, providing several strong results. First, we show that $\mathtt{DSGD}$ converges in HP under the same conditions on the cost as in the MSE sense, removing the restrictive assumptions used in prior works. Second, our sharp analysis yields order-optimal rates for both non-convex and strongly convex costs. Third, we establish a linear speed-up in the number of users, leading to matching or strictly better transient times than those obtained from MSE results, further underlining the tightness of our analysis. To the best of our knowledge, this is the first work that shows $\mathtt{DSGD}$ achieves a linear speed-up in the HP sense. Our relaxed assumptions and sharp rates stem from several technical results of independent interest, including a result on the variance-reduction effect of decentralized methods in the HP sense, as well as a novel bound on the moment-generating function of strongly convex costs, of interest even in centralized settings. Numerical experiments validate our theory.
comment: 43 pages, 6 figures
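For readers unfamiliar with the algorithm, one common form of the decentralized SGD update can be sketched as follows: each user averages its neighbors' iterates with a doubly stochastic mixing matrix, then takes a local noisy gradient step. The scalar quadratic costs, the four-user ring, the step-size schedule, and all names below are assumptions for illustration, not the setting analyzed in the paper.

```python
import random


def dsgd(b, W, steps=2000, noise_std=0.1, seed=0):
    """Decentralized SGD on local costs f_i(x) = 0.5 * (x - b[i])**2.

    b: local targets, one per user (the global minimizer is their mean)
    W: doubly stochastic mixing matrix (list of rows)
    """
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    for t in range(steps):
        alpha = 1.0 / (t + 10)  # decaying step size (illustrative choice)
        # Noisy local gradients evaluated at each user's current iterate.
        grads = [x[i] - b[i] + rng.gauss(0.0, noise_std) for i in range(n)]
        # Consensus step: weighted average of neighbors' iterates.
        mixed = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Local gradient step applied to the mixed iterate.
        x = [mixed[i] - alpha * grads[i] for i in range(n)]
    return x


# Ring of four users: each keeps half its own weight and splits the rest.
W_ring = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]
targets = [1.0, 2.0, 3.0, 4.0]
final = dsgd(targets, W_ring)
```

Every entry of `final` should end up close to the mean of `targets`, even though no user ever sees another user's cost.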
When Simultaneous Localization and Mapping Meets Wireless Communications: A Survey
This paper surveys the state of the art at the nexus of SLAM and wireless communications, characterizing the bidirectional impact of each on the other with a focus on visual SLAM (V-SLAM) integration. We provide an overview of key concepts related to wireless signal propagation, geometric channel modeling, and radio frequency (RF)-based localization and sensing. In addition, we show image processing techniques that can detect landmarks and proactively predict optimal paths for wireless channels. Several dimensions are considered, including the prerequisites, techniques, background, and future directions and challenges of the intersection between SLAM and wireless communications. We analyze estimation and control approaches such as Bayesian filters, feature-based pose estimation, perception-aware motion control, spatial methods for signal processing such as vector fields, and key technological aspects. We present techniques and components that enable highly effective retrieval of the autonomous robot state. Among other interesting findings, we observe that monocular V-SLAM would benefit from RF-relevant information, as the latter can serve as a proxy for resolving scale ambiguity. Conversely, we find that wireless communications in the context of 5G and beyond can potentially benefit from visual odometry, which is central to SLAM. Moreover, we examine other sources besides the camera for SLAM and describe the twofold relation with wireless communications. Finally, integrated solutions performing joint communications and SLAM appear to be in their infancy: theoretical and practical advancements are required to add higher-level localization and semantic perception capabilities to RF and multi-antenna technologies.
Reliability and Effectiveness of Autonomous AI Agents in Supply Chain Management
This paper studies autonomous generative AI agents in multi-echelon supply chains using the MIT Beer Game. We identify four inference-time levers that shape performance: model selection, policies and guardrails, centralized data sharing, and prompt engineering. Model capability is the dominant factor: an out-of-the-box reasoning model exceeds human-level performance, and optimized reasoning models reduce costs by up to 67% relative to human teams. However, strong average performance masks substantial reliability risks. We introduce agent bullwhip: the amplification of run-to-run decision instability in autonomous multi-echelon systems. A central component is decision bullwhip, the portion of order variability generated by stochastic agent decisions rather than by changes in customer demand. We show that decision instability can amplify both across facilities at a fixed point in time and within the same facility over time, even when the demand path is held fixed. Repeated sampling, a natural test-time remedy, fails to meaningfully reduce this instability, suggesting that reliability requires changing the underlying decision policy rather than merely averaging over model outputs. To address this limitation, we propose a Group Relative Policy Optimization (GRPO)-based reinforcement-learning post-training framework that trains a shared base LLM using system-level supply-chain rewards. Post-training substantially reduces tail events, curtails agent bullwhip, and improves the reliability of autonomous supply-chain agents.
Understanding Persuasion in Long-Running Agents
Modern AI agents increasingly combine conversational interaction with autonomous task execution, such as coding and web research, raising a natural question: What happens when an agent engaged in long-horizon tasks is exposed to user persuasion? Yet studying this possibility is challenging because long-running agent behavior is noisy and costly to reproduce, and it remains unclear which unique challenges emerge only in extended task execution. We study how belief-level intervention can influence downstream task behavior, a phenomenon we name persuasion propagation. We introduce a behavior-centered evaluation framework that distinguishes between persuasion applied during or prior to task execution. Across web research and coding tasks, we find that on-the-fly persuasion induces weak and inconsistent behavioral effects. In contrast, when the belief state is explicitly specified at task time, belief-prefilled agents conduct on average 26.9% fewer searches and visit 16.9% fewer unique sources than neutral-prefilled agents. These results suggest that persuasion, even in prior interaction, can affect the agent's behavior, motivating behavior-level evaluation in agentic systems.
comment: Code available at https://github.com/HyejunJeong/persuasion-propagation
Systems and Control (EESS)
Dynamic Lane Allocation in UAM Corridors for Efficient Multimodal Door-to-Door Mobility
This article presents dynamic directional lane allocation in urban air mobility (UAM) corridors as a discrete-time mixed-integer linear program (MILP). This formulation activates, deactivates, and reverses lane direction as bi-directional airspace demand evolves. We model demand from disaggregate ground travel data by decomposing each trip into a multi-modal sequence with first-, middle-, and last-mile legs and routing the UAM-served middle-mile segment through a vertiport-side dispatch model. We use the San Francisco Bay Area as a case study by placing a multi-region spanning corridor between Contra Costa County and Silicon Valley. We find that the dynamic policy cuts unused airspace capacity by 5x, increases mean lane utilization from 36-48% to 67% at the same service level relative to baselines, and reduces commuting-population mean travel time by up to 21.6%. These results show that dynamic configuration of airspace capacity alleviates much of the under-utilization issue in lane-based UAM airspace design and UAM concepts of operations. This dynamic allocation also provides a safe, structural way to increase throughput, making UAM a more viable complement to multimodal door-to-door mobility systems.
comment: Submitted to AIAA Aviation Forum
N3P: Accelerated Automated Parking via a Learning-Based Naturalistic Three-Stage Scheme ITSC 2026
Autonomous parking requires efficient path planning that ensures kinematic feasibility and collision avoidance in constrained environments. Hybrid A* is widely used but computationally expensive, while reinforcement learning (RL) methods lack reliability and often struggle with long-horizon geometric constraints, leading to suboptimal trajectories. We present N3P, a fast learning-based three-stage framework for automated parking. By introducing an intermediate preparatory pose and using a learning module to predict it, N3P decomposes the maneuver into simpler subproblems, thereby reducing computational complexity and accelerating path generation. We validate the framework by integrating it with Hybrid A* algorithms. Experiments in perpendicular and parallel parking scenarios show that N3P-enhanced Hybrid A* speeds up planning by more than 80%. It also outperforms RL baselines in success rate and trajectory quality, producing shorter trajectories with fewer gear changes, while achieving comparable or lower planning time in most cases.
comment: Accepted at IEEE Intelligent Transportation Systems Conference (ITSC 2026)
TriSweep: A Four-Drone Swarm Framework for Electromagnetic Side-Channel Analysis
Electromagnetic (EM) side-channel analysis traditionally assumes a stationary, close-proximity probe - a threat model that underestimates aerial adversaries. TriSweep is a simulation framework that designs and evaluates a four-drone swarm architecture for autonomous standoff EM-SCA of embedded microcontrollers at 0.25-1.5 m. Three spatially specialized collector drones - Anchor (full-spectrum), Mask Probe (mask-register loading leakage), and Cipher Probe (masked SubBytes output leakage) - feed a stationary Accumulator drone that performs coherent combining (+4.8 dB SNR gain) and second-order mask cancellation via a centered product of the two spatially separated leakage streams. Evaluated against three real ANSSI ASCAD datasets (ATmega8515 masked AES-128 and 50/100-sample desynchronized variants), the framework achieves a simulated key rank of 18 +/- 1.7 (five-seed) at 0.25 m on the primary masked dataset. Profiling-trace cross-correlation alignment reduces single-drone rank from 89 to 21 on the 100-sample-jitter variant, demonstrating compensation for drone hover vibration. A two-channel CNN in the Accumulator converges to a loss of 0.454 (vs. random baseline 5.545) and improves rank on desynchronized datasets. No physical hardware has been fabricated; prototype construction is the planned next step.
comment: Simulation framework + systems design for a four-drone swarm performing standoff electromagnetic side-channel analysis. No hardware fabricated yet
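The centered-product combining mentioned above is a standard second-order preprocessing step: each stream is mean-centered and the two are multiplied sample by sample, so that the product correlates with the unmasked value even though neither stream does on its own. The sketch below uses a one-bit Boolean masking toy model, which is an assumption made for illustration and not the ASCAD setup or the TriSweep code; `centered_product` and `simulate` are hypothetical names. The last line notes that ln(256) is approximately 5.545, which is consistent with the quoted random baseline if that baseline is the cross-entropy of a uniform guess over 256 byte values.

```python
import math
import random


def centered_product(stream_a, stream_b):
    """Multiply two mean-centered leakage streams sample by sample."""
    mean_a = sum(stream_a) / len(stream_a)
    mean_b = sum(stream_b) / len(stream_b)
    return [(a - mean_a) * (b - mean_b) for a, b in zip(stream_a, stream_b)]


def simulate(secret_bit, n=1000, noise_std=0.1, seed=0):
    """Toy model: one stream leaks the mask, the other leaks secret XOR mask."""
    rng = random.Random(seed)
    masks = [rng.randint(0, 1) for _ in range(n)]
    mask_leak = [m + rng.gauss(0.0, noise_std) for m in masks]
    value_leak = [(secret_bit ^ m) + rng.gauss(0.0, noise_std) for m in masks]
    combined = centered_product(mask_leak, value_leak)
    # The sign of the average product reveals the secret bit in this toy model.
    return sum(combined) / n


# Cross-entropy (in nats) of a uniform guess over 256 key-byte candidates.
uniform_baseline = math.log(256)
```

In this toy model the averaged product is positive when the secret bit is 0 and negative when it is 1, while each stream alone is statistically independent of the secret.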
A Generalized Nash Equilibrium-Seeking Scheme for Trauma Resuscitation
Trauma resuscitation is a clinical process for treating life-threatening physiological disorders in safety-critical environments, driven by the experience of healthcare workers (HCWs). Designing and optimizing quantifiable metrics that accurately capture HCW decisions may augment current resuscitation procedures with the potential to improve patient outcomes. This motivates our socio-technical formulation of trauma resuscitation as a distributed generalized Nash equilibrium (GNE)-seeking game with coupled inequality constraints. This method is optimized over a time-varying communication graph. We introduce novel insights from clinical experience to model HCW behavior. This work facilitates the best possible resuscitation outcome given HCWs' workloads, schedules, competencies, and limited resources.
Output regulation via input-output data
From a multi-input-multi-output (MIMO) discrete-time linear system, we collect input-output data affected by noise in the form of an unknown exosignal and, from these data points (without knowledge of the system model), we design a feedback controller that asymptotically annihilates the effect of that exosignal on the output. This amounts to solving an output regulation problem purely from input-output data, for MIMO linear systems. The design of the controller corresponds to a semidefinite program and is pursued on a suitable auxiliary system. Such design carries over from the auxiliary system to the original one by a rigorous examination of the relation between the solutions of the two systems.
Global Convergence of Control-Based Lagrangian Flows for Non-Convex Optimization
This paper studies the flows of continuous-time dynamics for equality-constrained optimization based on control-theoretic Lagrangian methods. In particular, we consider dynamics induced by proportional-integral and feedback linearization controllers, which have been recently proposed as alternatives to primal-dual gradient methods. Unlike existing convergence results, which rely on strong convexity of the objective function or boundedness assumptions, we exploit the geometric structure induced by the constraints. Specifically, we show global exponential convergence for non-convex problems that satisfy a suitable convexity property when restricted to the constraint manifold.
KAPPS: A knowledge-based CPPS Architecture for the Circular Factory
While linear manufacturing relies on homogeneous materials and predefined process sequences, circular manufacturing reintroduces used products with heterogeneous and uncertain conditions. This shift demands manufacturing systems capable of handling variable product states, dynamically reconfigurable processes, and the integration of human and machine knowledge. Conventional manufacturing IT architectures, designed for stable structures and deterministic execution, are unable to meet these requirements, as they cannot adequately represent and manage the uniqueness of individual components at runtime. Following a design science methodology for developing a Cyber-Physical Production System (CPPS) for circular manufacturing, we derive 14 requirements from five complementary perspectives. Based on these requirements, we design KAPPS, a knowledge-based architecture that uses an ontology-grounded knowledge graph as a unifying data backbone, combined with a semantic interface layer to enable consistent data and information integration, reasoning, and communication across heterogeneous systems and services, turning the knowledge graph from an integration layer into the factory's authoritative write-time state. KAPPS incorporates modules for constraint enforcement and event-driven planning, enabling incremental adaptation of execution plans under uncertainty and human-machine knowledge exchange. The applicability of KAPPS is demonstrated through two implemented use cases: (i) anomaly detection and learning through knowledge-graph-mediated services and (ii) runtime constraint enforcement in a modular conveyor system. Subsequently, the architecture is evaluated against the 14 requirements. (ed. abstract shortened)
comment: Submitted to Journal of Manufacturing Systems (JMS)
QuCtrl-BELL: A Compiler-Driven Sub-Microsecond Feedback Control Stack for Scalable Trapped-Ion Quantum Experiments
As trapped-ion quantum computing scales to larger qubit registers and more complex control protocols, classical control systems face a fundamental tradeoff: sub-microsecond board-level feedback requires tight hardware coupling, whereas maintainability and extensibility require clean, modular software abstractions. This paper presents QuCtrl-BELL (Bell), a compiler-driven software stack for trapped-ion quantum control. The design resolves this tradeoff by decoupling control flow -- including loops, branches, and synchronization -- from hardware state data. A Python-embedded domain-specific language (DSL) is lowered through a six-stage transpilation pipeline covering control flow graph (CFG) construction, static single-assignment (SSA) conversion, liveness analysis, and graph-coloring register allocation. The compiler generates deterministic distributed board-level programs and compact step-table data. A cross-board synchronization protocol supports feedback loops with latency below 700 ns without host intervention. Bell is deployed and evaluated on the QuCtrl-BELL platform (RISC-V + PXIe), demonstrating that a compiler-based infrastructure can provide programmability, deterministic timing, and modularity for scalable trapped-ion quantum control.
comment: 7 pages, 6 figures
Low-Complexity Tensor Beamforming for RIS-Aided Multiuser Multistream MIMO Systems
We address joint active and passive beamforming for uplink RIS-assisted multi-user multi-stream MIMO systems with joint detection. The coupled design of the receive combiner, block-diagonal user precoders, and RIS phase vector is formulated through a third-order composite channel tensor. Exploiting this multilinear structure, we propose a multi-stream tensor alternating optimization method that updates the combiner, user precoders, and RIS coefficients via low-dimensional tensor projections. Simulations show that the proposed method approaches a multi-start alternating-optimization benchmark while reducing computational complexity and improving large-RIS scaling.
comment: 5 pages, 3 figures
Online Optimization with Unknown Time-Varying Parameters from Noisy Gradient Measurements
We study online optimization problems in which the cost function depends on latent, time-varying parameters that are unmeasurable and governed by unknown dynamics. Specifically, we consider a strongly convex cost function whose linear term evolves according to unknown linear stochastic dynamics, while the algorithm has access only to finite noisy gradient measurements. We propose a solution that uses control theoretic tools to reconstruct the latent parameters from gradient observations using a Gauss-Markov estimator, then identifies the parameter dynamics using an instrumental-variable estimator, and finally forecasts the parameters to compute the future minimizer. We provide a bound on the expected tracking error. We illustrate the effectiveness of our algorithm on a series of numerical examples.
Kernel-Based Safe Exploration in Deep Reinforcement Learning
Safety has been a major concern when deploying deep reinforcement learning algorithms in the real world. A promising direction that ensures that the learned policy does not visit unsafe regions is to learn a barrier function along with the policy. A barrier is a function from states to reals that assigns low values to the initial states, high values to the unsafe states, and decreases in expectation on each transition; such a function can be used to bound the probability of reaching unsafe states. Previous attempts learned a barrier function directly from exploration data, but this required either large amounts of data or restrictions on the system dynamics. In this paper, we show how kernel embeddings can be used to learn barrier functions during deep reinforcement learning for stochastic systems with unknown dynamics. Our algorithm, kernel-based safe exploration (KBSE), learns an optimal policy and a barrier simultaneously during exploration. The barriers are computed iteratively, represented as conditional mean embeddings, and provide better probabilistic safety guarantees with more exploration. The exploration algorithm uses the learned barrier functions to identify safety violations. In the case of violation, it intervenes to modify the unsafe action to a safe action, thereby ensuring that the exploration is restricted to actions that bound the probability of reaching unsafe states. We evaluate KBSE on several complex continuous control benchmarks. Experimental results establish our new algorithm to be suitable for synthesizing control policies that are probabilistically safe without degradation in reward accumulation.
comment: Accepted at L4DC Conference (22 Jan 2026)
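The way a barrier bounds the probability of reaching unsafe states can be made concrete with the usual supermartingale argument. The normalization below (values at most ε on the initial set, at least 1 on the unsafe set) and the symbols for the state space, initial set, and unsafe set are assumptions chosen for illustration and may differ from the conditions used in the paper; the final implication follows from Ville's maximal inequality for nonnegative supermartingales.

```latex
\begin{align*}
&B : \mathcal{X} \to \mathbb{R}_{\ge 0}, \\
&B(x) \le \varepsilon \quad \forall x \in \mathcal{X}_0, \qquad
 B(x) \ge 1 \quad \forall x \in \mathcal{X}_u, \\
&\mathbb{E}\left[ B(x_{t+1}) \mid x_t = x \right] \le B(x) \quad \forall x \in \mathcal{X}, \\
&\Longrightarrow \quad
 \Pr\left[ \exists\, t \ge 0 : x_t \in \mathcal{X}_u \mid x_0 \in \mathcal{X}_0 \right] \le \varepsilon .
\end{align*}
```

Here the expectation is taken over one transition of the closed-loop system under the current policy, so a smaller ε on the initial set translates directly into a tighter safety guarantee.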
Equilibrium-Free Contraction Stability Analysis for Grid-Forming Converter-Based Microgrids
Renewable-driven microgrids dominated by grid-forming (GFM) converters are subject to persistent power fluctuations, making equilibrium-known stability assessments restrictive. This paper develops an equilibrium-free contraction stability method based on semi-contraction theory. By formulating the system in a symmetry-aware projected state space, the intrinsic rotational mode induced by uniform angle shifts is removed. A blockwise Jacobian decomposition is introduced to characterize the coupled active and reactive power dynamics, yielding a computable regional contraction condition. This condition is then converted into forward-invariant stability certificates that provide trajectory-level performance guarantees. For autonomous operation without disturbances, the method provides an equilibrium-free nonlinear stability characterization together with an estimation of the region of attraction (ROA). For non-autonomous operation under disturbances, it derives explicit bounds for quasi-steady tracking under slowly varying injections and for robustness under fast or composite disturbances. Case studies on a 9-bus system validate the proposed method.
Bearing-Only Solution to the Fermat-Weber Location Problem for Unicycle Agent
This paper addresses bearing-only algorithms for solving the Fermat-Weber Location Problem (FWLP) with a unicycle agent. Unlike existing FWLP solutions for single- or double-integrator agents, our approach accounts for the nonholonomic constraints of wheeled robots. We first develop a bearing-only control law for the case with stationary beacons. Next, we consider saturated control inputs and propose a corresponding bearing-only control law. Finally, we address moving beacons with constant velocities and develop a control law that enables the unicycle agent to track the moving Fermat-Weber point. Both simulations and experiments are provided to demonstrate the effectiveness of the proposed methods.
comment: This paper has been accepted for presentation at the 23rd IFAC World Congress
Engineering Hybrid Physics-Informed Neural Networks for Next-Generation Electricity Systems: A State-of-the-Art Review
The integration of machine learning with domain-specific physics is transforming the design, monitoring, and control of electricity systems, where data scarcity, limited interpretability, and the need to enforce physical laws constrain purely data-driven models. Physics-informed machine learning (PIML) addresses these limitations by embedding governing equations directly into the learning process, yielding accurate, efficient, and scalable solutions for Industry 4.0 applications. This article reviews hybrid PIML architectures for electricity systems, including physics-informed neural networks (PINNs), Deep Operator Networks (DeepONets), Fourier Neural Operators, Extreme Learning Machine-enhanced PINNs, graph-based PINNs (PIGNNs), and domain-decomposition PINNs. Each approach is examined through case studies spanning field analysis, fault detection, digital twins, surrogate modeling, and control optimization. The review shows that embedding Maxwell's equations and other first-principles constraints substantially improves predictive accuracy under sparse and noisy data, reduces simulation time by orders of magnitude relative to finite element methods, and enhances generalization across operating regimes. Hybrid frameworks consistently outperform purely data-driven baselines on parameter sensitivity, dynamic behavior, and robustness, while supporting real-time digital-twin calibration and uncertainty quantification. Persistent challenges include training instability for stiff multi-scale problems, computational cost of high-fidelity models, and the absence of standardized benchmarks. The findings demonstrate that PIML enables a paradigm shift from black-box data-driven methods to transparent, physics-informed strategies, positioning the field for sustained innovation in resilient and intelligent electricity systems.
comment: 59 pages, 6 Figures
AdaPTwin: Adaptive Multi-Fidelity Predictive Digital Twin for Proactive Radio Resource Management in Vehicular Networks
The highly dynamic nature of vehicular networks necessitates proactive and site-specific radio resource management (RRM) to achieve ultra-reliable low-latency communications. While Network Digital Twins (NDTs) have emerged as a promising enabler, ray-tracing remains time-consuming, challenging accurate RRM under latency constraints. We propose AdaPTwin, an adaptive multi-fidelity predictive NDT for proactive and latency-aware RRM in vehicular networks. Unlike single- and multi-fidelity NDTs with fixed fidelity levels, AdaPTwin dynamically adjusts NDT fidelity based on network conditions. The framework adopts a hierarchical cloud-edge architecture, where computationally intensive fidelity selection is performed periodically in the cloud, and the proactive RRM loop operates in real-time at the edge. The edge-based proactive RRM task consists of channel prediction between vehicles and roadside units (RSUs) via trajectory forecasting and look-ahead ray tracing, followed by RRM execution. A transformer model enhanced with continual and transfer learning enables vehicular trajectory prediction while adapting to new environments and traffic patterns. Ray-tracing is performed using NVIDIA Sionna by exploiting a dynamically updated virtual environment to ensure realistic radio propagation within the NDT. Furthermore, a joint RSU beamforming and vehicle-RSU association problem is formulated to maximize proportionally fair sum-rate, and it is efficiently solved using a scalable multi-start iterative coordinate descent algorithm. Comparisons against reactive, single-fidelity, and non-adaptive predictive NDTs under realistic vehicular conditions confirm that AdaPTwin successfully adapts to diverse scenarios where other frameworks fail. Ultimately, AdaPTwin achieves up to 90% sum-rate gain and 80% outage probability reduction compared to non-adaptive NDTs, while maintaining real-time performance.
comment: This work has been submitted to the IEEE for possible publication
System Level Analysis and Management of Orbital Debris Using Empirical Dynamic Modeling
Orbital debris is a pressing problem that presents a danger to global space operations and a barrier to continued development of the space economy and space infrastructure. As research continues regarding orbital debris, there is a need for tools to understand the system-level implications of orbital debris solutions. This research considers the orbital debris problem as a dynamic process. Based on dynamic system theories, time-series variables of the numbers of orbital debris, orbital objects, and object launches should be causally linked, which means they share a common system attractor manifold. We propose a data-driven method based on complexity science to reconstruct a shadow attractor of the dynamic system using limited observable variables. The reconstructed shadow attractor helps us to understand the fundamental system dynamics for orbital debris and enables us to simulate the future of the orbital debris system based on changes to policy. These findings represent a significant advancement in our ability to understand high-level impacts of space system policy with limited data available.
comment: 23 pages, 9 figures, Journal of Aerospace Information Systems (accepted)
Corruption-Tolerant Asynchronous Q-Learning with Near-Optimal Rates ICML
We study the problem of learning the optimal policy in a discounted, infinite-horizon reinforcement learning (RL) setting in the presence of adversarially corrupted rewards. To address this problem, we develop a novel robust variant of the \(Q\)-learning algorithm and analyze it under the challenging asynchronous sampling model with time-correlated data. Despite corruption, we prove that the finite-time guarantees of our approach match existing bounds, up to an additive term that scales with the fraction of corrupted samples. We also establish an information-theoretic lower bound, revealing that our guarantees are near-optimal. Notably, our algorithm is agnostic to the underlying reward distribution and provides the first finite-time robustness guarantees for asynchronous \(Q\)-learning. A key element of our analysis is a refined Azuma-Hoeffding inequality for almost-martingales, which may have broader applicability in the study of RL algorithms.
comment: To appear at the 43rd International Conference on Machine Learning (ICML)
SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control
Despite the rise of billion-parameter foundation models trained across thousands of GPUs, similar scaling gains have not been shown for humanoid control. Current neural controllers for humanoids remain modest in size, target a limited set of behaviors, and are trained on a handful of GPUs. We show that scaling model capacity, data, and compute yields a generalist humanoid controller capable of natural, robust whole-body movements. We position motion tracking as a scalable task for humanoid control, leveraging dense supervision from diverse motion-capture data to acquire human motion priors without manual reward engineering. We build a foundation model for motion tracking by scaling along three axes: network size (1.2M to 42M parameters), dataset volume (100M+ frames from 700 hours of motion capture), and compute (21k GPU hours). Beyond demonstrating the benefits of scale, we further show downstream utility through: (1) a real-time kinematic planner bridging motion tracking to tasks such as navigation, enabling natural and interactive control, and (2) a unified token space supporting VR teleoperation and vision-language-action (VLA) models with a single policy. Through this interface, we demonstrate autonomous VLA-driven whole-body loco-manipulation requiring coordinated hand and foot placement. Scaling motion tracking exhibits favorable properties: performance improves steadily with compute and data diversity, and learned policies generalize to unseen motions, establishing motion tracking at scale as a practical foundation for humanoid control.
comment: Project page: https://nvlabs.github.io/SONIC/
Disturbance Attenuation Regulator I-B: Signal Bound Convergence and Steady-State
This paper establishes convergence and steady-state properties for the signal bound disturbance attenuation regulator (SiDAR). Building on the finite horizon recursive solution developed in a companion paper, we introduce the steady-state SiDAR and derive its tractable linear matrix inequality (LMI) with $O(n^3)$ complexity. Systems are classified as degenerate or nondegenerate based on steady-state solution properties. For nondegenerate systems, the finite horizon solution converges to the steady-state solution for all states as the horizon approaches infinity. For degenerate systems, convergence holds in one region of the state space, while a turnpike arises in the complementary region. When convergence holds, the optimal multiplier and control gain are obtained directly from the LMI solution. Numerical examples illustrate convergence behavior and turnpike phenomena. Companion papers address the finite horizon SiDAR solution and the stage bound disturbance attenuation regulator (StDAR).
Disturbance Attenuation Regulator I-A: Signal Bound Finite Horizon Solution
This paper develops a generalized finite horizon recursive solution to the discrete time signal bound disturbance attenuation regulator (SiDAR) for state feedback control. This problem addresses linear dynamical systems subject to signal bound disturbances, i.e., disturbance sequences whose squared signal two-norm is bounded by a fixed budget. The term generalized indicates that the results accommodate arbitrary initial states. By combining game theory and dynamic programming, we derive a recursive solution for the optimal state feedback policy valid for arbitrary initial states. The optimal policy is nonlinear in the state and requires solving a tractable convex scalar optimization for the Lagrange multiplier at each stage; the control is then explicit. For fixed disturbance budget $α$, the state space partitions into two distinct regions: $\mathcal{X}_L(α)$, where the optimal control policy is linear and coincides with the standard linear $H_{\infty}$ state feedback control, and $\mathcal{X}_{NL}(α)$, where the optimal control policy is nonlinear. We establish monotonicity and boundedness of the associated Riccati recursions and characterize the geometry of the solution regions. A numerical example illustrates the theoretical properties. This work provides a complete feedback solution to the finite horizon SiDAR for arbitrary initial states. Companion papers address the steady-state problem and convergence properties for the signal bound case, and the stage bound disturbance attenuation regulator (StDAR).
Geometric Conditions for Lossless Convexification in Linear Optimal Control with Discrete-Valued Inputs: Real-Time Implementation for Spacecraft Rendezvous
Optimal control problems with discrete-valued inputs are inherently challenging due to their mixed-integer nature, rendering them generally intractable for real-time, safety-critical aerospace applications. Lossless convexification offers a powerful alternative by reformulating these mixed-integer programs into computationally efficient convex programs. This paper develops a lossless convexification framework for the optimal control of linear time-varying systems with discrete-valued inputs. We extend existing theoretical results by demonstrating that system normality is preserved when reformulating Lagrange-form problems into Mayer-form via an epigraph transformation. Furthermore, we establish that under simple geometric conditions on the input set, the solution to the relaxed convex problem strictly satisfies the original non-convex input constraints. This framework enables the real-time computation of optimal discrete-valued controls without resorting to mixed-integer optimization. The proposed algorithm is validated on a spacecraft rendezvous maneuver utilizing discrete-valued reaction thrusters in an elliptical orbit. Numerical results from Monte Carlo simulations confirm that the algorithm consistently yields exact discrete-valued control inputs with computational timelines compatible with safety-critical, on-board applications.
Uncertainty-Aware Predictive Safety Filters for Probabilistic Neural Network Dynamics
Predictive safety filters (PSFs) leverage model predictive control to enforce constraint satisfaction during deep reinforcement learning (RL) exploration, yet their reliance on first-principles models or Gaussian processes limits scalability and broader applicability. Meanwhile, model-based RL (MBRL) methods routinely employ probabilistic ensemble (PE) neural networks to capture complex, high-dimensional dynamics from data with minimal prior knowledge. However, existing attempts to integrate PEs into PSFs lack rigorous uncertainty quantification. We introduce the Uncertainty-Aware Predictive Safety Filter (UPSi), a PSF that provides rigorous safety predictions using PE dynamics models by formulating future outcomes as reachable sets. UPSi introduces an explicit certainty constraint that prevents model exploitation and integrates seamlessly into common MBRL frameworks. We evaluate UPSi within Dyna-style MBRL on standard safe RL benchmarks and report substantial improvements in exploration safety over prior neural network PSFs while maintaining performance on par with standard MBRL. UPSi bridges the gap between the scalability and generality of modern MBRL and the safety guarantees of predictive safety filters.
A Short and Unified Convergence Analysis of the SAG, SAGA, and IAG Algorithms ICML
Stochastic variance-reduced algorithms such as Stochastic Average Gradient (SAG) and SAGA, and their deterministic counterparts like the Incremental Aggregated Gradient (IAG) method, have been extensively studied in large-scale machine learning. Despite their popularity, existing analyses for these algorithms are disparate, relying on different proof techniques tailored to each method. Furthermore, the original proof of SAG is notoriously involved, requiring computer-aided analysis. Focusing on finite-sum optimization with smooth and strongly convex objective functions, our main contribution is to develop a single unified convergence analysis that applies to all three algorithms: SAG, SAGA, and IAG. Our analysis features two key steps: (i) establishing a bound on delays due to stochastic sub-sampling using simple concentration tools, and (ii) carefully designing a novel Lyapunov function that accounts for such delays. The resulting proof is short and modular, providing the first high-probability bounds for SAG and SAGA that can be seamlessly extended to non-convex objectives and Markov sampling. As an immediate byproduct of our new analysis technique, we obtain the best known rates for the IAG algorithm, significantly improving upon prior bounds.
comment: To appear at the 43rd International Conference on Machine Learning (ICML)
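For readers unfamiliar with the aggregated-gradient family named in the abstract above, the following is a minimal sketch of the textbook SAGA update on a least-squares finite sum. It is not the paper's analysis or code: the function name `saga_least_squares`, the objective, and the step-size choice are assumptions made for illustration only.

```python
import numpy as np

def saga_least_squares(A, b, step, n_iters, seed=0):
    """Textbook SAGA on f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2.

    Illustrative sketch only; names and problem setup are assumptions.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    # table[i] holds the most recently evaluated gradient of f_i
    table = A * (A @ x - b)[:, None]
    avg = table.mean(axis=0)
    for _ in range(n_iters):
        i = rng.integers(n)                      # uniform random component
        g_new = A[i] * (A[i] @ x - b[i])         # fresh gradient of f_i at x
        # SAGA step: unbiased estimate built from fresh and stored gradients.
        # (SAG would instead use avg + (g_new - table[i]) / n, and IAG
        # applies the same aggregated step with a deterministic cyclic index.)
        x = x - step * (g_new - table[i] + avg)
        avg = avg + (g_new - table[i]) / n       # keep running mean in sync
        table[i] = g_new
    return x
```

The stored gradients in `table` are evaluated at stale iterates, which is the kind of delay that the abstract says its Lyapunov function accounts for.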
Quantifying Grid-Forming Behavior: Bridging Device-Level Dynamics and System-Level Strength
Grid-forming (GFM) technology is widely regarded as a promising solution for future power systems dominated by power electronics. However, a precise method for quantifying GFM converter behavior and a universally accepted GFM definition remain elusive. Moreover, the impact of GFM on system stability is not precisely quantified, creating a significant disconnect between device and system levels. To address these gaps from a small-signal perspective, at the device level, we introduce a novel metric, the Forming Index (FI), to quantify a converter's response to grid voltage fluctuations. Rather than enumerating various control architectures, the FI provides a metric for the converter's GFM ability by quantifying its sensitivity to grid variations. At the system level, we propose a new quantitative measure of system strength that captures the multi-bus voltage stiffness, which quantifies the voltage and phase angle responses of multiple buses to current or power disturbances. We further extend and define this concept to grid strength and bus strength to identify weak areas within the system. Finally, we bridge the device and system levels by formally proving that GFM converters enhance system strength. Our proposed framework provides a unified benchmark for GFM converter design, optimal placement, and system stability assessment.
comment: On arXiv, we initially submitted the short version, 2503.24152, and then the long version, 2510.26953. The short version was subsequently rejected by the journal. Since the short version was posted on arXiv earlier, we would like to merge the latest manuscript of the long version into the short version, for which we have already submitted an update, and then withdraw the long version
Quantifying Grid-Forming Behavior: Bridging Device-level Dynamics and System-Level Stability
Grid-forming (GFM) technology is widely regarded as a promising solution for future power systems dominated by power electronics. However, a universally accepted definition of GFM behavior and a precise method for its quantification remain elusive. Moreover, the impact of GFM converters on system stability is not precisely quantified, creating a significant disconnect between device and system levels. To address these gaps from a small-signal perspective, at the device level, the paper introduces a novel metric, the Forming Index (FI), to quantify a converter's response to grid voltage fluctuations. Rather than enumerating various control architectures, the FI provides a metric for the converter's GFM ability by quantifying its sensitivity to grid variations. At the system level, a new quantitative measure of system strength that captures the multi-bus voltage stiffness is proposed, which quantifies the voltage and phase angle responses of multiple buses to current or power disturbances. The paper further extends and defines this concept to grid strength and bus strength to identify weak areas within the system. Finally, the device and system levels are bridged by formally proving that GFM converters enhance system strength. The proposed framework provides a unified benchmark for GFM converter design, optimal placement, and system stability assessment.
On the stability of event-based control with neuronal dynamics
Event-based control, unlike analogue control, poses significant analytical challenges due to its hybrid dynamics. This work investigates the stability and inter-event time properties of a control-affine system under event-based impulsive control. The controller consists of multiple neuronal units with leaky integrate-and-fire dynamics acting on a time-invariant, multivariable plant in closed loop. Both the plant state and the neuronal units exhibit discontinuities that cancel if combined linearly, enabling a direct correspondence between the event-based impulsive controller and a corresponding analogue controller. Leveraging this observation, we prove global practical stability of the event-based impulsive control system. In the general nonlinear case, we show that the event-based impulsive controller ensures global practical asymptotic stability if the analogue system is input-to-state stable (ISS) with respect to specific disturbances. In the linear case, we further show global practical exponential stability if the analogue system is stable. We illustrate our results with numerical simulations. The findings reveal a fundamental link between analogue and event-based impulsive control, providing new insights for the design of neuromorphic controllers.
comment: 11 pages, 4 figures; typos corrected, references added
Complex Frequency as Generalized Eigenvalue
This paper shows that the concept of complex frequency, originally introduced to characterize the dynamics of signals with complex values, constitutes a generalization of eigenvalues when applied to the states of linear time-invariant (LTI) systems. Starting from the definition of geometric frequency, which provides a geometrical interpretation of frequency in electric circuits that admits a natural decomposition into symmetric and antisymmetric components associated with amplitude variation and rotational motion, respectively, we show that complex frequency arises as its restriction to the two-dimensional Euclidean plane. For LTI systems, it is shown that the complex frequencies computed from the system's states subject to a non-isometric transformation coincide with the original system's eigenvalues. This equivalence is demonstrated for diagonalizable systems of any order. The paper provides a unified geometric interpretation of eigenvalues, bridging classical linear system theory with differential geometry of curves. The paper also highlights that this equivalence does not generally hold for nonlinear systems. On the other hand, the geometric frequency of the system can always be defined, providing a geometrical interpretation of the system flow. A variety of examples based on linear and nonlinear circuits illustrate the proposed framework.
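The eigenvalue equivalence stated in the abstract above can be illustrated with a short worked derivation for a diagonalizable LTI system. This is a sketch under assumed notation, not the paper's own derivation: the symbols $A$, $V$, $\Lambda$, $z$, and $\eta_i$, and the definition of complex frequency as the ratio $\dot z_i / z_i$, are assumptions introduced here for illustration.

```latex
% Assumed setup: \dot{x} = A x with A = V \Lambda V^{-1},
% \Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n).
\begin{align}
  z &:= V^{-1} x
    && \text{(change of coordinates; non-isometric unless $V$ is unitary)} \\
  \dot{z} &= V^{-1} A V z = \Lambda z
    && \Rightarrow\quad \dot{z}_i = \lambda_i z_i \\
  \eta_i &:= \frac{\dot{z}_i}{z_i} = \lambda_i
    && \text{(whenever $z_i \neq 0$)} \\
  \frac{\dot{z}_i}{z_i}
    &= \frac{d}{dt}\ln\lvert z_i\rvert + j\,\dot{\theta}_i
    && \text{for } z_i = \lvert z_i\rvert e^{j\theta_i}
\end{align}
```

Under these assumptions, the real part of each eigenvalue is the relative rate of change of the amplitude of $z_i$ and the imaginary part is its angular speed, matching the amplitude-variation and rotational-motion components mentioned in the abstract.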
Toward Goal-Oriented Communication in Multi-Agent Systems: An overview
As multi-agent systems (MAS) become increasingly prevalent in autonomous systems, distributed control, and edge intelligence, efficient communication under resource constraints has emerged as a critical challenge. Traditional communication paradigms often emphasize message fidelity or bandwidth optimization, overlooking the task relevance of the exchanged information. In contrast, goal-oriented communication prioritizes the importance of information with respect to the agents' shared objectives. This review provides a comprehensive survey of goal-oriented communication in MAS, bridging perspectives from information theory, communication theory, and machine learning. We examine foundational concepts alongside learning-based approaches and emergent protocols. Special attention is given to coordination under communication constraints, as well as applications in domains such as swarm robotics, federated learning, and edge computing. The paper concludes with a discussion of open challenges and future research directions at the intersection of communication theory, machine learning, and multi-agent decision making.
comment: 37 pages
Robust Nash equilibrium seeking based on semi-Markov switching topologies
This paper investigates a distributed robust Nash Equilibrium (NE) seeking problem for second-order players subject to external disturbances and uncertain dynamics while communicating via semi-Markov switching topologies. To address these concerns, the following targets must be reached simultaneously: (1) rejection of disturbances and uncertain dynamics in finite time; (2) NE seeking for the second-order players; (3) distributed estimation of the actions of non-neighboring players under semi-Markov switching. By combining supertwisting-based Integral Sliding-Mode Control (ISMC) with a leader-follower consensus protocol, a novel robust NE seeking algorithm is constructed. Furthermore, to reduce unnecessary information transmission, a sampled-data-based event-triggered mechanism is introduced. Incorporating the advantages of both semi-Markov switching and the event-triggered mechanism, another NE seeking algorithm is proposed. Theoretical analysis via a Lyapunov-Krasovskii functional proves that the leader-follower consensus can be achieved in the mean-square sense. Finally, a connectivity control game is formulated to validate the algorithms.
comment: Extended version of a manuscript submitted to IEEE Transactions on Automatic Control, May 2026
Model Predictive Control of Thermo-Hydraulic Systems Using Primal Decomposition
Decarbonizing the global energy supply requires more efficient heating and cooling systems. Model predictive control enhances the operation of cooling and heating systems but depends on accurate system models, often based on control volumes. We present an automated framework including time discretization to generate model predictive controllers for such models. To ensure scalability, a primal decomposition exploiting the model structure is applied. The approach is validated on an underground heating system with varying numbers of states, demonstrating the primal decomposition's advantage regarding scalability.
comment: This work has been accepted at IFAC WC26
Toward Self-Organizing Production Logistics: A Multi-Agent Approach
Production logistics (PL) is increasingly exposed to variability, dynamic interdependencies, and operational disturbances that challenge conventional centralized planning and control. These characteristics are particularly pronounced in circular production systems, but are increasingly relevant across PL more generally. This paper addresses this challenge through the concept of Self-Organizing Production Logistics (SOPL) using the Design Science Research Methodology (DSRM) as a structuring framework. The paper identifies key technological and systemic drivers motivating SOPL, including autonomous logistics resources, distributed AI-based decision-making, and increasing operational uncertainty in circular production. Based on these drivers, system-level objectives and design requirements for SOPL are derived. Building on these requirements, an initial multi-agent architecture is proposed that combines embodied and non-embodied agents, event-driven coordination, semantic knowledge structures, and digital twins. In addition, a three-phase demonstration roadmap is presented, ranging from an initial laboratory demonstrator toward increasingly distributed and adaptive SOPL systems. The Phase I demonstrator serves as an experimental setup for investigating disturbance handling, human involvement, and supervisory coordination in an order-driven kitting and supply scenario. Overall, the paper contributes a conceptual foundation for the design, implementation, and experimental evaluation of SOPL systems.
comment: Submitted to IFIP International Conference on Advances in Production Management Systems 2026 (APMS 2026)
Active Defense Against False Data Injection Attacks in Robotic Manipulators
Robotic systems are vulnerable to False Data Injection Attacks (FDIAs), where adversaries corrupt sensor signals to gain malicious control. Feedback linearization exposes robotic systems to integrator vulnerability, making them susceptible to stealthy attacks that can cause significant deviations in end-effector behavior without raising alarms. This paper addresses the resilience of manipulators against finite-horizon FDIAs by formalizing two defense methods, namely anomaly-aware virtual damping and manipulability reduction, with probabilistic guarantees on nominal task execution. Simulations on a 7-DOF redundant manipulator show that the proposed defenses substantially reduce the impact of FDIAs compared to using solely a threshold-based ADS such as the chi-squared detector, while preserving nominal task performance in the absence of attack.
comment: Extended 8-page version containing full proofs. An abridged 6-page version has been accepted for publication in the Proceedings of the 23rd IFAC World Congress (2026). v3: Minor typographical fixes and updated reference formatting
ProOPF: Benchmarking and Improving LLMs for Professional-Grade Power Systems Optimization Modeling
Growing renewable penetration introduces substantial uncertainty into power system operations, necessitating frequent adaptation of dispatch objectives and constraints and challenging expertise-intensive, near-real-time modeling workflows. Large Language Models (LLMs) provide a promising avenue for automating this process by translating natural-language (NL) operational requirements into executable optimization models via semantic reasoning and code synthesis. Yet existing LLM datasets and benchmarks for optimization modeling primarily target coarse-grained cross-domain generalization, offering limited rigorous evaluation in power-system settings, particularly for Optimal Power Flow (OPF). We therefore introduce ProOPF-D and ProOPF-B, a dataset and benchmark for professional-grade OPF modeling: ProOPF-D contains 12K instances pairing NL requests with parameter adjustments and structural extensions to a canonical OPF, together with executable implementations; ProOPF-B provides 121 expert-annotated test cases with ground-truth code, enabling end-to-end evaluation under both concrete and abstract OPF modeling regimes.
Demographic Dependence of Vaccine Adoption under Opinion Persuasion
Inspired by contagion models of social belief formation, we develop an epistemically-informed modeling framework, SIS-Vo, in which vaccine-related information propagates on a signed opinion network. Our model allows for heterogeneous treatment effects of policy messages across subpopulations through demographic-specific responses. We derive fixed-point characterizations of the healthy (disease-free) and endemic equilibria of this model, and obtain conditions for local stability of the healthy state in terms of the contact network and opinion-dependent vaccination capacities. Using numerical simulations, we illustrate how suitably targeted policy interventions, acting through opinion dynamics, can stabilize the epidemic process by moving the system towards the healthy regime. The SIS-Vo framework thus provides a natural basis for control-theoretic analysis of vaccination policies that remain robust even when misinformation targets specific subgroups.
comment: 6 pages, 5 figures. Accepted at IFAC World Congress 2026
Universal Transient Stability Analysis: A Pre-trained Generative Transformer-Enabled Power System Dynamics Prediction Framework
Existing dynamics prediction frameworks for transient stability analysis (TSA) fail to achieve multi-scenario "universality": the inherent ability of a single, pre-trained architecture to generalize across diverse operating conditions, unseen faults, and heterogeneous systems. To address this, this paper proposes Uni-TSA, a pre-trained generative Transformer-enabled universal framework that models multivariate transient dynamics prediction as a univariate generative task with three key innovations: First, a novel data processing pipeline featuring channel independence decomposition to resolve dimensional heterogeneity, sample-wise normalization to eliminate separate stable/unstable pipelines, and temporal patching for efficient long-sequence modeling; Second, a parameter-efficient freeze-and-finetune strategy that augments the pre-trained generative Transformer backbone with dedicated input embedding and output projection layers while freezing core transformer blocks to preserve generic feature extraction capabilities; Third, a two-stage fine-tuning scheme that combines teacher forcing, which feeds the model ground-truth data during initial training, with scheduled sampling, which gradually shifts to leveraging model-generated predictions, to mitigate cumulative errors in long-horizon iterative prediction. Comprehensive testing demonstrates the framework's universality, as Uni-TSA trained solely on the New England 39-bus system achieves zero-shot generalization to mixed stability conditions and unseen faults, and matches expert performance on the Iceland 189-bus system with only 5% fine-tuning data. Additional cross-system experiments on the IEEE 68-bus and IEEE 118-bus systems, together with stability metrics and PEBS comparison, further confirm Uni-TSA's strong zero-shot transferability and data-efficient adaptation.
The E-Rocket: Low-cost Testbed for TVC Rocket GNC Validation
This paper presents the E-Rocket, an electric-powered, low-cost rocket prototype for validation of Guidance, Navigation & Control (GNC) algorithms based on Thrust Vector Control (TVC). Relying on commercially available components and 3D printed parts, a pair of contra-rotating DC brushless motors is assembled on a servo-actuated gimbal mechanism that provides thrust vectoring capability. A custom avionics hardware and software stack is developed around a dual-computer setup that leverages the capabilities of the PX4 autopilot and the modularity of ROS 2 to accommodate tailored GNC algorithms. The platform is validated in an indoor motion-capture arena using a baseline PID-based trajectory tracking controller. Results demonstrate accurate trajectory tracking and confirm the suitability of the E-Rocket as a versatile testbed for rocket GNC algorithms.
comment: This work has been accepted for presentation at IFAC 2026
Analytical Framework for Power System Strength
This paper proposes a general framework to evaluate power system strength. The formulation features twelve indicators, grouped in three dynamical orders, that quantify the resistance of bus voltage phasors and their first- and second-order rates of change to sudden current injection changes. To quantify such changes, the paper introduces a novel finite differentiation technique, which we name the Delta operator, that properly captures "jumps" of algebraic variables and utilizes the recently developed concept of complex frequency. The paper also shows how the proposed framework can be systematically applied to any system device, and provides a variety of examples based on synchronous machine, converter, and load models. Numerical results in a benchmark system validate the exactness of the formulation.
Distributed Safety Critical Control among Uncontrollable Agents Using Reconstructed Control Barrier Functions
This paper investigates distributed safety-critical control for multi-agent systems (MASs) in the presence of uncontrollable agents with uncertain behaviors. To ensure system safety, the control barrier function (CBF) is employed in this paper. However, a key challenge is that the CBF constraints are coupled when MASs perform collaborative tasks, which depend on information from multiple agents and impede the design of a fully distributed safe control scheme. To overcome this, a novel reconstructed CBF approach is proposed. In this method, the coupled CBF is reconstructed by leveraging state estimates of other agents obtained from a distributed adaptive observer. Furthermore, a prescribed performance adaptive parameter is designed to modify this reconstruction, ensuring that satisfying the reconstructed CBF constraint is sufficient to meet the original coupled one. Based on the reconstructed CBF, we design a safety-critical quadratic programming (QP) controller and prove that the proposed distributed control scheme rigorously guarantees the safety of the MAS, even in uncertain dynamic environments involving uncontrollable agents. The effectiveness of the proposed method is illustrated through a simulation.
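As background for the safety-critical QP controller mentioned in the abstract above, here is a minimal sketch of a generic single-constraint CBF filter. It is not the paper's reconstructed CBF or its distributed observer: the single-integrator dynamics, the circular obstacle, and the names `cbf_filter`, `center`, `radius`, and `alpha` are assumptions chosen so that the QP has a closed-form solution.

```python
import numpy as np

def cbf_filter(x, u_nom, center, radius, alpha=1.0):
    """Generic CBF-QP for a single integrator x' = u (illustrative assumption).

    Solves  min ||u - u_nom||^2  s.t.  grad_h(x)^T u + alpha * h(x) >= 0,
    with h(x) = ||x - center||^2 - radius^2 (safe set: outside the disc).
    With one affine constraint, the QP is a projection onto a half-space.
    Assumes x != center so that the constraint gradient is nonzero.
    """
    diff = x - center
    h = float(diff @ diff - radius**2)   # barrier value, >= 0 when safe
    a = 2.0 * diff                       # gradient of h
    b = -alpha * h                       # constraint reads a^T u >= b
    violation = b - float(a @ u_nom)
    if violation <= 0.0:
        return u_nom.copy()              # nominal input is already safe
    # Minimal correction along the constraint normal
    return u_nom + (violation / float(a @ a)) * a
```

When the nominal input already satisfies the barrier condition it passes through unchanged; otherwise the filter applies the smallest correction that makes the constraint hold with equality.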
Reliability and Effectiveness of Autonomous AI Agents in Supply Chain Management
This paper studies autonomous generative AI agents in multi-echelon supply chains using the MIT Beer Game. We identify four inference-time levers that shape performance: model selection, policies and guardrails, centralized data sharing, and prompt engineering. Model capability is the dominant factor: an out-of-the-box reasoning model exceeds human-level performance, and optimized reasoning models reduce costs by up to 67% relative to human teams. However, strong average performance masks substantial reliability risks. We introduce agent bullwhip: the amplification of run-to-run decision instability in autonomous multi-echelon systems. A central component is decision bullwhip, the portion of order variability generated by stochastic agent decisions rather than by changes in customer demand. We show that decision instability can amplify both across facilities at a fixed point in time and within the same facility over time, even when the demand path is held fixed. Repeated sampling, a natural test-time remedy, fails to meaningfully reduce this instability, suggesting that reliability requires changing the underlying decision policy rather than merely averaging over model outputs. To address this limitation, we propose a Group Relative Policy Optimization (GRPO)-based reinforcement-learning post-training framework that trains a shared base LLM using system-level supply-chain rewards. Post-training substantially reduces tail events, curtails agent bullwhip, and improves the reliability of autonomous supply-chain agents.
A Learning With Errors based encryption scheme for dynamic controllers that discloses residue signal for anomaly detection
Although encrypted control systems ensure confidentiality of private data, it is challenging to detect anomalies without the secret key as all signals remain encrypted. To address this issue, we propose a homomorphic encryption scheme for dynamic controllers that automatically discloses the residue signal for anomaly detection, while keeping all other signals private. To this end, we characterize the zero-dynamics of an encrypted dynamic system over a finite field of integers and incorporate it into a Learning With Errors (LWE) based scheme. We then present a method to further utilize the disclosed residue signal for implementing dynamic controllers over encrypted data, which does not involve re-encryption even when they have non-integer state matrices.
comment: 12 pages, 5 figures, Accepted to IEEE Transactions on Control of Network Systems
Energy-Gain Control of Time-Varying Systems: Receding Horizon Approximation
Standard formulations of prescribed worst-case disturbance energy-gain control policies for linear time-varying systems depend on all forward model data. In discrete time, this dependence arises through a backward Riccati recursion. This article is about the infinite-horizon $\ell_2$ gain performance of state feedback policies with only finite receding-horizon preview of the model parameters. The proposed synthesis of controllers subject to such a constraint leverages the strict contraction of lifted Riccati operators under uniform controllability and observability. The main approximation result is a sufficient number of preview steps for the incurred performance loss to remain below any set tolerance, relative to the baseline gain bound of the associated infinite-preview controller. Aspects of the result are explored in a numerical example.
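To illustrate why a backward Riccati recursion depends on all forward model data, here is a minimal scalar sketch. It uses the standard time-varying LQR recursion as a stand-in, which is an assumption made for simplicity: the energy-gain (worst-case disturbance) recursion studied in the article has a different form. The function name and parameters are hypothetical.

```python
def riccati_backward(a, b, q, r, p_terminal):
    """Scalar time-varying LQR Riccati recursion, run backward in time.

    a, b: lists of model parameters a_k, b_k for x_{k+1} = a_k x_k + b_k u_k.
    q, r: state and input weights (r > 0).
    Returns [P_0, ..., P_N] with P_N = p_terminal.
    """
    n = len(a)
    p = [0.0] * (n + 1)
    p[n] = p_terminal
    for k in reversed(range(n)):
        nxt = p[k + 1]
        # P_k depends on P_{k+1}, and hence on every later a_j, b_j.
        p[k] = q + a[k] ** 2 * nxt - (a[k] * b[k] * nxt) ** 2 / (r + b[k] ** 2 * nxt)
    return p
```

Computing P_0 requires sweeping over the whole horizon, which is the dependence that a finite receding-horizon preview has to approximate.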
comment: Accepted to appear in IEEE TAC
Robotics
HITL-D: Human In The Loop Diffusion Assisted Shared Control ICRA 2026
Autonomous manipulation systems have achieved remarkable capabilities, yet the integration of human expertise with diffusion-based policies in shared control remains relatively unexplored. In this paper, we propose Human-In-The-Loop Diffusion (HITL-D), a shared control framework that enhances user performance in multi-step, insertion, and fine manipulation tasks. HITL-D leverages a novel combination of diffusion-based policies and human control to provide autonomous end effector orientation updates conditioned on a scene point cloud and the Cartesian position of the end effector. This approach reduces the number of joystick control axes required, thereby lowering mental workload. In a multi-task user study with 12 participants, HITL-D reduced average task completion times by 40%, decreased perceived workload by 37%, and improved Likert-scale ratings for independence, intuitiveness, and confidence compared to traditional teleoperation methods. These results demonstrate that HITL-D effectively integrates human expertise with autonomous assistance, improving both objective and subjective aspects of teleoperation.
comment: Accepted for presentation at ICRA 2026
Lost in Fog: Sensor Perturbations Expose Reasoning Fragility in Driving VLAs
Interpretable autonomous driving planners depend not only on generating explanations, but also on those explanations remaining reliable under real-world sensor degradation. In this paper we present a controlled perturbation study of Vision-Language-Action (VLA) robustness in autonomous driving, evaluating Alpamayo R1 (10B parameters) across 1,996 scenarios under eight sensor perturbations (Gaussian noise at four intensities, two lighting extremes, and two fog levels; ${\sim}18{,}000$ inference trials). We find that reasoning consistency is a high-fidelity indicator of trajectory reliability: when Chain-of-Causation (CoC) explanations change after perturbation, trajectory deviation spikes $5.3{\times}$ (21.8m vs 4.1m), with $r\!=\!0.99$ across attack types and $r_{pb}\!=\!0.53$ per-sample (Cohen's $d\!=\!1.12$). A controlled ablation provides evidence that enabling CoC generation is associated with improved trajectory accuracy (11.8% on average across conditions; $p < 0.0001$) under matched inference settings. Over the tested noise range ($σ\in \{10, 30, 50, 70\}$), degradation is approximately linear ($R^2\!=\!0.957$), while standard input preprocessing defenses provide only marginal relief. Together, these results establish CoC consistency as a quantitative proxy for planning safety and motivate reasoning-based runtime monitoring for safer VLA deployment.
Fully Actuated Manifold Constraint Based Output Feedback Control for Input-Constrained Uncertain Nonlinear Systems
This paper presents a low-complexity, model-free, output-feedback controller for a class of unknown time-varying nonlinear systems with unknown input constraints. The controller achieves the preset control accuracy when the actuator is not saturated and maintains flexible control accuracy after actuator saturation. This result extends existing constraint control methods for linear manifolds to a more general form, including the construction of nonlinear manifolds and various types of constraints, thereby achieving preset control accuracy within finite or fixed time. Additionally, flexible control under unknown saturation is achieved through the construction of an error-driven flexible constraint. Finally, second-order and higher-order control examples and simulations are provided.
comment: 22 pages, 12 figures, 2 tables
roto 2.0: The Robot Tactile Olympiad ICRA 2026
Tactile-based reinforcement learning (RL) is currently hindered by fragmented research and a focus on over-saturated orientation tasks. We introduce v2 of the Robot Tactile Olympiad (roto 2.0), a GPU-parallelised benchmark designed to standardise tactile-based RL across four distinct robotic morphologies (16-DOF to 24-DOF). Unlike prior benchmarks, roto focuses on end-to-end "blind" manipulation, utilising only proprioception and tactile sensing without state information or distillation. We demonstrate a significant performance leap, with our blind agents achieving 13 Baoding ball rotations in 10 seconds, an order of magnitude faster than current state-of-the-art speeds. By open-sourcing our environments and robustly tuned baselines, we reduce the barrier to entry and enable researchers to prioritise fundamental algorithmic challenges over tedious RL tuning. Website: https://elle-miller.github.io/roto/
comment: Accepted to 7th ViTac Workshop, ICRA 2026
PointACT: Vision-Language-Action Models with Multi-Scale Point-Action Interaction
Vision-Language-Action (VLA) models have shown strong potential for general-purpose robotic manipulation by leveraging large pretrained vision-language backbones. However, most existing VLAs rely primarily on 2D visual representations, which limit their ability to reason about fine-grained geometry and spatial grounding - capabilities that are essential for precise and robust manipulation in 3D environments. In this paper, we propose PointACT, a dual-system 3D-aware VLA policy that integrates hierarchical 3D point cloud representations directly into the action decoding process. PointACT employs a multi-scale point-action interaction mechanism with efficient bottleneck window self-attention, enabling evolving action tokens to densely attend to both local geometric detail and global scene structure. We evaluate PointACT on the LIBERO and RLBench benchmarks and systematically compare it against monolithic and dual-system VLA baselines, including variants augmented with point cloud inputs. PointACT achieves consistent improvements across both benchmarks, increasing success rates by 10% on the challenging RLBench-10Tasks suite over state-of-the-art pretrained VLAs, with even larger gains when the vision-language backbone is frozen and the action expert is trained from scratch. Extensive ablation studies demonstrate that tightly coupling hierarchical 3D geometry with pretrained 2D semantic representations is critical for robust and spatially grounded robot control. Our results also highlight the promise of pretrained 3D representations for 3D-aware VLA policies.
comment: Accepted to RSS 2026; project webpage: https://cshizhe.github.io/projects/pointact.html
MC-Risk: Multi-Component Risk Fields for Risk Identification and Motion Planning
We present MC-Risk, a planner-aligned, multi-component risk field on a bird's-eye-view grid that yields early, calibrated, and class-aware risk localization. MC-Risk linearly composes three interpretable modules: (i) a motorized-agent field that fuses a black-box multimodal trajectory predictor with an analytic Gaussian-torus construction whose lateral width grows with speed/curvature and whose height attenuates with look-ahead; (ii) a VRU risk field that replaces isotropic pedestrian blobs with a forward-biased anisotropic kernel aligned to heading and speed; and (iii) a road penalty field that exploits full HD-map topology, imposing an off-road penalty and lane-aware risk exposure for same/opposite directions. We conduct, to our knowledge, the first standardized quantitative evaluation of a risk-field formulation on RiskBench's collision subset. MC-Risk attains the best overall risk localization and the earliest hazard indication. Finally, we demonstrate a plug-and-play planning interface by using the field as an MPC cost density, enabling risk-aware trajectory generation without additional training.
From swept contact to pose: Probe-aware registration via complementary-shape docking ICRA 2026
Accurate registration between a prior model and the real scene is essential for high-precision robotic manipulation, yet optical methods suffer from long calibration chains, line-of-sight constraints, and fabrication errors. We propose a calibration-free alternative that reformulates contact registration as complementary-shape docking between the object and the probe's swept volume, explicitly accounting for probe geometry and leveraging both contact and non-contact evidence. Our solver integrates a global-to-local search via 3D FFT correlation over low-discrepancy SO(3) samples, followed by continuous SE(3) refinement using Lie-algebra updates and analytic contact sensitivities. This pipeline yields efficient exploration and metric-grade convergence without fragile point correspondences. Simulation across free-form meshes achieved sub-0.04 mm and sub-0.4° accuracy and robustness to pose noise and contact loss. On a tooth-preparation robot, our method attained 0.42 mm and 3.75°, outperforming an optical tracker registration while requiring no external sensors. These results demonstrate a practical and precise registration strategy for surgical and industrial robots.
comment: 8 pages, 9 figures, accepted to ICRA 2026
Validating Navmesh using Geometry: Voxel-Based Analysis with Prioritized Exploration
Navigation mesh (Navmesh) inconsistencies affect the player experience by directly impacting the navigation systems used by non-playable characters (NPCs) in game environments. While navmeshes are generated from world geometry using well-established algorithms, environments change throughout development as terrain is adjusted and assets are moved or replaced, resulting in mismatches between the navmesh and the actual environment. Existing automated approaches attempt to detect navigation issues using exploration agents and reinforcement learning techniques. However, since these methods rely on the navigation data itself or evaluate navigation behavior indirectly, they do not explicitly verify whether the navigation representation reflects the walkable space defined by underlying geometry. This paper presents a framework for validating navigation meshes through an independent, geometry-driven analysis of navmesh correctness. The approach reconstructs walkable space directly from environment geometry using a voxel-based representation, followed by constraint-aware traversal and connectivity evaluation. Validation is formulated as a prioritized search problem over the voxel space, where reinforcement learning guides sampling toward regions more likely to exhibit inconsistencies. At each sampled location, reachability derived from the voxel representation is compared against reachability obtained from the navmesh via engine-level queries. Experiments across multiple large-scale open-world game environments show that the approach consistently lowers exploration effort while maintaining similar defect detection coverage. The framework runs offline within the game engine and can be integrated into automated quality assurance pipelines. Since the method relies on geometry, it can be adapted across game engines with minimal changes, making it suitable for production deployment.
Closed Loop Dynamic Driving Data Mixture for Real-Synthetic Co-Training
Data scaling is fundamental to modern deep learning, and grows increasingly critical as autonomous driving shifts to end-to-end learning. Real-world driving data is expensive to annotate and scene-biased, making real-synthetic co-training with near-infinite synthetic data a promising direction. However, naively incorporating all available synthetic data is inefficient and leads to distribution shifts, and optimizing the data mixture under practical training budgets remains a critical yet under-explored problem. We argue that the training-data mixture requires clear guidance in terms of scene types and quantities. In this work, we frame data mixing as a dynamic optimization process that iteratively adjusts the training data mixture to maximize model performance, guided by closed-loop evaluation feedback, and propose AutoScale, a fully automated closed-loop data engine unifying scene representation, data mixture optimization and retrieval, as well as model training and evaluation. Specifically, we propose Graph Regularized AutoEncoder (Graph-RAE) for driving scene representations, introduce Cluster-aware Gradient Ascent (Cluster-GA) for cluster-wise importance estimation and reweighting, and perform cluster-guided vector retrieval to select high-value samples. Experiments on NavSim demonstrate that AutoScale outperforms vanilla co-training and cross-domain baselines, achieving better performance with fewer synthetic samples under constrained budgets.
Learning Robust Dexterous In-Hand Manipulation from Joint Sensors with Proprioceptive Transformer
In-hand object manipulation is a fundamental yet challenging capability for dexterous robots. Despite significant progress in dexterous manipulation, existing approaches rely heavily on vision or tactile sensing to track object states, while joint sensing -- the most readily available modality on any robotic hand -- remains largely overlooked, particularly for tendon-driven hands. In this paper, we study how far joint sensing alone can go by asking: (i) whether motor encoders or direct joint sensing provides better proprioceptive feedback, (ii) how to extract environment information from joint measurements, and (iii) whether joint-only control can achieve competitive real-world performance without external perception. We present the Proprioceptive Transformer (PT), an exteroceptive-free approach for continuous cube rotation on a tendon-driven dexterous hand that uses only joint sensing feedback. A teacher policy is first trained via reinforcement learning with privileged object information, then distilled into PT, which operates solely on joint position and velocity histories. The Transformer architecture effectively extracts implicit object state information from temporal patterns in joint sensor readings. Experiments on the real ORCA hand show that our approach achieves 3.1x higher rotation speed than baselines. We also demonstrate that our PT achieves a 23.4% lower RMSE for cube position estimation than the MLP baseline, indicating superior extraction of exteroceptive information from proprioceptive sources.
comment: 8 pages, 6 figures, 3 tables
Hyper-V2X: Hypernetworks for Estimating Epistemic and Aleatoric Uncertainty in Cooperative Bird's-Eye-View Semantic Segmentation
Cooperative perception enabled by Vehicle-to-Everything (V2X) communication enhances autonomous driving safety by creating a unified environmental representation through shared sensory data. While recent works have advanced multi-agent fusion for improved perception, uncertainty quantification in such cooperative frameworks remains largely unexplored. This paper introduces Hyper-V2X, a hypernetwork-based framework for estimating both epistemic and aleatoric uncertainties in V2X-based perception. Specifically, we propose a partial weight generation scheme and a V2X context embedding module that conditions a Bayesian hypernetwork on fused multi-agent features to generate weight distributions for stochastic Bird's-Eye-View (BEV) segmentation. Unlike existing deterministic BEV models, Hyper-V2X enables efficient uncertainty estimation with little computational overhead. Our approach is architecture-agnostic and can be seamlessly integrated with modern cooperative backbones such as CoBEVT. Experiments on the OPV2V benchmark demonstrate that Hyper-V2X provides accurate, well-calibrated uncertainty estimates and improves overall perception reliability. Our code and benchmark are publicly available under an open-source license: https://github.com/abhishekjagtap1/Hyper-V2X
comment: Accepted for IEEE Intelligent Vehicle Symposium (IV) 2026
Learning Structural Latent Points for Efficient Visual Representations in Robotic Manipulation
Current 3D-aware pretraining methods for embodied perception and manipulation are largely built on differentiable rendering frameworks, producing either fully implicit neural fields or fully explicit geometric primitives. Implicit representations, while expressive, lack explicit structural cues, whereas explicit ones preserve geometry but suffer from resolution limits and weak generalization. To address these limitations, we propose a novel pretraining framework that learns a hybrid representation: structural latent points. Specifically, we insert a point-wise latent variational autoencoder into the latent space of a point-cloud autoencoder, jointly regularizing point-wise features and coordinates toward a Gaussian prior. The resulting compact latent preserves coarse structural tendencies, which do not encode precise geometry but capture richer rough shape and semantic information, effectively combining the expressiveness of implicit representations with the structural priors of explicit ones. In addition, informed by shared design choices in prior work, we develop a streamlined, efficient 3DGS-based rendering pipeline that is deliberately kept lightweight, improving efficiency while leaving greater representational capacity to the front-end latent module. Extensive evaluations on RLBench, ManiSkill2, and a real-robot platform demonstrate consistent gains in task success, sample efficiency, and robustness to viewpoint and scene variations over strong baselines. Ablation studies further confirm that each component of our framework is critical to overall performance.
Reinforcement Learning for Risk Adaptation via Differentiable CVaR Barrier Functions
Planning through crowded environments under uncertain obstacle motions remains difficult, as stochastic interactions often induce overly conservative behavior or reduced efficiency. To address this challenge, we propose an end-to-end risk adaptation framework for crowd navigation under obstacle-motion uncertainty modeled by a Gaussian mixture model. The framework combines reinforcement learning (RL) with a differentiable quadratic-program safety layer based on Conditional Value-at-Risk (CVaR) barrier functions, jointly learning nominal control input, risk level, and safety margin and enforcing explicit probabilistic safety constraints. This design enables context-aware adaptation, promoting efficient behavior while invoking caution only when necessary. We conduct extensive evaluations in dynamic, uncertain, and crowded environments across varying obstacle densities and robot models, and further assess generalization under three out-of-distribution cases. Comparisons across optimization-based, RL-based, and integrated RL and optimization methods are provided, and the proposed method is shown to deliver the strongest overall performance in safety, efficiency, and generalization under uncertainty.
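As background on the risk measure named above, the sketch below estimates CVaR from samples: the mean of the worst (1 - alpha) fraction of losses. This is a generic sample-based estimator given under stated assumptions, not the differentiable CVaR barrier function of the paper; the function name and the convention that larger values are worse are assumptions.

```python
import math

def empirical_cvar(losses, alpha):
    """Sample estimate of CVaR at level alpha (0 <= alpha < 1).

    Returns the mean of the worst ceil((1 - alpha) * n) losses,
    where larger loss values are assumed to be worse.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(losses, reverse=True)
    k = max(1, math.ceil((1.0 - alpha) * len(ordered)))
    return sum(ordered[:k]) / k
```

Raising alpha focuses the estimate on a smaller, more extreme tail, which is why a learned risk level can trade caution against efficiency.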
comment: Project page: https://anonymousrobotics9666.github.io/rlcvarbf/
To Select or not to Select, that is the Question: Distilling Robot Skill Prediction into a Small Ensemble
As robot fleets become more heterogeneous, including humanoids, rovers, quadrupeds, and drones, selecting the right robot for a task becomes a core systems problem. We study robot skill prediction: mapping a natural-language task description to the physical capabilities required to execute it, such as fly, wheels, legs, surface water, under water and hands. Since labelled data that maps natural-language task descriptions to robots' physical capabilities does not exist, we construct a synthetic task-to-skill dataset using LLM-assisted generation and targeted label auditing. Trained on this data, a ~133M-parameter ensemble of two fine-tuned sentence encoders (mpnet + MiniLM) reaches 83.5% task-to-skill matching on a stratified 200-task dataset, outperforming Kimi K2 (1T MoE) at 72.0%, GPT-OSS-120B at 71.5%, and Llama-4-Scout-17B at 69.0% under the same zero-shot prompt. These results suggest that, for fixed robot skill taxonomies, small specialized models trained on synthetic data can outperform much larger general-purpose LLMs for fleet-level task routing.
A Terrain-Adaptive epsilon-Constraint MPC for Uneven Terrain Kinodynamic Planning
Kinodynamic planning for car-like vehicles on uneven terrain requires simultaneously optimizing competing objectives such as path efficiency and pose stability. This work presents an adaptive epsilon-constraint method integrated into a Model Predictive Control (MPC) framework, where the epsilon bounds are dynamically adjusted based on terrain descriptors to explore the Pareto front in real time. To capture vehicle-terrain dynamics, we develop a semi-parametric model combining analytical vehicle dynamics with a Sparse Gaussian Process (SGP) trained on the same terrain descriptors. The proposed epsilon-MPC is evaluated against MPPI and GAKD baselines, achieving a 94% navigation success rate while reducing maximum orientation deviation by 24% and improving multi-objective trade-off quality by 23%.
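The epsilon-constraint idea can be sketched on a discrete candidate set: optimize one objective while bounding the other by epsilon, then vary epsilon to move along the Pareto front. The example below is a toy illustration under that assumption, not the MPC formulation of the paper; the tuple layout `(path_cost, instability)` and the function name are hypothetical.

```python
def epsilon_constraint_select(candidates, epsilon):
    """Pick the candidate with the lowest path cost whose instability <= epsilon.

    candidates: list of (path_cost, instability) tuples.
    Returns the selected tuple, or None if no candidate is feasible.
    """
    feasible = [c for c in candidates if c[1] <= epsilon]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c[0])
```

Tightening epsilon, for example on rough terrain, trades path efficiency for stability; loosening it on flat ground does the opposite.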
Comparative Analysis of Military Detection Using Drone Imagery Across Multiple Visual Spectrums
In modern warfare, drones are becoming an essential part of intelligence gathering and carrying out precise attacks in different kinds of hostile environments. Their ability to operate in real time in hostile environments from a safe distance makes them invaluable for surveillance and military operations. The KIIT-MiTA dataset comprises drone images of different military scenarios and provides a foundation for detecting military objects, but it does not account for the variety of real-world conditions. To evaluate how models perform under varying conditions, four dataset variants are created: Gray Scale, Thermal Vision, Night Vision, and Obscura Vision. These simulate real-world conditions such as low visibility, heat-based imagery, and nighttime operation. The YOLOv11-small model is trained and used to detect objects across these diverse settings. This research boosts the performance and reliability of drone-based operations by contributing to the development of advanced detection systems in both defensive and offensive missions.
comment: 6 pages, 7 figures. Accepted at the 16th International Conference on Computing, Communication and Networking Technologies (ICCCNT), July 6-11, 2025, IIT Indore. Proceedings pending publication
EllipseLIO: Adaptive LiDAR Inertial Odometry with an Ellipsoid Representation
LiDAR Inertial Odometry (LIO) is a critical component for many mobile robots that need to navigate without relying on external positioning (e.g., GPS). Platforms that operate autonomously in different environments and with heterogeneous LiDAR sensors require a LIO approach that can adapt to these different scenarios without human intervention. Existing LIO approaches can typically provide reliable and accurate odometry in scenarios with similar environments and sensors when suitably tuned. However, many approaches struggle to retain robust odometry across heterogeneous environments and sensors while using a consistent configuration. This paper presents EllipseLIO, a real-time LIO approach that generalises between scenarios by using methods for LiDAR scan filtering and registration that adapt to the sensor capabilities and environment without requiring scenario-specific tuning. Experiments with EllipseLIO and state-of-the-art LIO approaches on five datasets with diverse and challenging scenarios demonstrate that EllipseLIO is the best-performing approach overall. It achieves a 38% lower odometry error on average than the second-best approach and is the only approach that does not diverge in any experiment. An open-source version of EllipseLIO will be available at github.com/v4rl-ucy/ellipselio.
comment: 8 pages, 6 figures, 2 tables
Safety-Critical Control for Smoothed Implicit Contact Dynamics
Smoothed implicit contact dynamics enables gradient-based planning and control for contact-rich tasks without predefined mode sequences. However, safety-critical control remains challenging because implicit contact dynamics makes safety-filter design nontrivial. The smoothing parameter $κ$ relaxes contact complementarity constraints, which makes the dynamics smooth but affects the contact force. This paper provides a method for bounding the actual contact force despite the use of relaxed complementarity constraints. We show that constraint violations can be non-monotonic in $κ$. Smaller $κ$ reduces force-approximation error, but it does not necessarily improve safety performance. To address this issue, we introduce boundary-focused rollouts to screen $κ$ by comparing the safety margin with the approximation error. We then develop a discrete-time control barrier function (CBF) framework based on a first-order Taylor approximation of the implicitly defined contact force. To account for possible force under-prediction, we augment the resulting safety constraint with a fixed robust margin. Simulations on four contact-rich systems show that the proposed method eliminates force violations observed under a standard CBF.
Humanoid Whole-Body Manipulation via Active Spatial Brain and Generalizable Action Cerebellum
In this paper, we explore the spatially aware humanoid whole-body manipulation task. Compared with tabletop settings, this task poses two key challenges: 1) Spatial understanding is challenging in complex 3D environments with diverse spatial relations. 2) Action generation is difficult to generalize, as limited and costly real-robot data restricts the generalization of data-driven models. To address these challenges, we propose a generalizable humanoid loco-manipulation framework that leverages the spatial perception and action generation capabilities of multi-agent large models. Specifically, our framework includes two components: Active Spatial Brain for active spatial perception and decision-making, and Generalizable Action Cerebellum for executable robot action generation. The first component actively perceives the spatial scene and makes decisions on task planning and subtask decomposition. The second component generates executable robot actions based on the decisions made by the first component, without the need for task-specific real-robot data. To benchmark our framework, we design a set of spatial manipulation tasks from two perspectives: evaluating spatial perception and understanding, and assessing real-robot task performance. The results demonstrate strong performance on both aspects across diverse tasks and environments.
comment: Project page: https://leungchaos.github.io/Humanoid-Whole-Body-Manipulation-via-Active-Spatial-Brain-and-Generalizable-Action-Cerebellum/
Benchmarking Empirical and Learning-Based Approaches for Feedforward Steering Control in Autonomous Racing SC 2026
Feedforward steering control is a key component of hierarchical control architectures for autonomous racing. The goal is to reduce steering corrections from the feedback controllers by predicting the vehicle's inverse lateral dynamics. This paper presents a systematic benchmark of two learning-based and two empirical (analytical) feedforward steering controllers. We introduce a new EHD formulation based on a polynomial surface fit that captures velocity-dependent nonlinear steering behavior with minimal parametrization. We test the feedforward controllers in a high-fidelity simulation framework based on the real-world Abu Dhabi Autonomous Racing League competition, using a high-fidelity double-track vehicle dynamics simulator. Open-loop evaluation shows that the learning-based controllers achieve the lowest prediction errors; however, closed-loop testing reveals that this improved accuracy does not translate into superior path tracking performance or lap times, even after iterative fine-tuning. In contrast, the proposed EHD approach achieves the best overall closed-loop robustness and lap time, highlighting the necessity of evaluating feedforward strategies within the complete trajectory planning and control software stack. Our code is available at https://github.com/TUMRT/steering_ff_control.
comment: 8 pages, 12 figures, Accepted to be published as part of the 2026 IEEE International Conference on Intelligent Transportation Systems (ITSC 2026), Naples, Italy, September 15-18, 2026
Anomaly-Informed Confidence Calibration for Vision-Based Safety Prediction
Reliable confidence estimates are important for safely deploying vision-based controllers in autonomous racing, where safety predictions must be derived from camera images, yet modern predictors become dangerously overconfident under test-time distribution shifts. We identify a critical perception-dynamics gap in existing anomaly signals: widely used scores, such as autoencoder reconstruction error, capture visual corruptions but miss dynamics anomalies (e.g., actuation bias, latency), where images remain plausible while the trajectory degrades. To address this, we propose an Anomaly-Informed Online Calibration approach that, without retraining any model component, fuses two complementary anomaly scores extracted from a world model: a perceptual score from reconstruction error and a dynamics score from epistemic uncertainty and control-stream statistics. Based on these fused scores, a lightweight temperature-scaling calibrator leverages test-time augmentation to selectively reduce overconfidence under shift while preserving nominal-condition performance. Experiments on a physical DonkeyCar under four real-world anomaly protocols unseen during training (darkness, blur, actuation bias, processing latency) reduce average expected calibration error from 0.184 to 0.116, a 37% improvement over the best baseline, without modifying the base safety predictor.
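The two calibration ingredients named above, expected calibration error (ECE) and temperature scaling, can be sketched as follows. This is a generic illustration assuming a binary predictor with a scalar logit and equal-width confidence bins; it does not reproduce the anomaly-informed calibrator of the paper, and the function names are hypothetical.

```python
import math

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sample-weighted mean of |accuracy - mean confidence| per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins on [0, 1]; a confidence of 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece

def temperature_scale(logit, temperature):
    """Sigmoid confidence after dividing a binary logit by temperature T > 0."""
    return 1.0 / (1.0 + math.exp(-logit / temperature))
```

A temperature above 1 pulls confidences toward 0.5, which is one way a calibrator can reduce overconfidence when an anomaly score signals distribution shift.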
Grounding Driving VLA via Inverse Kinematics
Existing Driving VLAs predict trajectories while largely ignoring their visual tokens -- a phenomenon we trace not to insufficient training but to a structurally ill-posed task formulation. We show that trajectory recovery, when viewed through the lens of inverse kinematics, requires both a current and a future visual state as boundary conditions; existing VLAs supply only the former, which encourages the model to shortcut through ego status and text commands alone. To address this, we re-design Driving VLA in the style of an inverse kinematics solver. First, a next visual state prediction objective that requires the LLM to predict the future visual scene provides dense visual supervision and suppresses shortcut paths. Second, a separate Inverse Kinematics Network (a cross-attention-based conditional diffusion model) that takes only the current and future visual states as input is designed to suppress reliance on ego status and textual shortcuts during trajectory decoding. With this simple prescription alone, our 0.5B-scale model recovers visual grounding and reaches trajectory planning performance comparable to 7B-8B VLAs more than an order of magnitude larger, on both the closed-loop NAVSIM-v2 and the nuScenes benchmarks. Extensive analysis further shows that this improvement stems from a recovered ability to exploit visual features, with the effect being most pronounced in dynamic driving situations such as turning.
Perception of Social Robots as Communication Partners in Healthcare for Older Adults
Addressing the global caregiver shortage through socially assistive robots necessitates a deep understanding of their psychological and physiological impacts on older adults during human-robot interaction (HRI). This study addresses whether social robots can serve as effective interaction partners compared to humans, and if "positive prompts" can similarly enhance these interactions. We conducted a comparative study with 35 participants (aged 70+). Our multi-modal analysis, integrating facial expression data, heart rate variability, and subjective questionnaires, revealed no significant differences in overall stress levels between human and robot interactions. Facial expression analysis confirmed that the robot was accepted as a valid interaction partner, while physiological data showed slightly lower heart rates during robot interactions, suggesting a more relaxed state compared to human-led sessions. These findings indicate that social robots can engage older adults without inducing psychological strain and are capable of alleviating caregiver burden by performing structured tasks, such as health-sensing surveys. Future work should address the identified "appearance-content mismatch" in robot design to facilitate even more natural and effective interactions.
comment: 31 pages, 10 figures, Under review at International Journal of Social Robotics
Modeling and Control of a Pneumatic Morphing Soft Quadrotor based on the SOFA Framework for Dynamic Soft Robotic Simulation
This article presents a novel SOFA-based finite element method for soft-body modeling and the corresponding dynamic simulation and control of a pneumatic morphing soft quadrotor. The proposed modeling preserves the physical interpretability and control structure of traditional quadrotor dynamics, while capturing the complex, time-varying behavior of pneumatically actuated soft arms. In SOFA, the soft pneumatically actuated arms are discretized as a tetrahedral mesh following an elastic material law that produces internal forces consistent with the real dynamic behavior of the body. Pneumatic actuation governed by both periodic and error-based control signals is applied within the internal cavities to analyze the morphing capability. Finally, a proportional-integral controller is proposed to study the controlled dynamic behavior and morphing capabilities of the pneumatic arm, wherein the pneumatic actuation to the soft arm is controlled to achieve the desired target position. The simulation results show the effectiveness of the proposed novel modeling framework and the related controller design.
comment: 8 pages, 10 figures
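The proportional-integral control loop mentioned in the abstract above can be sketched in a few lines. This is a generic discrete PI controller, not the paper's controller: the first-order plant standing in for the pneumatic arm, and the values of `kp`, `ki`, `tau`, and `gain`, are all assumptions chosen only so the loop is stable.

```python
class PIController:
    """Discrete proportional-integral controller."""

    def __init__(self, kp, ki, dt):
        self.kp = kp
        self.ki = ki
        self.dt = dt
        self.integral = 0.0

    def step(self, target, measured):
        error = target - measured
        self.integral += error * self.dt  # rectangular integration of the error
        return self.kp * error + self.ki * self.integral


def simulate(target=1.0, steps=2000, dt=0.01, tau=0.5, gain=1.0):
    """Drive a hypothetical first-order plant tau * x' = gain * u - x to the target."""
    controller = PIController(kp=2.0, ki=4.0, dt=dt)
    position = 0.0
    for _ in range(steps):
        pressure = controller.step(target, position)  # control signal u
        position += dt * (gain * pressure - position) / tau  # forward Euler step
    return position


final_position = simulate()
```

The integral term is what removes the steady-state offset that a proportional-only controller would leave on this kind of plant.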
Component Influence-Driven Fastener Reduction for Robotic Disassemblability-Aware Design Simplification
To accelerate automated remanufacturing, robotic disassembly must be considered during the product design phase. However, designers currently lack quantitative feedback to identify which structural elements hinder robotic operations. To address this, this study proposes an analytical framework that provides actionable redesign guidance focused on fastener reduction, as fasteners are numerous and ubiquitous components found in almost all manufactured products. Using a Computer-Aided Design (CAD) model and its automatically generated Contact-Connection-Constraint (CCC) graph, the framework translates robotic disassembly sequence planning outcomes into component influence scores. These scores reflect how often a component causes structural constraint violations or evaluation objective deteriorations in the robotic disassembly sequence. To visually highlight structural hindrances, the framework projects these scores onto the CAD geometry as 3D heatmaps. The system then analytically simulates the removal of highly influential fasteners. It reports the expected reductions in structural constraints, tool changes, and robot travel distances, while preventing structurally unsafe modifications by evaluating geometric stability metrics. Experiments on seven household appliances demonstrate that the framework successfully targets redundant fasteners. Removing the recommended fasteners simplified the structural dependencies by eliminating between 8 and 132 structural constraints on the graph depending on each product's structural configuration. Furthermore, it improved robotic operational efficiency by eliminating unnecessary tool change operations and shortening travel distances by 165 to 1675 millimeters wherever structurally permissible.
comment: 7 pages, 8 figures
LiteViLNet: Lightweight Vision-LiDAR Fusion Network for Efficient Road Segmentation
Road segmentation is a fundamental perception task for autonomous driving and intelligent robotic systems, requiring both high accuracy and real-time inference, especially for deployment on resource-constrained edge devices. Existing multi-modal road segmentation methods often rely on heavy transformer-based encoders to achieve state-of-the-art performance, but their enormous computational cost prohibits real-time deployment on embedded platforms. To address this dilemma, we propose LiteViLNet, a lightweight multi-modal network that fuses RGB texture information and LiDAR geometric information for efficient road segmentation. Specifically, we design a dual-stream lightweight encoder and depth-wise separable convolutions to extract hierarchical features from both modalities with minimal parameters. We further propose a Multi-Scale Feature Fusion Module (MSFM) to facilitate cross-modal interaction at different levels, and a large-kernel-bridge module to capture long-range dependencies with linear complexity. Extensive experiments on the KITTI Road dataset and real-world applications demonstrate that LiteViLNet achieves a promising balance between accuracy and efficiency. Notably, with only 14.04M parameters, our model attains a 96.36% MaxF score, ranking the best among all CNN-based methods and being comparable to larger transformer-based models, and runs at 163.79 FPS in model-only inference on RTX 4060 Ti (22.18 FPS on Jetson Orin NX). It outperforms numerous heavy-weight methods in inference speed while maintaining highly competitive accuracy, fully validating the potential of LiteViLNet for real-time embedded deployment in autonomous driving and intelligent robotics.
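To make the parameter savings of depth-wise separable convolutions concrete, the following sketch compares weight counts for a standard convolution and its depth-wise separable counterpart. The layer sizes (3x3 kernel, 64 input channels, 128 output channels) are illustrative assumptions, not values from the paper, and bias terms are ignored.

```python
def conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depth-wise conv followed by a 1x1 point-wise conv (no bias)."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes channels
    return depthwise + pointwise


standard = conv_params(3, 64, 128)                   # 73728 weights
separable = depthwise_separable_params(3, 64, 128)   # 8768 weights
reduction = standard / separable
```

For these sizes the separable layer needs roughly one eighth of the weights, which is the kind of saving that makes such layers attractive on embedded platforms.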
WiXus: A Wheeled-Legged Robot with Wire-Driven Environmental Utilizing to Integrate Mobility and Manipulation ICRA2026
Wheeled-legged robots, which have wheels at their feet and achieve high mobility by coordinating wheel drive and leg drive, have been developed. These robots have been developed purely as platforms specialized for locomotion. Therefore, they do not have a means to repurpose their legs for roles other than locomotion, such as object manipulation or tool utilization. In this paper, we address the problem of how to draw out the potential task-execution capability of the legs by freeing them from the roles of locomotion through external body support. To this end, we propose and develop a new robot, WiXus, which fuses a wheeled-legged mechanism with a wire-driven mechanism that utilizes the external environment. The developed WiXus demonstrates not only planar locomotion with wheeled-legged drive, but also three-dimensional mobility such as cliff climbing by coordinating wire-driven and wheeled-legged actuation. Furthermore, by suspending the body with wire-driven actuation, WiXus successfully repurposes its legs as arms to perform object manipulation (e.g., rescuing a dog (stuffed animal)) and tool utilization (e.g., harvesting an apple (mockup) with loppers). This study demonstrates that the approach of utilizing the environment with wire-driven actuation is a new design principle that extends the operational domain of wheeled-legged robots.
comment: Accepted at ICRA2026, website - https://shin0805.github.io/wixus/, YouTube - https://youtu.be/32qhUslR0gM
STEAM: A Training-Free Congestion-Aware Enhancement Framework for Decentralized Multi-Agent Path Finding
We propose STEAM (Spatial, Temporal, and Emergent congestion Awareness for MAPF), a training-free test-time enhancement framework for learning-based decentralized Multi-Agent Path Finding (MAPF) in discrete environments. Given a pretrained decentralized policy, STEAM requires no retraining, architectural modification, or replacement by a centralized planner. Instead, it injects lightweight congestion-aware guidance into the original policy execution. STEAM first rolls out the shortest paths induced by the current cost-to-go maps to identify potential future congestion hotspots. Spatially avoidable congestion is mitigated by updating agent-specific cost-to-go information, while spatially unavoidable bottlenecks are handled through temporal logit correction. In addition, emergent local congestion is reduced by a density-aware logit correction based on neighboring agents' corrected cost-to-go maps. Extensive experiments on representative learning-based decentralized MAPF algorithms show that STEAM consistently improves success rate, makespan, and solution cost, with success-rate gains of up to 60% and only minor computational overhead. The implementation is available at https://anonymous.4open.science/r/STEAM-MAPF-7A62.
SubTGraph: Large-Scale Subterranean Environment Synthesis with Controllable Topological Variability for Robotic Autonomy Validation
Subterranean (SubT) environments have been a frontier for autonomous robotics, driven by the push for automation of mining operations and the interest in planetary exploration (e.g., Martian lava tubes). Due to the challenges involved in accessing real SubT environments, rigorous hardening of autonomy stacks in realistic simulation environments is critical. This article fills a well-known gap: the lack of a large-scale simulation-based benchmarking infrastructure for rigorous statistical evaluation of robotic autonomy, which is why SubT research articles commonly present validation results in a few environments at best. This article presents SubTGraph, a novel framework for rapid synthesis of multi-level SubT environments with high variability, incorporating user specifications related to topology, dimensionality, textures, etc., to generate distinct environments such as operational mines, natural caves and lava tubes. SubTGraph builds a cost matrix from user-specified structural constraints to guide the classical Dijkstra algorithm to procedurally generate SubT worlds utilizing topometric tiles from the DARPA World Generator. Three robotics case studies are investigated to demonstrate the utility of SubTGraph for rigorous validation of different layers in the robotic autonomy stack. Structural semantic segmentation is validated against topometric ground truths, multi-agent path planning is widely tested for identification of patterns and trends in the algorithm behavior, and LIO SLAM is stress-tested in challenging subterranean sections to identify failure cases. The SubTGraph world creation codebase is open-sourced (https://github.com/LTU-RAI/SubTGraph.git) along with a database consisting of 150 highly variable underground worlds.
comment: 16 pages, 18 figures
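The abstract above describes a cost matrix guiding the classical Dijkstra algorithm. A minimal sketch of that general idea follows; it is not the SubTGraph code. The grid layout, the 4-connected neighborhood, and the convention that entering a cell costs that cell's value are assumptions made for illustration.

```python
import heapq


def dijkstra_grid(cost, start, goal):
    """Cheapest 4-connected path on a grid; entering a cell costs cost[r][c]."""
    rows, cols = len(cost), len(cost[0])
    dist = {start: 0}
    parent = {}
    heap = [(0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > dist[cell]:
            continue  # stale heap entry, a cheaper route was already found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    parent[(nr, nc)] = cell
                    heapq.heappush(heap, (nd, (nr, nc)))
    if goal not in dist:
        return None, float("inf")
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    path.reverse()
    return path, dist[goal]


# Hypothetical cost matrix: the high-cost cells discourage a direct connection,
# so the generated corridor detours around them.
cost_matrix = [
    [1, 1, 1],
    [9, 9, 1],
    [1, 1, 1],
]
path, total = dijkstra_grid(cost_matrix, (0, 0), (2, 0))
```

Raising or lowering individual entries of the cost matrix changes which corridors the search prefers, which is one simple way user constraints could shape the resulting topology.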
Mobile UMI: Cross-View Diffusion Policy with Decoupled Kinematics for Mobile Manipulation
Mobile imitation learning on portable demonstration interfaces faces two coupled bottlenecks: locomotion-contaminated action labels and inference-induced execution latency on a continuously moving base. Recent wrist-mounted interfaces lower the cost of tabletop data collection, yet a single wrist view does not capture the global context required for base navigation. Adding a body-mounted camera entangles human walking with hand motion. Meanwhile, generative policies introduce hundreds of milliseconds of inference latency, during which the base advances past predicted waypoints, forcing backward corrections at action splices. This paper presents Mobile UMI, a hardware-free demonstration framework that addresses both gaps through three components. First, a dual-camera capture system records chest-centric global context and wrist-centric local interaction without any robot present. Second, a one-shot ChArUco-based spatial anchor unifies the chest and hand visual-inertial frames; the hand pose is then re-expressed relative to the chest to extract decoupled SE(3) manipulation and SE(2) base trajectories. Third, an asynchronous receding-horizon executor performs online state matching: each generated action chunk is realigned with the current physical pose so that expired waypoints are discarded before execution. The full system is evaluated on four long-horizon household tasks, achieving an average success rate of 83.8% over 100 trials per task. Controlled comparisons against ACT and Diffusion Policy show that the chest-relative label alone closes much of the gap; online state matching closes the remainder. These results indicate that, for mobile imitation learning under the tested conditions, explicit kinematic factorization combined with state-level latency alignment provides an effective solution without requiring architectural changes to the underlying policy class.
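The chest-relative re-expression described above can be illustrated with a planar simplification. The paper works with SE(3) hand poses and SE(2) base trajectories; the sketch below uses SE(2) poses `(x, y, theta)` for both, purely to keep the example short, and the function names are hypothetical.

```python
import math


def wrap_angle(theta):
    """Wrap an angle to (-pi, pi]."""
    return math.atan2(math.sin(theta), math.cos(theta))


def compose(a, b):
    """Pose b, given in the frame of pose a, expressed in a's parent frame."""
    ax, ay, at = a
    bx, by, bt = b
    return (
        ax + math.cos(at) * bx - math.sin(at) * by,
        ay + math.sin(at) * bx + math.cos(at) * by,
        wrap_angle(at + bt),
    )


def relative_pose(chest, hand):
    """Hand pose expressed in the chest frame: inverse(chest) composed with hand."""
    cx, cy, ct = chest
    hx, hy, ht = hand
    dx, dy = hx - cx, hy - cy
    return (
        math.cos(ct) * dx + math.sin(ct) * dy,
        -math.sin(ct) * dx + math.cos(ct) * dy,
        wrap_angle(ht - ct),
    )


chest = (1.0, 2.0, math.pi / 2)
hand = (1.0, 3.0, math.pi / 2)
before = relative_pose(chest, hand)

# Simulate the demonstrator walking: the same rigid motion moves chest and hand.
walk = (0.7, -0.3, 0.4)
after = relative_pose(compose(walk, chest), compose(walk, hand))
```

Because `before` and `after` agree up to floating-point error, the chest-relative hand label is unaffected by the walking motion, which is the decoupling the abstract relies on.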
DISC: Decoupling Instruction from State-Conditioned Control via Policy Generation
Language-conditioned manipulation policies typically process instructions and observations through shared network parameters. This task-state entanglement provides a pathway for observation leakage -- networks learn scene-to-action shortcuts that bypass language grounding entirely. DISC eliminates this failure structurally. Rather than conditioning a universal policy on language, DISC uses a hypernetwork to generate the entire parameter set of a task-specific visuomotor policy from the instruction alone. The generated policy never directly accesses language; therefore, its task-awareness must come from the language. Consequently, observation leakage has no pathway to emerge. On the other hand, generating coherent high-dimensional policy weights is itself a challenging problem. We address it with a two-stage hypernetwork whose refinement stage embeds the structure of gradient-based optimization as a feed-forward inductive bias, producing globally consistent parameters without actual gradient computation. Trained entirely from scratch on standard data budgets, DISC outperforms all entangled baselines on LIBERO-90 and Meta-World, with advantages that widen on complex, long-horizon tasks -- and surpasses the large-scale pretrained $π_0$ despite using no external pretraining data. On a real-world benchmark where all tasks share identical visual context, DISC substantially outperforms entangled alternatives, directly confirming that language-generated policy parameters, not visual shortcuts, drive behavior. The hypernetwork further learns a semantically structured parameter manifold that enables few-shot adaptation from minimal demonstrations and robust generalization across paraphrased instructions. Our code is available at: https://github.com/ReNginx/DISC.
SmoCap: Unified Scale-Pose Canonicalization with Proxy-Mapped Trust-Region QP
Objective: Stage-wise workflows that separate model scaling and inverse kinematics can induce morphology-posture compensation, resulting in anatomically inconsistent yet numerically acceptable solutions, especially in weakly observed directions. We present SmoCap, a leakage-resistant canonicalization framework that estimates morphology and posture jointly in each local trust-region quadratic program (QP) within a sparse control subspace. Methods: SmoCap solves a constrained trust-region QP with analytical proxy-mapped pose and scale Jacobians. The low-dimensional proxy map stabilizes weakly observed directions and drives coordinated structures. An optional pre-solve provides warm starts in difficult configurations. The framework is evaluated using cohort fluoroscopy knee motion, anthropometric ground truth, and extreme yoga sequences. Results: SmoCap achieved a knee flexion RMSE of 2.9 degrees against fluoroscopy and a pooled anthropometric endpoint error of around 3%. In the leakage audit against segment-wise scaling, SmoCap also reduced marker RMSE, FE error, and anthropometric endpoint error. Proxy coupling preserved expressive and coordinated spine motion with a marginal fitting error increase (+0.14 mm, +0.6%) against baseline models in the yoga ablation. Median marker RMSE was around 20 mm, and median runtime was 0.204-0.332 ms/frame, consistently achieved in 2-3 iterations. Conclusion: SmoCap provides an externally validated unified coupling-aware scale-pose framework, making externally consistent motion canonicalization practical at dataset scale.
comment: 11 pages, 6 figures, 4 tables
VSCD: Video-based Scene Change Detection in Unaligned Scenes ICML 2026
Detecting what has changed in an environment is essential for long-term autonomy, yet most change detection settings assume fixed viewpoints, mild misalignment, or only a few changed objects. We introduce Video-based Scene Change Detection (VSCD), which predicts a pixel-wise change mask for each query frame, given a reference and a query RGB video of the same indoor space recorded at different times under unconstrained camera motion. The two videos are not temporally synchronized, and many object instances may appear or disappear. To study this setting, we build a large-scale benchmark with over 1.1 million frames annotated with pixel-accurate change masks, together with a real-world test set for evaluating transfer beyond simulation. We propose a query-centric multi-reference model that learns temporal matching implicitly from change-mask supervision, aligns candidate reference features to the query via local patch correspondence, and fuses per-candidate change features using frame-level and patch-level confidence before decoding a high-resolution mask once per frame. Our approach achieves state-of-the-art performance against strong image- and video-based baselines, and we validate its real-world impact by deploying it on a mobile robot for two downstream applications -- visual surveillance and object incremental learning.
comment: 18 pages, 7 figures. Accepted to the 43rd International Conference on Machine Learning (ICML 2026)
Demo-JEPA: Joint-Embedding Predictive Architecture for One-shot Cross-Embodiment Imitation
Robotic imitation learning is often treated as reproducing demonstrated actions, but actions are inherently embodiment-specific. When demonstrations come from humans or robots with different morphology, kinematics, or action spaces, this action-centric view requires shared action spaces, heuristic retargeting, or large-scale multi-embodiment co-training. We instead view demonstrations as implicit specifications of future goals: the target agent should infer what state the demonstrator is trying to realize, rather than how the demonstrator executes it. We propose Demo-JEPA, a cross-embodiment imitation framework that decouples demonstration intent from embodiment-specific execution. Built on a JEPA-based world model, Demo-JEPA translates source visual demonstrations into target-compatible future latent trajectories in a shared predictive representation space. The target agent then uses these latent trajectories as subgoals and realizes them through planning under its own learned forward dynamics. Because Demo-JEPA avoids action-level correspondence and requires only visual demonstrations plus the target agent's own interaction experience, it supports flexible imitation across heterogeneous embodiments. Experiments on RLBench and real-world manipulation tasks show that Demo-JEPA matches specialized in-domain planners and generalizes to unseen tasks and embodiment configurations where prior methods fail.
Q-SpiRL: Quantum Spiking Reinforcement Learning for Adaptive Robot Navigation
Adaptive robot navigation in dynamic environments requires policies that can reach the target reliably while producing efficient and stable trajectories. This paper presents Q-SpiRL, a quantum spiking reinforcement learning framework for obstacle-aware robot navigation. The framework develops and evaluates five agent families: tabular Q-learning, classical MLP, classical SNN, quantum-enhanced MLP (QMLP), and quantum-enhanced spiking neural network (QSNN). While all models are implemented under a unified training and evaluation pipeline, the QSNN is the central architecture of interest, as it combines spike-based temporal processing with variational quantum feature transformation. Experiments are conducted across three grid-world environments of increasing size, namely 20x20, 30x30, and 40x40, with both static and dynamic obstacles. Performance is assessed using success rate, success-weighted path length, path length, and turn rate under deterministic inference. Results show that QSNN achieves the strongest overall trade-off between task completion, trajectory efficiency, and motion smoothness, reaching up to 99% success rate while maintaining high path efficiency in the most challenging setting. Execution on IBM quantum hardware further demonstrates the feasibility of deploying the proposed hybrid policy under real-device conditions.
comment: 11 pages, 6 figures
CMC-Opt: Constraint Manifold with Corners for Inequality-Constrained Optimization
We introduce a manifold-based framework for addressing optimization problems with equality and inequality constraints found in robotics. Our approach transforms the original problem into an unconstrained optimization problem directly on the constrained state space. To achieve this, we introduce "constraint manifolds with corners" to represent the state space satisfying mixed nonlinear equality and inequality constraints. We further extend manifold optimization algorithms to operate on this new topological structure. We demonstrate the power and robustness of our framework in the context of a large-scale kinodynamic planning problem, successfully generating dynamically feasible trajectories where standard methods fail.
VLA-REPLICA: A Low-Cost, Reproducible Benchmark for Real-World Evaluation of Vision-Language-Action Models
Vision-Language-Action (VLA) models have shown strong promise for general-purpose robotic manipulation, but their real-world evaluation remains limited by a lack of accessible, reproducible, and consistent benchmarks. Simulation benchmarks fail to capture real-world complexity, while existing real-world benchmarks often require expensive hardware, centralized evaluation, or are limited in task diversity. We introduce VLA-REPLICA, a low-cost, easily reproducible real-world benchmark for evaluating VLA models. Built from off-the-shelf components, our system can be quickly assembled and replicated across laboratories, providing a consistent environment for policy evaluation anywhere in the world. VLA-REPLICA includes a diverse suite of manipulation tasks and a small-scale demonstration dataset for target-domain adaptation, with real-world evaluation protocols for both in-distribution and out-of-distribution settings. Experiments with imitation learning and state-of-the-art VLA models reveal model strengths and limitations, while consistent results across independently constructed setups demonstrate the reproducibility of our benchmark.
Conflict-Aware Additive Guidance for Flow Models under Compositional Rewards ICML 2026
Inference-time guided sampling steers state-of-the-art diffusion and flow models without fine-tuning by interpreting the generation process as a controllable trajectory. This provides a simple and flexible way to inject external constraints (e.g., cost functions or pre-trained verifiers) for controlled generation. However, existing methods often fail when composing multiple constraints simultaneously, which leads to deviations from the true data manifold. In this work, we identify root causes of this off-manifold drift and find that the approximation error scales severely with gradient misalignment. Building on these findings, we propose Conflict-Aware Additive Guidance ($g^\text{car}$), a lightweight and learnable method, which actively rectifies off-manifold drift by dynamically detecting and resolving gradient conflicts. We validate $g^\text{car}$ across diverse domains, ranging from synthetic datasets and image editing to generative decision-making for planning and control. Our results demonstrate that $g^\text{car}$ effectively rectifies off-manifold drift, surpassing baselines in generation fidelity while using light compute. Code is available at https://github.com/yuxuehui/CAR-guidance.
comment: Forty-Third International Conference on Machine Learning (ICML 2026)
GaussianDream: A Feed-Forward 3D Gaussian World Model for Robotic Manipulation
Vision-language-action (VLA) policies have advanced language-conditioned robotic manipulation by transferring semantic priors from pretrained vision-language models to action generation. Yet, standard action-imitation training often provides limited explicit supervision for 3D geometry, dense visual structure, and short-horizon environment evolution, which are critical for physically precise manipulation. We introduce GaussianDream, a feed-forward 3D Gaussian world-model plug-in that turns robot trajectories into structured spatial-temporal supervision. The key idea is to couple current Gaussian reconstruction with horizon-conditioned future Gaussian prediction during training, forcing a compact spatio-temporal prefix to be decodable into renderable 3D Gaussian states. This enables dense RGB rendering, depth, and pseudo 3D scene-flow supervision without requiring test-time Gaussian decoding. At inference, GaussianDream discards all auxiliary decoding heads and retains only the learned prefix to condition action generation, avoiding rendering, video rollout, or additional planning during closed-loop control. Experiments on LIBERO, RoboCasa Human-50, and real-robot tasks demonstrate strong and highly competitive performance, achieving 98.4% average success on LIBERO, 52.6% on RoboCasa Human-50, and 50.0% in real-world evaluation.
comment: 18 pages, 9 figures
A Semantic and Occlusion-Aware GM-PHD Filter ICRA 2026
This paper proposes a new birth model including semantic information derived from deep learning to create an occlusion-aware Gaussian Mixture Probability Hypothesis Density (GM-PHD) filter. Unlike prior approaches that rely on simplistic or uniform assumptions, the proposed Semantic-Occlusion Aware (S-OA) birth model defines initialization terms by explicitly considering regions of occlusion and by leveraging semantic information about the environment. This enables the filter to accurately represent where new objects are more likely to appear, thereby improving tracking performance in complex and high-density driving scenarios. The method is evaluated through Monte Carlo simulations and experiments on the KITTI dataset. Performance is assessed by measuring the latency between first detection and track initiation, along with the mean absolute cardinality error and the Optimal Subpattern Assignment (OSPA) metric. Results demonstrate that the S-OA birth model reduces initialization delay in occlusion-heavy settings, matching or outperforming the strongest baseline in approximately 70% of cases. A sensitivity analysis of birth model weights is also provided. Overall, the findings underscore the benefits of integrating occlusion reasoning and semantic priors into Bayesian tracking frameworks for autonomous driving.
comment: Accepted at ICRA 2026
Jointly Learning Predicates and Actions Enables Zero-Shot Skill Composition
Learning from Demonstration (LfD) enables robots to learn complex behaviors from expert examples, yet existing approaches often fail to generalize to new compositions of known skills without retraining. Modern generative policies model distributions over action trajectories alone and are thus unable to reason about the symbolic outcomes required for robust composition. We propose that skills should jointly model action trajectories and the symbolic outcomes they induce. To address this gap, we introduce Predicate Action Skills (PACTS), a class of closed-loop visuomotor policies that model skills as a joint generative process over action and predicate belief trajectories, producing coherent action-outcome rollouts within a single model. Jointly generating actions and predicates enables PACTS to learn internal representations that improve both action generation and predicate classification. Furthermore, we demonstrate zero-shot composition of learned skills via planning by leveraging online predicate predictions from PACTS as a symbolic interface for sequencing and monitoring execution. Project website: https://planpacts.github.io/
Design for Manufacturing: A Manufacturability Knowledge-Integrated Reinforcement Learning Framework for Free-Form Pipe Routing in Aeroengines
Design for manufacturing plays a critical role in advanced aeroengine development, where complex components necessitate careful consideration of manufacturability. However, current practices in pipe routing remain largely decoupled from downstream manufacturing, leading to labor-intensive, trial-and-error iterations to achieve manufacturable designs. To address this problem, this study proposes the Frenet-based pipe routing optimization (FPRO) framework, a manufacturability knowledge-integrated reinforcement learning approach for free-form pipe design in aeroengines. FPRO formulates the routing problem as a boundary value problem in the Frenet frame. In this framework, the pipe path is represented by curvature and torsion profiles, which are generated using cubic Hermite interpolation. To integrate design and manufacturing, domain-specific manufacturing knowledge is embedded as constraints on the permissible ranges of curvature and torsion. The path optimization is performed using the proximal policy optimization algorithm with stochastic exploration and a stage-guided reward mechanism. A unified mapping formulation then translates the optimized path into motion trajectories for the bending die, enabling direct fabrication on a six-axis free-bending machine. Experimental results demonstrate that FPRO consistently generates collision-free, manufacturable paths with smoother geometric profiles compared to Cartesian-based methods. It also achieves faster convergence and superior performance in terminal alignment, path length, obstacle avoidance, and manufacturability compared to state-of-the-art reinforcement learning baselines. Real-world validation confirms the close geometric correspondence between the manufactured pipe and its digital design, validating the practical feasibility of FPRO.
Time-To-Reach Separation and Safety Filtering for Safe, Fair, and Efficient Multi-Agent Coordination
Advanced Air Mobility (AAM) operations are expected to significantly increase aerial traffic in urban airspace, requiring autonomous traffic management systems to ensure collision-free operations in highly congested environments. In this paper, we propose a multi-agent coordination framework that uses minimum time-to-reach (TTR) as a unifying metric for priority assignment, temporal separation, and safety filtering. We focus on the problem of coordinating multiple aerial vehicles merging into an air corridor while maintaining safe separation between vehicles. Vehicles are assigned arrival-consistent priority based on TTR, and target TTR values are used to enforce temporal spacing that induces spatial separation. A priority-consistent safety filtering layer based on Hamilton-Jacobi reachability value functions ensures collision avoidance while minimally modifying the reference guidance. Simulation results in a highly congested corridor merging scenario show that the proposed method improves safety, fairness, and efficiency compared to time-optimal guidance and priority-agnostic safety filtering.
comment: 9 pages, 3 figures. Extended version (including appendix) of a paper submitted to the 65th IEEE Conf. on Decision and Control (2026)
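As a rough illustration of TTR-based priority and temporal spacing, the sketch below orders vehicles by their minimum time-to-reach and pushes each target arrival time back until a minimum gap is respected. The spacing rule, the function name `schedule_by_ttr`, and the `min_gap` parameter are assumptions made for illustration; the abstract does not specify how target TTR values are computed, and the Hamilton-Jacobi safety filter is not modeled here.

```python
def schedule_by_ttr(ttr, min_gap):
    """Assign priority by ascending TTR and space target arrival times by at least min_gap.

    ttr: dict mapping vehicle name -> minimum time-to-reach (seconds).
    Returns (priority order, dict of target arrival times).
    """
    order = sorted(ttr, key=ttr.get)
    targets = {}
    previous = None
    for name in order:
        if previous is None:
            target = ttr[name]
        else:
            # A vehicle cannot arrive earlier than its own minimum TTR,
            # nor closer than min_gap behind the vehicle ahead of it.
            target = max(ttr[name], previous + min_gap)
        targets[name] = target
        previous = target
    return order, targets

if __name__ == "__main__":
    print(schedule_by_ttr({"uav1": 10.0, "uav2": 11.0, "uav3": 30.0}, min_gap=5.0))
```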
Mechanistic Interpretability for Learning Assurance of a Vision-Based Landing System
EASA's learning-assurance guidance requires data-driven aviation systems to build and monitor their own situation representation, yet for neural networks the technical means to provide such evidence remain an open problem. We address this gap for a vision-based aircraft landing system: we propose that a minimally assurable model must at least be shown to separate content from style in its own situation representation. Showing that the model's predictions then rely largely on the contentful representation components leads to a concrete assurance path. To demonstrate this assurance path on a concrete model we train a vision transformer model for runway keypoint regression on the LARDv2 dataset. The model, which acts as the subject for our assurance demonstration, produces per-patch embeddings that we decompose into interpretable atoms via K-SVD sparse dictionary learning. A qualitative visualization confirms that contentful atoms track task-relevant runway structure and stylistic atoms track domain-specific appearance, and the regression head is shown to place almost all of its linear weight on contentful atoms. We further build on the content/style separation and define out-of-model-scope (OOMS) detection, a novel runtime assurance approach directly monitoring the model's situation representation. OOMS monitoring is complementary to operational design domain and output-space out-of-distribution monitoring and addresses concrete requirements of the recent EASA guidance. By directly analyzing a model's situation representation both at test time and runtime, this work delivers the first concrete piece of the representation-level evidence that EASA learning-assurance guidance demands, and points to mechanistic interpretability as a practical building block of future aviation safety cases.
comment: 10 pages, 4 figures
Intent-First Aerial V2V for Tactical Coordination and Separation: Protocol and Performance Under Density and Disturbance
Dense low-altitude aerial operations require more than pre-flight route coordination and last-resort collision avoidance. Once aircraft are airborne, disturbances can emerge on timescales shorter than strategic reauthorization can absorb, while collision avoidance is too late and disruptive to serve as routine traffic management. Although tactical separation is recognized as the intermediate layer, realizing it at scale requires a deployable neighborhood communication mechanism that provides fresh, trusted information for local coordination. This paper presents what is, to our knowledge, the first controller-coupled characterization of an all-airborne, sidelink-class, intent-first vehicle-to-vehicle (V2V) tactical neighborhood exchange stack for dense Unmanned Aircraft System Traffic Management (UTM) operations. Unlike awareness-only broadcast, the proposed exchange combines refreshed state and intent beacons for local awareness, cooperative perception, and degraded-mode assessment with event-triggered messages for yielding, sequencing, release, and contingency coordination. We implement and evaluate this model on an all-airborne V2V stack using sidelink-class C-V2X modules with authenticated freshness checks. Evaluation uses a scenario-driven, high-volume stress campaign supported by real-time, field-anchored infrastructure. Results show that V2V reduces stale-belief divergence, preserves observability through cooperative perception, rejects invalid tactical messages, suppresses false local inference, and structures shared-resource coordination. The implemented stack provides a viable communication layer for tactical separation in lower-to-moderate regimes, but transitions toward guarded fallback as density, impairment, and complexity increase. These findings position intent-first aerial V2V as a bounded enabler for scaling tactical coordination in disturbance-driven urban airspace.
comment: Submitted to IEEE Transactions on Intelligent Transportation Systems
Safe and Steerable Geometric Motion Policies for Robotic Dexterous Manipulation
Robotic dexterous manipulation requires continuously reconciling objectives and constraints defined on heterogeneous geometric spaces: a robot controlled on a $\mathbb{R}^7$ configuration manifold may need to track end effector poses on $\mathrm{SE}(3)$ while satisfying obstacle avoidance margins in $\mathbb{R}$. We present Safe Pullback Bundle Dynamical Systems (SafePBDS), a geometrically consistent framework that computes optimal, certifiably safe configuration manifold accelerations from objectives and safety requirements on arbitrary task manifolds. SafePBDS builds on prior work that combines predefined task manifold dynamical systems to produce autonomous motion. Its first innovation is a pullback control barrier function construction, which converts task manifold safety conditions into linear constraints on configuration manifold accelerations. The second innovation is a task manifold action interface that allows a high-level policy to inject low dimensional residual motions; zero input recovers the autonomous behavior, while safety is preserved under arbitrary inputs. This lets high-level policies efficiently steer exploration while leaving precise motion to the autonomous behavior. We validate SafePBDS in simulation and on a 23-DOF Franka Panda-Allegro Hand platform. On dexterous grasping, SafePBDS achieves a $92.5\%$ success rate across 20 household objects and 120 trials. Using the action interface, the method can exclude any one of the four fingers during grasping via a one-dimensional action, achieving $94.4\%$ 3-finger grasp success across 3 objects and 36 trials. The efficient planning and safety guarantee of SafePBDS also enables the first model-based, fully actuated palm-down in-hand reorientation, exceeding $360^\circ$ of yaw rotation in both directions under varying object weight and wrist motion. Demo video and details: https://tml.stanford.edu/safe-pbds
comment: 24 pages, 10 figures, 5 tables. Project page and demo video: https://tml.stanford.edu/safe-pbds
stable-worldmodel: A Platform for Reproducible World Modeling Research and Evaluation
World models are central to building agents that can reason, plan, and generalize beyond their training data. However, research on world models is currently fragmented, with disparate codebases, data pipelines, and evaluation protocols hindering reproducibility and fair comparison. Current practice is further limited by three key bottlenecks: fragile one-off codebases, slow video data loading, and the lack of standardized generalization benchmarks. We present stable-worldmodel (swm), an open-source platform for standardized and reproducible world modeling research and evaluation. It delivers (1) a high-performance Lance-based data layer with native support and conversion tools for MP4, HDF5, and LeRobot datasets, (2) clean, well-tested implementations of modern world model baselines and planning solvers, and (3) a broad suite of environments and tasks extended with controllable visual, geometric, and physical factors of variation for systematic in-silico evaluation of dynamics understanding, control performance, representation quality, and out-of-distribution generalization. By unifying the full pipeline under a single, scalable framework, swm dramatically reduces research overhead and accelerates trustworthy progress toward reliable world models.
SceneGraphGrounder: Zero-Shot 3D Visual Grounding via Structured Scene Graph Matching
Zero-shot 3D visual grounding requires localizing objects in unstructured environments from free-form natural language. Recent vision-language model (VLM) approaches achieve promising results but rely on view-dependent reasoning or implicit representations, limiting spatial consistency and interpretability for compositional queries. We propose SceneGraphGrounder, a framework that reformulates 3D grounding as structured graph matching over a reconstructed 3D scene graph. To enable this formulation, we introduce a visual marker prompting strategy that enables a VLM to infer object-object relationships from 2D views, which are subsequently lifted into a persistent 3D scene graph encoding both spatial and semantic relations. Given a query, we construct a query graph and perform constrained alignment with the scene graph, ensuring multi-view consistency and interpretable reasoning. Experiments on the ScanRefer benchmark demonstrate that our method achieves competitive performance among zero-shot approaches, using only RGB-D inputs. We further validate our framework through real-world deployment on a mobile robot, demonstrating robust spatial reasoning in long-horizon physical environments. We will make our code publicly available upon acceptance.
Improving 3D Labeling in Self-Driving by Inferring Vehicle Information using Vision Language Models
We present an approach to improve 3D vehicle labeling in self-driving applications through zero-shot inference of vehicle information, leveraging Vehicle Make and Model Recognition (VMMR) methods. The proposed approach utilizes a Vision Language Model (VLM) to both infer a vehicle's make, model, and generation from image crops, and output accurate 3D bounding box dimensions to seed manual labeling. We evaluate the impact of iterative prompt engineering and the choice of different VLMs on both vehicle bounding box inference and make/model/generation recognition. When compared to strong baselines, the proposed approach not only shows high accuracy, but also excels in mitigating specific failure modes where VLMs provide better dimensions than initial lidar-aided human annotated labels (e.g., in cases of significant vehicle occlusion). Experiments on both public and proprietary data strongly suggest that our conclusions are generalizable across different labelers and datasets. The results demonstrate that integrating VLMs into the labeling process can reduce manual labeling time while increasing label quality.
comment: To appear in Proceedings of the IEEE Intelligent Vehicles Symposium (IV), 2026. Accepted for oral presentation
Learning Altruistic Collaboration in Heterogeneous Multi-Team Systems
This paper studies heterogeneous multi-team collaboration through dynamic robot allocation, where robots are treated as transferable resources. Leveraging Hamilton's rule from ecology as an altruistic decision-making mechanism, we propose a multi-team collaborative resource allocation framework with heterogeneous capabilities, transfer costs, and capability-dependent contributions. The resulting allocation problem is combinatorial and is shown to be NP-hard. To address scalability, we develop a graph neural network policy under centralized training and decentralized execution that approximates the altruistic allocations based on Hamilton's rule. The model operates over the team interaction graph and predicts robot-level transfer decisions and next robot-to-team assignments. The proposed approach is validated in a firefighting scenario through simulations and experiments, demonstrating that the learned policy achieves near-optimal performance while scaling to larger systems.
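Hamilton's rule states that an altruistic act is favored when relatedness times benefit exceeds cost, r * B > C. A minimal sketch of applying that test to candidate robot transfers follows; the dictionary fields, the function name `hamilton_transfers`, and the ranking by net gain are hypothetical and are not taken from the paper, whose allocation problem also involves heterogeneous capabilities and combinatorial constraints that this sketch ignores.

```python
def hamilton_transfers(candidates):
    """Keep candidate transfers that satisfy Hamilton's rule r * B > C, best net gain first.

    Each candidate is a dict with keys "robot", "relatedness", "benefit", "cost".
    """
    def net_gain(c):
        return c["relatedness"] * c["benefit"] - c["cost"]

    accepted = [c for c in candidates if c["relatedness"] * c["benefit"] > c["cost"]]
    return sorted(accepted, key=net_gain, reverse=True)

if __name__ == "__main__":
    demo = [
        {"robot": "r1", "relatedness": 0.5, "benefit": 10.0, "cost": 2.0},
        {"robot": "r2", "relatedness": 0.2, "benefit": 4.0, "cost": 1.0},
    ]
    print([c["robot"] for c in hamilton_transfers(demo)])
```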
Mind the Gaps: Multi-Robot Feedback-Driven Ergodic Coverage in Unknown Environments
In this work, we address the problem of multi-robot adaptive coverage, where teams of robots perform dynamic sampling by continuously adjusting their positions to collect data in an environment. This task can be challenging, particularly when robots must be efficiently allocated to new sampling locations over time. Ergodic search methods optimize robot trajectories by ensuring that the robots' time-averaged spatial distribution aligns with the spatial distribution of environmental information. While these methods promote effective exploration provided a target distribution, they often fail to account for unknown prior distributions of the environment. To overcome this limitation, we propose an adaptive coverage strategy that utilizes real-time feedback from an environmental model to adjust robot sampling behavior in response to unknown conditions. Our approach enhances traditional ergodic trajectory optimization by constructing a target spatial information distribution based on parametric models of the environment, which are updated online. This strategy assumes that the environment is either static or changes slowly compared to the robot's motion. Our framework allows robots to dynamically prioritize regions of high interest, improving coverage efficiency, synthesizing effective control policies for individual agents, and optimizing resource use in settings with unknown prior distributions. We validate our approach through simulations, demonstrating its effectiveness in enhancing coverage and resource allocation.
AVI-HT: Adaptive Vision-IMU Fusion for 3D Hand Tracking
We present AVI-HT, an adaptive visual-IMU fusion approach for tracking 3D hand poses by jointly modeling the egocentric image with on-glove 6-DoF IMU signals. AVI-HT achieves significantly improved accuracy and availability, particularly in hand-object interaction (HOI) scenarios involving heavy visual occlusion. Two complementary ingredients underpin its success: (1) synchronized multi-modal training data pairing on-body vision-IMU sensor streams with ground-truth 3D hand poses from a motion-capture system, and (2) a cross-sensor deep attention mechanism that adaptively modulates the trust assigned to the vision and individual IMU sensors. To evaluate AVI-HT in real-world settings, we conduct extensive experiments on our DexGloveHOI dataset that consists of 100K+ pairwise vision-IMU samples with synchronized 3D annotated poses, in which users manipulate a variety of objects during daily tasks. We compare against multiple single- and multi-modal tracking approaches under two hand models (UmeTrack, MANO). The results show that AVI-HT reduces mean keypoint error by 16.1% and its wrist-aligned variant by 24.2% over the baselines. Ablation studies further reveal the per-finger contribution of IMU sensors across activity types, and the model's sensitivity to IMU noise and temporal misalignment in vision-IMU fusion.
PGDG: Physically Grounded Data Generation for Robust Bimanual Policy Learning from a Single Demonstration
Behavior cloning for contact-rich bimanual manipulation remains challenging because diverse demonstrations are expensive to collect, and even small disturbances can push the system into off-manifold states where no recovery supervision is available. We propose PGDG, a data generation framework with zero-shot curation that expands a single demonstration into a compact dataset of physically plausible, successful, and diverse recovery behaviors without additional human labeling. PGDG iterates between a physics-grounded sampler and a dataset curator, where the curator selects informative, non-redundant, and recoverable behaviors to update the sampling distribution toward under-covered recovery modes, and the sampler draws physically plausible rollout candidates from this updated distribution and retains successful trajectories. To further improve data quality, PGDG applies short-horizon sampling-based control to relabel selected risky states with corrective actions. Across four bimanual manipulation tasks, PGDG consistently outperforms spatial-only augmentation in both simulation and zero-shot real-world transfer. On RotateBox-Pitch, success improves from 38% to 93% in simulation and from 35% to 82% in the real world. PGDG also enables effective fine-tuning of foundation models such as GR00T, increasing success from 46% to 77%. Additional results are available on our website: https://cunxid.github.io/PGDG/.
Motion Design for Grasp-Based Dynamic Locomotion in Microgravity
Locomotion in microgravity often relies on sparsely and irregularly arranged anchors, motivating grasp-based mobility with multiple limbs. In this setting, dynamic locomotion is feasible only through deliberate regulation of both anchored interactions and whole-body coordination under coupled dynamic and kinematic constraints. This paper presents design insights for grasp-based dynamic locomotion with multi-limbed robotic systems in microgravity, targeting scenarios that require 6D limb manipulation to establish contacts with candidate anchors. The investigated design parameters include gait pattern, stride length, locomotion speed, and nominal posture. A parameterizable locomotion planning framework is proposed to support variations of these parameters and to evaluate the resulting locomotion performance in terms of stability and actuation demand. Two representative quadruped morphologies are adopted for evaluation in physics-based simulation. The results demonstrate that enlarging the feasible contact wrench space and attenuating impulsive whole-body dynamics improve locomotion performance. These findings inform strategies for contact configuration selection and whole-body coordination in microgravity locomotion with multi-limbed systems.
Closed-Loop Sim-to-Real Reinforcement Learning for Deformable Microfiber Shape Control
Autonomous contact-based micromanipulation is challenging because surface and interfacial interactions at the microscale are difficult to model accurately, limiting the use of conventional model-based control and sim-to-real learning. We present a closed-loop sim-to-real reinforcement learning (RL) approach for microfiber shape control on a surface. The central idea is to train geometric shape regulation in a simplified frictionless simulator and rely on real-time visual feedback during deployment to iteratively correct the observed effects of unmodeled surface interactions. An RL policy trained entirely in simulation is transferred directly to a physical dual-gripper micromanipulation system operating at 40 Hz, without retraining or domain adaptation. Using silk microfibers as a testbed, the policy achieves a mean point-wise shape error of 270 $\pm$ 80 $\mu$m across twenty-four diverse initial configurations. Across nine specimens covering all combinations of three fiber diameters (50, 80, and 120 $\mu$m) and three manipulated lengths (10 mm, 15 mm, and 20 mm), the same policy achieves sub-millimeter final shape error without any retraining or retuning. These results show that a policy learned in a simplified simulator can achieve repeatable real-world microfiber shape regulation under surface contact, provided that the task-relevant effects of the sim-to-real mismatch remain observable and correctable within the closed feedback loop.
comment: 7 pages, 7 figures
Distributed Multi-Coverage for Robot Swarms
Autonomous drone swarms deployed for surveillance, environmental monitoring, and infrastructure inspection must maintain reliable coverage of critical assets despite robot failures. This requires multicoverage: each asset must be observed by multiple robots for redundancy, with coverage requirements varying by asset importance. While recent work has solved the centralized problem optimally using integer programming, practical deployments face constraints that demand distributed solutions: robots operate with limited communication ranges, onboard computation restricts global planning, and partial system failures must not cause mission abort. We present a distributed multicoverage algorithm for robot swarms operating with local sensing, local communication, and no global coordination.
comment: Accepted at ANTS 2026 (International Conference on Swarm Intelligence), published by Springer Nature
Flying Together: Human-Guided Immersive Shared Control for Aerial Robot Teams in Unknown Environments
While autonomous multi-robots can achieve safe and coordinated navigation, they often struggle to adapt to unforeseen conditions and to capture operator-driven objectives in unstructured environments. We present a Virtual Reality (VR)-based shared control framework for teams of drones operating in constrained and unknown environments, enabling real-time, user-guided exploration. At the core of our approach is a novel, user-guided motion-primitive-based planner that computes continuous, collision-free trajectories while continuously integrating operator input. This planner is coupled with an admittance controller, allowing the operator to flexibly influence team behavior and guide drones toward regions of interest that autonomous planners may overlook. The system supports mixed-reality operations with both physical and simulated drones, and implements a bilateral VR-based interface, allowing the operator to guide the robot team via migration points while receiving immediate visual feedback of the team state. Experimental results show that shared control improves obstacle avoidance, maintains inter-agent spacing, and reduces operator effort, demonstrating the feasibility and advantages of immersive, human-in-the-loop multi-robot navigation.
comment: Accepted at IEEE International Conference on Robotics and Automation, Vienna 2026
PhysX-Omni: Unified Simulation-Ready Physical 3D Generation for Rigid, Deformable, and Articulated Objects
Simulation-ready physical 3D assets have emerged as a promising direction owing to their broad applicability in downstream tasks. However, most existing 3D generation methods either neglect physical properties or are limited to a single asset category, e.g., rigid, deformable, or articulated objects. To address these limitations, we introduce PhysX-Omni, a unified framework for simulation-ready physical 3D generation across diverse asset types. Specifically, we develop a novel and efficient geometry representation tailored for Vision-Language Models, which directly encodes high-resolution 3D structures without compression, significantly improving generation performance. In addition, we construct the first general simulation-ready 3D dataset, PhysXVerse, covering diverse indoor and outdoor categories. Furthermore, to comprehensively and flexibly evaluate both generative and understanding capabilities in the wild, we propose PhysX-Bench, which encompasses six key attributes: geometry, absolute scale, material, affordance, kinematics, and function description. Extensive experiments with conventional metrics and PhysX-Bench show that PhysX-Omni performs strongly in both generation and understanding. Moreover, additional studies further validate the potential of PhysX-Omni for applications in simulation-ready scene generation and robotic policy learning. We believe PhysX-Omni can significantly advance a wide range of downstream applications, particularly in embodied AI and physics-based simulation.
comment: Project page: https://physx-omni.github.io/
SPARC: Spatial-Aware Path Planning via Attentive Robot Communication
Efficient communication is critical for decentralized Multi-Robot Path Planning (MRPP), yet existing learned communication methods treat all neighboring robots equally regardless of their spatial proximity, leading to diluted attention in congested regions where coordination matters most. We propose Relation enhanced Multi Head Attention (RMHA), a communication mechanism that explicitly embeds pairwise Manhattan distances into the attention weight computation, enabling each robot to dynamically prioritize messages from spatially relevant neighbors. Combined with a distance-constrained attention mask and GRU-gated message fusion, RMHA integrates seamlessly with MAPPO for stable end-to-end training. In zero-shot generalization from 8 training robots to 128 test robots on 40x40 grids, RMHA achieves an approximately 75 percent success rate at 30 percent obstacle density, outperforming the best baseline by over 25 percentage points. Ablation studies confirm that distance-relation encoding is the key contributor to success rate improvement in high-density environments. Index Terms: multi-robot path planning, graph attention mechanism, multi-head attention, communication optimization, cooperative decision-making
comment: The manuscript is being withdrawn at the request of the first author for the purpose of revising content and re-uploading a revised version with updated data/figures/text. The revised manuscript will be resubmitted to arXiv promptly with the same author list and research theme
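The idea of embedding Manhattan distances into attention weights, together with a distance-constrained mask, can be sketched for a single attention head as follows. This is an illustrative stand-in, not RMHA itself: the subtractive linear distance penalty, the `penalty` and `max_dist` parameters, and the function names are assumptions, since the abstract does not give the exact form of the distance encoding.

```python
import math

def manhattan(a, b):
    """Manhattan distance between two grid cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def distance_aware_attention(query, keys, positions, self_pos, max_dist, penalty=0.5):
    """Softmax attention weights over neighbors with a distance bias and a hard distance mask."""
    scale = math.sqrt(len(query))
    logits = []
    for key, pos in zip(keys, positions):
        dist = manhattan(self_pos, pos)
        if dist > max_dist:
            logits.append(-math.inf)  # masked out: neighbor is beyond the allowed range
        else:
            score = sum(q * k for q, k in zip(query, key)) / scale
            logits.append(score - penalty * dist)  # closer neighbors receive larger logits
    top = max(logits)
    if top == -math.inf:
        return [0.0] * len(keys)  # every neighbor is masked
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    keys = [[1.0, 0.0]] * 3
    positions = [(1, 0), (2, 1), (6, 4)]
    print(distance_aware_attention([1.0, 0.0], keys, positions, (0, 0), max_dist=5))
```

With identical keys, the weights differ only through distance, which makes the effect of the bias and the mask easy to inspect.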
How to Utilize Failure Demo Data?: Effective Data Selection for Imitation Learning Using Distribution Differences in Attention Mechanism
Imitation learning for robotic tasks has relied primarily on policies trained only on successful demonstrations, although failures are unavoidable during human data collection. Many existing approaches for exploiting failure data require additional data processing or iterative policy updates through autonomous rollouts, making it difficult to directly and stably utilize failure data accumulated during data collection. In this work, we propose a method that learns latent representations of success-failure discrepancies and incorporates them into the attention mechanism. During inference, an appropriate latent mode is selected from the initial observation to improve action stability. Furthermore, we introduce a post-training metric that quantifies the attention discrepancy between each failure sample and successful demonstrations to select failure data. Simulation results show that the proposed method improves task success rates when trained with failure data and that the proposed metric identifies failure samples that are beneficial for learning when combined with successful demonstrations. These results suggest that the proposed method can support more efficient use of collected demonstrations in robotic data collection pipelines.
comment: 15 pages, 6 figures, 2 tables
FUSE: A Framework for Unified State Estimation in Vehicular and Robotic SLAM Systems
Tightly coupled SLAM formulations under mixed-rate sensing often bind temporal processing, local geometric association, estimator formulation, and map-update policy into method-specific designs. Such binding makes it difficult to vary one design choice without re-engineering the rest of the state-estimation process. This paper presents FUSE, a framework for unified state estimation in vehicular and robotic SLAM systems. FUSE organizes the state-estimation interface around observation ingestion, propagation, update, and state query, and uses this interface to separate temporal processing, residual-ready local geometric association, estimator formulation, and map-update policy. A LiDAR-IMU instantiation is developed to examine the framework under mixed-rate sensing and directional degeneracy, where high-rate inertial propagation, LiDAR-triggered geometric update, residual screening, and degeneracy-aware correction operate through the same interface boundaries. On a 418 m loop-corridor sequence, the instantiation reports a 1.626 m end-to-end trajectory error, corresponding to a 7.9% relative error reduction compared with Faster-LIO, the lowest-error baseline on this sequence. The results support FUSE as a framework for organizing state-estimation design choices and show how the evaluated instantiation regularizes updates along weakly observable directions.
Hand-in-the-Loop: Improving VLA Policies for Dexterous Manipulation via Seamless Hand-Arm Intervention
Vision-Language-Action (VLA) models are prone to compounding errors in dexterous manipulation, where high-dimensional action spaces and contact-rich dynamics amplify small policy deviations over long horizons. While Interactive Imitation Learning (IIL) can refine policies through human correction data, applying it to high-degree-of-freedom (DoF) robotic hands remains challenging due to a command mismatch between human teleoperation and policy execution at the intervention moment, which causes abrupt robot-hand configuration changes, or "gesture jumps". We present Hand-in-the-Loop (HandITL), a seamless human-in-the-loop intervention method that blends human corrective intent with autonomous policy execution to avoid gesture jumps during bimanual dexterous manipulation. Compared with taking over control using direct teleoperation, HandITL reduces intervention jitter by 99.8% and preserves robust post-intervention manipulation, reducing grasp failures by 87.5% and mean completion time by 19.1%. We validate HandITL on tasks requiring bimanual coordination, tool use, and fine-grained long-horizon manipulation. When used to collect correction data for policy refinement, HandITL yields policies that outperform those trained with standard teleoperation data by 19% on average across three long-horizon dexterous tasks.
Can VLMs Unlock Semantic Anomaly Detection? A Framework for Structured Reasoning
Autonomous driving systems remain critically vulnerable to the long-tail of rare, out-of-distribution semantic anomalies. While VLMs have emerged as promising tools for perception, their application in anomaly detection remains largely restricted to prompting proprietary models - limiting reliability, reproducibility, and deployment feasibility. To address this gap, we introduce SAVANT (Semantic Anomaly Verification/Analysis Toolkit), a novel model-agnostic reasoning framework that reformulates anomaly detection as a layered semantic consistency verification. By applying SAVANT's two-phase pipeline - structured scene description extraction and multi-modal evaluation - existing VLMs improve their scores in detecting anomalous driving scenarios from input images. Our approach replaces ad hoc prompting with semantic-aware reasoning, transforming VLM-based detection into a principled decomposition across four semantic domains. We show that across a balanced set of real-world driving scenarios, applying SAVANT improves VLMs' absolute recall by approximately 18.5% compared to prompting baselines. Moreover, this gain enables reliable large-scale annotation: leveraging the best proprietary model within our framework, we automatically labeled around 10,000 real-world images with high confidence. We use the resulting high-quality dataset to fine-tune a 7B open-source model (Qwen2.5-VL) to perform single-shot anomaly detection, achieving 90.8% recall and 93.8% accuracy - surpassing all models evaluated while enabling local deployment at near-zero cost. By coupling structured semantic reasoning with scalable data curation, we provide a practical solution to data scarcity in semantic anomaly detection for autonomous systems. Supplementary material: https://TUM-AVS.github.io/SAVANT/.
comment: 8 pages, 5 figures
DeformMaster: An Interactive Physics-Neural World Model for Deformable Objects from Videos
World models for deformable objects should recover not only geometry and appearance, but also underlying physical dynamics, interaction grounding, and material behavior. Learning such a model from real videos is challenging because deformable linear, planar, and volumetric objects evolve under high-dimensional deformation, noisy interactions, and complex material response. The model must therefore infer a physical state from visual observations, roll it forward under new interactions, and render the resulting dynamics with high visual fidelity. We present DeformMaster, a video-derived interactive physics-neural world model that turns real interaction videos into an online interactive model of deformable objects within a unified dynamics-and-appearance framework. DeformMaster preserves structured physical rollout while using a neural residual to compensate for unmodeled effects, grounds sparse hand motion as a distributed compliant actuator for hand-continuum interaction, represents material response with spatially varying constitutive experts, and drives high-fidelity 4D appearance from the predicted physical evolution. Experiments on real-world deformable-object sequences demonstrate DeformMaster's ability to roll out future dynamics and render dynamic appearance, outperforming state-of-the-art baselines while supporting novel action rollout, material-parameter variation, and dynamic novel-view synthesis. Project page: https://can-lee.github.io/deformmaster-web/
comment: Project page: https://can-lee.github.io/deformmaster-web/
Query-Calibrated Segmental Admission for Descriptor-Agnostic LiDAR Loop Closure in Repetitive Environments
Structurally repetitive environments produce visually plausible but aliased LiDAR loop candidates that can destabilize pose-graph optimization when admitted as loop factors. We propose Query-Calibrated Segmental Admission (QCSA), a descriptor-agnostic sparse loop-admission policy for graph-stability-oriented insertion. The policy scores short descriptor segments against hard negatives, calibrates which query-level segment hypotheses reach geometry, and inserts representative pairs validated by Generalized Iterative Closest Point (G-ICP). We evaluate it on the SNU Library Dataset (SNULib) and HeLiPR overlap routes. Aggregated over seven LiDAR descriptor families on SNULib, QCSA reduces inserted loop factors by 3.8 times, raises factor precision from 0.542 to 0.717, and sharply lowers false admissions per query group. With this sparser graph, it maintains comparable mean absolute trajectory error (ATE) and substantially reduces worst-sequence ATE versus dense Top1+G-ICP, from 1.064 to 0.778 m. The aggregate mean and worst-sequence ATE remain lower than the odometry-only reference. Under a matched factor budget, QCSA also attains lower trajectory error than SeqSLAM and sparse Top1+G-ICP selections. Fixed-transfer validation on HeLiPR, with no route-specific tuning, likewise suppresses hard-negative admissions. These results support the proposed admission layer for aliasing-heavy simultaneous localization and mapping (SLAM). Our implementation and dataset will be released at: https://github.com/wanderingcar/snu_library_dataset.
comment: 8 pages, 3 figures
TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance ICML 2026
Designing dense rewards is crucial for reinforcement learning (RL), yet in robotics it often demands extensive manual effort and lacks scalability. One promising solution is to view task progress as a dense reward signal, as it quantifies the degree to which actions advance the system toward task completion over time. We present TimeRewarder, a simple yet effective reward learning method that derives progress estimation signals from passive videos, including robot demonstrations and human videos, by modeling temporal distances between frame pairs. We then demonstrate how TimeRewarder can supply step-wise proxy rewards to guide reinforcement learning. In our comprehensive experiments on ten challenging Meta-World tasks, we show that TimeRewarder dramatically improves RL for sparse-reward tasks, achieving nearly perfect success in 9/10 tasks with only 200,000 environment interactions per task. This approach outperformed previous methods and even the manually designed environment dense reward on both the final success rate and sample efficiency. Moreover, we show that TimeRewarder pretraining can exploit real-world human videos, highlighting its potential as a scalable approach to rich reward signals from diverse video sources.
comment: ICML 2026 spotlight paper
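The idea in the TimeRewarder abstract, turning a predicted temporal distance between frame pairs into a step-wise proxy reward, can be sketched as follows. This is only an illustration, not the authors' implementation: the function name `stepwise_rewards`, the `temporal_distance` callable (a stand-in for a learned model), and the sign convention (positive means forward progress) are all assumptions.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def stepwise_rewards(
    frames: Sequence[Frame],
    temporal_distance: Callable[[Frame, Frame], float],
) -> List[float]:
    """Score each consecutive frame pair with a temporal-distance estimate.

    `temporal_distance` is a hypothetical stand-in for a learned predictor;
    a positive value is assumed to mean progress toward task completion.
    """
    return [temporal_distance(a, b) for a, b in zip(frames, frames[1:])]


# Toy usage: "frames" are scalar progress values and the stand-in predictor
# simply returns their difference.
toy_frames = [0.0, 0.25, 0.5, 0.25]
toy_rewards = stepwise_rewards(toy_frames, lambda a, b: b - a)
```

In the toy run the last step moves backwards, so it receives a negative proxy reward while the first two steps receive positive ones.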
Depth Completion in Unseen Field Robotics Environments Using Extremely Sparse Depth Measurements ICRA 2026
Autonomous field robots operating in unstructured environments require robust perception to ensure safe and reliable operations. Recent advances in monocular depth estimation have demonstrated the potential of low-cost cameras as depth sensors; however, their adoption in field robotics remains limited due to the absence of reliable scale cues, ambiguous or low-texture conditions, and the scarcity of large-scale datasets. To address these challenges, we propose a depth completion model that trains on synthetic data and uses extremely sparse measurements from depth sensors to predict dense metric depth in unseen field robotics environments. A synthetic dataset generation pipeline tailored to field robotics enables the creation of multiple realistic datasets for training purposes. This dataset generation approach utilizes textured 3D meshes from Structure from Motion and photorealistic rendering with novel viewpoint synthesis to simulate diverse field robotics scenarios. Our approach achieves an end-to-end latency of 53 ms per frame on an NVIDIA Jetson AGX Orin, enabling real-time deployment on embedded platforms. Extensive evaluation demonstrates competitive performance across diverse real-world field robotics scenarios.
comment: Accepted to ICRA 2026
Multimodal Fusion for Sim2real Transfer in Visual Reinforcement Learning
Depth information is robust to scene appearance variations and inherently carries 3D spatial details. This paper therefore proposes a vision-transformer-based visual backbone that fuses RGB and depth modalities to enhance generalization. Different modalities are first processed by separate CNN stems, and the combined convolutional features are delivered to the scalable vision transformer to obtain visual representations. Moreover, a contrastive learning scheme is designed with masked and unmasked tokens to enhance the sample efficiency and generalization performance. A curriculum-based domain randomization scheme is used to flexibly stabilize the training process. Finally, simulation results demonstrate that our fusion scheme outperforms the baselines. We validate the feasibility of our model on real-world manipulation tasks via zero-shot transfer.
ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders
Reinforcement learning for legged locomotion has matured into a stack of multi-component reward functions and physics-engine benchmarks whose morphologies are uniformly derived from real commercial hardware. Game NPCs, however, are bound by stylistic constraints absent from sim-to-real robotics and routinely take the form of creatures with no real-robot counterpart. We introduce ARC-RL, a suite of four MuJoCo continuous-control environments featuring robotic morphologies inspired by the bestiary of ARC Raiders: the 18-DoF tall hexapod Queen, the 12-DoF armoured hexapod Bastion, the 18-DoF compact hexapod Tick, and the 12-DoF quadruped Leaper. All four robots share a unified observation template, action convention, simulation cadence, and a single closed-form multi-component reward function whose only per-morphology variation lives in a small set of weights and parameters. The reward fuses a velocity-tracking tent, a healthy survive bonus, a phase-locked gait-compliance bonus/cost pair, action regularisers, three safety penalties, and a posture anchor; no motion-capture data enters the reward at any point. We additionally provide hand-crafted Central Pattern Generator demonstrators per morphology, which serve both as fixed expert references and as sources of prior data for offline-to-online training. On this playground, we conduct a controlled empirical study comparing standard online algorithms (SAC, SPEQ, SOPE-EO) and methods augmented with prior data (SACfD, SPEQ-O2O, SOPE), and characterise how each paradigm copes with the playground's morphological diversity and animation-style stylistic constraints. Source code is available at https://github.com/CarloRomeo427/ARC_RL.git.
VLANeXt: Recipes for Building Strong VLA Models ICML 2026
Following the rise of large foundation models, Vision-Language-Action models (VLAs) emerged, leveraging strong visual and language understanding from Vision-Language Models for general-purpose policy learning. Yet, the current VLA landscape remains fragmented and exploratory. Although many groups have proposed their own VLA models, inconsistencies in training protocols and evaluation settings make it difficult to identify which design choices truly matter. To bring structure to this evolving space, we reexamine the VLA design space under a unified framework and evaluation setup. Starting from a simple VLA baseline similar to RT-2, which is the origin of VLA, we systematically dissect design choices along three dimensions: foundational components, perception essentials, and action modelling perspectives. From this study, we distill 12 key findings that together form a practical recipe for building strong VLA models. The outcome of this exploration is a simple yet effective model, VLANeXt. It outperforms the state-of-the-art methods on the LIBERO and LIBERO-plus benchmarks and demonstrates strong performance in real-world experiments. We release a unified and easy-to-use codebase to reproduce our findings, explore the design space, and develop new VLA variants on top of a shared foundation. The codebase is available at https://github.com/DravenALG/VLANeXt.
comment: Accepted in ICML 2026, Project Page: https://dravenalg.github.io/VLANeXt/
CosFly-Track: A Large-Scale Multi-Modal Dataset for UAV Visual Tracking via Multi-Constraint Trajectory Optimization
Recent aerial vision-language navigation (VLN) datasets have grown rapidly, but they primarily address goal-oriented navigation to static destinations, leaving UAV visual tracking -- continuously following a moving target while maintaining visibility -- largely without dedicated training data. We introduce CosFlyTrack, a large-scale multi-modal dataset and scalable generation pipeline for UAV visual tracking in urban environments. The dataset provides approximately 12,000 expert and perturbed UAV trajectories generated from 6,000 pedestrian paths, comprising 2.4 million timesteps (approximately 334 hours) with seven aligned data channels: RGB, metric depth, semantic segmentation, six-degree-of-freedom drone pose, target state with visibility flag, bilingual (Chinese-English) instructions, and trajectory-pair metadata. To generate high-quality expert trajectories, we develop MuCO, a multi-constraint optimizer that plans directly in continuous three-dimensional space with BVH-accelerated collision and visibility queries, jointly enforcing target visibility, viewpoint quality, collision avoidance, smoothness, and kinematic feasibility, avoiding the discretization artifacts and post-hoc smoothing of grid-based planners. Fine-tuning experiments on seven vision-language models show that CosFlyTrack improves tracking performance to 78.3 to 95.6 percent SR@1 meter, a 53 to 69 percentage point gain over zero-shot baselines, supporting the dataset as a training resource for dynamic target-following agents. The dataset is publicly available at https://huggingface.co/datasets/AutelRobotics/CosFly; evaluation scripts and pre-trained checkpoints are hosted at https://huggingface.co/AutelRobotics/CosFly-Track.
COBALT: Crowdsourcing Robot Learning via Cloud-Based Teleoperation with Smartphones
The scarcity of large-scale, high-quality demonstration data remains a bottleneck in scaling imitation learning for robotic manipulation. We present COBALT, a teleoperation platform designed to democratize robot learning at scale both in simulation and in the real world. By leveraging vectorized environments, our scalable, load-balanced infrastructure supports concurrent teleoperation by multiple users on a single GPU, yielding a significant reduction in teleoperation cost. Operators can connect from nearly anywhere on Earth using commonly available devices, including single or dual smartphones, VR headsets, 3D mice, and keyboards. An in-memory data cache and efficient video streaming keep control and rendering synchronous, sustaining dozens of concurrent users at 20 Hz with sub-100 ms end-to-end latency for up to 8 concurrent users per GPU. We also demonstrate stable operation supporting 256 simulated clients across 8 GPUs, underscoring the system's ability to scale across hardware and within individual servers. We perform a comprehensive user study showing that phone-based teleoperation performs comparably to or better than specialized hardware, enabling faster, more ergonomic data collection. To ensure data quality, COBALT logs a suite of real-time metrics to automatically filter suboptimal demonstrations. We further demonstrate that a structured user training curriculum significantly improves data collection quality. Guided by insights from our user study, we crowdsource the collection of a large-scale, high-quality pilot dataset with 7500+ demonstrations (50+ hours) collected with smartphones across nine countries over five days. We validate the dataset's quality by training state-of-the-art imitation learning algorithms. Please visit https://cobalt-teleop.github.io/ for more details.
FocalPolicy: Frequency-Optimized Chunking and Locally Anchored Flow Matching for Coherent Visuomotor Policy
Visuomotor policies aim to learn complex manipulation tasks from expert demonstrations. However, generating smooth and coherent trajectories remains challenging, as it requires balancing proximal precision with distal foresight. Existing approaches typically focus on optimizing intra-chunk action distributions, often neglecting the inter-chunk coherence. Consequently, inter-chunk discontinuities significantly impede the learning of coherent long-horizon actions. To overcome this limitation and achieve a synergetic balance between precision and foresight, we propose FocalPolicy, a foresight-aware visuomotor policy that combines Frequency-Optimized Chunking with Locally Anchored flow matching. We introduce a foresight composite objective that supervises time-domain alignment within the proximal actions while regularizing frequency-domain structure over multiple future action chunks to improve cross-chunk coherence. To efficiently learn complex action distributions, we design locally anchored sampling to enhance target signal propagation efficiency during consistency flow matching training. Extensive experiments demonstrate that FocalPolicy outperforms existing approaches and confirm the generalizability of our modules to other baselines. Project website: https://focalpolicy.github.io/
Temporal Counterfactual Explanations of Behaviour Tree Decisions
Explainability, in particular, the ability for robots to explain why they have made a decision or behaved in a certain way, is a critical tool in helping users understand the robots they interact and coexist with. Behaviour trees are a popular framework for controlling the decision-making of robots, and thus a natural question to ask is whether or not a system driven by a behaviour tree is capable of answering "why" questions. While explainability for behaviour tree-driven robots has seen some prior attention, no existing methods are capable of generating causal, counterfactual explanations which detail the reasons for robot decisions and behaviour. Therefore, in this work, we introduce a novel approach which automatically generates counterfactual explanations in response to contrastive "why" questions. Our method achieves this by first automatically building a causal model from the structure of the behaviour tree as well as domain knowledge about the state and individual behaviour tree nodes. The resultant causal model is then queried and searched to find a set of diverse counterfactual explanations. We demonstrate that our approach is able to correctly explain the behaviour of a wide range of behaviour tree structures and states in real time, unlike previous methods which are either unable to answer contrastive questions with causal explanations, or are not guaranteed to provide consistent and accurate explanations. By being able to answer a wide range of causal queries, our approach represents a step towards more transparent, understandable, and ultimately safe and trustworthy robotic systems.
comment: 33 pages, 7 figures + 4 figures in appendices
Before the Body Moves: Learning Anticipatory Joint Intent for Language-Conditioned Humanoid Control
Natural language is an intuitive interface for humanoid robots, yet streaming whole-body control requires control representations that are executable now and anticipatory of future physical transitions. Existing language-conditioned humanoid systems typically generate kinematic references that a low-level tracker must repair reactively, or use latent/action policies whose outputs do not explicitly encode upcoming contact changes, support transfers, and balance preparation. We propose DAJI (Dynamics-Aligned Joint Intent), a hierarchical framework that learns an anticipatory joint-intent interface between language generation and closed-loop control. DAJI-Act distills a future-aware teacher into a deployable diffusion action policy through student-driven rollouts, while DAJI-Flow autoregressively generates future intent chunks from language and intent history. Experiments show that DAJI achieves strong results in anticipatory latent learning, single-instruction generation, and streaming instruction following, reaching 94.42% rollout success on HumanML3D-style generation and 0.152 subsequence FID on BABEL.
Affordance-R1: Reinforcement Learning for Generalizable Affordance Reasoning in Multimodal Large Language Model
Affordance grounding focuses on predicting the specific regions of objects that are associated with the actions to be performed by robots. It plays a vital role in the fields of human-robot interaction, human-object interaction, embodied manipulation, and embodied perception. Existing models often neglect the affordance shared among different objects because they lack Chain-of-Thought (CoT) reasoning abilities, limiting their out-of-domain (OOD) generalization and explicit reasoning capabilities. To address these challenges, we propose Affordance-R1, the first unified affordance grounding framework that integrates cognitive CoT-guided Group Relative Policy Optimization (GRPO) within a reinforcement learning paradigm. Specifically, we designed a sophisticated affordance function, which contains format, perception, and cognition rewards to effectively guide optimization directions. Furthermore, we constructed a high-quality affordance-centric reasoning dataset, ReasonAff, to support training. Trained exclusively via reinforcement learning with GRPO and without explicit reasoning data, Affordance-R1 achieves robust zero-shot generalization and exhibits emergent test-time reasoning capabilities. Comprehensive experiments demonstrate that our model outperforms well-established methods and exhibits open-world generalization. To the best of our knowledge, Affordance-R1 is the first to integrate GRPO-based RL with reasoning into affordance reasoning. The code of our method and our dataset are released at https://github.com/hq-King/Affordance-R1.
RankQ: Offline-to-Online Reinforcement Learning via Self-Supervised Action Ranking
Offline-to-online reinforcement learning (RL) improves sample efficiency by leveraging pre-collected datasets prior to online interaction. A key challenge, however, is learning an accurate critic in large state--action spaces with limited dataset coverage. To mitigate harmful updates from value overestimation, prior methods impose pessimism by down-weighting out-of-distribution (OOD) actions relative to dataset actions. While effective, this essentially acts as a behavior cloning anchor and can hinder downstream online policy improvement when dataset actions are suboptimal. We propose RankQ, an offline-to-online Q-learning objective that augments temporal-difference learning with a self-supervised multi-term ranking loss to enforce structured action ordering. By learning relative action preferences rather than uniformly penalizing unseen actions, RankQ shapes the Q-function such that action gradients are directed toward higher-quality behaviors. Across sparse reward D4RL benchmarks, RankQ achieves performance competitive with or superior to seven prior methods. In vision-based robot learning, RankQ enables effective offline-to-online fine-tuning of a pretrained vision-language-action (VLA) model in a low-data regime, achieving on average a 42.7% higher simulation success rate than the next best method. In a high-data setting, RankQ improves simulation performance by 13.7% over the next best method and achieves strong sim-to-real transfer, increasing real-world cube stacking success from 43.1% to 88.9% relative to the VLA's initial performance.
Constrained Policy Optimization via Sampling-Based Weight-Space Projection
Safety-critical learning requires policies that improve performance without leaving the safe operating regime. We study constrained policy learning where model parameters must satisfy rollout-based safety constraints that can be evaluated but not differentiated analytically. We propose SCPO, a sampling-based weight-space projection method that enforces safety directly in parameter space without requiring gradient access to the constraint functions. SCPO constructs a local safe region by combining rollout-based safety evaluations with smoothness bounds relating parameter perturbations to changes in safety metrics, and projects each gradient update via a convex QCQP. We establish a safe-by-induction guarantee: starting from any safe initialization, all intermediate policies remain safe given feasible projections. In constrained control settings with a stabilizing backup policy, SCPO further ensures closed-loop stability while enabling safe adaptation beyond the conservative backup. Experiments on constrained regression with harmful supervision and double-integrator imitation with a malicious expert show that SCPO rejects unsafe updates, maintains feasibility throughout training, and achieves meaningful objective improvement.
comment: Accepted for publication at IFAC World Congress 2026; fixed minor notation inconsistencies
Tackling the Kidnapped Robot Problem via Sparse Feasible Hypothesis Sampling and Reliable Batched Multi-Stage Inference
This paper addresses the Kidnapped Robot Problem (KRP), a core localization challenge of relocalizing a robot in a known map without a prior pose estimate upon localization loss or at SLAM initialization. For this purpose, a passive 2-D global relocalization framework is proposed. It estimates the global pose efficiently and reliably from a single LiDAR scan and an occupancy grid map while the robot remains stationary, thereby enhancing the long-term autonomy of mobile robots. The proposed framework casts global relocalization as a non-convex problem and solves it via a multi-hypothesis scheme with batched multi-stage inference and early termination, balancing completeness and efficiency. The Rapidly-exploring Random Tree (RRT), under traversability constraints, asymptotically covers the reachable space to generate sparse, uniformly distributed feasible positional hypotheses, fundamentally reducing the sampling space. The hypotheses are preliminarily ordered by the proposed Scan Mean Absolute Difference (SMAD), a coarse beam-error level metric that facilitates early termination by prioritizing high-likelihood candidates. The SMAD computation is optimized for limited scan measurements. The Translation-Affinity Scan-to-Map Alignment Metric (TAM) is proposed for reliable orientation selection at hypothesized positions and accurate final global pose evaluation to mitigate degradation in conventional likelihood-field metrics under translational uncertainty induced by sparse hypotheses, as well as non-panoramic LiDAR scans and environmental changes. Real-world experiments on a resource-constrained mobile robot with non-panoramic LiDAR scans show that the proposed framework achieves competitive performance in success rate, robustness under measurement uncertainty, and computational efficiency.
comment: 14 pages, 8 figures. Accepted for publication in IEEE Transactions on Instrumentation and Measurement. DOI: 10.1109/TIM.2026.3694741
Parallel OctoMapping: A Scalable Framework for Enhanced Path Planning in Autonomous Navigation
Mapping is essential in robotics and autonomous systems because it provides the spatial foundation for path planning. Efficient mapping enables planning algorithms to generate reliable paths while ensuring safety and adapting in real time to complex environments. Fixed-resolution mapping methods often produce overly conservative obstacle representations that lead to suboptimal paths or planning failures in cluttered scenes. To address this issue, we introduce Parallel OctoMapping (POMP), an efficient OctoMap-based mapping technique that maximizes available free space and supports multi-threaded computation. To the best of our knowledge, POMP is the first method that, at a fixed occupancy-grid resolution, refines the representation of free space while preserving map fidelity and compatibility with existing search-based planners. It can therefore be integrated into existing planning pipelines, yielding higher pathfinding success rates and shorter path lengths, especially in cluttered environments, while substantially improving computational efficiency.
A KL-regularization Framework for Learning to Plan with Adaptive Priors ICML2026
Effective exploration remains a central challenge in model-based reinforcement learning (MBRL), particularly in high-dimensional continuous control tasks where sample efficiency is crucial. A prominent line of recent work leverages learned policies as proposal distributions for Model-Predictive Path Integral (MPPI) planning. Initial approaches update the sampling policy independently of the planner distribution, typically maximizing a learned value function with deterministic policy gradient and entropy regularization. However, because the states encountered during training depend on the MPPI planner, aligning the sampling policy with the planner improves the accuracy of value estimation and long-term performance. To this end, recent methods update the sampling policy by minimizing KL divergence to the planner distribution or by introducing planner-guided regularization into the policy update. In this work, we unify these MPPI-based reinforcement learning methods under a single framework by introducing Policy Optimization-Model Predictive Control (PO-MPC), a family of KL-regularized MBRL methods that integrate the planner's action distribution as a prior in policy optimization. By aligning the learned policy with the planner's behavior, PO-MPC allows more flexibility in the policy updates to trade off return maximization and KL divergence minimization. We clarify how prior approaches emerge as special cases of this family, and we explore previously unstudied variations. Our experiments show that these extended configurations yield significant performance improvements, advancing the state of the art in MPPI-based RL.
comment: Published at ICML2026
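As a reading aid for the PO-MPC abstract, a generic KL-regularized policy objective with the planner distribution as prior can be written as below. The notation is assumed rather than taken from the paper: $\pi$ is the learned policy, $\mu$ the planner's action distribution, $Q$ a learned value estimate, and $\lambda > 0$ the trade-off weight. The direction of the KL term is also an assumption, and the paper's family may well include other choices.

```latex
\max_{\pi} \;
  \mathbb{E}_{a \sim \pi(\cdot \mid s)}\!\left[ Q(s, a) \right]
  \;-\; \lambda \, D_{\mathrm{KL}}\!\left( \pi(\cdot \mid s) \,\|\, \mu(\cdot \mid s) \right)
\qquad\Longrightarrow\qquad
\pi^{*}(a \mid s) \;\propto\; \mu(a \mid s)\, \exp\!\left( Q(s, a) / \lambda \right)
```

Under this form, a large $\lambda$ keeps the policy close to the planner prior, while a small $\lambda$ lets return maximization dominate.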
Noise-Space Attribution and Control of Chunk-Boundary Artifact
Action chunking is widely used in generative visuomotor policies, yet the recurring execution discontinuities at chunk boundaries still lack a mechanistic explanation. This paper treats chunk-boundary artifact as an analyzable mechanism variable. We first show that successful and failed episodes separate stably on artifact metrics. We then show that, in stochastic action-chunked policies, fixing the observation context and changing only latent noise is sufficient to modulate artifact systematically. On the same Diffusion Policy checkpoint, comparisons among DDPM, zero-variance DDPM, and DDIM further show that this local controllability depends on whether the information path from initial noise to action output remains intact. Finally, from controlled interventions at fixed local execution states, we find that artifact changes can carry through to final outcome, and that the preferred direction can reverse even within the same task: some contexts achieve higher success under lower artifact, whereas others achieve higher success under higher artifact. In a representative high-artifact-favoring key context selected by held-out matched-continuation validation, success rate increases from 0.033 to 0.717. These results show that chunk-boundary artifact is not a mere execution-side by-product, but a variable in noise space that can be attributed, controlled, and mechanistically linked to task outcome.
Multiagent Systems
Transforming Privacy Artifacts into Accessible Reports for Non-Technical Stakeholders
The transition toward Industry 5.0 is reshaping industrial work environments with an emphasis on human-centricity, enabling close collaboration between humans and machines to enhance productivity and flexibility. However, such systems typically require monitoring of human workers and operators, often involving sensitive data, raising significant privacy concerns. As a result, affected workers and unions frequently reject human-machine collaboration features due to a lack of transparency regarding privacy threats and implemented mitigation strategies. To enable early stakeholder involvement, establish trust, and support informed decision-making, privacy implications must be communicated in a way understandable to non-technical stakeholders. Yet, current Requirements Engineering (RE) practices provide limited methodological support for making privacy threats and mitigations accessible to non-technical stakeholders (e.g., individual workers or their representative unions). In this RE@Next paper, we propose a conceptual framework that guides software design from human monitoring-related use cases and requirements to informed decision-making guidance focusing on non-technical stakeholders. Building on principles such as Privacy by Design, the framework leverages Large Language Models (LLMs) to transform technical artifacts into accessible privacy reports. We share initial insights from two industry use cases, evaluate the quality of the generated reports, and outline future research directions toward integrating privacy transparency into RE processes for human-centric industrial systems.
comment: 8 pages (7+1), Accepted for publication at RE@Next'26
Decoupling Communication from Policy: Robust MARL under Bandwidth Constraints
Communication enables coordination in multi-agent reinforcement learning (MARL), but many real-world applications, e.g., search-and-rescue with drone swarms, operate under severe bandwidth constraints. Many communication architectures still expose a coupled bottleneck in which a shared latent representation is used for both policy execution and inter-agent communication. Consequently, reducing message size directly limits the policy's latent space, often leading to significant performance degradation. We address this with two contributions. First, we introduce $β$, a normalised per-agent bandwidth budget that unifies sparsity, rounds, and message dimension into a single comparable constraint. Second, we provide SLIM, a minimal architecture that decouples the communication pathway from the policy's latent representation, allowing us to isolate the effect of bandwidth from the effect of policy capacity while benefiting from in-step communication. We evaluate our method on several partially-observable MARL benchmarks, where communication is essential. Our approach achieves state-of-the-art performance and exhibits scalability and robustness under limited communication, with only marginal degradation as bandwidth is reduced.
ProCrit: Self-Elicited Multi-Perspective Reasoning with Critic-Guided Revision for Multimodal Sarcasm Detection
Multimodal sarcasm detection requires reasoning over cross-modal incongruities between literal expression and intended meaning, yet the specific analytical perspectives needed vary across samples due to the diversity of sarcastic mechanisms. While recent methods make this analytical process explicit, they still rely on fixed, predefined perspectives that operate independently under hand-crafted routing rules. We argue that multimodal sarcasm detection instead calls for self-elicited multi-perspective reasoning, where a model autonomously generates the perspectives needed for each sample and progressively integrates them into a coherent analysis. To realize this goal, we propose ProCrit, a Proposal-Critic two-agent framework with a proposal agent for multi-perspective reasoning and a critic agent for external evaluation and targeted revision guidance. First, to overcome the lack of process-level supervision in existing sarcasm datasets, ProCrit synthesizes process-level reasoning annotations through a dynamic-role agentic rollout: a strong vision-language model sequentially spawns analytical roles within a shared context, and the resulting multi-role trajectories are flattened into sequences that preserve cross-perspective dependencies while enabling efficient autoregressive generation. Second, to improve reasoning reliability, ProCrit adopts a draft-critique-revise paradigm in which an independent critic identifies reasoning deficiencies and provides targeted natural-language feedback for directed revision. Finally, we develop a mutual-refinement training framework that jointly optimizes proposal drafting and feedback-guided revision via dual-stage reinforcement learning, while refining the critic agent according to the actual effectiveness of its feedback. Experiments on three widely used benchmarks demonstrate the effectiveness of ProCrit.
Heartbeat-Bound Hierarchical Credentials: Cryptographic Revocation for AI Agent Swarms
Autonomous AI agents that spawn sub-agent swarms create a safety gap: existing credential revocation mechanisms, OAuth 2.0 introspection, OCSP, and W3C Status Lists, require network connectivity to a central authority, leaving "zombie agents" executing privileged operations for minutes to hours after operator shutdown. We present Heartbeat-Bound Hierarchical Credentials (HBHC), a cryptographic protocol that binds credential validity to periodic parent liveness proofs. Verifiers enforce freshness using only a cached public key and local clock; no network round-trip is required. When heartbeat generation ceases, all descendant credentials become unusable within a deterministically bounded window $W_z \le W_{\max} + Δ_h + ε$, conditional on bounded clock skew and parent keys held in secure enclaves. Evaluation at the protocol layer and with real LLM-backed agent swarms (GPT-4o-mini) demonstrates a 90× reduction in the zombie window over OAuth 2.0, 0.26 ms full authentication in Rust, 18,000+ verifications per second under concurrent HTTP load, and stable per-verification latency from 10 to 10,000 agents. Real-agent experiments show 0.71% end-to-end overhead on tool calls, zero post-revocation tool calls under prompt injection that bypasses application-layer guardrails, and cascading revocation across a 49-agent four-level hierarchy within the theoretical bound.
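As a rough illustration of the freshness rule in the HBHC abstract, the sketch below evaluates the stated bound $W_{\max} + Δ_h + ε$ and checks a heartbeat's age against a local clock. The abstract does not define the three symbols; reading them as a maximum validity window, a heartbeat interval, and a clock-skew allowance is an assumption here, as are all names, fields, and numeric values. Signature verification against the cached public key is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Heartbeat:
    # Hypothetical field: the parent's timestamp for this liveness proof, in seconds.
    issued_at: float


def zombie_window_bound(w_max: float, delta_h: float, epsilon: float) -> float:
    """Right-hand side of the bound W_z <= W_max + delta_h + epsilon."""
    return w_max + delta_h + epsilon


def is_fresh(hb: Heartbeat, now: float, w_max: float, epsilon: float) -> bool:
    """Local freshness check using only the verifier's own clock.

    Assumption: heartbeats older than w_max are rejected, with a tolerance
    of epsilon seconds in either direction for clock skew.
    """
    age = now - hb.issued_at
    return -epsilon <= age <= w_max + epsilon


if __name__ == "__main__":
    print(zombie_window_bound(w_max=5.0, delta_h=1.0, epsilon=0.5))
    print(is_fresh(Heartbeat(issued_at=100.0), now=104.0, w_max=5.0, epsilon=0.5))
```

The point of the sketch is that the check needs no network round-trip: once the parent stops issuing heartbeats, every verifier's local clock eventually pushes the age past the window.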
CandorMD: An AI-Assisted Audio Simulation and Feedback System for Training Clinicians for Medical Error Disclosure
Clinicians are expected to disclose harmful medical errors to patients and families in line with ethical, regulatory, and patient care standards, yet these conversations remain challenging because of their emotional complexity and limited training opportunities. Most physicians still learn primarily through lectures and observation, while static video tools, though available, are underused, lack adaptability across specialties, and deliver delayed, generic feedback. These gaps restrict skill development, reduce self-efficacy, and contribute to avoidance of disclosure conversations, ultimately compromising patient care and eroding trust. To address these needs, we designed CandorMD, an AI-assisted simulation system that provides real-time practice, actionable feedback, and diverse practice environments tailored to individual learning needs. We conducted semi-structured interviews with physicians, risk managers, patient advocates, and communication experts to understand current practices, identify gaps, and collect feedback on CandorMD. Based on these insights, we present findings and design recommendations for the future of AI-supported medical communication training.
Time-To-Reach Separation and Safety Filtering for Safe, Fair, and Efficient Multi-Agent Coordination
Advanced Air Mobility (AAM) operations are expected to significantly increase aerial traffic in urban airspace, requiring autonomous traffic management systems to ensure collision-free operations in highly congested environments. In this paper, we propose a multi-agent coordination framework that uses minimum time-to-reach (TTR) as a unifying metric for priority assignment, temporal separation, and safety filtering. We focus on the problem of coordinating multiple aerial vehicles merging into an air corridor while maintaining safe separation between vehicles. Vehicles are assigned arrival-consistent priority based on TTR, and target TTR values are used to enforce temporal spacing that induces spatial separation. A priority-consistent safety filtering layer based on Hamilton-Jacobi reachability value functions ensures collision avoidance while minimally modifying the reference guidance. Simulation results in a highly congested corridor merging scenario show that the proposed method improves safety, fairness, and efficiency compared to time-optimal guidance and priority-agnostic safety filtering.
comment: 9 pages, 3 figures. Extended version (including appendix) of a paper submitted to the 65th IEEE Conf. on Decision and Control (2026)
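The TTR-based priority and temporal-spacing idea in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the function name, the tie-breaking rule, and the fixed `min_gap` spacing are hypothetical, and the Hamilton-Jacobi safety filter is not modeled.

```python
from __future__ import annotations


def schedule_by_ttr(ttr: dict[str, float], min_gap: float) -> list[tuple[str, float]]:
    """Assign priorities and target TTR values from minimum time-to-reach.

    Vehicles with a smaller TTR get higher priority (ties broken by name).
    Each target TTR is pushed back just enough to keep at least `min_gap`
    seconds between consecutive arrivals.
    """
    order = sorted(ttr, key=lambda vehicle: (ttr[vehicle], vehicle))
    schedule: list[tuple[str, float]] = []
    previous_target = None
    for vehicle in order:
        target = ttr[vehicle]
        if previous_target is not None:
            # Enforce temporal spacing relative to the higher-priority vehicle.
            target = max(target, previous_target + min_gap)
        schedule.append((vehicle, target))
        previous_target = target
    return schedule


if __name__ == "__main__":
    for vehicle, target in schedule_by_ttr({"a": 10.0, "b": 11.0, "c": 30.0}, min_gap=5.0):
        print(vehicle, target)
```

In this toy run, vehicle `b` is delayed to keep the gap behind `a`, while `c` is already far enough behind to keep its own minimum TTR.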
Intent-First Aerial V2V for Tactical Coordination and Separation: Protocol and Performance Under Density and Disturbance
Dense low-altitude aerial operations require more than pre-flight route coordination and last-resort collision avoidance. Once aircraft are airborne, disturbances can emerge on timescales shorter than strategic reauthorization can absorb, while collision avoidance is too late and disruptive to serve as routine traffic management. Although tactical separation is recognized as the intermediate layer, realizing it at scale requires a deployable neighborhood communication mechanism that provides fresh, trusted information for local coordination. This paper presents what is, to our knowledge, the first controller-coupled characterization of an all-airborne, sidelink-class, intent-first vehicle-to-vehicle (V2V) tactical neighborhood exchange stack for dense Unmanned Aircraft System Traffic Management (UTM) operations. Unlike awareness-only broadcast, the proposed exchange combines refreshed state and intent beacons for local awareness, cooperative perception, and degraded-mode assessment with event-triggered messages for yielding, sequencing, release, and contingency coordination. We implement and evaluate this model on an all-airborne V2V stack using sidelink-class C-V2X modules with authenticated freshness checks. Evaluation uses a scenario-driven, high-volume stress campaign supported by real-time, field-anchored infrastructure. Results show that V2V reduces stale-belief divergence, preserves observability through cooperative perception, rejects invalid tactical messages, suppresses false local inference, and structures shared-resource coordination. The implemented stack provides a viable communication layer for tactical separation in lower-to-moderate regimes, but transitions toward guarded fallback as density, impairment, and complexity increase. These findings position intent-first aerial V2V as a bounded enabler for scaling tactical coordination in disturbance-driven urban airspace.
comment: Submitted to IEEE Transactions on Intelligent Transportation Systems
Trace2Skill: Verifier-Guided Skill Evolution for Long-Context EDA Agents
Complex Verilog Design Problems (CVDP) challenge hardware LLM agents because solving them requires localizing verifier-relevant RTL, testbenches, include paths, and build dependencies inside large repository snapshots, making precise edits, and recovering from sparse hidden-verifier failures. We present Trace2Skill, a test-time scaling framework that improves a hardware agent without RTL-specialized model fine-tuning. Rather than training a new model or only sampling more candidate solutions, Trace2Skill treats the agent's natural-language skill as an evolvable policy. It mines repeated rollout traces for success and failure modes, converts them into dense diagnostics and oracle lessons, and uses an oracle, mutator, and selector loop to produce task-specific skills that guide later search, editing, validation, and recovery. Because final pass/fail labels are often too coarse for hard failures, Trace2Skill also supports bounded runtime dense verifier feedback that returns sanitized functional observations while keeping hidden harnesses and reference solutions inaccessible to the agent. This feedback helps guide skill evolution and agent execution by connecting skill text, verifier evidence, and downstream behavior. Across hard CVDP tasks that defeat the seed CVDP agent, including tasks that also defeat frontier coding agents, Trace2Skill with dense verifier feedback substantially improves task pass rates and produces breakthrough passes on previously unsolved tasks, without requiring high-quality fine-tuning data, specialized RTL model training, or model weight updates. The same framework provides a general test-time scaling strategy that can extend beyond digital design to other verifiable EDA tasks.
Secure Coordination for Vertiport Sequencing in Advanced Air Mobility
Advanced air mobility operations will require reliable coordination mechanisms for managing dense traffic near vertiports. However, sequencing decisions may become vulnerable when they rely on potentially falsified self-reported information such as estimated time of arrival. Self-interested vehicles may misreport their arrival times to obtain favorable landing priority, while malicious actors may spoof information to disrupt sequencing decisions or induce unnecessary congestion. This paper studies secure coordination for vertiport sequencing under sensing uncertainty. We consider a coordinator that combines self-reported Remote-ID information with externally obtained surveillance measurements to check reports and assign separation-feasible arrival schedules. Since surveillance-based estimates are uncertain, falsified reports may remain consistent with the sensing uncertainty region and cannot always be rejected outright. We therefore formulate sequencing as a robust design problem over this uncertainty region. Self-interested misreporting is modeled as a strategic deviation that improves the reporting vehicle's own sequencing outcome, whereas malicious spoofing is modeled as an adversarial disturbance that degrades the system-level objective. The final paper will develop robust sequencing rules over surveillance-consistent uncertainty sets and evaluate their performance in representative vertiport sequencing scenarios.
Memory-R2: Fair Credit Assignment for Long-Horizon Memory-Augmented LLM Agents
Memory-augmented LLM agents enable interactions that extend beyond finite context windows by storing, updating, and reusing information across sessions. However, training such agents with reinforcement learning in multi-session environments is challenging because memory turns the agent's past actions into part of its future environment. Once different rollouts write, update, or delete different memories, they no longer share the same intermediate memory state, making trajectory-level comparisons fundamentally unfair. This violates a key assumption behind group-relative methods such as GRPO, where rollouts are compared as if they were sampled from the same effective environment. Consequently, trajectory-level rewards provide noisy or biased credit signals for long-horizon memory operations. To address this challenge, we introduce Memory-R2, a training framework for long-horizon memory-augmented LLM agents. Its core algorithm, LoGo-GRPO, combines local and global group-relative optimization. The global objective preserves end-to-end learning from long-horizon trajectory-level rewards, while local rerollouts compare different memory-operation outcomes from the same intermediate memory state, yielding fairer group comparisons and more precise supervision for memory construction. Beyond credit assignment, Memory-R2 jointly optimizes memory formation and memory evolution with a shared-parameter co-learning design, where a fact extractor and a memory manager are instantiated from the same LLM backbone through role-specific prompts. To stabilize multi-step RL over long memory horizons, we adopt a progressive curriculum that increases the training horizon from 8 to 16 to 32 sessions. Together, these components provide an effective training paradigm for memory-augmented LLM agents in long-horizon multi-session settings.
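To make the fairness argument concrete, the sketch below computes group-relative advantages in the usual GRPO style (reward minus group mean, divided by group standard deviation), first over one group and then separately per shared intermediate memory state. It is only an illustration of the comparison described in the abstract; the function names, the `memory_state_id` key, and the use of population standard deviation are assumptions, and this is not the LoGo-GRPO objective itself.

```python
from collections import defaultdict
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def local_advantages(rollouts):
    """Compare rollouts only against others from the same memory state.

    `rollouts` is a list of (memory_state_id, reward) pairs; the id is a
    hypothetical handle for the intermediate memory state a rerollout
    started from.
    """
    groups = defaultdict(list)
    for index, (state_id, reward) in enumerate(rollouts):
        groups[state_id].append((index, reward))

    advantages = [0.0] * len(rollouts)
    for members in groups.values():
        normalized = group_relative_advantages([reward for _, reward in members])
        for (index, _), value in zip(members, normalized):
            advantages[index] = value
    return advantages


if __name__ == "__main__":
    rollouts = [("s1", 1.0), ("s1", 0.0), ("s2", 5.0), ("s2", 5.0)]
    print(local_advantages(rollouts))
```

Pooling all four rollouts into one group would reward the `s2` rollouts simply for starting from a better memory state; grouping by state removes that effect.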
Learning Altruistic Collaboration in Heterogeneous Multi-Team Systems
This paper studies heterogeneous multi-team collaboration through dynamic robot allocation, where robots are treated as transferable resources. Leveraging Hamilton's rule from ecology as an altruistic decision-making mechanism, we propose a multi-team collaborative resource allocation framework with heterogeneous capabilities, transfer costs, and capability-dependent contributions. The resulting allocation problem is combinatorial and is shown to be NP-hard. To address scalability, we develop a graph neural network policy under centralized training and decentralized execution that approximates the altruistic allocations based on Hamilton's rule. The model operates over the team interaction graph and predicts robot-level transfer decisions and next robot-to-team assignments. The proposed approach is validated in a firefighting scenario through simulations and experiments, demonstrating that the learned policy achieves near-optimal performance while scaling to larger systems.
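Hamilton's rule states that an altruistic act is favored when relatedness times the recipient's benefit exceeds the actor's cost, i.e. r * B > C. The sketch below applies that inequality to a single robot-transfer decision. How the paper defines relatedness, benefit, and cost for heterogeneous teams is not given in the abstract, so the parameter meanings and values here are purely illustrative assumptions.

```python
def should_transfer(relatedness: float, benefit: float, cost: float) -> bool:
    """Hamilton's rule: act altruistically when r * B > C.

    Assumed reading for robot allocation (hypothetical):
      relatedness - how closely the donor and recipient teams' goals align
      benefit     - gain to the recipient team from receiving the robot
      cost        - loss to the donor team, including any transfer cost
    """
    return relatedness * benefit > cost


if __name__ == "__main__":
    print(should_transfer(relatedness=0.5, benefit=10.0, cost=3.0))
    print(should_transfer(relatedness=0.5, benefit=4.0, cost=3.0))
```

Evaluating this rule jointly over every robot and team pair is what makes the full allocation problem combinatorial.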
Planning, Scheduling, and Behavior in EV Charging Systems: A Critical Survey and Trilemma Framework
The rapid growth of electric vehicles is shifting the main constraint on transport electrification from vehicle adoption to the deployment and operation of charging infrastructure. Charging-network design requires decisions across three interdependent layers: Planning, which determines where and how much infrastructure to build; Scheduling, which governs charging dispatch, pricing, and grid interaction; and Behavior, which captures how users choose stations, charging times, and charging durations. Existing studies have advanced each layer substantially, but the literature remains fragmented, and cross-layer interactions are often treated through simplifying assumptions. This survey develops a three-layer Planning-Scheduling-Behavior (PSB) framework to organize EV charging research according to decision horizon, actor objective, and coupling structure. We further identify a fidelity-tractability tradeoff, termed the PSB trilemma: each layer is computationally difficult in isolation, and realistic integration across layers generally requires reducing the fidelity of at least one layer. Reviewing the three pairwise-coupling literatures (Planning-Scheduling, Scheduling-Behavior, and Planning-Behavior), we show that the omitted third layer is typically fixed exogenously or represented by a static aggregate surrogate. These simplifications enable tractability but impose distinct costs: they can obscure long-term investment feedback, temporal grid and emissions dynamics, or heterogeneous user response and equity outcomes. Building on this diagnosis, we identify open challenges in emerging charging technologies, behavioral incentives, equity metrics, and city-scale learning-based methods that balance fidelity, interpretability, and policy relevance.
comment: Review article; 56 pages excluding references; 1 figure and 3 tables
Argo: Efficient Importance Labeling for Enterprise Email Systems
Email importance labeling has long been a critical yet challenging problem for businesses and individuals. Traditional approaches, such as keyword matching, user-defined rules, and sender-based heuristics, demand extensive manual feature engineering and fail to scale effectively or generalize. Recent advances in large language models (LLMs) demonstrate strong potential and a natural fit for this task, offering deep contextual understanding and superior labeling quality. However, using LLMs like GPT-4.1 at enterprise email volumes incurs prohibitive computational costs and hinders real-world deployment. We explore the trade-off space of using alternative labeling schemes as opposed to GPT-4.1-scale LLMs, with the goal of achieving near-GPT-level labeling quality at significantly lower cost. We develop Argo, an enterprise email labeling framework, in which we construct a profiler to efficiently search the cost-quality trade-off space of labeling and identify cost-efficient alternatives for labeling emails. Additionally, we design an on-demand provisioning scheme to intelligently scale Argo with real-time load, minimizing cost increases during peak-load inference. Across 3 open-source email datasets, Argo achieves 148-167X inference cost reduction with negligible quality degradation and 20-640000X lower profiling costs, making large-scale, context-aware email labeling practical for enterprises.
comment: 15 pages, 19 figures
CTFExplorer: Evaluating LLM Offensive Agents Through Multi-Target Web CTF Benchmarking
Existing benchmarks for LLM-based offensive security agents use isolated, single-target setups with a known vulnerable service and fixed objective. They measure exploitation effectively, but miss how real Capture-the-Flag (CTF) participants triage unknown surfaces, prioritize targets, and allocate effort under uncertainty. Current evaluations therefore fail to assess strategic reasoning beyond exploitation alone. To address this, we introduce CTFExplorer, a benchmark suite that shifts offensive security evaluation toward a multi-target setting, which tests how agents explore, prioritize, and chain attacks. CTFExplorer deploys 40 web-based vulnerable services within a single environment, where agents must autonomously discover, distinguish, and exploit targets without predefined guidance. We also present a reactive multi-agent setup as a reference agent framework and develop an agent-agnostic evaluation framework that records structured reasoning traces for fine-grained assessment. This enables behavioral evaluation beyond binary flag capture, such as how agents manage target selection, handle failed hypotheses, coordinate across multiple stages, and extract security intelligence.
Distributed Non-Uniform Scaling Control of Multi-Agent Formation via Matrix-Valued Constraints
Distributed formation maneuver control refers to the problem of maneuvering a group of agents to change their formation shape by adjusting the motions of a subset of the agents, where the controller of each agent only requires local information measured from its neighbors. Although this problem has been extensively investigated, existing approaches are mostly limited to uniform scaling transformations. This article proposes a new type of local matrix-valued constraints, via which non-uniform scaling control of position formation can be achieved by tuning the positions of only two agents (i.e., leaders). Here, the non-uniform scaling transformation refers to globally scaling the position formation with different ratios along different orthogonal coordinate directions. Moreover, by defining scaling and translation of attitudes, we propose a distributed control scheme for scaling and translation maneuver control of joint position-attitude formations. It is proven that the proposed controller achieves global convergence, provided that the sensing graph among agents is a 2-rooted bidirectional graph. Compared with the affine formation maneuver control approach, the proposed approach leverages a sparser sensing graph, requires fewer leaders, and additionally enables scaling transformations of the attitude formation. A simulation example demonstrates our theoretical results.
DRAMA: Next-Gen Dynamic Orchestration for Resilient Multi-Agent Ecosystems in Flux
Multi-agent systems (MAS) have demonstrated significant effectiveness in addressing complex problems through coordinated collaboration among heterogeneous agents. However, real-world environments and task specifications are inherently dynamic, characterized by frequent changes, uncertainty, and variability. Despite this, most existing MAS frameworks rely on static architectures with fixed agent capabilities and rigid task allocation strategies, which greatly limits their adaptability to evolving conditions. This inflexibility poses substantial challenges for sustaining robust and efficient multi-agent cooperation in dynamic and unpredictable scenarios. To address these limitations, we propose DRAMA: a Dynamic and Robust Allocation-based Multi-Agent System designed to facilitate resilient collaboration in rapidly changing environments. DRAMA features a modular architecture with a clear separation between the control plane and the worker plane. Both agents and tasks are abstracted as resource objects with well-defined lifecycles, while task allocation is achieved via an affinity-based, loosely coupled mechanism. The control plane enables real-time monitoring and centralized planning, allowing flexible and efficient task reassignment as agents join, depart, or become unavailable, thereby ensuring continuous and robust task execution. The worker plane comprises a cluster of autonomous agents, each with local reasoning, task execution, the ability to collaborate, and the capability to take over unfinished tasks from other agents when needed.
Learning Incentive Structures for Cooperative Resilience in Multi-Agent Systems under Social Dilemmas
Multi-agent social dilemmas, such as the tragedy of the commons, capture settings where individual incentives conflict with collective well-being, making these systems highly vulnerable to collapse under disruptions. In this context, this work studies cooperative resilience, understood as the system-level ability to maintain collective well-being under perturbations through adaptive agent behavior. We propose a framework for learning incentive structures aligned with collective well-being in multi-agent reinforcement learning systems, where reward functions shape individual decision-making and collective behavior. A resilience metric is used to score and rank agent trajectories, allowing the inference of reward functions that promote resilient collective behavior. These inferred reward functions are integrated into the multi-agent reinforcement learning process to shape agent interactions in social dilemma settings. The approach is evaluated in resource-sharing environments subject to disruptions, using three incentive structures: individual incentives, resilience-aligned incentives, and a hybrid incentive structure that combines both individual and collective components. The results show that the hybrid incentive structure promotes sustained collective behavior, reduces collapse events associated with resource depletion, and preserves system performance under disruption. These findings highlight the role of incentive design as a mechanism for promoting resilient collective behavior and provide a computational framework for multi-agent social dilemmas under disruptions.
comment: Supplementary material in https://github.com/mavivi95/supplementary_files/blob/main/Learning_TCSS___Supplementary_File__AN_.pdf Updated version submitted to IEEE Transactions on Computational Social Systems (TCSS). This preprint is under review for possible publication in IEEE
Proportional Selection in Networks IJCAI'26
We address the problem of selecting $k$ representative nodes from a network, aiming to achieve two objectives: identifying the most influential nodes and ensuring the selection proportionally reflects the network's diversity. We propose two approaches to accomplish this, analyze them theoretically, and demonstrate their effectiveness through a series of experiments.
comment: This version has been accepted for publication at IJCAI'26
MASFactory: A Graph-centric Framework for Orchestrating LLM-Based Multi-Agent Systems with Vibe Graphing ACL 2026
Large language model-based (LLM-based) multi-agent systems (MAS) are increasingly used to extend agentic problem solving via role specialization and collaboration. MAS workflows can be naturally modeled as directed computation graphs, where nodes execute agents or sub-workflows and edges encode dependencies and message passing. However, implementing complex graph workflows in current frameworks still requires substantial manual effort, offers limited reuse, and makes it difficult to integrate heterogeneous external context sources. To overcome these limitations, we present MASFactory, a graph-centric framework for orchestrating LLM-based MAS. It introduces Vibe Graphing, a human-in-the-loop approach that compiles natural-language intent into an editable workflow specification and then into an executable graph. In addition, the framework provides reusable components, skill support, multimodal message handling, and pluggable context integration, as well as a visualizer for topology preview, runtime tracing, and human-in-the-loop interaction. We evaluate MASFactory on seven public benchmarks, validating both reproduction consistency for representative MAS methods and the effectiveness of Vibe Graphing. Our code (https://github.com/BUPT-GAMMA/MASFactory, licensed under Apache-2.0) and video demonstration (https://youtu.be/ANynzVfY32k) are publicly available.
comment: Accepted to the ACL 2026 Demo Track. Camera-ready version. 10 pages, 6 figures. Code and documentation are available at: https://github.com/BUPT-GAMMA/MASFactory
MonoScale: Scaling Multi-Agent System with Monotonic Improvement
In recent years, LLM-based multi-agent systems (MAS) have advanced rapidly, using a router to decompose tasks and delegate subtasks to specialized agents. A natural way to expand capability is to scale up the agent pool by continually integrating new functional agents or tool interfaces, but naive expansion can trigger performance collapse when the router cold-starts on newly added, heterogeneous, and unreliable agents. We propose MonoScale, an expansion-aware update framework that proactively generates a small set of agent-conditioned familiarization tasks, harvests evidence from both successful and failed interactions, and distills it into auditable natural-language memory to guide future routing. We formalize sequential augmentation as a contextual bandit and perform trust-region memory updates, yielding a monotonic non-decreasing performance guarantee across onboarding rounds. Experiments on GAIA and Humanity's Last Exam show stable gains as the agent pool grows, outperforming naive scale-up and strong-router fixed-pool baselines.
Beyond the Black Box: Interpretability of Agentic AI Tool Use
AI agents are promising for high-stakes enterprise workflows, but dependable deployment remains limited because tool-use failures are difficult to diagnose and control. Agents may skip required tool calls, invoke tools unnecessarily, or take actions whose consequence becomes visible only after execution. Existing observability methods are mostly external: prompts reveal correlations, evaluations score outputs, and logs arrive only after the model has already acted. In long-horizon settings, these failures are especially costly because an early tool mistake can alter the rest of the trajectory, increase token consumption, and create downstream safety and security risk. We introduce a mechanistic-interpretability toolkit built on Sparse Autoencoders (SAEs) and linear probes. The framework reads model states before each action and infers both whether a tool is needed and how consequential the next tool action is likely to be. By decomposing activations into sparse features, it identifies the internal layers and features most associated with tool decisions and tests their functional importance through feature ablation. We train the probes on multi-step trajectories from the NVIDIA Nemotron function-calling dataset and apply the same workflow to GPT-OSS 20B and Gemma 3 27B models. The goal is not to replace external evaluation, but to add a missing layer: visibility into what the model signaled internally before action. This helps surface deeper causes of agent failure, especially in long-horizon runs where an early mistake can reshape the rest of the agentic interaction. More broadly, the paper shows how mechanistic interpretability can support practical internal observability for monitoring tool calls and risk in agent systems.
comment: 12 pages, 4 figures, 17 tables
Context-Mediated Domain Adaptation in Multi-Agent Sensemaking Systems
Domain experts possess tacit knowledge that they cannot easily articulate through explicit specifications. When experts modify AI-generated artifacts by correcting terminology, restructuring arguments, and adjusting emphasis, these edits reveal domain understanding that remains latent in traditional prompt-based interactions. Current systems treat such modifications as endpoint corrections rather than as implicit specifications that could reshape subsequent reasoning. We propose context-mediated domain adaptation, a paradigm where user modifications to system-generated artifacts serve as implicit domain specification that reshapes LLM-powered multi-agent reasoning behavior. Through our system Seedentia, a web-based multi-agent framework for sense-making, we demonstrate bidirectional semantic links between generated artifacts and system reasoning. Our approach enables specification bootstrapping where vague initial prompts evolve into precise domain specifications through iterative human-AI collaboration, implicit knowledge transfer through reverse-engineered user edits, and in-context learning where agent behavior adapts based on observed correction patterns. We present results from an evaluation with domain experts who generated and modified research questions from academic papers. Our system extracted 46 domain knowledge entries from user modifications, demonstrating the feasibility of capturing implicit expertise through edit patterns, though the limited sample size constrains conclusions about systematic quality improvements.
Systems and Control (EESS)
Fully Actuated Manifold Constraint Based Output Feedback Control for Input-Constrained Uncertain Nonlinear Systems
This paper presents a low-complexity, model-free, output-feedback controller for a class of unknown time-varying nonlinear systems with unknown input constraints. The controller achieves the preset control accuracy when the actuator is not saturated and maintains flexible control accuracy after actuator saturation. This result extends existing constraint control methods for linear manifolds to a more general form, including the construction of nonlinear manifolds and various types of constraints, thereby achieving preset control accuracy within finite or fixed time. Additionally, flexible control under unknown saturation is achieved through the construction of an error-driven flexible constraint. Finally, second-order and higher-order control examples and simulations are provided.
comment: 22 pages, 12 figures, 2 tables
Output Feedback Control of Linear Time-Invariant Systems with Operational Constraints
This paper introduces a systematic method for designing robust linear controllers using output feedback in the presence of operational constraints. The design uses Nagumo's Theorem and the Comparison Lemma to guarantee constraint satisfaction, while incorporating min-norm optimal control principles inspired by Control Barrier Functions. The resulting controller is a continuous piecewise-linear output feedback policy that preserves the closed-loop system's analyzability using linear systems theory. Due to the linear control design, multi-input multi-output (MIMO) robustness margins can be derived with and without active operational constraints. This paper shows that operational constraints on the system's state can be satisfied using an observer-based output feedback control design. Through flight control trade studies, we demonstrate the practical relevance of the framework in safety-critical aircraft control applications.
Grid-Aware Peer-to-Peer Energy Trading: A Learning-Augmented Framework
Distribution networks are transitioning from passive to active systems due to the growing integration of distributed energy resources (DERs). Peer-to-peer (P2P) energy trading has emerged as a viable framework that enables local energy exchange among participants, represented here as aggregated microgrids (MGs). Incorporating network constraints is essential to ensure that P2P transactions remain physically feasible and consistent with the grid's operating limits. However, existing P2P frameworks still lack advanced predictive mechanisms that allow prosumers to anticipate network feasibility or the distribution system operator (DSO) response during trade formulation. This paper proposes a learning-augmented P2P and DSO interface that predicts the DSO's response to the proposed P2P trades, allowing prosumers to self-assess and refine their trading decisions. A supervised transformer-based regression model is trained to enable MGs to locally predict the DSO's response without sharing their proposed trades, thereby reducing transaction overhead, alleviating DSO burden, and preserving information privacy. The proposed framework is validated on the modified IEEE 33-bus distribution power system with interconnected microgrids. Case studies are presented to validate the effectiveness of the proposed model in terms of market efficiency, trade acceptance, and computational burden.
Beyond Nonlinear Small-Gain Design: DADS with Partial-State Feedback
Eduardo Sontag and coauthors studied Input-to-Output Stability (IOS) and the output asymptotic gain property. These notions changed control theory and recently had an impact on robust adaptive control through the Deadzone-Adapted Disturbance Suppression (DADS) control scheme. Moreover, recently the notion of IOS was extended to systems described by Partial Differential Equations (PDEs). In this work, we celebrate Eduardo Sontag by combining DADS and IOS for PDEs: we study the partial-state regulation problem for a scalar Ordinary Differential Equation (ODE) which is interconnected with a possibly infinite-dimensional system. In such a case the DADS control scheme can allow an escape from the requirements of the small-gain theorem that is mainly used for partial-state feedback. We show the design procedure of partial-state DADS controllers and we prove robust regulation even in the presence of external inputs (disturbances) without assuming knowledge of any disturbance/parameter bounds. The DADS controller is applied to three different cases of the interconnection of an ODE with an almost completely unknown: (a) heat PDE, (b) transport PDE, and (c) wave PDE with viscous damping. We show that the same DADS controller can achieve robust regulation in all three cases.
comment: 30 pages, 4 figures
Reinforcement Learning-based Control via Y-wise Affine Neural Networks: Comparative Case Studies for Chemical Processes
In this work we present an efficient and practically implementable approach for the application of reinforcement learning (RL)-based control in chemical process systems. This is an area that has yet to widely adopt RL-based control largely due to inherent challenges in trusting RL algorithms and the time-consuming process of training reliable agents. To address these challenges, we leverage a class of RL algorithms termed Y-wise Affine Neural Network (YANN)-RL, which we have developed in our prior work (Braniff and Tian, 2025a). By strategically initializing actor and critic networks, YANN-RL algorithms provide confident and interpretable starting points within control schemes. We apply this RL-based control approach to three different process engineering case studies publicly available on the PC-Gym library (Bloor et al., 2026): (i) a continuous stirred tank reactor (CSTR), (ii) a four-tank system, and (iii) a multistage extraction column. Our approach is compared to several popular RL algorithms (PPO, SAC, DDPG, and TD3) and is benchmarked against nonlinear model predictive control (NMPC). These case studies demonstrate that YANN-RL can greatly reduce the training time and data needed, can be deployed with confidence for chemical process systems, and can approach the performance of NMPC without the knowledge of a full nonlinear model.
comment: Accepted for publication at the 23rd IFAC World Congress, 2026
Collaborative Optimization of Battery Charging / Swapping Stations for eVTOLs Based on Closed-Loop Supply Chain and Space-Time Network
Against the backdrop of the burgeoning global low-altitude economy, countries have successively introduced a series of policies to accelerate the application and commercialization of electric vertical take-off and landing (eVTOL) aircraft. Nevertheless, purely electric eVTOLs confront constraints including limited battery energy density, high operational power requirements, and challenges associated with rapid energy replenishment, which collectively restrict their flight endurance and application scenarios. Furthermore, while eVTOL deployment is scaling up, supporting charging infrastructure and regulations remain underdeveloped. This situation presents emerging power distribution networks with new challenges in maintaining adequate electricity supply and ensuring operational continuity. To tackle these issues, following an investigation into battery energy replenishment strategies, a closed-loop supply chain-based model for eVTOL battery charging and swapping is proposed. Time-space network methods are utilized to characterize the scheduling of batteries and logistics throughout the system. Subsequently, aiming to maximize the operational revenue of the model, optimized management of battery swapping, transportation, and charging processes is implemented, facilitating coordinated operation among eVTOLs, swapping stations, and charging stations. Finally, the model is solved by Gurobi, verifying its feasibility. Simulation results further indicate that the model alleviates range anxiety for eVTOLs, offering strong support for their commercialization. Moreover, it enables coordinated scheduling between eVTOLs and the distribution network, thereby facilitating the network's gradual improvement and upgrading.
Coordinated Optimal Power Quality Management in Distribution Systems Using The Residual Capacity of Community IBRs
This letter proposes a network-wide coordinated optimization model to mitigate voltage unbalance (VU) by unleashing the remaining capacity of community inverter-based resources (IBRs). Existing single-sequence strategies ignore coupled capacity constraints and cause idle headroom. Meanwhile, they fail to harness the collective governance capabilities of community IBRs. To solve this discrepancy and exploit the unused potential, we developed a sequence-domain network model in dual commonly shared synchronous reference frames. Strict phase current and apparent power limits are formulated and convexified via polyhedral approximations. A quadratic objective function flexibly balances sequence capacity allocation. Simulation and experimental results validate the effectiveness of the proposed strategy.
LoRa and LoRaWAN simulator-cum-emulator with CAD and capture effect in Python
Existing LoRaWAN/LoRa simulators consist of large, complicated C++ codebases and often do not support all device classes. This paper presents the design of a simple-to-use, Python-based discrete-event simulator that addresses these gaps while also introducing a novel method for evaluating real device firmware in the simulator. The simulator is built on a custom asyncio-based simulation kernel, a three-phase packet delivery model that reproduces the capture effect, a full LoRaWAN 1.0.4 stack, and a containerized firmware system that cross-compiles real STM32 C firmware and redirects HAL calls into the simulator via CFFI. The simulator is distributed as a Python package via GitHub (https://github.com/MatthijsReyers/lora-simulator) and requires no external simulation framework or dependencies.
comment: 11 pages in total; GitHub link included
Scaled Graph Bounding Techniques for Reset Systems
Reset systems can overcome fundamental limitations of linear time-invariant control. The recently introduced notion of scaled (relative) graphs provides a promising framework for developing graphical analysis and design tools for reset systems, in line with widely adopted loopshaping methods for linear systems. The aim of this paper is to derive techniques for over-bounding the scaled graph of reset systems and to obtain insight into their accuracy. We exploit connections between quadratic dissipativity and scaled graphs to recast the over-bounding problem as the search for piecewise quadratic storage functions. Using specific sampling techniques, we reveal a fundamental limitation of general scaled graph approximation methods that are based on quadratic dissipativity.
comment: 6 pages, 5 figures. To appear in the 23rd IFAC World Congress, Busan, South Korea, 2026
Benchmarking Empirical and Learning-Based Approaches for Feedforward Steering Control in Autonomous Racing SC 2026
Feedforward steering control is a key component of hierarchical control architectures for autonomous racing. The goal is to reduce steering corrections from the feedback controllers by predicting the vehicle's inverse lateral dynamics. This paper presents a systematic benchmark of two learning-based and two empirical (analytical) feedforward steering controllers. We introduce a new EHD formulation based on a polynomial surface fit that captures velocity-dependent nonlinear steering behavior with minimal parametrization. We test the feedforward controllers in a high-fidelity simulation framework based on the real-world Abu Dhabi Autonomous Racing League competition, using a high-fidelity double-track vehicle dynamics simulator. Open-loop evaluation shows that the learning-based controllers achieve the lowest prediction errors; however, closed-loop testing reveals that this improved accuracy does not translate into superior path tracking performance or lap times, even after iterative fine-tuning. In contrast, the proposed EHD approach achieves the best overall closed-loop robustness and lap time, highlighting the necessity of evaluating feedforward strategies within the complete trajectory planning and control software stack. Our code is available at https://github.com/TUMRT/steering_ff_control.
comment: 8 pages, 12 figures, Accepted to be published as part of the 2026 IEEE International Conference on Intelligent Transportation Systems (ITSC 2026), Naples, Italy, September 15-18, 2026
Runtime-Certified Bounded-Error Quantized Attention
KV cache quantization reduces the memory cost of long-context LLM inference, but introduces approximation error that is typically validated only empirically. Existing systems rely on average-case robustness, with no mechanism to detect or recover from failures at runtime. We present a tiered KV cache architecture that enables runtime-certified attention: INT8 keys and INT4 values are stored in GPU memory, while FP16 originals are retained in system RAM for deterministic fallback. A two-term error decomposition yields per-head, per-step bounds on (i) attention distribution distortion from key quantization and (ii) value reconstruction error. These bounds are computed online and used to drive adaptive precision selection and a multi-stage fallback ladder, which guarantees recovery to the exact dense attention output when required. Across PG-19, NIAH, and RULER benchmarks on LLaMA 3.1-8B with contexts up to 128K, the system matches dense FP16 KV quality within noise for language modelling and retrieval tasks, while recovering catastrophic failures observed in naive INT8/INT4 baselines. Value-sensitive tasks at short context expose a controlled trade-off between compression and fidelity, which can be eliminated via tighter value tolerances or FP16-value fallback. The certification is local (per-head, per-step) and does not guarantee end-to-end model correctness, but ensures that each attention computation is either bounded relative to an FP16 reference or exactly recovered via fallback. This reframes KV cache quantization as a runtime-verified computation rather than a fixed approximation. The goal is not raw speedups, but enabling safe deployment of aggressive KV compression under strict quality constraints.
comment: 32 pages, 1 figure
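To make the "bound, then fall back" idea in the quantized-attention abstract above more concrete, here is a minimal sketch for the key side only. It assumes symmetric per-tensor INT8 rounding and omits the usual 1/sqrt(d) logit scaling for brevity. The bound used is the elementary one (each key entry is off by at most half a quantization step, so each logit is off by at most `0.5 * scale * ||q||_1`). It is not the paper's two-term decomposition, and all function names are hypothetical.

```python
import numpy as np

def quantize_int8(k):
    """Symmetric per-tensor INT8 quantization; returns integer codes and the scale."""
    scale = max(np.max(np.abs(k)), 1e-12) / 127.0
    codes = np.round(k / scale).astype(np.int8)
    return codes, scale

def logit_error_bound(q, scale):
    """Worst-case |q @ k - q @ k_hat| when every key entry is off by at most scale / 2."""
    return 0.5 * scale * np.sum(np.abs(q))

def attention_with_fallback(q, keys, values, tol):
    """Single-query attention that uses quantized keys only if the bound is within tol."""
    codes, scale = quantize_int8(keys)
    eps = logit_error_bound(q, scale)
    used_fallback = bool(eps > tol)
    # Fall back to the full-precision keys when the certified bound is too loose.
    k_used = keys if used_fallback else codes.astype(np.float64) * scale
    logits = k_used @ q
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ values, eps, used_fallback

rng = np.random.default_rng(0)
q = rng.normal(size=8)
keys = rng.normal(size=(16, 8))
values = rng.normal(size=(16, 4))
out, eps, used_fallback = attention_with_fallback(q, keys, values, tol=0.5)
```

If every logit moves by at most `eps`, each softmax weight changes by a factor between `exp(-2 * eps)` and `exp(2 * eps)`, so a tolerance on `eps` translates directly into a tolerance on the attention distribution.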
PACD-Net: Pseudo-Augmented Contrastive Distillation for Glycemic Control Estimation from SMBG
Effective diabetes management requires continuous monitoring of glycemic levels. Clinically, glycemic control is assessed using metrics such as Time in Range (TIR), Time Below Range (TBR), and Time Above Range (TAR), typically derived from continuous glucose monitoring (CGM). However, many patients rely on self-monitoring of blood glucose (SMBG) due to the high cost and limited accessibility of CGM. Unlike CGM, SMBG provides sparse and irregular measurements, making accurate estimation of these metrics challenging. Conventional supervised learning approaches struggle under such sparsity, leading to poor generalization and unstable performance. To address this, we propose PACD-Net, a self-supervised contrastive knowledge distillation framework for estimating glycemic control from SMBG. Pseudo-SMBG samples with richer temporal coverage are used as teacher signals to guide learning from sparse observations. In addition, multi-view contrastive learning enforces representation consistency across diverse sampling patterns. The model adopts a hybrid Swin Transformer-CNN backbone to capture temporal dependencies in sparse SMBG sequences. Experimental results demonstrate that PACD-Net consistently outperforms existing methods in estimating TAR, TIR, and TBR from real-world SMBG data, achieving improved accuracy as well as enhanced stability and generalization under extremely sparse observation settings. The proposed framework provides a practical tool for clinical SMBG interpretation and offers a generalizable approach for learning from sparse and irregularly sampled sensor data in broader applications.
Cooling Channel Design Optimization for High Power Multi-chip Packages
Thermal management is a major challenge in next-generation high-performance computing systems, particularly for heterogeneous multi-chip packages such as the NVIDIA GB200 Grace Blackwell Superchip. In this work, a physics-based computational framework is developed to optimize embedded cooling channel layouts for high-power multi-chip modules. The model couples steady-state heat conduction with a porous media-based representation of coolant transport, coupled with a row-wise coolant energy balance, to estimate chip temperature fields within microchannel networks. Unlike conventional designs, an interdigitated cooling architecture is parameterized using geometric variables, including channel count, width, and expansion over chip regions, enabling systematic design exploration. To enable efficient optimization, a surrogate-based approach is employed to approximate the relationship between geometric parameters and temperature metrics. The resulting model is optimized using a mixed-integer quadratic programming algorithm to minimize a weighted objective based on peak and average chip temperatures. To improve physical relevance, channel placement is further constrained to increase cooling coverage near GPU regions, where thermal loads are highest. The framework is applied to a representative multi-chip configuration based on NVIDIA GB200 architecture, consisting of two graphics processing units and one central processing unit. The results demonstrate that the optimal design reduces the peak chip temperature by 140.45°C and the average chip temperature by 35.87°C compared to the baseline configuration.
comment: 9 pages, 8 figures
Distributed and Decentralized Optimization Algorithms via Consensus ALADIN
Distributed optimization has found widespread applications in smart grids, optimal control, and machine learning. This paper studies distributed consensus optimization. We extend the Augmented Lagrangian-based Alternating Direction Inexact Newton (ALADIN) framework to propose Consensus ALADIN (C-ALADIN) with a central coordinator, which directly handles consensus constraints. Our C-ALADIN algorithm admits both a first-order variant and a second-order variant that employs a Hessian approximation, avoiding direct transmission of second-order information while preserving fast local convergence. We then develop a decentralized version of C-ALADIN that operates over directed graphs with quantized communication, using a finite-time coordination protocol. For both versions, we establish global convergence guarantees for convex problems and local convergence guarantees for non-convex problems. For the decentralized case, the iterates converge to a neighborhood of the optimum determined by the quantization level. Numerical results demonstrate that our methods retain fast convergence while substantially reducing communication and computational costs compared to existing decentralized approaches.
Time-To-Reach Separation and Safety Filtering for Safe, Fair, and Efficient Multi-Agent Coordination
Advanced Air Mobility (AAM) operations are expected to significantly increase aerial traffic in urban airspace, requiring autonomous traffic management systems to ensure collision-free operations in highly congested environments. In this paper, we propose a multi-agent coordination framework that uses minimum time-to-reach (TTR) as a unifying metric for priority assignment, temporal separation, and safety filtering. We focus on the problem of coordinating multiple aerial vehicles merging into an air corridor while maintaining safe separation between vehicles. Vehicles are assigned arrival-consistent priority based on TTR, and target TTR values are used to enforce temporal spacing that induces spatial separation. A priority-consistent safety filtering layer based on Hamilton-Jacobi reachability value functions ensures collision avoidance while minimally modifying the reference guidance. Simulation results in a highly congested corridor merging scenario show that the proposed method improves safety, fairness, and efficiency compared to time-optimal guidance and priority-agnostic safety filtering.
comment: 9 pages, 3 figures. Extended version (including appendix) of a paper submitted to the 65th IEEE Conf. on Decision and Control (2026)
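The priority and spacing steps described in the time-to-reach abstract above can be sketched as follows. The abstract does not say how TTR values are computed or how target TTRs are chosen, so this assumes the TTR values are given and applies a simple rule: earlier TTR means higher priority, and each target is pushed back just enough to keep a minimum temporal gap. The function name and numbers are hypothetical.

```python
def assign_target_ttr(ttr, min_gap):
    """Order vehicles by time-to-reach and space their target arrival times.

    ttr: dict mapping vehicle id -> minimum time-to-reach the merge point (seconds).
    min_gap: required temporal separation between consecutive arrivals (seconds).
    Returns a list of (vehicle id, target TTR) in priority order.
    """
    schedule = []
    previous = None
    for vehicle, t in sorted(ttr.items(), key=lambda item: item[1]):
        # A vehicle never targets an arrival earlier than it can physically achieve.
        target = t if previous is None else max(t, previous + min_gap)
        schedule.append((vehicle, target))
        previous = target
    return schedule

ttr = {"A": 12.0, "B": 10.0, "C": 10.5}
schedule = assign_target_ttr(ttr, min_gap=2.0)
# schedule == [("B", 10.0), ("C", 12.0), ("A", 14.0)]
```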
Secure Coordination for Vertiport Sequencing in Advanced Air Mobility
Advanced air mobility operations will require reliable coordination mechanisms for managing dense traffic near vertiports. However, sequencing decisions may become vulnerable when they rely on potentially falsified self-reported information such as estimated time of arrival. Self-interested vehicles may misreport their arrival times to obtain favorable landing priority, while malicious actors may spoof information to disrupt sequencing decisions or induce unnecessary congestion. This paper studies secure coordination for vertiport sequencing under sensing uncertainty. We consider a coordinator that combines self-reported Remote-ID information with externally obtained surveillance measurements to check reports and assign separation-feasible arrival schedules. Since surveillance-based estimates are uncertain, falsified reports may remain consistent with the sensing uncertainty region and cannot always be rejected outright. We therefore formulate sequencing as a robust design problem over this uncertainty region. Self-interested misreporting is modeled as a strategic deviation that improves the reporting vehicle's own sequencing outcome, whereas malicious spoofing is modeled as an adversarial disturbance that degrades the system-level objective. The final paper will develop robust sequencing rules over surveillance-consistent uncertainty sets and evaluate their performance in representative vertiport sequencing scenarios.
On the Sample Complexity of Discounted Reinforcement Learning with Optimized Certainty Equivalents
We study risk-sensitive reinforcement learning in finite discounted MDPs, where a generative model of the MDP is assumed to be available. We consider a family of risk measures called the optimized certainty equivalent (OCE), which includes important risk measures such as entropic risk, CVaR, and mean-variance. Our focus is on the sample complexities of learning the optimal state-action value function (value learning) and an optimal policy (policy learning) under recursive OCE. We provide an exact characterization of utility functions $u$ for which the corresponding OCE defines an objective that is PAC-learnable. We analyze a simple model-based approach and derive PAC sample complexity bounds. We establish that whenever $u$ does not have full domain, i.e., $\text{dom}(u)\neq \mathbb{R}$, the corresponding problem is not PAC-learnable. Finally, we establish corresponding lower bounds for both value and policy learning, demonstrating tightness in the size $SA$ of the state-action space, and for a more restricted class of utilities, we derive lower bounds that make the dependence on the effective horizon $\frac{1}{1-γ}$ explicit. Specifically, for $\text{CVaR}_τ$ we show that the correct dependence on $τ$ is $\frac{1}{τ^2}$, thus improving by a factor of $\frac{1}{τ}$ over the state of the art, although our bound has a suboptimal dependence on $\frac{1}{1-γ}$.
comment: Accepted to RLC 2026. arXiv admin note: substantial text overlap with arXiv:2506.00286
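For readers unfamiliar with the OCE family in the abstract above, the sketch below evaluates the standard definition, OCE_u(X) = sup over λ of λ + E[u(X − λ)], on an empirical sample. It uses the reward convention (the lower tail is the bad one) and the utility u(t) = min(t, 0)/τ, which recovers CVaR at level τ. The abstract does not state its sign convention, so treat this as an assumption. The function names are hypothetical.

```python
import numpy as np

def oce(samples, u):
    """Empirical OCE: maximize lam + mean(u(X - lam)) over lam.

    The search is restricted to the sample values. This is exact for the CVaR
    utility below, whose objective is concave and piecewise linear in lam with
    breakpoints at the samples.
    """
    x = np.asarray(samples, dtype=float)
    return max(lam + np.mean(u(x - lam)) for lam in x)

def cvar_utility(tau):
    """Utility u(t) = min(t, 0) / tau, which yields CVaR at level tau."""
    return lambda t: np.minimum(t, 0.0) / tau

x = np.array([1.0, 2.0, 3.0, 4.0])
cvar_half = oce(x, cvar_utility(0.5))  # mean of the worst half: (1 + 2) / 2 = 1.5
cvar_full = oce(x, cvar_utility(1.0))  # tau = 1 reduces to the plain mean, 2.5
```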
Learning Altruistic Collaboration in Heterogeneous Multi-Team Systems
This paper studies heterogeneous multi-team collaboration through dynamic robot allocation, where robots are treated as transferable resources. Leveraging Hamilton's rule from ecology as an altruistic decision-making mechanism, we propose a multi-team collaborative resource allocation framework with heterogeneous capabilities, transfer costs, and capability-dependent contributions. The resulting allocation problem is combinatorial and is shown to be NP-hard. To address scalability, we develop a graph neural network policy under centralized training and decentralized execution that approximates the altruistic allocations based on Hamilton's rule. The model operates over the team interaction graph and predicts robot-level transfer decisions and next robot-to-team assignments. The proposed approach is validated in a firefighting scenario through simulations and experiments, demonstrating that the learned policy achieves near-optimal performance while scaling to larger systems.
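Hamilton's rule, which the abstract above uses as its decision mechanism, states that an altruistic act is favored when relatedness times benefit exceeds cost (r · B > C). A minimal sketch of applying that test to candidate robot transfers follows. How the paper defines relatedness, benefit, and cost for robot teams is not given in the abstract, so the numbers and field names here are purely illustrative.

```python
def should_transfer(relatedness, benefit, cost):
    """Hamilton's rule: act altruistically when relatedness * benefit exceeds the cost."""
    return relatedness * benefit > cost

# Hypothetical candidates: benefit to the receiving team and cost to the donor team.
candidates = [
    {"robot": "r1", "benefit": 4.0, "cost": 1.5},
    {"robot": "r2", "benefit": 1.0, "cost": 1.5},
]
relatedness = 0.5
transfers = [
    c["robot"]
    for c in candidates
    if should_transfer(relatedness, c["benefit"], c["cost"])
]
# transfers == ["r1"]: 0.5 * 4.0 > 1.5, while 0.5 * 1.0 < 1.5
```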
Mind the Gaps: Multi-Robot Feedback-Driven Ergodic Coverage in Unknown Environments
In this work, we address the problem of multi-robot adaptive coverage, where teams of robots perform dynamic sampling by continuously adjusting their positions to collect data in an environment. This task can be challenging, particularly when robots must be efficiently allocated to new sampling locations over time. Ergodic search methods optimize robot trajectories by ensuring that the robots' time-averaged spatial distribution aligns with the spatial distribution of environmental information. While these methods promote effective exploration provided a target distribution, they often fail to account for unknown prior distributions of the environment. To overcome this limitation, we propose an adaptive coverage strategy that utilizes real-time feedback from an environmental model to adjust robot sampling behavior in response to unknown conditions. Our approach enhances traditional ergodic trajectory optimization by constructing a target spatial information distribution based on parametric models of the environment, which are updated online. This strategy assumes that the environment is either static or changes slowly compared to the robot's motion. Our framework allows robots to dynamically prioritize regions of high interest, improving coverage efficiency, synthesizing effective control policies for individual agents, and optimizing resource use in settings with unknown prior distributions. We validate our approach through simulations, demonstrating its effectiveness in enhancing coverage and resource allocation.
Disturbance Rejection Control under Nested Signal Temporal Logic Specifications: A Recursive Design Approach
This paper investigates the control synthesis for continuous-time uncertain systems under Signal Temporal Logic (STL) specifications containing nested temporal operators. Control Barrier Functions (CBFs) are utilized herein to encode STL formulas into system constraints. However, traditional CBF designs fail to encode nested STL formulas, whereas recent reachability analysis-based methods capable of handling such formulas are inapplicable to uncertain systems and suffer from a severe computational burden. To overcome these challenges, a novel recursive CBF design procedure based on a modified STL tree (sTLT) is proposed to yield explicit parameterized CBFs. Within this framework, sliding window variables are introduced to capture complex temporal relationships. Crucially, satisfying the resulting CBF constraints is proven to guarantee the fulfillment of the STL specifications. To render the proposed recursive CBF design applicable to systems subject to uncertain disturbances, a novel controller based on a reconstructed CBF using quadratic programming (QP) is proposed, ensuring strict CBF constraint satisfaction under disturbances. In contrast to existing methods, the proposed reconstructed CBF approach requires no prior knowledge of the disturbances while relaxing initial safety assumptions. Simulation results validate the efficacy of the proposed approach.
Motion Design for Grasp-Based Dynamic Locomotion in Microgravity
Locomotion in microgravity often relies on sparsely and irregularly arranged anchors, motivating grasp-based mobility with multiple limbs. In this setting, dynamic locomotion is feasible only through deliberate regulation of both anchored interactions and whole-body coordination under coupled dynamic and kinematic constraints. This paper presents design insights for grasp-based dynamic locomotion with multi-limbed robotic systems in microgravity, targeting scenarios that require 6D limb manipulation to establish contacts with candidate anchors. The investigated design parameters include gait pattern, stride length, locomotion speed, and nominal posture. A parameterizable locomotion planning framework is proposed to support variations of these parameters and to evaluate the resulting locomotion performance in terms of stability and actuation demand. Two representative quadruped morphologies are adopted for evaluation in physics-based simulation. The results demonstrate that enlarging the feasible contact wrench space and attenuating impulsive whole-body dynamics improve locomotion performance. These findings inform strategies for contact configuration selection and whole-body coordination in microgravity locomotion with multi-limbed systems.
DAE-Embedded Neural Control Verification for Shipboard Microgrids under Transient Shocks
Neural control offers strong potential for handling highly nonlinear dynamics in shipboard microgrids (SMGs), yet its black-box nature can trigger abrupt control spikes and actuator saturation during initial transient shocks. This letter devises a formal verification method for SMG neural controllers to assess their shock responses. Our contributions include: 1) a set-based SMG differential-algebraic equation (DAE) model compatible with set propagation; 2) a DAE-embedded bound propagation approach to compute tight envelopes of all possible neural control outputs. Extensive case studies demonstrate the effectiveness of the devised method in formally certifying SMG control performance under uncertain disturbances.
Resilient Energy-Based Control for DC Data Centers under Grid and Load Disturbances
This paper presents a passivity-based control framework for AC-DC converters supplying non-passive Information Technology rack loads in DC data centers. Unlike conventional cascaded proportional-integral controllers that ensure stability only near nominal operating points, the proposed method is derived from the system total energy balance using the Port-Hamiltonian formulation. By shaping the stored energy and injecting virtual damping through a lossless interconnection with a PH controller, the converter behaves as a passive system even when interfaced with non-passive loads or under grid disturbances. The closed-loop system guarantees asymptotic voltage regulation and strict energy dissipation without assuming constant grid voltage or frequency. Simulation studies under realistic load and fault scenarios validate that the proposed controller achieves smaller voltage deviations, faster recovery, and superior robustness, demonstrating its suitability for future high-efficiency DC data-center architectures.
Closed-Loop Sim-to-Real Reinforcement Learning for Deformable Microfiber Shape Control
Autonomous contact-based micromanipulation is challenging because surface and interfacial interactions at the microscale are difficult to model accurately, limiting the use of conventional model-based control and sim-to-real learning. We present a closed-loop sim-to-real reinforcement learning (RL) approach for microfiber shape control on a surface. The central idea is to train geometric shape regulation in a simplified frictionless simulator and rely on real-time visual feedback during deployment to iteratively correct the observed effects of unmodeled surface interactions. An RL policy trained entirely in simulation is transferred directly to a physical dual-gripper micromanipulation system operating at 40 Hz, without retraining or domain adaptation. Using silk microfibers as a testbed, the policy achieves a mean point-wise shape error of 270 $\pm$ 80 $μ$m across twenty-four diverse initial configurations. Across nine specimens covering all combinations of three fiber diameters (50, 80, and 120 $μ$m) and three manipulated lengths (10 mm, 15 mm, and 20 mm), the same policy achieves sub-millimeter final shape error without any retraining or retuning. These results show that a policy learned in a simplified simulator can achieve repeatable real-world microfiber shape regulation under surface contact, provided that the task-relevant effects of the sim-to-real mismatch remain observable and correctable within the closed feedback loop.
comment: 7 pages, 7 figures
Data-driven approximation of regions of attraction via an LP-based selection of PWA Lyapunov functions
This paper presents a method to approximate regions of attraction of unknown nonlinear dynamical systems from data. Assuming point-wise evaluations of the vector field and known Lipschitz bounds, a polyhedral uncertainty set of admissible dynamics is constructed. This uncertainty description enables the synthesis of a continuous piece-wise affine Lyapunov candidate via a linear program, enforcing a robust decrease condition for all admissible vector fields. The approach allows certification of a region of attraction consistent with the available data. Numerical examples illustrate the effectiveness of the proposed method in extracting certified regions of attraction from sparse data.
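One simple way to picture the uncertainty set described in the abstract above: if each component of the vector field is Lipschitz with a known constant, every sample confines the unknown value at a query point to an interval, and intersecting those intervals gives a box (a particular polyhedron). The sketch below assumes a single constant `lip` valid for every component with respect to the infinity norm. The paper's actual construction may differ, and the function name is hypothetical.

```python
import numpy as np

def lipschitz_bounds(x, xs, fs, lip):
    """Componentwise bounds on f(x) from samples fs[i] = f(xs[i]).

    Assumes each component of f is lip-Lipschitz with respect to the infinity norm.
    """
    dist = np.max(np.abs(xs - x), axis=1)             # ||x - x_i||_inf for each sample
    lower = np.max(fs - lip * dist[:, None], axis=0)  # tightest lower bound over samples
    upper = np.min(fs + lip * dist[:, None], axis=0)  # tightest upper bound over samples
    return lower, upper

# Hypothetical data generated from f(x) = -x, which is 1-Lipschitz componentwise.
xs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
fs = -xs
query = np.array([0.5, 0.5])
lower, upper = lipschitz_bounds(query, xs, fs, lip=1.0)
```

A robust decrease condition for a Lyapunov candidate would then have to hold for every vector-field value inside such a set, which is what makes the certificate valid for all dynamics consistent with the data.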
Distributed Non-Uniform Scaling Control of Multi-Agent Formation via Matrix-Valued Constraints
Distributed formation maneuver control refers to the problem of maneuvering a group of agents to change their formation shape by adjusting the motions of a subset of the agents, where the controller of each agent only requires local information measured from its neighbors. Although this problem has been extensively investigated, existing approaches are mostly limited to uniform scaling transformations. This article proposes a new type of local matrix-valued constraints, via which non-uniform scaling control of position formation can be achieved by tuning the positions of only two agents (i.e., leaders). Here, the non-uniform scaling transformation refers to globally scaling the position formation with different ratios along different orthogonal coordinate directions. Moreover, by defining scaling and translation of attitudes, we propose a distributed control scheme for scaling and translation maneuver control of joint position-attitude formations. It is proven that the proposed controller achieves global convergence, provided that the sensing graph among agents is a 2-rooted bidirectional graph. Compared with the affine formation maneuver control approach, the proposed approach leverages a sparser sensing graph, requires fewer leaders, and additionally enables scaling transformations of the attitude formation. A simulation example demonstrates our theoretical results.
Learning Dynamics from Infrequent Output Measurements for Uncertainty-Aware Optimal Control
Reliable optimal control is challenging when the dynamics of a nonlinear system are unknown and only infrequent, noisy output measurements are available. This work addresses this setting of limited sensing by formulating a Bayesian prior over the continuous-time dynamics and latent state trajectory in state-space form and updating it through a targeted Metropolis-Hastings sampler equipped with a numerical ODE integrator. The resulting posterior samples are used to formulate a scenario-based optimal control problem that accounts for the uncertainty in the dynamics and latent state and is solved using standard nonlinear programming methods. The approach is validated in a numerical case study on glucose regulation using a Type 1 diabetes model.
comment: Accepted for publication in the Proceedings of the 2026 IFAC World Congress
On Integrating Resilience and Human Oversight into LLM-Assisted Modeling Workflows for Digital Twins
LLM-assisted modeling holds the potential to rapidly build executable Digital Twins of complex systems from only coarse descriptions and sensor data. However, resilience to LLM hallucination, human oversight, and real-time model adaptability remain challenging and often mutually conflicting requirements. We present three critical design principles for integrating resilience and oversight into such workflows, derived from insights gained through our work on FactoryFlow - an open-source LLM-assisted framework for building simulation-based Digital Twins of manufacturing systems. First, orthogonalize structural modeling and parameter fitting. Structural descriptions (components, interconnections) are LLM-translated from coarse natural language to an intermediate representation (IR) with human visualization and validation, which is algorithmically converted to the final model. Parameter inference, in contrast, operates continuously on sensor data streams with expert-tunable controls. Second, restrict the model IR to interconnections of parameterized, pre-validated library components rather than monolithic simulation code, enabling interpretability and error-resilience. Third, and most important, use a density-preserving IR. When IR descriptions expand dramatically from compact inputs, hallucination errors accumulate proportionally. We present the case for Python as a density-preserving IR: loops express regularity compactly, classes capture hierarchy and composition, and the result remains highly readable while exploiting LLMs' strong code generation capabilities. A key contribution is a detailed characterization of LLM-induced errors across model descriptions of varying detail and complexity, revealing how IR choice critically impacts error rates. These insights provide actionable guidance for building resilient and transparent LLM-assisted simulation automation workflows.
Analytical PI Tuning for Second-Order Plants with Monotonic Response and Minimum Settling Time
This study presents two analytical closed-form PI controller tuning solutions for second-order plants with real poles, each achieving monotonic step response and minimum settling time. The first solution employs pole-zero cancellation, placing the controller zero at the slower plant pole and reducing the closed-loop dynamics to a critically damped second-order system. The second solution, applicable when the plant pole ratio is less than two, places all three closed-loop poles at a common location without cancelling any plant pole, yielding a closed-loop transfer function with a triple real pole and a zero. Despite retaining a closed-loop zero, this solution achieves strictly faster settling time than the pole-zero cancellation method in its region of applicability. The two solutions coincide at the boundary pole ratio of two and together form a continuous piecewise-analytical tuning covering the full range of plant pole ratios. This study further establishes that closed-loop transfer functions of the form a^n/(s + a)^n possess a maximum sensitivity Ms together with phase margin and gain margin that are independent of the pole location a and depend solely on the order n, yielding universal robustness constants for each n. A closed-form expression GM(n) = 1 + sec^n(pi/n) is established for the gain margin of the family. Numerical verification confirms the analytical results across multiple plant configurations.
comment: 7 figures
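The gain-margin expression quoted in the abstract above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a unity-feedback loop, so the loop gain is L = T / (1 - T) with T = a^n / (s + a)^n, and it assumes the formula applies for n >= 3 (the guard for n <= 2 is our assumption, since those loops never reach a phase of -180 degrees). The function names are hypothetical.

```python
import math


def gain_margin(n: int) -> float:
    """GM(n) = 1 + sec^n(pi/n), as quoted in the abstract (assumed valid for n >= 3)."""
    if n <= 2:
        return math.inf  # assumption: no phase crossover for n = 1, 2
    return 1.0 + (1.0 / math.cos(math.pi / n)) ** n


def loop_gain(n: int, s: complex, a: float = 1.0) -> complex:
    """Unity-feedback loop gain L = T / (1 - T) for T = a^n / (s + a)^n."""
    t = a**n / (s + a) ** n
    return t / (1.0 - t)


def phase_crossover(n: int, a: float = 1.0) -> float:
    """Frequency at which (a + jw)^n first becomes real and negative."""
    return a * math.tan(math.pi / n)


if __name__ == "__main__":
    for n in (3, 4, 5):
        for a in (1.0, 2.5):
            w = phase_crossover(n, a)
            L = loop_gain(n, 1j * w, a)
            # At the crossover L is real and negative, and -1/L is the gain margin.
            print(n, a, round(gain_margin(n), 6), round(-1.0 / L.real, 6))
```

For a fixed n the printed margin is the same for every pole location a, which matches the abstract's statement that the robustness constants depend only on the order n.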
Safe Bayesian Optimization for Uncertain Correlation Matrices in Linear Models of Co-Regionalization
This paper extends safety guarantees for multi-task Bayesian optimization with uncertain co-regionalization matrices from intrinsic co-regionalization models to linear models of co-regionalization. The latter allows for more flexible modeling of the inter-task correlations by composing multiple features. We derive uniform error bounds for vector-valued functions sampled from a Gaussian process with a linear model of co-regionalization kernel. Furthermore, we show the potential performance gains of linear models of co-regionalization in a numerical comparison on a safe multi-task Bayesian optimization benchmark.
comment: Accepted at IFAC WC26
Multi-Axis Additive Manufacturing for Customized Automotive Components
The reproduction of automobile components through additive manufacturing presents significant geometric challenges, as many automotive parts feature complex, organically shaped surfaces that are difficult to fabricate accurately using conventional 3D printing approaches without wasteful support structures. Multi-axis Digital Light Processing (DLP) 3D printing addresses this by orienting a robotic arm to cure resin layers at varying angles and positions, enabling the fabrication of geometries that fixed-axis systems cannot reliably reproduce. However, this flexibility introduces a key challenge: layers printed at non-orthogonal orientations exhibit non-uniform thickness across their cross-section, which traditional DLP systems cannot accommodate without subdividing the layer, increasing total layer count, print time, and the need for supporting structures. This paper introduces a variable exposure method to address this challenge. Rather than splitting a non-uniform layer into multiple uniform ones, our approach divides each layer into sublayers and modulates the UV illumination duration for each sublayer proportionally to its local thickness. This is governed by an established cure-depth equation relating exposure time to material penetration depth, allowing precise control over curing without additional hardware. The result is a meaningful reduction in total layer count for printed objects. Fewer layers translate directly into faster print times and a reduction in wasteful support structures. Our contribution is a practical and low-overhead extension to existing multi-axis DLP pipelines that improves print efficiency without sacrificing geometric accuracy, with clear applications in the rapid prototyping and reproduction of automotive components.
comment: 6 pages, 4 figures
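The abstract above does not name its cure-depth equation. As a rough illustration only, the sketch below assumes the widely used Jacobs working-curve form, C_d = D_p * ln(E / E_c) with exposure energy E = irradiance * time, and solves it for the exposure time of each sublayer. The function names and all resin parameters are hypothetical placeholders, not values from the paper.

```python
import math


def exposure_time_s(thickness_um: float,
                    penetration_depth_um: float,
                    critical_energy_mj_cm2: float,
                    irradiance_mw_cm2: float) -> float:
    """Exposure time that cures `thickness_um` under the assumed Jacobs relation.

    C_d = D_p * ln(E / E_c), E = I * t  =>  t = (E_c / I) * exp(C_d / D_p).
    mJ/cm^2 divided by mW/cm^2 gives seconds.
    """
    values = (thickness_um, penetration_depth_um,
              critical_energy_mj_cm2, irradiance_mw_cm2)
    if min(values) <= 0:
        raise ValueError("all parameters must be positive")
    return (critical_energy_mj_cm2 / irradiance_mw_cm2) * math.exp(
        thickness_um / penetration_depth_um)


def sublayer_exposures(thicknesses_um, **resin):
    """One exposure time per sublayer of a non-uniform layer."""
    return [exposure_time_s(t, **resin) for t in thicknesses_um]


if __name__ == "__main__":
    resin = dict(penetration_depth_um=120.0,    # hypothetical D_p
                 critical_energy_mj_cm2=8.0,    # hypothetical E_c
                 irradiance_mw_cm2=10.0)        # hypothetical projector irradiance
    print(sublayer_exposures([50.0, 75.0, 100.0], **resin))
```

Under this assumed relation the exposure time grows exponentially with thickness, so the paper's actual modulation rule may differ from what is shown here.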
Emissions and cost tradeoffs of time-matched clean electricity procurement under inter-annual weather variability -- case study of hydrogen production
Regulators and voluntary corporate sustainability efforts are increasingly adopting time-matching requirements (TMRs) for clean electricity procurement for large loads, such as data centers, and electricity-intensive fuel production, such as hydrogen. We use a stochastic capacity expansion model (CEM) framework to assess how inter-annual weather variability affects the cost, composition, and emissions of procurement-driven infrastructure to meet annual and hourly TMRs using the case study of a grid-connected hydrogen producer in Texas. Our approach, which relies on co-optimizing investments and hourly operations over nine weather scenarios, reveals that hourly TMR comes at a higher cost premium compared to annual TMR than previously estimated by single-scenario deterministic modeling, while emissions outcomes remain directionally consistent. Demand flexibility and partial hourly TMR (80-90%) lower the cost premium while preserving emissions benefits. We further examine how binding renewable portfolio standards (RPS) interact with TMR costs and emissions outcomes. When an RPS is applied to non-H2 electricity demand, annual TMR reduces emissions comparably to hourly TMR at a lower cost. Incorporating H2-related electricity demand directly into the RPS constraint, rather than imposing a separate TMR, achieves similar emissions outcomes at still lower cost, suggesting that TMR-based clean electricity procurement, particularly hourly matching, offers limited additional value in regions with stringent grid decarbonization policies.
comment: 7 Figures, 1 table (main text)
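To make the difference between annual and hourly time-matching concrete, here is a small sketch under a common simplified definition, which is our assumption rather than the paper's formulation: annual matching compares total procured clean energy to total load, while hourly matching credits clean energy only up to the load in each hour. The function names and the toy profiles are hypothetical.

```python
def annual_match(load_mwh, clean_mwh):
    """Share of total load covered by total procured clean energy (capped at 1)."""
    if len(load_mwh) != len(clean_mwh):
        raise ValueError("profiles must have the same number of hours")
    return min(1.0, sum(clean_mwh) / sum(load_mwh))


def hourly_match(load_mwh, clean_mwh):
    """Share of load covered when clean energy only counts in the hour it is produced."""
    if len(load_mwh) != len(clean_mwh):
        raise ValueError("profiles must have the same number of hours")
    matched = sum(min(l, c) for l, c in zip(load_mwh, clean_mwh))
    return matched / sum(load_mwh)


if __name__ == "__main__":
    load = [10.0, 10.0, 10.0, 10.0]    # flat electrolyzer load (toy data)
    clean = [20.0, 20.0, 0.0, 0.0]     # solar-like procurement (toy data)
    print(annual_match(load, clean))   # 1.0: fully matched on a volume basis
    print(hourly_match(load, clean))   # 0.5: only half the hours are covered
```

The gap between the two numbers is what a partial hourly requirement (such as the 80-90% range mentioned in the abstract) would constrain.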
Verifiable Error Bounds for Physics-Informed Neural Network Solutions of Lyapunov and Hamilton-Jacobi-Bellman Equations
Many core problems in nonlinear systems analysis and control can be recast as solving partial differential equations (PDEs) such as Lyapunov and Hamilton-Jacobi-Bellman (HJB) equations. Physics-informed neural networks (PINNs) have emerged as a promising mesh-free approach for approximating their solutions, but in most existing works there is no rigorous guarantee that a small PDE residual implies a small solution error. This paper develops verifiable error bounds for approximate solutions of Lyapunov and HJB equations, with particular emphasis on PINN-based approximations. For both the Lyapunov and HJB PDEs, we show that a verifiable residual bound yields relative error bounds with respect to the true solutions as well as computable a posteriori estimates in terms of the approximate solutions. For the HJB equation, this also yields certified upper and lower bounds on the optimal value function on compact sublevel sets and quantifies the optimality gap of the induced feedback policy. We further show that one-sided residual bounds already imply that the approximation itself defines a valid Lyapunov or control Lyapunov function. We illustrate the results with numerical examples.
comment: The paper will appear in the IEEE Control Systems Letters
Two-Level Distributed Interference Management for Large-Scale HAPS-Empowered vHetNets
High altitude platform stations (HAPS) offer a promising solution for achieving ubiquitous connectivity in next-generation wireless networks (xG). Integrating HAPS with terrestrial networks, creating HAPS-empowered vertical heterogeneous networks (vHetNets), significantly improves coverage and capacity and supports emerging novel use cases. In HAPS-empowered vHetNets, HAPS and terrestrial network tiers can share the same spectrum, forming harmonized spectrum vHetNets that enhance spectral efficiency (SE). However, harmonized spectrum vHetNets face major challenges, including severe co-channel interference and scalability in large-scale deployments. To address the first challenge, we adopt a cell-free multiple-input multiple-output (MIMO) network architecture in which users are simultaneously served by multiple base stations using beamforming. However, beamforming weight design leads to a nonconvex, high-dimensional optimization problem, highlighting the scalability challenge. To address this second challenge, we develop a two-level distributed proportional fairness beamforming weight design (PFBWD) algorithm. This algorithm combines the augmented Lagrangian method (ALM) with a three-block ADMM framework. Simulation results demonstrate the performance improvements achieved by integrating HAPS with standalone terrestrial networks, as well as the reduced complexity and signaling overhead of the distributed algorithm compared to centralized algorithms.
Secure Parameter Identification for Multi-Participant ARX Systems via CKKS Cryptosystem-Based Proxy Re-Encryption
This paper investigates the parameter identification for multi-participant autoregressive exogenous input (ARX) systems while protecting the system input and output. To do so, the discrete Gaussian noise in the standard Cheon-Kim-Kim-Song (CKKS) cryptosystem is replaced with a truncated one. By using the CKKS cryptosystem with the truncated discrete Gaussian noise and the key-switching technique, a proxy re-encryption scheme is developed. Based on this scheme, a secure parameter identification algorithm is proposed for multi-participant ARX systems. By rigorously proving that the statistical distance between the discrete Gaussian noise and the truncated one is negligible, the polynomial-time reduction between the standard Ring-Learning with Errors (RLWE) problem and the RLWE problem with the truncated discrete Gaussian noise is established. This result ensures the indistinguishability under chosen-plaintext attacks (IND-CPA) security of the algorithm. By giving a lower bound condition on the size of the plaintext space, the computational overflow in encryption is avoided. Based on this condition, the mean square convergence and convergence rate of the algorithm are given. The trade-off between the security level and the convergence of the algorithm is presented. Finally, a numerical example is given to verify the effectiveness of the algorithm.
A PAC-Bayes Approach for Controlling Unknown Linear Discrete-time Systems
This paper presents a PAC-Bayes framework for learning controllers for unknown stochastic linear discrete-time systems, where the system parameters are drawn from a fixed but unknown distribution. We derive a data-dependent high probability bound on the performance of any learned (stochastic) controller, and propose novel efficient learning algorithms with theoretical guarantees, which can be implemented for both finite and infinite controller spaces. Compared to prior work, our bound holds for unbounded quadratic cost. In the special case where LQG is optimal, our numerical results suggest that the learned controllers achieve comparable performance to LQG.
comment: 10 pages, 3 figures, IFAC 2026 conference
Parallel OctoMapping: A Scalable Framework for Enhanced Path Planning in Autonomous Navigation
Mapping is essential in robotics and autonomous systems because it provides the spatial foundation for path planning. Efficient mapping enables planning algorithms to generate reliable paths while ensuring safety and adapting in real time to complex environments. Fixed-resolution mapping methods often produce overly conservative obstacle representations that lead to suboptimal paths or planning failures in cluttered scenes. To address this issue, we introduce Parallel OctoMapping (POMP), an efficient OctoMap-based mapping technique that maximizes available free space and supports multi-threaded computation. To the best of our knowledge, POMP is the first method that, at a fixed occupancy-grid resolution, refines the representation of free space while preserving map fidelity and compatibility with existing search-based planners. It can therefore be integrated into existing planning pipelines, yielding higher pathfinding success rates and shorter path lengths, especially in cluttered environments, while substantially improving computational efficiency.
$π$MPC: A Parallel-in-horizon and Construction-free NMPC Solver
The alternating direction method of multipliers (ADMM) has gained increasing popularity in embedded model predictive control (MPC) due to its code simplicity and pain-free parameter selection. However, existing ADMM solvers either target general quadratic programming (QP) problems or exploit sparse MPC formulations via Riccati recursions, which are inherently sequential and therefore difficult to parallelize for long prediction horizons. This technical note proposes a novel parallel-in-horizon and construction-free nonlinear MPC algorithm, termed $π$MPC, which combines a new variable-splitting scheme with a velocity-based system representation in the ADMM framework, enabling horizon-wise parallel execution while operating directly on system matrices without explicit MPC-to-QP construction. Numerical experiments and accompanying code are provided to validate the effectiveness of the proposed method.
comment: 8 pages
Robotics
Hamilton-Jacobi Reachability for Spacecraft Collision Avoidance
This article presents a Hamilton-Jacobi (HJ) reachability framework for a collision avoidance problem between two satellites operating in the same circular orbit, where relative motion is modeled in the radial-tangential-normal (RTN) frame using planar Hill-Clohessy-Wiltshire (HCW) dynamics. We define the target state space as unsafe relative configurations in the orbit plane corresponding to minimum separation requirements consistent with Federal Communications Commission (FCC) orbital standards. The interaction between spacecraft is formulated as a zero-sum differential game, where Player 1 is the controlled satellite and Player 2 is modeled as a bounded adversarial disturbance with unknown intent. We present the HJ formulation and compute backward reachable sets that characterize relative states from which collision cannot be avoided under worst-case disturbances, while states outside this set admit provably collision-free trajectories. These reachable sets are integrated with supervisory hybrid control logic to determine when evasive maneuvers must be initiated, enabling mathematically grounded safety guarantees for scalability.
comment: Accepted to the 20th IEEE International Conference on Control & Automation (IEEE ICCA 2026). 6 pages, 4 figures
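For readers unfamiliar with the planar HCW model named in the abstract above, the sketch below writes out the standard in-plane equations with x along the radial direction and y along the tangential direction. It illustrates the textbook dynamics only, not the paper's reachability computation; the function names, the example orbit radius, and the way the two players' accelerations are lumped into one `accel` input are assumptions.

```python
import math

MU_EARTH = 3.986004418e14  # Earth's gravitational parameter, m^3/s^2


def mean_motion(orbit_radius_m: float) -> float:
    """Mean motion n = sqrt(mu / a^3) of the circular reference orbit, rad/s."""
    return math.sqrt(MU_EARTH / orbit_radius_m**3)


def hcw_planar_derivative(state, accel, n):
    """Time derivative of (x, y, vx, vy) under planar HCW dynamics.

    x: radial offset, y: tangential offset (m); accel: net relative
    acceleration (ux, uy) in m/s^2, e.g. control minus disturbance.
    """
    x, y, vx, vy = state
    ux, uy = accel
    ax = 3.0 * n * n * x + 2.0 * n * vy + ux
    ay = -2.0 * n * vx + uy
    return (vx, vy, ax, ay)


def is_unsafe(state, min_separation_m: float) -> bool:
    """Membership in a circular keep-out set around the other satellite."""
    x, y, _, _ = state
    return math.hypot(x, y) < min_separation_m


if __name__ == "__main__":
    n = mean_motion(6_878_000.0)  # roughly a 500 km altitude orbit (example value)
    # A pure along-track offset is an equilibrium: both satellites share the orbit.
    print(hcw_planar_derivative((0.0, 500.0, 0.0, 0.0), (0.0, 0.0), n))
```

The zero derivative for a pure along-track offset reflects the "same circular orbit" setting: without control or disturbance the separation stays constant.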
Topology-Optimized Pneumatic Soft Actuator: Design and Experimental Validation
This paper demonstrates the computational design of soft elastomeric pneumatic actuators using nonlinear topology optimization. An existing density- and porohyperelasticity-based topology optimization framework was extended from 2D to 3D and used to generate two manufacturable actuator designs, which were then studied numerically and experimentally. For both designs, the objective was to maximize the bending response for a prescribed actuation pressure under two different allowable strain limits. A key advantage of the employed topology optimization framework is that it can consistently, during the optimization, account for the very large deformations induced upon pressurization. The two optimized 3D designs were fabricated using stereolithography and experimentally tested to validate their performance.
comment: 20 pages, 13 figures
Probing Embodied LLMs: When Higher Observation Fidelity Hurts Problem Solving
Large Language Models are increasingly proposed as cognitive components for robotic systems, yet their opaque decision processes make it difficult to explain success or failure in closed-loop embodied tasks. Following an empirical AI methodology, we study embodied LLM agents behaviorally by varying the information available to the agent and measuring the resulting changes in behavior. Using the Lockbox, a sequential mechanical puzzle with hidden interdependencies, we evaluate LLMs across RGB, RGB-D, and ground-truth symbolic observations in a physical robotic setup and use controlled simulation to probe the resulting behavior. Counterintuitively, agents perform best under raw RGB input and worst under perfect ground-truth observations. In simulation, we probe this effect by randomly flipping perceived action outcomes and find that moderate noise improves performance, peaking at a 40% flip probability with a 2.85-fold success rate increase over the noise-free baseline. Further analysis links this gain to a reduction in repetitive action loops. These findings suggest that success rates alone are insufficient for evaluating LLMs, as measured performance may reflect the interaction between perceptual errors and reasoning failures rather than robust problem solving.
comment: Submitted to From Animals to Animats: The 18th International Conference on the Simulation of Adaptive Behavior (SAB)
Towards LLM-Assisted Architecture Recovery for Real-World ROS 2 Systems: An Agent-Based Multi-Level Approach to Hierarchical Structural Architecture Reconstruction
Explicit software architecture models are essential artifacts for communicating, analyzing, and evolving complex software-intensive systems. In ROS 2-based robotic systems, however, structural (de-)composition and integration semantics are often only implicitly encoded across distributed artifacts such as source code and launch files, making recovery of hierarchical architecture particularly difficult. Existing approaches mainly focus on node-level entities and communication wiring, while providing limited support for recovering hierarchical structural (de-)composition across multiple abstraction levels. In this paper, we extend our previously proposed blueprint-guided LLM-assisted architecture recovery pipeline for ROS 2 systems through two major enhancements: (1) refined prompting to improve the consistency and controllability of architecture synthesis, and (2) a staged recovery strategy based on multi-level intermediate architectural representations that incorporate the atomic ROS node list and launch file dependencies, thereby enabling structurally constrained reconstruction across multiple abstraction levels. The approach is evaluated on a real-world automated product disassembly system based on cooperative robotic arms and heterogeneous ROS 2 artifacts. Compared to our previous work, the considered case study exhibits substantially higher integration complexity and richer functionality. The results demonstrate improved structural consistency, scalability, and robustness of architecture recovery, while also revealing remaining challenges related to dynamic integration semantics in large-scale ROS 2 systems.
Minimalist Visual Inertial Odometry
Visual-Inertial Odometry (VIO), which is critical to mobile robot navigation, uses cameras with a large number of pixels. Capturing and processing camera images requires significant resources. This work presents a minimalist approach to planar odometry, demonstrating that just four visual measurements and an IMU can provide robust motion estimation for differential-drive robots. Our key insight is that four downward-facing photodiodes that sense the world through optical Gabor masks produce signals that encode speed. Based on this, we jointly optimize the mask parameters alongside a Temporal Convolutional Network (TCN) using a physically-grounded simulator. The resulting model decodes speed from just the four measurements produced by the photodiodes. Pairing these estimates with the angular speed from an IMU yields a continuous planar trajectory. We validate our approach with a prototype sensor mounted on a differential-drive robot. Across diverse indoor and outdoor terrains, our system closely tracks the reference ground truth without any real-world fine-tuning. Our work shows that minimalist sensing enables efficient and accurate planar odometry.
comment: This work has been submitted to the IEEE for possible publication
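The step in the abstract above where decoded speed is paired with IMU angular speed to produce a planar trajectory can be sketched as plain dead reckoning with a unicycle model. This is an assumed, generic integration scheme (midpoint heading update), not the paper's estimator; the function name and the sample inputs are hypothetical.

```python
import math


def integrate_planar(speeds, yaw_rates, dt, pose=(0.0, 0.0, 0.0)):
    """Dead-reckon (x, y, heading) from forward speed and yaw rate samples.

    speeds: forward speed per step (m/s), e.g. decoded from the photodiodes.
    yaw_rates: angular speed per step (rad/s), e.g. from the IMU gyroscope.
    Uses the heading at the middle of each step to reduce integration error.
    """
    if len(speeds) != len(yaw_rates):
        raise ValueError("speeds and yaw_rates must have equal length")
    x, y, theta = pose
    trajectory = [(x, y, theta)]
    for v, w in zip(speeds, yaw_rates):
        mid_heading = theta + 0.5 * w * dt
        x += v * math.cos(mid_heading) * dt
        y += v * math.sin(mid_heading) * dt
        theta += w * dt
        trajectory.append((x, y, theta))
    return trajectory


if __name__ == "__main__":
    steps = 100
    dt = 0.1
    # Constant speed and yaw rate: the robot drives one full circle.
    path = integrate_planar([0.5] * steps, [2.0 * math.pi / (steps * dt)] * steps, dt)
    print(path[-1])
```

Because the position is obtained purely by integration, any bias in the decoded speed or the gyro accumulates over time, which is why the abstract's comparison against reference ground truth matters.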
Beyond Binary Success: A Diagnostic Meta-Evaluation Framework for Fine-Grained Manipulation
Fine-grained manipulation marks a regime where global scene context no longer suffices, and success hinges on the tight coupling of local attribute grounding, high-fidelity spatial perception, and constraint-respecting motor execution. However, current embodied AI benchmarks collapse these capacities into binary success rates, systematically inflating reported capabilities by up to 70% and masking the architectural bottlenecks that impede real-world deployment. We introduce MetaFine, a diagnostic meta-evaluation framework that disentangles manipulation competency along three axes: understanding, perception, and controlled behavior. Built on a compositional task graph, MetaFine absorbs heterogeneous external benchmarks and reconstructs them into diagnostic scenarios of varying complexity under a unified protocol. Evaluating state-of-the-art vision-language-action (VLA) models through this lens exposes severe dimension-specific failures invisible to conventional metrics. Through targeted causal intervention, we identify the visual encoder's ability to preserve local spatial structure as a key bottleneck for fine-grained precision: improving it directly unlocks previously inaccessible manipulation capabilities without modifying downstream policies. MetaFine further supports hybrid real-sim validation, using limited paired real-world rollouts to calibrate scalable simulation-based estimates for more stable physical benchmarking. By shifting evaluation from ranking to diagnosis, MetaFine turns benchmarking into an actionable compass for repairing the layered capacities underlying genuine physical dexterity. The MetaFine framework, benchmarks, and supporting resources will be publicly released at our project page: https://metafine.github.io/.
comment: Project page: https://metafine.github.io/
CEER: Compliant End-Effector and Root Control as a Unified Interface for Hierarchical Humanoid Loco-Manipulation
Humanoid robots have achieved impressive locomotion performance, yet contact-rich and long-horizon manipulation remains a major bottleneck. Manipulation is inherently contact-rich and demands compliant whole-body control for stable interaction, while its diversity and long-horizon nature favor modular, planner-compatible interfaces over joint-space tracking. We propose CEER, a compliant end-effector-root (EE-root) control abstraction for modular humanoid loco-manipulation within a hierarchical planning framework. CEER enables compliance-aware whole-body control in an interpretable task space defined by root motion commands and end-effector pose targets, and supports plug-and-play integration with heterogeneous high-level planners. A teacher-student framework is adopted to distill a general motion-tracking controller into a low-level policy that consumes only EE-root commands. We further construct a hierarchical system that integrates heterogeneous planners and task modules through the EE-root interface, enabling diverse manipulation tasks without retraining the underlying whole-body policy. Experiments in simulation and on hardware demonstrate 3.3 cm end-effector tracking accuracy with substantially reduced jerk compared to baselines, stable contact-rich manipulation under teleoperation, and up to 70% success in simulated single-object loco-manipulation tasks within a room-scale environment. These results indicate that compliant EE-root control provides a practical abstraction for humanoid loco-manipulation, enabling modular and scalable integration of diverse skills.
comment: Project page: https://robotproject8.github.io/ceer_page/. 9 pages, 7 figures
TravExplorer: Cross-Floor Embodied Exploration via Traversability-Aware 3-D Planning
Zero-shot Object Navigation (ZSON) has shown promise for open-vocabulary target search in unseen environments, yet most existing systems remain tied to planar representations and single-floor assumptions. These assumptions become inadequate in real buildings, where navigation involves floors, stairs, landings, and vertically overlapping spaces. This article presents TravExplorer, a cross-floor embodied exploration framework that couples zero-shot semantic guidance with traversability-aware 3-D planning. TravExplorer maintains a unified volumetric map that distinguishes occupied structures from robot-reachable support surfaces and extracts traversable frontiers from connected support surfaces, including floors, stairs, and landings. A FOV-aware active perception strategy further resolves incomplete observations during cross-floor traversal. To reduce semantic-reasoning latency, a lightweight guidance module aligns a probabilistic instance map from online open-vocabulary segmentation with a spatial value map from fast image-to-text matching. Based on these geometric and semantic memories, a hierarchical planner performs target-aware frontier touring over object hypotheses, traversable frontiers, and stair landmarks, and generates executable cross-floor motions through foothold-guided 3-D search and vertically constrained local trajectory optimization. Experiments over 4,195 simulated episodes on HM3D and MP3D demonstrate consistent advantages over representative ObjectNav baselines. Fifty real-world trials on a Unitree Go2 further validate open-vocabulary target search across single-floor and cross-floor indoor environments without prior maps or human intervention. The code will be released at https://github.com/wuyi2121/TravExplorer.
World-Ego Modeling for Long-Horizon Evolution in Hybrid Embodied Tasks
World models are widely explored in embodied intelligence, yet they typically predict distinct evolutions of the world and the ego within a single stream, where the world captures persistent instruction-agnostic scene regularities and the ego captures robot-centric instruction-conditioned dynamics. This world-ego entanglement leads to a degradation in long-horizon embodied scenarios, particularly in hybrid tasks with interleaved navigation and manipulation behaviors. In this paper, we introduce World-Ego Modeling, a new conceptual paradigm that decomposes future evolution into world and ego components. We define the world-ego boundary from three perspectives, i.e., motion-, semantic-, and intention-based views, and analyze three disentanglement strategies with post-, pre-, and full disentanglement. Further, we instantiate this paradigm as the World-Ego Model (WEM), a unified embodied world model that couples an implicit separate world-ego planner with a cascade-parallel mixture-of-experts (CP-MoE) diffusion generator. To enable rigorous evaluation, we further construct HTEWorld, the first benchmark for long-horizon world modeling with hybrid navigation-manipulation tasks, providing 125K video clips (over 4.5M frames) with fine-grained action annotations and 300 multi-turn evaluation trajectories (over 2K instructions). Extensive experiments show that WEM achieves state-of-the-art performance on HTEWorld while remaining competitive on existing manipulation-only benchmarks.
Robotics-Inspired Guardrails for Foundation Models in Socially Sensitive Domains
Foundation models are increasingly deployed in socially sensitive domains such as education, mental health, and caregiving, where failures are often cumulative and context-dependent. Existing guardrail approaches -- ranging from training-time alignment to prompting, decoding constraints, and post-hoc moderation -- primarily provide empirical risk reduction rather than enforceable behavioral guarantees, and largely treat safety as a property of individual outputs rather than interaction trajectories. We reframe guardrails as a problem of runtime behavioral control over interaction trajectories, drawing on robotics to introduce formal constructs for constraint enforcement in uncertain, closed-loop systems. We instantiate these ideas in the Grounded Observer framework and apply it across three real-world deployments: small talk, in-home autism therapy, and behavioral de-escalation in schools. Across settings, the framework enables runtime interventions that mitigate drift into undesirable interaction regimes while adapting to diverse social contexts. We discuss extensions to the framework and propose research directions toward stronger guarantees.
comment: Under review at Journal of Artificial Intelligence Research (JAIR)
RoHIL: Robust Human-in-the-Loop Robotic Reinforcement Learning Against Illumination Variations
Human-in-the-loop reinforcement learning systems achieve near-perfect success on the workstation where they are trained, but collapse when the same robot is moved to a workstation a few meters away due to shifts in the visual input distribution caused by new lamp positions and window light. Re-collecting demonstrations and re-running HIL on every workstation is incompatible with deployment, and naively fine-tuning on shifted-light data triggers catastrophic forgetting of the source workstation. To close this cross-domain gap, we present RoHIL, an offline fine-tuning framework that uses no extra real-robot interaction. RoHIL combines (i) a world-model-based image relighter that re-synthesises the visual stream of source-workstation trajectories under multiple virtual HDRI environments, leaving actions and rewards real; (ii) Illumination-Retention Replay (IRR), a data-level anti-forgetting mechanism that interleaves relit adaptation transitions with original-light retention transitions to preserve source-workstation Bellman coverage; and (iii) an anchored Bellman-actor regulariser that constrains representation and policy drift from the original source-workstation policy. Across four real-robot manipulation tasks under significant cross-workstation illumination variations, RoHIL substantially improves shifted-light performance where standard HIL-RL collapses, while preserving source-workstation performance, eliminating the need to re-collect data and retrain for every new workstation and environment. Project page: https://anonymous4365.github.io/RoHIL/
Beyond Action Residuals: Real-World Robot Policy Steering via Bottleneck Latent Reinforcement Learning
Pretrained imitation policies have become a strong foundation for robot manipulation, but they often require online improvement to overcome execution errors, limited dataset coverage, and deployment mismatch. A central question is therefore how reinforcement learning (RL) should adapt policies after offline pretraining. Existing lightweight methods commonly apply residual corrections directly in action space, but this often leads to noisy and poorly structured exploration. In this work, we propose Z-Perturbation Reinforcement Learning (ZPRL), an approach that steers pretrained policies through a compact bottleneck latent rather than through policy weights or output actions. During offline training, we augment the policy with a plug-and-play variational information bottleneck (VIB) module to extract a task-relevant latent interface from observation embeddings. During online finetuning, the base policy is frozen and RL learns only a residual perturbation on this latent, whose decoded representation conditions the frozen action generator. We instantiate ZPRL on flow-matching policies and evaluate it on eight simulation tasks and four real-world tasks. Across diverse manipulation settings, ZPRL improves both sample efficiency and final performance over strong post-training baselines. In the real world, ZPRL improves the average success rate on four tasks by 33.7% over imitation base policies while producing smoother exploration behaviors than an action residual counterpart. These results suggest that a compact, task-aligned bottleneck latent provides an effective interface for online RL adaptation. More videos can be found at https://manutdmoon.github.io/ZPRL/.
DAG-Based QoS-Aware Dynamic Task Placement for Networked Multi-Stage Control Pipelines
Current Physical AI (PAI) relies heavily on closed-loop visual-servoing pipelines, whose perception and planning stages can become computationally intensive when complex models run onboard the robot. In practice, statically offloading the perception task to on-site edge nodes is ill-suited to latency-sensitive, high-precision industrial settings that run over a standardized industrial network. This underscores the importance of Control-Communication-Computing (3C) co-design in industrial automation: monolithic local execution saturates AI-accelerated machine and robot hardware, while static edge offloading exposes the control loop to network jitter. Existing adaptive task placement (ATP) controllers only partially address the gap: they relocate a single pipeline stage using binary threshold rules, and they lack both a multi-stage model and an explicit cost on placement switching. In this Work-in-Progress (WiP) paper, we propose a directed acyclic graph (DAG) based, quality-of-service (QoS)-aware dynamic task placement (DTP) framework for sensing-perception-planning-control pipelines in networked robotics. The pipeline is formalized as a DAG with task-level and node-level attributes for compute cost, communication delay, and feasible placement sets. Over a small, interpretable candidate set (fully local, static offload, hybrid), a window-based cost function combines tail end-to-end latency, deadline violation rate, hardware utilization, and a Hamming-distance switching penalty, and a DTP algorithm with hysteresis and a minimum dwell time bounds placement chatter. Our WiP paper presents the theoretical framework, a structured qualitative analysis, and a two-phase simulation plus hardware-in-the-loop validation roadmap.
comment: 4 pages, 1 figure, 1 algorithm, accepted as a Work-in-Progress (WiP) paper at the 24th IEEE International Conference on Industrial Informatics (INDIN), 26-29 July, 2026, Melbourne, Australia
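A minimal sketch of the placement logic described in the DTP abstract above, under stated assumptions: the cost is taken to be a weighted sum of the four named terms, hysteresis is modelled as a relative improvement margin, and all names, weights, and thresholds (`window_cost`, `choose_placement`, `min_dwell`, `hysteresis`) are hypothetical rather than the paper's.

```python
from typing import Dict, Tuple

# One host index per pipeline stage, e.g. 0 = robot, 1 = edge node (illustrative encoding).
Placement = Tuple[int, ...]

def hamming(a: Placement, b: Placement) -> int:
    """Number of stages hosted on a different node in the two placements."""
    return sum(x != y for x, y in zip(a, b))

def window_cost(tail_latency_ms: float, violation_rate: float, utilization: float,
                candidate: Placement, current: Placement,
                weights=(1.0, 100.0, 10.0, 5.0)) -> float:
    """Weighted sum over one observation window; the weights are placeholders."""
    w_lat, w_viol, w_util, w_switch = weights
    return (w_lat * tail_latency_ms + w_viol * violation_rate
            + w_util * utilization + w_switch * hamming(candidate, current))

def choose_placement(costs: Dict[Placement, float], current: Placement,
                     windows_since_switch: int, min_dwell: int = 3,
                     hysteresis: float = 0.1) -> Placement:
    """Switch only after the dwell time has elapsed and the gain exceeds the margin."""
    if windows_since_switch < min_dwell:
        return current
    best = min(costs, key=costs.get)
    if costs[best] < (1.0 - hysteresis) * costs[current]:
        return best
    return current
```

With these defaults, a candidate that is only 5% cheaper than the current placement does not trigger a switch, which is the kind of chatter that the dwell time and hysteresis are meant to bound.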
Trajectory Planning and Control near the Limits: an Open Experimental Benchmark on the RoboRacer Platform SC
We present a modular framework to benchmark new and existing methods for trajectory planning and control in high-acceleration maneuvers that push autonomous driving to the limits. Our framework includes time-optimal raceline generation, online time-optimal velocity replanning, geometric path tracking controllers, and a new model-structured neural network (MS-NN) to learn the inverse dynamics for steering control. We deploy our framework on a 1:10-scale RoboRacer platform, using two circuits. Through several ablations with cautious and aggressive racelines, we study the performance of single modules and their combinations. We show that our MS-NN significantly improves tracking accuracy, decreases steering oscillations, and is physically interpretable. Moreover, online velocity replanning improves lap times by compensating for execution errors, and enables the vehicle to safely reach higher speeds and accelerations. To support future research, our code, datasets, videos and results are publicly available at https://roboracer-benchmark.github.io/planning_control_benchmark/.
comment: Accepted - 2026 IEEE 29th International Conference on Intelligent Transportation Systems (ITSC)
Justifying bio-inspired robotics research: A taxonomy of strategies
For most of human history, we have not thought systematically about how and why we incorporate aspects of the natural world into our designs. The lack of a systematic approach has resulted in inconsistencies in motivations and methods that make it difficult to predict or evaluate the success of bio-inspired design. This mismatch between expectations and results can lead to disappointment when a reader considers a bio-inspired design to be superficial, weak, or incomplete. This is especially true in the field of robotics, in which similarity to a biological system might be the driving motivation for construction. To help robotics researchers justify their specific bio-inspired approach, and to help funding program managers discern the value of different bio-inspired approaches, we propose a taxonomy of motivations for bio-inspired design and describe the significant contributions that are likely to result from different approaches.
CADENet: Condition-Adaptive Asynchronous Dual-Stream Enhancement Network for Adverse Weather Perception in Autonomous Driving
Adverse weather (rain, fog, sand, and snow) degrades camera-based object detection in autonomous vehicles. Existing enhancement-then-detect approaches stall the safety-critical perception loop, violating hard real-time requirements. Progress on this problem is also constrained by an under-recognized evaluation ceiling: ground truth annotated on degraded images cannot credit a detector that recovers objects the annotators themselves could not see, so a genuinely useful enhancement can register as a near-flat F1 gain. This paper presents CADENet (Condition-Adaptive Asynchronous Dual-stream Enhancement Network), a training-free three-thread system: Thread S (YOLOv11n) delivers detections at full frame rate with zero added latency; Thread Q applies condition-adaptive enhancement (CAPE) and fuses results via entropy-guided NMS (EG-NMS) without blocking Thread S; Thread E provides CLIP zero-shot weather classification, so new weather categories require only a new text prompt, with no labeled data and no retraining. Evaluated on 1327 DAWN images (YOLOv11m, IoU = 0.5, confidence = 0.25), CADENet achieves Recall = 0.0103 (micro), F1 = 0.0230 on snow, and F1 = 0.0038 on rain. We formalize the annotation completeness bias on DAWN-class data, so the reported F1 values are lower bounds on the true gain; recall is the annotation-gap-immune headline metric. Thread S sustains approximately 44 FPS regardless of enhancement load. No model retraining or additional sensor hardware is required.
From Prompts to Pavement Through Time: Temporal Grounding in Agentic Scene-to-Plan Reasoning
Recent attempts to support high-level scene interpretation and planning in Autonomous Vehicles (AVs) using ensembles of Large Language Models (LLMs) and Large Multimodal Models (LMMs) continue to treat time as a secondary property. This lack of temporal grounding leads to inconsistencies in reasoning about continuous actions, undermining both safety and interpretability. This work explores whether temporal conditioning within inter-agent communication can preserve or enhance coherence without introducing degradation in semantic or logical consistency. To investigate this, we introduce three planner architectures with progressively increasing temporal integration and evaluate them on curated subsets of the BDD-X dataset using semantic, syntactic, and logical metrics. Results show that while temporal conditioning reshapes reasoning style, it yields no statistically significant improvements in standard NLP-based correctness metrics. However, qualitative analysis reveals predictive hazard reasoning, stable corrective behavior, and strategic divergence in the Sentinel. These findings clarify the limits of prompt-based temporal grounding and establish the first empirical benchmark for temporal scene-to-plan reasoning.
Beyond Imitation: Learning Safe End-to-End Autonomous Driving from Hard Negatives
Existing imitation learning methods for end-to-end autonomous driving predominantly learn from successful demonstrations by minimizing geometric deviations from expert trajectories. This paradigm implicitly assumes that spatial proximity implies behavioral safety, leading to a critical objective mismatch: trajectories with nearly identical imitation losses may exhibit drastically different safety outcomes, where one remains recoverable while the other results in collision. To address this limitation, we propose BeyondDrive, a failure-aware imitation learning framework that jointly learns from successful and failed driving behaviors. First, we introduce a flow matching-based negative trajectory generator that synthesizes safety-critical yet expert-proximate trajectories, enabling explicit modeling of safety asymmetry. Second, we develop a diversity-aware sampling strategy that mitigates mode collapse and improves coverage of diverse failure modes during negative trajectory generation. Third, we propose a Repulsive Distance Loss that simultaneously attracts predictions toward expert demonstrations while repelling them from hard negative trajectories, thereby establishing discriminative safety boundaries in trajectory space. Applied to the uni-modal baseline Latent TransFuser, BeyondDrive achieves 89.7 PDMS on the NAVSIMv1 closed-loop benchmark, outperforming prior state-of-the-art methods. Moreover, BeyondDrive generalizes effectively across different autonomous driving architectures, including multi-modal planners, and further demonstrates strong zero-shot transferability on the HUGSIM benchmark.
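To make the attract-and-repel idea concrete, here is a minimal sketch. It assumes a mean pointwise Euclidean distance between trajectories and a hinge margin on the repulsive term; the abstract does not give the actual form of the Repulsive Distance Loss, so `attract_repel_loss`, `margin`, and `repel_weight` are hypothetical.

```python
import math

def mean_distance(traj_a, traj_b):
    """Mean pointwise Euclidean distance between two equal-length 2D trajectories."""
    return sum(math.dist(p, q) for p, q in zip(traj_a, traj_b)) / len(traj_a)

def attract_repel_loss(pred, expert, negatives, margin=1.0, repel_weight=1.0):
    """Pull the prediction toward the expert; push it at least `margin` away from negatives."""
    attract = mean_distance(pred, expert)
    repel = 0.0
    if negatives:
        # Hinge term: a negative only contributes while it is closer than the margin.
        repel = sum(max(0.0, margin - mean_distance(pred, n)) for n in negatives) / len(negatives)
    return attract + repel_weight * repel
```

In this toy form, a prediction that sits on a negative trajectory close to the expert is penalised more than its imitation error alone would suggest, which is the safety asymmetry the abstract describes.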
KIO-planner: Attention-Guided Single-Stage Motion Planning with Dual Mapping for UAV Navigation
Autonomous UAV flight in confined, wall-dense environments requires low-latency and reliable motion planning under strict safety constraints. Traditional optimization-based planners suffer from mapping latency and easily fall into local minima when navigating through dense structural obstacles. Meanwhile, existing end-to-end learning methods struggle to extract fine-grained geometric features from raw depth images and lack hard kinodynamic constraints, leading to unpredictable collisions near walls. To address these issues, we propose KIO-planner, an attention-guided single-stage trajectory planning framework. First, we integrate a Convolutional Block Attention Module (CBAM) into the perception backbone to adaptively focus on critical structural edges and traversable space. Second, we introduce a novel Dual Mapping mechanism--comprising physical bounds activation and a deterministic Geometric Safety Shield in the depth-pixel space--to enforce kinodynamic feasibility and collision-free flight without global map fusion. Extensive high-fidelity simulated experiments demonstrate that KIO-planner enables highly agile navigation at speeds up to 3.0 m/s. Compared to the state-of-the-art baseline, KIO-planner achieves lower inference latency (approximately 24 ms) and generates significantly smoother trajectories, reducing control cost by 28.4%. Most notably, our Dual Mapping substantially increases the worst-case safety margin, measured by minimum distance to obstacles, from 0.48 m to 0.76 m, ensuring fast, smooth, and safer navigation in highly constrained environments.
comment: Accepted by an IEEE Vehicular Technology Conference. 6 pages, 4 figures, 1 table
Multi-Session Ground Texture SLAM in Low-Dynamic Environments
The simultaneous localization and mapping community has introduced a growing number of systems adapted for multi-session operations where the operational environment features low-dynamic changes that impact mapping, such as surface wear, weather phenomena, or seasonal change. These systems allow for lifelong operations by a robot within these environments. There is also growing interest in operations in environments where the unique ground texture is the only mapping feature available for use. However, these ground texture systems do not yet target multi-session environments with low-dynamic change. This work explores the impact of three different techniques on trajectory estimation accuracy in these multi-session low-dynamic ground texture environments. Of the three, the use of Kullback-Leibler Divergence, as a similarity score and a bias influencing loop closure confidence, proves the most successful. We present an analysis of all three methods and a deeper exploration of the impact of Kullback-Leibler Divergence. We also introduce a dataset for use by the robotics community that contains multi-session images where the ground changes between sessions, along with high-accuracy pose information for use in evaluation.
comment: 8 pages, 9 figures. To appear at the 23rd International Conference on Ubiquitous Robots, Osaka, Japan. Distribution Statement A: Approved for public release; distribution is unlimited, as submitted under NAVAIR Public Release Authorization 2025-0098
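As a hedged illustration of using Kullback-Leibler divergence as a similarity score in the ground texture entry above, the sketch below compares two feature histograms. What the paper actually compares is not stated in the abstract; the histogram inputs, the smoothing constant, and the `exp(-KL)` mapping to a score in (0, 1] are assumptions.

```python
import math

def normalize(hist, eps=1e-9):
    """Turn raw counts into a probability vector, with light smoothing to avoid log(0)."""
    total = sum(hist) + eps * len(hist)
    return [(h + eps) / total for h in hist]

def kl_divergence(p_hist, q_hist):
    """D_KL(P || Q) for two histograms over the same bins."""
    p, q = normalize(p_hist), normalize(q_hist)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def kl_similarity(p_hist, q_hist):
    """Map divergence to a score in (0, 1]; identical histograms score 1."""
    return math.exp(-kl_divergence(p_hist, q_hist))
```

Note that KL divergence is asymmetric, so a real system would need to fix which session plays the role of P and which of Q.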
D-CLING: Prior-Preserving Depth-Conditioned Fine-Tuning for Navigation Foundation Models ICRA 2026
Navigation Foundation Models (NFMs) trained on large cross-embodied datasets have demonstrated powerful generalizability in various scenarios. Adopting in-domain fine-tuning for an NFM efficiently calibrates the visuomotor policy, promising further improvement even in a novel scenario. However, the fine-tuned models still suffer from poor obstacle avoidance or fail to properly reach the provided goals. Furthermore, model updates using a small subset of data typically erode the pre-trained prior, compromising the pre-training generalization. Consequently, fine-tuning deteriorates the capability of the model for robust and accurate navigation. In this work, we present a novel fine-tuning method that leverages large-scale pre-training while efficiently learning in novel setups, such as environments or camera configurations. In particular, inspired by ControlNet, we fine-tune an NFM by attaching a trainable copy of the pre-trained backbone using zero-initialized residual pathways, thereby learning geometric cues. This design enables the model to efficiently acquire in-domain geometry while preserving pre-trained knowledge across various behaviors. Despite its simplicity, our comprehensive evaluation of real-world navigation suggests that our proposal effectively enables robust long-horizon navigation with minimal collisions and human intervention. Additionally, our offline analysis shows that the proposed method maintains or further improves action prediction capabilities beyond the fine-tuned dataset, providing a key insight into continual learning for general navigation. The project page: https://toyotafrc.github.io/DCLING-Proj/
comment: This paper has been accepted to the 2026 IEEE International Conference on Robotics and Automation (ICRA 2026), which will be held in Vienna, Austria, from June 1 to 5, 2026
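The zero-initialized residual pathway mentioned in the D-CLING abstract above can be sketched with toy linear layers. This is an illustration of the general ControlNet-style idea only: the two-by-two weights, the way depth is mixed into the copy's input, and the name `adapted_forward` are all hypothetical.

```python
def linear(weights, x):
    """Matrix-vector product for a small dense layer."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def adapted_forward(x, depth, w_frozen, w_copy, w_zero):
    base = linear(w_frozen, x)        # frozen pre-trained pathway
    side_in = add(x, depth)           # toy way to condition the copy on depth features
    side = linear(w_copy, side_in)    # trainable copy of the backbone
    residual = linear(w_zero, side)   # projection that starts at all zeros
    return add(base, residual)

w_frozen = [[1.0, 2.0], [0.5, -1.0]]
w_copy = [row[:] for row in w_frozen]   # the copy starts from the pre-trained weights
w_zero = [[0.0, 0.0], [0.0, 0.0]]
x, depth = [1.0, 2.0], [0.3, 0.7]
out_at_init = adapted_forward(x, depth, w_frozen, w_copy, w_zero)
```

Because the projection starts at zero, `out_at_init` equals the frozen model's output, so fine-tuning begins from the pre-trained behaviour and departs from it only as the projection is learned.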
RoVLA: Multi-Consistency Constraints for Robust Vision-Language-Action Models
Vision-Language-Action (VLA) models have shown strong performance on embodied manipulation, yet they remain brittle under visual observation changes, paraphrased language instructions, and compounded perturbations. This limitation suggests that existing methods still rely heavily on shallow correlations in the training distribution, rather than learning stable couplings among task semantics, environment states, and action generation. Although recent efforts improve robustness through larger-scale training, post-training adaptation, or enhanced predictive modeling, they rarely enforce invariance-oriented consistency within the end-to-end policy itself. To address this issue, we propose RoVLA, a robust vision-language-action framework with multi-consistency constraints. RoVLA enforces consistency under three complementary transformations: instruction semantics, trajectory evolution, and observation perturbation. Specifically, Instructional Consistency (IC) promotes stable grounding under semantically equivalent instruction rewrites, Evolutionary Consistency (EC) preserves coherent action intent throughout the generation process, and Observational Consistency (OC) improves robustness to visual and proprioceptive perturbations by enforcing consistent predictions before and after targeted disturbances. By explicitly modeling these invariances during training, RoVLA reduces reliance on superficial correlations and improves robustness and generalization. Experiments on LIBERO-Plus, RoboTwin 2.0, and real-world manipulation tasks show that RoVLA consistently outperforms strong baseline methods and exhibits superior robustness under diverse task and observation shifts. These results demonstrate the effectiveness of multi-consistency learning for robust embodied control. Code will be available at https://github.com/HCPLab-SYSU/RoVLA.
HEAT: Heterogeneous End-to-End Autonomous Driving via Trajectory-Guided World Models
End-to-end autonomous driving has emerged as a compelling alternative to traditional modular pipelines by directly mapping raw sensor data to driving actions. While recent approaches achieve strong performance on single-domain datasets, their performance degrades significantly when trained jointly across multiple heterogeneous domains. In practice, however, autonomous systems must operate across diverse environments with heterogeneous distributions, including different cities, sensor configurations, and traffic patterns, without domain-specific retraining. This gap highlights a key challenge in multi-domain learning: domain-specific variations across heterogeneous domains introduce conflicting learning signals, driving models toward compromised solutions that are suboptimal across domains. To address this, we propose a trajectory-driven learning paradigm that organizes training around planning trajectories, enabling the model to capture domain-invariant representations of driving intent. Furthermore, we incorporate a world model that predicts future latent features conditioned on ego actions, improving feature consistency and mitigating domain-induced biases. We evaluate our approach on three benchmarks, nuScenes, NAVSIM, and the Waymo end-to-end dataset, and show substantial improvements over existing methods across all domains. Our results demonstrate that a single unified model can be trained on heterogeneous datasets while maintaining strong performance within each domain, highlighting a step toward scalable real-world deployment. We will make our code publicly available.
FlyMirage: A Fully Automated Generation Pipeline for Diverse and Scalable UAV Flight Data via Generative World Model
In the field of Vision-Language Navigation (VLN), aerial datasets remain limited in their ability to combine scale, diversity, and realism, often relying on either costly real-world scenes or visually limited simulations. To address these challenges, we introduce FlyMirage, a highly scalable and fully automated data generation pipeline for aerial VLN. Our approach uses a large language model (LLM) as an environment designer to promote scene diversity, paired with a generative world model that instantiates these designs into high-fidelity 3D Gaussian Splatting (3DGS) scenes. To substantially reduce human labor and ensure the feasibility of flight data, FlyMirage automates scene exploration and semantic information acquisition, and further integrates a dynamically feasible planner for uncrewed aerial vehicle (UAV) trajectory generation. Using this toolchain, we generate a large-scale, diverse, and photorealistic aerial VLN dataset with dynamically feasible flight trajectories, designed to support the development of next-generation embodied navigation models.
MCNav: Memory-Aware Dynamic Cognitive Map for Zero-shot Goal-oriented Navigation
Navigating to instance-level targets in complex environments is a challenging problem. Many existing zero-shot methods achieve strong performance by modeling the entire environment and leveraging large language models for scene understanding. However, such strategies primarily focus on exploring new regions while lacking a deeper exploitation of information from previously explored areas. Consequently, when targets are missed or misidentified within previously visited regions, navigation failures occur frequently. To address these limitations, we propose MCNav, a memory-aware navigation framework with a dynamic cognitive map. This map stores efficiently queryable information about relevant objects in explored areas. Building on this memory structure, MCNav introduces two memory-aware exploration strategies: goal re-validation, which re-assesses previously seen objects to correct matching failures, and missed goal re-exploration, which estimates the likelihood that a target is present in an explored region from contextual cues. These strategies are further stabilized by a blacklist mechanism to prevent repeated errors and a double-check mechanism for high-confidence confirmation. We evaluate MCNav on the HM3Dv1 and HM3Dv2 datasets across three different tasks, where it achieves state-of-the-art performance, particularly on the instance-level goal navigation task.
Implicit Action Chunking for Smooth Continuous Control
Reinforcement learning often produces high-frequency oscillatory control signals that undermine the safety and stability required for physical deployment. Explicit action chunking addresses this by predicting fixed-horizon trajectories but scales the policy output dimension proportionally with the horizon length, leading to optimization difficulties and incompatibility with standard step-wise interaction. To overcome these challenges, this paper proposes Dual-Window Smoothing (DWS), an implicit action chunking framework for smooth continuous control. Unlike explicit methods, DWS enforces temporal coherence without expanding the action space. It uses a dual-window design: an execution window that ensures physical smoothness through deterministic modulation, and a value window that aligns temporal-difference targets over the horizon to correct critic bias caused by open-loop execution. DWS also includes a lightweight actor-side temporal regularizer based on first-order action differences to promote global continuity. This design effectively bridges the gap between temporal abstraction and reactive step-wise control. Experiments on benchmarks including the DeepMind Control Suite and industrial energy management tasks show that DWS outperforms state-of-the-art (SOTA) baselines. In complex vision-based autonomous driving tasks, DWS achieves smoother control and safer behavior with reduced jitter, and attains a 100% success rate.
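A regularizer based on first-order action differences can be sketched as follows. The squared-difference form and the averaging are assumptions, since the abstract names only the ingredient; `action_smoothness_penalty` is a hypothetical name.

```python
def action_smoothness_penalty(actions):
    """Mean squared first-order difference over a sequence of action vectors."""
    if len(actions) < 2:
        return 0.0
    total = 0.0
    for prev, curr in zip(actions, actions[1:]):
        # Squared change between consecutive actions, summed over action dimensions.
        total += sum((c - p) ** 2 for p, c in zip(prev, curr))
    return total / (len(actions) - 1)
```

A constant action sequence incurs zero penalty, while a sequence that flips sign every step is penalised heavily, so adding this term to the actor loss discourages jitter without changing the action dimension.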
PAPO-VLA: Planning-Aware Policy Optimization for Vision-Language-Action Models
Vision-Language-Action (VLA) models show promising ability in language-guided robotic tasks. However, making VLA policies reliable remains challenging, because a manipulation task is completed through closed-loop interaction, where each action affects subsequent execution. To analyze this problem, we revisit VLA policy during execution and argue that a VLA policy acts both as a planner, which makes task-oriented decisions that change the direction of execution, and as an executor, which realizes these decisions through dense continuous actions. This view suggests that improving VLA reliability requires particular attention to planning actions. Existing optimization methods can imitate actions or improve complete trajectories, but they usually do not explicitly identify planning actions or measure their importance for task success. To address this issue, we propose Planning-Aware Policy Optimization for VLA models (PAPO-VLA). PAPO-VLA first identifies planning actions by jointly considering action variation and trajectory outcome, then estimates their importance through causal sufficiency and causal necessity, and finally incorporates this importance into GRPO advantage estimation. In this way, more important planning actions receive stronger optimization emphasis, while the whole trajectory is still optimized by trajectory-level feedback. Experiments on multiple benchmarks demonstrate the effectiveness of PAPO-VLA.
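One way to picture how a per-action importance score could enter GRPO-style advantages is sketched below. The group-relative standardization follows the usual GRPO recipe; the multiplicative weighting `1 + lam * importance` is purely an assumption for illustration, and the function names are hypothetical.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize trajectory rewards within one sampled group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def weighted_step_advantages(advantage, importance, lam=0.5):
    """Every step keeps the trajectory-level advantage; important steps get extra weight."""
    return [advantage * (1.0 + lam * w) for w in importance]
```

Under this weighting, steps with zero importance still receive the trajectory-level signal, which matches the abstract's statement that the whole trajectory remains optimized by trajectory-level feedback.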
Learning-Accelerated Optimization-based Trajectory Planning for Cooperative Aerial-Ground Handover Missions
This paper presents a learning-augmented trajectory planning framework for cooperative unmanned aerial vehicle (UAV) and unmanned ground vehicle (UGV) handover missions. While centralized trajectory optimization ensures dynamic feasibility and task optimality, its high computational cost limits real-time applicability. We propose a neural surrogate planner utilizing decoupled encoder-decoder long short-term memory (LSTM) networks to generate coordinated handover trajectory predictions from the task specifications. These predictions serve as informed warm starts for the downstream centralized optimizer, thereby accelerating convergence to dynamically feasible solutions. Benchmark evaluations demonstrate that, compared with cold-start optimization, the learning-augmented planning framework achieves more than a threefold speedup and a 100% optimization success rate. The results indicate that combining data-driven inference with model-based refinement enables fast and reliable trajectory generation for heterogeneous multi-robot systems.
comment: Preprint of a contribution accepted for publication in the RoManSy 2026 Springer proceedings
SafeAlign-VLA: A Negative-Enhanced Safe Alignment Framework for Risk-Aware Autonomous Driving
End-to-end autonomous driving systems excel in common scenarios but struggle with safety-critical long-tail cases. Vision-Language-Action (VLA) models are promising due to their strong reasoning capabilities. However, most VLA-based approaches rely on positive expert demonstrations, rarely exploiting negative samples, leading to insufficient understanding of risky behaviors and safety boundaries. To address this limitation, we propose SafeAlign-VLA, a unified negative-enhanced safe alignment framework that incorporates negative data into supervised learning and reinforcement learning. First, we develop a counterfactual safety pairing paradigm to generate structured safety labels and counterfactual positive trajectories from risky scenarios via counterfactual reasoning. Then, a two-stage training strategy is adopted: negative-enhanced supervised fine-tuning for failure feedback and trajectory correction, followed by anchor-based group relative policy optimization that uses positive and negative trajectories as contrastive anchors to steer sampling and penalize high-risk behaviors via group-relative advantages. Experiments on NAVSIM and DeepAccident validate the proposed framework. SafeAlign-VLA achieves 89.1 PDMS on the NAVSIM v1 test set, improving over the baseline without negative data by 1.3%. On DeepAccident, it reduces the collision rate to 3.36%, while achieving 84.2% language accuracy and 85.8% risk prediction accuracy. These results demonstrate the effectiveness of the proposed negative-enhanced safe alignment framework for safe and robust autonomous driving.
ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders
Reinforcement learning for legged locomotion has matured into a stack of multi-component reward functions and physics-engine benchmarks whose morphologies are uniformly derived from real commercial hardware. Game NPCs, however, are bound by stylistic constraints absent from sim-to-real robotics and routinely take the form of creatures with no real-robot counterpart. We introduce ARC-RL, a suite of four MuJoCo continuous-control environments featuring robotic morphologies inspired by the bestiary of ARC Raiders: the 18-DoF tall hexapod Queen, the 12-DoF armoured hexapod Bastion, the 18-DoF compact hexapod Tick, and the 12-DoF quadruped Leaper. All four robots share a unified observation template, action convention, simulation cadence, and a single closed-form multi-component reward function whose only per-morphology variation lives in a small set of weights and parameters. The reward fuses a velocity-tracking tent, a healthy survive bonus, a phase-locked gait-compliance bonus/cost pair, action regularisers, three safety penalties, and a posture anchor; no motion-capture data enters the reward at any point. We additionally provide hand-crafted Central Pattern Generator demonstrators per morphology, which serve both as fixed expert references and as sources of prior data for offline-to-online training. On this playground, we conduct a controlled empirical study comparing standard online algorithms (SAC, SPEQ, SOPE-EO) and methods augmented with prior data (SACfD, SPEQ-O2O, SOPE), and characterise how each paradigm copes with the playground's morphological diversity and animation-style stylistic constraints.
CANINE: Coaching Visually Impaired Users for Interactive Navigation with a Robot Guide Dog
Robot guide dogs offer navigation assistance that greatly expands the independent mobility of the visually impaired, but their effective use requires subtle human-robot coordination that is difficult for users to learn from generic verbal instructions. To tackle this challenge, we present CANINE, an automated coaching system that trains users for interactive navigation with a robot guide dog, through personalized, adaptive verbal feedback. CANINE decomposes a complex coordination task into sub-skills and operates at two levels. At the high level, it decides what to train by tracking the learner's proficiency across sub-skills using knowledge tracing and prioritizing training on the weakest areas. At the low level, CANINE decides how to train each sub-skill by observing each human practice episode, using foundation models to infer the underlying causes of errors, and generating targeted verbal corrections adaptively. A controlled study with blindfolded participants, treated as a proxy population for quantitative evaluation, demonstrates that CANINE significantly improves both learning efficiency and final navigation performance compared to generic verbal instructions. We further validate CANINE through a retention study and an exploratory case study. The retention study shows lasting skill improvement after two weeks. The case study confirms CANINE's effectiveness in training a visually impaired user, while revealing additional design considerations for real-world deployment. Both are well aligned with the findings of the controlled study. Project page: https://cunjunyu.github.io/project/canine/
comment: Accepted to RSS 2026
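The high-level loop in the CANINE abstract above (track proficiency per sub-skill, then train the weakest) can be sketched with standard Bayesian knowledge tracing. The abstract says only "knowledge tracing", so the choice of BKT, the slip/guess/learn values, and the sub-skill names are all assumptions.

```python
def bkt_update(p_mastery, correct, slip=0.1, guess=0.2, learn=0.1):
    """One Bayesian knowledge tracing step after observing a practice episode."""
    if correct:
        num = p_mastery * (1.0 - slip)
        den = num + (1.0 - p_mastery) * guess
    else:
        num = p_mastery * slip
        den = num + (1.0 - p_mastery) * (1.0 - guess)
    posterior = num / den
    # Chance that the skill was learned during this episode.
    return posterior + (1.0 - posterior) * learn

def weakest_skill(mastery):
    """Pick the sub-skill with the lowest estimated mastery to train next."""
    return min(mastery, key=mastery.get)

# Hypothetical sub-skills, for illustration only.
mastery = {"turning": 0.5, "stopping": 0.5, "obstacle_avoidance": 0.5}
mastery["turning"] = bkt_update(mastery["turning"], correct=True)
mastery["stopping"] = bkt_update(mastery["stopping"], correct=False)
next_skill = weakest_skill(mastery)
```

After one success on turning and one failure on stopping, the stopping estimate falls below the others, so it is selected for the next round of practice.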
Closed-Loop Hybrid Digital Twin Platform for Connected and Automated Vehicle Validation
Comprehensive and efficient validation of connected and automated vehicles (CAVs) is critical prior to real-world deployment. While simulation-based testing offers scalability, existing approaches often lack seamless integration with real vehicles and field data, limiting their fidelity in capturing dynamic, real-world interactions. To bridge this gap, this paper proposes a novel real-time hybrid digital twin platform. Its core innovation lies in the tight coupling of a high-fidelity CARLA-SUMO co-simulation with a physical test site and vehicle via a low-latency Vehicle-to-Everything (V2X) communication link. A custom-developed middleware serves as the critical bridge, synchronizing a real CAV's kinematic state as a shadow vehicle in the simulation and translating virtual control commands into chassis-actuating Controller Area Network (CAN) messages for closed-loop control. Detailed implementation includes using photogrammetry for full-scale asset reconstruction and a cloud-edge collaborative architecture for scalable, multi-user operation. Experimental results demonstrate stable synchronization and effective closed-loop control with low latency, confirming the platform's practicality for multi-scenario CAV verification.
Sampling-Based Safe Reinforcement Learning
Safe exploration remains a fundamental challenge in reinforcement learning (RL), limiting the deployment of RL agents in the real world. We propose Sampling-Based Safe Reinforcement Learning (SBSRL), a model-based RL algorithm that maintains safety throughout the learning process by enforcing constraints jointly across a finite set of dynamics samples. This formulation approximates an intractable worst-case optimization over uncertain dynamics and enables practical safety guarantees in continuous domains. We further introduce an exploration strategy based on constraining epistemic uncertainty, eliminating the need for explicit exploration bonuses. Under regularity conditions, we derive high-probability guarantees of safety throughout learning and a finite-time sample complexity bound for recovering a near-optimal policy. Empirically, SBSRL achieves safe and efficient exploration both in simulation and in real robotic hardware, and readily extends to practical deep-ensemble implementations that scale to high-dimensional continuous control problems.
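The idea of enforcing constraints jointly across a finite set of dynamics samples can be sketched on a toy problem. The scalar linear model, the parameter ranges, the state bound, and all function names below are assumptions chosen for illustration; the sketch shows only the joint feasibility check, not the SBSRL algorithm itself.

```python
import random


def rollout(a, b, x0, actions):
    """Simulate the toy scalar model x_next = a * x + b * u."""
    xs = [x0]
    for u in actions:
        xs.append(a * xs[-1] + b * u)
    return xs


def jointly_safe(models, x0, actions, x_max):
    """True only if the plan satisfies |x| <= x_max under every sampled model."""
    return all(
        abs(x) <= x_max
        for a, b in models
        for x in rollout(a, b, x0, actions)
    )


rng = random.Random(0)
# Finite set of dynamics samples standing in for a posterior over models.
models = [(rng.uniform(0.9, 1.1), rng.uniform(0.8, 1.2)) for _ in range(20)]

cautious = [0.05] * 5
aggressive = [1.0] * 5
cautious_ok = jointly_safe(models, 0.0, cautious, x_max=1.0)
aggressive_ok = jointly_safe(models, 0.0, aggressive, x_max=1.0)
```

A plan is accepted only when it is feasible for all sampled models at once, which is the finite-sample stand-in for a worst-case constraint over the uncertain dynamics.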
Self-assembling Modular Aerial Robot for Versatile Aerial Tasks
Multirotor aerial robots excel at maneuvering in three-dimensional space, and recent advances enable nimble navigation in cluttered and confined environments, especially for small airframes. By contrast, platforms built for high-altitude work tend to be larger to deliver high thrust for stable physical interaction with the environment. However, these conflicting design requirements create a long-standing trade-off between nimble navigation and robust aerial manipulation. Here, we present LEGION units, which are reconfigurable modular aerial robots capable of in-flight self-assembly for cooperative manipulation, drawing inspiration from the self-organized collectives formed by ants. Each unit retains nimble maneuverability while joint-equipped docking interfaces at both ends enable end-to-end self-assembly into a flying manipulator. We show that multiple units autonomously dock in flight; once latched, they maintain a zero-clearance interlock by controlling the contact force and torque, enabling reliable aggregation and articulated motion even outdoors. We further show that self-reconfigurability enables morphological switching between nimble individual flight and collective articulated manipulation, while realizing core in-flight manipulation primitives including pushing, pulling, rotating, grasping, and carrying. LEGION's self-organization enables aerial robots, especially in swarms, to shift from passive observers to active participants in their environment, broadening the scope of aerial physical interaction.
Neuromorphic Control of a Flapping-Wing Robot on Resource-Constrained Hardware
Flapping-Wing Micro Aerial Vehicles (FWMAVs) provide exceptional maneuverability and aerodynamic efficiency but pose significant challenges for onboard control due to nonlinear dynamics and stringent Size, Weight, and Power (SWaP) constraints, as exemplified by a butterfly-inspired robot weighing less than 30 grams. To this end, we present a hierarchical neuromorphic control framework that enables fully onboard, closed-loop flight on a widely available, resource-constrained ESP32 microcontroller with a unit cost of approximately $5. Specifically, our method deploys two lightweight Spiking Neural Networks (SNNs) onboard: one for state estimation from raw sensory feedback and another for control via modulation of a Central Pattern Generator (CPG) for wing actuation. Trained by imitation learning, the system achieves stable pitch and heading angle tracking during untethered real-world flight. Experimental results further reveal that the SNN-based controller reduces inference latency by 36% (1059 µs to 680 µs) and inference power by 18% (0.033 W to 0.027 W) compared to the conventional Artificial Neural Network (ANN) baseline, demonstrating the viability of spike-based computation without specialized hardware. To the best of our knowledge, this work constitutes the first demonstration of fully onboard neuromorphic control for autonomous flight of an FWMAV, highlighting the potential of SNNs to enable energy-efficient autonomy under stringent SWaP constraints. Visual abstract: http://bit.ly/4nI8ECY
Beyond Waypoints: Dual-Heatmap Grounding for Cross-Embodiment Semantic Navigation
Grounding open-ended semantic instructions into physically executable local goals is a fundamental challenge in human-robot interaction. While existing navigation frameworks often regress deterministic waypoints, this rigid formulation collapses spatial uncertainty and frequently targets non-traversable object centers, leading to severe execution failures. In this work, we focus on the practical setting of in-FOV semantic navigation, where a robot receives concise, interleaved multimodal (text and image) prompts. To bridge the gap between abstract semantic intent and physical reachability, we propose a unified Vision-Language framework that abandons single-point regression in favor of a Dual-Heatmap representation. Our framework predicts a navigation affordance heatmap that captures continuous reachable regions, coupled with a facing heatmap for orientation constraints. These dense outputs inherently function as a differentiable semantic potential field, integrating seamlessly with downstream local planners. To support this paradigm, we build a fully automated, foundation-model-assisted synthetic data pipeline and establish a comprehensive simulation benchmark. Extensive experiments demonstrate that our framework achieves state-of-the-art performance among comparable 8B baselines. Crucially, a feature-fusion study and simulation studies across diverse robot embodiments (Jetbot, H1, Aliengo) reveal that explicit heatmap prediction drastically improves the Affordance Rate (AR). By placing targets reliably in executable free space, our framework effectively mitigates the brittleness of point regression, offering a transferable path toward safe cross-embodiment semantic navigation.
RoboJailBench: Benchmarking Adversarial Attacks and Defenses in Embodied Robotic Agents
Recent advances in Vision-Language Models (VLMs) facilitate a new class of embodied AI systems, where these models are integrated into physical platforms, e.g., robots and autonomous vehicles, to interpret visual scenes and execute natural language commands in diverse environments. Previous research has introduced jailbreak attacks and defenses for embodied AI. Their evaluations, however, rely on ad-hoc datasets and limited metrics, and they emphasize attack success while neglecting the trade-off between security and the ability to follow benign commands. Existing benchmarks and evaluation frameworks either target traditional chat-based models or focus on non-adversarial safety evaluation for embodied AI; neither captures the adversarial risks, inputs, consequences, and evaluation criteria necessary for jailbreak attacks in embodied AI systems. In this paper, we address this gap with RoboJailBench, which consists of three core components. First, we establish a security taxonomy derived from ISO standards, regulatory rules, and documented incidents. This effort yields 18 categories of security violation consequences for embodied AI. Second, we introduce an intent contrast dataset pipeline that augments existing datasets with paired adversarial and benign goals to measure both security and utility. Lastly, we provide an evolving repository with standardized metrics and a unified process for assessing and integrating new attacks and defenses. With this benchmark, we construct a new taxonomy-balanced dataset and augment five existing datasets. We integrate four attacks and two defenses to evaluate their performance on leading embodied VLMs. This benchmark provides the first standardized evaluation framework for jailbreak attacks in embodied AI and supports future research. We release our code, datasets, and artifacts, and maintain a leaderboard at https://purseclab.github.io/benchmark-for-robotics-security.
ContextFlow: Hierarchical Task-State Alignment for Long-Horizon Embodied Agents
Long-horizon embodied agents increasingly delegate navigation, search, approach, and manipulation to specialist executors. As these executors become stronger, the main bottleneck shifts from local skill execution to maintaining a coherent task frontier across planning, monitoring, memory, and execution. We study task-state misalignment, a task-level consistency failure in which the planner's active stage, runtime evidence, remembered context, and delegated executor no longer justify the same next-step decision. This failure can lead to unsupported handoffs, stage lock, executor-context mismatch, and unnecessary replanning. We propose ContextFlow, an inspectable alignment framework that represents stages as explicit contracts, converts runtime observations into evidence packets, and applies scoped updates including continue, refine, transfer, promote, and repair. ContextFlow keeps specialist executors responsible for local closed-loop control while making task-frontier alignment explicit and auditable. Experiments and demonstration traces on long-horizon embodied tasks illustrate how evidence-grounded scoped updates diagnose and mitigate recurring task-state failures.
DEFLECT: Delay-Robust Execution via Flow-matching Likelihood-Estimated Counterfactual Tuning for VLA Policies
Vision-Language-Action (VLA) policies are typically deployed with asynchronous inference: the robot executes a previously predicted action chunk while the model computes the next one. This creates a prediction-execution misalignment: the chunk is conditioned on the observation taken before inference began, but executes in a physical state that has already drifted forward by several control steps; naive asynchronous rollover collapses from 89% to under 1% on Kinetix as the inference cycle covers up to seven control steps. We introduce DEFLECT, a fully offline post-training refinement that applies as a near drop-in upgrade to existing async-VLA stacks by converting latency itself into a label-free preference signal: counterfactual fresh/stale action pairs are constructed from a frozen reference policy and scored under the deployment-time conditioning via an implicit flow-matching likelihood-ratio surrogate, with no human labels, reward models, or online rollouts. DEFLECT substantially extends the usable delay envelope of async VLA control, with +6.4 success-rate gain in the high-latency regime (5-7 control steps), +4.6 when transferred to a real-scale VLA at the longest delay, and consistent improvements on two real-robot tasks (a bimanual conveyor pick-and-place and a reactive whack-a-mole).
Domain-Adaptive Communication-Rate Optimization for Sim-to-Real Humanoid-Robot Wireless XR Teleoperation
Wireless extended reality (XR) teleoperation provides embodied interaction capability for collecting humanoid robot demonstrations, but its large-scale adoption is restricted by the overhead of high-frequency motion transmission. This paper develops a system framework that integrates sampling, transmission, interpolation, and reconstruction, and formulates a communication-rate optimization problem that aims to minimize the communication energy while maintaining the reconstruction accuracy of robot motion trajectories through dimension-wise sampling-rate control. Since acquiring real-time feedback from physical robots is limited by hardware costs, it is necessary to solve the problem through simulator interaction with offline real-domain data correction. To guide sim-to-real adaptation, we provide a PAC-Bayes generalization characterization that reveals the effects of latent density-ratio estimation, finite-sample deviation, and encoder bias. Building on this analysis, we propose a proximal policy optimization (PPO) method with density-ratio weighting and trust-region regularization. Experiments on a public humanoid teleoperation dataset show that the proposed method improves the tradeoff between reconstruction error and communication energy consumption under sim-to-real distribution shift. We further analyze the effectiveness of the proposed algorithm across various wireless channels and dynamic motion trajectories.
comment: submitted to IEEE journal
PRISM-SLAM: Probabilistic Ray-Grounded Inference for Scale-aware Metric SLAM
Monocular SLAM historically suffers from scale ambiguity and tracking failure in dynamic environments. While recent vision foundation models (VFMs) provide remarkable zero-shot depth priors, naively integrating these deterministic predictions ignores predictive uncertainty and frame-to-frame scale inconsistencies. We propose PRISM-SLAM, a real-time framework that rigorously integrates VFM priors into a structured Bayesian factor graph to achieve scale-aware, metric-consistent localization and mapping. Specifically, we introduce a Plücker Ray-Distance Factor to anchor monocular observations in absolute space within a globally consistent metric coordinate system, mathematically resolving scale drift by making the metric scale Fisher-identifiable. To handle environmental dynamics, we derive an epistemic uncertainty proxy from temporal depth consistency and formulate a Dynamic Scene Uncertainty Gating (DSUG) mechanism. This soft-gating approach probabilistically down-weights dynamic distractors without incurring the heavy computational overhead associated with traditional semantic segmentation masks. By employing a multi-process architecture that asynchronously processes VFM inference and geometric tracking, PRISM-SLAM provides verified metric output at 30 FPS using solely RGB input, bridging the gap between foundation models and real-world robotic applications. Evaluated on the TUM RGB-D and 7-Scenes benchmarks, PRISM-SLAM achieves a metric $SE(3)$ Absolute Trajectory Error (ATE) nearly identical to its oracle-aligned $Sim(3)$ error. This demonstrates that our system can produce deployment-ready metric trajectories by delivering robust metric SLAM solutions without any post-hoc scale correction. Project page: https://prismslam-cmd.github.io/prismslam_pr/
Bilateral Teleoperation with Compliant 6-DOF Pose-and-Force Sensing
Existing bilateral teleoperation platforms still rely on costly rigid six-axis force/torque sensors, tightly coupled leader-follower hardware, and kilohertz control loops. We present a Cartesian bilateral framework built on the hardware-agnostic WinGs Operating Studio (WOS) middleware, in which a low-cost compliant 6-DOF pose-and-force sensing end-effector, Delta6, is mounted on both sides so that each manipulator behaves as an end-effector 6-DOF series elastic actuator (SEA). The leader runs a damping-only admittance loop with a 6-D biquad notch filter; the follower realizes a stiffness-damping impedance through a position-based outer loop with a PID wrench-to-pose mapping. Three time scales (hardware I/O, mid-rate impedance/admittance, low-rate teleoperation messages) are explicitly decoupled, enabling the same application to drive heterogeneous arms. On a Lite6/FR3 testbed at 150 Hz, the system tracks stably under delays up to $120\pm40$ ms and 1% packet loss, matches the prescribed virtual stiffness in contact, and shows a favorable cumulative energy signature in passivity-style tests.
comment: 8 pages, 16 figures, 2 tables. Preprint
Graph Neural Planning and Predictive Control for Multi-Robot Communication-Constrained Unlabeled Motion Planning ICRA
The multi-robot unlabeled motion planning problem of concurrently assigning robots to goals and generating safe trajectories is central in many collaborative tasks. Recent Graph Neural Network methods offer scalable decentralized solutions but rely on simplified dynamics and simulation environments, overlooking key challenges of real-world deployment such as dynamic feasibility and communication constraints. To address these gaps, we propose a hierarchical framework that combines a Graph ATtention Planner (GATP) with a decentralized Nonlinear Model Predictive Controller (NMPC). GATP provides intermediate subgoals through multi-robot cooperation, and the NMPC enforces safety under nonlinear dynamics and actuation constraints. We evaluate our framework in both simulation and real-world quadrotor experiments. Thanks to attention mechanisms and minimal communication requirements, we demonstrate improved generalization to larger teams, robustness to communication delays up to 200 ms and practical feasibility with decentralized on-board inference.
comment: 8 pages, 6 figures, Accepted at the IEEE International Conference on Robotics and Automation (ICRA) 2026
CLUE: Adaptively Prioritized Contextual Cues by Leveraging a Unified Semantic Map for Effective Zero-Shot Object-Goal Navigation
Zero-shot object-goal navigation (ZSON) is a challenging problem in robotics that requires a comprehensive understanding of both language and visual observations. Contextual cues from rooms and objects are critical, but their relative importance depends on the target: some objects are strongly tied to specific room types, while others are better predicted by nearby co-located objects. Existing methods overlook this distinction, leading to inefficient and inaccurate exploration. We present CLUE, a novel navigation framework that adaptively balances the use of contextual rooms and objects by leveraging commonsense knowledge extracted from an offline large language model (LLM). By estimating a target's association with room types using the LLM, the agent prioritizes room cues for predictable objects and object cues for those with weak room associations. Our framework constructs a unified semantic value map that integrates both types of contextual information, adaptively weighted by the target's ambiguity to guide exploration. Combined with multi-viewpoint verification and an exploration strategy informed by contextual cues, CLUE achieves robust and efficient navigation. Extensive experiments in simulation and real-world deployments show that our method consistently outperforms state-of-the-art baselines in both success rate (SR) and success weighted by path length (SPL), demonstrating its effectiveness and practicality for real-world navigation tasks.
comment: 8 pages, 5 figures
Aerial Inspection Behaviors via RL-based Quadrotor Control for Under-canopy Forest Environments
This paper addresses the problem of using a deep Reinforcement Learning (RL)-based low-level quadrotor controller within an autonomous quadrotor navigation stack for aerial inspection missions in under-canopy forest environments. Specifically, the article presents an end-to-end (mapping states to RPMs) quadrotor control policy that achieves inspection view-pose tracking (simultaneous position and yaw reference tracking), which is crucial for various target inspection behaviors and point-to-point navigation in forests. To ensure safe and reliable deployment of the end-to-end RL controller in long-range missions, this article utilizes a higher-level navigation guidance layer comprising a Traveling Salesman Problem (TSP) planner and a Rapidly-exploring Random Tree Star (RRT*) planner. Over a known map of a forest and a set of user-specified inspection regions, the TSP planner finds the optimal visitation sequence. Between two target regions, collision-free paths that respect the tracking limitations of the lower end-to-end RL policy are generated by the RRT* planner. Through five target inspection scenarios, this article demonstrates that an RL-based motor-level stabilizing controller, supported by a navigation guidance layer, can be used effectively as the low-level inspection execution module for under-canopy forest inspection missions.
comment: Submitted to 2026 IEEE 22nd International Conference on Automation Science and Engineering
Conflict-Aware Active Perception and Control in 3D Gaussian Splatting Fields via Control Barrier Functions
Active perception in uncertain environments requires robots to navigate safely while acquiring informative observations to reduce map uncertainty. These objectives inherently conflict, as informative viewpoints often lie near uncertain regions with higher collision risk. To address this challenge, we develop a conflict-aware active perception and control framework for robotic systems operating in environments represented by 3D Gaussian Splatting (3DGS). Safety is enforced using a Control Barrier Function (CBF) derived from an Average Value-at-Risk (AV@R) collision-risk metric that accounts for geometric uncertainty and guarantees forward invariance of a safe set. To improve perception, we propose a risk-aware Expected Information Gain (EIG) formulation for selecting the next-best-view and introduce perception barrier functions that align the camera orientation with the local information-ascent direction. To obtain a tractable formulation for these conflicting safety and perception objectives, we propose a unified safety-critical, perception-aware quadratic program that enforces safety as a hard constraint while relaxing perception constraints through slack variables. Simulation results demonstrate that the proposed method improves both safety and information acquisition compared to existing 3DGS-based approaches.
comment: Project website: https://sircesoc.github.io/Conflict_Aware_Active_Perception/
Fault-Tolerant, Rigidity-Preserving Control of Inflatable Truss Robots
Isoperimetric robotic trusses can adapt to different tasks and environments because they have a high strength-to-weight ratio, can change their own shape dramatically, and can be reconfigured into a variety of different shapes. However, motor failures in operational environments can severely limit operational capabilities if not properly addressed. This paper presents a fault-tolerant control framework for an inflatable robotic truss that maintains functionality despite motor failures, shown through three key contributions. First, we extend the kinematic optimization to handle arbitrary combinations of motor failures by imposing equality constraints to ensure failed actuators are not used. Second, we introduce discrete-time control barrier function (DTCBF) constraints that mathematically guarantee structural rigidity while maximizing workspace utilization, a critical requirement for reliable operation of truss robots under discrete-time control. Third, we implement closed-loop position control using onboard encoder feedback and a forward kinematics-based state estimator, improving positional accuracy in the presence of disturbances. We validate our approach through simulation and hardware experiments on a 2D isoperimetric truss testbed. For a 2D configuration with 6 actuators, we demonstrate >69% workspace preservation under single-motor failures and a >25% improvement in tracking accuracy with closed-loop control. These results establish a foundation for more robust and resilient isoperimetric truss robots operating under degraded actuation.
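For readers unfamiliar with discrete-time control barrier functions, a common form of the constraint is h(x_{k+1}) >= (1 - gamma) * h(x_k) with 0 < gamma <= 1, where h >= 0 marks the safe set. The sketch below applies that condition to a hypothetical scalar margin; the margin function, the backtracking step filter, and all parameter values are assumptions for illustration and are not the rigidity measure or optimizer used in the paper.

```python
def dtcbf_ok(h_now: float, h_next: float, gamma: float) -> bool:
    """Discrete-time CBF condition: h(x_{k+1}) >= (1 - gamma) * h(x_k)."""
    return h_next >= (1.0 - gamma) * h_now


def filter_step(x, dx, h, gamma=0.5, shrink=0.5, max_iter=20):
    """Scale a proposed step dx until the DTCBF condition holds.

    Backtracking is a simple stand-in for the constrained optimization
    a real controller would solve.
    """
    scale = 1.0
    for _ in range(max_iter):
        if dtcbf_ok(h(x), h(x + scale * dx), gamma):
            return x + scale * dx
        scale *= shrink
    return x  # fall back to holding position


# Hypothetical margin: distance of a scalar state above a limit of 1.0.
def margin(x):
    return x - 1.0


x = 2.0
trajectory = [x]
for _ in range(5):
    x = filter_step(x, dx=-1.0, h=margin)
    trajectory.append(x)
```

Each accepted step may shrink the margin by at most the factor (1 - gamma), so the state approaches the limit geometrically without crossing it.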
Faster or Stronger: Towards Flexible Visual Place Recognition via Weighted Aggregation and Token Pruning
Visual Place Recognition (VPR) aims to match a query image to reference images of the same place in a large-scale database. Recent state-of-the-art methods employ Vision Transformers (ViTs) as backbone foundation models to extract patch-level features that are robust to viewpoint, illumination, and seasonal variations, which are then aggregated into a compact global descriptor for retrieval. Most existing aggregation methods uniformly pool patch tokens into learned clusters, despite the fact that different clusters often encode distinct spatial or semantic patterns and contribute unequally to VPR performance. To address this limitation, we propose Weighted Aggregated Descriptor (WeiAD), which assigns weights to clusters during aggregation, producing more discriminative global representations. Beyond accuracy, retrieval latency is a critical concern for large-scale deployments and resource-constrained edge devices. Prior work mainly reduces latency by compressing global descriptors, while overlooking the cost of feature extraction, an issue exacerbated by ViT-based backbones. We therefore introduce WeiToP, a VPR-oriented token pruning framework that reduces feature extraction cost via self-distillation, where aggregation-induced token importance supervises a lightweight pruning module attached to an early transformer layer, enabling inference-time token pruning. After a single joint training phase, WeiToP enables plug-and-play token pruning at inference time, allowing flexible and on-demand control over the accuracy-efficiency trade-off without additional training. Moreover, WeiToP outperforms existing token pruning methods adapted from general vision tasks.
The Yes-Man Syndrome: Benchmarking Abstention in Embodied Robotic Agents
Vision-language models (VLMs) are used as high-level planners for embodied agents, translating natural language instructions and visual observations into action plans. While prior work has studied abstention in LLMs, existing benchmarks are largely text-only and do not capture the perceptual grounding and physical constraints inherent to embodied robotics environments. In such settings, abstention requires recognizing when instructions are ambiguous, physically infeasible, based on false premises, or otherwise unresolvable given the available sensory modalities and context. To address this gap, we introduce a taxonomy to categorize abstention in the context of embodied robotics and present RoboAbstention, a scalable and auditable framework for generating abstention instructions grounded in images gathered from five robotics datasets. RoboAbstention instantiates the taxonomy through a three-phase pipeline: (1) structured visual grounding, (2) deterministic constraint derivation, and (3) controlled instruction generation via category-specific templates. This enables the construction of a diverse dataset with verifiable abstention conditions. We evaluate several frontier VLMs and find that all models exhibit significant weaknesses in abstention, including those with advanced reasoning capabilities. The best-performing model, Gemini 2.5 Flash, abstains on only 39.0% of our 6,069 benchmark instructions, while the embodied planner Gemini Robotics ER 1.6 Preview abstains on just 16.5%. We further explore methods for improving abstention in VLM planners, such as defensive prompting and in-context learning, and find that these interventions substantially improve performance, reaching 93.6% abstention rate for Gemini Robotics ER 1.6 Preview and 88.6% for GPT 5.4 Mini, yet no approach fully solves the problem. We open-source RoboAbstention at https://purseclab.github.io/RoboAbstention/.
Enhancing Graph-Based SLAM in GNSS-Denied environments by leveraging leg odometry ICRA
Autonomous navigation in GNSS-denied environments remains a core challenge for legged robots, where exteroceptive sensors such as LiDAR are prone to elevation drift in geometrically sparse or repetitive scenes. We present a factor graph architecture that augments the LIO-SAM framework with a parallel kinematic lane driven by proprioceptive leg odometry, coupled to the main LiDAR-inertial lane via an identity relative pose constraint with a selective noise model. Applied to a Linxai D50 quadruped platform across two outdoor loops totaling over one kilometer, our approach reduces elevation drift from over 30 m to under 30 cm and enables convergence in a scene where the baseline pipeline fails entirely. These results suggest that proprioceptive data, already computed onboard for gait control, constitutes a lightweight and effective vertical anchor for SLAM in GNSS-denied settings.
comment: 4 pages, 3 figures, 2 tables, for ICRA workshop on Robot Meets GNSS and Ranging for Seamless Autonomy
Spacetime Optimal-Transport Attention for Visuo-Haptic Imitation Learning of Contact-Rich Manipulation
Contact-rich manipulation tasks such as tight-clearance insertion, connector mating, polishing, and surface-conforming wiping remain difficult for data-driven controllers because they couple discontinuous contact dynamics, partial observability, and strict safety constraints. No single sensing modality suffices: vision supplies global context before contact, force/torque (F/T) feedback governs interaction after contact, and proprioceptive pose provides a consistent kinematic backbone. Most prior imitation-learning policies for contact-rich tasks operate on uni- or bi-modal signals, and the few that fuse three modalities typically adopt off-the-shelf attention modules with no explicit prior on how attention mass should be distributed across task-relevant regions. We present Spacetime Optimal-Transport Attention (SO-TA), a tri-modal fusion backbone that replaces softmax-normalized patch attention by an entropy-regularized Optimal Transport (OT) alignment between force-pose-derived sub-queries and visual patches. Explicit marginal constraints act as a structured inductive bias for contact-rich tasks, encouraging conditioning-aware spatial selection that is stable across illumination, distractors, and partial occlusion. SO-TA is paired with a diffusion-based sequence policy mapping observation windows to pose-action chunks. We evaluate SO-TA on three real-robot tasks: tight peg-in-hole assembly, BCM wiring-connector insertion, and curved-surface mark erasing. With ~200 rollouts per condition, SO-TA reaches 100% success on tight peg-in-hole versus 93% for cross-attention at matched capacity, and retains 82.5% success under illumination, distractor, and partial-occlusion perturbations where a concatenation baseline drops to 43.5%. OT-derived patch heatmaps and leave-one-out modality-influence ratios provide interpretable, phase-dependent diagnostics.
comment: 8 pages, 16 figures, 3 tables. Preprint
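To make the contrast with softmax attention concrete, the sketch below computes an entropy-regularized optimal-transport plan with plain Sinkhorn iterations. The cost matrix, the uniform marginals, and the regularization strength are toy assumptions; the sketch shows the generic technique only and does not reproduce how SO-TA builds its costs or marginals.

```python
import math


def sinkhorn(cost, row_marg, col_marg, eps=0.1, iters=200):
    """Entropy-regularized OT plan via Sinkhorn iterations.

    cost[i][j] is the cost of matching query i to patch j; the returned
    plan has (approximately) the requested row and column marginals.
    """
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        u = [row_marg[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [col_marg[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy example: 2 sub-queries, 3 patches, uniform marginals (all assumed).
cost = [[0.1, 0.9, 0.5],
        [0.8, 0.2, 0.5]]
plan = sinkhorn(cost, row_marg=[0.5, 0.5], col_marg=[1 / 3, 1 / 3, 1 / 3])
row_sums = [sum(r) for r in plan]
col_sums = [sum(plan[i][j] for i in range(2)) for j in range(3)]
```

A row-wise softmax normalizes each query independently, whereas the transport plan must also satisfy the column marginals, so no patch can absorb more than its prescribed share of attention mass.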
Multi-Week, In-Class Deployments of Telepresence Robots With Four Homebound K-12 Students: Benefits, Challenges, and Recommendations
Missing significant amounts of school during K-12 education is known to put students' cognitive and social development at risk. Alternatives such as home instruction and online learning are common, but lack sufficient interaction with peers and teachers in the classroom. Mobile remote presence systems, or telepresence robots, are promising for homebound students because they provide embodiment and mobility in addition to the real-time participation offered by video conferencing technologies. Research is needed, however, for telepresence robots to meet the complex needs of homebound students participating remotely in the K-12 classroom context. We present findings from four multi-week deployments with homebound K-12 students attending classes via telepresence robots. The homebound students' experiences were documented in a total of 15 interviews and analyzed qualitatively as case studies. The homebound student participants and their deployment contexts differed from one another along multiple dimensions, and while some benefits of mobile remote attendance were enjoyed by all participants, each participant also experienced unique benefits. Some challenges with hearing, seeing, and moving the robot around the classroom warranted improvements to the design of the telepresence system. Other challenges suggested priorities for managing a classroom deployment, such as ensuring that the remote student is included in classroom activities, accountable to the teacher, and treated with respect by classmates. Based on insights from the study, we make recommendations for real-world deployment procedures in similar contexts.
Scalable Multi-robot Motion Planning via Hierarchical Subproblem Expansion and Workspace Decomposition Refinement
A fundamental challenge in multi-robot motion planning is achieving sufficient coordination to avoid inter-robot conflicts without incurring the large computational expense of searching the joint configuration space of the robot group. In this work, we present a method for multiple mobile robot motion planning that achieves an improvement in planning time up to an order of magnitude by leveraging the insight that we can use discrete search over a workspace decomposition to provide coordination between robots during planning. While prior work uses workspace topology to inform when coordination between robots is needed and then composes robots into their joint configuration space, we take a step further by iteratively refining our workspace representation to allow our planner to search smaller, decoupled configuration spaces.
comment: Accepted to WAFR 2026
VBT-MPC: Vision-Based Tactile MPC for Contour Following
Tactile sensing plays a key role in robotic manipulation, particularly in tasks like surface inspection. Successful execution requires maintaining contact while accurately tracking object contours. In this work, we propose a Vision-Based Tactile Model Predictive Control (VBT-MPC) framework for robotic contour following using a Vision-Based Tactile Sensor (VBTS) mounted in an eye-in-hand configuration. The proposed controller operates directly in contour feature space, thereby avoiding the need for separate pose-estimation modules or complex force-control architectures. We further compare our VBT-MPC with visual-servoing strategies adapted to tactile features, and evaluate contour tracking on objects with diverse geometries and materials in both simulation and real-world experiments.
comment: This article has been accepted for publication in IEEE Robotics and Automation Letters. This is a preprint version. This work was supported by the Interreg-VI Sudoe and European Regional Development Funds through the REMAIN Project under Grant S1/1.1/E0111
STELLAR: Scaling 3D Perception Large Models for Autonomous Driving
Model scaling has demonstrated remarkable success through large-scale training on diverse datasets. It remains an open question whether the same paradigm would apply to autonomous driving perception systems due to unique challenges, such as fusing heterogeneous sensor data and the need for sophisticated 3D spatial understanding. To bridge this gap, we present a comprehensive study that systematically analyzes the impact of scale on these systems. We develop our STELLAR model based on the Sparse Window Transformer, extending the input modalities to include LiDAR, radar, camera, and map prior. We train models with up to 500 million parameters on a large-scale dataset of 50 million driving examples. Our large-scale experiments reveal empirical scaling trends that connect model performance to model size, data, and compute. The resulting model establishes a new state-of-the-art on the Waymo Open Dataset challenge, outperforming prior art by a large margin. Our work demonstrates that large-scale training is a highly promising path for advancing the capabilities of perception models for autonomous driving.
SUGAR: A Scalable Human-Video-Driven Generalizable Humanoid Loco-Manipulation Learning Framework
Building humanoid robots capable of generalizable whole-body loco-manipulation in the real world remains a fundamental challenge. Existing methods either rely on laborious task-specific reward engineering, rigidly replay reference motions that fail to generalize, or depend on costly teleoperation that limits scalability. While human videos capture diverse human behaviors, motion priors inferred from them are inherently imperfect, suffering from occlusion, contact artifacts, and retargeting errors that render them unsuitable for direct policy learning. To address this, we present SUGAR, a scalable data-driven framework that converts diverse human videos into deployable humanoid loco-manipulation skills, without any task-specific reward engineering or reference-motion conditioning at inference. SUGAR proceeds in three stages. First, a fully automated pipeline extracts kinematic interaction priors including human-object motion trajectories and contact labels from unstructured human videos. Second, a privileged physics-based refiner uses a unified mimic reward and progressive state pool to transform imperfect priors into physically feasible, high-fidelity skills. Third, refined skills are distilled into a hierarchical autonomous policy consisting of a command generator and a command tracker. We evaluate SUGAR on six representative loco-manipulation tasks in simulation and real-world humanoid hardware. Our method substantially outperforms reference-tracking baselines, and performance scales clearly with the amount of human video data. It also achieves zero-shot real-world transfer with reliable closed-loop execution, autonomous failure recovery, and stable long-horizon performance under external perturbations. Project Page: https://tianshuwu.github.io/sugar-humanoid/
comment: Project Page: https://tianshuwu.github.io/sugar-humanoid/
Proximal State Nudging: Reducing Skill Atrophy from AI Assistance
Skill atrophy, the gradual decline of human capability under AI assistance, poses a safety risk in shared control of semi-autonomous systems, where operators may be unable to distinguish their own inputs from autonomous corrections. We propose Proximal State Nudging (PSN), a shared autonomy algorithm that jointly optimizes for skill development and task performance by nudging users toward states estimated to be most learnable. We first show that PSN outperforms existing shared autonomy baselines in balancing student improvement in unassisted reward with overall shared performance, using simulated students in the classic LunarLander environment. We then present, to the best of our knowledge, the first human subject studies of a planner incorporating learning-compatible shared autonomy: across two driving tasks in the CARLA simulator (High Performance Racing and Parallel Parking, n = 60), PSN produces up to 7x larger gains in unassisted skill than standard blended shared autonomy, while incurring 50% fewer collisions than unassisted self-practice.
comment: 9 pages
Terrestrial Soft Mobile Robots: A Review
Soft mobile robots have emerged as a promising area of research with potential applications in various disciplines including but not limited to search-and-rescue, service, surveillance, exploration, and manufacturing. In this article, we provide a comprehensive review of the current state of soft mobile robot research, focusing on wheelless terrestrial locomotive systems. We include past and present developments in locomotion strategies, actuation methods, modeling approaches, and control systems. Further, we identify key research challenges that must be overcome to enable the widespread adoption of soft mobile robots in various applications. Overall, this article provides a valuable resource for researchers and practitioners interested in the field of soft mobile robots and soft robotics.
Mechanisms of Misgeneralization in Physical Sequence Modeling
Generative sequence models are often trained to plan motion in physical domains, from robotics to mechanical simulations. When constructing a dataset to train such a model, engineers may curate demonstrations to specify how trajectories should be distributed over a physical quantity like travel distance or mechanical energy. For example, a roboticist building a maze navigation agent might choose demonstrations whose travel distances cover a fixed range uniformly, hoping to constrain the agent's expected power usage. We find that standard deep learning can violate this intent: each generated trajectory can seem plausible on its own, but the aggregate distribution over the physical quantity is wrong. We call this failure physical misgeneralization, and develop an account of its mechanism. Using controlled synthetic tasks, we show that physical misgeneralization arises when local errors typical of the model class propagate through the physical measurement to shift the recovered distribution. We estimate these errors with a data deviation kernel, and we use it to predict which physical quantities gain or lose mass in both our synthetic and more applied maze navigation and double-pendulum motion tasks. Finally, our mechanistic interpretation helps identify which mitigation strategies are structurally promising, and we use it to propose a kernel-informed intervention.
comment: Preprint. kentonishi.com/physical-misgeneralization
Robots that learn to evaluate models of collective behavior
Understanding and modeling animal behavior is essential for studying collective motion, decision-making, and bio-inspired robotics. Yet, evaluating the accuracy of behavioral models still often relies on offline comparisons to static trajectory statistics. Here we introduce a reinforcement-learning-based framework that uses a biomimetic robotic fish (RoboFish) to evaluate computational models of live fish behavior through closed-loop interaction. We trained policies in simulation using four distinct fish models (a simple constant-follow baseline, two rule-based models, and a biologically grounded convolutional neural network model) and transferred these policies to the real RoboFish setup, where they interacted with live fish. Policies were trained to guide a simulated fish to goal locations, enabling us to quantify how the response of real fish differs from the simulated fish's response. We evaluate the fish models by quantifying the sim-to-real gaps, defined as the Wasserstein distance between simulated and real distributions of behavioral metrics such as goal-reaching performance, inter-individual distances, wall interactions, and alignment. The neural network-based fish model exhibited the smallest gap across goal-reaching performance and most other metrics, indicating higher behavioral fidelity than conventional rule-based models under this benchmark. More importantly, this separation shows that the proposed evaluation can quantitatively distinguish candidate models under matched closed-loop conditions. Our work demonstrates how learning-based robotic experiments can uncover deficiencies in behavioral models and provides a general framework for evaluating animal behavior models through embodied interaction.
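To make the sim-to-real gap above concrete, the following is a minimal sketch of a one-dimensional Wasserstein-1 distance between two empirical samples of a behavioral metric. It assumes equal sample sizes, and the function name `wasserstein_1d` and the distance values are hypothetical illustrations, not data or code from the paper.

```python
def wasserstein_1d(sim, real):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples."""
    if len(sim) != len(real) or not sim:
        raise ValueError("samples must be non-empty and of equal size")
    # With equal sample sizes, the optimal 1-D transport pairs sorted values.
    pairs = zip(sorted(sim), sorted(real))
    return sum(abs(s - r) for s, r in pairs) / len(sim)


# Hypothetical inter-individual distances (metres); not values from the paper.
sim_distances = [0.10, 0.12, 0.15, 0.20]
real_distances = [0.14, 0.18, 0.22, 0.30]
gap = wasserstein_1d(sim_distances, real_distances)
```

A smaller `gap` for one fish model than for another would indicate that its simulated metric distribution lies closer to the one observed with live fish.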
Active Learning of Fractional-Order Viscoelastic Model Parameters for Realistic Haptic Rendering
Effective medical simulators necessitate realistic haptic rendering of biological tissues that exhibit viscoelastic material properties, such as creep and stress relaxation. Fractional-order models provide an effective means of describing intrinsically time-dependent viscoelastic dynamics with few parameters, as they naturally capture memory effects. However, due to the unintuitive, frequency-dependent coupling among the order of the fractional element and other parameters, determining appropriate parameter values for fractional-order models that yield high perceived realism remains a significant challenge. In this study, we propose a systematic means of determining the parameters of fractional-order viscoelastic models that optimizes the perceived realism of haptic rendering across general populations. First, we demonstrate that the parameters of fractional-order models can be effectively optimized through active learning, using qualitative feedback-based human-in-the-loop (HiL) optimization, to ensure consistently high realism ratings for each individual. Second, we propose a rigorous method to combine HiL optimization results into an aggregate perceptual map trained on the entire dataset, and demonstrate how to select population-level optimal parameters from this representation that are broadly perceived as realistic across general populations. Finally, we provide evidence of the effectiveness of the generalized fractional-order viscoelastic model parameters for three viscoelastic materials by characterizing their perceived realism through human-subject experiments. Overall, generalized fractional-order viscoelastic models established through the proposed HiL optimization and aggregation approach possess the potential to significantly improve the sim-to-real transition performance of medical training simulators.
comment: This work has been submitted to the IEEE Transactions on Haptics for possible publication. 14 pages, 8 figures
Learn2Decompose: Learning Problem Decomposition for Efficient Sequential Multi-object Manipulation Planning
We present an efficient task and motion replanning approach for sequential multi-object manipulation in dynamic environments. Conventional Task And Motion Planning (TAMP) solvers experience an exponential increase in planning time as the planning horizon and number of objects grow, limiting their applicability in real-world scenarios. To address this, we propose learning problem decompositions from demonstrations to accelerate TAMP solvers. Our approach consists of three key components: goal decomposition learning, computational distance learning, and object reduction. Goal decomposition identifies the necessary sequences of states that the system must pass through before reaching the final goal, treating them as subgoal sequences. Computational distance learning predicts the computational complexity between two states, enabling the system to identify the temporally closest subgoal from a disturbed state. Object reduction minimizes the set of active objects considered during replanning, further improving efficiency. We evaluate our approach on three benchmarks, demonstrating its effectiveness in improving replanning efficiency for sequential multi-object manipulation tasks in dynamic environments.
comment: Extension of RAL version: added PR2 Whole-body kitchen task and detailed discussion on limitations in main text; added pseudocode and robustness analysis of our approach, and formal analysis on why and when task goals are decomposable in appendix
STABLE: Simulation-Ready Tabletop Layout Generation via a Semantics-Physics Dual System ICML 2026
Generating simulation-ready tabletop scenes from task instructions is an intriguing and promising research direction in the field of Embodied AI. However, existing task-to-scene generation methods rely exclusively on large language models (LLMs) to predict scene layouts, inevitably yielding object collisions or floating due to LLMs' inherent limitations in 3D spatial reasoning. In this paper, we present STABLE, a semantics-physics dual system tailored for simulation-ready tabletop scene generation. STABLE consists of two complementary modules: (i) a Semantic Reasoner, a fine-tuned LLM trained on a structured tabletop scene dataset to generate coarse layouts from input task instructions, and (ii) a Physics Corrector, a physics-aware flow-based denoising model that outputs pose updates to refine layouts, which ensures the physical plausibility of scenes while preserving semantic alignment with task instructions. STABLE adopts a progressive generation paradigm: by alternating between the Semantic Reasoner and Physics Corrector, it incrementally expands the scene from task-critical objects to background objects. Experiments demonstrate that STABLE successfully generates simulation-ready tabletop scenes that strictly conform to task instructions and significantly enhances the physical validity of scenes over prior art.
comment: ICML 2026
A Practical Framework of Key Performance Indicators for Multi-Robot Lunar and Planetary Field Tests ICRA 2026
Robotic prospecting for critical resources on the Moon, such as ilmenite, rare earth elements, and water ice, requires robust exploration methods given the diverse terrain and harsh environmental conditions. Although numerous analog field trials address these goals, comparing their results remains challenging because of differences in robot platforms and experimental setups. These missions typically assess performance using selected, scenario-specific engineering metrics that fail to establish a clear link between field performance and science-driven objectives. In this paper, we address this gap by deriving a structured framework of key performance indicators (KPIs) from three realistic multi-robot lunar scenarios reflecting scientific objectives and operational constraints. Our framework emphasizes scenario-dependent priorities in efficiency, robustness, and precision, and is explicitly designed for practical applicability in field deployments. We validated the framework in a multi-robot field test and found it practical and easy to apply for efficiency- and robustness-related KPIs, whereas precision-oriented KPIs require reliable ground-truth data that is not always feasible to obtain in outdoor analog environments. Overall, we propose this framework as a common evaluation standard enabling consistent, goal-oriented comparison of multi-robot field trials and supporting systematic development of robotic systems for future planetary exploration.
comment: Presented at ICRA 2026 Workshop on Multi-Agent Robotic Systems: Real-World Collaboration and Interaction
EfficientTDMPC: Improved MPC Objectives for Sample-Efficient Continuous Control
We introduce EfficientTDMPC, a sample-efficient model-based reinforcement learning method for continuous control built on the TD-MPC family of algorithms. Central to this family is a planner that aims to find an action sequence that maximizes the estimated return. The return is estimated using a learned model and value networks, each of which can introduce error. EfficientTDMPC proposes to reduce this error in two ways. First, it introduces an ensemble of dynamics models and averages the return estimates across those models and across different rollout depths. Second, it adds the option to apply an uncertainty penalty to the planner objective, yielding a planner that avoids actions with uncertain return estimates. It then adds practical improvements which increase buffer data freshness and reduce compute. Lastly, we find that our contributions enable EfficientTDMPC to benefit more from a higher update-to-data (UTD) ratio, further improving sample efficiency. To the best of our knowledge, in the low data regime of each benchmark, EfficientTDMPC achieves state-of-the-art (SOTA) in terms of sample efficiency on HumanoidBench-Hard and DMC hard, while matching SOTA on DMC easy.
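The two error-reduction ideas described above can be sketched as a single planner score. This is only an illustration: the abstract does not give the exact form of the uncertainty penalty, so using the standard deviation of the estimates is an assumption here, and `planner_objective`, `penalty_weight`, and the numbers are hypothetical.

```python
from statistics import mean, pstdev


def planner_objective(returns, penalty_weight=0.0):
    """Score one candidate action sequence.

    returns[m][d] is the return estimated with dynamics model m at rollout
    depth d. The std-based penalty is an assumed form, not the paper's.
    """
    flat = [r for per_model in returns for r in per_model]
    avg = mean(flat)        # average across models and rollout depths
    spread = pstdev(flat)   # disagreement, used as an uncertainty proxy
    return avg - penalty_weight * spread


# Hypothetical estimates: 2 dynamics models x 2 rollout depths per candidate.
confident = [[10.0, 10.0], [10.0, 10.0]]
uncertain = [[4.0, 20.0], [8.0, 16.0]]

no_penalty = (planner_objective(confident), planner_objective(uncertain))
with_penalty = (planner_objective(confident, 0.5), planner_objective(uncertain, 0.5))
```

Without the penalty the second candidate scores higher on average; with the penalty enabled the planner prefers the candidate whose estimates agree.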
Receptogenesis in a Vascularized Robotic Embodiment
Equipping robotic systems with the capacity to generate $\textit{ex novo}$ hardware during operation extends control of physical adaptability. Unlike modular systems that rely on discrete component integration pre- or post-deployment, we envision the possibility that physical adaptation and development emerge from dynamic material restructuring to shape the body's intrinsic functions. Drawing inspiration from circulatory systems that redistribute mass and function in biological organisms, we utilize fluidics to restructure the material interface, a capability currently unpaired in robotics. Here, we realize this synthetic growth capability through a vascularized robotic composite designed for programmable material synthesis, demonstrated via receptogenesis - the on-demand construction of sensors from internal fluid reserves based on environmental cues. By coordinating the fluidic transport of precursors with external localized UV irradiation, we drive an $\textit{in situ}$ photopolymerization that chemically reconstructs the vasculature from the inside out. This reaction converts precursors with photolatent initiator into a solid dispersion of UV-sensitive polypyrrole in PETG, establishing a sensing modality validated by a characteristic decrease in electrical impedance. The newly synthesized sensor closed a local control loop to regulate wing flapping in a moth-inspired robotic demonstrator. This physical update increased the robot's capability in real time. Material-level functional restructuring of the vascularized robot body provides a proof-of-concept materials basis for $\textit{ex novo}$ hardware generation in situated robotic systems - a step toward situated robots in which a reaction to environmental stimuli autonomously produces hardware updates to match new environmental demands.
comment: Supplementary Files currently unavailable online. Please contact the First Author to request any Supplementary Files. Version 2 - revision
Certifiable Alignment of GNSS and Local Frames via Lagrangian Duality
Estimating the absolute orientation of a local system relative to a global navigation satellite system (GNSS) reference often suffers from local minima and high dependency on satellite availability. Existing methods for this alignment task rely on abundant satellites unavailable in GNSS-degraded environments, or use local optimization methods which cannot guarantee the optimality of a solution. This work introduces a globally optimal solver that transforms raw pseudo-range or Doppler measurements into a convexly relaxed problem. The proposed method is certifiable, meaning it can numerically verify the correctness of the result, filling a gap where existing local optimizers fail. We first formulate the original frame alignment problem as a nonconvex quadratically constrained quadratic program (QCQP) and relax it to a concave Lagrangian dual problem that provides a lower bound on the cost of the original problem. Then we perform relaxation tightness and observability analysis to derive criteria for certifiable optimality of the solution. Finally, simulation and real-world experiments are conducted to evaluate the proposed method. The experiments show that our method provides certifiably optimal solutions even with only 2 satellites with Doppler measurements and 2D vehicle motion, while the traditional velocity-based VOBA method and the advanced GVINS alignment technique may fail or converge to local optima without notice. To support the development of GNSS-based navigation techniques in robotics, all code and data are open-sourced at https://github.com/Baoshan-Song/Certifiable-Doppler-alignment.
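For readers unfamiliar with the construction, a generic equality-constrained QCQP and its Lagrangian dual can be written as follows. This is a textbook sketch under the assumption of symmetric matrices; the symbols $C$, $A_i$, $b_i$, and $\lambda$ are placeholders and not the paper's actual cost or constraint terms.

```latex
\begin{aligned}
p^\star &= \min_{x \in \mathbb{R}^n} \; x^\top C x
  \quad \text{s.t.} \quad x^\top A_i x = b_i, \; i = 1, \dots, m, \\
\mathcal{L}(x, \lambda) &= x^\top \Big( C - \sum_{i=1}^{m} \lambda_i A_i \Big) x
  + \sum_{i=1}^{m} \lambda_i b_i, \\
d^\star &= \max_{\lambda \in \mathbb{R}^m} \; b^\top \lambda
  \quad \text{s.t.} \quad C - \sum_{i=1}^{m} \lambda_i A_i \succeq 0, \\
d^\star &\le p^\star .
\end{aligned}
```

The last line is weak duality: the dual optimum never exceeds the primal optimum. When the two coincide, the relaxation is tight and the dual solution serves as a numerical certificate that a candidate primal solution is globally optimal.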
comment: Final version in RA-L
RE-SAC: Disentangling aleatoric and epistemic risks in bus fleet control: A stable and robust ensemble DRL approach
Bus holding control is challenging due to stochastic traffic and passenger demand. While deep reinforcement learning (DRL) shows promise, standard actor-critic algorithms suffer from Q-value instability in volatile environments. A key source of this instability is the conflation of two distinct uncertainties: aleatoric uncertainty (irreducible noise) and epistemic uncertainty (data insufficiency). Treating these as a single risk leads to value underestimation in noisy states, causing catastrophic policy collapse. We propose a robust ensemble soft actor-critic (RE-SAC) framework to explicitly disentangle these uncertainties. RE-SAC applies Integral Probability Metric (IPM)-based weight regularization to the critic network to hedge against aleatoric risk, providing a smooth analytical lower bound for the robust Bellman operator without expensive inner-loop perturbations. To address epistemic risk, a diversified Q-ensemble penalizes overconfident value estimates in sparsely covered regions. This dual mechanism prevents the ensemble variance from misidentifying noise as a data gap, a failure mode identified in our ablation study. Experiments in a realistic bidirectional bus corridor simulation demonstrate that RE-SAC achieves the highest cumulative reward (approx. -0.4e6) compared to vanilla SAC (-0.55e6). Mahalanobis rareness analysis confirms that RE-SAC reduces Oracle Q-value estimation error by up to 62% in rare out-of-distribution states (MAE of 1647 vs. 4343), demonstrating superior robustness under high traffic variability.
R$^3$L: Reasoning 3D Layouts from Relative Spatial Relations ICML 2026
Relative spatial relations provide a compact representation of spatial structure and are fundamental to relative spatial reasoning in 3D layout generation. Recent works leverage Multimodal Large Language Models (MLLMs) to infer such relations, but the inferred relations are often unreliable and are typically handled with post-hoc heuristics. In this paper, we propose R$^3$L, a general framework that improves the reliability and consistency of relative spatial reasoning for 3D layout generation. Our key motivation is that multi-hop reasoning requires repeated reference-frame transformations, which accumulate errors in inferred relations and lead to semantic and metric drift. To mitigate this, we propose invariant spatial decomposition to break coupled relation chains, and consistent spatial imagination to promote self-consistency through an imagine-and-revise loop. We further introduce supportive spatial optimization to ease pose optimization via global-to-local coordinate re-parameterization. Extensive experiments across diverse scene types and instructions demonstrate that R$^3$L produces more physically feasible and semantically consistent layouts. Notably, our analysis shows that resolving frame-induced inconsistencies is crucial for reliable multi-hop relative spatial reasoning. The code is available at https://github.com/Neal2020GitHub/R3L.
comment: Accepted to ICML 2026
Data-centric Design of Learning-based Surgical Gaze Perception Models in Multi-Task Simulation
In robot-assisted minimally invasive surgery (RMIS), reduced haptic feedback and depth cues increase reliance on expert visual perception, motivating gaze-guided training and learning-based surgical perception models. However, operative expert gaze is costly to collect, and it remains unclear how the source of gaze supervision, both expertise level (intermediate vs. novice) and perceptual modality (active execution vs. passive viewing), shapes what attention models learn. We introduce a paired active-passive, multi-task surgical gaze dataset collected on the da Vinci SimNow simulator across four drills. Active gaze was recorded during task execution using a VR headset with eye tracking, and the corresponding videos were reused as stimuli to collect passive gaze from observers, enabling controlled same-video comparisons. We quantify skill- and modality-dependent differences in gaze organization and evaluate the substitutability of passive gaze for operative supervision using fixation density overlap analyses and single-frame saliency modeling. Across settings, MSI-Net produced stable, interpretable predictions, whereas SalGAN was unstable and often poorly aligned with human fixations. Models trained on passive gaze recovered a substantial portion of intermediate active attention, but with predictable degradation, and transfer was asymmetric between active and passive targets. Notably, novice passive labels approximated intermediate-passive targets with limited loss on higher-quality demonstrations, suggesting a practical path for scalable, crowd-sourced gaze supervision in surgical coaching and perception modeling.
comment: 8 pages, conference pre-print
Causality-Aware End-to-End Autonomous Driving via Ego-Centric Joint Scene Modeling
End-to-end autonomous driving, which bypasses traditional modular pipelines by directly predicting future trajectories from sensor inputs, has recently achieved substantial progress. However, existing methods often overlook the causal inter-dependencies in ego-vehicle planning, ignoring the reciprocal relations between the ego vehicle and surrounding agents. This causal oversight leads to inconsistent and unreliable trajectory predictions, especially in interaction-critical scenarios where ego decisions and neighboring agent behaviors must be reasoned about jointly. To address this limitation, we propose CaAD, a Causality-aware end-to-end Autonomous Driving framework that captures these dependencies within a shared latent scene representation. First, we propose an ego-centric joint-causal modeling module that builds on the marginal prediction branch, and learns causal dependencies between the ego vehicle and interaction-relevant agents. Second, we employ a causality-aware policy alignment stage implemented with joint-mode embeddings to align the stochastic ego policy with planning-oriented closed-loop feedback computed from surrounding traffic and map context. On the Bench2Drive and NAVSIM benchmarks, CaAD demonstrates strong closed-loop planning performance, achieving a Driving Score of 87.53 and Success Rate of 71.81 on Bench2Drive, and a PDMS of 91.1 on NAVSIM. The project page is available at https://moonseokha.github.io/CaAD/.
HEX: Humanoid-Aligned Experts for Cross-Embodiment Whole-Body Manipulation
Humans achieve complex manipulation through coordinated whole-body control, whereas most Vision-Language-Action (VLA) models treat robot body parts largely independently, making high-DoF humanoid control challenging and often unstable. We present HEX, a state-centric framework for coordinated manipulation on full-sized bipedal humanoid robots. HEX introduces a humanoid-aligned universal state representation for scalable learning across heterogeneous embodiments, and incorporates a Mixture-of-Experts Unified Proprioceptive Predictor to model whole-body coordination and temporal motion dynamics from large-scale multi-embodiment trajectory data. To efficiently capture temporal visual context, HEX uses lightweight history tokens to summarize past observations, avoiding repeated encoding of historical images during inference. It further employs a residual-gated fusion mechanism with a flow-matching action head to adaptively integrate visual-language cues with proprioceptive dynamics for action generation. Experiments on real-world humanoid manipulation tasks show that HEX achieves state-of-the-art performance in task success rate and generalization, particularly in fast-reaction and long-horizon scenarios.
comment: Project page: https://hex-humanoid.github.io/
Preserving Foundational Capabilities in Flow-Matching VLAs through Conservative SFT
Unconstrained fine-tuning of flow-matching Vision-Language-Action (VLA) models drives dense parameter overwrites, degrading pre-trained capabilities. We present Conservative Supervised Fine-Tuning (ConSFT), an optimization objective that adapts to target distributions while mitigating catastrophic forgetting, requiring zero prior data or architectural overhead. By dynamically scaling learning signals based on model confidence, ConSFT suppresses excessive gradients from low-confidence samples to prevent disproportionate parameter updates, thereby bounding the intrinsic parameter disruption risk. Inspired by reinforcement learning's trust-region clipping, this formulation establishes a progressive learning dynamic to secure target convergence and prior capability retention, maintaining sparse parameter updates without relying on the parallel reference networks required by explicit regularization. We evaluate ConSFT on the LIBERO and RoboTwin benchmarks across state-of-the-art flow-matching VLAs ($π_0$, $π_{0.5}$, and GR00T-N1.6-3B). The method outperforms vanilla SFT in capability retention by an average absolute margin of over 20\%, matching the efficacy of data-heavy Experience Replay in a prior-data-free regime. Real-world robotic deployments confirm that ConSFT precludes spatial overfitting during downstream adaptation, preserving pre-trained physical skills while acquiring sequential target tasks.
comment: 20 pages, 9 figures
VECTOR-Drive: Tightly Coupled Vision-Language and Trajectory Expert Routing for End-to-End Autonomous Driving
End-to-end autonomous driving requires models to understand traffic scenes, infer driving intent, and generate executable motion plans. Recent vision-language-action (VLA) models inherit semantic priors from large-scale vision-language pretraining, yet still face a coupling trade-off: fully shared backbones preserve multimodal interaction but may entangle language reasoning and trajectory prediction, whereas decoupled reasoning-action pipelines reduce task conflict but weaken semantic-motion coupling. We propose VECTOR-DRIVE, a tightly coupled VLA framework built on Qwen2.5-VL-3B. VECTOR-DRIVE keeps all tokens coupled through shared self-attention and routes feed-forward computation according to token semantics. Vision and language tokens are processed by a Vision-Language Expert to preserve semantic priors, while target-point, ego-state, and noisy action tokens are routed to a Trajectory Expert for motion-specific computation. On the action-token pathway, a flow-matching planner refines noisy action tokens into future waypoints and speed profiles. This design couples semantic reasoning and motion planning within a single multimodal Transformer while separating task-specific FFN computation. On Bench2Drive, VECTOR-DRIVE achieves 88.91 Driving Score and outperforms representative end-to-end and VLA-based baselines. Qualitative results and ablations further validate the benefits of shared attention, semantic-aware expert routing, progressive training, and flow-based action decoding.
COMPASS: Confined-space Manipulation Planning with Active Sensing Strategy ICRA 2026
Manipulation in confined and cluttered environments remains a significant challenge due to partial observability and complex configuration spaces. Effective manipulation in such environments requires an intelligent exploration strategy to safely understand the scene and search for the target. In this paper, we propose COMPASS, a multi-stage exploration and manipulation framework featuring a manipulation-aware sampling-based planner. First, we reduce collision risks with a near-field awareness scan to build a local collision map. Additionally, we employ a multi-objective utility function to find viewpoints that are both informative and conducive to subsequent manipulation. Moreover, we perform a constrained manipulation optimization strategy to generate manipulation poses that respect obstacle constraints. To systematically evaluate the method's performance under these difficulties, we propose a benchmark of confined-space exploration and manipulation containing four levels of challenging scenarios. Compared to exploration methods designed for other robots and only considering information gain, our framework increases manipulation success rate by 24.25% in simulations. Real-world experiments demonstrate our method's capability for active sensing and manipulation in confined environments.
comment: Accepted to the 2026 IEEE International Conference on Robotics and Automation (ICRA 2026)
Efficient Emotion-Aware Iconic Gesture Prediction for Robot Co-Speech
Co-speech gestures increase engagement and improve speech understanding. Most data-driven robot systems generate rhythmic beat-like motion, yet few integrate semantic emphasis. To address this, we propose a lightweight transformer that derives iconic gesture placement and intensity from text and emotion alone, requiring no audio input at inference time. The model outperforms GPT-4o in both semantic gesture placement classification and intensity regression on the BEAT2 dataset, while remaining computationally compact and suitable for real-time deployment on embodied agents.
Hybrid Training for Vision-Language-Action Models ICLR 2026
Using Large Language Models to produce intermediate thoughts, a.k.a. Chain-of-thought (CoT), before providing an answer has been a successful recipe for solving complex language tasks. In robotics, similar embodied CoT strategies, generating thoughts before actions, have also been shown to lead to improved performance when using Vision-Language-Action models (VLAs). As these techniques increase the length of the model's generated outputs to include the thoughts, the inference time is negatively affected. Delaying an agent's actions in real-world executions, as in robotic manipulation settings, strongly affects the usability of a method, as tasks require long sequences of actions. However, is the generation of long chains-of-thought a strong prerequisite for achieving performance improvements? In this work, we explore the idea of Hybrid Training (HyT), a framework that enables VLAs to learn from thoughts and benefit from the associated performance gains, while enabling the possibility to leave out CoT generation during inference. Furthermore, by learning to conditionally predict a diverse set of outputs, HyT supports flexibility at inference time, enabling the model to either predict actions directly, generate thoughts or follow instructions. We evaluate the proposed method in a series of simulated benchmarks and real-world experiments.
comment: Published as a conference paper at ICLR 2026
Reflection-Based Relative Localization for Cooperative UAV Teams Using Active Markers
Reflections of active markers in the environment are a common source of ambiguity in onboard visual relative localization. This work presents a novel approach that exploits these typically unwanted reflections for onboard relative localization in heterogeneous multi-UAV teams. The method operates without prior knowledge of robot size or predefined marker configurations, remains independent of surface properties, and explicitly accounts for uncertainties caused by surface irregularities, including dynamic water surfaces relevant for marine deployments. We validated the approach in both indoor and outdoor experiments, demonstrating reliable operation across varying lighting conditions and achieving greater effective range (above 30 m) and accuracy than state-of-the-art methods. The video is available under the following link: https://youtu.be/y0zp8cIwkig.
HDFlow: Hierarchical Diffusion-Flow Planning for Long-horizon Tasks ICML 2026
Recent advances in generative models have shown promise in generating behavior plans for long-horizon, sparse reward tasks. While these approaches have achieved promising results, they often lack a principled framework for hierarchical decomposition and struggle with the computational demands of real-time execution, due to their iterative denoising process. In this work, we introduce Hierarchical Diffusion-Flow (HDFlow), a novel hierarchical planning framework that optimally leverages the strengths of diffusion and rectified flow models to overcome the limitations of single-paradigm generative planners. HDFlow employs a high-level diffusion planner to generate sequences of strategic subgoals in a learned latent space, capitalizing on diffusion's powerful exploratory capabilities. These subgoals then guide a low-level rectified flow planner that generates smooth and dense trajectories, exploiting the speed and efficiency of ordinary differential equation (ODE)-based trajectory generation. We evaluate HDFlow on four challenging furniture assembly tasks in both simulation and real-world, where it significantly outperforms state-of-the-art methods. Furthermore, we also showcase our method's generalizability on two long-horizon benchmarks comprising diverse locomotion and manipulation tasks. Project website: https://hdflow-page.github.io/
comment: ICML 2026 (Spotlight)
HoloMotion-1 Technical Report
In this report, we present HoloMotion-1, a humanoid motion foundation model for zero-shot whole-body motion tracking. A key innovation of HoloMotion-1 is to scale control-policy training with a large-scale hybrid motion corpus, where video-reconstructed motions from in-the-wild videos provide the dominant source of motion diversity, while curated motion-capture and in-house motion data provide higher-fidelity supervision and deployment-oriented coverage. This data regime enables HoloMotion-1 to move beyond conventional MoCap-only training and exposes the policy to substantially broader behaviors, capture conditions, and motion styles. Learning from such heterogeneous data introduces new challenges, including reconstruction noise, source-domain mismatch, uneven motion quality, and the need for temporal modeling under large behavioral variation. To address these challenges, HoloMotion-1 integrates large-capacity temporal modeling, a sparsely activated Mixture-of-Experts Transformer with KV-cache inference for real-time control, and a sequence-level training strategy that improves learning efficiency on extended motion sequences. Extensive experiments on multiple unseen motion benchmarks show that HoloMotion-1 generalizes robustly across diverse motion types and capture conditions, significantly improves tracking accuracy over prior methods, and transfers directly to a real humanoid robot without task-specific fine-tuning.
comment: 20 pages, 4 figures, 6 tables. Technical report
SAMe: A Semantic Anatomy Mapping Engine for Robotic Ultrasound
Robotic ultrasound has advanced local image-driven control, contact regulation, and view optimization, yet current systems lack the anatomical understanding needed to determine what to scan, where to begin, and how to adapt to individual patient anatomy. These gaps make systems still reliant on expert intervention to initiate scanning. Here we present SAMe, a semantic anatomy mapping engine that provides robotic ultrasound with an explicit anatomical prior layer. SAMe addresses scan initiation as a target-to-anatomy-to-action process: it grounds under-specified clinical complaints into structured target organs, instantiates a patient-specific anatomical representation for the grounded targets from a single external body image, and translates this representation into control-facing 6-DoF probe initialization states without any additional registration using preoperative CT or MRI. The anatomical representation maintained by SAMe is explicit, lightweight (single-organ inference in 0.08s), and compatible with downstream control by design. Across semantic grounding, anatomical instantiation, and real-robot evaluation, SAMe shows strong performance across the full initialization pipeline. In real-robot experiments, centroid-based SAMe initialization outperformed the body-keypoint-based heuristic baseline under a budget-matched single-target setting for both liver (86.7% versus 46.7%) and kidney (80.0% versus 73.3%) initialization. Furthermore, the trial-level organ-hit rate reached 97.3% for liver and 83.3% for kidney when multiple candidate targets were available. These results establish an explicit anatomical prior layer that addresses scan initialization and is designed to support broader downstream autonomous scanning pipelines, providing the anatomical foundation for complaint-driven, anatomically informed robotic ultrasonography.
comment: Supplementary information included. Code will be released at https://github.com/MiliLab/Echo-SAMe
Compliant Explicit Reference Governor for Contact Friendly Robotic Manipulators
This paper introduces the Compliant Explicit Reference Governor (CERG), a modular reference management system that enables robots to interact physically with their environment under provable guarantees. The CERG is an intermediate layer that can be placed between a high-level planner and a low-level controller: it enforces operational constraints and enables smooth transitions between free-motion and contact operations. The CERG ensures safety by limiting the total energy available to the robotic arm at the time of contact. In the absence of contact, however, the CERG does not penalize the system performance. Simulation and hardware experiments validate the CERG on increasingly complex systems.
comment: Updated paper with current contributions and author list, accepted at IFAC World Congress, Busan, 2026
TwinRL: Digital Twin-Driven Reinforcement Learning for Real-World Robotic Manipulation
Despite strong generalization capabilities, Vision-Language-Action (VLA) models remain constrained by the high cost of expert demonstrations and limited real-world interaction. While online reinforcement learning (RL) has shown promise, its application to real-world VLA manipulation is hindered by low exploration efficiency and restricted exploration coverage. Through systematic real-world experiments, we observe that the effective exploration space of online RL is largely constrained by the trajectory distribution induced during supervised fine-tuning (SFT). Motivated by this observation, we propose TwinRL, a digital twin-real-world collaborative post-training framework that expands and guides RL exploration for VLA models through three stages: SFT warm-up, twin RL warm-up, and real-world RL. TwinRL first reconstructs a high-fidelity digital twin from smartphone-captured scenes. During the SFT stage, we introduce an exploration space expansion strategy that expands the support of the trajectory distribution beyond real demonstrations, reshaping the exploration space for more effective RL. Rather than treating the twin as a data augmentation tool, we propose a twin RL warm-up strategy that enables it to act as an exploration guide for real-world RL. Specifically, TwinRL performs efficient parallel RL in the digital twin to generate interactive trajectories that populate the replay buffer and stabilize subsequent real-world RL learning. This process also identifies failure-prone yet informative configurations, enabling targeted human-in-the-loop rollouts to further improve on-robot efficiency. Across four tasks, TwinRL achieves near-100% success in both in-distribution and out-of-distribution regions, delivering over 30% faster convergence than prior real-world RL methods with only 20 minutes of on-robot interaction.
Neural Configuration-Space Barriers for Manipulation Planning and Control
Planning and control for high-dimensional robot manipulators in cluttered dynamic environments require computational efficiency and robust safety guarantees. Inspired by recent advances in learning configuration-space distance functions (CDFs) as representations of robot bodies, we propose a unified approach for motion planning and control that formulates safety constraints as CDF barriers. A CDF barrier approximates the local free configuration space, substantially reducing the number of collision-checking operations during motion planning. However, learning a CDF barrier with a neural network and relying on online sensor observations introduces uncertainties that must be considered during control synthesis. To address this, we develop a distributionally robust CDF barrier formulation for control that accounts for modeling errors and sensor noise without assuming a known underlying distribution. Simulations and hardware experiments on a UFactory xArm6 manipulator show that our neural CDF barrier formulation enables efficient planning and robust safe control in cluttered and dynamic environments, relying only on onboard point-cloud observations.
MAPLE: Latent Multi-Agent Play for End-to-End Autonomous Driving
Vision-language-action (VLA) models are effective as end-to-end motion planners, but can be brittle when evaluated in closed-loop settings because they are trained under a traditional imitation learning framework. Existing closed-loop supervision approaches lack scalability and fail to completely model a reactive environment. We propose MAPLE, a novel framework for reactive, multi-agent rollout of a dynamic driving scenario in the latent space of the VLA model. The ego vehicle and nearby traffic agents are independently controlled over multi-step horizons, while being reactive to other agents in the scene, enabling closed-loop training. MAPLE consists of two training stages: (1) supervised fine-tuning on the latent rollouts based on ground-truth trajectories, followed by (2) reinforcement learning with global and agent-specific rewards that encourage safety, progress, and interaction realism. We further propose diversity rewards that encourage the model to generate planning behaviors that may not be present in logged driving data. Notably, our closed-loop training framework is scalable and does not require external simulators, which can be computationally expensive to run and have limited visual fidelity to the real world. MAPLE achieves state-of-the-art driving performance on Bench2Drive and demonstrates scalable, closed-loop multi-agent play for robust E2E autonomous driving systems.
comment: 19 pages, 9 figures
WestWorld: A Knowledge-Encoded Scalable Trajectory World Model for Diverse Robotic Systems ICML 2026
Trajectory world models play a crucial role in robotic dynamics learning, planning, and control. While recent works have explored trajectory world models for diverse robotic systems, they struggle to scale to a large number of distinct system dynamics and overlook domain knowledge of physical structures. To address these limitations, we introduce WestWorld, a knoWledge-Encoded Scalable Trajectory World model for diverse robotic systems. To tackle the scalability challenge, we propose a novel system-aware Mixture-of-Experts (Sys-MoE) that dynamically combines and routes specialized experts for different robotic systems via a learnable system embedding. To further enhance zero-shot generalization, we incorporate domain knowledge of robot physical structures by introducing a structural embedding that aligns trajectory representations with morphological information. After pretraining on 89 complex environments spanning diverse morphologies across both simulation and real-world settings, WestWorld achieves significant improvements over competitive baselines in zero- and few-shot trajectory prediction. Additionally, it shows strong scalability across a wide range of robotic environments and significantly improves performance on downstream model-based control for different robots. Finally, we deploy our model on a real-world Unitree Go1, where it demonstrates stable locomotion performance. The code is available at https://github.com/511205787/WestWorld.
comment: ICML 2026 spotlight
Active Defense Against False Data Injection Attacks in Robotic Manipulators
Robotic systems are vulnerable to False Data Injection Attacks (FDIAs), where adversaries corrupt sensor signals to gain malicious control. Feedback linearization exposes robotic systems to integrator vulnerability, making them susceptible to stealthy attacks that can cause significant deviations in end-effector behavior without raising alarms. This paper addresses the resilience of manipulators against finite-horizon FDIAs by formalizing two defense methods, namely anomaly-aware virtual damping and manipulability reduction, with probabilistic guarantees on nominal task execution. Simulations on a 7-DOF redundant manipulator show that the proposed defenses substantially reduce the impact of FDIA compared to using solely a threshold-based ADS like the Chi-squared, while preserving nominal task performance in the absence of attack.
comment: Extended 8-page version containing full proofs. An abridged 6-page version has been accepted for publication in the Proceedings of the 23rd IFAC World Congress (2026). v2: Minor typographical fixes and updated reference formatting
Dual Quaternion Based Contact Modeling for Fast and Smooth Collision Recovery of Quadrotors
Unmanned aerial vehicles (UAVs) operating in cluttered environments require efficient and accurate impact modeling to maintain stability after collisions; however, classical impulse contact models decouple the normal and tangential components. This letter presents a dual quaternion impulse reset map directly on the SE(3) manifold. By operating on the unified spatial twist (unified linear and angular velocities), the proposed formulation retains the cross-coupling between normal and tangential impulse components in a single closed-form expression, and recovers the classical decoupled Newton impulse model as a special case. A recovery controller is designed that couples linear and angular momentum to enforce kinetic energy dissipation across impacts. Hardware-in-the-loop benchmarks demonstrate a 24\% reduction in execution latency compared to an optimized matrix-based implementation, and a 20\% reduction relative to a position-plus-quaternion (PQ) formulation. MuJoCo simulations across Monte Carlo sweeps over impact angles and friction coefficients show a 50.8\%-75.1\% reduction in position root-mean-square error (RMSE) and a 68.7\%-85\% decrease in peak kinetic energy compared to published linear-admittance baselines.
comment: 8 pages, 5 figures, 2 tables
Multiagent Systems
When Skills Don't Help: A Negative Result on Procedural Knowledge for Tool-Grounded Agents in Offensive Cybersecurity
Agent Skills, structured packages of procedural knowledge loaded into an LLM agent at inference time, are widely reported to improve task pass rates by an average of 16.2~percentage points across diverse domains. Yet the same benchmarks show wide variance, with 16 of 84 tasks suffering negative deltas when Skills are introduced. The community has not yet articulated a clean mechanism for \emph{when} Skills help and when they are merely redundant overhead. We re-analyze a recently published 180-run controlled study of an MCP-grounded autonomous Capture-the-Flag (CTF) agent under four documentation conditions of increasing richness (55, 1{,}478, 1{,}976, and 4{,}147 lines), and show that these conditions correspond almost exactly to a No-Skills, Experiential-Skills, Curated-Skills, and Comprehensive-Skills ablation. In offensive cybersecurity, a domain not deeply covered by existing Skills benchmarks, the marginal benefit of Skills collapses. The spread between the no-Skills and full-Skills conditions is only 8.9~pp ($p = 0.71$, $χ^2$; $p = 0.25$, Cochran--Armitage trend test; five of six pairwise Cohen's $h$ values fall below the $0.2$ small-effect threshold). We argue that the missing variable is \emph{environment-feedback bandwidth}. When an agent's tool layer returns strict, schema-validated, low-latency observations, the environment itself supplies the procedural correction signal that Skills are normally needed to provide. As a result, the marginal benefit of curated Skills diminishes substantially, and, in some cases (e.g., our timing side-channel setting), actively degrades performance. We articulate a falsifiable hypothesis, sketch its design implications for compound AI systems, and will release the reanalysis pipeline to support replication.
comment: Accepted as a poster at ACM CAIS 2026 AgentSkills Workshop
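For readers unfamiliar with the effect-size measure cited above, Cohen's $h$ compares two proportions after an arcsine transform, and $|h| < 0.2$ is the conventional "small effect" cut-off. The sketch below uses made-up pass rates purely for illustration; the abstract does not report the per-condition rates, so the numbers in the usage note are assumptions.

```python
import math


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2.0 * math.asin(math.sqrt(p1)) - 2.0 * math.asin(math.sqrt(p2))


def is_below_small_effect(p1: float, p2: float, threshold: float = 0.2) -> bool:
    """True when |h| falls under the conventional small-effect threshold."""
    return abs(cohens_h(p1, p2)) < threshold
```

For example, hypothetical pass rates of 0.55 and 0.50 give an $h$ of roughly 0.1, which falls below the 0.2 threshold.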
Equilibria in Multiplayer Graph Games: An Algorithmic Study
To verify the robustness of a program or protocol, it is common in the computer science community to rely on the theoretical framework of game theory. In particular, if one seeks to enforce a desired property, or specification, despite an unpredictable environment, a useful abstraction is to model the situation as a two-player zero-sum game. The goal is then to find a strategy for the system that guarantees the specification against any strategy of the environment. However, to model more complex situations, such as multiple systems with different objectives or an environment composed of various agents, the richer framework of multiplayer games must be considered. In this setting, a natural question is to identify equilibria, i.e., strategy profiles that are robust in the sense that no player has an incentive to deviate. The most well-known equilibrium concept is the Nash equilibrium, but several alternatives exist. We study five such notions and, for each of them, we provide complexity results for the constrained existence problem, which consists of deciding whether a given game contains an equilibrium that ensures each player a payoff within a specified interval.
LLM Agents Make Collective Belief Dynamics Programmable: Challenges and Research Directions
Classical models of opinion dynamics assume human participants with bounded rationality and limited coordination. The rise of LLM-based agents introduces a qualitative shift: agents can now participate in online discussions at scale, maintain consistent persuasion strategies, and coordinate systematically. This paper argues that LLM agents make collective belief dynamics programmable, enabling deliberate steering of population-level beliefs. We term this emerging problem programmable collective belief control. Through controlled multi-agent simulations, we provide proof-of-concept evidence that coordinated AI agents can induce measurable belief shifts that stabilize within a few interaction rounds. We identify four structural properties (indistinguishability, persistence, contextuality, and configurability) that make detection and defense fundamentally difficult. Based on these findings, we outline a research agenda spanning theoretical foundations for adversarial belief dynamics, operational methods for system-level detection and intervention, and simulation infrastructure for scalable experimentation. Our goal is not to present a complete solution, but to articulate why this problem demands urgent attention and to provide a conceptual foundation for future work.
DAG-Based QoS-Aware Dynamic Task Placement for Networked Multi-Stage Control Pipelines
Current Physical AI (PAI) relies heavily on closed-loop visual-servoing pipelines, whose perception and planning stages may become computationally intensive onboard due to complex models embedded on robots. In practice, offloading the perception task to on-site edges statically is inappropriate for latency-sensitive, precise industrial settings over a standardized industrial network. This emphasizes the importance of Control-Communication-Computing (3C) co-design in industrial automation: monolithic local execution saturates AI-accelerated machine and robot hardware, while static edge offloading exposes the control loop to network jitter. Existing adaptive task placement (ATP) controllers can partially address the gap by relocating a single pipeline stage on binary threshold rules, without a multi-stage model and an explicit cost on placement switching. In this Work-in-Progress (WiP) paper, we propose a directed acyclic graph (DAG) based quality-of-service (QoS)-aware dynamic task placement (DTP) framework for sensing-perception-planning-control pipelines in networked robotics. This pipeline is formalized as a DAG with task-level and node-level attributes for compute cost, communication delay, and feasible placement sets; over a small interpretable candidate set (fully local, static offload, hybrid), a window-based cost function combines tail end-to-end latency, deadline violation rate, hardware utilization, and a Hamming-distance switching penalty, and a DTP algorithm with hysteresis and a minimum dwell-time bounds placement chatter. Our WiP paper presents the theoretical framework, a structured qualitative analysis, and a two-phase simulation plus hardware-in-the-loop validation roadmap.
comment: 4 pages, 1 figure, 1 algorithm, accepted as a Work-in-Progress (WiP) paper at the 24th IEEE International Conference on Industrial Informatics (INDIN), 26-29 July, 2026, Melbourne, Australia
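A rough sketch of how a window-based cost with a Hamming-distance switching penalty, hysteresis, and a minimum dwell time could fit together is given below. It is an illustration under assumptions rather than the paper's algorithm: the placement vectors, the weights, the thresholds, and the names `window_cost` and `choose_placement` are all hypothetical.

```python
from dataclasses import dataclass

# One entry per stage (sensing, perception, planning, control):
# 0 = run on the robot, 1 = run on the edge node. These vectors are assumed.
CANDIDATES = {
    "fully_local": (0, 0, 0, 0),
    "static_offload": (0, 1, 1, 0),
    "hybrid": (0, 1, 0, 0),
}


@dataclass
class WindowStats:
    tail_latency_ms: float   # e.g. a high-percentile end-to-end latency in the window
    violation_rate: float    # fraction of control deadlines missed
    utilization: float       # hardware utilization in [0, 1]


def hamming(a, b):
    """Number of stages whose placement differs between two placement vectors."""
    return sum(x != y for x, y in zip(a, b))


def window_cost(stats, candidate, current, weights=(1.0, 100.0, 10.0, 5.0)):
    """Weighted sum of the window metrics plus a switching penalty (weights assumed)."""
    w_lat, w_viol, w_util, w_switch = weights
    switch = hamming(CANDIDATES[candidate], CANDIDATES[current])
    return (w_lat * stats.tail_latency_ms + w_viol * stats.violation_rate
            + w_util * stats.utilization + w_switch * switch)


def choose_placement(current, dwell, stats_by_candidate, min_dwell=3, hysteresis=5.0):
    """Return (placement, dwell) for the next window."""
    if dwell < min_dwell:                      # minimum dwell time: no switching yet
        return current, dwell + 1
    costs = {name: window_cost(s, name, current)
             for name, s in stats_by_candidate.items()}
    best = min(costs, key=costs.get)
    # Hysteresis: switch only if the improvement clearly exceeds the margin.
    if best != current and costs[current] - costs[best] > hysteresis:
        return best, 1
    return current, dwell + 1
```

The dwell counter and the hysteresis margin both damp placement chatter: the first blocks switches that come too soon after the previous one, the second blocks switches whose estimated gain is marginal.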
Memory-Augmented Reinforcement Learning Agent for CAD Generation
Automatic generation of computer-aided design (CAD) models is a core technology for enabling intelligence in advanced manufacturing. Existing generation methods based on large language models (LLMs) often fall short when handling complex CAD models characterized by long operation sequences, diverse operation types, and strong geometric constraints, primarily because reasoning chains break and effective error-correction mechanisms are lacking. To address this problem, this paper proposes a memory-augmented reinforcement learning framework for CAD generation agents. The framework encapsulates the underlying geometric kernel into a structured toolchain callable by the agent and builds a closed-loop mechanism of design intent understanding, global planning, execution, and multi-dimensional verification. It also designs a dual-track memory module consisting of a case library and a skill library, and proposes a dynamic utility retrieval algorithm. By introducing reinforcement learning into retrieval and policy optimization, the agent can effectively avoid retrieval traps in which examples are semantically similar but geometrically infeasible, enabling online self-correction and continual evolution without additional large-scale annotated data. Experiments show that the proposed method significantly improves both the success rate and geometric consistency on complex CAD model generation tasks.
comment: 26 pages; multilingual submission: English version first, followed by Chinese version
EngiAI: A Multi-Agent Framework and Benchmark Suite for LLM-Driven Engineering Design
Large Language Model (LLM) agents are increasingly applied to engineering design tasks, yet existing evaluation frameworks do not adequately address multi-agent systems that combine simulation, retrieval, and manufacturing preparation. We introduce a benchmark suite with three evaluation dimensions: (1) a workflow benchmark with seven prompt styles targeting distinct cognitive demands, including direct tool use, semantic disambiguation, conditional branching, and working-memory tasks; (2) a Retrieval-Augmented Generation (RAG) benchmark with gated scoring isolating retrieval contributions to parameter selection; and (3) a High Performance Computing (HPC) benchmark evaluating end-to-end ML training orchestration on a SLURM cluster. Alongside the benchmark we present EngiAI, a Multi-Agent System (MAS) reference implementation built on LangGraph that operationalizes the benchmark by coordinating seven specialized agents through a supervisor architecture, unifying topology optimization, document retrieval, HPC job orchestration, and 3D printer control. Across four LLM backends and two EngiBench problems, proprietary models achieve 96-97% average task completion on Beams2D, while open-source 4B-parameter models reach 55-78%, with clear generational improvement. Conditional branching proves most challenging, with task completion dropping to 20-53% for the conditional style on Photonics2D. RAG gating confirms near-perfect retrieval-augmented scores ($\approx 1.0$) versus near-zero without retrieval, validating the evaluation design. On HPC orchestration, one model completes all pipeline steps in 100% of runs while another drops to 50%, revealing that multi-step instruction following degrades over long-running workflows.
comment: 26 pages, 10 figures, to be published at IDETC 2026
PAVE: A Cognitive Architecture for Legitimate Violation in Generative Agent Societies
Generative agents based on large language models reproduce believable human behavior in cooperative settings, but how they should reason in situations where rule-breaking may be required, such as fire evacuation or authority-supervised emergency, remains poorly characterized. We propose PAVE (Perception, Assessment, Verdict, Emulation), a novel four-module cognitive architecture that addresses this gap end to end: (i) Perception extracts a structured context with explicit authority distance, peer behaviors, and severity-tagged situational cues; (ii) Assessment scores the context along five scalars including an explicit legitimacy judgment that checks necessity, proportionality, and absence of alternatives; (iii) Verdict decides to comply or violate under a hard legitimacy gate, with a per-agent threshold elicited from the persona; (iv) Emulation enacts the verdict and scopes the violation to the rule the trigger justifies. We instantiate PAVE in Voville, a tile-based traffic environment forked from Smallville, and evaluate across three scenarios, four LLM backbones, and a focused ablation. PAVE agents satisfy four properties simultaneously: legitimate violation (only when a trigger justifies it), authority deference (officer instructions override even high legitimacy), bounded scope (violations confined to the targeted rule), and recovery (baseline restored once the trigger ends). PAVE agents make more structured and interpretable decisions than vanilla across all four properties, and human evaluators rate them as more plausible. Ablating the legitimacy gate reproduces vanilla-like failures. We release Voville, the PAVE prompts and code, and the evaluation pipeline.
comment: Preprint. 23 pages, 4 figures. Code and environment will be released upon publication
STAR-PólyaMath: Multi-Agent Reasoning under Persistent Meta-Strategic Supervision
Frontier AI models and multi-agent systems have led to significant improvements in mathematical reasoning. However, for problems requiring extended, long-horizon reasoning, existing systems continue to suffer from fundamental reliability issues: hallucination accumulation, memory fragmentation, and imbalanced reasoning-tool trade-offs. In this paper, we introduce STAR-PólyaMath, a multi-agent framework that systematically addresses these challenges through meta-level supervision and structured Reasoner-Verifier interaction. STAR-PólyaMath is structured as an orchestrated state machine with nested challenge-step-replan loops, governed by a reasoning-free Python orchestrator that separates control from inference and bounds error propagation through trace-back and re-planning. Our key innovation is a persistent Meta-Strategist that maintains cross-attempt memory and exercises meta-level control by issuing high-level strategic guidance or mandatory directives, so the system can escape unproductive loops rather than stagnate or over-rely on tools. STAR-PólyaMath achieves state-of-the-art results on all eight top-tier competition benchmarks: AIME 2025-2026, MathArena Apex Shortlist, MathArena Apex 2025, Putnam 2025, IMO 2025, HMMT February 2026, and USAMO 2026. It obtains perfect scores on AIMEs, Putnam, and HMMT, and shows its largest margin on Apex 2025, scoring 93.75% compared with 80.21% by the strongest baseline GPT-5.5. Ablation studies show that the gains arise from the framework's orchestration rather than from model-level diversity since removing key components or substituting in mixed backbones consistently weakens performance. Code is available at https://github.com/Julius-Woo/STAR-PolyaMath.
comment: 25 pages, 4 figures. Code: https://github.com/Julius-Woo/STAR-PolyaMath
Swimming with Whales: Analysis of Power Imbalances in Stake-Weighted Governance
Voting methods weighted by stakes are the fundamental governance paradigm in Proof-of-Stake (PoS) blockchains. Such a paradigm is known to be prone to power distortions: a few users possessing large stakes may completely control decision making, even without owning the totality of the stakes. We study this phenomenon through the lens of computational social choice, focusing on the extent of power imbalances in stake-weighted voting when power is quantified using the Penrose-Banzhaf power index. Our work presents both analytical and empirical contributions. Analytically, we demonstrate that while a perfect alignment between power and relative stake ownership is generally unattainable, it can be approximated in expectation under specific conditions. Empirically, using data from a real-world on-chain governance system (Project Catalyst), we provide a more fine-grained understanding of the power imbalances that are likely to occur in current stake-weighted governance systems.
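As a minimal illustration of the power measure named in this abstract (not the authors' code), the sketch below computes the normalized Penrose-Banzhaf index by brute-force coalition enumeration. The three stakes and the strict-majority quota are made-up values, chosen only to show that voting power need not match relative stake.

```python
from itertools import combinations


def banzhaf_indices(weights, quota):
    """Normalized Penrose-Banzhaf index of each voter in a weighted voting game."""
    n = len(weights)
    swings = [0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(len(others) + 1):
            for coalition in combinations(others, r):
                total = sum(weights[j] for j in coalition)
                # Voter i is a swing voter if the coalition loses without i
                # and wins once i joins.
                if total < quota <= total + weights[i]:
                    swings[i] += 1
    total_swings = sum(swings)
    if total_swings == 0:
        return [0.0] * n  # no coalition can ever reach the quota
    return [s / total_swings for s in swings]


# Hypothetical stake distribution: shares of 50%, 30%, 20%.
stakes = [50, 30, 20]
quota = 51  # strict majority of the 100 total stake units
indices = banzhaf_indices(stakes, quota)
print(indices)  # [0.6, 0.2, 0.2]
```

In this toy case the largest holder owns 50% of the stake but 60% of the normalized power, while the 30% and 20% holders are equally powerful. Enumeration is exponential in the number of voters, so this sketch is only suitable for very small examples.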
AQuaUI: Visual Token Reduction for GUI Agents with Adaptive Quadtrees
Large Multimodal Models (LMMs) have recently emerged as promising backbones for GUI-agent models, where high-resolution GUI screenshots are introduced to the prompts at each iteration step. However, these screenshots exhibit highly non-uniform spatial information density: large regions may carry little information and are visually homogeneous, while key text and icons may require high visual fidelity. Existing approaches to this problem either require additional training or rely on attention-based token compression, ignoring the structured layout and spatial redundancy of GUI screenshots. To fill the gap, this paper proposes AQuaUI, a training-free inference-time token reduction method for GUI agent models that utilizes the non-uniform information density in screenshots. AQuaUI constructs an adaptive quadtree on each screenshot input and keeps one representative merged token per leaf of the quadtree. AQuaUI preserves the spatial positions of retained tokens throughout the pipeline to ensure that all position-encoding stages remain consistent. To further improve temporal consistency across multi-step GUI interactions, we propose a conditional quadtree algorithm that leverages the continuity between consecutive screenshots within a single request. Specifically, it refines the current quadtree using previous quadtrees as references, helping preserve fine-grained regions across static or mildly shifted GUI states. We implement AQuaUI on state-of-the-art GUI agent models and conduct experiments on standard grounding and navigational benchmarks. AQuaUI consistently shows improved accuracy-efficiency trade-offs over prior baselines. Notably, on GUI-Owl-1.5-32B-Instruct, AQuaUI achieves up to 13.22% speedup and 29.52% fewer visual tokens while retaining 99.06% of full-token performance, suggesting that the spatial redundancy of GUI screenshots can be exploited at inference without retraining.
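The adaptive-quadtree idea can be sketched as follows. This is an illustration under stated assumptions, not the AQuaUI implementation: the split criterion (block variance above a threshold), the scalar patch values, and the mean-based merge are all hypothetical choices, and the grid is assumed square with a power-of-two side.

```python
def block_values(grid, y, x, size):
    """Flatten the size-by-size block whose top-left corner is (y, x)."""
    return [grid[r][c] for r in range(y, y + size) for c in range(x, x + size)]


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def quadtree_leaves(grid, threshold, y=0, x=0, size=None):
    """Split a block into four quadrants while its variance exceeds threshold."""
    if size is None:
        size = len(grid)
    if size == 1 or variance(block_values(grid, y, x, size)) <= threshold:
        return [(y, x, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_leaves(grid, threshold, y + dy, x + dx, half)
    return leaves


def merge_tokens(grid, leaves):
    """One merged (mean) token per leaf, keeping the leaf's position and size."""
    merged = []
    for y, x, size in leaves:
        values = block_values(grid, y, x, size)
        merged.append((y, x, size, sum(values) / len(values)))
    return merged


# Toy 8x8 "screenshot": homogeneous background with one detailed 2x2 corner.
grid = [[0.0] * 8 for _ in range(8)]
grid[0][0], grid[0][1], grid[1][0], grid[1][1] = 1.0, 5.0, 9.0, 3.0
leaves = quadtree_leaves(grid, threshold=0.01)
tokens = merge_tokens(grid, leaves)
print(len(tokens))  # 10 tokens instead of 64
```

Each retained token carries its `(y, x, size)` so that position information survives the reduction, mirroring the abstract's point about keeping position-encoding stages consistent.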
CASPIAN: Online Detection and Attribution of Cascade Attacks in LLM Multi-Agent Systems via Cross-Channel Causal Monitoring
Cascade attacks in LLM multi-agent systems (MAS) arise when adversarial influence propagates across agents and leads to escalated system-level failures through complex agent interactions. Detecting such cascades is challenging, as their signals are distributed, tightly coupled across interaction channels, and often appear plausibly benign locally but may unfold quickly either within a single turn or gradually across multiple turns. Existing defenses, being largely local and text-centric, fail to capture such cross-channel, temporally coordinated dynamics of cascade propagation. Therefore, we propose CASPIAN, the first framework that provides a unified, cross-channel causal analysis of cascade behavior in LLM-MAS through online monitoring of dynamic influence propagation across agents. CASPIAN models multi-agent interactions using a unified, dynamic causal influence matrix across channels, estimated efficiently via a late-interaction conditional transfer entropy (LI-CTE) formulation, thereby enabling the detection of cascade onset from emergent system-level structure rather than isolated anomalies. It further performs online causal attribution, identifying the origin, bridge, and amplifier agents driving the cascade and reconstructing its principal propagation pathways, capabilities not supported by existing methods. Across diverse multi-agent frameworks and benchmarks, CASPIAN consistently outperforms semantic guardrails, LLM-based judges, and graph-based anomaly detectors in both detection accuracy and early cascade identification while operating with sub-1% relative overhead latency. These results demonstrate that unified cross-channel causal modeling is essential for reliably detecting and understanding cascade failures in LLM multi-agent systems.
comment: https://github.com/caspian-detector/caspian
Graph Neural Planning and Predictive Control for Multi-Robot Communication-Constrained Unlabeled Motion Planning ICRA
The multi-robot unlabeled motion planning problem of concurrently assigning robots to goals and generating safe trajectories is central in many collaborative tasks. Recent Graph Neural Network methods offer scalable decentralized solutions but rely on simplified dynamics and simulation environments, overlooking key challenges of real-world deployment such as dynamic feasibility and communication constraints. To address these gaps, we propose a hierarchical framework that combines a Graph ATtention Planner (GATP) with a decentralized Nonlinear Model Predictive Controller (NMPC). GATP provides intermediate subgoals through multi-robot cooperation, and the NMPC enforces safety under nonlinear dynamics and actuation constraints. We evaluate our framework in both simulation and real-world quadrotor experiments. Thanks to attention mechanisms and minimal communication requirements, we demonstrate improved generalization to larger teams, robustness to communication delays up to 200 ms and practical feasibility with decentralized on-board inference.
comment: 8 pages, 6 figures, Accepted at the IEEE International Conference on Robotics and Automation (ICRA) 2026
Multi-agent Collaboration with State Management
Recent advances in multi-agent systems have shown great potential for solving complex tasks. However, when multiple agents edit a shared codebase concurrently, their changes can silently conflict and inconsistent views lead to integration failures. Existing multi-agent systems address this through workspace isolation (e.g., one git worktree per agent), but this defers conflict resolution to a post-hoc merge step where recovery is expensive. In this paper, we propose STORM, i.e., STate-ORiented Management for multi-agent collaboration. Specifically, STORM manages agent states by mediating their interactions with the shared workspace, ensuring that each agent operates on a consistent view of the codebase and that conflicting edits are detected and resolved at write time. We evaluate STORM on Commit0 and PaperBench across multiple LLMs. STORM outperforms the git-worktree-based multi-agent baseline by +18.7 on Commit0-Lite and +1.4 on PaperBench, while achieving comparable or better cost efficiency. Combined with single-agent runs, STORM reaches highest scores of 87.6 and 78.2 on the two benchmarks respectively, suggesting that explicit state management is a more effective foundation for multi-agent collaboration than workspace isolation. STORM can also be plugged into any multi-agent system seamlessly.
What Do Agents Communicate? Characterizing Information Exchange in Multi-Agent Systems
Large Language Models (LLMs) have enabled collaborative Multi-Agent (MA) systems, where interacting agents improve performance through diverse reasoning and iterative refinement. However, these systems remain vulnerable to error propagation, where early-stage information degrades downstream reasoning. To address this, we conduct a systematic analysis of inter-agent communication to identify which information drives MA performance. We find that the absence of reasoning and verification in inter-agent communication significantly degrades performance. Based on these insights, we propose Category-Aware Recovery Augmentation (CARA), which enforces the presence of critical information during communication. CARA recovers up to 86.2% of failed cases. Our results highlight the key role of information quality in effective MA collaboration. Our code is available at https://anonymous.4open.science/r/cara_mas
Agentic Agile-V: From Vibe Coding to Verified Engineering in Software and Hardware Development
Agentic AI coding systems can inspect repositories, plan implementation steps, edit files, call tools, run tests, and submit pull requests. These capabilities make software and hardware development faster in some settings, but current evidence does not support the simple claim that autonomous code generation automatically improves engineering outcomes. Controlled studies report productivity gains in some enterprise tasks, slowdowns in mature open-source work, moderate but heterogeneous meta-analytic effects, and persistent failures in repository setup, dependency handling, permission gating, and hardware verification. This paper argues that the central problem is no longer prompt engineering; it is engineering process control. It synthesizes evidence from agentic software engineering, GitHub-scale adoption studies, repository-level agent configuration, productivity trials, issue-resolution benchmarks, and hardware/RTL verification research. It proposes Agentic Agile-V, a process framework that uses Agile-V as the lifecycle backbone and a task-level SCOPE-V loop - Specify, Constrain, Orchestrate, Prove, Evolve, and Verify - to convert conversational intent into structured engineering artifacts and acceptance evidence. The paper contributes: (i) a taxonomy of minimum input artifacts for agentic software, firmware, and hardware work; (ii) a conversation-to-contract gate that separates exploratory dialogue from implementation; (iii) risk-adaptive feature, bug-fix, testing, and hardware workflows; and (iv) an evidence-bundle acceptance model for agent-generated artifacts. The paper concludes that agentic AI does not eliminate engineering discipline; it increases the value of requirements, constraints, traceability, independent verification, and human approval.
comment: 7 pages, 1 figure
Pramana: A Protocol-Layer Treatment of Claim Verification in Autonomous Agent Networks
Autonomous agents deployed in regulated domains must produce a verification artifact per consequential output: a record an auditor can re-execute offline, capturing what was claimed, against what source, by whom, when, and how. Production verification today splits into two unstandardized halves. Probabilistic verdict patterns (self-consistency voting, reviewer LLM ensembles) produce judgments, not artifacts. Artifact-producing patterns (RAG, tool-augmented traces, generator-verifier loops) produce vendor-specific records no external auditor can reconstruct without bespoke integration. Pramana defines the missing wire format. Every consequential agent output is wrapped in a typed ClaimAttestation with one of four variants (measurement, inference, analogy, citation), each paired with a verify() operation against the recorded source. verify() is deterministic for MeasurementClaim and CitationClaim. For InferenceClaim and AnalogyClaim, determinism is conditional on the oracle (audit-replayable when LLM-backed). The four-way typology derives from classical Indian epistemology (pramana, valid means of knowledge). The lifecycle is specified in TLA+ and exhaustively verified under TLC across three symmetry-reduced models: 38,563 distinct reachable states, zero invariant violations. The Python reference implementation passes 84 tests. An A2A and MCP wire-extension manifest layers three deployment-grade invariants: reachability, SLA bound, and offline re-verifiability. An exploratory pilot (n=100, 2,275 reviewer calls) probes LLM-as-judge in code generation. The strongest observation is a 40-percentage-point raw FPR delta across corpora, consistent with reference-solution quality contributing significantly. The pilot does not validate Pramana on its own; the structural argument and formal verification do that.
comment: 23 pages, 4 figures, 5 tables, 42 references
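To make the idea of a typed claim paired with a deterministic `verify()` concrete, here is a minimal sketch covering only the two variants the abstract describes as deterministic. The field names, the SHA-256 source pin, and the tolerance check are hypothetical illustrations; they are not Pramana's wire format or its reference implementation.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CitationClaim:
    """Hypothetical citation claim: a quote pinned to an exact source text."""
    quote: str
    source_sha256: str

    def verify(self, source_text: str) -> bool:
        # Deterministic: depends only on the recorded hash and the source bytes.
        digest = hashlib.sha256(source_text.encode("utf-8")).hexdigest()
        return digest == self.source_sha256 and self.quote in source_text


@dataclass(frozen=True)
class MeasurementClaim:
    """Hypothetical measurement claim: a value read from a named record field."""
    field: str
    value: float
    tolerance: float

    def verify(self, source_record: dict) -> bool:
        if self.field not in source_record:
            return False
        return abs(source_record[self.field] - self.value) <= self.tolerance


source = "Reactor outlet temperature was 312.4 K at 10:00 UTC."
citation = CitationClaim(
    quote="312.4 K",
    source_sha256=hashlib.sha256(source.encode("utf-8")).hexdigest(),
)
measurement = MeasurementClaim(field="outlet_temp_k", value=312.4, tolerance=0.05)

print(citation.verify(source))                         # True
print(citation.verify(source.replace("312", "313")))   # False: source was altered
print(measurement.verify({"outlet_temp_k": 312.41}))   # True
```

An auditor holding the recorded source can re-run `verify()` offline and obtain the same answer, which is the property the abstract calls re-executable.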
A Practical Framework of Key Performance Indicators for Multi-Robot Lunar and Planetary Field Tests ICRA 2026
Robotic prospecting for critical resources on the Moon, such as ilmenite, rare earth elements, and water ice, requires robust exploration methods given the diverse terrain and harsh environmental conditions. Although numerous analog field trials address these goals, comparing their results remains challenging because of differences in robot platforms and experimental setups. These missions typically assess performance using selected, scenario-specific engineering metrics that fail to establish a clear link between field performance and science-driven objectives. In this paper, we address this gap by deriving a structured framework of KPIs from three realistic multi-robot lunar scenarios reflecting scientific objectives and operational constraints. Our framework emphasizes scenario-dependent priorities in efficiency, robustness, and precision, and is explicitly designed for practical applicability in field deployments. We validated the framework in a multi-robot field test and found it practical and easy to apply for efficiency- and robustness-related KPIs, whereas precision-oriented KPIs require reliable ground-truth data that is not always feasible to obtain in outdoor analog environments. Overall, we propose this framework as a common evaluation standard enabling consistent, goal-oriented comparison of multi-robot field trials and supporting systematic development of robotic systems for future planetary exploration.
comment: Presented at ICRA 2026 Workshop on Multi-Agent Robotic Systems: Real-World Collaboration and Interaction
Grassroots Bonds as a Foundation for Market Liquidity
Global cryptocurrencies are unbacked and have high transaction cost incurred by global consensus. In contrast, grassroots cryptocurrencies are backed by the goods and services of their issuers -- any person, natural or legal -- and have no transaction cost beyond operating a smartphone. Liquidity in grassroots cryptocurrencies arises from mutual credit via coin exchange among issuers. However, as grassroots coins are redeemable 1-for-1 against any other grassroots coin, the credit-forming exchange must also be 1-for-1, lest prompt redemption after exchange would leave the parties with undue profit or loss. Thus, grassroots coins are incongruent with liquidity through interest-bearing credit. Here we introduce grassroots bonds, which extend grassroots coins with a maturity date, reframing grassroots coins -- cash -- as mature grassroots bonds. Bond redemption generalises coin redemption, allowing the lending of liquid coins in exchange for interest-bearing future-maturity bonds. We show that digital social contracts -- voluntary agreements among persons, specified, fulfilled, and enforced digitally -- can express the full gamut of financial instruments as the voluntary swap of grassroots bonds, including loans, sale of debt, forward contracts, options, and escrow-based instruments, and that classical liquidity ratios are applicable just as well to grassroots bonds. Grassroots bonds may thus allow local digital economies to form and grow without initial capital or external credit, harnessing mutual trust within communities into liquidity. The formal specification presented here was implemented in GLP, a concurrent logic programming language running on Dart for smartphone deployment. The implementation is illustrated by a running multiagent village market scenario in GLP.
ORCA: An Agentic Reasoning Framework for Hallucination and Adversarial Robustness in Vision-Language Models
Large Vision-Language Models (LVLMs) exhibit strong multimodal capabilities but remain vulnerable to hallucinations from intrinsic errors and adversarial attacks from external exploitations, limiting their reliability in real-world applications. We present ORCA, an agentic reasoning framework that improves the factual accuracy and adversarial robustness of pretrained LVLMs through structured inference-time reasoning with a suite of small vision models (less than 3B parameters). ORCA operates via an Observe-Reason-Critique-Act loop, querying multiple visual tools with evidential questions, validating cross-model inconsistencies, and refining predictions iteratively without access to model internals or retraining. ORCA also stores intermediate reasoning traces, which supports auditable decision-making. Though designed primarily to mitigate object-level hallucinations, ORCA also exhibits emergent adversarial robustness without requiring adversarial training or defense mechanisms. We evaluate ORCA across three settings: (1) clean images on hallucination benchmarks, (2) adversarially perturbed images without defense, and (3) adversarially perturbed images with defense applied. On the POPE hallucination benchmark, ORCA improves standalone LVLM performance by +3.64% to +40.67% across different subsets. Under adversarial perturbations on POPE, ORCA achieves an average accuracy gain of +20.11% across LVLMs. When combined with defense techniques on adversarially perturbed AMBER images, ORCA further improves standalone LVLM performance, with gains ranging from +1.20% to +48.00% across metrics. These results demonstrate that ORCA offers a promising path toward building more reliable and robust multimodal systems.
comment: Accepted at the ACM International Conference on Cloud and Big Data Computing (ICCBDC 2026)
Dual-Gated Epistemic Time-Dilation: Autonomous Compute Modulation in Asynchronous MARL
While Multi-Agent Reinforcement Learning (MARL) algorithms achieve unprecedented successes across complex continuous domains, their standard deployment strictly adheres to a synchronous operational paradigm. Under this paradigm, agents are universally forced to execute deep neural network inferences at every micro-frame, regardless of immediate necessity. This dense throughput acts as a fundamental barrier to physical deployment on edge devices where thermal and metabolic budgets are highly constrained. We propose Epistemic Time-Dilation MAPPO (ETD-MAPPO), augmented with a Dual-Gated Epistemic Trigger. Instead of depending on rigid frame-skipping (macro-actions), agents autonomously modulate their execution frequency by interpreting aleatoric uncertainty (via Shannon entropy of their policy) and epistemic uncertainty (via state-value divergence in a Twin-Critic architecture). To formalize this, we structure the environment as a Semi-Markov Decision Process (SMDP) and build the SMDP-Aligned Asynchronous Gradient Masking Critic to ensure proper credit assignment. Empirical findings demonstrate massive improvements (> 60% relative baseline acquisition leaps) over current temporal models. By assessing LBF, MPE, and the 115-dimensional state space of Google Research Football (GRF), ETD correctly prevented premature policy collapse. Remarkably, this unconstrained approach leads to emergent Temporal Role Specialization, reducing computational overhead by a statistically dominant 73.6% entirely during off-ball execution without deteriorating centralized task dominance.
comment: 14 pages, 5 figures. Code available at: https://github.com/xaiqo/edtmappo. Related materials available on Zenodo: 10.5281/zenodo.19206838
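The two uncertainty signals described in the ETD-MAPPO abstract can be sketched as a simple gate. This is an illustration only: the thresholds, the absolute-difference divergence measure, and the choice to combine the two gates with a logical OR are assumptions, since the abstract does not specify them.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_recompute(policy_probs, value_a, value_b,
                     entropy_threshold, divergence_threshold):
    """Return True if the agent should run a fresh policy inference this frame.

    Gate 1 (aleatoric): entropy of the last policy output.
    Gate 2 (epistemic): disagreement between two critics' state-value estimates.
    Combining the gates with OR is an assumption of this sketch.
    """
    aleatoric = shannon_entropy(policy_probs)
    epistemic = abs(value_a - value_b)
    return aleatoric > entropy_threshold or epistemic > divergence_threshold


confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]

# Confident policy, critics agree: keep executing the previous action.
print(should_recompute(confident, 1.00, 1.05, 0.5, 0.2))  # False
# High-entropy policy: recompute.
print(should_recompute(uncertain, 1.00, 1.05, 0.5, 0.2))  # True
# Confident policy but critics disagree: recompute.
print(should_recompute(confident, 1.00, 2.00, 0.5, 0.2))  # True
```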
AMBER: A Columnar Architecture for High-Performance Agent-Based Modeling in Python
Python is widely used for agent-based modelling because it is accessible and has a mature scientific ecosystem, but object-per-agent execution incurs interpreter overhead that restricts the population sizes feasible in interactive modelling, calibration, and parameter sweeps. This paper presents AMBER, a Python framework that stores agent state in a Polars-backed columnar table and exposes population operations through a compact view API. The framework preserves conventional model and agent abstractions while translating common population updates into compiled column operations; behaviours that do not vectorise remain expressible through a buffered object-oriented path. We evaluate AMBER on wealth transfer, random walk, and spatial SIR benchmarks against Mesa, AgentPy, SimPy, Melodie, Agents.jl, and AMBER's own loop path, using invariant checks to verify comparable model outputs before timing. Across the tested workloads, AMBER has the lowest execution time among Python-hosted implementations and achieves speedups of up to $1118\times$ over Mesa; on the largest SIR benchmark it is also faster than the Julia-based Agents.jl implementation.
FACET: Teacher-Centred LLM-Based Multi-Agent Systems-Towards Personalized Educational Worksheets
The increasing heterogeneity of student populations poses significant challenges for teachers, particularly in mathematics education, where cognitive, motivational, and emotional differences strongly influence learning outcomes. While AI-driven personalization tools have emerged, most remain performance-focused, offering limited support for teachers and neglecting broader pedagogical needs. This paper presents the FACET framework, a teacher-facing, large language model (LLM)-based multi-agent system designed to generate individualized classroom materials that integrate both cognitive and motivational dimensions of learner profiles. The framework comprises three specialized agents: (1) learner agents that simulate diverse profiles incorporating topic proficiency and intrinsic motivation, (2) a teacher agent that adapts instructional content according to didactical principles, and (3) an evaluator agent that provides automated quality assurance. We tested the system using authentic grade 8 mathematics curriculum content and evaluated its feasibility through a) automated agent-based assessment of output quality and b) exploratory feedback from K-12 in-service teachers. Results from ten internal evaluations highlighted high stability and alignment between generated materials and learner profiles, and teacher feedback particularly highlighted structure and suitability of tasks. The findings demonstrate the potential of multi-agent LLM architectures to provide scalable, context-aware personalization in heterogeneous classroom settings, and outline directions for extending the framework to richer learner profiles and real-world classroom trials.
Systems and Control (EESS)
The OAPS solution: a real-time predictive system for flexible PWR operation
This paper presents an innovative solution designed to facilitate safe and flexible operation of nuclear power plants. The purpose of this new device, named the OAPS system, is to provide optimal strategies (e.g., axial offset control, xenon oscillations mitigation, effluent minimization) and real-time recommendations (e.g., dilution and boration flowrates, turbine power setpoints and variation rates) to help NPP operators perform power variations confidently and efficiently. In fact, just as a GPS navigator optimizes and modifies its planned route according to the current position of the user, the OAPS system regularly updates its recommendations based on the latest plant measurements. To achieve this, the OAPS system relies on a well-established, yet cutting-edge in the nuclear industry, advanced control technique known as model predictive control. The conventional axial offset control strategy of the OAPS system was previously validated on both Framatome's full-scope PWR simulator and EDF's full-scope N4 simulator. In this paper, three new advanced strategies are showcased on an intermediate-complexity PWR simulator developed by Framatome: 1) determination of the fastest feasible power variation rates, 2) accelerated cancellation of axial power oscillations and 3) minimization of water and boron effluents.
comment: ICAPP 2025 - International Congress on Advances in Nuclear Power Plants, SFEN, Sep 2025, Juan-les-Pins / Antibes, France
A Unified Framework for Attack-Resilient CLF-CBF Quadratic Programs for Nonlinear Control-Affine Systems
This letter introduces attack-resilient Control Lyapunov Functions (AR-CLFs) and attack-resilient Control Barrier Functions (AR-CBFs) for nonlinear control-affine systems subject to control-input false data injection attacks (FDIA) satisfying an at-most-exponentially growing envelope. The proposed framework embeds a unified adaptive compensation term into both the CLF decrease and CBF safety constraints. In contrast to input-to-state stability/safety (ISS/ISSf)-based methods that certify disturbance-dependent enlarged safe sets, the proposed approach enables finite-time recovery to the nominal safe set without requiring a prior magnitude bound on the FDIA, relying instead on a growth-rate characterization used for analysis and an online gain tuning law that regulates the compensation term. A unified quadratic program (QP) is developed to enforce the AR-CLF and AR-CBF conditions simultaneously, guaranteeing uniformly ultimately bounded (UUB) stability and uniform ultimate safety (UUS) under unbounded FDIA. Numerical results demonstrate improved resilience compared to existing ISS-CLF, ISSf-CBF, and robust CLF-CBF-QP approaches.
comment: Under review for possible publication
Hamilton--Jacobi Reachability for Spacecraft Collision Avoidance
This article presents a Hamilton--Jacobi (HJ) reachability framework for a two--satellite collision avoidance problem operating in the same circular orbit, where relative motion is modeled in the radial--tangential--normal (RTN) frame using planar Hill--Clohessy--Wiltshire (HCW) dynamics. We define the target state space as unsafe relative configurations in the orbit plane corresponding to minimum separation requirements consistent with Federal Communications Commission (FCC) orbital standards. The interaction between spacecraft is formulated as a zero--sum differential game, where Player 1 is the controlled satellite and Player 2 is modeled as a bounded adversarial disturbance with unknown intent. We present the HJ formulation and compute backward reachable sets that characterize relative states from which collision cannot be avoided under worst-case disturbances, while states outside this set admit provably collision-free trajectories. These reachable sets are integrated with supervisory hybrid control logic to determine when evasive maneuvers must be initiated, enabling mathematically grounded safety guarantees for scalability.
comment: Accepted to the 20th IEEE International Conference on Control & Automation (IEEE ICCA 2026). 6 pages, 4 figures
Enabling Real-Time Phase Control in Traffic Signal Hardware-in-the-Loop Simulation SC 2026
Advanced Traffic Signal Control (TSC) algorithms require real-time phase control, yet existing Hardware-in-the-Loop Simulation (HILS) testbeds only support pre-programmed timing plans. In this paper, we present the first HILS testbed for real-time phase control. We develop a novel middleware architecture that translates dynamic phase actions (selection, switch, and duration) into commands for NTCIP-compliant commercial hardware controllers. This middleware manages phase transitions, synchronizes signal states, and handles errors without interrupting the hardware's internal operations. Experimental validation demonstrates that the system executes real-time phase commands, handles system conflicts, and achieves sub-millisecond average internal latency.
comment: 7 pages, 5 figures, accepted to IEEE ITSC 2026
k-Inductive Neural Barrier Certificates for Unknown Nonlinear Dynamics
While conventional (k=1) discrete-time barrier certificate conditions impose strict safety constraints by requiring the function to be non-increasing at every step, k-inductive barrier certificates relax this by allowing a temporary increase -- up to k-1 times, each within a threshold $ε$ -- while maintaining overall safety and improving flexibility. This paper leverages neural networks and constructs k-inductive neural barrier certificates (k-NBCs) for (partially) unknown nonlinear systems. While neural networks offer scalability in the design process, they lack formal guarantees, requiring additional approaches such as counterexample-guided inductive synthesis (CEGIS) with satisfiability modulo theories (SMT) for verification. However, the CEGIS-SMT framework requires knowledge of system dynamics, which is unavailable in practical settings. To address this, we leverage a generalization of Willems et al.'s fundamental lemma, using a single state trajectory, to construct a data-driven representation of (partially) unknown models for SMT verification without sacrificing accuracy. Additionally, CEGIS-SMT further removes the constraint of restricting barrier certificates to specific function classes, such as sum-of-squares, enabling greater flexibility in their design. We validate our approach on three nonlinear case studies with (partially) unknown dynamics.
comment: 18 pages, 5 figures, 3rd International Conference on Neuro-Symbolic Systems (NeuS)
A New Simple-to-Configure Self-Perturbing Multivariable Extremum-Seeking Controller
This paper presents a new stochastic relay-based extremum-seeking controller (ESC) for multi-input-single-output (MISO) systems. The goal of this work was to create an algorithm that is much simpler to configure than alternative approaches making deployment to real-world problems easier. A solution is developed first for a static map and then adapted for a general class of dynamic systems. The number of configurable parameters is one per input channel for the static case and only one additional parameter is needed for the dynamic version. The problem of gradient identification is solved via the use of stochastic relay gains and a simple stability proof for the static case is presented. Simulation tests demonstrate the performance of the strategy for optimizing both static and dynamic systems.
Robust synchronization for multi-agent systems governed by PDEs with observable and unobservable disturbances
This paper investigates robust synchronization for multi-agent systems (MASs) governed by parabolic partial differential equations in the presence of both observable and unobservable disturbances. Using only boundary output measurements, a disturbance observer is designed to estimate observable Dirichlet boundary disturbances while ensuring robustness of the observer error system with unobservable disturbances occurring in the domain. Using only the reference signal and local output information, distributed synchronization controllers are then constructed to enable all agents to track the reference trajectory. In particular, exponential tracking is achieved in the absence of unobservable disturbances, while robustness is preserved when additional unobservable disturbances occur during controller implementation. We further analyze the impact of unobservable Dirichlet-Robin boundary disturbances on synchronization performance by proving the boundedness of solutions to the synchronization error system. Moreover, to characterize the influence of all disturbances, input-to-state stability (ISS) is established for the closed-loop system. For the involved systems, the generalized Lyapunov method and the recursion technique are extensively employed in the stability analysis, and the lifting technique and semigroup theory are used to prove the well-posedness. Simulation results validate the proposed control scheme, demonstrating effective disturbance estimation and rejection, robust synchronization, and the ISS properties under various scenarios.
Safe Deep Reinforcement Learning for Spacecraft Reorientation with Pointing Keep-Out Constraint
This paper implements deep reinforcement learning (DRL) with a safety filter for spacecraft reorientation control with a single pointing keep-out zone. A new state space representation is designed which includes a compact representation of the attitude constraint zone. A reward function is formulated to achieve the control objective while enforcing the attitude constraint. The soft actor-critic (SAC) algorithm is adopted to handle continuous state and action spaces. A curriculum learning approach is implemented for agent training. To guarantee compliance with the attitude constraint, a control barrier function (CBF)-based safety filter is implemented for agent deployment. Simulation results demonstrate the effectiveness of the proposed state space representation and the designed reward function. Monte Carlo simulations underscore that reward shaping alone cannot guarantee safety during reorientation maneuvers. In contrast, with the CBF-based safety filter, constraint satisfaction is guaranteed during maneuvers.
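The abstract does not give the form of the safety filter. A common way to build a CBF-based filter is to solve a small quadratic program that changes the nominal (here, learned) action as little as possible while satisfying a barrier condition that is affine in the control. With a single constraint the program has a closed-form solution, sketched below. The names `cbf_filter`, `a`, and `b` are hypothetical: `a` and `b` are assumed to be computed elsewhere from the barrier function and the current attitude state. This is an illustration of the general technique, not the authors' implementation.

```python
def cbf_filter(u_nom, a, b):
    """Project u_nom onto the half-space {u : a.u >= b}.

    Closed-form solution of  min ||u - u_nom||^2  s.t.  a.u >= b,
    the single-constraint form of a CBF quadratic program.
    a and b are assumed to come from the barrier condition (hypothetical inputs).
    """
    dot = sum(ai * ui for ai, ui in zip(a, u_nom))
    norm_sq = sum(ai * ai for ai in a)
    if dot >= b or norm_sq == 0.0:
        # Nominal action already satisfies the constraint,
        # or the constraint does not depend on u: leave it unchanged.
        return list(u_nom)
    # Smallest correction along a that makes the constraint active.
    scale = (b - dot) / norm_sq
    return [ui + scale * ai for ui, ai in zip(u_nom, a)]
```

When the nominal action is already safe the filter returns it untouched, so the learned policy is only overridden near the keep-out boundary.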
Data-driven approximation of regions of attraction via an LP-based selection of PWA Lyapunov functions
This paper presents a method to approximate regions of attraction of unknown nonlinear dynamical systems from data. Assuming point-wise evaluations of the vector field and known Lipschitz bounds, a polyhedral uncertainty set of admissible dynamics is constructed. This uncertainty description enables the synthesis of a continuous piecewise affine (PWA) Lyapunov candidate via a linear program, enforcing a robust decrease condition for all admissible vector fields. The approach allows certification of a region of attraction consistent with the available data. Numerical examples illustrate the effectiveness of the proposed method in extracting certified regions of attraction from sparse data.
DAG-Based QoS-Aware Dynamic Task Placement for Networked Multi-Stage Control Pipelines
Current Physical AI (PAI) relies heavily on closed-loop visual-servoing pipelines, whose perception and planning stages may become computationally intensive onboard due to complex models embedded on robots. In practice, offloading the perception task to on-site edges statically is inappropriate for latency-sensitive, precise industrial settings over a standardized industrial network. This emphasizes the importance of Control-Communication-Computing (3C) co-design in industrial automation: monolithic local execution saturates AI-accelerated machine and robot hardware, while static edge offloading exposes the control loop to network jitter. Existing adaptive task placement (ATP) controllers can partially address the gap by relocating a single pipeline stage on binary threshold rules, without a multi-stage model and an explicit cost on placement switching. In this Work-in-Progress (WiP) paper, we propose a directed acyclic graph (DAG) based quality-of-service (QoS)-aware dynamic task placement (DTP) framework for sensing-perception-planning-control pipelines in networked robotics. This pipeline is formalized as a DAG with task-level and node-level attributes for compute cost, communication delay, and feasible placement sets; over a small interpretable candidate set (fully local, static offload, hybrid), a window-based cost function combines tail end-to-end latency, deadline violation rate, hardware utilization, and a Hamming-distance switching penalty, and a DTP algorithm with hysteresis and a minimum dwell-time bounds placement chatter. Our WiP paper presents the theoretical framework, a structured qualitative analysis, and a two-phase simulation plus hardware-in-the-loop validation roadmap.
comment: 4 pages, 1 figure, 1 algorithm, accepted as a Work-in-Progress (WiP) paper, on the 24th IEEE International Conference on Industrial Informatics (INDIN), 26-29 July, 2026, Melbourne, Australia
A Closed-loop, State-centric, Multi-agent Framework for Passenger Load Estimation from Heterogeneous Data Streams SC
To support operations and passenger-facing services, transit agencies need reliable passenger load trajectories. Currently, load estimates are typically inferred from imperfect sensing systems rather than fully observed, and the accuracy of modern automatic passenger counting (APC) systems still varies with station layout, flow intensity, and operating conditions. To address the challenges of robust passenger load estimation from heterogeneous data streams, including incremental count errors, evidence conflicts, and context-dependent sensor reliability, we propose a closed-loop, state-centric, multi-agent framework. This method enforces physical feasibility at every step, allocates trust dynamically among evidence sources, and feeds physics-derived violation residuals back into training for robustness improvement. The architecture consists of a unified stop-event backbone, a coupled Perception--Physical--Fusion loop for stop-by-stop inference, and optional trip-level macro-correction and closed-loop calibration modules.
comment: Preprint version of a paper accepted by the 2026 IEEE 29th International Conference on Intelligent Transportation Systems (ITSC). 7 pages, 4 figures
Motion-Coupled Sensing: When the State Change Powers Its Own Sensing
Batteryless IoT systems have largely followed two paths: ambient-energy sensing, where energy arrival is decoupled from the event being monitored, and kinetic event telegrams, where a user actuation powers a short report of the actuation itself. Mechanically gated states expose a third case: the access motion is not only an event to report, but the moment at which a latent physical state may have changed and must be measured. We show that routine hinge motion can supply enough energy for one bounded wake-sense-transmit transaction, including ultrasonic sensing and a long-range LoRa uplink. We call this principle motion-coupled sensing and instantiate it with an open-source compact electromagnetic harvester that retrofits to bins, doors, and cabinets with no structural modification. We size the platform for the most demanding workload, waste-bin monitoring, where each actuation must power both an ultrasonic measurement and a long-range LoRa uplink. Across five campus locations and 5,945 lid actuations, the bin deployment achieves 99.3% per-event transmission reliability. Field deployments on room doors with 1,870 actuations and office cabinets with 1,636 actuations achieve 92% and 94% transmission success respectively, demonstrating that the same energy envelope transfers across hinge geometries without hardware redesign. These results show that mechanical access can be treated as a self-powered sensing transaction, removing periodic polling and scheduled battery maintenance for IoT deployments.
comment: 9 Pages, 12 Figures
Equalized Coverage in Motion Control Performance Prediction for Self-Adaptive Road Vehicles SC
Automated driving systems require monitoring mechanisms to ensure operation as intended, especially when system elements degrade and/or fail. Hence, capability monitoring is crucial in order to evaluate the system's remaining performance and implement capability-based behavior. In this paper, we investigate the dynamics of a highly over-actuated automated vehicle under actuator degradations and failures, affecting the vehicle's motion control capabilities. We propose a lightweight prediction model based on conformalized quantile regression that predicts whether an automated vehicle can be controlled with sufficiently low lateral deviation from a planned trajectory under nominal, degraded, and failed actuator conditions. We recognize that statistical guarantees should hold not only across all data (marginal coverage) but also for different regimes within the data (conditional coverage). We therefore employ equalized coverage methods to address this challenge. During runtime behavior generation our predictor can provide a heuristic for determining the admissible action space. Its application and limitations are discussed in this paper.
comment: Accepted to be published in 2026 IEEE 29th International Conference on Intelligent Transportation Systems (ITSC), Naples, Italy, September 15-18, 2026
MagCeptor: Encoding Broadcast-Addressable Logic into Magnetic Receptors
Multicellular coordination relies on broadcast-addressable receptors, yet engineered magnetic systems face an addressability bottleneck because global fields intrinsically conflate power and control. Here, we introduce MagCeptors to resolve this by encoding selectivity directly into magnetic topology. Establishing an energetic isomorphism with biological receptors, these arrays utilize local couplings to shape potential landscapes where global field vectors act as spatial keys, triggering deterministic snap-through instabilities. This architecture decouples force from source distance, achieving a density of 385 mN/mm3 (>50-fold increase over prior art). We validate this primitive through signal demultiplexing, embodied sequential logic, and untethered distributed networking. This framework enables distributed systems to orchestrate complex tasks without tethers or electronics, relying solely on the intrinsic logic of matter.
comment: 20 pages, 5 figures, journal
Revisiting angle stability in power systems with grid-forming power converters
This letter presents a comprehensive analysis of the stability phenomenon related to the ability of generators to remain in synchronism when subjected to small or large disturbances, in power systems with both synchronous machines and grid-forming voltage source converters (GFM-VSC). This phenomenon is associated with two stability classes in the IEEE/PES classification, namely, rotor-angle stability (when involving synchronous machines) and slow-interaction converter-driven stability (when involving power converters). However, this work shows that this phenomenon is fully characterised by the slow dynamics of the angle difference between the voltage sources connected to the power system, regardless of whether they are synchronous machines (with rotors) or GFM-VSCs. Therefore, we suggest using the term angle stability to refer to this phenomenon, while slow-interaction converter-driven stability should only include slow interactions of a different nature involving power converters.
comment: 4 pages
UAV-Assisted Cooperative Edge Inference for Low-Altitude Economy via MoE-based Hierarchical Deep Reinforcement Learning
The low-altitude economy (LAE) is reshaping the industrial landscape by deploying unmanned aerial vehicles (UAVs) to facilitate a wide range of applications demanding flexible aerial mobility. Integrating edge artificial intelligence (AI) into LAE platforms creates a compelling paradigm where UAVs provide real-time AI-driven analysis while simultaneously executing their primary aerial mission duties. However, realizing this paradigm remains challenging due to the strict mission constraints imposed by these primary duties and the throughput bottlenecks of wireless links. To bridge this gap, we propose a UAV-assisted cooperative edge inference framework where UAVs execute mission-critical LAE duties, quantified by trajectory deviations from reference paths, while concurrently supporting ground devices via intermediate feature offloading. Within this framework, UAV trajectories, inference task offloading decisions, and feature compression ratios are jointly optimized to maximize the system performance. We cast this joint optimization task into a constrained partially observable Markov decision process (POMDP) framework. To efficiently solve it, we propose HDRL-MoE, a novel hierarchical deep reinforcement learning framework that decouples the optimization of slow-varying inference decisions from rapidly changing UAV trajectory control. Furthermore, HDRL-MoE integrates a mixture-of-experts (MoE) architecture, where a router network orchestrates discrete offloading decisions while expert networks independently optimize the feature compression ratios. Extensive simulations show that HDRL-MoE achieves significant inference accuracy gains over baselines and exhibits high scalability and efficiency through its MoE design.
comment: 13 pages and 9 figures. This article was submitted to IEEE for possible publication
ERFSL: An Efficient Reward Function Searcher via Language Models for Custom-Environment Multi-Objective Optimization (Student Abstract)
We propose ERFSL, an efficient reward function searcher using large language models (LLMs) for custom-environment, multi-objective learning-based methods (LB). ERFSL generates reward components based on explicit user requirements, rectifies them using a reward critic, and iteratively optimizes the weights of these components based on textual context generated by the training log analyzer. Applied to a simulation-based benchmark task, the reward critic corrects reward codes with only one feedback iteration per requirement, and the reward weight initializer acquires diverse reward functions within the Pareto set. Even when a weight is off by a factor of 500, an average of only 5.2 iterations is needed to meet user requirements. The approach works adequately with GPT-4o mini and does not require advanced understanding capabilities.
Detecting and Mitigating Backdoor Attacks in OTA-FL Systems: A Two-Stage Robust Aggregation Scheme
Over-the-air federated learning (OTA-FL) improves communication efficiency by exploiting the superposition property of wireless channels, but this same property also creates a critical security vulnerability: the parameter server (PS) cannot access individual local updates, making it difficult to identify and exclude poisoned gradients. The challenge is further exacerbated under non-independent and identically distributed (Non-IID) training data, where benign gradient drift can closely resemble malicious updates. In this paper, we propose a two-stage robust aggregation framework for defending against backdoor attacks in OTA-FL. Under our scheme, each client is first assigned a modality-aware multi-indicator trust score, where the specific indicators are selected according to the data modality (e.g., waveform, text, image) and model architecture to capture the most discriminative footprint of backdoor updates. Based on this score, the PS then performs trust-based multiple access (TBMA) to separate clients into trusted, suspicious, and malicious categories. Suspicious clients are further examined through PS-side layer-wise inspection and a longitudinal reputation mechanism. Experimental results on several datasets demonstrate that the proposed methodology effectively suppresses stealthy backdoor attacks, including bounded-scaling attacks, Euclidean-constrained attacks, Cosine-constrained attacks, and Neurotoxin, while maintaining competitive main-task accuracy.
A New Approach for ARMA Pole Estimation Using Higher-Order Crossings
The paper describes a new method for estimating the poles of an ARMA model using higher-order crossings. The method involves transforming counts of crossing events into estimates of ARMA poles via the autocorrelation domain. An important advantage of the method is that the crossing counts are the only features that need to be stored from the original data. The poles of an ARMA model of a control loop correspond to the roots of the characteristic equation and are thus useful for evaluating control performance.
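The abstract does not spell out the transformation from crossing counts to poles. As a first-order illustration of the idea, the classical cosine formula for a zero-mean stationary Gaussian series relates the expected zero-crossing count $D$ over $N$ samples to the lag-one autocorrelation, $ρ_1 = \cos(π E[D]/(N-1))$, and for an AR(1) process $ρ_1$ equals the pole. The sketch below uses only this simplest case on simulated data; the function names are illustrative, and it is not the paper's ARMA method.

```python
import math
import random

def zero_crossings(x):
    """Count sign changes between consecutive samples."""
    return sum(1 for u, v in zip(x, x[1:]) if (u >= 0) != (v >= 0))

def ar1_pole_from_crossings(x):
    """Estimate an AR(1) pole via the cosine formula rho1 = cos(pi*D/(N-1))."""
    d = zero_crossings(x)
    return math.cos(math.pi * d / (len(x) - 1))

def simulate_ar1(pole, n, seed=0):
    """Simulate x[t] = pole*x[t-1] + e[t] with unit-variance Gaussian noise."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n):
        prev = pole * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x

# The crossing count is the only statistic kept from the simulated series.
estimate = ar1_pole_from_crossings(simulate_ar1(0.6, 200_000))
```

Higher-order crossings apply the same counting to repeatedly differenced (or otherwise filtered) versions of the series, which is what gives access to more than one autocorrelation lag.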
Quadratic Characterizations for Reachability Analysis of Neural Networks
Quadratic constraints (QCs) are widely used to characterize nonlinearities and uncertainties, but generic analytical characterizations can be conservative on bounded domains. This paper develops a framework for constructing verified quadratic characterizations of scalar relations in the two-dimensional real plane. Candidate quadratic inequalities are locally generated by solving convex quadratic programs using samples from the relation and exterior sample points. They are then verified globally using sum-of-squares certificates over an exact semialgebraic description or, in the case of nonpolynomial relations, over relaxed polynomial descriptions. The resulting verified constraints define a sound overapproximation of the scalar relations over the considered domains. These constraints are directly compatible with existing analysis frameworks based on QCs and pointwise integral quadratic constraints (IQCs) for static nonlinearities and uncertainties, and they can also be embedded in QC-based semidefinite programs for reachability and safety analysis of feedforward neural networks. For smooth activations such as $\tanh$, the method yields domain-dependent quadratic characterizations that constitute an alternative to generic sector- or slope-based descriptions. For ReLU networks, we give methods to reduce conservatism in QC-based reachability analysis of feedforward networks by exploiting dependencies between neurons and tighter local bounds. Numerical examples demonstrate improved reachability results for smooth activations, reduced conservatism for ReLU networks, and applicability beyond neural networks through an example involving saturation.
Max-Entropy Moment Filtering for Stochastic Hybrid Systems
Stochastic hybrid systems combine continuous-time stochastic dynamics with discrete reset events, producing intrinsically non-Gaussian and often multimodal uncertainty. A consistent propagation law must also account for boundary-induced probability flux across guard sets, making direct density propagation through hybrid Fokker-Planck equations expensive. We develop a hybrid extension of the Max-Entropy Moment Kalman Filter (MEM-KF) that performs filtering from partial statistical information by propagating a finite collection of moments through stochastic hybrid dynamics and reconstructing beliefs using moment-constrained maximum-entropy distributions. The key step is a moment propagation rule derived from Dynkin's formula with a jump-sum, in which reset effects appear as a boundary-flux correction over the guard set. This yields tractable moment dynamics without solving the underlying hybrid PDE. In a stochastic bouncing-ball example, the proposed method captures reset-induced non-Gaussianity through corrected moment equations while retaining the MEM-KF's optimization-based maximum-entropy representation.
comment: 8 pages, 6 figures
A Bounded-Confidence Model of Opinion Dynamics with Adaptive Interaction Probabilities
Models of opinion dynamics aim to capture how individuals' opinions change when they interact with each other. One well-known model of opinion dynamics is the Deffuant--Weisbuch (DW) model, which is a type of bounded-confidence model (BCM). In the DW model, agents have pairwise interactions, and they are receptive to other agents' opinions when their opinions are sufficiently close to each other. In this paper, we extend the DW model by studying it on networks with heterogeneous and adaptive edge weights between pairs of agents. These edge weights govern the interaction probabilities between the agents and thereby encode the idea that people are more likely to communicate with individuals with whom they have previously compromised or had other positive interactions. We prove theoretical guarantees of our adaptive edge-weighted DW model's convergence properties, the long-time dynamics of its edge weights, and the model's associated "effective graph", which is a time-dependent subgraph that includes edges only between agents that are receptive to each other's opinions. We support our theoretical results with numerical simulations of our adaptive edge-weighted DW model on a variety of networks and find that including adaptive edge weights yields different qualitative dynamics for different types of networks. In particular, for small confidence bounds, we observe that incorporating adaptive edge weights decreases the convergence time for dense networks but increases the convergence time for sparse networks.
comment: 22 pages, 10 figures
Analytical PI Tuning for Second-Order Plants with Monotonic Response and Minimum Settling Time
This study presents two analytical closed-form PI controller tuning solutions for second-order plants with real poles, each achieving monotonic step response and minimum settling time. The first solution employs pole-zero cancellation, placing the controller zero at the slower plant pole and reducing the closed-loop dynamics to a critically damped second-order system. The second solution, applicable when the plant pole ratio is less than two, places all three closed-loop poles at a common location without cancelling any plant pole, yielding a closed-loop transfer function with a triple real pole and a zero. Despite retaining a closed-loop zero, this solution achieves strictly faster settling time than the pole-zero cancellation method in its region of applicability. The two solutions coincide at the boundary pole ratio of two and together form a continuous piecewise-analytical tuning covering the full range of plant pole ratios. This study further establishes that closed-loop transfer functions of the form a^n/(s + a)^n possess a maximum sensitivity Ms together with phase margin and gain margin that are independent of the pole location a and depend solely on the order n, yielding universal robustness constants for each n. A closed-form expression GM(n) = 1 + sec^n(π/n) is established for the gain margin of the family. Numerical verification confirms the analytical results across multiple plant configurations.
comment: 7 figures
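The gain-margin expression quoted in the abstract can be checked numerically. The sketch below assumes a unity-feedback loop, so that the loop transfer function is L = T/(1 - T) with T = a^n/(s + a)^n, and evaluates L at the frequency where (1 + jω/a)^n first becomes real and negative, which is ω = a tan(π/n) for n ≥ 3. The function names are illustrative and the unity-feedback structure is an assumption, not something the abstract states.

```python
import math

def gm_formula(n):
    """Gain margin from the abstract: GM(n) = 1 + sec^n(pi/n), for n >= 3."""
    return 1.0 + (1.0 / math.cos(math.pi / n)) ** n

def loop_at_phase_crossover(n, a):
    """Evaluate L = T/(1 - T), T = a^n/(s + a)^n, at s = j*a*tan(pi/n)."""
    w = a * math.tan(math.pi / n)   # assumed phase-crossover frequency
    t = a ** n / complex(a, w) ** n  # closed-loop response T(jw)
    return t / (1.0 - t)             # unity-feedback loop gain L(jw)

def gm_numeric(n, a):
    """Gain margin as 1/|L| at the phase crossover."""
    return 1.0 / abs(loop_at_phase_crossover(n, a))
```

For n = 3 the formula gives 1 + 2^3 = 9, and for n = 4 it gives 1 + (√2)^4 = 5; `gm_numeric` returns the same values for any a > 0, consistent with the claim that the margin depends only on the order n.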
On robotic manipulators with time-dependent inertial parameters: From physical consistency to boundedness of the mass matrix
We generalize the robotics equation describing the dynamics of open kinematic chains by including the effect of time-dependent change of inertial parameters as well as the effects of causative mass-density redistribution, triggered by internal movement of mass-carrying particles relative to their body-fixed frames. Time dependency of inertial parameters that results from the sole addition of mass to the robot prominently occurs during the loading of end-effectors--a scenario covered by our model without restriction from the restraint that kinematic parameters of the robot must remain constant. Further, our model also includes internal mass-density redistributions that adhere to this kinematic restraint such as trolleys attached to the robot or the movement of passengers. To accompany the generalized robotics equation with some theoretical infrastructure, we then introduce the concepts of uniform physical consistency and upper boundedness of inertial parameters under which desirable, structural properties regarding the existence of finite, positive uniform bounds of the mass matrix can be shown to carry over to the more involved case of time-dependent inertial parameters. These findings have implications for adaptive control, as they facilitate more realistic testing for robustness against unforeseen time dependencies. Moreover, the results in this paper also provide a pathway to ensuring the desirable existence of finite, positive uniform bounds of the estimated mass matrix under upper bounded, uniformly physically consistent estimation regimes.
comment: to be published in Nonlinear Dynamics
Active Learning of Fractional-Order Viscoelastic Model Parameters for Realistic Haptic Rendering
Effective medical simulators necessitate realistic haptic rendering of biological tissues that exhibit viscoelastic material properties, such as creep and stress relaxation. Fractional-order models provide an effective means of describing intrinsically time-dependent viscoelastic dynamics with few parameters, as they naturally capture memory effects. However, due to the unintuitive, frequency-dependent coupling among the order of the fractional element and other parameters, determining appropriate parameter values for fractional-order models that yield high perceived realism remains a significant challenge. In this study, we propose a systematic means of determining the parameters of fractional-order viscoelastic models that optimizes the perceived realism of haptic rendering across general populations. First, we demonstrate that the parameters of fractional-order models can be effectively optimized through active learning, using qualitative feedback-based human-in-the-loop (HiL) optimization, to ensure consistently high realism ratings for each individual. Second, we propose a rigorous method to combine HiL optimization results into an aggregate perceptual map trained on the entire dataset, and demonstrate how to select population-level optimal parameters from this representation that are broadly perceived as realistic across general populations. Finally, we provide evidence of the effectiveness of the generalized fractional-order viscoelastic model parameters for three viscoelastic materials by characterizing their perceived realism through human-subject experiments. Overall, generalized fractional-order viscoelastic models established through the proposed HiL optimization and aggregation approach possess the potential to significantly improve the sim-to-real transition performance of medical training simulators.
comment: This work has been submitted to the IEEE Transactions on Haptics for possible publication. 14 pages, 8 figures
Smoothness of the Augmented Lagrangian Dual in Convex Optimization
This paper focuses on the general linearly constrained optimization problem: $\min_{x \in \mathbb{R}^d} f(x) \ \text{s.t.} \ Ax = b$, where $f: \mathbb{R}^d \rightarrow \mathbb{R} \cup \{+\infty\}$ is a closed proper convex function, $A \in \mathbb{R}^{p \times d}$, and $b \in \mathbb{R}^p$. We define the standard dual function $φ(λ) = \inf_x \{f(x) + \langle λ, Ax - b \rangle\}$, the augmented Lagrangian $\mathcal{L}_ρ(x, λ) = f(x) + \langle λ, Ax - b \rangle + \frac{ρ}{2}\|Ax - b\|^2$ ($ρ > 0$), and the augmented Lagrangian dual function $φ_ρ(λ) = \inf_x \mathcal{L}_ρ(x, λ)$. Under the fundamental condition that $\text{dom} \ φ \neq \emptyset$, we establish that: (1) $φ_ρ$ is $\frac{1}{ρ}$-smooth everywhere; and (2) the solution to $\min_{x \in \mathbb{R}^d} \mathcal{L}_ρ(x, λ)$ exists for any $λ \in \mathbb{R}^p$. These theoretical findings substantially weaken the stringent assumptions typically imposed in the literature to ensure such properties.
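A scalar example, chosen here for illustration and not taken from the paper, shows what the two claims mean. Take $f(x) = |x|$, $A = 1$, $b = 0$. The standard dual $φ$ equals $0$ on $[-1, 1]$ and $-\infty$ elsewhere, so $\text{dom} \ φ$ is nonempty but $φ$ is not finite everywhere. The augmented dual is $φ_ρ(λ) = -\max(|λ| - 1, 0)^2 / (2ρ)$, which is finite and differentiable for every $λ$, with minimizer $x^*(λ) = -\text{sign}(λ) \max(|λ| - 1, 0)/ρ$ and gradient $Ax^* - b = x^*(λ)$. The sketch below checks on a grid that this gradient is $\frac{1}{ρ}$-Lipschitz; all function names are hypothetical.

```python
import math

def al_minimizer(lam, rho):
    """argmin_x |x| + lam*x + (rho/2)*x**2, a soft-thresholding of -lam/rho."""
    excess = max(abs(lam) - 1.0, 0.0)
    return -math.copysign(excess, lam) / rho

def al_dual(lam, rho):
    """Augmented Lagrangian dual value, evaluated at the minimizer."""
    x = al_minimizer(lam, rho)
    return abs(x) + lam * x + 0.5 * rho * x * x

def al_dual_grad(lam, rho):
    """Gradient of the augmented dual: the residual A*x - b, here just x*."""
    return al_minimizer(lam, rho)

def max_gradient_slope(rho, lo=-5.0, hi=5.0, steps=2000):
    """Largest finite-difference slope of the gradient on a uniform grid."""
    h = (hi - lo) / steps
    grid = [lo + i * h for i in range(steps + 1)]
    grads = [al_dual_grad(lam, rho) for lam in grid]
    return max(abs(g1 - g0) / h for g0, g1 in zip(grads, grads[1:]))
```

On the grid, the largest slope never exceeds 1/ρ, and it reaches 1/ρ wherever |λ| > 1, so the constant is tight for this example.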
An integration-free approach for particle flow filtering
Log-homotopy particle flow filters realize nonlinear Bayesian estimation by continuously migrating samples from the prior to the posterior distribution. This transport is governed by a pseudo-time ordinary differential equation (ODE). A major practical challenge of these filters is the need for numerical integration, which suffers from high computational cost and susceptibility to stiffness. This paper develops an exact, integration-free closed-form solution for the exact Daum--Huang deterministic particle flow under vector linear Gaussian measurements. By transforming the ODE into a specific eigenspace, we derive closed-form algebraic expressions for both the homogeneous state transition matrix and the inhomogeneous forcing term. We prove that this analytic solution is equivalent to the exact Kalman measurement update. We embed this closed-form evaluation within an $N$-step piecewise method for nonlinear measurement models. We further propose a constant contraction rate substep schedule that equalizes the per-step contraction along the eigendirection of $D$ associated with the largest eigenvalue $α_{\max}$. The result is a stiffness-mitigating, integration-free particle update for highly nonlinear measurement models. On a bearings-only tracking benchmark, it achieves the lowest error among the compared filters, at a per-update cost comparable to deterministic particle flow baselines and substantially lower than stochastic flows.
MDP-based Energy-aware Task Scheduling for Battery-less IoT
Battery-less Internet of Things (IoT) devices rely on ambient energy harvesting and therefore require scheduling policies that jointly account for energy intermittency and hard timing constraints. This challenge is especially acute in periodic monitoring applications, where a sensing--computing--transmitting task chain must be completed within each reporting cycle. In this paper, we formulate this problem within a setting characterized by independently and identically distributed (i.i.d.) energy arrivals as a long-term average-reward Markov decision process (MDP) that explicitly captures capacitor-voltage evolution, task ordering, permissible start windows, and safe-execution requirements. We further propose rewards that promote reliable task completion while penalizing risky low-energy execution. We prove that the considered MDP is unichain and that the optimal stationary policy has a threshold structure, which leads to an optimal stationary threshold-based (OSTB) scheduler. To account for more realistic energy sources, we additionally study a correlated harvesting model based on a finite-state Markov process and show that the proposed framework can be applied to this richer setting under conservative sufficient conditions. Finally, numerical results show that OSTB outperforms representative baselines in terms of long-term full-chain completion rate, power failures, and latency, particularly when harvested energy is scarce.
SKYLINK: Scalable and Resilient Link Management in LEO Satellite Network
The rapid growth of space-based services has established LEO satellite networks as a promising option for global broadband connectivity. Next-generation LEO networks leverage inter-satellite links (ISLs) to provide faster and more reliable communications compared to traditional bent-pipe architectures, even in remote regions. However, the high mobility of satellites, dynamic traffic patterns, and potential link failures pose significant challenges for efficient and resilient routing. To address these challenges, we model the LEO satellite network as a time-varying graph comprising a constellation of satellites and ground stations. Our objective is to minimize a weighted sum of average delay and packet drop rate. Each satellite independently decides how to distribute its incoming traffic to neighboring nodes in real time. Given the infeasibility of finding optimal solutions at scale, due to the exponential growth of routing options and uncertainties in link capacities, we propose SKYLINK, a novel fully distributed learning strategy for link management in LEO satellite networks. SKYLINK enables each satellite to adapt to the time-varying network conditions, ensuring real-time responsiveness, scalability to millions of users, and resilience to network failures, while maintaining low communication overhead and computational complexity. To support the evaluation of SKYLINK at global scale, we develop a new simulator for large-scale LEO satellite networks. For 25.4 million users, SKYLINK reduces the weighted sum of average delay and drop rate by 29% compared to the bent-pipe approach, and by 92% compared to Dijkstra. It lowers drop rates by 95% relative to k-shortest paths, 99% relative to Dijkstra, and 74% compared to the bent-pipe baseline, while achieving up to 46% higher throughput. At the same time, SKYLINK maintains constant computational complexity with respect to constellation size.
Online Learning-Based Control with Guaranteed Error Bounds for a Class of Nonlinear Systems
In this paper, we present a learning-based control for a class of nonlinear systems that guarantees exponential stability as well as bounded output errors. The control is based on the Gaussian Process Submodel Online Learning (GPSOL) algorithm and the Disturbance Error Rate Limiting (DERL) algorithm, both of which were developed in previous work. The GPSOL algorithm provides a method to learn Gaussian Process (GP) models for subsystems online, whereas the DERL algorithm limits the rate of the prediction error of these GP models. The focus of this paper is the utilization of the GP model within an adaptive controller and the derivation of corresponding stability conditions and system peak-to-peak gains by means of linear matrix inequalities (LMIs). These peak-to-peak gains are then used to prescribe a desired prediction error rate for the DERL algorithm to achieve user-defined output error bounds. The gains and the related bounds were successfully verified using a simulation model. Furthermore, results from a successful experimental validation of the bounds and the overall control structure on a pneumatic test rig are presented. While the control scheme and error bounds proposed in this paper are limited to first-order single-input-single-output systems, an extension to certain classes of higher-order and multiple-input-multiple-output systems is expected to be forthcoming.
comment: Accepted at IFAC 2026 (23rd IFAC World Congress, Busan, Korea)
Solution Sets for Inverse Infinite-Horizon Linear-Quadratic Descriptor Differential Games
In this letter, we study a model-based inverse problem for infinite-horizon linear-quadratic differential games with descriptor dynamics. Given an observed feedback strategy profile, we seek to identify all cost functions that rationalize it as a feedback Nash equilibrium; this collection is referred to as the solution set. We characterize the solution set, show that it is rectangular and convex, and provide an algorithm for computing an admissible realization whenever it is nonempty. We also show that, compared with the corresponding inverse problem for standard state-space dynamics, descriptor dynamics modify the geometry of the solution set and may reduce identifiability. Finally, we illustrate the results with numerical examples.
A Unified Framework for Multi-Stability Constrained Optimization in IBR-Dominated Power Systems
Conventional optimization frameworks for power-system operation and planning primarily focus on steady-state conditions, which become increasingly inadequate as rising penetrations of inverter-based resources (IBRs) strengthen the coupling between stability and steady-state operating conditions. Meanwhile, the software-defined nature of IBRs provides additional flexibility to co-optimize operating points and dynamic behavior. This paper proposes a unified stability-constrained optimization framework that incorporates synchronization, voltage, and frequency stability within a single scheduling model. Established stability criteria are selected and translated into explicit operational limits, after which a general formulation is developed to embed all three criteria in a common structure. The resulting second-order cone (SOC) constraints are convex and can be integrated seamlessly into existing optimization models. The proposed framework enables the simultaneous pursuit of economic efficiency and multi-dimensional stability enhancement, providing a tractable pathway for secure operation in future IBR-dominated power systems.
Sensor Attack Detection Method for Encrypted State Observers
This paper proposes an encrypted state observer that is capable of detecting sensor attacks without decryption. We first design a state observer that operates over a finite field of integers with the modular arithmetic. The observer generates a residue signal that indicates the presence of attacks under sparse attack and sensing redundancy conditions. Then, we develop a homomorphic encryption scheme that enables the observer to operate over encrypted data while automatically disclosing the residue signal. Unlike our previous work restricted to single-input single-output systems, the proposed scheme is applicable to general multi-input multi-output systems. Given that the disclosed residue signal remains below a prescribed threshold, the full state can be recovered as an encrypted message.
comment: Accepted to IFAC World Congress 2026
Beyond Bounded Noise: Stochastic Set-Membership Estimation for Nonlinear Systems
In this paper, we derive a novel procedure for set-membership estimation of dynamical systems affected by stochastic noise with unbounded support. Employing a bound on the sample covariance matrix, we are able to provide a finite-sample uncertainty set containing the true system parameters with high probability. Our approach can be natively applied to a wide class of nonlinear systems affected by sub-Gaussian noise. Our analysis provides conditions under which the proposed uncertainty set converges to the true system parameters and establishes an upper bound on the convergence rate. The proposed uncertainty set can be used directly for robust controller synthesis with probabilistic stability and performance guarantees. Concluding numerical examples demonstrate the advantages of the proposed formulation over established approaches.
Compliant Explicit Reference Governor for Contact Friendly Robotic Manipulators
This paper introduces the Compliant Explicit Reference Governor (CERG), a modular reference management system that enables robots to interact physically with their environment under provable guarantees. The CERG is an intermediate layer that can be placed between a high-level planner and a low-level controller: it enforces operational constraints and enables smooth transitions between free-motion and contact operations. The CERG ensures safety by limiting the total energy available to the robotic arm at the time of contact. In the absence of contact, however, the CERG does not penalize the system performance. Simulation and hardware experiments validate the CERG on increasingly complex systems.
comment: Updated paper with current contributions and author list, accepted at IFAC World Congress, Busan, 2026
A State-Space Representation of Coupled Linear Multivariate PDEs and Stability Analysis using SDP
Physical processes evolving in both time and space are often modeled using Partial Differential Equations (PDEs). Recently, it has been shown how stability analysis and control of coupled PDEs in a single spatial variable can be more conveniently performed using an equivalent Partial Integral Equation (PIE) representation. The construction of this PIE representation is based on an analytic expression for the inverse of the spatial differential operator, $\partial_s^{d}$, on the domain defined by boundary conditions. In this paper, we show how this univariate representation may be extended inductively to multiple spatial variables by representing the domain as the intersection of lifted univariate domains. Specifically, we show that if each univariate domain is well-posed, then there exists a readily verified consistency condition which is necessary and sufficient for existence of an inverse to the multivariate spatial differential operator, $D^α=\partial_{s_1}^{α_1}\cdots\partial_{s_N}^{α_N}$, on the PDE domain. Furthermore, we show that this inverse is an element of a $*$-algebra of Partial Integral (PI) operators defined by polynomial semi-separable kernels. Based on this operator algebra, we show that the evolution of any suitably well-posed linear multivariate PDE may be described by a PIE, parameterized by elements of the PI algebra. A convex computational test for PDE stability is then proposed using a positive matrix parameterization of positive PI operators, and software (PIETOOLS) is provided which automates the process of representation and stability analysis of such PDEs. This software is used to analyze stability of 2D heat, wave, and plate equations, obtaining accurate bounds on the rate of decay.
Neural Configuration-Space Barriers for Manipulation Planning and Control
Planning and control for high-dimensional robot manipulators in cluttered dynamic environments require computational efficiency and robust safety guarantees. Inspired by recent advances in learning configuration-space distance functions (CDFs) as representations of robot bodies, we propose a unified approach for motion planning and control that formulates safety constraints as CDF barriers. A CDF barrier approximates the local free configuration space, substantially reducing the number of collision-checking operations during motion planning. However, learning a CDF barrier with a neural network and relying on online sensor observations introduces uncertainties that must be considered during control synthesis. To address this, we develop a distributionally robust CDF barrier formulation for control that accounts for modeling errors and sensor noise without assuming a known underlying distribution. Simulations and hardware experiments on a UFactory xArm6 manipulator show that our neural CDF barrier formulation enables efficient planning and robust safe control in cluttered and dynamic environments, relying only on onboard point-cloud observations.
AMBER: A Columnar Architecture for High-Performance Agent-Based Modeling in Python
Python is widely used for agent-based modelling because it is accessible and has a mature scientific ecosystem, but object-per-agent execution incurs interpreter overhead that restricts the population sizes feasible in interactive modelling, calibration, and parameter sweeps. This paper presents AMBER, a Python framework that stores agent state in a Polars-backed columnar table and exposes population operations through a compact view API. The framework preserves conventional model and agent abstractions while translating common population updates into compiled column operations; behaviours that do not vectorise remain expressible through a buffered object-oriented path. We evaluate AMBER on wealth transfer, random walk, and spatial SIR benchmarks against Mesa, AgentPy, SimPy, Melodie, Agents.jl, and AMBER's own loop path, using invariant checks to verify comparable model outputs before timing. Across the tested workloads, AMBER has the lowest execution time among Python-hosted implementations and achieves speedups of up to $1118\times$ over Mesa; on the largest SIR benchmark it is also faster than the Julia-based Agents.jl implementation.
Analysis and Design of Spare Strategy for Large-Scale Satellite Constellation Using Direct Insertion under (r,q) Policy
This paper introduces a Markov chain-based approach for the analysis and optimization of spare-management policies in large-scale satellite constellations. Focusing on the direct strategy, we model spare replenishment as a periodic-review reorder-point/order-quantity policy, where spares are deployed directly to constellation planes. The stochastic behavior of satellite failures and launch vehicle lead times is captured through Markov representations of both failure and replenishment dynamics. Based on this efficient and accurate framework, we construct and solve an optimization problem aimed at minimizing operational costs. The effectiveness of the proposed method is demonstrated through a case study using a real-world mega-constellation.
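To make the reorder-point/order-quantity idea concrete, the following is a minimal Monte Carlo sketch of a periodic-review (r, q) spare policy for a single plane. It is not the Markov chain analysis described in the abstract. The Bernoulli failure model (at most one failure per period), the fixed lead time, and all names and default values (`simulate_rq`, `fail_prob`, `lead_time`, `start`) are assumptions made for illustration only.

```python
import random


def simulate_rq(r, q, periods=200, fail_prob=0.3, lead_time=3, start=5, seed=0):
    """Simulate spare stock under a periodic-review (r, q) policy (toy model)."""
    rng = random.Random(seed)
    stock = start        # spares on hand
    pipeline = []        # arrival periods of outstanding orders
    shortages = 0        # failures that found no spare available
    for t in range(periods):
        # Receive every order whose arrival period has been reached.
        arrived = [a for a in pipeline if a <= t]
        pipeline = [a for a in pipeline if a > t]
        stock += q * len(arrived)

        # Assumed failure model: at most one satellite failure per period.
        if rng.random() < fail_prob:
            if stock > 0:
                stock -= 1
            else:
                shortages += 1

        # Review: reorder q spares when the inventory position
        # (on hand plus on order) is at or below the reorder point r.
        position = stock + q * len(pipeline)
        if position <= r:
            pipeline.append(t + lead_time)
    return stock, shortages


final_stock, shortages = simulate_rq(r=2, q=4)
print("final stock:", final_stock, "shortages:", shortages)
```

Sweeping `r` and `q` in such a simulation gives a rough cross-check for an analytical model, although it says nothing about the cost structure or constellation data used in the paper.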
Active Defense Against False Data Injection Attacks in Robotic Manipulators
Robotic systems are vulnerable to False Data Injection Attacks (FDIAs), where adversaries corrupt sensor signals to gain malicious control. Feedback linearization exposes robotic systems to integrator vulnerability, making them susceptible to stealthy attacks that can cause significant deviations in end-effector behavior without raising alarms. This paper addresses the resilience of manipulators against finite-horizon FDIAs by formalizing two defense methods, namely anomaly-aware virtual damping and manipulability reduction, with probabilistic guarantees on nominal task execution. Simulations on a 7-DOF redundant manipulator show that the proposed defenses substantially reduce the impact of FDIA compared to using solely a threshold-based ADS like the Chi-squared, while preserving nominal task performance in the absence of attack.
comment: Extended 8-page version containing full proofs. An abridged 6-page version has been accepted for publication in the Proceedings of the 23rd IFAC World Congress (2026). v2: Minor typographical fixes and updated reference formatting
When Does Adaptation Win? Scaling Laws for Meta-Learning in Quantum Control
Quantum hardware suffers from intrinsic device heterogeneity and environmental drift, forcing practitioners to choose between suboptimal non-adaptive controllers or costly per-device recalibration. We derive a scaling law lower bound for meta-learning showing that the adaptation gain (expected fidelity improvement from task-specific gradient steps) saturates exponentially with gradient steps and scales linearly with task variance, providing a quantitative criterion for when adaptation justifies its overhead. Validation on quantum gate calibration shows negligible benefits for low-variance tasks but >40% fidelity gains on two-qubit gates under extreme out-of-distribution conditions (10$\times$ the training noise), with implications for reducing per-device calibration time on cloud quantum processors. Further validation on classical linear-quadratic control confirms these laws emerge from general optimization geometry rather than quantum-specific physics. We further introduce a few-shot pre-adaptation protocol that estimates the optimal adaptation budget from $N{=}3$-5 probe steps within 3-19% relative error across out-of-distribution regimes.
comment: 28 pages, 11 figures
Region of Attraction Estimation for Linear Quadratic Regulator, Linear and Robust Model Predictive Control on a Two-Wheeled Inverted Pendulum
Nonlinear underactuated systems such as two-wheeled inverted pendulums (TWIPs) exhibit a limited region of attraction (RoA), which defines the set of initial conditions from which the closed-loop system converges to the equilibrium. The RoA of nonlinear and constrained systems is generally nonconvex and analytically intractable, requiring numerical or approximate estimation methods. This work investigates the estimation of the RoA for a TWIP stabilized under three model-based control strategies: saturated linear quadratic regulator (LQR), linear model predictive control (MPC), and constraint tightening MPC (CTMPC). We first derive a Lyapunov-based invariant set that provides a certified inner approximation of the RoA. Since this analytical bound is highly conservative, a Monte Carlo-based estimation procedure is then employed to obtain a more representative approximation of the RoA, capturing how the controllers behave beyond the analytically guaranteed region. The proposed methodology combines analytical guarantees with data-driven estimation, providing both a formally certified inner bound and an empirical characterization of the RoA, offering a practical way to evaluate controller performance without relying solely on conservative analytical bounds or purely empirical simulation.
comment: 6 pages, 2 figures, accepted for presentation at ICCAD 2026
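The Monte Carlo step can be illustrated on a much simpler system than a TWIP. The sketch below assumes a normalized single inverted pendulum with a saturated linear state feedback. The dynamics, the gains `K_THETA` and `K_OMEGA`, the torque limit `U_MAX`, and the sampling box are all hypothetical choices, not the model or the controllers from the paper.

```python
import math
import random

K_THETA, K_OMEGA, U_MAX = 4.0, 2.0, 0.5   # assumed feedback gains and torque limit


def is_stabilized(theta, omega, dt=0.01, steps=1000, tol=0.01):
    """Simulate theta'' = sin(theta) + u with saturated feedback; True if it settles at 0."""
    for _ in range(steps):
        u = max(-U_MAX, min(U_MAX, -K_THETA * theta - K_OMEGA * omega))
        omega += dt * (math.sin(theta) + u)   # semi-implicit Euler step
        theta += dt * omega
        if abs(theta) > math.pi / 2:          # treated as unrecoverable for this toy setup
            return False
    return abs(theta) < tol and abs(omega) < tol


def estimate_roa(n_samples=200, theta_max=1.0, omega_max=1.0, seed=0):
    """Sample initial states uniformly in a box and label each one by simulation."""
    rng = random.Random(seed)
    labelled = []
    for _ in range(n_samples):
        theta0 = rng.uniform(-theta_max, theta_max)
        omega0 = rng.uniform(-omega_max, omega_max)
        labelled.append((theta0, omega0, is_stabilized(theta0, omega0)))
    return labelled


samples = estimate_roa()
fraction = sum(ok for _, _, ok in samples) / len(samples)
print("fraction of sampled initial states that converge:", fraction)
```

The labelled samples give an empirical picture of the RoA inside the sampling box. Unlike a Lyapunov-based invariant set, this carries no formal guarantee, which is why the abstract pairs the two.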
PID Tuning via Desired Step Response Curve Fitting
This paper presents a PID tuning method based on step response curve fitting (PID-SRCF) that utilizes L2-norm minimization for precise reference tracking and explicit transient response shaping. The algorithm optimizes controller parameters by minimizing the root-mean-square error between desired and actual step responses. The proposed approach determines optimal PID parameters by matching any closed-loop response to a desired system step response. In practice, a first-order plus time delay model or a second-order system with defined settling time and overshoot requirements is preferred as the desired response. The method has an open-source implementation using constrained nonlinear optimization in MATLAB. Comparative evaluations demonstrate that PID-SRCF can replace known analytical methods such as Ziegler-Nichols, Lambda Tuning, Pole Placement, and Dominant Pole, as well as MATLAB's proprietary PID tuning applications.
comment: 4 tables, 4 figures
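The curve-fitting idea can be sketched in a few lines. The example below is not the authors' MATLAB implementation: it assumes a first-order plant, tunes only a PI controller, and replaces constrained nonlinear optimization with a coarse grid search. The plant parameters, the desired time constant `tau_d`, and all function names are hypothetical.

```python
import math


def simulate_step(kp, ki, gain=1.0, tau=1.0, dt=0.01, n_steps=500):
    """Closed-loop unit-step response of gain/(tau*s + 1) under PI control (explicit Euler)."""
    y, integral, response = 0.0, 0.0, []
    for _ in range(n_steps):
        response.append(y)
        error = 1.0 - y
        integral += error * dt
        u = kp * error + ki * integral
        y += dt * (gain * u - y) / tau
    return response


def desired_step(tau_d=0.5, dt=0.01, n_steps=500):
    """Desired closed-loop behaviour: a first-order step response with time constant tau_d."""
    return [1.0 - math.exp(-k * dt / tau_d) for k in range(n_steps)]


def rmse(a, b):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def fit_pi(target, grid):
    """Grid search for the (kp, ki) pair whose step response best matches the target."""
    best = None
    for kp in grid:
        for ki in grid:
            cost = rmse(simulate_step(kp, ki), target)
            if best is None or cost < best[0]:
                best = (cost, kp, ki)
    return best


grid = [0.5 * i for i in range(1, 9)]   # candidate gains 0.5, 1.0, ..., 4.0
best_cost, best_kp, best_ki = fit_pi(desired_step(), grid)
print("kp:", best_kp, "ki:", best_ki, "rmse:", best_cost)
```

For this plant, a PI controller with kp = ki = 2 cancels the plant pole and yields exactly the desired time constant of 0.5 in continuous time, so the search should settle near that point. A gradient-based or constrained solver would replace the grid in any serious use.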
Robotics
RGB-only Active 3D Scene Graph Generation for Indoor Mobile Robots
Current approaches to 3D scene graph generation rely on dedicated depth sensors, such as LiDAR or RGB-D cameras, for metric 3D reconstruction. This limits deployment to specialized robotic platforms and excludes settings where only RGB cameras are available, such as fixed external infrastructure. Existing pipelines also typically operate on passively collected observation trajectories, rather than selecting viewpoints based on the partially built scene representation, and therefore fail to effectively exploit the semantic and spatial information encoded within the graph during exploration. This paper presents a fully visual framework for the active, incremental construction of 3D scene graphs from RGB input only, addressing both limitations. The proposed approach unifies perception and planning around a shared structured representation that captures object semantics, 3D geometry, relational context, and information from multiple viewpoints. Because the framework is hardware-agnostic and relies only on RGB observations, it can incorporate inputs from both onboard robot cameras and fixed external cameras within the same representation. Experiments on the Replica dataset show that the RGB-only pipeline achieves F1-score parity with baselines using ground-truth depth. Active exploration experiments on ReplicaCAD further show that semantic-driven viewpoint selection detects more than twice as many objects as a geometric frontier-based baseline under the same exploration budget. Finally, the external-camera setting demonstrates that complementary RGB views can effectively bootstrap the scene graph and improve contextual understanding at no additional exploration cost.
Fixed External Cameras as Common Prior Maps for Active 3D Scene Graph Generation
Commonly available prior information, such as BIM models, floor plans, and remote sensing images, can provide valuable geometric and semantic context for autonomous robotic systems. In this paper, we treat observations from fixed external RGB cameras as Common Prior Maps (CPMs): wide-field views of the environment that initialize a semantic and geometric scene prior before any robot motion begins. We present an RGB-only framework for active, incremental 3D scene graph (3DSG) generation that seamlessly fuses observations from both onboard robot cameras and fixed external cameras within a single hardware-agnostic pipeline. By relying solely on RGB observations processed by a feed-forward 3D reconstruction model, the system treats all cameras - onboard or external - identically, requiring no hardware modifications. A graph-based active semantic exploration framework then directly leverages the partial scene graph to guide the robot toward regions of high semantic uncertainty, progressively completing and refining the prior. Experiments demonstrate that bootstrapping the scene graph with even a single external camera increases initial object recall by up to +79%, and that the richer context of the prior significantly improves the efficiency of subsequent active exploration.
TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning
In real home deployments, household agents must often operate from a complete household scene and a situated household request, rather than from a clean task specification. Such requests require agents to identify task-relevant entities, recover intended task conditions, and resolve ordering constraints from the surrounding scene context. We formalize this capability as full-scene household reasoning: given a complete household scene and a situated household request, an agent must infer executable task structure before producing a grounded skill-level action sequence. This setting is challenging because complete household scenes contain substantial task-irrelevant information, making direct complete-scene prompting inefficient and error-prone. In practical deployment, this challenge is further amplified by privacy and local compute constraints, which favor compact open-weight models with limited long-context reasoning ability. We propose TaskGround, a training-free and model-agnostic Ground-Infer-Execute framework that grounds complete scenes into compact task-relevant scene slices, infers executable task structure, and compiles it into grounded skill-level action sequences. To evaluate this setting, we introduce FullHome, a human-validated evaluation suite of 400 household tasks spanning diverse home-scale environments and both goal-oriented and process-constrained requirements. On FullHome, TaskGround improves task success rates by large margins across both proprietary and open-weight models. Notably, it makes Qwen3.5-9B competitive with GPT-5 under direct complete-scene prompting while reducing total input-token cost by up to 18x. Our results identify executable task-structure inference as a central bottleneck in full-scene household reasoning and show that structured grounding can make compact local models substantially more effective for practical household deployment.
comment: Project page: https://aaronfengzy.github.io/TaskGround/
4DLidarOpen: An Open 4D FMCW Lidar Dataset for Motion-Aware Autonomous Driving
We present 4DLidarOpen, a large-scale open multi-modal dataset for autonomous driving, centered on 4D frequency-modulated continuous-wave (FMCW) Lidar sensing. Unlike conventional time-of-flight Lidar datasets that mainly provide geometric measurements, 4DLidarOpen includes point-wise radial velocity measurements from a forward-facing 4D FMCW Lidar, together with multiple Lidars of different types, including rotating, solid-state, and blind-spot variants, surround-view cameras, and 6-DOF ego-vehicle poses. The dataset was collected in complex urban environments in Beijing and covers dense pedestrian interactions, congested traffic, high-speed driving, and unprotected maneuvers. 4DLidarOpen provides synchronized multi-sensor data and 3D bounding-box annotations with persistent track IDs across five object categories. A hybrid annotation strategy is adopted, where large-scale auto-labeled data support scalable training and human experts refine annotations for the human-annotated training and validation sets. Based on this dataset, we establish benchmarks for 3D object detection, bird's-eye view (BEV) segmentation and flow prediction, and motion forecasting with planning. Extensive experiments show that direct velocity measurements from 4D FMCW Lidar provide complementary motion cues for dynamic-scene understanding. Compared with geometric-only sensing, the velocity-aware representation improves motion-related perception and downstream forecasting and planning, especially in scenarios involving vulnerable road users and fast-moving objects. These results indicate that 4D FMCW Lidar is a promising sensing modality for motion-aware autonomous driving. The dataset and evaluation toolkit are publicly released to support research on 4D scene understanding, multi-Lidar fusion, and velocity-aware perception and planning.
comment: 15 pages, 9 figures
Bench2Drive-Robust: Benchmarking Closed-Loop Autonomous Driving under Deployment Perturbations
Robustness is a critical requirement for deploying autonomous driving systems in the real world. Existing robustness benchmarks for autonomous driving have made important progress in studying the effects of image-level corruptions, such as adverse weather or camera degradation, on perception modules and open-loop planning outputs. However, deployment can also involve system-level imperfections, such as inference latency and ego-state estimation errors, which remain less studied in closed-loop E2E-AD evaluation. These imperfections can accumulate through the feedback loop and destabilize control. In this work, we present Bench2Drive-Robust, to our knowledge the first device-centric robustness benchmark for closed-loop end-to-end autonomous driving under realistic deployment perturbations. We systematically evaluate deployment-oriented perturbations arising from three major sources: camera-stream failures (frame drop, partial observation), ego-state estimation errors (GPS noise, and speed or odometry errors), and compute-induced control delay (model inference delay). We evaluate representative end-to-end driving methods and analyze their robustness under different perturbation severities. Our results show that these deployment-related perturbations can substantially degrade closed-loop driving performance, revealing robustness challenges that are not fully captured by conventional image-level corruption evaluations. By establishing a closed-loop evaluation protocol and demonstrating the substantial impact of these deployment-oriented perturbations, Bench2Drive-Robust defines practical robustness problems for end-to-end autonomous driving and encourages further research on deployment-aware robust driving systems.
FUSE: A Framework for Unified State Estimation in Robotic SLAM Systems
Tightly coupled SLAM formulations under mixed-rate sensing often bind temporal processing, local geometric association, estimator formulation, and map-update policy into method-specific designs. Such binding makes it difficult to vary one design choice without re-engineering the rest of the state-estimation process. This paper presents FUSE, a framework for unified state estimation in robotic SLAM systems. FUSE organizes the state-estimation interface around observation ingestion, propagation, update, and state query, and uses this interface to separate temporal processing, residual-ready local geometric association, estimator formulation, and map-update policy. A LiDAR-IMU instantiation is developed to examine the framework under mixed-rate sensing and directional degeneracy, where high-rate inertial propagation, LiDAR-triggered geometric update, residual screening, and degeneracy-aware correction operate through the same interface boundaries. On a 418 m loop-corridor sequence, the instantiation reports a 1.626 m end-to-end trajectory error, corresponding to a 7.9% relative error reduction compared with Faster-LIO, the lowest-error baseline on this sequence. The results support FUSE as a framework for organizing state-estimation design choices and show how the evaluated instantiation regularizes updates along weakly observable directions.
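The four-part interface named in the abstract (observation ingestion, propagation, update, state query) can be pictured with a small sketch. The class and method names below are hypothetical and are not taken from the FUSE code base. The 1D estimator with a fixed correction gain only shows how high-rate propagation and low-rate updates could share one interface.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Observation:
    stamp: float   # time of the measurement in seconds
    kind: str      # "imu" (acceleration) or "lidar" (position fix) in this toy
    value: float


class StateEstimator(ABC):
    """Hypothetical interface: ingest, propagate, update, query."""

    @abstractmethod
    def ingest(self, obs): ...

    @abstractmethod
    def propagate(self, stamp): ...

    @abstractmethod
    def update(self, obs): ...

    @abstractmethod
    def query(self): ...


class Toy1DEstimator(StateEstimator):
    """1D position/velocity estimator with a fixed correction gain (assumption)."""

    def __init__(self, gain=0.5):
        self.t, self.pos, self.vel, self.acc = 0.0, 0.0, 0.0, 0.0
        self.gain = gain

    def ingest(self, obs):
        # Temporal processing: bring the state to the observation time first.
        self.propagate(obs.stamp)
        if obs.kind == "imu":
            self.acc = obs.value   # high-rate input that drives propagation
        else:
            self.update(obs)       # low-rate measurement that triggers a correction

    def propagate(self, stamp):
        dt = stamp - self.t
        self.pos += self.vel * dt + 0.5 * self.acc * dt * dt
        self.vel += self.acc * dt
        self.t = stamp

    def update(self, obs):
        self.pos += self.gain * (obs.value - self.pos)

    def query(self):
        return self.t, self.pos, self.vel


est = Toy1DEstimator(gain=0.5)
for obs in (Observation(0.0, "imu", 2.0),
            Observation(1.0, "imu", 0.0),
            Observation(2.0, "lidar", 4.0)):
    est.ingest(obs)
print(est.query())
```

Because callers only touch `ingest` and `query`, the correction rule inside `update` could be swapped for a different estimator formulation without changing how observations are fed in, which is the kind of separation the abstract describes.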
Confidence-Gated Robot Autonomy: When Does Uncertainty Actually Help? ICRA 2026
Robotic systems often use predictive uncertainty to decide whether to act autonomously or defer to a fallback policy. In threshold-gated autonomy, uncertainty matters mainly through its ability to rank likely errors. Standard metrics such as expected calibration error and AUROC do not directly test whether uncertainty changes act/defer decisions. We therefore evaluate uncertainty using Spearman rank correlation, paired bootstrap equivalence testing, and act/defer agreement. Across three temporal activity-recognition benchmarks, we find a dataset-dependent competence regime below which uncertainty provides a weak and unstable error ranking. Above this regime, softmax heuristics, MC Dropout, and ensembles produce similar gating behavior, while threshold choice has a much larger effect on execution outcomes. A multi-seed embodied simulation shows the same pattern for collision rate and cost once realized autonomy is matched. Under temporal covariate shift, ranking quality remains stable, but fine-grained semantic OOD detection remains near chance. These results suggest that simple uncertainty proxies can suffice for selective gating once the base model is competent, but not for semantic novelty detection.
comment: ICRA 2026 workshop paper
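Two of the evaluation quantities named in the abstract, Spearman rank correlation and act/defer agreement, are simple to compute. The sketch below assumes that "agreement" means the fraction of samples on which two uncertainty estimators make the same act/defer decision at a given threshold; the paper's exact definition may differ. The uncertainty scores and error labels are made-up illustrative data.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def act_mask(uncertainty, threshold):
    """True where the robot acts autonomously, False where it defers."""
    return [u <= threshold for u in uncertainty]


def agreement(mask_a, mask_b):
    """Fraction of samples with the same act/defer decision."""
    return sum(a == b for a, b in zip(mask_a, mask_b)) / len(mask_a)


errors = [0, 0, 1, 0, 1, 1]                        # 1 = the base model was wrong
unc_a = [0.10, 0.20, 0.70, 0.30, 0.90, 0.60]       # e.g. a softmax-based proxy
unc_b = [0.05, 0.30, 0.80, 0.20, 0.95, 0.40]       # e.g. an ensemble-based score

rho_a = spearman(unc_a, errors)
agree = agreement(act_mask(unc_a, 0.5), act_mask(unc_b, 0.5))
print("rank correlation with errors:", rho_a)
print("act/defer agreement:", agree)
```

A high rank correlation says the score orders likely errors well, while high agreement says two estimators would gate the robot identically at that threshold, even if their raw values differ.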
Scenario Generation in Roundabouts with Adjustable Interaction Intensity
Roundabouts, characterized by frequent merging and yielding interactions, remain a safety-critical corner case for the development and testing of intelligent driving functions. However, extracting sufficient near-critical scenarios from naturalistic data is inefficient. Most existing scenario generation methods provide limited controllability over interaction intensity and criticality, making systematic safety testing and detailed analysis difficult. This paper presents an interaction-aware roundabout scenario generator with continuously adjustable interaction intensity. Geometric routes and temporal progress profiles are first decoupled and mapped to latent codes using pretrained autoencoders. Conditional latent generation is then performed with Wasserstein Generative Adversarial Networks (WGAN) to generate scenarios. Yielding is modeled as a controllable timing intervention via a compact yield code during the approach-to-entry segment, where interaction intensity is modulated by scaling the code with a factor $λ$. Results demonstrate enhanced timing-latent fidelity and plausible interaction responses compared to a baseline model. Under criticality-calibrated scaling, increasing $λ$ expands the safety margin, providing a scalable and controlled testing mechanism.
See Silhouettes in Motion with Neuromorphic Vision
Quasi-bimodal objects, such as text, road signs, and barcodes, play a basic yet vital role in daily visual communication. By boiling these down to clear silhouettes, binarization uses a minimal language to convey essential vision cues for maximum downstream efficiency. The catch is that frame-based imaging often struggles on mobile platforms like drones, self-driving cars, and underwater vehicles. In these dynamic scenes, rapid motion and harsh lighting can make it blind, causing severe motion blur and erasing crucial details. To overcome these limits, neuromorphic vision via event cameras, featuring microsecond-level temporal resolution and high dynamic range, steps in as a natural solution. Building upon this event-driven sensing paradigm, we introduce a simple yet effective dual-modal approach that harnesses the synergy between frames and events to achieve real-time, high-frame-rate binarization on CPU-only devices. Extensive evaluations show that it achieves competitive performance against leading techniques in reducing motion blur, while delivering impressive improvements under challenging illumination. In addition, our asynchronous workflow bypasses the event scarcity that breaks traditional time-binning reconstruction, maintaining clear target shapes even at extreme kilohertz frame rates. Its binary results further serve as reliable representations that facilitate a range of downstream tasks. This work paves the way towards lightweight perception and interaction in embodied intelligence on resource-constrained edge platforms.
comment: 12 pages, 12 figures, and 3 tables. This work is under review. Project page: https://github.com/pz-even/event_binarization
Active Defense Against False Data Injection Attacks in Robotic Manipulators
Robotic systems are vulnerable to False Data Injection Attacks (FDIAs), where adversaries corrupt sensor signals to gain malicious control. Feedback linearization exposes robotic systems to integrator vulnerability, making them susceptible to stealthy attacks that can cause significant deviations in end-effector behavior without raising alarms. This paper addresses the resilience of manipulators against finite-horizon FDIAs by formalizing two defense methods, namely anomaly-aware virtual damping and manipulability reduction, with probabilistic guarantees on nominal task execution. Simulations on a 7-DOF redundant manipulator show that the proposed defenses substantially reduce the impact of FDIA compared to using solely a threshold-based ADS like the Chi-squared, while preserving nominal task performance in the absence of attack.
comment: Extended 8-page version containing full proofs. An abridged 6-page version has been accepted for publication in the Proceedings of the 23rd IFAC World Congress (2026)
TacSE3: Equivariant SE(3) Motion Estimation from Low-Texture Visuotactile Images for In-Gripper Tracking and Compensation
Robotic in-hand manipulation requires reliable object-motion tracking under frequent visual occlusion, yet low-texture visuotactile images provide few stable correspondences for conventional image- or geometry-matching methods. This paper presents TacSE3, a tactile motion-estimation pipeline that converts low-texture visuotactile observations into a decoupled three-dimensional force field and estimates incremental rigid-body motion on SE(3). The method derives planar translation from contact-centroid motion and estimates rotation primarily from shear-related tactile responses, yielding a physically interpretable signal for in-gripper tracking and compensation. Experiments with paired DM-Tac fingertip sensors show that dual-sensor sensing reduces translation-rotation ambiguity, supports rotation tracking across axes and object geometries, and provides a lightweight compensation signal that improves disturbance tolerance in downstream manipulation tasks without retraining the base policy.
Transfer Learning for Customized Car Racing Environments
Transfer Learning, a technique where a model/agent can use the knowledge/expertise that it gained from one task and exploit that to solve another closely-related task, is often used in tackling problems in deep learning. Through this project, we explore transfer learning in the purview of deep reinforcement learning. Specifically, we want to use transfer learning to achieve fast lap times in OpenAI's Car racing environment by training the agent on one circuit, and racing it on other customized target environments by zero-shot transfer or by additional fine-tuning. In addition, we compare the performance of model-based and model-free approaches, and observe that model-based approaches dominate in performance and converge faster than model-free approaches in this environment. We observe that transfer learning in most setups not only boosts the performance on the target domain, but also yields high performance during learning.
Learning-Based Adaptive Control for Surgical Robotic Exposure Task on Deformable Tissues
In various surgical procedures, regions of interest (ROIs) such as organs or lesions are often occluded by overlying tissues, requiring surgeons to achieve adequate exposure for precise intervention. However, the irregular geometry, nonlinear biomechanical properties of overlying tissues, and limited intraoperative visibility of the ROI pose significant challenges to the autonomous execution of tissue retraction. To address this, we formulate a realistic model of the tissue retraction task and propose a learning-based adaptive control framework for achieving ROI exposure. The method optimizes control inputs online by monitoring changes in the visual boundary of the tissue, while leveraging a deep deformation estimation model trained on simulation data to identify the optimal grasping point and ensure the convergence and safety of the adaptive controller. Through simulations and real-world experiments on different deformable materials, it has been demonstrated that this framework exhibits zero-shot adaptation to similar tasks and can complete the autonomous retraction process, from initial grasp selection to full ROI exposure. Therefore, it has the potential to be applied in actual surgical assistance scenarios.
comment: Accepted to Robotics: Science and Systems (RSS) 2026. 12 pages, 9 figures
WorldArena 2.0: Extending Embodied World Model Benchmarking on Modality, Functionality and Platform
World models have emerged as a central paradigm for embodied intelligence, enabling agents to predict action-conditioned futures and reason about environmental dynamics. However, existing embodied world model benchmarks are still largely confined to vision-only prediction, offline embodied applications, and simulator-based evaluation, making them insufficient for assessing increasingly comprehensive world models. In this work, we introduce WorldArena 2.0, an expanded benchmark that systematically broadens embodied world model evaluation along three dimensions: modality, functionality, and platform. Along the modality dimension, WorldArena 2.0 extends evaluation from vision-only to visuotactile modalities, enabling assessment of multimodal perception and prediction. Along the functionality dimension, it extends beyond policy evaluation and planning to assess world models as interactive RL environments for policy optimization. Along the platform dimension, it moves beyond simulator-only evaluation to a diverse suite of simulated and real-world robotic settings across multiple embodiments. Under a standardized protocol, WorldArena 2.0 evaluates perceptual quality, interactive utility, and cross-platform performance, providing a comprehensive testbed for tracking progress toward embodied world models. The benchmark is available at: https://world-arena.ai.
A Dexterous and Compliant Gripper With Soft Hydraulic Actuation for Microgravity Manipulation ICRA 2026
Astrobee's existing one-degree-of-freedom (DOF) underactuated compliant claw gripper enables perching on the International Space Station (ISS), but provides limited capability for continuous dexterous manipulation. More complex microgravity tasks require an end-effector that can maintain stable contact while limiting disturbance to the free-flying base, since contact forces directly couple into base motion. This article presents the integration of DexCoHand, a dexterous and compliant two-finger, 6-DOF gripper, with the Astrobee free-flying robot for microgravity manipulation. The system is evaluated in MuJoCo using Astrobee's standard handrail perching sequence, including approach, perching, and subsequent pan and tilt motions. Compared with Astrobee's existing gripper, DexCoHand preserves the commanded pan and tilt motions while reducing unintended cross-axis base motion. Hardware experiments on Earth further demonstrate DexCoHand's dexterous manipulation capabilities and its potential for more adaptable intelligent manipulation tasks.
comment: Accepted to the IEEE ICRA 2026 Space Robotics Workshop (SRW). 4 pages, 3 figures
Virtues of Ordered Chaos: Planning with Topple Actions in Tabletop Stack Rearrangement
Efficient object manipulation strategies have significant impact in automation applications. In this work, stack rearrangement in tabletop settings is studied, with a focus on augmenting the task planning domain with richer nonprehensile aggregating actions, in particular the toppling of objects from a stack to the table. Toppling can compress long sequences of intermediate relocations. Computed plans need to interleave pick-and-place actions with topple actions throughout, depending on the problem. In order to generate the task plan and model an abstraction to compute solutions that include both pick-and-place and topple actions, a novel aggregating gadget for topple is introduced. Using this directed graphical abstraction, candidate task plan computation becomes a variant of the pebble motion problem, treating objects as pebbles. Benchmarks are then reported in an IsaacSim-based physics simulation. Results highlight clear benefits of achieving faster execution than solely using pick-and-place actions. Though this work primarily investigates the topple action, we demonstrate that similar abstractions can model other aggregating actions of interest, like scoop. The current work provides a preliminary, strong indication of the promising benefits of abstractions for rich object interactions in manipulation applications.
comment: 8 pages, 7 figures
Optimal Knock-Pick Planning for Tightly Packed Tabletop Blocks With Parallel Grippers
Rearranging densely packed tabletop objects is challenging when parallel-gripper picks are infeasible without sufficient clearance around an object. This work studies the problem characteristics for practically motivated settings with uniformly sized blocks placed at planar tabletop grid locations. Since purely prehensile removal can become infeasible, a directional knock primitive is introduced and the optimal knock-pick variant of the problem is formulated. The work proposes a series of abstractions wherein minimal constraining gadgets are covered to identify the necessary knocks. Utilizing a maximum-weight perfect matching on a graphical abstraction yields efficient polynomial-time computation of the optimal plan that minimizes the number of actions. Experiments are reported for increasing grid sizes in synthetic settings as well as in IsaacSim. The theoretical observations provide a promising stepping stone towards rigorously building efficient manipulation strategies that interleave prehensile and non-prehensile actions.
comment: Accepted by WAFR 2026, 18 pages, 6 figures
CosFly-Track: A Large-Scale Multi-Modal Dataset for UAV Visual Tracking via Multi-Constraint Trajectory Optimization
Recent aerial vision-language navigation (VLN) datasets have grown rapidly, but they primarily address goal-oriented navigation to static destinations, leaving UAV visual tracking -- continuously following a moving target while maintaining visibility -- largely without dedicated training data. We introduce CosFlyTrack, a large-scale multi-modal dataset and scalable generation pipeline for UAV visual tracking in urban environments. The dataset provides approximately 12,000 expert and perturbed UAV trajectories generated from 6,000 pedestrian paths, comprising 2.4 million timesteps (approximately 334 hours) with seven aligned data channels: RGB, metric depth, semantic segmentation, six-degree-of-freedom drone pose, target state with visibility flag, bilingual (Chinese-English) instructions, and trajectory-pair metadata. To generate high-quality expert trajectories, we develop MuCO, a multi-constraint optimizer that plans directly in continuous three-dimensional space with BVH-accelerated collision and visibility queries, jointly enforcing target visibility, viewpoint quality, collision avoidance, smoothness, and kinematic feasibility, avoiding the discretization artifacts and post-hoc smoothing of grid-based planners. Fine-tuning experiments on seven vision-language models show that CosFlyTrack improves tracking performance to 78.3 to 95.6 percent SR@1 meter, a 53 to 69 percentage point gain over zero-shot baselines, supporting the dataset as a training resource for dynamic target-following agents. The dataset is publicly available at https://huggingface.co/datasets/AutelRobotics/CosFly; evaluation scripts and pre-trained checkpoints are hosted at https://huggingface.co/AutelRobotics/CosFly-Track.
ESI-Bench: Towards Embodied Spatial Intelligence that Closes the Perception-Action Loop
Spatial intelligence unfolds through a perception-action loop: agents act to acquire observations, and reason about how observations vary as a function of action. Rather than passively processing what is seen, they actively uncover what is unseen - occluded structure, dynamics, containment, and functionality that cannot be resolved from passive sensing alone. We move beyond prior formulations of spatial intelligence that assume oracle observations by recasting the observer as an actor. We introduce ESI-BENCH, a comprehensive benchmark for embodied spatial intelligence spanning 10 task categories and 29 subcategories built on OmniGibson, grounded in Spelke's core knowledge systems. Agents must decide what abilities to deploy - perception, locomotion, and manipulation - and how to sequence them to actively accumulate task-relevant evidence. We conduct extensive experiments on state-of-the-art MLLMs and find that active exploration substantially outperforms passive counterparts, with agents spontaneously discovering emergent spatial strategies without explicit instructions, while random multi-view often adds noise rather than signal despite consuming far more images. Most failures stem not from weak perception but from action blindness: poor action choices lead to poor observations, which in turn drive cascading errors. While explicit 3D grounding stabilizes reasoning on depth-sensitive tasks, imperfect 3D representation proves more harmful than 2D baselines by distorting spatial relations. Human studies further reveal that unlike humans who seek falsifying viewpoints and revise beliefs under contradiction, models commit prematurely with high confidence regardless of evidence quality, exposing a metacognitive gap that neither better perception nor more embodied interaction alone can close.
comment: https://esi-bench.github.io/
Robo-Cortex: A Self-Evolving Embodied Agent via Dual-Grain Cognitive Memory and Autonomous Knowledge Induction
The ability to navigate and interact with complex environments is central to real-world embodied agents, yet navigation in unseen environments remains challenging due to "experiential amnesia," where existing trajectory-driven or reactive policies fail to synthesize generalizable strategies from past interactions. We propose Robo-Cortex, a self-evolving framework that enables robots to autonomously induce navigation heuristics and refine cognitive strategies through a continuous reflection-adaptation loop. By abstracting success patterns and failure pitfalls into natural-language heuristics, Robo-Cortex enables a transition from passive execution to active strategy evolution. Our core innovation is an Autonomous Knowledge Induction (AKI) mechanism that distills multimodal trajectories into a structured Navigation Heuristic Library for knowledge generalization. The architecture further incorporates a Dual-Grain Cognitive Memory system, comprising a Short-term Reflective Memory (SRM) for real-time local progress analysis, and a Long-term Principle Memory (LPM) that abstracts past trajectories into reusable guiding and cautionary principles. To ensure robust decision-making, we introduce a multimodal Imagine-then-Verify loop, where a world model simulates potential outcomes and a VLM-based evaluator validates action plans. Extensive evaluations on IGNav, AR, and AEQA show that Robo-Cortex consistently outperforms strong baselines in both task success and exploration efficiency, with gains of up to +4.16% SPL over the strongest prior method and up to +15.30% SPL under heuristic transfer to unseen environments. Preliminary real-world robotic experiments further support the effectiveness of Robo-Cortex in physical settings.
DexHoldem: Playing Texas Hold'em with Dexterous Embodied System
Evaluating embodied systems on real dexterous hardware requires more than isolated primitive skills: an agent must perceive a changing tabletop scene, choose a context-appropriate action, execute it with a dexterous hand, and leave the scene usable for later decisions. We introduce DexHoldem, a real-world system-level benchmark built around Texas Hold'em dexterous manipulation with a ShadowHand. DexHoldem provides 1,470 teleoperated demonstrations across 14 Texas Hold'em manipulation primitives, a standardized physical policy benchmark, and an agentic perception benchmark that tests whether agents can recover the structured game state needed for embodied decision making. On primitive execution, $π_{0.5}$ obtains the highest task completion rate ($61.2\%$), while $π_{0.5}$ and $π_0$ tie on scene-preserving success rate ($47.5\%$). On agentic perception, Opus 4.7 obtains the best strict problem-level accuracy ($34.3\%$), while GPT 5.5 obtains the best average field-wise accuracy ($66.8\%$), exposing a gap between isolated visual sub-capabilities and complete routing-relevant state recovery. Finally, we instantiate the full embodied-agent loop in three case studies, where waiting, recovery dispatches, human-help requests, and repeated primitive execution reveal how perception and policy errors accumulate during closed-loop deployment. DexHoldem therefore evaluates dexterous tabletop execution, agentic perception, and embodied decision routing in a shared physical setting. Project page: https://dexholdem.github.io/Dexholdem/.
comment: 30 Pages
Dexora: Open-source VLA for High-DoF Bimanual Dexterity ICRA 2026
Vision-Language-Action (VLA) models have recently become a central direction in embodied AI, but current systems are restricted to either dual-gripper control or single-arm dexterous hand manipulation. While low-dimensional gripper control can often be handled with simpler methods, high-dimensional dexterous hand control benefits greatly from full end-to-end VLA learning. In this work, we introduce Dexora, the first open-source VLA system that natively targets dual-arm, dual-hand high-DoF manipulation. We design a hybrid teleoperation pipeline that decouples gross arm kinematics (captured with a custom exoskeleton backpack) from fine finger motion (markerless hand tracking via Apple Vision Pro), and that drives both a physical dual-arm dual-hand platform and an identical MuJoCo digital twin. Using that interface, we assemble a large training corpus: an embodiment-matched synthetic corpus (100K simulated trajectories, 6.5M frames) and a real-world dataset of 10K teleoperated episodes (2.92M frames). To mitigate noisy teleoperation demonstrations, we propose a data-quality-aware training recipe: an offline discriminator provides clip-level weights for diffusion-transformer policy training, down-weighting low-quality demonstrations. Empirically, Dexora outperforms competitive VLA baselines on both basic and dexterous benchmarks (e.g., average dexterous success 66.7% vs. 51.7%), attains 90% success on basic tasks, and shows robust out-of-distribution and cross-embodiment generalization. Ablations confirm the importance of real data and the discriminator for dexterity.
comment: Accepted by ICRA 2026
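The data-quality-aware recipe described in the Dexora abstract (clip-level weights that down-weight low-quality demonstrations) can be illustrated with a weighted loss average. This is only a sketch under stated assumptions: the function name, the normalization by the sum of weights, and the sample numbers are hypothetical and are not taken from the paper.

```python
def weighted_clip_loss(clip_losses, clip_weights):
    """Average per-clip losses, scaling each clip by its quality weight.

    Normalizing by the sum of weights is an assumption of this sketch.
    """
    if len(clip_losses) != len(clip_weights):
        raise ValueError("one weight is required per clip")
    total_weight = sum(clip_weights)
    if total_weight <= 0:
        raise ValueError("at least one clip must have a positive weight")
    weighted_sum = sum(w * l for w, l in zip(clip_weights, clip_losses))
    return weighted_sum / total_weight


# Hypothetical per-clip losses and discriminator-style quality weights.
losses = [1.0, 3.0]
print(weighted_clip_loss(losses, [1.0, 1.0]))  # both clips count equally
print(weighted_clip_loss(losses, [1.0, 0.0]))  # second clip fully down-weighted
```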
Data-Driven Dynamic Modeling of a Tendon-Actuated Continuum Robot
Developing dynamic models for tendon-driven continuum robots is challenging due to their nonlinear, high-dimensional, and friction-dominated dynamics. This paper presents a comparative study of data-driven system identification methods, including N4SID, ARX, and SINDYc, for modeling a tendon-actuated continuum robot with rolling joints developed at CERN. Despite the high number of joints of the robot, experimental analysis reveals that a two-degree-of-freedom dynamic model can accurately capture the system dynamics, owing to strong kinematic dependencies between the joints. The models are validated against experimental data, and used in the design of a model predictive controller, demonstrating their feasibility for real-time control.
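As a rough illustration of the ARX family named in the abstract above, the sketch below fits a first-order, single-input model y[k] = a*y[k-1] + b*u[k-1] by least squares. The model order, the function names, and the noise-free synthetic data are assumptions chosen for brevity; the paper's actual model structure and data are not reproduced here.

```python
def simulate(a, b, inputs, y0=0.0):
    """Generate noise-free outputs of y[k] = a*y[k-1] + b*u[k-1]."""
    outputs = [y0]
    for u in inputs:
        outputs.append(a * outputs[-1] + b * u)
    return outputs


def fit_arx1(outputs, inputs):
    """Least-squares estimate of (a, b); outputs has len(inputs) + 1 samples."""
    syy = syu = suu = sty = stu = 0.0
    for k, u in enumerate(inputs):
        y_prev, y_next = outputs[k], outputs[k + 1]
        syy += y_prev * y_prev
        syu += y_prev * u
        suu += u * u
        sty += y_next * y_prev
        stu += y_next * u
    # Solve the 2x2 normal equations with Cramer's rule.
    det = syy * suu - syu * syu
    if abs(det) < 1e-12:
        raise ValueError("input is not exciting enough to identify a and b")
    a = (sty * suu - stu * syu) / det
    b = (stu * syy - sty * syu) / det
    return a, b


# Synthetic data from a hypothetical system with a = 0.8 and b = 0.5.
u_seq = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0]
y_seq = simulate(0.8, 0.5, u_seq)
print(fit_arx1(y_seq, u_seq))
```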
ManiSoft: Towards Vision-Language Manipulation for Soft Continuum Robotics ICML 2026
Most existing vision-language manipulation research targets rigid robotic arms, whose fixed morphology limits adaptability in cluttered or confined spaces. Soft robotic arms offer an appealing alternative due to their deformability, but confront challenges such as unreliable proprioception and distributed low-level actuation. To investigate these challenges, we introduce ManiSoft, a benchmark for vision-language manipulation with soft arms. ManiSoft features a tailored simulator that couples realistic soft-body dynamics with contact-rich interactions via an elastic force constraint. On this basis, ManiSoft defines four tasks, each highlighting distinct aspects of deformable control, from basic end-effector coordination to obstacle avoidance. To support policy training and evaluation, ManiSoft includes an automated pipeline that generates 6,300 diverse scenes and corresponding expert trajectories. To produce high-quality trajectories at scale, we first employ a high-level planner to decompose each task into a sequence of waypoints, followed by a low-level reinforcement learning policy that generates torque commands to track waypoints. Benchmarking three representative policy models shows relatively promising results in clean scenes but a substantial performance drop under randomization. Visualization analysis indicates that failures stem primarily from inaccurate visual estimation of proprioceptive state and limited exploitation of deformability for adaptive obstacle avoidance. We anticipate ManiSoft to serve as a valuable testbed, bridging the gap between rigid and soft arms in the context of vision-language manipulation. Our code and datasets are released at https://buaa-colalab.github.io/ManiSoft.
comment: Accepted in ICML 2026
Unified Walking, Running, and Recovery for Humanoids via State-Dependent Adversarial Motion Priors
We propose a unified reinforcement learning framework that enables a single policy to perform walking, running, and fall recovery on the Unitree G1 humanoid robot, validated on physical hardware without any explicit mode-switching command at deployment. The framework extends Adversarial Motion Priors (AMP) by replacing the conventional global reference distribution with a state-dependent gate that routes each training transition to one of two discriminators: a dedicated recovery discriminator and a velocity-conditioned locomotion discriminator that jointly covers walking and running. The gate is defined by a single fixed threshold on projected gravity: the recovery discriminator is activated when body tilt exceeds approximately $37^\circ$ from vertical ($|g_z+1|>0.6$); otherwise the locomotion discriminator is used, with the normalized commanded velocity serving as a condition that selects the appropriate reference trajectory between walk and run clips. Only three LAFAN1 reference clips are required to regularize the complete behavior set. At deployment, a single frozen ONNX policy executes at 50 Hz with no runtime mode logic; hardware experiments demonstrate successful recovery from both prone and supine falls and smooth walk-to-run transitions under the same controller.
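The gating rule stated in the abstract above can be written as a one-line test on the projected-gravity component. The sketch below uses only the stated condition $|g_z+1|>0.6$; the function name, the string labels, and the assumption that $g_z$ is the vertical component of normalized projected gravity (equal to -1 when the robot is upright) are illustrative and not taken from the paper's code.

```python
RECOVERY_THRESHOLD = 0.6  # from the abstract: recovery when |g_z + 1| > 0.6


def select_discriminator(g_z, threshold=RECOVERY_THRESHOLD):
    """Route a training transition to 'recovery' or 'locomotion'.

    g_z is assumed to be the z-component of normalized projected gravity,
    which equals -1.0 when the body is upright.
    """
    if abs(g_z + 1.0) > threshold:
        return "recovery"
    return "locomotion"


print(select_discriminator(-1.0))  # upright posture
print(select_discriminator(0.0))   # body lying on its side
```

Because the gate depends only on the current state, it applies during training to pick a discriminator; the abstract states that no mode logic runs at deployment.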
Not What You Asked For: Typographic Attacks in Household Robot Manipulation
Open-vocabulary embodied AI agents increasingly rely on vision-language models such as CLIP for object perception and task grounding. However, the shared embedding space that enables this flexibility introduces a structural vulnerability to typographic attacks, where printed text in a physical scene semantically overrides visual judgment. While prior work has quantified this threat in static 2D benchmarks and 3D navigation tasks, its impact on the full Sense-Plan-Act pipeline of household robot manipulation remains unexplored. This work evaluates typographic attacks in a Habitat-based simulation using the HomeRobot benchmark. We introduce a decoupled perception architecture that exposes a frozen CLIP encoder to adversarial stickers while maintaining geometric grounding via DETIC. In a controlled evaluation pool of 59 attributable episodes, the attack achieves an overall Attack Success Rate (ASR) of 67.8%, rising to 70.0% among fully successful episodes, under uncontrolled viewing angles and occlusion with no perceptual optimization. Critically, we find that perceptual errors propagate through the persistent 3D semantic map to produce kinetic failures, defined here as physically executed grasping and transport of the wrong object driven by an adversarially poisoned semantic state. In these cases, the robot physically grasps and delivers the wrong object to a target receptacle. These results establish typographic misclassification as a real, measurable, and physically consequential threat to the safety of modular manipulation pipelines that prior typographic attack research has left unexamined.
comment: 10 pages, 1 figure, IEEE conference format
Key-Gram: Extensible World Knowledge for Embodied Manipulation
Embodied control increasingly requires models to follow compositional language instructions while reasoning over dynamic visual states. However, current vision-language-action policies and world-action models often couple linguistic knowledge with visual computation in a shared backbone or conditioning pathway, leading to modality competition and making knowledge extension dependent on backbone updates. In this paper, we introduce Key-Gram, a conditional-memory framework that separates language-derived world knowledge from visual-state reasoning for embodied control. At its core is a memory module that decomposes an instruction into task-specific key-grams, retrieves static linguistic priors through deterministic hashed lookup, and injects the retrieved entries into selected hidden layers through context-aware gating and lightweight convolutional fusion. This design allows the backbone to devote its main capacity to visual reasoning and action inference, while reusable instruction knowledge is stored in an extensible external memory. The logical memory table can be conveniently partitioned during training and, due to its $O(1)$ lookup pattern, efficiently placed on host memory during inference. Across RoboTwin2.0, LIBERO/LIBERO-Plus, and real-world dual-arm manipulation, Key-Gram consistently improves both $π_{0}$ and $π_{0.5}$ backbones, with average relative gains of $29.5\%/9.9\%$ on RoboTwin2.0, $35.8\%/4.5\%$ on LIBERO-Plus transfer without target-domain fine-tuning, and $15.4\%/8.1\%$ on real-world long-horizon tasks. These results demonstrate that externalized linguistic memory provides an effective and extensible mechanism for improving compositional grounding, transfer, and real-world manipulation.
comment: 16 pages, 5 figures
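To make the idea of a deterministic hashed lookup in the Key-Gram abstract more concrete, here is a minimal sketch under several assumptions: key-grams are approximated as lowercase word bigrams, the hash is BLAKE2b from the Python standard library, and the memory table is a plain list of placeholder vectors. None of these choices is confirmed by the abstract, and the gating and convolutional fusion steps are not shown.

```python
import hashlib


def key_grams(instruction, n=2):
    """Split an instruction into word n-grams (a stand-in for key-grams)."""
    tokens = instruction.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def slot_index(gram, table_size):
    """Map a gram to a table slot with a deterministic hash (constant-time per gram)."""
    digest = hashlib.blake2b(gram.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % table_size


def lookup(instruction, table, n=2):
    """Retrieve one stored entry per gram of the instruction."""
    return [table[slot_index(g, len(table))] for g in key_grams(instruction, n)]


# Hypothetical memory table: 1024 slots, each holding a 1-dimensional vector.
memory_table = [[float(i)] for i in range(1024)]
entries = lookup("Pick up the red block", memory_table)
print(len(entries))
```

Since the slot index depends only on the gram text, the same instruction always retrieves the same entries, and new entries can be written to the table without touching a model backbone.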
Geometry-Aware Surrogate for Real-Time Hydrodynamics Estimation of Autonomous Ground Vehicles in Amphibious Environments
Autonomous ground vehicles operating in shallow water or flood-prone terrains require dynamic models that account for hydrodynamic forces. However, the simulation and planning tools currently available either lack the physical fidelity or are too computationally expensive to run in real time. This work presents a per-surface neural network surrogate that bridges this gap by predicting geometry-resolved hydrodynamic forces at real-time rates, trained entirely on high-fidelity CFD data from two geometrically distinct vehicles. A vehicle-specific Signed Distance Field (SDF) provides per-surface submergence inputs, allowing the model to resolve how loading varies with vehicle geometry, depth, and flow direction. On held-out CFD data, the surrogate achieves a longitudinal-force symmetric MAPE (sMAPE) of 13% and a vertical-force sMAPE of 3-12%, with inference running under 0.9 ms per sample. To evaluate the model under real-world conditions, water wading trials of a full-scale vehicle at different submersion depths are used. Motion-capture-derived kinematics serve as the surrogate inputs, and the resulting predictions are tested to reproduce known physical relationships between force, speed, and depth. The predicted drag follows quadratic speed scaling ($R^2 \geq 0.97$) and the buoyancy intercepts scale linearly with depth ($R^2 = 0.973$). Neither relationship is encoded in the model training loss; both emerge from the per-surface architecture summing individually predicted surface forces. The resulting framework provides a pathway for embedding physically grounded hydrodynamics into the simulation and planning loops that autonomous ground vehicles depend on in amphibious environments.
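The quadratic speed-scaling check reported in the abstract above can be sketched as a one-parameter least-squares fit of F = c * v^2 followed by an R^2 computation. The exact regression used by the authors is not stated, so the through-origin model, the function names, and the synthetic data below are assumptions for illustration only.

```python
def fit_quadratic_drag(speeds, forces):
    """Least-squares coefficient c for the model F = c * v**2."""
    numerator = sum(f * v * v for v, f in zip(speeds, forces))
    denominator = sum(v ** 4 for v in speeds)
    return numerator / denominator


def r_squared(speeds, forces, c):
    """Coefficient of determination of the fitted model F = c * v**2."""
    mean_force = sum(forces) / len(forces)
    ss_res = sum((f - c * v * v) ** 2 for v, f in zip(speeds, forces))
    ss_tot = sum((f - mean_force) ** 2 for f in forces)
    return 1.0 - ss_res / ss_tot


# Synthetic, noise-free example generated with c = 2.0 (not measured data).
speeds = [0.5, 1.0, 1.5, 2.0]
forces = [2.0 * v * v for v in speeds]
c_hat = fit_quadratic_drag(speeds, forces)
print(c_hat, r_squared(speeds, forces, c_hat))
```

With real predictions the R^2 value falls below 1, and a value close to 1 indicates that the predicted drag is well described by a quadratic dependence on speed.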
Bidirectional Optical sensors for Actuation Tracking (BOAT) in soft lattice systems
The growing adoption of lattice-based structures in soft robotics creates a need for advanced sensing solutions capable of monitoring their global deformation, particularly compression and extension. In this work, we address this challenge by introducing a novel optical sensor based on two patterned waveguides arranged in an ellipsoidal geometry. This Bidirectional Optical sensor for Actuation Tracking (BOAT) is seamlessly co-printed with a lattice structure actuated by an embedded pneumatic artificial muscle (PAM), and its performance is assessed. During PAM elongation or contraction, the bending of the embedded BOAT waveguides induces output signal variations that enable a clear discrimination between compression and extension states. The designs of both the individual waveguide structures (via surface patterning) and the sensorized lattice-based unit embedding two BOATs are supported by numerical simulations. Experimental calibration over 100 consecutive pressure cycles ranging from +50 kPa to $-$40 kPa demonstrates a highly repeatable response, allowing a reliable distinction between extension and compression. Finally, sensor feedback is used to implement a digital shadow, enabling continuous synchronization between the whole sensorized unit and its virtual counterpart. These results establish BOAT as a powerful and reliable approach for deformation monitoring in soft lattice-based robotic systems.
REACT: Environment-Adaptive Architecture for Continuous Formation Navigation of Wheeled Mobile Robots
Formation control of wheeled mobile robots (WMRs) has been extensively studied due to its broad applications in fields such as logistics transportation, environmental monitoring, and search and rescue. However, most existing works mainly focus on tracking predefined formations, which limits their adaptability to complex real-world environments. To address this, we propose REACT (Real-time Environment-Adaptive architecture for Continuous formation navigaTion), a hierarchical architecture integrating centralized formation generation and distributed formation maintenance. Specifically, our upper layer generates new environment-adaptive formations when necessary and uses our proposed TCF-R2T (Trajectory-Conflict-Free Robot-to-Target assignment) algorithm to compute conflict-free WMR-to-target assignments in polynomial time, enabling timely formation transitions without trajectory conflicts. At the lower layer, each WMR executes our developed JSTP (Joint Spatio-Temporal trajectory Planning) method to maintain the generated formation by simultaneously optimizing spatial positions and temporal durations, thereby enhancing coordination among WMRs and enabling continuous navigation in obstacle-rich environments and dynamic-obstacle scenarios. Both simulation and real-world experiments validate the effectiveness and practical applicability of REACT. Experimental videos are available on our project website: https://dongjh20.github.io/REACT-website.
REBAR: Reference Ethical Benchmark for Autonomy Readiness
As autonomous systems grow more advanced, objective metrics to evaluate their ethical and legal compliance are critical for informing end users of their limitations and ensuring accountability of those who misuse them. Current ethical embodied AI frameworks remain mostly qualitative, focusing on system design (through safety guardrails or targeted red teaming), and the realized guardrails often directly disallow unsafe behavior without providing the user with an override or interpretable reason. Instead, there is a need for computable metrics through rigorous testing that allow a user to determine the applicability of the system to the task. To address this gap, we introduce the Reference Ethical Benchmark for Autonomy Readiness (REBAR), a quantitative test and evaluation framework for autonomous systems. REBAR maps operating metrics into a computable Autonomy Readiness Level (ARL) rubric that can quantify ethical performance. Key innovations of the framework include a neuro-symbolic Large Language Model (LLM) approach to calculate and explain the ethical difficulty of scenarios, LLM-driven at-scale generation of test instances, and a versatile, photorealistic simulation environment. By evaluating white-box autonomy solutions through this rigorous testing pipeline, REBAR delivers an objective and repeatable benchmark score, bridging the gap between abstract principles and verifiable, accountable autonomy.
comment: To be presented at the 2026 Workshop on Robot Ethics - Ethical, Legal and User Perspectives in Robotics and Automation (WOROBET)
Qumus: Realization of An Embodied AI Quantum Material Experimentalist
While modern Large Language Models (LLMs) and agentic artificial intelligence (AI) have demonstrated transformative capabilities in digital domains, the realization of embodied AI capable of real-world scientific discovery remains a difficult frontier. The advancements are hindered by the inherent complexity of integrating high-level reasoning, multimodal information processing and real-time physical execution. Here we introduce Qumus, the first AI quantum materials experimentalist. Physically embodied within a robotic mini-laboratory, Qumus is an intelligent, multimodal, and multi-agent system designed for the creation and nano-processing of atomically thin two-dimensional (2D) materials and stacked van der Waals (vdW) structures. Qumus autonomously navigates the full scientific cycle, from hypothesis generation and protocol planning to multi-step experimental execution, result analysis and reporting, acting as an experimentalist. Markedly, the system has achieved, for the first time, the AI-creation of graphene, as well as the first AI-fabrication of complex nanodevices including atomically thin field-effect transistors via vdW stacking. Qumus excels at these tasks by demonstrating autonomous error correction and closed-loop experimentation. Our results establish a generalizable framework for self-improving embodied AI systems that learn directly from the quantum world, opening a pathway toward accelerated discovery in quantum materials, electronics and beyond.
comment: 29 Pages in total. Supplementary Demo Videos are available at https://qumus.ai
Towards Ubiquitous Mapping and Localization for Dynamic Indoor Environments
We present UbiSLAM, an innovative solution for real-time mapping and localization in dynamic indoor environments. By deploying a network of fixed RGB-D cameras strategically throughout the workspace, UbiSLAM addresses limitations commonly encountered in traditional SLAM systems, such as sensitivity to environmental changes and reliance on mobile unit sensors. This fixed-sensor approach enables real-time, comprehensive mapping, enhancing the localization accuracy and responsiveness of robots operating within the environment. The centralized map generated by UbiSLAM is continuously updated, providing robots with an accurate global view, which improves navigation, minimizes collisions, and facilitates smoother human-robot interactions in shared spaces. Beyond its advantages, UbiSLAM faces challenges, particularly in ensuring complete spatial coverage and managing blind spots, which necessitate data integration from the robots themselves. In this paper we discuss potential solutions, such as automatic calibration for optimal camera placement and orientation, along with enhanced communication protocols for real-time data sharing. The proposed model reduces the computational load on individual robotic units, allowing less complex robotic platforms to operate effectively while enhancing the robustness of the overall system.
Dynamic robotic cloth folding with efficient Koopman operator-based model predictive control ICRA
Robotic cloth folding is a challenging task, particularly when considering dynamic folding tasks, which aim at folding cloth by fast motions that leverage its dynamics. When subject to such fast motions, the complexity of cloth dynamics hinders both system identification and planning of folding trajectories, resulting in a difficult simulation-to-reality transfer when using physical models of cloth. Compared to the dexterity that humans exhibit when performing folding tasks, robotic approaches usually employ small garments with quite rigid dynamics, and are either too slow, or fast but imprecise, requiring several attempts to achieve a reasonably good fold. In this paper, we tackle these challenges by generating fast folding trajectories with a novel model predictive controller, integrating physics-based simulation of cloth dynamics and efficient, kernel-based Koopman operator regression. Koopman operator regression, an increasingly popular machine learning technique for nonlinear system identification, is used to obtain a linear model for the cloth being folded. Such a surrogate model, trained with data from a high-fidelity, physics-based cloth simulator, can then be employed within a suitable model predictive control algorithm, in place of the costly, nonlinear one, to efficiently generate folding trajectories to be executed by a robotic manipulator. Both in simulated and real-robot experiments, we show how the linearization supplied by the Koopman operator-based model can be employed to efficiently generate fast folding trajectories to unseen poses, without sacrificing folding accuracy.
comment: Accepted for presentation at the 2026 IEEE International Conference on Robotics and Automation (ICRA)
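To make the idea of a linear Koopman surrogate in the abstract above more concrete, here is a minimal sketch of extended dynamic mode decomposition (EDMD) with inputs. It is a simpler, dictionary-based relative of kernel-based Koopman operator regression and is not the paper's method. The dictionary `lift`, the scalar toy dynamics, and every name below are assumptions made purely for illustration.

```python
import numpy as np

def lift(x):
    """Hypothetical dictionary of observables for a scalar state: [x, x^2]."""
    return np.array([x, x**2])

def fit_koopman(xs, us, xs_next):
    """Least-squares fit of z_next ~ A z + B u in the lifted space (EDMD with inputs)."""
    Z = np.stack([lift(x) for x in xs])             # (N, 2) lifted states
    Z_next = np.stack([lift(x) for x in xs_next])   # (N, 2) lifted successors
    U = np.asarray(us, dtype=float).reshape(-1, 1)  # (N, 1) inputs
    G = np.hstack([Z, U])                           # regressors [z, u]
    W, *_ = np.linalg.lstsq(G, Z_next, rcond=None)  # (3, 2) stacked coefficients
    A, B = W[:2].T, W[2:].T                         # z_next ~ A @ z + B @ u
    return A, B

# Toy data from an assumed system x_next = 0.9 x + 0.1 u (not cloth dynamics).
rng = np.random.default_rng(0)
xs = rng.uniform(-1.0, 1.0, 200)
us = rng.uniform(-1.0, 1.0, 200)
xs_next = 0.9 * xs + 0.1 * us
A, B = fit_koopman(xs, us, xs_next)
```

Because the fitted model is linear in the lifted state, it can stand in for a nonlinear simulator inside a linear model predictive controller, which is the general motivation the abstract describes.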
PH-Dreamer: A Physics-Driven World Model via Port-Hamiltonian Generative Dynamics
World models built on recurrent state space architectures enable efficient latent imagination, yet remain physically unstructured, producing dynamics that violate conservation and dissipative principles. We introduce a unified Port-Hamiltonian framework that remedies this through three synergistic mechanisms. First, we embed implicit physical priors into recurrent transitions by modeling projected latent evolution as action-controlled energy routing governed by flow and dissipation, biasing the projected PH phase space toward a more compact and physically structured representation. Second, we develop a kinematics-aware energy world model that estimates the Hamiltonian and power balance from proprioceptive observations, providing an explicit physical signal for thermodynamic reasoning. Third, leveraging these energy gradients, we establish an energy-guided Actor-Critic that uses Lagrangian multipliers to regularize policy optimization toward lower energy and smoother control. Across visual control benchmarks, this paradigm not only attains superior asymptotic returns but also elevates internal simulator fidelity by establishing a tighter, lower-variance alignment between imagined and real rewards, all while reducing latent phase space volume by 4.18-8.41%, energy consumption by up to 7.80%, and mean squared jerk by up to 9.38%.
comment: 12 pages, 3 figures
Assessing Localization Technologies for Pedestrian Collision Avoidance
Robust pedestrian safety is crucial to the next generation of intelligent transportation systems. Such systems rely on active pedestrian localization and predictive collision alerts. Pedestrian localization can be supported by Ultra-Wideband technology and Bluetooth 6.0, which offer high-precision ranging and low-latency communication, making them promising candidates for vehicular collision warning systems. This paper assesses the localization accuracy of these technologies for pedestrian alerting and benchmarks their performance against Global Navigation Satellite Systems. Experimental evaluations performed in this paper focused on key performance metrics, including localization accuracy and robustness to environmental conditions. Preliminary results suggest that Ultra-Wideband and Bluetooth 6.0 can serve as viable alternatives or complements to Global Navigation Satellite Systems in certain scenarios, improving situational awareness and enabling timely pedestrian alerts.
StableVLA: Towards Robust Vision-Language-Action Models without Extra Data ICML 2026
It is infeasible to encompass all possible disturbances within the training dataset. This raises a critical question regarding the robustness of Vision-Language-Action (VLA) models when encountering unseen real-world visual disturbances, particularly under imperfect visual conditions. In this work, we conduct a systematic study based on recent state-of-the-art VLA models and reveal a significant performance drop when visual disturbances absent from the training data are introduced. To mitigate this issue, we propose a lightweight adapter module grounded in information theory, termed the Information Bottleneck Adapter (IB-Adapter), which selectively filters potential noise from visual inputs. Without requiring any extra data or augmentation strategies, IB-Adapter consistently improves over the baseline by an average of 30%, while adding fewer than 10M parameters, demonstrating notable efficiency and effectiveness. Furthermore, even with a 14x smaller backbone (0.5B parameters) and no pre-training on the Open X-Embodiment dataset, our model StableVLA achieves robustness competitive with 7B-scale state-of-the-art VLAs. With negligible parameter overhead (<10M), our approach maintains accuracy on long-horizon tasks and surpasses OpenPi under both synthetic and physical visual corruptions.
comment: Accepted by ICML 2026. Code: https://github.com/DAGroup-PKU/HumanNet. Project website: https://dagroup-pku.github.io/StableVLA/
On Improving Multimodal Pedestrian Trajectory Prediction with CVAE: A Study on Benchmark and Robot Data
Accurate pedestrian trajectory prediction is crucial for autonomous systems operating in complex environments, such as modular buses and delivery robots in suburban or semi-structured areas. Social Spatio-Temporal Graph Convolutional Neural Networks (Social-STGCNN) have shown strong performance by modeling social interactions; however, producing diverse and well-calibrated future trajectories remains challenging. In this work, we build on a Social-STGCNN backbone and introduce a Conditional Variational Autoencoder (CVAE)-based probabilistic formulation to explicitly model multimodal future trajectories. We evaluate the method on the ETH and UCY pedestrian trajectory datasets as well as on a real-world pedestrian dataset collected by a mobile robot. Results show moderate gains on public benchmarks, but more consistent endpoint accuracy and improved trajectory diversity across different crowd configurations. Evaluation on robot-collected data further demonstrates the approach's effectiveness beyond curated benchmarks and supports its applicability in practical deployments.
A Heuristic Approach for Performance Tuning in RL-based Quadrotor Control via Reward Design and Termination Conditions
Reinforcement learning (RL)-based quadrotor control policies have achieved impressive performance in tasks such as fast navigation in cluttered environments and drone racing, where the focus is on speed and agility. However, in several applications, such as infrastructure inspection, it is critical to achieve precise, controlled maneuvers with tunable performance. In this article, we present a novel heuristic approach to achieve tunable performance in RL-based quadrotor control through reward design and termination conditions. We present a novel reward structure containing dual bandwidth exponentials that achieves a baseline critically damped response in setpoint tracking, with low steady-state errors. When trained with a Proximal Policy Optimization (PPO) algorithm, in conjunction with episode truncation conditions, the desired performance is achieved in 6 million time steps in a sample-efficient manner. In order to tune the performance about the baseline behavior, we present intuitive heuristic rules to adjust the reward weights and exponential coefficients to achieve faster (acrobatic-like) and slower (inspection-like) settling time performance, while retaining the baseline critically damped response and approximately 2% steady-state error. We evaluate the three RL policies (baseline, acrobatic, and inspection) across 100 trials and show accurate and tunable performance in position and yaw tracking from random initial conditions, thereby demonstrating the effectiveness of the proposed heuristic approach.
comment: Accepted in the 34th Mediterranean Conference on Control and Automation
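The abstract above does not give the reward formula, so the following is only one plausible reading of a "dual bandwidth exponential" reward: a wide exponential that still provides a learning signal far from the setpoint, plus a narrow one that rewards precision close to it. The functional form, the weights, and the coefficients are all assumptions for illustration, not the authors' values.

```python
import math

def dual_bandwidth_reward(error, w_wide=0.5, k_wide=0.5, w_narrow=0.5, k_narrow=20.0):
    """Hypothetical tracking reward built from two exponentials of the squared error.

    w_* are reward weights and k_* are exponential coefficients (assumed names).
    A small k gives a wide bandwidth, a large k gives a narrow one.
    """
    wide = w_wide * math.exp(-k_wide * error**2)        # coarse shaping, far from setpoint
    narrow = w_narrow * math.exp(-k_narrow * error**2)  # fine shaping, near the setpoint
    return wide + narrow

# Reward decreases monotonically as the absolute tracking error grows.
rewards = [dual_bandwidth_reward(e) for e in (0.0, 0.1, 0.5, 2.0)]
```

Under this reading, the tuning heuristics mentioned in the abstract would amount to adjusting the weights and coefficients, for example changing `k_narrow` to alter how strongly the policy is rewarded for closing the last part of the error.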
COBALT: Crowdsourcing Robot Learning via Cloud-Based Teleoperation with Smartphones
The scarcity of large-scale, high-quality demonstration data remains a bottleneck in scaling imitation learning for robotic manipulation. We present COBALT, a teleoperation platform designed to democratize robot learning at scale both in simulation and in the real world. By leveraging vectorized environments, our scalable, load-balanced infrastructure supports concurrent teleoperation by multiple users on a single GPU, yielding a significant reduction in teleoperation cost. Operators can connect from nearly anywhere on Earth using commonly available devices, including single or dual smartphones, VR headsets, 3D mice, and keyboards. An in-memory data cache and efficient video streaming keep control and rendering synchronous, sustaining dozens of concurrent users at 20 Hz with sub-100 ms end-to-end latency for up to 8 concurrent users per GPU. We also demonstrate stable operation supporting 256 simulated clients across 8 GPUs, underscoring the system's ability to scale across hardware and within individual servers. We perform a comprehensive user study showing that phone-based teleoperation performs comparably to or better than specialized hardware, enabling faster, more ergonomic data collection. To ensure data quality, COBALT logs a suite of real-time metrics to automatically filter suboptimal demonstrations. We further demonstrate that a structured user training curriculum significantly improves data collection quality. Guided by insights from our user study, we crowdsource the collection of a large-scale, high-quality pilot dataset with 7500+ demonstrations (50+ hours) collected with smartphones across nine countries over five days. We validate the dataset's quality by training state-of-the-art imitation learning algorithms. Please visit https://cobalt-teleop.github.io/ for more details.
Automatically Improving Simulation Physics for Articulated Objects
Simulation is a central tool for scalable robot learning, but its effectiveness depends on the quality of object assets. While modern 3D datasets provide rich geometric and kinematic representations, they typically lack the physical properties required for stable and realistic interaction, requiring significant manual effort to construct simulation-ready articulated objects. In this thesis, we introduce interaction-readiness, which characterizes whether an object can be reliably simulated under manipulation. We propose a quantitative evaluation framework that decomposes interaction-readiness into measurable components, enabling systematic analysis of object quality and revealing failure modes not captured by conventional evaluation. We further present a multi-modal, simulator-in-the-loop approach for generating interaction-ready articulated objects from incomplete 3D assets. The method integrates geometric, visual, and semantic information to infer physical properties and refines them through iterative simulator feedback to improve physical consistency. Experiments across diverse articulated objects and manipulation tasks show that object quality directly impacts simulation stability, interaction behavior, and policy performance. Objects refined by our method exhibit more stable and realistic dynamics, enabling more reliable downstream learning and evaluation. Overall, this thesis demonstrates the importance of physical realism for articulated objects in simulation and introduces a practical multi-modal refinement approach, guided by simulator feedback, for constructing such objects at scale.
CosFly: Plan in the Matrix, Fly in the World
We present CosFly, a box-structured planning and multimodal simulation pipeline for aerial tracking, together with CosFly-Track, a large-scale UAV dataset for dynamic target tracking across diverse environments including urban centers, highways, rural landscapes, forests, and coastal towns. In our current implementation on CARLA, CosFly provides a modular 7-step construction pipeline that converts complex 3D worlds into structured obstacle representations for planning, then projects the resulting trajectories back into multi-modal sensor data -- including RGB images, high-precision depth maps, and semantic segmentation masks -- paired with natural language navigation instructions. A key feature is the support for configurable fixed-FOV zoom levels (one FOV setting drawn per trajectory and held constant throughout), enabling simulation of various focal lengths through camera-intrinsic adjustments. The pipeline covers the complete workflow from 3D map export through grid simplification, pedestrian and drone trajectory planning, multi-modal rendering with 6-DOF pose annotations, quality inspection, and teacher-student caption generation. We analyze two trajectory-planning paradigms for aerial target tracking: a conventional two-stage pipeline with front-end candidate generation and back-end refinement, and a direct gradient-based formulation that optimizes multiple tracking constraints in a single objective. The public CosFly-Track release contains 250 validated trajectories and approximately 100,000 rendered images with complete 6-DOF drone pose annotations (position x, y, z and orientation yaw, pitch, roll). Together, the pipeline and dataset establish a scalable foundation for aerial-ground collaborative research, supporting dynamic target tracking, UAV navigation, and multi-modal perception across diverse environments.
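The link between a field-of-view setting and camera intrinsics mentioned in the abstract above can be sketched with the standard pinhole relation: the focal length in pixels is half the image width divided by the tangent of half the horizontal FOV. The function name, the image size, and the assumptions of a horizontal FOV, square pixels, and a centered principal point are illustrative choices, not details taken from the CosFly pipeline.

```python
import math

def intrinsics_from_fov(width, height, fov_deg):
    """Pinhole intrinsic matrix, assuming fov_deg is the horizontal FOV and pixels are square."""
    focal = (width / 2.0) / math.tan(math.radians(fov_deg) / 2.0)  # focal length in pixels
    cx, cy = width / 2.0, height / 2.0                            # assumed principal point
    return [[focal, 0.0, cx],
            [0.0, focal, cy],
            [0.0, 0.0, 1.0]]

# A narrower FOV at the same resolution corresponds to a longer focal length (more zoom).
K_wide = intrinsics_from_fov(640, 480, 90.0)
K_zoom = intrinsics_from_fov(640, 480, 30.0)
```

Holding one FOV fixed per trajectory, as the abstract describes, would then mean a single intrinsic matrix applies to every frame of that trajectory.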
Neural Operators for Design-Space Surrogate Modeling of Tendon-Actuated Continuum Robots ICRA 2026
Continuum robots enable dexterous manipulation in constrained environments, but require accurate and efficient models for real-time manipulation and control. Traditional physics-based models can be computationally expensive and may suffer from inaccuracies due to unmodeled effects, while current learning-based methods often generalize poorly beyond the specific robot on which they are trained. We present a formulation of surrogate modeling for tendon-driven continuum robots as an operator learning problem that maps robot design parameters and tendon actuation inputs to resulting configurations. This formulation enables a single trained model to generalize across a large class of robot designs. We develop four novel neural operator architectures--two based on Deep Operator Networks (DeepONets) and two based on Fourier Neural Operators (FNOs)--and train them on simulation data to predict robot configurations. All architectures achieve good accuracy while allowing for fast and accurate generalization across designs. Our results demonstrate that operator learning provides an effective and generalizable surrogate for continuum robot mechanics in the design space, enabling fast modeling for control, planning, and design optimization in surgical and industrial applications.
comment: Accepted to ICRA 2026
Guiding Neuro-Symbolic Scenario Generation with Spatio-Temporal Logic
The rapid advancement of autonomous driving (AD) technologies has outpaced the development of robust safety evaluation methods. Conventional testing relies on exposing AD systems to vast numbers of real-world traffic scenes -- a brute-force approach that is prohibitively expensive and statistically ineffective at capturing the rare, safety-critical edge cases essential for validating real-world robustness. To address this fundamental limitation, we introduce STRELGen, a scalable framework for the targeted generation of safety-critical driving scenarios. STRELGen synergistically combines a multi-agent trajectory-generation diffusion model (DM) with Spatio-Temporal Logic (STREL) specifications that encode complex safety and realism properties through a highly interpretable formalism. Crucially, monitoring satisfaction levels of these specifications is differentiable, enabling gradient-based search. At inference time, we optimize directly over the DM latent space to maximize STREL formula satisfaction. The result is efficient generation of highly plausible yet safety-critical multi-agent scenarios that lie within the learned data distribution. STRELGen thus provides a flexible, interpretable, and powerful tool for stress-testing autonomous driving systems, moving beyond the limitations of brute-force data collection.
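The abstract above relies on specification satisfaction being differentiable. One common way to obtain that property for temporal-logic robustness is to replace the hard minimum over time with a smooth log-sum-exp approximation. The sketch below does this for a single "always keep distance above a threshold" requirement. It is a generic temporal example, not the STREL monitor used in the paper, and the function names, `beta`, and the sample distances are assumptions.

```python
import numpy as np

def soft_min(values, beta=10.0):
    """Smooth lower bound on min(values); approaches the true minimum as beta grows."""
    v = np.asarray(values, dtype=float)
    m = v.min()
    # Shift by the minimum so every exponent is <= 0 (numerically stable).
    return m - np.log(np.sum(np.exp(-beta * (v - m)))) / beta

def always_robustness(distances, d_min, beta=10.0):
    """Smooth robustness of 'always (distance > d_min)' over a trajectory."""
    margins = np.asarray(distances, dtype=float) - d_min
    return soft_min(margins, beta)

# Hypothetical inter-agent distances along a trajectory, in meters.
rho = always_robustness([3.0, 2.0, 1.5, 2.5], d_min=1.0)
```

A positive value indicates the requirement holds with some margin. To search for safety-critical scenarios, one could minimize such a margin (or maximize the robustness of a formula describing the critical event) with gradients, typically in an autodiff framework rather than plain NumPy.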
RLFTSim: Realistic and Controllable Multi-Agent Traffic Simulation via Reinforcement Learning Fine-Tuning CVPR 2026
Supervised open-loop training has been widely adopted for training traffic simulation models; however, it fails to capture the inherently dynamic, multi-agent interactions common in complex driving scenarios. We introduce RLFTSim, a reinforcement-learning-based fine-tuning framework that enhances scenario realism by aligning simulator rollouts with real-world data distributions and provides a method for distilling goal-conditioned controllability in scenario generation. We instantiate RLFTSim on top of a pre-trained simulation model, design a reward that balances fidelity and controllability, and perform comprehensive experiments on the Waymo Open Motion Dataset. Our results show improvements in realism, achieving state-of-the-art performance. Compared with other heuristic search-based fine-tuning methods, RLFTSim requires significantly fewer samples due to a proposed low-variance and dense reward signal, and it directly addresses the realism alignment issue by design. We also demonstrate the effectiveness of our approach for distilling traffic simulation controllability through goal conditioning. The project page is available at https://ehsan-ami.github.io/rlftsim.
comment: CVPR 2026 Highlight; Project page at https://ehsan-ami.github.io/rlftsim
Distributionally Robust Control via Stein Variational Inference for Contact-Rich Manipulation
Reliable robotic manipulation requires control policies that can accurately represent and adapt to uncertainty arising from contact-rich interactions. Modern data-driven methods mitigate uncertainty through large-scale training and computation, and degrade significantly in performance with a limited number of training samples. By contrast, classical model-based controllers are computationally efficient and reliable, but their limited ability to represent task-relevant uncertainty can hinder performance in contact-rich interactions. In this work, we propose to expand the capabilities of model-based manipulation control through more flexible uncertainty modeling that retains performance while exactly adapting to uncertainty. Our approach casts the manipulation problem as a distributionally robust control optimization and proposes a novel deterministic formulation based on Stein variational inference that preserves performance while explicitly modeling task-sensitive parameter uncertainty. As a result, the derived controllers are more aware of task sensitivities to uncertainty, yielding high reliability without compromising performance. Experimental results demonstrate up to 3$\times$ improved robustness across a range of contact-rich manipulation tasks under broad parametric uncertainty, outperforming existing model-based control methods.
comment: In Proceedings of Robotics: Science and Systems, Sydney, Australia, July 2025
Probabilistic Recursively Feasible Motion Planning Under Uncertain Environments
Safe motion planning in uncertain, time-varying environments is challenging because the safe region can change unpredictably across planning steps, often causing a loss of recursive feasibility. In this work, we present a Probabilistic Recursively Feasible Model Predictive Control (PRF-MPC) framework that guarantees recursive feasibility with a specified probability. We introduce properties that an ideal predictor should satisfy to ensure distributional consistency, and use these properties to derive closed-form expressions for the means and covariances of trajectories predicted at future time steps. Building on this analysis, we construct safety constraints that ensure, with high probability, that the current safe set is contained within the safe sets at future time steps, thereby probabilistically guaranteeing recursive feasibility. Simulation results on a lane-change scenario demonstrate that the proposed method significantly improves recursive feasibility.
comment: 7 pages, 4 figures
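The abstract above builds safety constraints from predicted means and covariances. A standard building block for that kind of construction is the deterministic form of a Gaussian chance constraint: for x ~ N(mu, Sigma), the condition P(a·x <= b) >= 1 - eps is equivalent to a·mu + z·sqrt(aᵀ Sigma a) <= b, where z is the standard normal quantile at 1 - eps. The sketch below checks this condition. It is a generic illustration, not the PRF-MPC safe-set containment constraint, and all names and numbers are assumptions.

```python
import numpy as np
from statistics import NormalDist

def chance_constraint_holds(mu, sigma, a, b, eps):
    """Check P(a @ x <= b) >= 1 - eps for x ~ N(mu, sigma) via constraint tightening."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    a = np.asarray(a, dtype=float)
    z = NormalDist().inv_cdf(1.0 - eps)              # standard normal quantile
    tightened = a @ mu + z * np.sqrt(a @ sigma @ a)  # mean plus z standard deviations
    return bool(tightened <= b)

# Hypothetical 2D position with unit covariance and a half-space constraint x[0] <= b.
ok_loose = chance_constraint_holds([0.0, 0.0], np.eye(2), [1.0, 0.0], b=2.0, eps=0.05)
ok_tight = chance_constraint_holds([0.0, 0.0], np.eye(2), [1.0, 0.0], b=1.0, eps=0.05)
```

Larger predicted covariance at future steps tightens the constraint further, which is one intuition for why feasibility can be lost between planning steps when the uncertainty is not accounted for consistently.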
Adversarial Stress Testing of SPARK Humanoid Safety Filters
Humanoid robots are difficult to deploy safely because they have high-dimensional bodies, many collision constraints, and must operate near people and obstacles. Safety filters help by modifying a nominal control action when it may violate collision-avoidance constraints. Still, nominal benchmark scores do not fully show how these filters behave in harder environments. In this work, we study the robustness of SPARK humanoid safety filters through replication and stress testing. We replicate the SPARK benchmark case G1SportMode_D1_WG_SO_v1 in MuJoCo and evaluate RSSA, RSSS, SSA, CBF, PFM, and SMA under controlled random seeds. We also build a post-processing pipeline that converts raw SPARK logs into goal-tracking, minimum-distance, and collision-step metrics. Our results show that some methods track the goal more closely, while others reduce collision steps more effectively. The stress tests further indicate that safety behavior can change under obstacle crowding, noisy distance estimates, and delayed obstacle information. These findings suggest that humanoid autonomy should be evaluated beyond nominal performance, using metrics that expose failure modes before deployment.
comment: 5 pages, 7 figures, 1 table. Code available at https://github.com/ghoshsaurav/spark-adversarial-safety
EgoTraj: Real-World Egocentric Human Trajectory Dataset for Multimodal Prediction
Accurately forecasting human trajectories from an egocentric perspective plays a central role in applications such as humanoid robotics, wearable sensing systems, and assistive navigation. However, progress in this direction remains limited due to the scarcity of egocentric trajectory datasets collected in real-world environments. Addressing this need, we introduce EgoTraj, an egocentric multimodal open dataset recorded using Meta Quest Pro (MQPro). EgoTraj contains 75 sequences of human navigation collected from multiple MQPro wearers in real-world urban environments. Each recording provides synchronized RGB video along with ground-truth data, including continuous time-synchronized 6-degree-of-freedom head poses, per-frame 3D eye gaze vectors, and scene annotations. To the best of our knowledge, EgoTraj differs from typical egocentric trajectory datasets by capturing long-horizon, self-directed navigation across diverse urban routes with broad participant diversity. To demonstrate the potential of the dataset, we benchmark several state-of-the-art methods for egocentric trajectory prediction and conduct ablation studies to analyze the contributions of gaze, scene, and motion cues. The results highlight the utility of EgoTraj for AR-based perception, navigation, and assistive systems. The EgoTraj dataset, code, and EgoViz Dashboard are publicly available at https://github.com/yehiahmad/EgoTraj.
comment: 21 pages, 14 figures. Project page: https://github.com/yehiahmad/EgoTraj
Geo-Data-Driven HD Map Generation Workflow with Integrated Reference-Free Constraint-Based Verification
High-definition (HD) maps are core artifacts for automated driving systems, but their generation commonly relies on sensor-intensive mobile mapping campaigns, while quality assessment often depends on high-precision reference data. These dependencies make HD map engineering costly and difficult to apply in settings where specialised measurement data or independently measured reference maps are unavailable. This paper presents an engineering-oriented geo-data-driven workflow for HD map generation with integrated representation-level verification. The workflow uses openly available geo-engineering datasets as the primary input source and transforms them into lane-level HD map representations of existing road environments through explicit intermediate representations and processing stages. To assess the generated representations without external reference maps, the workflow integrates executable constraint-based verification into the engineering process. Selected constraints are derived from specifications relevant to automated driving and road-design guidelines. They are evaluated directly on the generated lanelet-based representation to detect geometric, topological, and elevation-related inconsistencies. The workflow is evaluated using real-world shapefile-based road-network data from four cities in Lower Saxony, Germany, and controlled defect-injection scenarios. The real-world evaluation shows that the generated map representations satisfy the selected constraints in the evaluated scenarios, while the defect-injection study demonstrates complete detection of the considered defect types without observed false positives. The results indicate that geo-data-driven HD map generation with integrated executable verification can provide a modular and inspectable complement to sensor-intensive mapping workflows under reduced sensing and reference-data availability.
OxyGen: Unified KV Cache Management for VLA Inference under Multi-Task Parallelism
Embodied AI agents increasingly require parallel execution of multiple tasks, such as manipulation, conversation, and memory construction, from shared observations under distinct time constraints. Recent Mixture-of-Transformers (MoT) Vision-Language-Action Models (VLAs) architecturally support such heterogeneous outputs, yet existing inference systems fail to achieve efficient multi-task parallelism for on-device deployment because of redundant computation and resource contention. We identify isolated KV cache management as the root cause. To address this, we propose unified KV cache management, an inference design that treats the KV cache as a first-class shared resource across tasks and over time. This abstraction enables two key optimizations: cross-task KV sharing eliminates redundant prefill of shared observations, while cross-frame continuous batching decouples variable-length language decoding from fixed-rate action generation across control cycles. We implement this design for $π_{0.5}$, a popular MoT VLA, and evaluate it on both NVIDIA GeForce RTX 4090 and Jetson AGX Thor, two representative platforms for on-device VLA inference. OxyGen achieves up to 3.7$\times$ speedup over isolated execution, delivering over 200 tokens/s language throughput and 70 Hz action frequency simultaneously without degrading action quality, and we further validate the gains on a real humanoid robot with on-board Jetson AGX Thor.
comment: Preprint
DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving CVPR 2026
End-to-end autonomous driving (E2E-AD) demands effective processing of multi-view sensory data and robust handling of diverse and complex driving scenarios, particularly rare maneuvers such as aggressive turns. The recent success of the Mixture-of-Experts (MoE) architecture in Large Language Models (LLMs) demonstrates that specialization of parameters enables strong scalability. In this work, we propose DriveMoE, a novel MoE-based E2E-AD framework, with a Scene-Specialized Vision MoE and a Skill-Specialized Action MoE. DriveMoE is built upon our $π_0$ Vision-Language-Action (VLA) baseline (originally from the embodied AI field), called Drive-$π_0$. Specifically, we add Vision MoE to Drive-$π_0$ by training a router to dynamically select relevant cameras according to the driving context. This design mirrors human driving cognition, where drivers selectively attend to crucial visual cues rather than exhaustively processing all visual information. In addition, we add Action MoE by training another router to activate specialized expert modules for different driving behaviors. Through explicit behavioral specialization, DriveMoE is able to handle diverse scenarios without suffering from mode averaging like existing models. In Bench2Drive closed-loop evaluation experiments, DriveMoE achieves state-of-the-art (SOTA) performance, demonstrating the effectiveness of combining vision and action MoE in autonomous driving tasks. We will release our code and models of DriveMoE and Drive-$π_0$.
comment: Accepted by CVPR 2026, Project Page: https://thinklab-sjtu.github.io/DriveMoE/
Do Robots Really Need Anthropomorphic Hands? A Comparison of Human and Robotic Hands
Human manipulation skills represent a pinnacle of human voluntary motor functions, requiring the coordination of many degrees of freedom and processing of high-dimensional sensor input to achieve remarkable dexterity. Thus, we set out to answer whether the human hand, with its associated biomechanical properties, sensors, and control mechanisms, is an ideal that we should strive for in robotics. Do robots need anthropomorphic hands? We start by extracting characteristics of the human hand in terms of biomechanics and perception to compare them with currently commercially available robotic hands. From this comparison, we derive our research questions that connect manipulation system complexity to skill repertoire size and dexterity. We attempt to answer these with a systematic literature review, in which we analyze the manipulation capabilities demonstrated in 125 papers from 2019-2025. Although complex five-fingered hands are often considered the ultimate goal for robotic manipulators, they are not necessary for all tasks. We find that in-hand manipulation does not benefit from anthropomorphic hand design as simpler mechanisms are sufficient, but mechanism complexity correlates with the breadth of manipulation tasks a hand can perform. Sensor integration and intelligent manipulation strategies remain underexplored, which may be because of a misalignment with hand design: instead of replicating the number of fingers and degrees of freedom, focusing on robustness and softness would allow more intelligent control and learning to exploit environmental contacts and integrate more sensors. Finally, we argue for standardized evaluation criteria to enable systematic comparison of hand designs and manipulation systems.
What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?
A long-standing challenge in AI is to develop agents capable of solving a wide range of physical tasks and generalizing to new, unseen tasks and environments. A popular recent approach involves training a world model from state-action trajectories and subsequently using it with a planning algorithm to solve new tasks. Planning is commonly performed in the input space, but a recent family of methods has introduced planning algorithms that optimize in the learned representation space of the world model, with the promise that abstracting irrelevant details yields more efficient planning. In this work, we characterize models from this family as JEPA-WMs and investigate the technical choices that make algorithms from this class work. We propose a comprehensive study of several key components with the objective of finding the optimal approach within the family. We conducted experiments using both simulated environments and real-world robotic data, and studied how the model architecture, the training objective, and the planning algorithm affect planning success. We combine our findings to propose a model that outperforms two established baselines, DINO-WM and V-JEPA-2-AC, in both navigation and manipulation tasks. Code, data and checkpoints are available at https://github.com/facebookresearch/jepa-wms.
comment: V2 of the article: - Added AdaLN-zero - Added table comparing JEPA-WMs with baselines with std translating per-seed variability only, no variability across epochs - Reordered figures in main body of the paper V3: added data scaling experiments, theoretical appendix section on autoregressive rollout, acceptance at TMLR
General-purpose LLMs as Models of Human Driver Behavior: The Case of Simplified Merging SC 2026
Human behavior models are essential as behavior references and for simulating human agents in virtual safety assessment of automated vehicles (AVs), yet current models face a trade-off between interpretability and flexibility. General-purpose large language models (LLMs) offer a promising alternative: a single model potentially deployable without parameter fitting across diverse scenarios. However, what LLMs can and cannot capture about human driving behavior remains poorly understood. We address this gap by embedding two general-purpose LLMs (OpenAI o3 and Google Gemini 2.5 Pro) as standalone, closed-loop driver agents in a simplified one-dimensional merging scenario and comparing their behavior against human data using quantitative and qualitative analyses. Both models reproduce human-like intermittent operational control and tactical dependencies on spatial cues. However, neither consistently captures the human response to dynamic velocity cues, and safety performance diverges sharply between models. A systematic prompt ablation study reveals that prompt components act as model-specific inductive biases that do not transfer across LLMs. These findings suggest that general-purpose LLMs could potentially serve as standalone, ready-to-use human behavior models in AV evaluation pipelines, but future research is needed to better understand their failure modes and ensure their validity as models of human driving behavior.
comment: To be published in proceedings of IEEE ITSC 2026
Guided Reinforcement Learning for Omnidirectional 3D Jumping in Quadruped Robots
Jumping poses a significant challenge for quadruped robots, despite being crucial for many operational scenarios. While optimisation methods exist for controlling such motions, they are often time-consuming and demand extensive knowledge of robot and terrain parameters, making them less robust in real-world scenarios. Reinforcement learning (RL) is emerging as a viable alternative, yet conventional end-to-end approaches lack efficiency in terms of sample complexity, requiring extensive training in simulations, and predictability of the final motion, which makes it difficult to certify the safety of the final motion. To overcome these limitations, this paper introduces a novel guided reinforcement learning approach that leverages physical intuition for efficient and explainable jumping, by combining Bézier curves with a Uniformly Accelerated Rectilinear Motion (UARM) model. Extensive simulation and experimental results clearly demonstrate the advantages of our approach over existing alternatives.
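The two ingredients named in the abstract can be illustrated with a minimal sketch: evaluating a cubic Bézier curve, and computing a lift-off velocity for a flight phase modelled as uniformly accelerated motion under gravity only. The abstract does not say how the paper combines them, so the function names, the point-mass assumption, and the `flight_time` parameter below are all hypothetical.

```python
G = 9.81  # gravitational acceleration in m/s^2

def cubic_bezier(p0, p1, p2, p3, s):
    """Evaluate a cubic Bezier curve at parameter s in [0, 1].

    Control points are tuples of equal length (e.g. 3D positions).
    """
    b0 = (1 - s) ** 3
    b1 = 3 * (1 - s) ** 2 * s
    b2 = 3 * (1 - s) * s ** 2
    b3 = s ** 3
    return tuple(b0 * a + b1 * b + b2 * c + b3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))

def takeoff_velocity(start, target, flight_time):
    """Lift-off velocity of a point mass that reaches `target` after `flight_time`.

    Derived from p(t) = p0 + v0 * t + 0.5 * a * t^2 with a = (0, 0, -G).
    This is an illustrative assumption, not the paper's formulation.
    """
    vx = (target[0] - start[0]) / flight_time
    vy = (target[1] - start[1]) / flight_time
    vz = (target[2] - start[2]) / flight_time + 0.5 * G * flight_time
    return (vx, vy, vz)

def ballistic_position(start, v0, t):
    """Position at time t of a point mass launched from `start` with velocity v0."""
    return (start[0] + v0[0] * t,
            start[1] + v0[1] * t,
            start[2] + v0[2] * t - 0.5 * G * t * t)
```

In this sketch a Bézier curve could describe the thrust phase on the ground, ending at the lift-off state returned by `takeoff_velocity`. That coupling is one plausible reading, not a description of the authors' method.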
SG-CADVLM: A Context-Aware Decoding Powered Vision Language Model for Safety-Critical Scenario Generation
Autonomous Vehicles (AVs) require rigorous testing in safety-critical scenarios for safety validation, yet this validation is hindered by the high cost of field testing and the lack of fidelity in current simulations for rare safety-critical events. Crash reports offer rich and authentic specifications of real-world accident dynamics, making them a promising resource for Large Language Models and Vision-Language Models (VLMs) to generate high-fidelity scenarios. However, existing models frequently deviate from actual accident characteristics due to context suppression. To address these limitations, this paper presents SG-CADVLM, a framework integrating Context-Aware Decoding with multimodal input processing to generate safety-critical scenarios from crash reports. The framework mitigates the hallucination of VLMs while generating road geometry and vehicle trajectories simultaneously. The experimental results demonstrate that SG-CADVLM generates combined critical and high-risk scenarios at a rate of 88.1% compared to 31.2% for the baseline methods, representing a 182% relative improvement, while producing executable simulations for autonomous vehicle testing.
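The 182% figure is a relative gain over the baseline rate, not a difference in percentage points. A quick check using only the two rates quoted in the abstract (the variable names are ours):

```python
proposed_rate = 88.1   # % of generated scenarios that are critical or high-risk
baseline_rate = 31.2   # same metric for the baseline methods

# Absolute gap in percentage points versus relative improvement over the baseline.
point_gap = proposed_rate - baseline_rate
relative_improvement = 100.0 * point_gap / baseline_rate

print(f"gap: {point_gap:.1f} points, relative improvement: {relative_improvement:.1f}%")
```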
Bundle Adjustment in the Eager Mode
Bundle adjustment (BA) is a critical technique in various robotic applications such as simultaneous localization and mapping (SLAM), augmented reality (AR), and photogrammetry. BA optimizes parameters such as camera poses and 3D landmarks to align them with observations. With the growing importance of deep learning in perception systems, there is an increasing need to integrate BA with deep learning frameworks for enhanced reliability and performance. However, widely-used C++-based BA libraries, such as GTSAM, g$^2$o, and Ceres Solver, lack native integration with modern deep learning libraries like PyTorch. This limitation affects their flexibility, ease of debugging, and overall implementation efficiency. To address this gap, we introduce an eager-mode BA library seamlessly integrated with PyTorch with high efficiency. Our approach includes a sparsity-aware auto-differentiation design and GPU-accelerated sparse operations designed for 2nd-order optimization. Our eager-mode BA on GPU demonstrates substantial runtime efficiency, achieving an average speedup of 18.5$\times$, 22$\times$, and 23$\times$ across all benchmarks compared to GTSAM, g$^2$o, and Ceres, respectively.
Global Prior Meets Local Consistency: Dual-Memory Augmented Vision-Language-Action Model for Efficient Robotic Manipulation CVPR 2026
Hierarchical Vision-Language-Action (VLA) models have rapidly become a dominant paradigm for robotic manipulation. They typically comprise a Vision-Language backbone for perception and understanding, together with a generative policy for action generation. However, their performance is increasingly bottlenecked by the action generation process. (i) Low inference efficiency: a pronounced distributional gap between isotropic noise priors and target action distributions increases the number of denoising steps and the incidence of infeasible samples. (ii) Poor robustness: existing policies condition solely on the current observation, neglecting the constraint of the history sequence and thus lacking awareness of task progress and temporal consistency. To address these issues, we introduce OptimusVLA, a dual-memory VLA framework with Global Prior Memory (GPM) and Local Consistency Memory (LCM). GPM replaces Gaussian noise with task-level priors retrieved from semantically similar trajectories, thereby shortening the generative path and reducing the number of function evaluations (NFE). LCM dynamically models the executed action sequence to infer task progress and injects a learned consistency constraint that enforces temporal coherence and smoothness of the trajectory. Across three simulation benchmarks, OptimusVLA consistently outperforms strong baselines: it achieves 98.6% average success rate on LIBERO, improves over pi_0 by 13.5% on CALVIN, and attains 38% average success rate on RoboTwin 2.0 Hard. In real-world evaluation, OptimusVLA ranks best on the Generalization and Long-horizon suites, surpassing pi_0 by 42.9% and 52.4%, respectively, while delivering a 2.9x inference speedup.
comment: Accepted by CVPR 2026
State-Conditional Adversarial Learning: An Off-Policy Visual Domain Transfer Method for End-to-End Imitation Learning
We study visual domain transfer for end-to-end imitation learning in a realistic and challenging setting where target-domain data are strictly off-policy, expert-free, and scarce. We first provide a theoretical analysis showing that the target-domain imitation loss can be upper bounded by the source-domain loss plus a state-conditional latent KL divergence between source and target observation models. Guided by this result, we propose State-Conditional Adversarial Learning (SCAL), an off-policy adversarial framework that aligns latent distributions conditioned on system state using a discriminator-based estimator of the conditional KL term. Experiments on visually diverse autonomous driving environments built on the BARC-CARLA simulator demonstrate that SCAL achieves robust transfer and strong sample efficiency.
AT-VLA: Adaptive Tactile Injection for Enhanced Feedback Reaction in Vision-Language-Action Models
Vision-Language-Action (VLA) models have significantly advanced the capabilities of robotic agents in executing diverse tasks; however, they still face challenges in contact-rich manipulation scenarios that require precise physical interactions. To address this limitation, recent studies have attempted to incorporate tactile signals during downstream tasks, enabling pretrained VLAs to interpret tactile feedback. Nevertheless, introducing new modalities during finetuning that are rarely present in the pretraining stage may disrupt the pretrained capabilities of VLAs. In addition, the inherently slow inference speed of VLAs hampers real-time responsiveness and limits the effective utilization of tactile feedback for action adjustment. To overcome these challenges, we propose Adaptive Tactile Vision-Language-Action (AT-VLA), which introduces a novel Adaptive Tactile Injection mechanism. This mechanism dynamically determines the appropriate timing and locations for tactile injection, incorporating tactile signals only when they significantly contribute to action generation, thereby minimizing interference with pretrained representations. Furthermore, to enable rapid and accurate tactile responses, we propose a Tactile Reaction Dual-Stream mechanism, which decouples sensory processing into a slow visual-language stream for low-frequency perceptual reasoning and a fast tactile control stream for high-frequency physical interaction understanding, achieving real-time closed-loop responses within 0.04 s. Real-world experiments thoroughly validate the effectiveness of AT-VLA in contact-rich manipulation tasks. The project page is available at: https://sites.google.com/view/at-vla.
Bio-Inspired Event-Based Visual Servoing for Ground Robots
Biological sensory systems are inherently adaptive, filtering out constant stimuli and prioritizing relative changes, likely enhancing computational and metabolic efficiency. Inspired by active sensing behaviors across a wide range of animals, this paper introduces a principled 1D event-based visual servoing framework for ground robots operating in structured environments. Utilizing a Dynamic Vision Sensor (DVS), we demonstrate that by applying a fixed spatial kernel to the asynchronous event stream generated from structured logarithmic intensity-change patterns, the resulting net event flux analytically isolates specific combinations of kinematic states. We establish a generalized theoretical bound for this event rate estimator and show that linear and quadratic spatial profiles isolate the robot's velocity and position-velocity product, respectively. Leveraging these properties, we employ a multi-pattern stimulus to directly synthesize a nonlinear state feedback term entirely without traditional state estimation. To overcome the inescapable loss of linear observability at equilibrium inherent in event sensing, we propose a bio-inspired active sensing limit-cycle controller. Experimental validation on a 1/10-scale autonomous ground vehicle confirms the efficacy, extreme low-latency, and computational efficiency of the proposed direct-sensing approach.
Propagating Unsafe Actions in LLM Controlled Multi-Robot Collaboration via Single Robot Compromise IJCAI 2026
Large language models (LLMs) are increasingly used as general planners in embodied intelligence, enabling high-level coordination and low-level task planning for both single-robot and multi-robot collaboration. This increasing reliance on embodied LLM planners also raises critical security concerns, since misaligned or manipulated instructions can be translated into physical actions. Prior work has studied such threats in single-robot settings, while security risks in LLM-controlled multi-robot collaboration, especially those propagated through inter-robot communication, remain largely unexplored. To bridge this gap, we propose a novel attack paradigm for multi-robot systems in which the adversary interacts with only a single entry robot. The compromised robot then propagates malicious intent through peer communication, leading to coordinated unsafe actions across the system. Our evaluation, covering the high-risk dimensions of dereliction of duty, privacy compromise, and public safety hazards, reveals a persistent safety alignment gap in multi-robot planners. We quantify this process with three metrics: obedience, infectiousness, and stealthiness. Experiments demonstrate both persistent attacker control and rapid propagation: obedience reaches 1.00 in the strongest cases, and infectiousness rises to 0.90. Notably, the attack is highly efficient, requiring as few as 3.0 rounds to compromise all the robots while maintaining a stealthiness score of 0.81. Such risks are amplified when robots must resolve trade-offs in critical situations, such as emergencies or conflicts of rights, because the coordination mechanism can unintentionally allow adversarial instructions to override safety requirements. The code is available at https://github.com/TheFatInsect/InfectBot.
comment: Accepted by the 35th International Joint Conference on Artificial Intelligence (IJCAI 2026). 9 pages, 4 figures, 3 tables
SutureFormer: Learning Surgical Trajectories via Goal-conditioned Offline RL in Pixel Space
Predicting surgical needle trajectories from endoscopic video is critical for robot-assisted suturing, enabling anticipatory planning, real-time guidance, and safer motion execution. Existing methods that directly learn motion distributions from visual observations tend to overlook the sequential dependency among adjacent motion steps. Moreover, sparse waypoint annotations often fail to provide sufficient supervision, further increasing the difficulty of supervised or imitation learning methods. To address these challenges, we formulate image-based needle trajectory prediction as a sequential decision-making problem, in which the needle tip is treated as an agent that moves step by step in pixel space. This formulation naturally captures the continuity of needle motion and enables the explicit modeling of physically plausible pixel-wise state transitions over time. From this perspective, we propose SutureFormer, a goal-conditioned offline reinforcement learning framework that converts sparse annotations into dense reward signals via cubic spline interpolation, encouraging the policy to exploit limited expert guidance while exploring plausible future motion paths. SutureFormer encodes variable-length clips using an observation encoder to capture both local spatial cues and long-range temporal dynamics, and autoregressively predicts future waypoints through actions composed of discrete directions and continuous magnitudes. To enable stable offline policy optimization from expert demonstrations, we adopt Conservative Q-Learning with Behavioral Cloning regularization. Experiments on a new kidney wound suturing dataset containing 1,158 trajectories from 50 patients show that SutureFormer reduces Average Displacement Error by 58.6% compared with the strongest baseline, demonstrating the effectiveness of modeling needle trajectory prediction as pixel-level sequential action learning.
Encirclement Guaranteed Finite-Time Capture against Unknown Evader Strategies
We consider a pursuit-evasion scenario involving a group of pursuers and a single evader in a two-dimensional unbounded environment. The pursuers aim to capture the evader in finite time while ensuring the evader remains enclosed within the convex hull of their positions until capture, without knowledge of the evader's heading angle. Prior works have addressed the problem of encirclement and capture separately in different contexts. In this paper, we present a class of strategies for the pursuers that guarantee capture in finite time while maintaining encirclement, irrespective of the evader's strategy. Furthermore, we derive an upper bound on the time to capture. Numerical results highlight the effectiveness of the proposed framework against a range of evader strategies.
Adaptive Outer-Loop Control of Quadrotors via Reinforcement Learning
Deep Reinforcement Learning (DRL) for quadrotor flight control typically relies on Domain Randomization (DR) for sim-to-real transfer, resulting in overly conservative policies that struggle with dynamic disturbances. To overcome this, we propose a novel adaptive control architecture that actively perceives and reacts to instantaneous perturbations. First, we train an optimal outer-loop policy, then replace its reliance on ground-truth disturbance data with a Residual Dynamics Predictor (RDP). The RDP estimates online the external forces and moments acting on the aircraft in flight, using only the history of states and control actions. For seamless hardware transfer, we introduce a data-efficient linear calibration bridge and an online thrust correction mechanism that align the simulated latent space with reality using mere seconds of flight data. Real-world validations on a Crazyflie micro-quadrotor demonstrate that our adaptive controller significantly outperforms baselines, maintaining precise trajectory tracking under severe uncertainties including mass variations, asymmetric payloads, and dynamic slung loads.
FAM-HRI: Foundation-Model Assisted Multi-Modal Human-Robot Interaction Combining Gaze and Speech
Effective Human-Robot Interaction (HRI) is crucial for enhancing accessibility and usability in real-world robotics applications. However, existing solutions often rely on gesture-only or language-only commands, making interaction inefficient and ambiguous, particularly for users with physical impairments. In this paper, we introduce FAM-HRI, an efficient multimodal framework for HRI that integrates language and gaze inputs via foundation models. By leveraging lightweight Meta ARIA glasses, our system captures real-time multimodal signals and utilizes large language models (LLMs) to fuse user intention with scene context, enabling intuitive and precise robot manipulation. Our method accurately determines the gaze fixation time interval, reducing noise caused by the dynamic nature of gaze. Experimental evaluations demonstrate that FAM-HRI achieves a high success rate in task execution while maintaining a low interaction time, providing a practical solution for individuals with limited physical mobility or motor impairments. To support the community, we have released our system design, algorithms, and solutions at https://github.com/laiyuzhi/FAM-HRI.
comment: This work has been accepted for publication in IEEE Transactions on Automation Science and Engineering. © 2026 IEEE
QuickLAP: Quick Language-Action Preference Learning for Semi-Autonomous Agents
Robots must learn from both what people do and what they say, but either modality alone is often incomplete: physical corrections are grounded but ambiguous in intent, while language expresses high-level goals but lacks physical grounding. We introduce QuickLAP: Quick Language-Action Preference learning, a Bayesian framework that fuses physical and language feedback to infer reward functions in real time. Our key insight is to treat language as a probabilistic observation over the user's latent preferences, clarifying which reward features matter and how physical corrections should be interpreted. QuickLAP uses Large Language Models (LLMs) to extract reward feature attention masks and preference shifts from free-form utterances, which it integrates with physical feedback in a closed-form update rule. This enables fast, real-time, and robust reward learning that handles ambiguous feedback. In a semi-autonomous driving simulator, QuickLAP reduces reward learning error by over 70% compared to physical-only and heuristic multimodal baselines. A 15-participant user study further validates our approach: participants found QuickLAP significantly more understandable and collaborative, and preferred its learned behavior over baselines. Code is available at https://github.com/MIT-CLEAR-Lab/QuickLAP.
SonarSweep: Fusing Sonar and Vision for Robust 3D Reconstruction via Plane Sweeping
Accurate 3D reconstruction in visually degraded underwater environments remains a formidable challenge. Single-modality approaches are insufficient: vision-based methods fail due to poor visibility and geometric constraints, while sonar is crippled by inherent elevation ambiguity and low resolution. Moreover, prior fusion techniques rely on heuristics and flawed geometric assumptions, leading to significant artifacts and an inability to model complex scenes. In this paper, we introduce SonarSweep, a novel, end-to-end deep learning framework that overcomes these limitations by adapting the principled plane sweep algorithm for cross-modal fusion between sonar and visual data. Extensive experiments in both high-fidelity simulation and real-world environments demonstrate that SonarSweep consistently generates dense and accurate depth maps, significantly outperforming state-of-the-art methods across challenging conditions, particularly in high turbidity. To foster further research, we will publicly release our code and a novel dataset featuring synchronized stereo-camera and sonar data, the first of its kind.
comment: 8 pages, 9 figures, conference
PLATO Hand: Shaping Contact Behavior with Fingernails for Precise Manipulation
We present the PLATO Hand, a dexterous robotic hand with a hybrid fingertip that combines a rigid fingernail, embedded distal phalanx, and compliant pulp to shape contact behavior during manipulation. By mechanically organizing how contact is initiated, supported, and transmitted at the fingertip, this structure creates stable and task-relevant contact conditions across diverse object geometries and grasp orientations. We develop a strain-energy-based bending--indentation model to guide the fingertip design and to explain how material stiffness and contact geometry govern deformation partitioning within the fingertip. Experiments show improved pinch stability, improved fingernail-mediated dorsal-contact force transmission and proprioceptive observability, and successful execution of edge-sensitive manipulation tasks, including paper singulation, card picking, and orange peeling. These results show that coupling a mechanically structured contact interface with a force-motion-transparent finger mechanism provides a principled approach to precise manipulation. Our project page is at: https://platohand.github.io
InFeR: Informed Failure Resilience in Learned Visual Navigation Control
While imitation learning (IL) has enabled successful visual navigation in many common environments, IL policies are prone to unpredictable failures under out-of-distribution (OOD) scenarios. This necessitates failure-resilient policies, which not only detect failures, but also recognise their sources and recover from them autonomously. We propose InFeR, a general framework for building IL policies with informed failure resilience without failure or recovery demonstrations. InFeR retrains an IL policy with a Variational Information Bottleneck (VIB) loss to structure its latent space for OOD failure detection. It applies a visual explainability technique, Grad-CAM, to localise an image region as the source of failure and inform a heuristic policy for recovery. All these are achieved without requiring additional training data. Real-world experiments show that InFeR enables informed failure recovery across two different policy architectures, yielding robust long-range navigation in complex environments.
EvoQRE: Modeling Bounded Rationality in Safety-Critical Traffic Simulation via Evolutionary Quantal Response Equilibrium
Existing traffic simulation frameworks for autonomous vehicles typically rely on imitation learning or game-theoretic approaches that solve for Nash or coarse correlated equilibria, implicitly assuming perfectly rational agents. However, human drivers exhibit bounded rationality, making approximately optimal decisions under cognitive and perceptual constraints. We propose EvoQRE, a principled framework for modeling safety-critical traffic interactions as general-sum Markov games solved via Quantal Response Equilibrium (QRE) and evolutionary game dynamics. EvoQRE integrates a pre-trained generative world model with entropy-regularized replicator dynamics, capturing stochastic human behavior while maintaining equilibrium structure. We provide rigorous theoretical results, proving that the proposed dynamics converge to Logit-QRE under a two-timescale stochastic approximation with an explicit convergence rate of O(log k / k^{1/3}) under weak monotonicity assumptions. We further extend QRE to continuous action spaces using mixture-based and energy-based policy representations. Experiments on the Waymo Open Motion Dataset and nuPlan benchmark demonstrate that EvoQRE achieves state-of-the-art realism, improved safety metrics, and controllable generation of diverse safety-critical scenarios through interpretable rationality parameters.
comment: This article is being withdrawn due to identified issues in the experimental evaluation and theoretical assumptions that may affect the validity of some reported conclusions. The authors plan to revise the methodology and provide a corrected version in future work.
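For readers unfamiliar with the logit form of quantal response mentioned in the abstract, here is a minimal single-agent sketch: action probabilities are proportional to the exponential of utility scaled by a rationality parameter. It illustrates the standard logit response only, not the paper's equilibrium computation or its replicator dynamics. The function name and the example utilities are hypothetical.

```python
import math

def logit_quantal_response(utilities, rationality):
    """Return choice probabilities proportional to exp(rationality * utility).

    rationality = 0 gives uniformly random choices; larger values concentrate
    probability on the highest-utility action (approaching a best response).
    """
    scaled = [rationality * u for u in utilities]
    m = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical utilities for three driving actions: yield, keep speed, accelerate.
probs = logit_quantal_response([1.0, 0.5, -2.0], rationality=2.0)
```

The rationality parameter is what makes the behaviour tunable: the same utilities produce noisy or near-deterministic choices depending on its value.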
CompassAD: Intent-Driven 3D Affordance Grounding in Functionally Competing Objects
When told to "cut the cake," a robot must choose the knife over nearby scissors, despite both objects affording the same cutting function. In real-world scenes, multiple objects may share identical affordances, yet only one is appropriate under the given task context. We call such cases confusing pairs. However, existing 3D affordance methods largely sidestep this challenge by evaluating isolated single objects, often with explicit category names provided in the query. We formalize Intent-Driven Confusable Affordance Grounding, a new 3D affordance setting that requires predicting a per-point affordance mask on the correct object within a multi-object point cloud, conditioned on implicit natural language intent. To study this problem, we construct CompassAD, the first benchmark centered on implicit intent in confusing multi-object compositions. It comprises 30 confusing object pairs spanning 16 affordance types, 6,422 compositions, and 88K+ query-answer pairs. Furthermore, we propose CompassNet, a framework that incorporates two dedicated modules tailored to this task. Instance-bounded Cross Injection (ICI) constrains language-geometry alignment within object boundaries to prevent cross-object semantic leakage. Bi-level Contrastive Refinement (BCR) enforces discrimination at both geometric-group and point levels, sharpening distinctions between target and confusable surfaces. Extensive experiments demonstrate state-of-the-art results on both seen and unseen queries, and deployment on a robotic manipulator confirms effective transfer to real-world grasping in confusing multi-object compositions.
Weather-Robust Cross-View Geo-Localization via Prototype-Based Semantic Part Discovery
Cross-view geo-localization (CVGL), which matches an oblique drone view to a geo-referenced satellite tile, has emerged as a key alternative for autonomous drone navigation when GNSS signals are jammed, spoofed, or unavailable. Despite strong recent progress, three limitations persist: (1) global-descriptor designs compress the patch grid into a single vector without separating layout from texture across the view gap; (2) altitude-related scale variation is retained in the learned embedding rather than marginalized; and (3) multi-objective training relies on hand-tuned scalars over losses on incompatible gradient scales. We propose SkyPart, a lightweight swappable head for patch-based vision transformers (ViTs) that institutes explicit part grouping over the patch grid. SkyPart has four theory-grounded components: (i) learnable prototypes competing for patch tokens via single-pass cosine assignment; (ii) altitude-conditioned linear modulation applied only during training, making the retrieval embedding altitude-free at inference; (iii) a graph-attention readout over active prototypes; and (iv) a Kendall uncertainty-weighted multi-objective loss whose stationary points are Pareto-stationary. At 26.95M parameters and 22.14 GFLOPs, SkyPart is the smallest among top-performing methods and sets a new state of the art on SUES-200, University-1652, and DenseUAV under a single-pass, no-re-ranking, no-TTA protocol. Its advantage over the strongest baseline widens under the ten-condition WeatherPrompt corruption benchmark.
comment: 37 pages, 7 figures, 6 tables
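The idea of prototypes competing for patch tokens through a single pass of cosine similarity can be sketched as below. The abstract does not specify whether the assignment is hard or soft, so the hard arg-max used here is an assumption, and the function names and toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def assign_tokens(tokens, prototypes):
    """Assign each token to the index of its most similar prototype in one pass.

    Hard arg-max assignment is an illustrative assumption; no iterative
    refinement (as in k-means or slot attention) is performed.
    """
    return [max(range(len(prototypes)), key=lambda k: cosine(t, prototypes[k]))
            for t in tokens]

# Toy 2D example: three patch tokens and two prototypes.
tokens = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.1]]
prototypes = [[1.0, 0.0], [0.0, 1.0]]
assignment = assign_tokens(tokens, prototypes)
```

Because cosine similarity ignores vector magnitude, tokens are grouped by direction in feature space rather than by activation strength.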
Multi-Source Human-in-the-Loop Digital Twin Testbed for Connected and Autonomous Vehicles in Mixed Traffic Flow
In emerging mixed traffic environments, Connected and Autonomous Vehicles (CAVs) have to interact with surrounding human-driven vehicles (HDVs). This paper introduces MSH-MCCT (Multi-Source Human-in-the-Loop Mixed Cloud Control Testbed), a novel CAV testbed that captures complex interactions between various CAVs and HDVs. Utilizing the Mixed Digital Twin concept, which combines Mixed Reality with Digital Twin, MSH-MCCT integrates physical, virtual, and mixed platforms, along with multi-source control inputs. Bridged by the mixed platform, MSH-MCCT allows human drivers and CAV algorithms to operate both physical and virtual vehicles within multiple fields of view. In particular, this testbed facilitates the coexistence and real-time interaction of physical and virtual CAVs and HDVs, significantly enhancing experimental flexibility and scalability. Experiments on vehicle platooning in mixed traffic showcase the potential of MSH-MCCT to conduct CAV testing with multi-source real human drivers in the loop through driving simulators of diverse fidelity. The videos for the experiments are available at our project website: https://dongjh20.github.io/MSH-MCCT.
LiPS: Lightweight Panoptic Segmentation for Resource-Constrained Robotics ICIP
Panoptic segmentation is a key enabler for robotic perception, as it unifies semantic understanding with object-level reasoning. However, the increasing complexity of state-of-the-art models makes them unsuitable for deployment on resource-constrained platforms such as mobile robots. We propose a novel approach called LiPS that addresses the challenge of computationally efficient panoptic segmentation with a lightweight design that retains query-based decoding while introducing a streamlined feature extraction and fusion pathway. It aims to provide strong panoptic segmentation performance while substantially lowering computational demands. Evaluations on standard benchmarks demonstrate that LiPS attains accuracy comparable to much heavier baselines, while providing up to 4.5 times higher throughput, measured in frames per second, and requiring nearly 6.8 times fewer computations. This efficiency makes LiPS a highly relevant bridge between modern panoptic models and real-world robotic applications.
comment: Accepted to IEEE International Conference on Image Processing (ICIP) 2026, Paper #2070
COLSON: Controllable Learning-Based Social Navigation via Diffusion-Based Reinforcement Learning ICRA 2026
Mobile robot navigation in dynamic environments with pedestrian traffic is a key challenge in the development of autonomous mobile service robots. Recently, deep reinforcement learning-based methods have been actively studied and have outperformed traditional rule-based approaches owing to their optimization capabilities. Among these methods, those that assume continuous action spaces typically rely on Gaussian distributions, which limit the flexibility of the generated actions. In contrast, the application of diffusion models to reinforcement learning has advanced, enabling more flexible action distributions than Gaussian policy-based approaches. In this study, we apply a diffusion-based reinforcement learning approach to social navigation and validate its effectiveness. Furthermore, by exploiting the characteristics of diffusion models, we propose extensions that enable adaptation to previously unseen scenarios without additional training. As concrete scenario examples, we demonstrate adaptability to scenarios in which static obstacles exist in the environment that were not present during training, as well as scenarios in which the objective differs from training, such as accompanying target pedestrians while avoiding others to reach the destination.
comment: ICRA 2026
Teaching Robots to Interpret Social Interactions through Lexically-guided Dynamic Graph Learning ACM MM 26
For a robot to be called socially intelligent, it must be able to infer users' internal states from their current behaviour, predict the users' future behaviour, and, if required, respond appropriately. In this work, we investigate how robots can be endowed with such social intelligence by modelling the dynamic relationship between users' internal states (latent) and actions (observable state). Our premise is that these states arise from the same underlying socio-cognitive process and influence each other dynamically. Drawing inspiration from theories in Cognitive Science, we propose a novel multi-task learning framework, termed SocialLDG, that explicitly models the dynamic relationship among the states, represented as six distinct tasks. Our framework uses a language model to introduce lexical priors for each task and employs dynamic graph learning to model task affinity evolving with time. SocialLDG has three advantages: First, it achieves state-of-the-art performance on two challenging, publicly available human-robot social interaction datasets. Second, it supports strong task scalability by learning new tasks seamlessly without catastrophic forgetting. Finally, benefiting from explicitly modelling task affinity, it offers insights into how different interactions unfold in time and how the internal states and observable actions influence each other in human decision making.
comment: submitted to ACM MM 26
Distributionally Robust Safety Under Arbitrary Uncertainties: A Safety Filtering Approach
In this work, we study how to ensure probabilistic safety for nonlinear systems under distributional ambiguity. Our approach builds on a backup-based safety filtering framework that switches between a high-performance nominal policy and a certified backup policy to ensure safety. To handle arbitrary uncertainties from ambiguous distributions, i.e., where the distribution is not of specific structure and the true distribution is unknown, we adopt a distributionally robust (DR) formulation using Wasserstein ambiguity sets. Rather than solving a high-dimensional DR trajectory optimization problem online, we exploit the structure of backup-based safety filtering to reduce safety certification to a one-dimensional search over the switching time between nominal and backup policies. We then develop a sampling-based certification procedure with finite-sample guarantees, where empirical failure probabilities are compared against a Wasserstein-inflated threshold. We validate our method through simulations across three systems, from a Dubins vehicle to a high-speed racing car and a fighter jet, demonstrating the broad applicability and computational efficiency.
comment: 10 pages, 4 figures, submitted to IEEE Robotics and Automation Letters (RA-L); Project Page: https://dcherenson.github.io/drs-gk
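The one-dimensional search described in the abstract above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy 1D dynamics, the function names `rollout_fails` and `largest_certified_switch`, the Gaussian disturbance samples, and the constant `margin`, which merely stands in for whatever Wasserstein-based adjustment the authors apply to the threshold. For each candidate switching step, the sketch rolls out "nominal until the switch, backup afterwards" on every sampled disturbance sequence and compares the empirical failure rate with the adjusted threshold.

```python
import random

def rollout_fails(switch_step, disturbances, horizon=50, dt=0.1, wall=10.0):
    """Toy 1D vehicle: the nominal policy accelerates toward a wall,
    the backup policy brakes. Returns True if the wall is reached."""
    x, v = 0.0, 2.0
    for k in range(horizon):
        u = 1.0 if k < switch_step else -4.0  # nominal vs. backup input
        v = max(0.0, v + (u + disturbances[k]) * dt)
        x += v * dt
        if x >= wall:
            return True
    return False

def largest_certified_switch(samples, risk_tolerance=0.05, margin=0.02, horizon=50):
    """Scan switching steps and return the latest one whose empirical
    failure rate stays within the adjusted threshold (None if none does).
    `margin` is a hypothetical placeholder for a robustness adjustment."""
    threshold = risk_tolerance - margin
    best = None
    for s in range(horizon + 1):
        failures = sum(rollout_fails(s, w, horizon) for w in samples)
        if failures / len(samples) <= threshold:
            best = s
        else:
            break  # in this toy system, later switches only travel farther
    return best

# Usage: 200 sampled disturbance sequences with a fixed seed.
rng = random.Random(0)
samples = [[rng.gauss(0.0, 0.5) for _ in range(50)] for _ in range(200)]
print(largest_certified_switch(samples))
```

The early `break` relies on a property of this toy system only: delaying the switch never reduces the distance travelled, so the failure rate is non-decreasing in the switching step. A general system would need a full scan or a different search.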
3D Modeling and Automated Measurement of Concrete Cracks via Segment Anything Refinement and Visual Inertial LiDAR Fusion
Visual-Spatial Systems have become increasingly essential in concrete crack inspection. However, existing methods often lack adaptability to diverse scenarios, exhibit limited robustness in image-based approaches, and struggle with curved or complex geometries. To address these limitations, an innovative framework for two-dimensional (2D) crack detection, three-dimensional (3D) reconstruction, and 3D automatic crack measurement was proposed in this study by integrating computer vision technologies and multi-modal simultaneous localization and mapping (SLAM). Firstly, building on a base DeepLabv3+ segmentation model and incorporating specific refinements that utilize the foundation model Segment Anything Model (SAM), we developed a crack segmentation method with strong generalization across unfamiliar scenarios, enabling the generation of precise 2D crack masks. To enhance the accuracy and robustness of 3D reconstruction, Light Detection and Ranging (LiDAR) point clouds were utilized together with image data and segmentation masks. By leveraging both image- and LiDAR-SLAM, we developed a multi-frame and multi-modal fusion framework that produces dense, colorized point clouds, effectively capturing crack semantics at a 3D real-world scale. Furthermore, the geometric attributes of the cracks were measured automatically and directly within 3D dense point cloud space, surpassing the limitations of conventional 2D image-based measurements. This advancement makes the method suitable for structural components with curved and complex 3D geometries. Experimental results across various concrete structures highlight the significant improvements and unique advantages of the proposed method, demonstrating its effectiveness, accuracy, and robustness in real-world applications.
comment: Title and author list updated
Solving Reach- and Stabilize-Avoid Problems Using Discounted Reachability
In this article, we consider the infinite-horizon reach-avoid (RA) and stabilize-avoid (SA) zero-sum game problems for general nonlinear continuous-time systems, where the goal is to find the set of states that can be controlled to reach or stabilize to a target set, without violating constraints even under the worst-case disturbance. Based on the Hamilton-Jacobi reachability method, we address the RA problem by designing a new Lipschitz continuous RA value function, whose zero sublevel set exactly characterizes the RA set. We establish that the associated Bellman backup operator is contractive and that the RA value function is the unique viscosity solution of a Hamilton-Jacobi variational inequality. Finally, we develop a two-step framework for the SA problem by integrating our RA strategies with a recently proposed Robust Control Lyapunov-Value Function, thereby ensuring both target reachability and long-term stability. We numerically verify our RA and SA frameworks on a 3D Dubins car system to demonstrate the efficacy of the proposed approach.
comment: 16 pages, 6 figures, 1 table. Accepted to IEEE Transactions on Automatic Control
Q-learning with Adjoint Matching
We propose Q-learning with Adjoint Matching (QAM), a novel TD-based reinforcement learning (RL) algorithm that tackles a long-standing challenge in continuous-action RL: efficient optimization of an expressive diffusion or flow-matching policy with respect to a parameterized Q-function. Effective optimization requires exploiting the first-order information of the critic, but it is challenging to do so for flow or diffusion policies because direct gradient-based optimization via backpropagation through their multi-step denoising process is numerically unstable. Existing methods work around this either by only using the value and discarding the gradient information, or by relying on approximations that sacrifice policy expressivity or bias the learned policy. QAM sidesteps both of these challenges by leveraging adjoint matching, a recently proposed technique in generative modeling, which transforms the critic's action gradient to form a step-wise objective function that is free from unstable backpropagation, while providing an unbiased, expressive policy at the optimum. Combined with temporal-difference backup for critic learning, QAM consistently outperforms prior approaches on hard, sparse reward tasks in both offline and offline-to-online RL.
comment: 32 pages, 8 figures, 7 tables
Learning Bilevel Policies over Symbolic World Models for Long-Horizon Planning
We tackle the challenge of building embodied AI agents that can reliably solve long-horizon planning problems. Imitation learning from demonstrations has shown itself to be effective in training robots to solve a diversity of complex tasks requiring fine motor control and manipulation over low-level (LL), continuous environments. Yet, it remains a difficult endeavour to generate long-horizon plans from imitation learning alone. In contrast, high-level (HL), symbolic abstractions facilitate efficient and interpretable long-horizon planning. We propose to combine the strengths of LL imitation learning for manipulation and control, and HL symbolic abstractions for long-horizon planning. We realise this idea via bilevel policies of the form $(\pi^{\mathrm{hl}}, \pi^{\mathrm{ll}})$, consisting of a neural policy $\pi^{\mathrm{ll}}$ learned from LL demonstrations, and an HL symbolic policy $\pi^{\mathrm{hl}}$ that is constructed from symbolic abstractions of the LL demonstrations combined with inductive generalisation. We implement these ideas in the BISON system. Experiments on extended MetaWorld benchmarks demonstrate that BISON generalises to long horizons and problems with greater numbers of objects than those solved by VLA and end-to-end methods, and is more time and memory efficient in training and inference. Notably, when ignoring LL execution, BISON's HL policies can solve HL problems with 10,000 relevant objects in under a minute. Project page: https://dillonzchen.github.io/bison
Iterative Compositional Data Generation for Robot Control
Collecting robotic manipulation data is expensive, making it impractical to acquire demonstrations for the combinatorially large space of tasks that arise in multi-object, multi-robot, and multi-environment settings. While recent generative models can synthesize useful data for individual tasks, they do not exploit the compositional structure of robotic domains and struggle to generalize to unseen task combinations. We propose a semantic compositional diffusion transformer that factorizes transitions into robot-, object-, obstacle-, and objective-specific components and learns their interactions through attention. Once trained on a limited subset of tasks, we show that our model can zero-shot generate high-quality transitions from which we can learn control policies for unseen task combinations. Then, we introduce an iterative self-improvement procedure in which synthetic data is validated via offline reinforcement learning and incorporated into subsequent training rounds. Our approach substantially improves zero-shot performance over monolithic and hard-coded compositional baselines, ultimately solving nearly all held-out tasks and demonstrating the emergence of meaningful compositional structure in the learned representations.
RoboMD: Uncovering Robot Vulnerabilities through Semantic Potential Fields
Robot manipulation policies, while central to the promise of physical AI, are highly vulnerable in the presence of external variations in the real world. Diagnosing these vulnerabilities is hindered by two key challenges: (i) the relevant variations to test against are often unknown, and (ii) direct testing in the real world is costly and unsafe. We introduce a framework that tackles both issues by learning a separate deep reinforcement learning (deep RL) policy for vulnerability prediction through virtual runs on a continuous vision-language embedding trained with limited success-failure data. By treating this embedding space, which is rich in semantic and visual variations, as a potential field, the policy learns to move toward vulnerable regions while being repelled from success regions. This vulnerability prediction policy, trained on virtual rollouts, enables scalable and safe vulnerability analysis without expensive physical trials. By querying this policy, our framework builds a probabilistic vulnerability-likelihood map. Experiments across simulation benchmarks and a physical robot arm show that our framework uncovers up to 23% more unique vulnerabilities than state-of-the-art vision-language baselines, revealing subtle vulnerabilities overlooked by heuristic testing. Additionally, we show that fine-tuning the manipulation policy with the vulnerabilities discovered by our framework improves manipulation performance with much less fine-tuning data.
comment: 26 Pages, 20 figures
Multiagent Systems
The Dynamics of Policy Gradient in Social Dilemmas with Partner Selection
In social dilemmas self-interested learning agents face the choice between the societal benefit of cooperation and the immediate reward of defection. Significant evidence exists on the benefits of assortment mechanisms such as partner selection for the emergence of cooperation, but this is largely available through agent-based simulations. In this paper, we provide an analytical solution to the problem, studying the policy-gradient dynamics in a multi-agent environment with partner selection. We show how partner selection changes the opponent distribution and hence the reward landscape, and prove this promotes cooperation under simple rules known from the literature. In particular, we find that population variance is a necessary condition for cooperation to emerge. Using a two-dimensional Wiener process, we extend the dynamics to capture the stochastic effects of partner selection and the resulting opponent distribution. We derive a sufficient condition for the population to be cooperation-promoting and prove the existence of a stationary distribution. Simulations confirm that the stochastic model accurately captures the policy-gradient dynamics and clarifies how the learning rate affects the emergence of cooperation.
LLM-Guided Communication for Cooperative Multi-Agent Reinforcement Learning ICML 2026
Communication is a key component in multi-agent reinforcement learning (MARL) for mitigating partial observability, yet prior approaches often rely on inefficient information exchange or fail to transmit sufficient state information. To address this, we propose LLM-driven Multi-Agent Communication (LMAC), which leverages an LLM's reasoning capability to design a communication protocol that enables all agents to reconstruct the underlying state as accurately and uniformly as possible. LMAC iteratively refines the protocol using an explicit state-awareness criterion, improving state recovery while narrowing differences in agents' knowledge. Experiments on diverse MARL benchmarks show that LMAC improves state reconstruction across agents and yields substantial performance gains over prior communication baselines.
comment: 9 pages for main, 32 pages for total, Accepted to ICML 2026
Interaction-Breaking Adversarial Learning Framework for Robust Multi-Agent Reinforcement Learning ICML 2026
Cooperation is central to multi-agent reinforcement learning (MARL), yet learned coordination can be fragile when external perturbations disrupt inter-agent interactions. Prior robust MARL methods have primarily considered value-oriented attacks, leaving a gap in robustness when interaction structures themselves are corrupted. In this paper, we propose an interaction-breaking adversarial learning (IBAL) framework that takes an information-theoretic view to construct attacks that impede coordination by perturbing agents' observations and actions, and trains agents to perform reliably under such disruptions. Empirically, our approach improves robustness over existing robust MARL baselines across diverse attack settings and yields stronger performance even under agent-missing scenarios.
comment: 8 pages for main, 27 pages for total, Accepted to ICML 2026
Efficient Gradient Methods for Distributed Saddle Problems
The distributed setting for Saddle Problems (SPs) has recently emerged as a framework for various modern applications in machine learning and multiagent systems. Despite its relevance, the theoretical foundations of this setting have not yet been thoroughly established. In this paper, we advance this research direction by formalizing the distributed setup for SPs and providing rigorous definitions of communication and computational costs. Our main result is a novel decoupled method that achieves optimal communication cost within the zero-respecting framework. Our method is based on a multi-stage reduction to the decoupled minimization of residual norms, which yields strict improvements over the best known communication cost for the class and the long-standing oracle cost of the Extragradient method. Further, we show by a matching lower bound that our method is communication-optimal within the family of gradient-span algorithms. Finally, we study the extension of distributed SP into Variational Inequality Problem (VIP), which generalizes two-player zero-sum games to multiplayer general-sum games. We show that our decoupled method achieves a new state-of-the-art communication complexity for this broader class.
Beyond Scaling: Agents Are Heading to the Edge
The bottleneck of useful agentic intelligence has shifted from compressing world knowledge into a single model to executing a coordinated system. This position paper argues that personal-agent architecture must move to the edge because the core properties of agentic intelligence tasks, particularly their structural coupling with high-fidelity local context and the need for zero-latency execution loops, do not sit well with cloud-centric designs. We develop this claim through three structural shifts. First, the Prefrontal Turn: the main marginal lever of capability has moved from pre-training scale to framework-level executive control. Such control must remain physically close to the environment of action if the agent is to preserve cognitive alignment. Second, the Data-Geography Paradox: the ``dark matter'' of agentic data (local file hierarchies, real-time sensor streams, and transient OS states) degrades, disappears, or loses meaning once prepared for cloud transmission, thereby cutting the agent off from ground-truth context. Third, the interaction-alignment loop: the only economically and ecologically sustainable source of agentic refinement data is the high-fidelity implicit preference signal produced through real-time local interaction. We conclude with falsifiable predictions for the next deployment cycle of personal agents.
How Far Are We From True Auto-Research?
Recent auto-research systems can produce complete papers, but feasibility is not the same as quality, and the field still lacks a systematic study of how good agent-generated papers actually are. We introduce ResearchArena, a minimal scaffold that lets off-the-shelf agents (Claude Code using Opus 4.6, Codex using GPT-5.4, and Kimi Code using K2.5) carry out the full research loop themselves (ideation, experimentation, paper writing, self-refinement) under only lightweight guidance. Across 13 computer science seeds and 3 trials per agent-domain pair, ResearchArena yields 117 agent-generated papers, each evaluated under three complementary lenses: a manuscript-only reviewer (SAR), an artifact-aware peer review (PR) in which agents inspect the workspace alongside the manuscript, and a human-conducted meta-review. Under SAR alone the picture is optimistic: Claude Code obtains the highest score, outperforms Analemma's FARS, and matches the weighted-average human ICLR 2025 submission, suggesting that minimally scaffolded agents can produce papers that look competitive on manuscript-only review. Manual inspection, however, reveals this picture is overstated: SAR scores are poorly aligned with its actual acceptance decisions and reward plausible framing without verifying experimental substance. Under artifact-aware PR, scores drop sharply, and manual auditing identifies experimental rigor as the major bottleneck, decomposing into three failure modes (fabricated results, underpowered experiments, and plan/execution mismatch) that are highly agent-dependent: Codex 5%/8% paper-vs-artifact mismatch / fabricated references versus Kimi Code 77%/72%, a $\sim$15$\times$ spread that tracks distinct research personas the agents develop. None of the 117 agent-generated papers reaches the acceptance bar of a top-tier venue. This suggests that we are still far from true auto-research.
DecisionBench: A Benchmark for Emergent Delegation in Long-Horizon Agentic Workflows
We introduce DecisionBench, a benchmark substrate for emergent delegation in long-horizon agentic workflows. The substrate fixes a task suite (GAIA, tau-bench, BFCL multi-turn), a peer-model pool (11 models, 7 vendor families), a delegation interface (call_model plus an optional read_profile channel), a deterministic skill-annotation layer, and a multi-axis metric suite covering quality, cost, latency, delegation rate, routing fidelity-at-k, vendor self-preference, and a counterfactual-delegation ceiling. The substrate is agnostic to how peer information is generated or delivered, so learned routers, richer peer memories, adaptive profile construction, and multi-step delegation can all be evaluated against it. We characterize the substrate with a five-condition reference sweep on the full pool (n=23,375 task instances). Three benchmark-level findings emerge: (i) mean end-task quality is statistically indistinguishable across the four awareness conditions (|beta| <= 0.010, p >= 0.21), so quality-only evaluation would miss the orchestration signal; (ii) routing fidelity-at-1 ranges from 7.5% to 29.5% across conditions at near-equal mean quality, with delivery channel (on-demand tool vs. preloaded description) dominating description content; (iii) a counterfactual ceiling places perfect delegation 15-31 percentage points above measured performance on every suite, locating large unrealized headroom for future orchestration methods. We release the substrate, annotation layer, reference intervention suite, analysis pipeline, and 220 per-condition run archives.
comment: 28 pages, 9 figures, 11 tables. Code and data: https://huggingface.co/decisionbench
RLFTSim: Realistic and Controllable Multi-Agent Traffic Simulation via Reinforcement Learning Fine-Tuning CVPR 2026
Supervised open-loop training has been widely adopted for training traffic simulation models; however, it fails to capture the inherently dynamic, multi-agent interactions common in complex driving scenarios. We introduce RLFTSim, a reinforcement-learning-based fine-tuning framework that enhances scenario realism by aligning simulator rollouts with real-world data distributions and provides a method for distilling goal-conditioned controllability in scenario generation. We instantiate RLFTSim on top of a pre-trained simulation model, design a reward that balances fidelity and controllability, and perform comprehensive experiments on the Waymo Open Motion Dataset. Our results show improvements in realism, achieving state-of-the-art performance. Compared with other heuristic search-based fine-tuning methods, RLFTSim requires significantly fewer samples due to a proposed low-variance and dense reward signal, and it directly addresses the realism alignment issue by design. We also demonstrate the effectiveness of our approach for distilling traffic simulation controllability through goal conditioning. The project page is available at https://ehsan-ami.github.io/rlftsim.
comment: CVPR 2026 Highlight; Project page at https://ehsan-ami.github.io/rlftsim
Nash Welfare in Additively Separable Hedonic Games
Additively separable hedonic games (ASHGs) are a prominent model of coalition formation where agents' preferences are derived from their individual valuations of peers. While social welfare maximization in ASHGs has traditionally focused mostly on utilitarian welfare, Nash welfare -- a well-established metric in economics which balances fairness with efficiency and offers scale invariance -- has been entirely overlooked. In this paper, we initiate the study of Nash welfare in ASHGs. We point out desirable properties fulfilled by partitions with high Nash welfare. This includes guaranteed contractual Nash stability in symmetric games, even for any approximation of Nash welfare. This is particularly appealing since, as for other welfare notions, Nash welfare turns out to be NP-hard to maximize, even for the ASHG subclass of symmetric aversion to enemies games (AEGs). A main focus of our study is on approximation algorithms for the Nash welfare objective. We present packing-based algorithms with approximation ratios for well-established subclasses of ASHGs: $n-1$ for AEGs and $2n$ for appreciation of friends games. This is complemented by a strict inapproximability result showing it is NP-hard to approximate Nash welfare within a factor of $1.0000759$ in general ASHGs. Further, we investigate the restricted settings with an upper bound on the coalition size or number of coalitions, and draw the boundary between the cases admitting efficient algorithms and those yielding NP-hardness: bounding the allowed size or number of coalitions by $2$ admits polynomial-time solvability, whereas bounds of $3$ or more yield NP-hardness or unbounded inapproximability.
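To make the two welfare objectives in the abstract above concrete, here is a minimal Python sketch. It assumes the common convention that an agent's utility in an ASHG is the sum of its valuations for the other members of its coalition, and that Nash welfare is the product of the agents' utilities; the paper's exact definition (for example, how non-positive utilities are handled) may differ. The function names and the toy valuations are hypothetical.

```python
from math import prod

def utilities(valuations, partition):
    """Additively separable utility: each agent sums its valuations
    for the other members of its own coalition."""
    util = {}
    for coalition in partition:
        for i in coalition:
            util[i] = sum(valuations[i][j] for j in coalition if j != i)
    return util

def utilitarian_welfare(valuations, partition):
    """Sum of all agents' utilities."""
    return sum(utilities(valuations, partition).values())

def nash_welfare(valuations, partition):
    """Product of all agents' utilities (assumed definition)."""
    return prod(utilities(valuations, partition).values())

# Toy symmetric game with four agents (illustrative numbers only).
valuations = {
    0: {1: 3, 2: 1, 3: 1},
    1: {0: 3, 2: 1, 3: 1},
    2: {0: 1, 1: 1, 3: 1},
    3: {0: 1, 1: 1, 2: 1},
}
grand_coalition = [{0, 1, 2, 3}]
two_pairs = [{0, 1}, {2, 3}]

for name, partition in [("grand", grand_coalition), ("pairs", two_pairs)]:
    print(name,
          utilitarian_welfare(valuations, partition),
          nash_welfare(valuations, partition))
```

Because the product collapses to zero as soon as one agent has zero utility, a partition that isolates an agent scores poorly under this assumed definition even when the total utility is high, which is one way to read the abstract's remark that Nash welfare balances fairness with efficiency.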
MemCoT: Test-Time Scaling through Memory-Driven Chain-of-Thought
Large Language Models (LLMs) still suffer from severe hallucinations and catastrophic forgetting during causal reasoning over massive, fragmented long contexts. Existing memory mechanisms typically treat retrieval as a static, single-step passive matching process, leading to severe semantic dilution and contextual fragmentation. To overcome these fundamental bottlenecks, we propose MemCoT, a test-time memory scaling framework that redefines the reasoning process by transforming long-context reasoning into an iterative, stateful information search. MemCoT introduces a multi-view long-term memory perception module that enables Zoom-In evidence localization and Zoom-Out contextual expansion, allowing the model to first identify where relevant evidence resides and then reconstruct the surrounding causal structure necessary for reasoning. In addition, MemCoT employs a task-conditioned dual short-term memory system composed of semantic state memory and episodic trajectory memory. This short-term memory records historical search decisions and dynamically guides query decomposition and pruning across iterations. Empirical evaluations demonstrate that MemCoT establishes state-of-the-art performance. Empowered by MemCoT, several open- and closed-source models achieve SOTA performance on the LoCoMo and LongMemEval-S benchmarks.
comment: 14 pages, 7 figures
Tongyi DeepResearch Technical Report
We present Tongyi DeepResearch, an agentic large language model, which is specifically designed for long-horizon, deep information-seeking research tasks. To incentivize autonomous deep research agency, Tongyi DeepResearch is developed through an end-to-end training framework that combines agentic mid-training and agentic post-training, enabling scalable reasoning and information seeking across complex tasks. We design a highly scalable data synthesis pipeline that is fully automatic, without relying on costly human annotation, and empowers all training stages. By constructing customized environments for each stage, our system enables stable and consistent interactions throughout. Tongyi DeepResearch, featuring 30.5 billion total parameters, with only 3.3 billion activated per token, achieves state-of-the-art performance across a range of agentic deep research benchmarks, including Humanity's Last Exam, BrowseComp, BrowseComp-ZH, WebWalkerQA, xbench-DeepSearch, FRAMES and xbench-DeepSearch-2510. We open-source the model, framework, and complete solutions to empower the community.
comment: https://tongyi-agent.github.io/blog
AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent
While large language model (LLM) multi-agent systems achieve superior reasoning performance through iterative debate, practical deployment is limited by their high computational cost and error propagation. This paper proposes AgentArk, a novel framework to distill multi-agent dynamics into the weights of a single model, effectively transforming explicit test-time interactions into implicit model capabilities. This equips a single agent with the intelligence of multi-agent systems while remaining computationally efficient. Specifically, we investigate three hierarchical distillation strategies across various models, tasks, scaling, and scenarios: reasoning-enhanced fine-tuning; trajectory-based augmentation; and process-aware distillation. By shifting the burden of computation from inference to training, the distilled models preserve the efficiency of one agent while exhibiting strong reasoning and self-correction performance of multiple agents. They further demonstrate enhanced robustness and generalization across diverse reasoning tasks. We hope this work can shed light on future research on efficient and robust multi-agent development. Our code is at https://github.com/AIFrontierLab/AgentArk.
Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception ACL 2026
We introduce a voice-agentic framework that learns one critical omni-understanding skill: knowing when to trust itself versus when to consult external audio perception. Our work is motivated by a crucial yet counterintuitive finding: naively fine-tuning an omni-model on both speech recognition and external sound understanding tasks often degrades performance, as the model can be easily misled by noisy hypotheses. To address this, our framework, Speech-Hands, recasts the problem as an explicit self-reflection decision. This learnable reflection primitive proves effective in preventing the model from being derailed by flawed external candidates. We show that this agentic action mechanism generalizes naturally from speech recognition to complex, multiple-choice audio reasoning. Across the OpenASR leaderboard, Speech-Hands consistently outperforms strong baselines by 12.1% WER on seven benchmarks. The model also achieves 77.37% accuracy and high F1 on audio QA decisions, showing robust generalization and reliability across diverse audio question answering datasets. By unifying perception and decision-making, our work offers a practical path toward more reliable and resilient audio intelligence.
comment: Accepted to ACL 2026. Oral Presentation. Code: https://github.com/YukinoWan/Speech-Hands OpenClaw Branch: https://github.com/openclaw/openclaw/pull/69073
MAGIQ: A Post-Quantum Multi-Agentic AI Governance System with Provable Security
Our computing ecosystem is being transformed by two emerging paradigms: the increased deployment of agentic AI systems and advancements in quantum computing. With respect to agentic AI systems, one of the most critical problems is creating secure governing architectures that ensure agents follow their owners' communication and interaction policies and can be held accountable for the messages they exchange with other agents. With respect to quantum computing, existing systems must be retrofitted and new cryptographic mechanisms must be designed to ensure long-term security and quantum resistance. In fact, NIST recommends that standard public-key cryptographic algorithms, including RSA, Diffie-Hellman (DH), and elliptic-curve constructions (ECC), be deprecated starting in 2030 and disallowed after 2035. In this paper, we present MAGIQ, a framework for policy definition and enforcement in multi-agent AI systems using novel, highly efficient, quantum-resistant cryptographic protocols with proven security guarantees. MAGIQ (i) allows users to define rich communication and access-control policy budgets for agent-to-agent sessions and tasks, including global budgets for one-to-many agent sessions; (ii) enforces such policies using post-quantum cryptographic primitives; (iii) supports session-based enforcement of policies for agent-to-agent and one-to-many agent sessions; and (iv) provides accountability of agents to their users through message attribution. We formally model and prove the correctness and security of the system using the Universal Composability (UC) framework. We evaluate the computation and communication overhead of our framework and compare it with the state-of-the-art agentic AI framework SAGA. MAGIQ is a first step toward post-quantum-secure solutions for agentic AI systems.
EvoQRE: Modeling Bounded Rationality in Safety-Critical Traffic Simulation via Evolutionary Quantal Response Equilibrium
Existing traffic simulation frameworks for autonomous vehicles typically rely on imitation learning or game-theoretic approaches that solve for Nash or coarse correlated equilibria, implicitly assuming perfectly rational agents. However, human drivers exhibit bounded rationality, making approximately optimal decisions under cognitive and perceptual constraints. We propose EvoQRE, a principled framework for modeling safety-critical traffic interactions as general-sum Markov games solved via Quantal Response Equilibrium (QRE) and evolutionary game dynamics. EvoQRE integrates a pre-trained generative world model with entropy-regularized replicator dynamics, capturing stochastic human behavior while maintaining equilibrium structure. We provide rigorous theoretical results, proving that the proposed dynamics converge to Logit-QRE under a two-timescale stochastic approximation with an explicit convergence rate of O(log k / k^{1/3}) under weak monotonicity assumptions. We further extend QRE to continuous action spaces using mixture-based and energy-based policy representations. Experiments on the Waymo Open Motion Dataset and nuPlan benchmark demonstrate that EvoQRE achieves state-of-the-art realism, improved safety metrics, and controllable generation of diverse safety-critical scenarios through interpretable rationality parameters.
comment: This article is being withdrawn due to identified issues in the experimental evaluation and theoretical assumptions that may affect the validity of some reported conclusions. The authors plan to revise the methodology and provide a corrected version in future work.
Lying with Truths: Open-Channel Multi-Agent Collusion for Belief Manipulation via Generative Montage ACL 2026
As large language models (LLMs) transition to autonomous agents synthesizing real-time information, their reasoning capabilities introduce an unexpected attack surface. This paper introduces a novel threat where colluding agents steer victim beliefs using only truthful evidence fragments distributed through public channels, without relying on covert communications, backdoors, or falsified documents. By exploiting LLMs' overthinking tendency, we formalize the first cognitive collusion attack and propose Generative Montage: a Writer-Editor-Director framework that constructs deceptive narratives through adversarial debate and coordinated posting of evidence fragments, causing victims to internalize and propagate fabricated conclusions. To study this risk, we develop CoPHEME, a dataset derived from real-world rumor events, and simulate attacks across diverse LLM families. Our results show pervasive vulnerability across 14 LLM families: attack success rates reach 74.4% for proprietary models and 70.6% for open-weights models. Counterintuitively, stronger reasoning capabilities increase susceptibility, with reasoning-specialized models showing higher attack success than base models or prompts. Furthermore, these false beliefs then cascade to downstream judges, achieving over 60% deception rates, highlighting a socio-technical vulnerability in how LLM-based agents interact with dynamic information environments. Our implementation and data are available at: https://github.com/CharlesJW222/Lying_with_Truth/tree/main.
comment: Accepted to the ACL 2026 Main Conference (Oral Presentation)
Beyond Static Responses: Multi-Agent LLM Systems as a New Paradigm for Social Science Research
As large language models (LLMs) transition from static tools to fully agentic systems, their potential for transforming social science research has become increasingly evident. This paper introduces a structured framework for understanding the diverse applications of LLM-based agents, ranging from simple data processors to complex, multi-agent systems capable of simulating emergent social dynamics. By mapping this developmental continuum across six levels, the paper clarifies the technical and methodological boundaries between different agentic architectures, providing a comprehensive overview of current capabilities and future potential. It highlights how lower-tier systems streamline conventional tasks like text classification and data annotation, while higher-tier systems enable novel forms of inquiry, including the study of group dynamics, norm formation, and large-scale social processes. However, these advancements also introduce significant challenges, including issues of reproducibility, ethical oversight, and the risk of emergent biases. The paper critically examines these concerns, emphasizing the need for robust validation protocols, interdisciplinary collaboration, and standardized evaluation metrics. It argues that while LLM-based agents hold transformative potential for the social sciences, realizing this promise will require careful, context-sensitive deployment and ongoing methodological refinement. The paper concludes with a call for future research that balances technical innovation with ethical responsibility, encouraging the development of agentic systems that not only replicate but also extend the frontiers of social science, offering new insights into the complexities of human behavior.
Convergence of Multiagent Learning Systems for Traffic control
Rapid urbanization in cities like Bangalore has led to severe traffic congestion, making efficient Traffic Signal Control (TSC) essential. Multi-Agent Reinforcement Learning (MARL), often modeling each traffic signal as an independent agent using Q-learning, has emerged as a promising strategy to reduce average commuter delays. While prior work by Prashant L A et al. has empirically demonstrated the effectiveness of this approach, a rigorous theoretical analysis of its stability and convergence properties in the context of traffic control has not been explored. This paper bridges that gap by focusing squarely on the theoretical basis of this multi-agent algorithm. We investigate the convergence problem inherent in using independent learners for the cooperative TSC task. Utilizing stochastic approximation methods, we formally analyze the learning dynamics. The primary contribution of this work is a proof that the specific multi-agent reinforcement learning algorithm for traffic control converges under the given conditions, extending single-agent convergence proofs for asynchronous value iteration.
comment: 14 pages, 2 figures
Mapping Human Anti-collusion Mechanisms to Multi-agent AI Systems ICML 2026
As multi-agent AI systems become increasingly autonomous, evidence shows they can develop collusive strategies similar to those long observed in human markets and institutions. While human domains have accumulated centuries of anti-collusion mechanisms, it remains unclear how these can be adapted to AI settings. This paper addresses that gap by (i) developing a taxonomy of human anti-collusion mechanisms, including sanctions, leniency & whistleblowing, monitoring & auditing, market design, and governance and (ii) mapping them to potential interventions for multi-agent AI systems. For each mechanism, we propose implementation approaches. We also highlight open challenges, such as the attribution problem (difficulty attributing emergent coordination to specific agents), identity fluidity (agents being easily forked or modified), the boundary problem (distinguishing beneficial cooperation from harmful collusion), and adversarial adaptation (agents learning to evade detection).
comment: Accepted to ICML 2026 Workshop on Technical AI Governance Research (TAIGR); Published in Knowledge-Based Systems Journal
Talk, Judge, Cooperate: Gossip-Driven Indirect Reciprocity in Self-Interested LLM Agents ICML 2026
Indirect reciprocity, which means helping those who have helped others, is difficult to sustain among decentralized, self-interested LLM agents without reliable reputation systems. We address this challenge with the Agentic Linguistic Gossip Network (ALIGN), an automated framework that enables decentralized agents to form reputations, evaluate trustworthiness, and coordinate social norms by strategically sharing open-ended gossip with hierarchical tones. We demonstrate that ALIGN consistently improves indirect reciprocity and resists malicious entrants by identifying and ostracizing defectors. Notably, we find that stronger reasoning capabilities in LLMs lead to more incentive-aligned cooperation, whereas chat models often over-cooperate even when strategically suboptimal. These results suggest that leveraging LLM reasoning through decentralized gossip is a promising path for maintaining social welfare in agentic ecosystems. Our code is available at https://github.com/shuhui-zhu/ALIGN.
comment: ICML 2026
Systems and Control (EESS)
From Coverage to Sensing: ISAC meets FR3
Future 6G systems are expected to exploit upper midband spectrum in frequency range 3 (FR3) not only for high throughput communications, but also for sensing services such as localization, detection, and situational awareness. This paper develops a concrete path from today's coverage-oriented deployments to FR3 networks that treat sensing as a native function. We first show how existing FR2 radars can be time-multiplexed and coordinated under a 6G medium access control as radar-as-a-service, forming a bridge between legacy sensing and network-managed integrated sensing and communications (ISAC). We then propose a hierarchical FR3 beam-alignment strategy in which coarse access occurs at lower frequencies and refinement occurs at upper FR3, and quantify the resulting sensing and communication capabilities via range-angle Cramér-Rao bounds in the near field. We identify intra- and inter-beam squint phenomena specific to wideband FR3 arrays, and discuss design approaches to mitigate them. On the signal-processing side, we argue that FR3 sensing cannot rely solely on pilot resources and discuss how much sensing information can be extracted from payload resource elements. We further highlight the role of calibrated FR3 channel simulators and real-time models as the core of wireless digital twins for training and evaluating ISAC algorithms, and discuss how massive MIMO and dense or distributed deployments at FR3 naturally act as large reconfigurable sensor arrays.
comment: Accepted by IEEE Communications Magazine, 2026
Active Defense Against False Data Injection Attacks in Robotic Manipulators
Robotic systems are vulnerable to False Data Injection Attacks (FDIAs), where adversaries corrupt sensor signals to gain malicious control. Feedback linearization exposes robotic systems to integrator vulnerability, making them susceptible to stealthy attacks that can cause significant deviations in end-effector behavior without raising alarms. This paper addresses the resilience of manipulators against finite-horizon FDIAs by formalizing two defense methods, namely anomaly-aware virtual damping and manipulability reduction, with probabilistic guarantees on nominal task execution. Simulations on a 7-DOF redundant manipulator show that the proposed defenses substantially reduce the impact of FDIAs compared with relying solely on a threshold-based ADS such as the chi-squared test, while preserving nominal task performance in the absence of attack.
comment: Extended 8-page version containing full proofs. An abridged 6-page version has been accepted for publication in the Proceedings of the 23rd IFAC World Congress (2026)
Cooperative and Noncooperative Paradigms for Game-Theoretic Control of Socio-Technical Systems
This tutorial presents cooperative and noncooperative game-theoretic frameworks for modeling, learning, and control in socio-technical systems, where human behavior, incentives, institutions, and social interactions are coupled with cyber-physical and networked infrastructures. The paper reviews strategic, dynamic, cooperative, matching, learning, and feedback-control approaches for analyzing how local decision-making, adaptation, and strategic interactions shape collective system outcomes. The tutorial further develops feedback-learning and incentive-design perspectives that connect equilibrium analysis with adaptation, distributed control, and mechanism design under information and coordination constraints. We also examine resilience and security challenges arising from adversarial behavior, misinformation, disruptions, and cascading failures in interconnected socio-technical networks. Finally, we discuss emerging research directions at the intersection of game theory, control, learning, and network science for resilient and adaptive socio-technical systems.
Control-Certified Wireless Resource Allocation for Digital-Twin-Enabled UAV Swarms
Wireless resource allocation in digital-twin-enabled unmanned aerial vehicle (UAV) swarms must be both network-feasible and certifiably safe for closed-loop control. Existing packet-level or scalar-priority schedulers cannot meaningfully compare heterogeneous multi-hop actions that differ simultaneously in route, retransmission depth, blocklength, bidirectional delay, delivery probability, and TDMA slot cost. This paper introduces a certificate-guided resource allocation framework for low-altitude multi-hop UAV swarms. A digital twin maps predicted topology, channel, route, and controller-side state into a shared five-dimensional quality-of-service (QoS) certificate comprising uplink/downlink delay bounds, directional delivery guarantees, and a certified upper bound on the interval between successful bidirectional interactions. A state-conditioned stochastic drift test then admits only certificates whose augmented Lyapunov drift is nonpositive under the current controller state. Admitted actions are reduced to certified supply frontiers by removing dominated route-slot configurations, and the online scheduler maximizes Lyapunov-drift reduction under a shared TDMA slot budget via exact dynamic programming. Closed-loop ns-3 simulations demonstrate that the proposed framework outperforms fixed-service, certificate-filtered fixed-priority, dynamic-transmission-count, and value-of-information baselines in both tracking accuracy and high-risk state suppression under identical communication budgets.
Residential Battery Pooling Under Backup Commitments
Residential batteries increasingly serve two roles: they can earn money by arbitraging wholesale prices and providing grid services, and they provide backup power during outages. This dual use creates a basic tradeoff between earning market value and preserving outage readiness. Coordination across many batteries can help, but a provider cannot treat the fleet as a single virtual battery when each household is promised its own backup protection. We compare standalone control, in which each home is dispatched independently, with pooling, in which homes are coordinated while each battery retains its own state of charge and household-specific backup requirement. Both regimes are implemented as model predictive control problems with 15-minute decision intervals and evaluated using household telemetry together with ERCOT market inputs. The empirical design focuses on the 543 homes in our sample that can support at least one backup product in standalone operation and studies backup caps ranging from 2 to 24 hours. Lower caps relax backup obligations, while the 24-hour cap coincides with assigning each home its own longest feasible backup tier. Pooling remains beneficial in this service-constrained setting, but its value declines smoothly as backup obligations tighten. Standalone firm margin ranges from \$11.06 per home per week at the 2-hour cap to \$10.79 at the 24-hour cap, while pooling benefit falls from \$1.49 to \$1.27 per home per week. Relative to standalone firm margin, pooling is worth about 13.5% at the 2-hour cap and about 11.8% at the 24-hour cap. Coordination therefore still helps after preserving household-level backup guarantees, but its value declines as backup obligations tighten.
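The relative pooling values quoted in the abstract above follow directly from the reported dollar figures. A minimal arithmetic check, assuming the percentage is simply pooling benefit divided by standalone firm margin (the function name `pooling_share` is hypothetical and not from the paper):

```python
def pooling_share(pooling_benefit: float, standalone_margin: float) -> float:
    """Pooling benefit as a percentage of standalone firm margin.

    Both inputs are in dollars per home per week, as reported in the abstract.
    """
    return 100.0 * pooling_benefit / standalone_margin


# Figures taken from the abstract: 2-hour cap and 24-hour cap.
share_2h = pooling_share(1.49, 11.06)
share_24h = pooling_share(1.27, 10.79)
print(f"2-hour cap:  {share_2h:.1f}%")
print(f"24-hour cap: {share_24h:.1f}%")
```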
Observer-Based Stabilization for Linear Multi-Agent Dynamical Systems Using Generalized Frequency Variables
We address the conditions and design of controllers and observers for homogeneous networks of linear MIMO agents. We develop networked controllers and observers that ensure the stability of both the system state and the estimation error, leveraging the concept of generalized frequency variables. A separation principle for networks is then established, showing that the observer and controller can be designed independently and combined to achieve a stable output feedback. Our results are illustrated via a highly unstable, oscillatory network of locally actuated pendulums on carts. Finally, necessary conditions for controllability and observability -- derived from agent properties and network structure -- are established and discussed.
comment: 23rd IFAC World Congress, Busan, South Korea, Aug. 2026
On Generalized Performance Evaluation and Generalized Controller Synthesis
In this paper, we propose the frameworks of generalized performance evaluation and generalized controller synthesis. To this end, we give a true concurrent process calculus as the model of systems, and present a lattice-valued performance evaluation language as the performance specification of systems. We give a framework of generalized performance evaluation based on the process calculus and the performance evaluation language. We show that several problems in computer science are special cases of generalized performance evaluation. A generalized performance evaluation algorithm is presented. Furthermore, we present a framework of generalized controller synthesis, which is the inverse problem of generalized performance evaluation. We show several special cases of generalized controller synthesis in computer science, and outline a generalized controller synthesis algorithm.
comment: 15 pages
A Benchmark on LLM-Based Power Flow Computation: Do More Structured Prompts Help?
We present a controlled benchmark evaluating three LLMs -- Claude Sonnet 4.5, Gemini 2.5 Pro, and GPT-3.5 Turbo -- across four prompt formats (from concise narrative to structured JSON with explicit iteration trace) on Gauss--Seidel AC power flow computation for a three-bus system. Against 50 test cases with reference solutions computed numerically, Gemini 2.5 Pro with the simplest narrative prompt achieves the lowest mean absolute error (MAE = 0.257 MW/MVar, 54\% of cases within 5\% relative error), while the same model with a JSON-structured prompt raises MAE to 0.789 -- a 3.1$\times$ increase. Adding a worked example degrades accuracy for Gemini but provides a marginal gain for Claude. GPT-3.5 Turbo fails on at least 90\% of cases under all prompt formats. An independent 100-case replication with related prompt-format families confirms the qualitative ordering (Gemini $>$ Claude $>$ GPT-3.5): the best 100-case configuration (Gemini with explicit iteration trace) achieves MAE = 0.402 and 53\% within 5\%, while Claude Sonnet 4.5's near-flat accuracy profile ($\approx$38\% within 5\% across formats) and GPT-3.5's near total ineffectiveness (92--97\% above 20\% error) both replicate. In neither evaluation does any configuration achieve sufficient reliability for use as a direct numerical solver. These findings offer a diagnostic baseline for practitioners and researchers evaluating LLMs for smart-grid decision-support assistance.
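For readers unfamiliar with the task the LLMs are asked to perform, the following is a minimal sketch of a Gauss-Seidel AC power flow on a three-bus system. The network data (line impedances, loads, slack voltage) are illustrative values chosen for this sketch and are not the benchmark's test cases; the function names `build_ybus` and `gauss_seidel_power_flow` are hypothetical. Bus 0 is assumed to be the slack bus and the other two are assumed to be PQ (load) buses.

```python
import numpy as np


def build_ybus(n_bus, lines):
    """Assemble the bus admittance matrix from (from_bus, to_bus, impedance) tuples."""
    Y = np.zeros((n_bus, n_bus), dtype=complex)
    for a, b, z in lines:
        y = 1.0 / z
        Y[a, a] += y
        Y[b, b] += y
        Y[a, b] -= y
        Y[b, a] -= y
    return Y


def gauss_seidel_power_flow(Y, S_inj, V_slack, tol=1e-10, max_iter=500):
    """Solve for bus voltages with bus 0 as slack and all other buses as PQ.

    S_inj[i] is the net complex power injected at bus i in per unit
    (negative for loads). Returns the voltages and the iteration count.
    """
    n = Y.shape[0]
    V = np.ones(n, dtype=complex)  # flat start
    V[0] = V_slack
    for iteration in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(1, n):
            # Sum of Y_ik * V_k over k != i, using the latest available voltages.
            others = Y[i] @ V - Y[i, i] * V[i]
            V_new = (np.conj(S_inj[i]) / np.conj(V[i]) - others) / Y[i, i]
            max_change = max(max_change, abs(V_new - V[i]))
            V[i] = V_new
        if max_change < tol:
            return V, iteration
    raise RuntimeError("Gauss-Seidel iteration did not converge")


# Illustrative data on a 100 MVA base (assumed for this sketch).
lines = [(0, 1, 0.02 + 0.04j), (0, 2, 0.01 + 0.03j), (1, 2, 0.0125 + 0.025j)]
Y = build_ybus(3, lines)
S_inj = np.array([0.0, -(2.566 + 1.102j), -(1.386 + 0.452j)])  # loads at buses 1, 2
V, iterations = gauss_seidel_power_flow(Y, S_inj, V_slack=1.05 + 0.0j)

# Slack-bus power follows from the converged voltages.
S_slack = V[0] * np.conj(Y[0] @ V)
print("voltages (p.u.):", np.round(V, 4))
print("iterations:", iterations)
print("slack injection (p.u.):", np.round(S_slack, 4))
```

The update inside the loop is the standard fixed-point form of the bus power balance; each step reuses voltages already updated in the same sweep, which is what distinguishes Gauss-Seidel from a Jacobi-style iteration.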
Comparing Contract-Based Support Mechanisms for Long-Duration Energy Storage
Long-duration energy storage (LDES) faces significant revenue volatility that impedes investment. This paper evaluates four contract-based support mechanisms using an equilibrium model with risk-averse investors and incomplete risk markets. Applied to a stylized 2035 Great Britain case, we find that all mechanisms can achieve the targeted LDES capacity but differ substantially in cost-effectiveness and risk-aversion sensitivity. Contracts that eliminate revenue volatility achieve the lowest costs but may weaken operational incentives, while contracts that preserve market exposure maintain incentives at higher costs.
comment: Accepted for presentation at the 22nd International Conference on the European Energy Market (EEM26), Trondheim, Norway, 2026
HJ-Gauss: A Monte-Carlo HJ Reachability Scheme
Backward reachable tubes (BRTs), computed via viscous Hamilton-Jacobi (HJ) partial differential equations, provide principled safety certificates for learned controllers and planning algorithms in trustworthy machine learning. However, classical grid-based HJ solvers require an $O(M^n)$ memory footprint for $M$ grid points per dimension in an $n$-dimensional state space. This renders them impractical for high-dimensional systems. We address this bottleneck with a local PDE linearization that enables a frozen-coefficient sampling scheme for the viscous HJ PDE: a generalized Cole-Hopf-type transformation reduces the nonlinear HJ equation to a sequence of linear heat equations whose solutions admit Gaussian heat-kernel representations. The value function and its spatial gradient are then recovered via roll-outs of Monte Carlo expectations on Gaussian densities, yielding a grid-free algorithm whose storage scales as $N\cdot n$ for $N$ samples. This decoupling of memory from dimensionality enables reachability analysis on problems where grid-based methods are simply impossible. We prove a finite-sample concentration bound with $O(N^{-1/2})$ error and conditional linear convergence for the introduced Monte-Carlo Picard iterative scheme. Numerical validation on pursuit-evasion games demonstrates relative $L^2_{\text{rel}}$ errors of $0.03 - 0.20$, with $14-26$ second wall-clock times per 2D slice on a CPU. Crucially, the method scales: it is validated on multi-agent games of up to (but not limited to) $n=45$ dimensions.
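The core idea of replacing a grid with Gaussian sampling can be illustrated on the classical special case. The sketch below assumes the quadratic-Hamiltonian viscous HJ equation $u_t + \tfrac12|\nabla u|^2 = \nu\Delta u$ in initial-value form, for which the classical Cole-Hopf substitution $\varphi = e^{-u/(2\nu)}$ gives the heat equation $\varphi_t = \nu\Delta\varphi$ and hence $u(x,t) = -2\nu\log\mathbb{E}\big[e^{-u_0(x+\sqrt{2\nu t}\,Z)/(2\nu)}\big]$ with $Z\sim\mathcal{N}(0,I)$. This is not the paper's generalized transformation or its Picard scheme; the function name `cole_hopf_value` and all parameter values are assumptions made for illustration.

```python
import numpy as np


def cole_hopf_value(x, t, u0, nu, n_samples=200_000, seed=0):
    """Monte Carlo estimate of u(x, t) for u_t + 0.5*|grad u|^2 = nu * Laplacian(u).

    u0 maps an array of shape (n_samples, n) to values of shape (n_samples,).
    Storage is n_samples * n numbers, independent of any spatial grid.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    z = rng.standard_normal((n_samples, x.size))
    y = x + np.sqrt(2.0 * nu * t) * z  # samples from the heat kernel
    exponent = -u0(y) / (2.0 * nu)
    # Log-mean-exp keeps the estimate stable when exponents are large.
    shift = exponent.max()
    log_mean = shift + np.log(np.mean(np.exp(exponent - shift)))
    return -2.0 * nu * log_mean


def quadratic_u0(y):
    """Initial condition u0(y) = |y|^2 / 2, which has a closed-form solution."""
    return 0.5 * np.sum(y**2, axis=-1)


def quadratic_exact(x, t, nu):
    """Closed form for the quadratic initial condition: |x|^2 / (2(1+t)) + nu*n*log(1+t)."""
    x = np.asarray(x, dtype=float)
    return np.sum(x**2) / (2.0 * (1.0 + t)) + nu * x.size * np.log(1.0 + t)


x_query = [1.0, 0.0]
estimate = cole_hopf_value(x_query, t=1.0, u0=quadratic_u0, nu=0.5)
reference = quadratic_exact(x_query, t=1.0, nu=0.5)
print(f"Monte Carlo: {estimate:.4f}, closed form: {reference:.4f}")
```

The quadratic initial condition is used only because it admits a closed form to compare against; the sampling estimator itself does not depend on that choice.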
Data Center Spatio-Temporal Load Flexibility in Security-Constrained Unit Commitment for Enhanced Grid Efficiency and Reliability
Data center electricity consumption reached 4.4% of U.S. total in 2023 and is projected to grow to 6.7--12% by 2028, imposing increasing stress on transmission networks while representing a largely untapped source of controllable demand-side flexibility. This paper proposes a modular security-constrained unit commitment (SCUC) framework that coordinates flexible data center workloads with system-level scheduling to reduce renewable curtailment, alleviate congestion, and lower operating costs. Three mixed-integer linear programming (MILP) models are formulated: the Data Center Spatial model (DC-S), enabling instantaneous workload redistribution across geographically distributed sites; the Data Center Temporal model (DC-T), permitting each site to shift its deferrable load across time while preserving the daily energy balance; and the Data Center Spatio-Temporal model (DC-ST), jointly activating both mechanisms and spanning the largest feasible operating region. Case studies on a modified IEEE 24-bus reliability test system show that DC-ST eliminates all base-case and post-contingency transmission violations at a flexibility ratio of 40%, and reduces renewable curtailment by up to 84.4% at 30% relative to the inflexible baseline. Sensitivity analysis further reveals that moderate flexibility levels of 20%--30% already capture most of the achievable benefits, supporting practical deployment with limited operational burden on data center operators.
comment: 5 pages, 4 figures, accepted by IEEE IAS Annual Meeting 2026
On Piecewise Quadratic Terminal Costs for MPC
This paper presents a novel approach to synthesize stabilizing terminal ingredients for linear model predictive control (MPC) schemes, with the aim of increasing the region of attraction while reducing suboptimality with respect to the solution of the infinite-horizon optimal control problem. It is based on the construction of a novel terminal region using methods from the field of configuration-constrained polytopic computing, along with a terminal cost that is exactly equal to the infinite-horizon linear-quadratic regulator cost in a nontrivial neighborhood of the steady-state. The practical performance of the controller is illustrated through various case studies, and comparisons with state-of-the-art approaches are presented.
comment: 21 pages, 4 figures
A characteristic function framework for chance constraint programming in stochastic model predictive control
The computation of chance constraints in stochastic model predictive control is often numerically challenging due to the non-Gaussian nature of the disturbances. To overcome this problem, we propose a computational optimization framework applicable to non-Gaussian disturbances. This framework employs a numerical inversion method, utilizing the characteristic function of the disturbance distribution to compute the probability in the chance constraint as well as its gradient. To improve efficiency, it vectorizes the integration points and reuses intermediate computations in Gauss-Kronrod quadrature. The framework is implemented within the YALMIP toolbox to perform chance constraint calculations for arbitrary non-Gaussian disturbances, applicable to both single-component distributions and mixture models. It allows the user to simply specify a distribution type and its parameters for the disturbance and directly compute the probability and its gradient to solve the optimization problem. The method is validated through a numerical example of a stochastic model predictive control application.
comment: 6 pages, 1 figure. Accepted by IFAC WC 2026
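As a rough illustration of computing a chance-constraint probability from a characteristic function, the sketch below uses the Gil-Pelaez inversion formula $F(x) = \tfrac12 - \tfrac1\pi\int_0^\infty \operatorname{Im}\!\big[e^{-itx}\varphi(t)\big]/t\,dt$ for a Gaussian-mixture disturbance. It is written in Python with a plain midpoint rule rather than the paper's YALMIP implementation and Gauss-Kronrod quadrature; the function names, the truncation limit `t_max`, and the mixture parameters are all assumptions made for this example.

```python
import math

import numpy as np


def mixture_cf(t, weights, means, stds):
    """Characteristic function of a Gaussian mixture, evaluated at an array of t values."""
    t = np.asarray(t, dtype=float)[:, None]
    return np.sum(weights * np.exp(1j * means * t - 0.5 * (stds * t) ** 2), axis=1)


def cdf_from_cf(x, cf, t_max=60.0, n_points=60_000):
    """P(W <= x) by Gil-Pelaez inversion, using a vectorized midpoint rule on (0, t_max).

    Midpoints avoid the removable singularity at t = 0. t_max must be large
    enough that |cf(t)| is negligible beyond it.
    """
    dt = t_max / n_points
    t = (np.arange(n_points) + 0.5) * dt
    integrand = np.imag(np.exp(-1j * t * x) * cf(t)) / t
    return 0.5 - np.sum(integrand) * dt / math.pi


def mixture_cdf(x, weights, means, stds):
    """Closed-form mixture CDF, used here only as a reference value."""
    return sum(
        w * 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
        for w, m, s in zip(weights, means, stds)
    )


# Hypothetical bimodal disturbance W and constraint bound b.
weights = np.array([0.3, 0.7])
means = np.array([-1.0, 2.0])
stds = np.array([0.5, 1.0])
b = 4.0

prob = cdf_from_cf(b, lambda t: mixture_cf(t, weights, means, stds))
print(f"P(W <= {b}) by inversion: {prob:.6f}")
print(f"closed form:              {mixture_cdf(b, weights, means, stds):.6f}")
print("chance constraint P(W <= b) >= 0.95 satisfied:", prob >= 0.95)
```

A Gaussian mixture is used because its CDF is available in closed form for comparison; the inversion routine only needs the characteristic function, so it would apply equally to distributions without a closed-form CDF.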
Advanced PID architectures for tracking changing active constraints
Advanced regulatory control (ARC), also known as advanced PID architectures, is a simple and robust way of controlling processes with changing and possibly conflicting constraints, for which it was previously believed, at least in academia, that model-based solutions such as MPC were the only effective option. To illustrate this, ARC is applied in two case studies. The first is a gas-liquid separation process, in which selectors and split-parallel control are combined to achieve bidirectional inventory control in which the throughput manipulator moves automatically to the optimal position. The second case study is on keeping acceptable air quality (CO2-level) and temperature in a room (in this case, a barn for cows). The CO2 and temperature constraints can be conflicting, leading to a hierarchical switching network of PID controllers. Note: this is an extended version (with simulations) of a paper at the IFAC World Congress, August 2026, Korea.
Electric Vehicle Charging Profile Forecasting Using Hybrid Models
Electric Vehicle (EV) fast charging stations require forecasting techniques both at the single charger level and at the aggregated level. While several models exist for the latter, forecasting individual EV charging profiles is still underexplored in the literature. However, such methods could be used by battery-aware scheduling, leading to more granular updates of the charging station's aggregated forecast and more accurate estimates of EV departure times. Nonetheless, the variable extent of available information in time and in different settings could jeopardize these benefits. For this reason, we propose a hybrid and lightweight method to estimate the EV charging profile before and during the charging process. Besides evaluating this method on multiple EVs from a public dataset, we also assess the impact of different levels of information on the time transposition of the charging profile.
REACT: Environment-Adaptive Architecture for Continuous Formation Navigation of Wheeled Mobile Robots
Formation control of wheeled mobile robots (WMRs) has been extensively studied due to its broad applications in fields such as logistics transportation, environmental monitoring, and search and rescue. However, most existing works mainly focus on tracking predefined formations, which limits their adaptability to complex real-world environments. To address this, we propose REACT (Real-time Environment-Adaptive architecture for Continuous formation navigaTion), a hierarchical architecture integrating centralized formation generation and distributed formation maintenance. Specifically, our upper layer generates new environment-adaptive formations when necessary and uses our proposed TCF-R2T (Trajectory-Conflict-Free Robot-to-Target assignment) algorithm to compute conflict-free WMR-to-target assignments in polynomial time, enabling timely formation transitions without trajectory conflicts. At the lower layer, each WMR executes our developed JSTP (Joint Spatio-Temporal trajectory Planning) method to maintain the generated formation by simultaneously optimizing spatial positions and temporal durations, thereby enhancing coordination among WMRs and enabling continuous navigation in obstacle-rich environments and dynamic-obstacle scenarios. Both simulation and real-world experiments validate the effectiveness and practical applicability of REACT. Experimental videos are available on our project website: https://dongjh20.github.io/REACT-website.
Learning the dynamics of nonlinear systems with regional stability guarantees through linear matrix inequality constraints
This paper presents a method that learns a regionally stable recurrent neural network model from a set of input-output data generated by an unknown dynamical system. Relying on generalized sector conditions on the deadzone activation function, we first derive sufficient conditions that guarantee forward invariance on a compact set of the state space for any inputs from a given set. Such regional properties lead to less conservative conditions compared to variants that offer a global form of stability, and are in line with the system data that is only observed regionally. Our learning method derives conditions for regional stability using a barrier function approach, leading to models equipped with a certificate of regional stability in a subset of the state space and for a given input set. We illustrate our theoretical result with a numerical example and compare it to methods that impose a global form of stability, which fail to identify the system, and with a method that imposes no stability constraints at all, which does not guarantee a stable behavior within any state or input set.
comment: This work has been accepted to IFAC for publication under a Creative Commons Licence CC-BY-NC-ND
Multi-Criteria Integer Programming Model for Route Planning in an Off-Road Combat Environment
Route planning for military vehicles is a complex decision-making problem due to the simultaneous influence of environmental trafficability and tactical risks. This paper presents an optimization model that integrates soil trafficability and risk of enemy engagement into a decision-support model for planning activities in open terrain. Although a military application is the focus of this paper, other use cases include wildfire response, agricultural operations, and off-road vehicle recreation. The routing problem is formulated as a minimum cost mixed-integer linear program over a discretized representation of the operational environment. Each node represents a location and is connected by arcs to adjacent nodes whose traversal incurs a cost derived from a composite risk function that accounts for soil strength and the proximity to known enemy activity and prior convoy routes. Environmental inputs required for evaluating soil strength are obtained by integrating external models, which estimate spatial variations in the rating cone index (RCI) across the terrain. The model is evaluated through a case study conducted at a location in northern Colorado using fine-resolution environmental data and simulated tactical conditions. Scenario analyses demonstrate how variations in risk weighting, vehicle mobility characteristics, and operational conditions influence route geometry and mission risk. The objective function values achieved varied by five orders of magnitude based on the coefficients assigned to the terms in the cost function and the vehicle properties of the scenario. The results illustrate the capability of the proposed framework to quantify trade-offs between environmental mobility constraints and tactical considerations.
Continuous Aggregative LQG Games with Delayed Discrete Observations
Mean field game equilibria are predicated on the assumption of immediate pairwise interactions within a population of homogeneous agents with asymptotically vanishing influence as population size increases. However, in many real-world cases, agents receive population-level information with a delay. In this paper, we characterize agent best responses under an information exchange structure whereby agents observe the empirical mean state only at discrete time instants with some delay. Sufficient conditions are presented for the existence of a Nash equilibrium within a finite population of agents, and the cost increase due to delayed discrete empirical mean observations relative to zero-latency discrete observations and continuous global-state observations is also evaluated.
comment: 11 pages, 3 figures. Submitted to Automatica
Reachability-Augmented Dual Dynamic Programming for Optimal Path Parameterization
Optimal path parameterization (OPP) is a fundamental problem for planning trajectories along a prescribed geometric path under kinodynamic constraints and task-dependent objectives. While time-optimal path parameterization (TOPP) minimizes traversal time, its saturating states and controls may induce vibration and tracking errors, which can be mitigated by introducing smoothness objectives. However, a key capability gap remains in OPP: feasibility guarantees, general-objective optimality certificates, and computational efficiency are difficult to achieve simultaneously in a unified framework, especially for third-order OPP (OPP3) with non-convex constraints. This paper proposes reachability-augmented dual dynamic programming (RDDP), a state-grid-free and objective-aware DP framework for OPP. The key idea is to replace the relatively complete recourse assumption used in classical dual DP (DDP) with OPP-specific backward reachable sets, and then generate both value-function cuts and trial trajectories only inside these reachable sets. For convex and non-convex OPP, we prove global optimality and Karush-Kuhn-Tucker convergence of RDDP under OPP-specific conditions, respectively. Efficient instantiations are developed for OPP2 and OPP3. Experiments show that RDDP achieves objective values comparable to convex-optimization baselines while reducing computation time by 28.6 times for OPP2 and 5.8 times for OPP3. RDDP also achieves faster convergence than grid-based DP. Compared with reachability-analysis methods, RDDP retains the reachability mechanism while replacing local maximum-control propagation with value-function-guided control selection, thereby enabling objectives beyond traversal time. In summary, RDDP addresses a key capability gap in OPP by unifying certifiable general-objective optimization, reachability-based feasibility preservation, and online-compatible low-dimensional DP computation in a single OPP framework.
Dynamic Gradient-Based Calibration for Robust and Accurate Traffic Macrosimulation
Robust and accurate calibration of macroscopic traffic flow models such as METANET is critical for reliable prediction and effective control. While gradient-based methods are desirable for high-dimensional parameter spaces, their application to real-world traffic scenarios is hindered by highly nonconvex optimization landscapes. Consequently, standard static calibration frequently yields parameter sets that produce unstable, unrealistic traffic dynamics, undermining confidence in the estimated parameters and compromising the simulation's utility for counterfactual scenario testing. To address this, we propose a dynamic, rolling-horizon calibration framework. By reformulating static one-time estimation as a closed-loop control problem, parameters better maintain stability and accuracy in the presence of measurement noise. Using real-world data from the I-24 MOTION testbed, this work empirically characterizes the instability of standard methods. It then shows that the proposed approach simultaneously enhances robustness to perturbations and achieves a 48% improvement in predictive accuracy over conventional static calibration.
Probabilistic Recursively Feasible Motion Planning Under Uncertain Environments
Safe motion planning in uncertain, time-varying environments is challenging because the safe region can change unpredictably across planning steps, often causing a loss of recursive feasibility. In this work, we present a Probabilistic Recursively Feasible Model Predictive Control (PRF-MPC) framework that guarantees recursive feasibility with a specified probability. We introduce properties that an ideal predictor should satisfy to ensure distributional consistency, and use these properties to derive closed-form expressions for the means and covariances of trajectories predicted at future time steps. Building on this analysis, we construct safety constraints that ensure, with high probability, that the current safe set is contained within the safe sets at future time steps, thereby probabilistically guaranteeing recursive feasibility. Simulation results on a lane-change scenario demonstrate that the proposed method significantly improves recursive feasibility.
comment: 7 pages, 4 figures
Adversarial Stress Testing of SPARK Humanoid Safety Filters
Humanoid robots are difficult to deploy safely because they have high-dimensional bodies, many collision constraints, and must operate near people and obstacles. Safety filters help by modifying a nominal control action when it may violate collision-avoidance constraints. Still, nominal benchmark scores do not fully show how these filters behave in harder environments. In this work, we study the robustness of SPARK humanoid safety filters through replication and stress testing. We replicate the SPARK benchmark case G1SportMode_D1_WG_SO_v1 in MuJoCo and evaluate RSSA, RSSS, SSA, CBF, PFM, and SMA under controlled random seeds. We also build a post-processing pipeline that converts raw SPARK logs into goal-tracking, minimum-distance, and collision-step metrics. Our results show that some methods track the goal more closely, while others reduce collision steps more effectively. The stress tests further indicate that safety behavior can change under obstacle crowding, noisy distance estimates, and delayed obstacle information. These findings suggest that humanoid autonomy should be evaluated beyond nominal performance, using metrics that expose failure modes before deployment.
comment: 5 pages, 7 figures, 1 table. Code available at https://github.com/ghoshsaurav/spark-adversarial-safety
Scalable iterative Gramian synthesis for control-affine systems
This article presents a scalable implementation of nonlinear Gramian-based control synthesis for control-affine systems, including a minimum energy control construction. These synthesis advances are achieved by addressing key computational bottlenecks inherent to iterative synthesis map formulations, yielding a computational scheme that exhibits rapid convergence and high precision. The efficacy of this synthesis framework is demonstrated across five canonical nonlinear control systems and 100-dimensional recurrent neural network models, including underactuated systems. Empirical scaling results further indicate that convergence is primarily governed by intrinsic system properties, such as nonlinearity and controllability, rather than by state-space dimensionality. This work provides a practical, scalable computational pathway for translating rigorous nonlinear synthesis theory into high-dimensional control applications.
The Fragility of Learning LQG Controllers
Learning methods are increasingly used to synthesize controllers from data, yet existing sample-complexity characterizations for continuous control are sharp only in the fully observed setting. This paper studies the partially observed case by deriving information-theoretic lower bounds for learning Linear Quadratic Gaussian (LQG) controllers from offline trajectories generated by a (linear) exploration policy. We prove an $\varepsilon$-local minimax excess-cost lower bound that applies to any algorithm mapping the offline dataset to a stabilizing linear controller. The bound is expressed in terms of the Hessian of the LQG cost with respect to model parameters and the inverse Fisher Information induced by the exploration policy. We further provide system-theoretic characterizations of these objects, enabling transparent construction of hard instances. Instantiating the bound on classical fragile robust-control examples, including variants of the Doyle LQG fragility counterexample and non-minimum-phase systems, demonstrates when fragile robust control problems translate into high sample complexity for learning-enabled control. These results suggest the asymptotic optimality of certainty-equivalent synthesis and motivate the importance of both task-directed experiment design and system co-design for sample-efficient learning in partially observed control.
Numerically Reliable Brunovsky Transformations
The Brunovsky canonical form provides sparse structural representations that are beneficial for computational optimal control, yet existing methods fail to compute it reliably. We propose a technique that produces Brunovsky transformations with substantially lower construction errors and improved conditioning. A controllable linear system is first reduced to the staircase form via an orthogonal similarity transformation. We then derive a simple linear parametrization of the transformations yielding the unique Brunovsky form. Numerical stability is further enhanced by applying a deadbeat gain before computing system matrix powers and by optimizing the linear parameters to minimize condition numbers.
comment: Accepted by the IFAC World Congress 2026 as a regular paper. Compared with the official final version (six pages), this version has more remarks and examples
Learning-based data-enabled moving horizon estimation with application to membrane-based biological wastewater treatment process
In this paper, we propose a data-enabled moving horizon estimation (MHE) approach for a class of nonlinear systems without explicit modeling, by leveraging Koopman operator theory and Willems' fundamental lemma. Specifically, the nonlinear system is lifted to a linear parameter-varying Koopman surrogate, in which the lifting functions and scheduling mappings are learned directly from data using neural networks. Willems' fundamental lemma is then employed to construct a trajectory-based representation of the Koopman surrogate, which bypasses the explicit identification of the matrices of the Koopman surrogate. Based on this representation, we formulate a convex data-enabled MHE design, which provides real-time estimates of the Koopman surrogate states, from which the states of the original nonlinear system are reconstructed. Sufficient conditions are derived to ensure the stability of the estimation error. The effectiveness of the proposed method is illustrated using a simulated membrane-based biological wastewater treatment process.
Solution Sets for Inverse Infinite-Horizon Linear-Quadratic Descriptor Differential Games
In this letter, we study a model-based inverse problem for infinite-horizon linear-quadratic differential games with descriptor dynamics. Given an observed feedback strategy profile, we seek to identify all cost functions that rationalize it as a feedback Nash equilibrium; this collection is referred to as the solution set. We characterize the solution set, show that it is rectangular and convex, and provide an algorithm for computing an admissible realization whenever it is nonempty. We also show that descriptor dynamics modify the geometry of the solution set and may reduce identifiability. Finally, we illustrate the results with numerical examples.
Guided Reinforcement Learning for Omnidirectional 3D Jumping in Quadruped Robots
Jumping poses a significant challenge for quadruped robots, despite being crucial for many operational scenarios. While optimisation methods exist for controlling such motions, they are often time-consuming and demand extensive knowledge of robot and terrain parameters, making them less robust in real-world scenarios. Reinforcement learning (RL) is emerging as a viable alternative, yet conventional end-to-end approaches lack efficiency in terms of sample complexity, requiring extensive training in simulations, and predictability of the final motion, which makes it difficult to certify the safety of the final motion. To overcome these limitations, this paper introduces a novel guided reinforcement learning approach that leverages physical intuition for efficient and explainable jumping, by combining Bézier curves with a Uniformly Accelerated Rectilinear Motion (UARM) model. Extensive simulation and experimental results clearly demonstrate the advantages of our approach over existing alternatives.
Online Learning-Based Control with Guaranteed Error Bounds for a Class of Nonlinear Systems
In this paper, we present a learning-based control for a class of nonlinear systems that guarantees exponential stability as well as bounded output errors. The control is based on the Gaussian Process Submodel Online Learning (GPSOL) algorithm and the Disturbance Error Rate Limiting (DERL) algorithm, both of which were developed in previous work. The GPSOL algorithm provides a method to learn Gaussian Process (GP) models for subsystems online, whereas the DERL algorithm limits the rate of the prediction error of these GP models. The focus of this paper is the utilization of the GP model within an adaptive controller and the derivation of corresponding stability conditions and system peak-to-peak gains by means of linear matrix inequalities (LMIs). These peak-to-peak gains are then used to prescribe a desired prediction error rate for the DERL algorithm to achieve user-defined output error bounds. The gains and the related bounds were successfully verified using a simulation model. Furthermore, results from a successful experimental validation of the bounds and the overall control structure on a pneumatic test rig are presented. While the control scheme and error bounds proposed in this paper are limited to first-order single-input-single-output systems, an extension to certain classes of higher-order and multiple-input-multiple-output systems is expected to be forthcoming.
comment: Accepted at IFAC 2026 (23rd IFAC World Congress, Busan, Korea)
Data-Driven Safety Certificates of Infinite Networks with Unknown Models and Interconnection Topologies
Infinite networks are complex interconnected systems comprising a countably infinite number of subsystems, for which no fixed upper bound on the number of participating subsystems is specified a priori since it may vary over time as agents join or leave (e.g., vehicles in traffic). In such scenarios, the presence of infinitely many subsystems within the network renders the existing analysis frameworks tailored for finite networks inapplicable to infinite ones. This paper is concerned with offering a data-driven approach, within a compositional framework, for the safety certification of infinite networks with both unknown mathematical models and unknown interconnection topologies. Given the immense computational complexity stemming from the extensive dimension of infinite networks, our approach capitalizes on the joint dissipativity-type properties of subsystems, characterized by storage certificates. We introduce innovative compositional data-driven conditions to construct a barrier certificate for the infinite network leveraging storage certificates of its unknown subsystems derived from data, while offering correctness guarantees for network safety. We demonstrate that our compositional data-driven reasoning eliminates the requirement for checking the traditional dissipativity condition, which typically mandates precise knowledge of the interconnection topology. We illustrate our data-driven results on two physical infinite networks with unknown models and interconnection topologies.
The Potential Welfare Gains from Curtailment Trading Under Non-Firm Interconnection
Rapid growth of large loads led by data centers is straining grid capacity. These loads increasingly accept curtailment risk through non-firm interconnection agreements to gain faster grid access, expanding the pool of consumers subject to mandatory disconnection during supply shortfalls. Yet, blunt rules assign curtailment without reference to the wide variation in the value consumers place on avoiding curtailment, often captured by the value of lost load (VOLL). This paper introduces the network-constrained Curtailment Credit Market (CCM), a mechanism in which agents submit bids that determine bilateral credit flows, subject to transmission network constraints. We prove that the bilateral credit flow representation can reach every curtailment allocation available to an omniscient central planner (feasible-set equivalence), so the bilateral flow structure introduces no loss of allocative capability. Under truthful bidding, the CCM achieves the planner's total value of served load, matching the planner's allocative benchmark when bids reflect true interruption costs. The CCM is formulated as a bilevel clearing problem that admits an exact single-level mixed-integer linear program (MILP), solved in 0.009 to 0.034 seconds. Numerical experiments on three test systems validate the mechanism at increasing scale and complexity: a 3-bus toy network that isolates the core trading logic, the IEEE 24-bus reliability test system as a standard benchmark, and a reduced New York (NY) grid that captures coordination across NY load zones. Our simulations show that the CCM increases the total value of served load by 1.24 to 1.83 times relative to pro-rata curtailment. On the three test systems examined here, no participant is worse off under incentive-compatible benchmark payments than under the administrative baseline.
Encirclement Guaranteed Finite-Time Capture against Unknown Evader Strategies
We consider a pursuit-evasion scenario involving a group of pursuers and a single evader in a two-dimensional unbounded environment. The pursuers aim to capture the evader in finite time while ensuring the evader remains enclosed within the convex hull of their positions until capture, without knowledge of the evader's heading angle. Prior works have addressed the problem of encirclement and capture separately in different contexts. In this paper, we present a class of strategies for the pursuers that guarantee capture in finite time while maintaining encirclement, irrespective of the evader's strategy. Furthermore, we derive an upper bound on the time to capture. Numerical results highlight the effectiveness of the proposed framework against a range of evader strategies.
Characterizing all locally exponentially stabilizing controllers as a linear feedback plus learnable nonlinear Youla dynamics
We derive a state-space characterization of all dynamic state-feedback controllers that make an equilibrium of a nonlinear input-affine continuous-time system locally exponentially stable. Specifically, any controller obtained as the sum of a linear state-feedback $u=Kx$, with $K$ stabilizing the linearized system, and the output of internal locally exponentially stable controller dynamics is itself locally exponentially stabilizing. Conversely, every dynamic state-feedback controller that locally exponentially stabilizes the equilibrium admits such a decomposition. The result can be viewed as a state-space nonlinear Youla-type parametrization specialized to local, rather than global, and exponential, rather than asymptotic, closed-loop stability. The residual locally exponentially stable controller dynamics can be implemented with stable recurrent neural networks and trained as neural ODEs to achieve high closed-loop performance in nonlinear control tasks.
Neural Network-based Co-design of Output-Feedback Control Barrier Function and Observer with Input Constraints
Control Barrier Functions (CBFs) provide a powerful framework for ensuring safety in dynamical systems. However, their application typically relies on full state information, an assumption that is often violated in real-world settings, where only partial state information is available. In this work, we propose a neural network-based framework for the co-design of a safety controller, observer, and CBF for partially observed continuous-time systems with input constraints. By formulating barrier conditions over an augmented state space, our approach ensures safety without requiring bounded estimation errors or handcrafted barrier functions. All components are jointly trained by formulating appropriate loss functions, and we introduce a validity condition to provide formal safety guarantees beyond the training data. Finally, we demonstrate the effectiveness of the proposed approach through several case studies.
Object Tracking Incorporating Transfer Learning into Unscented and Cubature Kalman Filters
We present a novel filtering algorithm that employs Bayesian transfer learning to address the challenges posed by mismatched intensity of the noise in a pair of sensors, each of which tracks an object using a nonlinear dynamic system model. In this setting, the primary sensor experiences a higher noise intensity in tracking the object than the source sensor. To improve the estimation accuracy of the primary sensor, we propose a framework that integrates Bayesian transfer learning into an Unscented Kalman Filter (UKF) and a Cubature Kalman Filter (CKF). In this approach, the parameters of the predicted observations in the source sensor are transferred to the primary sensor and used as an additional prior in the filtering process. Our simulation results show that the transfer learning approach significantly outperforms the conventional isolated UKF and CKF. Comparisons to a form of measurement vector fusion are also presented.
comment: 22 pages, 7 figures, 2 tables
PLATO Hand: Shaping Contact Behavior with Fingernails for Precise Manipulation
We present the PLATO Hand, a dexterous robotic hand with a hybrid fingertip that combines a rigid fingernail, embedded distal phalanx, and compliant pulp to shape contact behavior during manipulation. By mechanically organizing how contact is initiated, supported, and transmitted at the fingertip, this structure creates stable and task-relevant contact conditions across diverse object geometries and grasp orientations. We develop a strain-energy-based bending-indentation model to guide the fingertip design and to explain how material stiffness and contact geometry govern deformation partitioning within the fingertip. Experiments show improved pinch stability, improved fingernail-mediated dorsal-contact force transmission and proprioceptive observability, and successful execution of edge-sensitive manipulation tasks, including paper singulation, card picking, and orange peeling. These results show that coupling a mechanically structured contact interface with a force-motion-transparent finger mechanism provides a principled approach to precise manipulation. Our project page is at: https://platohand.github.io
Multi-Source Human-in-the-Loop Digital Twin Testbed for Connected and Autonomous Vehicles in Mixed Traffic Flow
In emerging mixed traffic environments, Connected and Autonomous Vehicles (CAVs) have to interact with surrounding human-driven vehicles (HDVs). This paper introduces MSH-MCCT (Multi-Source Human-in-the-Loop Mixed Cloud Control Testbed), a novel CAV testbed that captures complex interactions between various CAVs and HDVs. Utilizing the Mixed Digital Twin concept, which combines Mixed Reality with Digital Twin, MSH-MCCT integrates physical, virtual, and mixed platforms, along with multi-source control inputs. Bridged by the mixed platform, MSH-MCCT allows human drivers and CAV algorithms to operate both physical and virtual vehicles within multiple fields of view. In particular, this testbed facilitates the coexistence and real-time interaction of physical and virtual CAVs and HDVs, significantly enhancing experimental flexibility and scalability. Experiments on vehicle platooning in mixed traffic showcase the potential of MSH-MCCT to conduct CAV testing with multi-source real human drivers in the loop through driving simulators of diverse fidelity. The videos for the experiments are available at our project website: https://dongjh20.github.io/MSH-MCCT.
Ro-To-Go! Robust Reactive Control with Signal Temporal Logic
Signal Temporal Logic (STL) robustness is a common objective for optimal robot control, but its dependence on history limits the robot's decision-making capabilities when used in Model Predictive Control (MPC) approaches. In this work, we introduce Signal Temporal Logic robustness-to-go (Ro-To-Go), a new quantitative semantics for the logic that isolates the contributions of suffix trajectories. We prove its relationship to formula progression for Metric Temporal Logic, and show that the robustness-to-go depends only on the suffix trajectory and progressed formula. We implement robustness-to-go as the objective in an MPC algorithm and use formula progression to efficiently evaluate it online. We test the algorithm in simulation and compare it to MPC using traditional STL robustness. Our experiments show that using robustness-to-go results in a higher success rate.
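To make the history dependence mentioned above concrete, here is a minimal sketch of traditional STL robustness for two simple formulas over a scalar signal: "always x > c" scores the worst margin and "eventually x > c" scores the best margin along the whole trajectory. It shows only the standard semantics; the robustness-to-go semantics is defined in the paper and is not reproduced here. The function names and the toy trajectories are illustrative assumptions.

```python
def rob_always(signal, threshold):
    """Traditional robustness of G (x > threshold): worst margin over the trajectory."""
    return min(x - threshold for x in signal)


def rob_eventually(signal, threshold):
    """Traditional robustness of F (x > threshold): best margin over the trajectory."""
    return max(x - threshold for x in signal)


history = [0.0, 3.0, 1.0]   # already-executed part of the trajectory
suffix_a = [0.5, 0.2]       # two candidate future plans
suffix_b = [2.0, 2.5]

# The past peak of 3.0 dominates the max, so both plans receive the same score
# for F (x > 1) even though their futures differ.
score_a = rob_eventually(history + suffix_a, 1.0)
score_b = rob_eventually(history + suffix_b, 1.0)
```

Because both candidate suffixes score identically, an MPC objective built on the full-trajectory value cannot distinguish between them, which is the limitation the abstract attributes to the dependence on history.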
Geometry-Aware Decentralized Sinkhorn for Wasserstein Barycenters
Distributed systems require fusing heterogeneous local probability distributions into a global summary over sparse and unreliable communication networks. Traditional consensus algorithms, which average distributions in Euclidean space, ignore their inherent geometric structure, leading to misleading results. Wasserstein barycenters offer a geometry-aware alternative by minimizing optimal transport costs, but their entropic approximations via the Sinkhorn algorithm typically require centralized coordination. This paper proposes a fully decentralized Sinkhorn algorithm that reformulates the centralized geometric mean as an arithmetic average in the log-domain, enabling approximation through local gossip protocols. Agents exchange log-messages with neighbors, interleaving consensus phases with local updates to mimic centralized iterations without a coordinator. To optimize bandwidth, we integrate event-triggered transmissions and b-bit quantization, providing tunable trade-offs between accuracy and communication while accommodating asynchrony and packet loss. Under mild assumptions, we prove convergence to a neighborhood of the centralized entropic barycenter, with bias linearly dependent on consensus tolerance, trigger threshold, and quantization error. Complexity scales near-linearly with network size. Simulations confirm near-centralized accuracy with significantly fewer messages, across various topologies and conditions.
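The log-domain reformulation can be illustrated with a small sketch: the geometric mean of positive values equals the exponential of the arithmetic mean of their logarithms, so a plain averaging consensus on log-values recovers it without a coordinator. The code below is not the authors' algorithm; it assumes a hypothetical ring of agents with synchronous neighbor averaging, and the function name `gossip_geometric_mean`, the topology, and the weights are illustrative choices. Event triggering, quantization, and the Sinkhorn updates themselves are omitted.

```python
import math


def gossip_geometric_mean(values, rounds=200):
    """Approximate the geometric mean of positive values by averaging logs on a ring.

    Each agent holds log(value) and repeatedly averages with its two ring
    neighbors (weight 1/3 each, a doubly stochastic choice), so every local
    state converges to the mean of the logs.
    """
    n = len(values)
    state = [math.log(v) for v in values]
    for _ in range(rounds):
        state = [
            (state[(i - 1) % n] + state[i] + state[(i + 1) % n]) / 3.0
            for i in range(n)
        ]
    # Exponentiate each agent's local estimate of the mean log.
    return [math.exp(s) for s in state]


values = [1.0, 2.0, 4.0, 8.0, 16.0]
estimates = gossip_geometric_mean(values)
exact = math.exp(sum(math.log(v) for v in values) / len(values))
```

Stopping the consensus phase early leaves a residual disagreement between agents, which corresponds to the consensus-tolerance term in the bias described in the abstract.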
Distributionally Robust Safety Under Arbitrary Uncertainties: A Safety Filtering Approach
In this work, we study how to ensure probabilistic safety for nonlinear systems under distributional ambiguity. Our approach builds on a backup-based safety filtering framework that switches between a high-performance nominal policy and a certified backup policy to ensure safety. To handle arbitrary uncertainties from ambiguous distributions, i.e., where the distribution is not of specific structure and the true distribution is unknown, we adopt a distributionally robust (DR) formulation using Wasserstein ambiguity sets. Rather than solving a high-dimensional DR trajectory optimization problem online, we exploit the structure of backup-based safety filtering to reduce safety certification to a one-dimensional search over the switching time between nominal and backup policies. We then develop a sampling-based certification procedure with finite-sample guarantees, where empirical failure probabilities are compared against a Wasserstein-inflated threshold. We validate our method through simulations across three systems, from a Dubins vehicle to a high-speed racing car and a fighter jet, demonstrating the broad applicability and computational efficiency.
comment: 10 pages, 4 figures, submitted to IEEE Robotics and Automation Letters (RA-L); Project Page: https://dcherenson.github.io/drs-gk
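The reduction to a one-dimensional search over the switching time can be sketched on a toy problem. Everything below is a hypothetical stand-in: a 1D vehicle drives toward a wall under the nominal policy and stops under the backup policy, and a candidate switching step is accepted when the empirical failure rate over sampled disturbances does not exceed `adjusted_threshold`. That parameter stands in for the Wasserstein-based threshold, whose exact form is given in the paper and is not reproduced here. The search scans from the latest step backward rather than bisecting, since no monotonicity is assumed.

```python
import random


def failure_rate(switch_step, horizon, wall, n_samples, rng):
    """Empirical probability that the toy vehicle crosses the wall."""
    failures = 0
    for _ in range(n_samples):
        x = 0.0
        for t in range(horizon):
            if t < switch_step:
                x += 1.0 + rng.uniform(-0.2, 0.2)  # nominal: drive forward
            # backup: the vehicle stays put, so x is unchanged
            if x > wall:
                failures += 1
                break
    return failures / n_samples


def latest_safe_switch(horizon, wall, adjusted_threshold, n_samples=500, seed=0):
    """Scan switching steps from latest to earliest; return the first accepted one."""
    rng = random.Random(seed)
    for switch_step in range(horizon, -1, -1):
        rate = failure_rate(switch_step, horizon, wall, n_samples, rng)
        if rate <= adjusted_threshold:
            return switch_step
    return None


best_switch = latest_safe_switch(horizon=12, wall=10.0, adjusted_threshold=0.03)
```

In this toy setup the search settles on the latest step at which handing over to the backup policy still keeps the sampled failure rate under the threshold, so the nominal policy runs for as long as the check allows.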
Solving Reach- and Stabilize-Avoid Problems Using Discounted Reachability
In this article, we consider the infinite-horizon reach-avoid (RA) and stabilize-avoid (SA) zero-sum game problems for general nonlinear continuous-time systems, where the goal is to find the set of states that can be controlled to reach or stabilize to a target set, without violating constraints even under the worst-case disturbance. Based on the Hamilton-Jacobi reachability method, we address the RA problem by designing a new Lipschitz continuous RA value function, whose zero sublevel set exactly characterizes the RA set. We establish that the associated Bellman backup operator is contractive and that the RA value function is the unique viscosity solution of a Hamilton-Jacobi variational inequality. Finally, we develop a two-step framework for the SA problem by integrating our RA strategies with a recently proposed Robust Control Lyapunov-Value Function, thereby ensuring both target reachability and long-term stability. We numerically verify our RA and SA frameworks on a 3D Dubins car system to demonstrate the efficacy of the proposed approach.
comment: 16 pages, 6 figures, 1 table. Accepted to IEEE Transactions on Automatic Control
Data-driven Acceleration of MPC with Guarantees
Model Predictive Control (MPC) is a powerful framework for optimal control but can be too slow for low-latency applications. We present a data-driven framework to accelerate MPC by replacing online optimization with a nonparametric policy constructed from offline MPC solutions. Our policy is greedy with respect to a constructed upper bound on the optimal cost-to-go, and can be implemented as a nonparametric lookup rule that is orders of magnitude faster than solving MPC online. Our analysis shows that under sufficient coverage conditions of the offline data, the policy is recursively feasible and admits a provable, bounded optimality gap. These conditions establish an explicit trade-off between the amount of data collected and the tightness of the bounds. New solutions can be incorporated straightforwardly without the need for retraining, enabling continual improvement. Our experiments show that this policy is between 100 and 1000 times faster than standard MPC with only a modest loss of optimality, showing potential for real-time control tasks.
Bounds on Deep Neural Network Partial Derivatives with Respect to Parameters
Deep neural networks (DNNs) have emerged as a powerful tool with a growing body of literature exploring Lyapunov-based approaches for real-time system identification and control. These methods depend on establishing bounds for the second partial derivatives of DNNs with respect to their parameters, a requirement often assumed but rarely addressed explicitly. This paper provides rigorous mathematical formulations of polynomial bounds on both the first and second partial derivatives of DNNs with respect to their parameters. We present lemmas that characterize these bounds for fully-connected DNNs, while accommodating various classes of activation functions, including sigmoidal and ReLU-like functions. Our analysis yields closed-form expressions that enable precise stability guarantees for Lyapunov-based deep neural networks (Lb-DNNs). Furthermore, we extend our results to bound the higher-order terms in first-order Taylor approximations of DNNs, providing important tools for convergence analysis in gradient-based learning algorithms. The developed theoretical framework provides explicit, computable expressions for previously assumed bounds, thereby strengthening the mathematical foundation of neural network applications in safety-critical control systems.
comment: 11 pages
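To see why a bound on second partial derivatives also controls the remainder of a first-order Taylor approximation, consider the standard result below. It is a textbook statement, not the paper's lemma: the symbol $\Phi$ for a scalar network output and the assumption that $\Phi$ is twice continuously differentiable in the parameters $\theta$ along the segment from $\theta_0$ to $\theta$ are introduced here for illustration.

```latex
\Phi(\theta) = \Phi(\theta_0) + \nabla_\theta \Phi(\theta_0)^\top (\theta - \theta_0) + R(\theta, \theta_0),
\qquad
|R(\theta, \theta_0)| \le \frac{1}{2}
  \sup_{s \in [0,1]} \left\| \nabla_\theta^2 \Phi\big(\theta_0 + s(\theta - \theta_0)\big) \right\|
  \, \|\theta - \theta_0\|^2 .
```

Any explicit bound on the Hessian norm therefore gives an explicit bound on the higher-order term, which is quadratic in the parameter error.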
Structural Sign Herdability in Temporally Switching Networks with Fixed Topology
This paper investigates structural herdability in a special class of temporally switching networks with fixed topology. We show that when the underlying digraph remains unchanged across all snapshots, the network attains complete $\mathcal{SS}$ herdability even in the presence of signed or layer dilations, a condition not applicable to static networks. This reveals a fundamental structural advantage of temporal dynamics and highlights a novel mechanism through which switching can overcome classical obstructions to herdability. To validate these conclusions, we utilize a more relaxed form of sign matching within each snapshot of the temporal network. Furthermore, we show that when all snapshots share the same underlying topology, the temporally switching network achieves $\mathcal{SS}$ herdability within just two snapshots, which is fewer than the number required for structural controllability. Several examples are included to demonstrate these results.
Robotics
PRIME: Physically-consistent Robotic Inertial and Motion Estimation for Legged and Humanoid Robots
Humanoid and legged robots interact with the environment through intermittent contacts, making accurate motion estimation fundamentally dependent on reasoning about contact dynamics. However, standard sensing pipelines, whether based on onboard proprioception with Extended Kalman Filters (EKFs) or on external motion capture systems, recover only kinematics, while contact forces, contact timing, and inertial parameters remain unobserved. As a result, purely kinematic reconstructions often violate rigid-body dynamics, particularly during contact-rich motions. To enable accurate motion estimation from onboard kinematics in real-world deployment, we propose PRIME (Physically-consistent Robotic Inertial and Motion Estimation), a Maximum A Posteriori (MAP) formulation that refines measured kinematics and actuator commands into a dynamically consistent trajectory while jointly estimating frictional contact forces and physically consistent inertial parameters. Our approach incorporates differentiable contact dynamics with smoothed complementarity constraints and an Anitescu-style friction model, yielding a smooth optimization problem that remains tractable across versatile contact transitions. We evaluate PRIME on contact-rich locomotion with quadrupedal robots and the Unitree G1 humanoid, demonstrating improved trajectory consistency and accurate inertial parameter identification. Beyond improving state estimation and feedback control with calibrated inertial parameters, PRIME produces force- and contact-annotated motion reconstructions from real robots in deployment, which can be used to provide high-quality data for downstream learning applications, including large-scale behavior modeling and robot foundation models.
comment: Robotics: Science and Systems 2026
Mono-Hydra++: Real-Time Monocular Scene Graph Construction with Multi-Task Learning for 3D Indoor Mapping SP
Autonomous agile robots need more than metric geometry: they must understand objects, rooms, places, and spatial relations for search, inspection, exploration, and human-robot interaction. Conventional metric maps support localization and collision avoidance, but do not provide this semantic and relational structure. 3D scene graphs address this gap by connecting geometry with object-level and room-level understanding. Building such representations on agile platforms remains difficult because aerial and lightweight robots operate under strict payload, power, and compute limits, making RGB-D cameras and LiDAR sensors impractical for many onboard settings. We present Mono-Hydra++, a real-time monocular RGB plus IMU pipeline for indoor metric semantic mapping and hierarchical 3D scene graph construction. The system combines M2H-MX, a DINOv3-based multi-task model for depth and semantics, with a deep-feature visual-inertial odometry front end, sparse predicted depth constraints in the VIO-derived pose graph, semantic masking for dynamic regions, and pose-aware temporal alignment before volumetric fusion in the Mono-Hydra backend. On the Go-SLAM ScanNet evaluation subset, Mono-Hydra++ achieves 1.6% lower average trajectory error than the strongest RGB-D baseline in our comparison, while using only monocular RGB plus IMU input. On calibrated 7-Scenes, it improves average ATE by 29.8% over the strongest competing calibrated baseline. We further validate Mono-Hydra++ in a real ITC building deployment using RealSense RGB plus IMU and demonstrate embedded feasibility by deploying the ONNX/TensorRT FP16 M2H-MX-L perception model at 25.53 FPS on a Jetson Orin NX 16GB. These results show that Mono-Hydra++ can provide real-time metric semantic mapping and scene graph construction for resource-constrained robotic platforms without relying on active depth sensors.
comment: Submitted to ISPRS Journal of Photogrammetry and Remote Sensing. 50 pages, figures and tables included. Code: https://github.com/BavanthaU/mono-hydra-pp.git
From a Single Demonstration to a General Policy for Contact-Rich Manipulation
We present a Learning from Demonstration (LfD) framework that achieves one-shot generalization in multi-stage, contact-rich manipulation tasks. Central to our approach is the utilization of environmental constraints as the inductive bias. By representing a demonstration as a sequence of behaviors that exploit environmental constraints, the robot separates task-general structure -- the constraint types and their transitions -- from instance-specific details such as exact demonstration trajectories, poses, and local geometries. Our four-stage pipeline builds a complete policy on this representation: the robot first abstracts a single demonstration into environmental-constraint primitives, then disambiguates them through self-guided exploration, next assimilates targeted human corrections that handle out-of-distribution variations, and finally recovers the abstracted-away details online through compliant interaction. Because the resulting policy follows constraints rather than mimics trajectories, it generalizes across object poses, local geometries, and unmodeled contact dynamics. We validate our approach on seven real-world multi-stage contact-rich manipulation tasks and achieve over 90% success. These extensive experimental results establish environmental constraints as fundamental building blocks for efficient generalization in learning from demonstration.
comment: 21 pages, 22 figures, 7 tables
Motion-Uncertainty-Aware Next-Best-View Planning for Moving Object Reconstruction
Active 3D reconstruction of moving objects requires selecting informative viewpoints while accounting for object motion uncertainty during the decision-to-execution delay. Existing methods address only parts of this problem: next-best-view (NBV) planners for object reconstruction typically optimize surface coverage but assume static objects, while motion-aware active perception for moving targets accounts for target motion but prioritizes tracking or visibility over reconstruction coverage. This work presents a motion-uncertainty-aware NBV framework for reconstructing an unknown rigid object undergoing planar motion, using noisy planar position measurements of the object and depth observations from a mobile robot. The key idea is to evaluate each candidate viewpoint by its expected observation quality over plausible future object states induced by motion and measurement uncertainty, rather than at a single predicted object pose. To obtain this predictive belief, a fixed-lag Gaussian Process smoother estimates and predicts the object state from noisy position measurements. The resulting belief is used to generate candidate viewpoints around the predicted object location, filter them by reachability, and estimate their expected coverage-driven scores. Simulation and real-world experiments demonstrate improved reconstruction completeness over non-predictive NBV and prediction-only tracking methods, bridging coverage-driven active reconstruction and prediction-driven tracking.
comment: This paper has been accepted for publication at Robotics: Science and Systems (RSS) 2026
Visual Sculpting: Visually-Aligned Planning Representations for Long-Horizon Robot Clay Sculpting
Clay sculpting is a nuanced, artistic task involving dexterous manipulation with long-horizon planning to achieve high-level goals. As a robotics problem, we formulate clay sculpting as a shape-to-shape matching challenge. Prior deformable object manipulation work either requires retraining a policy per goal or relies on dynamics models that represent state as sparse point clouds, which fail to capture important clay features such as texture. We present a method for modeling the dynamics of deformable materials and planning for robotic sculpting in a representation that is visually-aligned, capturing lighting and texture features. With three different deformable materials and various end-effectors, we demonstrate that our dynamics model is comparable in performance to the state-of-the-art with the added benefit of being compatible with visual planning. Our actions are represented as parametrized pushes into clay with a single end-effector, which proved to be suitable for long-horizon (>100 actions) clay relief sculptures. Lastly, we show the benefits of planning in a visually-aligned representation and analyze why this representation is more challenging to plan in than 3D representations.
comment: 8 pages, 14 figures. Accepted for publication in IEEE Robotics and Automation Letters (RA-L)
Distributed 3D Leader-Follower Formation Control with Field-of-View Safety via Control Barrier Functions
This letter proposes a distributed 3D leader-follower formation (3D-LFF) control framework for multi-UAV systems that achieves formation tracking while enforcing perception safety constraints. Maintaining safe, vision-based 3D-LFF is challenging because onboard cameras impose strict Field-of-View (FOV) limitations, and demanding formation commands can drive the leader outside the follower's camera frustum, resulting in loss of visibility. To address this issue, we develop a perception-aware safe control architecture that guarantees visibility by construction. First, we derive a relative kinematic model in a line-of-sight coordinate representation and design a distributed 3D-LFF tracking controller using only locally available relative states. Next, we embed the nominal formation controller within a Control Barrier Function-based Quadratic Program (CBF-QP) safety filter that minimally modifies the commanded velocities to maintain the leader inside the follower's camera frustum while preserving formation tracking whenever feasible. Gazebo simulations and Crazyflie hardware experiments validate the proposed approach, demonstrating accurate formation tracking and effective FOV enforcement, including scenarios in which the nominal desired formation conflicts with visibility constraints.
comment: 9 pages
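The "minimally modifies the commanded velocities" behavior of a CBF-QP safety filter can be sketched for the simplest case of a single affine constraint, where the quadratic program has a closed-form solution. This is a generic illustration, not the paper's controller: the constraint form `a . u + b >= 0` is the standard CBF condition (with `a` playing the role of the Lie derivative of the barrier along the input directions and `b` collecting the drift and class-K terms), and the function name `cbf_filter` is hypothetical.

```python
from typing import List, Sequence


def cbf_filter(u_nom: Sequence[float], a: Sequence[float], b: float) -> List[float]:
    """Solve min ||u - u_nom||^2 subject to a . u + b >= 0 in closed form.

    If the nominal command already satisfies the constraint it is returned
    unchanged; otherwise it is projected onto the constraint boundary.
    """
    margin = sum(ai * ui for ai, ui in zip(a, u_nom)) + b
    norm_sq = sum(ai * ai for ai in a)
    if margin >= 0.0 or norm_sq == 0.0:
        # Already safe, or the constraint does not depend on u (in the latter
        # case with margin < 0 the QP is infeasible and no u can help).
        return list(u_nom)
    scale = margin / norm_sq
    return [ui - scale * ai for ui, ai in zip(u_nom, a)]
```

With several simultaneous constraints (for example one per frustum face) the closed form no longer applies and a QP solver would be needed; that case is outside this sketch.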
RoboFlow4D: A Lightweight Flow World Model Toward Real-Time Flow-Guided Robotic Manipulation
Planning and acting in 3D environments is a fundamental capability for robotic manipulation in the real world. Although prior work has explored predictive flow planners to guide 3D manipulation, existing approaches often rely on modular pipelines stacking multiple submodels, resulting in high computational overhead and limited real-time performance. To address these challenges, we introduce RoboFlow4D, a lightweight flow world model that unifies perception and planning by estimating temporal motion in physical 3D space. As an end-to-end framework, RoboFlow4D directly predicts multi-frame 3D flows from visual observations and textual instructions, providing explicit flow-based planning to guide action generation. This design allows seamless integration with general action policies, forming an efficient observation-planning-execution closed loop. Through slow-fast collaboration between flow prediction and action control, RoboFlow4D enables real-time and resource-efficient manipulation. Extensive experiments in both simulation and real-world settings demonstrate that RoboFlow4D consistently improves manipulation success rates and computational efficiency, advancing flow-guided planning for embodied intelligence.
AffordVLA: Injecting Affordance Representations into Vision-Language-Action Models via Implicit Feature Alignment
Recent advances in Vision-Language-Action (VLA) models have shown strong potential for general-purpose robotic manipulation. However, the visual representations of most VLA models are often dominated by global object appearance and struggle to focus on task-relevant functional interaction regions, which limits their robustness in unstructured environments. Existing affordance-based methods typically rely on explicit mask injection or external perception modules, requiring additional annotations while introducing cascading perception errors and inference overhead. To address these limitations, we propose AffordVLA, an affordance-enhanced VLA framework that internalizes manipulation-centric affordance perception into VLA visual representations through implicit representation alignment. Specifically, we construct a zero-shot affordance teacher to extract task-conditioned affordance visual representations from RGB observations and language instructions. AffordVLA aligns the intermediate visual representations of the VLA with the affordance visual representations extracted by the teacher, thereby implicitly injecting manipulation-centric affordance perception into VLA visual representations and improving action accuracy. Extensive simulation and real-world experiments demonstrate that AffordVLA and its affordance teacher achieve state-of-the-art performance and outperform strong baselines. Ablation analyses show that AffordVLA effectively reshapes VLA visual representations while preserving inference efficiency, leading to improved manipulation success rates and training efficiency.
comment: 13 pages, 10 figures
DyGRO-VLA: Cross-Task Scaling of Vision-Language-Action Models via Dynamic Grouped Residual Optimization
Recent progress in Reinforcement Learning (RL) provides a principled approach to optimizing Vision-Language-Action (VLA) models, facilitating a shift from trajectory imitation to active learning in the task environment. Despite improvements in control precision, most RL optimizers remain task-specific, which reduces VLA models from generalist controllers to policies that overfit to a narrow set of tasks. In this study, we conduct an in-depth analysis of this phenomenon and highlight the importance of cross-task feature representations for improving the generalizability of VLA models. Motivated by this finding, we introduce DyGRO-VLA, a two-stage optimization framework that 1) effectively captures cross-task latent representations based on information-theoretic principles, and 2) dynamically refines policy optimization via a mixture-of-RL-residuals. DyGRO-VLA enables the RL optimizer to exploit task-relevant latent information while strategically mitigating adverse interference on the learned representations throughout the optimization process. We evaluate our approach on the LIBERO and RoboTwin2 benchmarks and further validate it in the real world, demonstrating consistent improvements over strong baselines under multi-task training and distribution shift.
Rapid Vibration Suppression and Trajectory Tracking of a Serial Manipulator with Multi-Flexible Links
Flexible robotic manipulators (FRMs) offer advantages in lightweight design and large workspace, but their structural flexibility induces vibrations, accelerates fatigue, degrades tracking performance, and limits operational speed. These challenges are further amplified in multi-link serial manipulators, where increased overall length leads to greater structural flexibility. This article presents a backstepping output-feedback framework for fast vibration suppression and tip tracking of an n-degree-of-freedom serial flexible manipulator robot (nDSFMR), with a DeepONet-based approximation for practical deployment. Each link-joint is modeled as a Timoshenko beam coupled with an ODE and transformed into a canonical hyperbolic PDE with boundary dynamics. A backstepping-based boundary controller at the joint is developed to equivalently inject distributed damping along the beam, enabling rapid vibration suppression and trajectory tracking using only the available boundary measurements. To enable real-time implementation and scalability, a DeepONet neural operator is introduced to approximate the backstepping kernels, significantly reducing computational cost and facilitating fast controller updates under varying operating conditions. Experiments on a two-link flexible manipulator demonstrate faster vibration suppression and convergence of the end-effector to the desired trajectory, compared with a linear quadratic regulator (LQR) with feedforward control.
MUSE: Multimodal Uncertainty Quantification of State Estimation
Accurate visual state estimation has been a central topic in robotics with a wide range of applications in robot navigation, autonomous driving, and autonomous flight. Recent advances in robot perception have led to significant improvements in the accuracy and robustness of state estimation, yet a fundamental challenge remains in how to quantify and calibrate its precision, i.e., how confident we are in an estimate and whether failures can be detected. This issue is particularly pronounced in visual-inertial odometry (VIO), where the heteroscedastic and multimodal nature of the problem makes uncertainty quantification especially difficult. This paper introduces MUSE (Multimodal Uncertainty Quantification of State Estimation), a novel real-time learning-based framework that leverages the strong and efficient sequential modeling capacity of Mamba to estimate localization uncertainty from multiple asynchronous sensor streams. Experiments on both public and in-house datasets demonstrate that MUSE achieves superior reliability and robustness compared to existing uncertainty quantification methods, and ablation studies justify the benefits of its key design choices.
comment: Code and dataset: https://github.com/hungdche/MUSE
Tactile-based Multimodal Fusion in Embodied Intelligence: A Survey of Vision, Language, and Contact-Driven Paradigms
Tactile sensing is a fundamental modality for embodied intelligence, offering unique and direct feedback on contact geometry, material properties, and interaction dynamics that remote sensors cannot replace. However, unimodal tactile perception is inherently limited by its sparse spatial coverage and lack of global semantic context. With the recent explosion in deep learning and large language models, integrating tactile sensing with vision and language has become essential to bridge physical interaction with semantic reasoning, leading to the emergence of Multimodal Tactile Fusion. Despite rapid progress, existing research remains fragmented across disparate datasets, sensing modalities, and tasks, lacking a unified theoretical framework. To address this gap, this paper provides a comprehensive survey of multimodal tactile fusion research up to the first quarter of 2026. We propose a hierarchical taxonomy that organizes the field into two primary dimensions: multimodal datasets and multimodal methods. On the data side, we categorize resources into Tactile-Vision datasets, Tactile-Language datasets, Tactile-Vision-Language datasets, and Tactile-Vision-Other datasets. On the method side, we structure prior work into three core pillars: (1) Multimodal Perception and Recognition, which focuses on object understanding and grasp prediction; (2) Cross-Modal Generation, which focuses on bidirectional translation between tactile, vision, and text; and (3) Multimodal Interaction, which emphasizes feedback control and language-guided manipulation. Furthermore, we summarize representative tactile sensing hardware, review commonly used evaluation metrics and benchmark settings, and discuss current challenges and promising future directions.
comment: 20 pages, 8 figures
Efficient Feature-Free Initialization for Monocular Visual-Inertial Systems Using a Feed-Forward 3D Model
Fast and reliable initialization is critical for monocular visual-inertial navigation systems (VINS), as it establishes the starting conditions for subsequent state estimation. Despite steady progress, most existing methods heavily rely on visual feature correspondences and require 3-4 seconds of sensory data for successful initialization, which limits their applicability and efficiency. With the advent of feed-forward 3D models that can directly predict point clouds from images, we revisit the visual-inertial initialization problem from a concise perspective. In this work, we propose a feature-free initialization framework that leverages up-to-scale point clouds predicted by a feed-forward 3D model, thereby obviating the need for visual feature tracking and estimation. This design substantially reduces system complexity and improves the reliability of initialization. Experiments on public datasets demonstrate that the proposed feature-free initialization method achieves the highest success rate, exceeding 90%, and significantly reduces the data duration required for successful initialization, typically to under 1.2 s. We further validate our method on a self-collected dataset covering various indoor and outdoor scenarios, demonstrating robust performance, particularly in visually degraded environments where existing methods often fail. The code and dataset are available at https://github.com/Yuantai-Z/FF-VIO-Init.
Beyond Geometry: Efficient Topologically-Grounded Navigation in Complex 3D Environments
Ground robot navigation in complex 3D environments is often hindered by geometric ambiguity, where non-traversable structures such as furniture share local geometric properties with navigable ground. Furthermore, the computational cost of searching massive voxel spaces remains a significant challenge. To address these issues, we present a surface extraction framework that constructs a reduced state space of physically reachable standing positions by enforcing ground support, overhead clearance, and seed-based connectivity constraints. Evaluation across five Matterport3D indoor scenes and three PCT benchmark scenes demonstrates over 80% state space reduction and sub-millisecond A* search on the Matterport3D scenes, with 100% planning success across all 300 tested queries.
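The three constraints named in the abstract (ground support, overhead clearance, seed-based connectivity) can be illustrated on a toy voxel grid. The sketch below is an assumption-laden illustration, not the paper's algorithm: voxels are integer triples, a standing position is the free voxel directly above an occupied one, and connectivity is a breadth-first flood fill with a maximum vertical step. The names `standable_cells`, `reachable_from`, `clearance`, and `max_step` are hypothetical.

```python
from collections import deque
from typing import Set, Tuple

Voxel = Tuple[int, int, int]  # integer (x, y, z) grid index


def standable_cells(occupied: Set[Voxel], clearance: int) -> Set[Voxel]:
    """Cells with an occupied voxel below and `clearance` free voxels from the cell upward."""
    cells = set()
    for (x, y, z) in occupied:
        if all((x, y, z + k) not in occupied for k in range(1, clearance + 1)):
            cells.add((x, y, z + 1))
    return cells


def reachable_from(seed: Voxel, cells: Set[Voxel], max_step: int = 1) -> Set[Voxel]:
    """Flood-fill the standable cells connected to `seed` via 4-neighbour moves."""
    if seed not in cells:
        return set()
    seen = {seed}
    queue = deque([seed])
    while queue:
        x, y, z = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            for dz in range(-max_step, max_step + 1):
                nxt = (x + dx, y + dy, z + dz)
                if nxt in cells and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen
```

In a three-cell corridor with a table top two voxels above the middle cell, the table top passes the local support and clearance tests but is dropped by the connectivity step, which is the kind of geometric ambiguity the abstract describes. A graph search such as A* would then run only over the returned set.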
HCLM: A Hierarchical Framework for Cooperative Loco-Manipulation with Dual Quadrupeds
We introduce HCLM, a hierarchical framework for general-purpose cooperative loco-manipulation with dual quadrupedal systems. Coordinating multi-robot collaborative manipulation across floating bases is highly challenging due to the conflicting demands of spatial coordination, robust locomotion, and closed-chain physical interactions. To resolve this, our architecture systematically decouples high-level collaborative reasoning from low-level robust motion execution. At the high level, a centralized Joint Diffusion Policy leverages an SE(3)-invariant task-space representation to learn coordinate-agnostic spatial coordination patterns. To translate these frame-agnostic references into physical motion, a task-centric hybrid Whole-Body Controller synergizes a proactive kinematic Model Predictive Control for collision-free velocity distribution with a reactive execution layer. Crucially, this reactive layer guarantees rapid responsiveness for precise end-effector tracking, while concurrently integrating active force regulation via a cooperative admittance scheme to safely resolve kinematic conflicts and strictly regulate internal stresses during closed-chain interactions. We validate the framework across progressively challenging simulated scenarios, including cooperative carrying, packing and handovers, and successfully deploy the latter in the real world. The results demonstrate reliable task execution, strict configuration agnosticism, and exceptional resilience against severe physical perturbations, offering a highly robust pathway for multi-robot embodied coordination.
Task Capability Improvement Algorithm for Collaborative Manipulators
This work introduces an algorithm that improves cooperative task capability by utilizing additional moments. The manipulators apply forces at the object's grasp points. Applying a force at a point other than the object's center of gravity produces a moment that is normally undesired. Here, this moment acts as an additional moment: it improves the capability of an individual manipulator and, hence, of the entire collaborative group. Any improvement in task capability contributes directly to the object and transportation capability. The group's enhanced capability also helps achieve optimal capability, optimal resource allocation, and maximum fault tolerance in object manipulation. Our simulation results show a 5.86% improvement in capability compared to when no moment is used to enhance the capability of the manipulators.
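The moment produced by a force applied away from the center of gravity follows from the standard rigid-body relation below. The symbols $r$ (vector from the center of gravity to the grasp point) and $F$ (applied force), and the numbers in the planar example, are chosen here for illustration and do not come from the paper.

```latex
\tau = r \times F,
\qquad
\text{planar case: } \tau_z = r_x F_y - r_y F_x .
% Example: r = (0.2, 0)\,\mathrm{m},\; F = (0, 10)\,\mathrm{N}
% \;\Rightarrow\; \tau_z = 0.2 \cdot 10 - 0 \cdot 0 = 2\,\mathrm{N\,m}.
```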
CLAP: Contrastive Latent-space Prompt Optimization for End-to-end Autonomous Driving
End-to-end autonomous driving systems powered by Vision-Language-Action (VLA) models achieve strong performance on common driving scenarios, yet remain brittle in rare but safety-critical long-tail situations such as active construction zones and complex yielding geometries. In this paper, we present a method that addresses challenging long-tail scenes without relying on data scaling or model training. We introduce CLAP (Contrastive Latent-space Prompt optimization), a location-aware adaptation framework that augments a frozen VLA driving model with per-roadblock soft prompts, optimized from crowdsourced data and retrieved on demand via Vehicle-to-Everything (V2X) communication. Our approach rests on two observations about the VLA's latent space: (i) at the VLA's hidden-state layer, scenarios from the same roadblock cluster tightly and occupy compact regions of the latent space; and (ii) within a single roadblock, long-tail and normal frames are heavily intermixed in the latent representation, making it difficult to improve one without disturbing the other. CLAP addresses this via a two-stage pipeline: supervised contrastive learning to discover a roadblock-specific hard-scene direction, followed by directionally regularized prompt optimization that selectively improves challenging frames while preserving normal frame performance. On the NAVSIM benchmark with various state-of-the-art VLA backbones, CLAP reduces challenging scenario planning error by 24% with no regression on normal frames, significantly improving planning performance.
comment: 9 pages + appendix
Is VLA Reasoning Faithful? Probing Safety of Chain-of-Causation CVPR 2026
We present the first systematic study of faithfulness in Vision-Language-Action (VLA) driving models, analyzing 300 Alpamayo-R1-10B inferences across 100 diverse PhysicalAI-AV scenarios. Our main finding is that the natural-language rationales output alongside trajectories may be significantly unfaithful: (i) overall reasoning fidelity is only 42.5%, with Chain-of-Causation matching scene reality less than half the time; (ii) 94 missed pedestrians in one-third of pedestrian-relevant scenes; (iii) 97.7% trajectory fragility under mild visual perturbations; and (iv) only 48.3% mean reasoning-action consistency, with 53.3% of inferences exhibiting low consistency, including 37.9% of stop-claimed cases where the model continues instead. We formalize faithfulness information-theoretically, define entity and action fidelity with verification criteria, and outline a four-component safety architecture aligned with these results.
comment: Accept (Poster), CVPR 2026 Workshop DriveX NonArchival Track
Stretch-ICP: A Continuous-Trajectory Registration and Deskewing Algorithm in Scenarios of Aggressive Motions
Robust robotic autonomy remains challenging in complex environments, where loss of stability on uneven or slippery terrain can induce extreme accelerations and angular velocities. Such motions corrupt sensor measurements and degrade state estimation, motivating the need for improved algorithmic robustness. To investigate this issue, we introduce the Tumbling-Induced Gyroscope Saturation (TIGS) dataset, which consists of recordings from a mechanical lidar and an Inertial Measurement Unit (IMU) tumbling down a hill. The dataset contains angular speeds up to four times higher than those in similar datasets and is publicly available. We then propose two complementary methods to improve Simultaneous Localization And Mapping (SLAM) robustness and evaluate them on TIGS. First, Saturation-Aware Angular Velocity Estimation (SAAVE) estimates angular velocities when gyroscope measurements become saturated during aggressive motions, reducing angular speed estimation error by 83.4%. Second, Stretch-ICP, a novel registration and deskewing algorithm, enables reconstruction of smoother 6-Degrees Of Freedom (DOF) trajectories under aggressive motions compared to classical Iterative Closest Point (ICP). Stretch-ICP reduces linear and angular velocity errors by 95.2% and 94.8%, respectively, at scan boundaries. Together, these contributions improve the robustness and consistency of lidar-inertial state estimation under aggressive motions.
comment: 29 pages, 16 figures, published in Sensors 2026, 26(8), 2567, special issue "New Challenges and Sensor Techniques in Robot Positioning"
SEDualVLN: A Spatially-Enhanced Dual-System for Vision-Language Navigation
Vision-Language Navigation (VLN) approaches currently follow two primary paradigms: the end-to-end Vision-Language Model (VLM) policy fine-tuned on navigation trajectories to directly predict actions, and the zero-shot modular pipeline integrating a pre-trained Multimodal Large Language Model (MLLM) for training-free generalization to unseen environments. However, end-to-end methods struggle with long-horizon navigation and lack dynamic reasoning, whereas zero-shot methods are constrained by limited spatial grounding for reliable planning and also require substantial reasoning time. To bridge this gap, we introduce SEDualVLN, a spatially-enhanced dual-system VLN framework. System 1 is a VLM enhanced with both global and local spatial awareness, used for action generation. System 2 integrates a general MLLM with a mapping module, wherein the MLLM plans waypoints by leveraging top-down views of the real-time 3D map alongside streams of rendered path images. Both systems leverage different forms of spatial enhancement to cultivate the agent's sense of direction in VLN tasks. Ultimately, they cooperate to complete the navigation task through a fast-slow coordinated approach. SEDualVLN achieves state-of-the-art performance on VLN-CE benchmarks, and further ablation studies demonstrate the effectiveness of each system and module.
Generating Realistic Safety-Critical Scenarios for Vehicle-Pedestrian Interactions
Automated driving system (ADS) deployment requires rigorous validation across safety-critical vehicle-pedestrian interactions, yet real-world datasets rarely capture high-risk scenarios while simulation platforms lack realistic behavior. In response, this study proposes a three-stage framework that combines real-world grounding with adaptive simulation to generate behaviorally realistic safety-critical scenarios at scale. Stage 1 pre-trains multi-agent state-space Transformer-enhanced DDPG (MA-SST-DDPG) agents on real-world safety-critical data to learn human-like interactive evasive behaviors through data-driven learning. Stage 2 deploys the pre-trained agents in CARLA for online reinforcement learning to generalize across diverse scenarios, integrating real-world knowledge with simulation experience to produce a refined MA-SST-DDPG model. Stage 3 uses CARLA with the refined model to generate over 198,000 high-resolution interaction episodes from eight intersection scenarios, culminating in the Vehicle-Pedestrian Safety-Critical Interaction (VPSCI) dataset. The refined MA-SST-DDPG model outperformed baseline methods in reproducing realistic evasive behaviors, achieving the lowest trajectory errors (ADE = 0.072 m, FDE = 0.142 m). Statistical comparison confirmed distributional equivalence between the generated and real-world data in both conflict severity and behavioral response. A Turing test confirmed that the evasive behaviors generated by the three-stage framework were indistinguishable from real-world interactions. These results demonstrate the framework's effectiveness in producing high-fidelity safety-critical data, offering valuable resources for the development of ADS and simulation-based safety evaluations.
comment: 49 pages, 13 figures, 11 tables
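The ADE and FDE values quoted in the abstract above are standard trajectory-error metrics. The sketch below shows how they are usually computed: ADE averages the Euclidean displacement over all timesteps, and FDE takes the displacement at the final timestep only. The function name `ade_fde` and the toy trajectories are illustrative assumptions, not code or data from the paper.

```python
import math


def ade_fde(pred, truth):
    """Return (ADE, FDE) for two equal-length 2D trajectories."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Euclidean displacement between prediction and ground truth at each step.
    errors = [math.dist(p, t) for p, t in zip(pred, truth)]
    ade = sum(errors) / len(errors)  # average over the whole horizon
    fde = errors[-1]                 # final timestep only
    return ade, fde


# Hypothetical toy trajectories in metres (not data from the paper).
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
observed = [(0.0, 0.0), (1.0, 0.3), (2.0, 0.4)]
ade, fde = ade_fde(predicted, observed)
```

In a multi-agent setting these values would typically be averaged again over agents and episodes; the abstract does not say which averaging convention the authors use.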
Event-Grounded Sparse Autoencoders for Vision-Language-Action Policies
Vision-Language-Action (VLA) policies translate language and visual inputs into robot actions, where their hidden representations directly shape closed-loop behavior. However, mechanistic interpretability tools from language and vision-language models do not transfer cleanly to VLAs: outputs are robot actions rather than human-readable tokens, and interventions can only be tested via expensive closed-loop rollouts. We propose an event-grounded interpretability pipeline that anchors SAE feature analysis to behavioral events rather than text contexts. End-effector keyframes are clustered within each task using visual, state, and temporal cues, linking SAE features to behaviorally salient events and, via optional VLM annotations, to semantic context. To our knowledge, our pipeline is among the first to ground SAE-based VLA analysis in closed-loop behavioral events. Across two simulation architectures and a real-robot study, event-grounded ranking yields the strongest causal effects on OpenVLA and transfers to the continuous action chunks of $π_{0.5}$. SAE is a sparse but imperfect intervention basis: usability varies with architecture and intervention site, and aggressive intervention reveals safety and interpretability limits. Overall, event-grounded SAE analysis emerges as a practical starting point for behavior-anchored VLA interpretability, motivating future work on SAE features beyond action-aligned coordinates, finer-grained closed-loop evaluation, and safe interventions for high-stakes VLA deployments. Code is available at https://github.com/xc-j/Event-SAE.
RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies ICML 2026
Memory is critical for long-horizon and history-dependent robotic manipulation. Such tasks often involve counting repeated actions or manipulating objects that become temporarily occluded. Recent vision-language-action (VLA) models have begun to incorporate memory mechanisms; however, their evaluations remain confined to narrow, non-standardized settings. This limits their systematic understanding, comparison, and progress measurement. To address these challenges, we introduce RoboMME: a large-scale standardized benchmark for evaluating and advancing VLA models in long-horizon, history-dependent scenarios. Our benchmark comprises 16 manipulation tasks constructed under a carefully designed taxonomy that evaluates temporal, spatial, object, and procedural memory. We further develop a suite of 14 memory-augmented VLA variants built on the π0.5 backbone to systematically explore different memory representations across multiple integration strategies. Experimental results show that the effectiveness of memory representations is highly task-dependent, with each design offering distinct advantages and limitations across different tasks. Videos and code can be found at our website https://robomme.github.io.
comment: Accepted to ICML 2026
Constrained Policy Optimization via Sampling-Based Weight-Space Projection
Safety-critical learning requires policies that improve performance without leaving the safe operating regime. We study constrained policy learning where model parameters must satisfy rollout-based safety constraints that can be evaluated but not differentiated analytically. We propose SCPO, a sampling-based weight-space projection method that enforces safety directly in parameter space without requiring gradient access to the constraint functions. SCPO constructs a local safe region by combining rollout-based safety evaluations with smoothness bounds relating parameter perturbations to changes in safety metrics, and projects each gradient update via a convex SOCP. We establish a safe-by-induction guarantee: starting from any safe initialization, all intermediate policies remain safe given feasible projections. In constrained control settings with a stabilizing backup policy, SCPO further ensures closed-loop stability while enabling safe adaptation beyond the conservative backup. Experiments on constrained regression with harmful supervision and double-integrator imitation with a malicious expert show that SCPO rejects unsafe updates, maintains feasibility throughout training, and achieves meaningful objective improvement.
comment: Accepted for publication at IFAC World Congress 2026
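The safe-by-induction idea in the SCPO abstract above can be illustrated with a deliberately simplified special case. Assume a single safety metric g that is L-Lipschitz in the parameters and currently satisfies g(θ_k) ≤ 0. Then every θ with ‖θ − θ_k‖ ≤ −g(θ_k)/L is guaranteed safe, and projecting a proposed update onto that ball has a closed form. The actual method solves an SOCP built from sampled rollout evaluations; the function name and numbers below are hypothetical.

```python
import math


def project_to_safe_ball(theta_k, theta_prop, g_k, lipschitz):
    """Project theta_prop onto {theta : g_k + L * ||theta - theta_k|| <= 0}.

    Assumes one L-Lipschitz safety metric with current value g_k <= 0
    (an illustrative simplification, not the paper's SOCP).
    """
    if g_k > 0:
        raise ValueError("current parameters must already be safe (g_k <= 0)")
    radius = -g_k / lipschitz  # certified safe radius around theta_k
    step = [p - c for p, c in zip(theta_prop, theta_k)]
    norm = math.sqrt(sum(s * s for s in step))
    if norm <= radius:
        return list(theta_prop)  # the proposed update is already certified
    scale = radius / norm        # shrink the step onto the ball boundary
    return [c + scale * s for c, s in zip(theta_k, step)]


# Hypothetical numbers: the proposed step has norm 5, the safe radius is 2.
theta_k = [0.0, 0.0]
proposed = [3.0, 4.0]
safe = project_to_safe_ball(theta_k, proposed, g_k=-1.0, lipschitz=0.5)
```

Because each projected iterate stays inside a region certified from the previous safe iterate, safety carries over from one step to the next, which is the induction argument in miniature.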
DexWild: Dexterous Human Interactions for In-the-Wild Robot Policies
Large-scale, diverse robot datasets have emerged as a promising path toward enabling dexterous manipulation policies to generalize to novel environments, but acquiring such datasets presents many challenges. While teleoperation provides high-fidelity datasets, its high cost limits its scalability. Instead, what if people could use their own hands, just as they do in everyday life, to collect data? In DexWild, a diverse team of data collectors uses their hands to collect hours of interactions across a multitude of environments and objects. To record this data, we create DexWild-System, a low-cost, mobile, and easy-to-use device. The DexWild learning framework co-trains on both human and robot demonstrations, leading to improved performance compared to training on each dataset individually. This combination results in robust robot policies capable of generalizing to novel environments, tasks, and embodiments with minimal additional robot-specific data. Experimental results demonstrate that DexWild significantly improves performance, achieving a 68.5% success rate in unseen environments, nearly four times higher than policies trained with robot data only, and offering 5.8x better cross-embodiment generalization. Video results, codebases, and instructions at https://dexwild.github.io
comment: In RSS 2025. Website at https://dexwild.github.io
Therapist-Exoskeleton-Patient Interaction for Gait Therapy
Following a stroke, individuals often experience mobility and balance impairments due to lower-limb weakness and loss of independent joint control. Gait recovery is a key goal of rehabilitation, traditionally achieved through high-intensity therapist-led training. However, manual assistance can be physically demanding and limits the therapist's ability to interact with multiple joints simultaneously. Robotic exoskeletons offer multi-joint support, reduce therapist strain, and provide objective feedback, but current control strategies often limit therapist involvement and adaptability. We present a novel gait rehabilitation paradigm based on physical Human-Robot-Human Interaction (pHRHI), where both the therapist and the post-stroke individual wear lower-limb exoskeletons virtually connected at the hips and knees via spring-damper elements. This enables bidirectional interaction, allowing the therapist to guide movement and receive haptic feedback. In a study with eight chronic stroke patients, pHRHI training outperformed conventional therapist-guided treadmill walking, leading to increased joint range of motion, step metrics, muscle activation, and motivation. These results highlight pHRHI's potential to combine robotic precision with therapist intuition for improved rehabilitation outcomes.
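The virtual spring-damper connection described above can be sketched for a single joint as follows. The coupling applies equal and opposite torques that pull the two exoskeletons toward each other's joint angle and velocity, which is what lets the therapist both guide the movement and feel the patient's response. The function name, gains, and joint values are hypothetical and not taken from the study.

```python
def coupling_torques(theta_t, omega_t, theta_p, omega_p, stiffness, damping):
    """Return (torque_on_therapist, torque_on_patient) for one coupled joint.

    theta_* are joint angles in rad, omega_* are joint velocities in rad/s.
    """
    # Spring term acts on the angle difference, damper term on the velocity difference.
    torque_on_patient = stiffness * (theta_t - theta_p) + damping * (omega_t - omega_p)
    # The therapist feels the opposite torque (bidirectional haptic feedback).
    return -torque_on_patient, torque_on_patient


# Hypothetical example: the therapist's knee leads the patient's by 0.2 rad.
tau_t, tau_p = coupling_torques(
    theta_t=0.5, omega_t=0.0, theta_p=0.3, omega_p=0.0,
    stiffness=20.0, damping=2.0,
)
```

With these illustrative gains the patient's joint is pulled toward the therapist's with about 4 N·m, and the therapist feels the same magnitude in the opposite direction.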
See What Matters: Differentiable Grid Sample Pruning for Generalizable Vision-Language-Action Model
Vision-Language-Action (VLA) models have shown remarkable promise in robotic manipulation, yet their high computational cost hinders real-time deployment. Existing token pruning methods suffer from a fundamental trade-off: aggressive compression using pruning inevitably discards critical geometric details like contact points, leading to severe performance degradation. This forces a compromise, limiting the achievable compression rate and thus the potential speedup. We argue that breaking this trade-off requires rethinking compression as a geometry-aware, continuous token resampling in the vision encoder. To this end, we propose the Differentiable Grid Sampler (GridS), a plug-and-play module that performs task-aware, continuous resampling of visual tokens in VLA. By adaptively predicting a minimal set of salient coordinates and extracting features via differentiable interpolation, GridS preserves essential spatial information while achieving drastic compression (with fewer than 10% of the original visual tokens). Experiments on both the LIBERO benchmark and a real robotic platform demonstrate that GridS achieves a 76% reduction in FLOPs with no degradation in the success rate, validating the lowest feasible visual token count reported to date. The code is available at https://github.com/Fediory/Grid-Sampler.
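The "differentiable interpolation" step in the abstract above is commonly realized with bilinear sampling at continuous coordinates. The output is a weighted blend of the four neighbouring grid values, so it varies smoothly with the predicted coordinates and gradients can flow back to whatever predicts them. The single-channel, pure-Python function below is an illustrative sketch, not the GridS implementation.

```python
import math


def bilinear_sample(feature_map, x, y):
    """Sample a 2D grid (list of rows) at continuous coordinates (x, y)."""
    height, width = len(feature_map), len(feature_map[0])
    # Clamp so the 2x2 neighbourhood stays inside the grid.
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    wx, wy = x - x0, y - y0  # fractional offsets act as blend weights
    top = (1 - wx) * feature_map[y0][x0] + wx * feature_map[y0][x1]
    bottom = (1 - wx) * feature_map[y1][x0] + wx * feature_map[y1][x1]
    return (1 - wy) * top + wy * bottom


# Hypothetical 2x2 feature map; sampling at the centre blends all four cells.
grid = [[0.0, 1.0], [2.0, 3.0]]
centre = bilinear_sample(grid, 0.5, 0.5)
```

In an autodiff framework the same arithmetic would be applied per channel to a tensor of token features, and a small set of sampled locations would replace the full token grid.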
Geometry-aware 4D Video Generation for Robot Manipulation ICLR 2026
Understanding and predicting dynamics of the physical world can enhance a robot's ability to plan and interact effectively in complex environments. While recent video generation models have shown strong potential in modeling dynamic scenes, generating videos that are both temporally coherent and geometrically consistent across camera views remains a significant challenge. To address this, we propose a 4D video generation model that enforces multi-view 3D consistency of generated videos by supervising the model with cross-view pointmap alignment during training. Through this geometric supervision, the model learns a shared 3D scene representation, enabling it to generate spatio-temporally aligned future video sequences from novel viewpoints given a single RGB-D image per view, and without relying on camera poses as input. Compared to existing baselines, our method produces more visually stable and spatially aligned predictions across multiple simulated and real-world robotic datasets. We further show that the predicted 4D videos can be used to recover robot end-effector trajectories using an off-the-shelf 6DoF pose tracker, yielding robot manipulation policies that generalize well to novel camera viewpoints.
comment: ICLR 2026; Project website: https://robot4dgen.github.io
FUNCanon: Learning Pose-Aware Action Primitives via Functional Object Canonicalization for Generalizable Robotic Manipulation
Learning general-purpose robotic skills from end-to-end demonstrations often leads to task-specific policies that fail to generalize beyond the training distribution. Therefore, we introduce FunCanon, a framework that converts long-horizon manipulation tasks into sequences of action chunks, each defined by an actor, verb, and object. These chunks focus policy learning on the actions themselves, rather than isolated tasks, enabling compositionality and reuse. To make policies pose-aware and category-general, we perform functional object canonicalization for functional alignment and automatic manipulation trajectory transfer, mapping objects into shared functional frames using affordance cues from large vision-language models. An object-centric and action-centric diffusion policy, FuncDiffuser, trained on this aligned data naturally respects object affordances and poses, simplifying learning and improving generalization ability. Experiments on simulated and real-world benchmarks demonstrate category-level generalization, cross-task behavior reuse, and robust sim2real deployment, showing that functional canonicalization provides a strong inductive bias for scalable imitation learning in complex manipulation domains. Details of the demo and supplemental material are available on our project website https://sites.google.com/view/funcanon.
comment: project website: https://sites.google.com/view/funcanon, 11 pages
ORION: Option-Regularized Deep Reinforcement Learning for Cooperative Multi-Agent Online Navigation
Existing methods for multi-agent navigation typically assume fully known environments, offering limited support for partially known scenarios with outdated or imperfect prior maps, such as warehouses or factory floors. There, agents need to balance path optimality with collecting and sharing environmental information to help teammates reach their own targets. To these ends, we propose ORION, a novel deep reinforcement learning framework for cooperative multi-agent online navigation in partially known environments. Starting from an imperfect prior map, ORION trains agents to make decentralized decisions, coordinate toward individual targets, and actively reduce task-relevant map uncertainty through online observation sharing in a closed perception-action loop. We first design a shared graph encoder that fuses prior map with online perception into a unified representation, providing robust state embeddings under environmental discrepancies. At the core of ORION is an option-critic framework that learns high-level cooperative modes translated into sequences of low-level actions, enabling adaptive switching between individual navigation and team-level exploration. We further introduce a dual-stage cooperation strategy that allows agents to assist teammates under map uncertainty, thereby reducing the overall makespan. Across extensive maze-like maps and large-scale warehouse environments, ORION achieves high-quality real-time decentralized cooperation while scaling to up to 10 robots, outperforming state-of-the-art classical and learning-based baselines. Finally, we validate ORION on physical robot teams, demonstrating its robustness and practicality for real-world cooperative navigation.
CoLA-Flow Policy: Temporally Coherent Imitation Learning via Continuous Latent Action Flow Matching for Robotic Manipulation
Learning long-horizon robotic manipulation requires jointly achieving expressive behavior modeling, real-time inference, and stable execution, which remains challenging for existing generative policies. Diffusion-based approaches offer strong modeling capacity but incur high inference latency, while flow matching enables fast, near-single-step generation yet often suffers from unstable execution when operating directly in the raw action space. We propose Continuous Latent Action Flow Policy (CoLA-Flow Policy), a trajectory-level imitation learning framework that performs flow matching in a continuous latent action space. By encoding action sequences into temporally coherent latent trajectories and learning an explicit latent-space flow, CoLA-Flow Policy decouples global motion structure from low-level control noise, enabling smooth and reliable long-horizon execution. The framework further integrates geometry-aware point cloud conditioning and execution-time multimodal modulation, using visual cues as a representative modality to enhance real-world robustness. Experiments in simulation and on real robots show that CoLA-Flow Policy achieves near-single-step inference, improves trajectory smoothness by up to 93.7% and task success by up to 25 percentage points over raw action-space flow baselines, while remaining significantly faster than diffusion-based policies.
comment: 9 pages, 9 figures
Agentic Vehicles for Human-Centered Mobility: Definition, Prospects, and Synergistic Co-Development with Vehicle Autonomy
Autonomy, from the Greek autos (self) and nomos (law), refers to the capacity to operate according to internal rules without external control. Autonomous vehicles (AuVs) are therefore understood as vehicular systems that perceive their environment and execute tasks with minimal human intervention, consistent with the direction indicated by the SAE levels of automated driving. However, recent research and deployments increasingly showcase vehicular capabilities that, while not contradicting autonomy, are not entailed by it, including ambiguous goal handling, purposeful social engagement, external tool use, proactive problem solving, continuous learning, and context-sensitive reasoning in unseen and ethically salient situations, enabled in part by multimodal language models. These developments reveal a gap between technical autonomy and the broader social cognitive functions required for human-centered mobility, which are more precisely captured by the notion of agency. Therefore, rather than adding increasingly elaborate modifiers to "autonomous," we introduce agentic vehicles (AgVs) and suggest that autonomy and agency are intertwined but conceptually distinct: if autonomy concerns what to do and how to do it (task executions under internal rules), agency pertains to why to do it and what else can be done (goal-directed, adaptive actions). We present autonomy and agency as orthogonal yet synergistic dimensions with co-development implications. Vehicle agency marks a novel dimension of mobility service intelligence, heralding vehicles as purposeful actors in society.
First Experimental Demonstration of Natural Hovering Extremum Seeking: A New Paradigm in Flapping Flight Physics
In this letter, we report the first experimental demonstration of a recently proposed paradigm in hovering and flapping flight physics called Natural Hovering Extremum Seeking (NH-ES) [doi.org/10.1103/4dm4-kc4g], which theorizes that the stable hovering flight observed in nature in flapping insects and hummingbirds can be generated via a model-free, real-time, computationally basic, sensory-based feedback mechanism that needs only the built-in natural oscillations of the flapping wing as both the control and the propulsive input. We run moth-like, light-source-seeking experiments on a flapping-wing body in a fully model-free setting that is agnostic to morphological parameters and body/aerodynamic models. We show that the flapping body using NH-ES gains altitude and autonomously stabilizes the servos responsible for flapping, including under pitching dynamics (believed in the literature to be a main cause of instability in open-loop hovering). The flapping body stably hovers about the light source, needing only feedback of local measurements of light intensity. Our results were also achieved under delay and noise effects, supporting earlier observations that NH-ES is robust against potential processing delays and noisy sensing.
Motion Planning of Cooperative Nonholonomic Mobile Manipulators
We propose a real-time implementable motion planning framework for cooperative object transportation by nonholonomic mobile manipulator robots (MMRs) in dynamic environments. Our global planner finds a path from start to goal through the static, obstacle-free regions in the environment and generates a set of convex, static, obstacle-free regions around the path using a novel, fast, and computationally lightweight ellipse-based technique. We introduce a nonlinear Model Predictive Control (NMPC) based real-time implementable planning technique that jointly plans feasible motion for the mobile base and the manipulator's arm and generates a kinodynamically feasible, collision-free trajectory for cooperative object transportation. Simulation and hardware experiments validate the efficiency of our proposed planning framework.
comment: Published in ASME Letters in Translational Robotics. This includes supplementary materials
Learning Native Continuation for Action Chunking Flow Policies
Action chunking enables Vision Language Action (VLA) models to run in real time, but naive chunked execution often exhibits discontinuities at chunk boundaries. Real-Time Chunking (RTC) alleviates this issue but is external to the policy, leading to spurious multimodal switching and trajectories that are not intrinsically smooth. We propose Legato, a training-time continuation method for action-chunked flow-based VLA policies. Specifically, Legato initializes denoising from a schedule-shaped mixture of known actions and noise, exposing the model to partial action information. Moreover, Legato reshapes the learned flow dynamics to ensure that the denoising process remains consistent between training and inference under per-step guidance. Legato further uses randomized schedule condition during training to support varying inference delays and achieve controllable smoothness. Empirically, Legato produces smoother trajectories and reduces spurious multimodal switching during execution, leading to less hesitation and shorter task completion time. Extensive real-world experiments show that Legato consistently outperforms RTC across five manipulation tasks, achieving approximately 10% improvements in both trajectory smoothness and task completion time.
comment: Accepted by Robotics: Science and Systems 2026 (RSS 2026). Project page: https://lyfeng001.github.io/Legato/
Quality-guided UAV Surface Exploration for 3D Reconstruction
Reasons for mapping an unknown environment with autonomous robots are wide-ranging, but in practice, they are often overlooked when developing planning strategies. Rapid information gathering and comprehensive structural assessment of buildings have different requirements and therefore necessitate distinct methodologies. In this paper, we propose a novel modular Next-Best-View (NBV) planning framework for aerial robots that explicitly uses a reconstruction quality objective to guide the exploration planning. In particular, our approach introduces new and efficient methods for view generation and selection of viewpoint candidates that are adaptive to the user-defined quality requirements, fully exploiting the uncertainty encoded in a Truncated Signed Distance Field (TSDF) representation of the environment. This results in informed and efficient exploration decisions tailored towards the predetermined objective. Finally, we validate our method via extensive simulations in realistic environments. We demonstrate that it successfully adjusts its behavior to the user goal while consistently outperforming conventional NBV strategies in terms of coverage, quality of the final 3D map, and path efficiency.
Early Pruning for Public Transport Routing
Routing algorithms for public transport, particularly the widely used RAPTOR and its variants, often face performance bottlenecks during the transfer relaxation phase, especially on dense transfer graphs, when supporting unlimited transfers. This inefficiency arises from iterating over many potential inter-stop connections (walks, bikes, e-scooters, etc.). To maintain acceptable performance, practitioners often limit transfer distances or exclude certain transfer options, which can reduce path optimality and restrict the multimodal options presented to travellers. This paper introduces Early Pruning, a low-overhead technique that accelerates routing algorithms without compromising optimality. By pre-sorting transfer connections by duration and applying a pruning rule within the transfer loop, the method discards longer transfers at a stop once they cannot yield an earlier arrival than the current best solution. Early Pruning can be integrated with minimal changes to existing codebases and requires only a one-time preprocessing step. The technique preserves Pareto-optimality in extended-criteria settings whenever the additional optimization criteria are monotonically non-decreasing in transfer duration. Across multiple state-of-the-art RAPTOR-based solutions, including RAPTOR, ULTRA-RAPTOR, McRAPTOR, BM-RAPTOR, ULTRA-McRAPTOR, and UBM-RAPTOR, tested on the Switzerland and London transit networks, we achieved query time reductions of up to 57%. This approach provides a generalizable improvement to the efficiency of transit pathfinding algorithms.
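The pruning rule described above can be sketched as a single transfer-relaxation pass. The sketch reads "current best solution" as the best known arrival time at the target stop, which is an interpretation of the abstract rather than a confirmed detail, and it leaves out RAPTOR's rounds and marked-stop bookkeeping. Because each stop's transfers are pre-sorted by duration, the inner loop can stop at the first transfer that cannot beat that bound. All names are hypothetical.

```python
import math


def relax_transfers(arrival, transfers, target):
    """One transfer-relaxation pass with early pruning.

    arrival:   dict stop -> best known arrival time (missing = unreachable)
    transfers: dict stop -> list of (duration, destination), sorted by duration
    target:    destination stop of the query
    """
    improved = dict(arrival)
    for stop, t_stop in arrival.items():
        for duration, dest in transfers.get(stop, []):
            candidate = t_stop + duration
            # All remaining transfers from this stop are at least as long,
            # so none of them can beat the best arrival at the target either.
            if candidate >= improved.get(target, math.inf):
                break
            if candidate < improved.get(dest, math.inf):
                improved[dest] = candidate
    return improved


# Hypothetical toy network: one-time preprocessing sorts transfers by duration.
transfers = {"A": sorted([(5, "B"), (2, "C"), (9, "T"), (12, "D")])}
arrival = {"A": 100, "T": 108}
result = relax_transfers(arrival, transfers, target="T")
```

Here the transfers to C and B are relaxed, while the 9-minute transfer would arrive at 109, after the current best of 108 at the target, so the loop stops and the 12-minute transfer to D is never examined.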
Real2Sim via Active Perception with Behavior Trees Automatically Generated by VLMs
Constructing physically accurate simulation environments (Real2Sim) traditionally relies on manual system identification or rigid, exhaustive exploration routines. These task-agnostic pipelines often fail to leverage semantic scene context, leading to redundant physical interactions and inefficient data acquisition. In this paper, we present an autonomous, intent-driven Real2Sim framework that leverages Vision-Language Models (VLMs) for Semantic Task Decomposition. Given a high-level natural language request, an incomplete simulation description, and a visual observation, the framework autonomously identifies the minimal subset of missing physical parameters required for the simulation task. It then generates a reactive Behavior Tree (BT) composed of atomic motion and sensing primitives to selectively acquire these parameters through contact-rich robotic interaction. Extensive real-world experiments on a torque-controlled Franka Emika Panda demonstrate that our approach accurately estimates object mass, surface geometry, and derived parameters such as friction. Quantitative evaluations reveal significant operational efficiency gains compared to exhaustive baseline methods, while ablation studies confirm the robustness of the prompt architecture across different state-of-the-art VLMs. Furthermore, the reactive hierarchy of the BT acts as a deterministic safety filter, successfully mitigating generative VLM hallucinations and preventing unsafe physical anomalies. Ultimately, this work provides a scalable, efficient, and interpretable pipeline for building physics-aware digital twins directly from unstructured human intent.
A Sliced Learning Framework for Online Disturbance Identification in Quadrotor SO(3) Attitude Control
This paper introduces a dimension-decomposed geometric learning framework called Sliced Learning for disturbance identification in quadrotor geometric attitude control. Instead of conventional learning-from-states, this framework adopts a learning-from-error strategy by using the Lie-algebraic error representation as the input feature, enabling axis-wise space decomposition ("slicing") while preserving the SO(3) structure. This is highly consistent with the geometric mechanism of cognitive control observed in neuroscience, where neural systems organize adaptive representations within structured subspaces to enable cognitive flexibility and efficiency. Based on this framework, we develop a lightweight and structurally interpretable Sliced Adaptive-Neuro Mapping (SANM) module. The high-dimensional mapping for online identification is axially "sliced" into multiple low-dimensional submappings ("slices"), implemented by shallow neural networks and adaptive laws. These neural networks and adaptive laws are updated online via Lyapunov-based adaptation within their respective shared subspaces. To enhance interpretability, we prove exponential convergence despite time-varying disturbances and inertia uncertainties. To our knowledge, Sliced Learning is among the first frameworks to demonstrate lightweight online neural adaptation at 400 Hz on resource-constrained microcontroller units (MCUs), such as STM32, with real-world experimental validation.
comment: v4: This version has been accepted for publication in IEEE/ASME Transactions on Mechatronics (TMECH). Supplementary video links have also been added
A Visual Reinforcement Learning-Based Separate Primitive Policy for Peg-in-Hole Tasks
For peg-in-hole tasks, humans rely on binocular visual perception to locate the peg above the hole surface and then proceed with insertion. This paper draws insights from this behavior to enable agents to learn efficient assembly strategies through visual reinforcement learning. Hence, we propose a Separate Primitive Policy (S2P) to learn how to derive location and insertion actions simultaneously. S2P is compatible with model-free reinforcement learning algorithms. Ten insertion tasks featuring different polygons are developed as benchmarks for evaluations. Simulation experiments show that S2P can boost the sample efficiency and success rate even with force constraints. Real-world experiments are also performed to verify the feasibility of S2P. Ablations are finally given to discuss the generalizability of S2P and some factors that affect its performance.
comment: Accepted for publication in IEEE Robotics and Automation Letters (RA-L)
GeoWorld: Geometric World Models CVPR 2026
Energy-based predictive world models provide a powerful approach for multi-step visual planning by reasoning over latent energy landscapes rather than generating pixels. However, existing approaches face two major challenges: (i) their latent representations are typically learned in Euclidean space, neglecting the underlying geometric and hierarchical structure among states, and (ii) they struggle with long-horizon prediction, which leads to rapid degradation across extended rollouts. To address these challenges, we introduce GeoWorld, a geometric world model that preserves geometric structure and hierarchical relations through a Hyperbolic JEPA, which maps latent representations from Euclidean space onto hyperbolic manifolds. We further introduce Geometric Reinforcement Learning for energy-based optimization, enabling stable multi-step planning in hyperbolic latent space. Extensive experiments on CrossTask and COIN demonstrate around 3% SR improvement in 3-step planning and 2% SR improvement in 4-step planning compared to the state-of-the-art V-JEPA 2. Project website: https://steve-zeyu-zhang.github.io/GeoWorld.
comment: Accepted to CVPR 2026
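One common way to map Euclidean latents onto a hyperbolic manifold, as the GeoWorld abstract above describes, is the exponential map at the origin of the Poincaré ball. The abstract does not state which hyperbolic model or map the authors use, so the sketch below is only an illustration of the general idea, with a hypothetical function name and input.

```python
import math


def expmap0(v, curvature=1.0):
    """Map a Euclidean vector into the Poincare ball with curvature -curvature.

    Uses exp_0(v) = tanh(sqrt(c) * ||v||) * v / (sqrt(c) * ||v||).
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)  # the origin maps to itself
    sqrt_c = math.sqrt(curvature)
    scale = math.tanh(sqrt_c * norm) / (sqrt_c * norm)
    return [scale * x for x in v]


# Hypothetical latent vector with Euclidean norm 5.
point = expmap0([3.0, 4.0])
radius = math.sqrt(sum(x * x for x in point))
```

Any input, however large, lands strictly inside the unit ball (for curvature 1), with distant Euclidean points crowding toward the boundary; this geometry is often used to embed hierarchical structure.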
Real-to-Sim for Highly Cluttered Environments via Physics-Consistent Inter-Object Reasoning
Reconstructing physically valid 3D scenes from single-view observations is a prerequisite for bridging the gap between visual perception and robotic control. However, in scenarios requiring precise contact reasoning, such as robotic manipulation in highly cluttered environments, geometric fidelity alone is insufficient. Standard perception pipelines often neglect physical constraints, resulting in invalid states, e.g., floating objects or severe inter-penetration, rendering downstream simulation unreliable. To address these limitations, we propose a novel physics-constrained Real-to-Sim pipeline that reconstructs physically consistent 3D scenes from single-view RGB-D data. Central to our approach is a differentiable optimization pipeline that explicitly models spatial dependencies via a contact graph, jointly refining object poses and physical properties through differentiable rigid-body simulation. Extensive evaluations in both simulation and real-world settings demonstrate that our reconstructed scenes achieve high physical fidelity and faithfully replicate real-world contact dynamics, enabling stable and reliable contact-rich manipulation.
comment: Project page: https://physics-constrained-real2sim.github.io
Universal Pose Pretraining for Generalizable Vision-Language-Action Policies
Existing Vision-Language-Action (VLA) models often suffer from feature collapse and low training efficiency because they entangle high-level perception with sparse, embodiment-specific action supervision. Since these models typically rely on VLM backbones optimized for Visual Question Answering (VQA), they excel at semantic identification but often overlook subtle 3D state variations that dictate distinct action patterns. To resolve these misalignments, we propose Pose-VLA, a decoupled paradigm that separates VLA training into a pre-training phase for extracting universal 3D spatial priors in a unified camera-centric space, and a post-training phase for efficient embodiment alignment within robot-specific action space. By introducing discrete pose tokens as a universal representation, Pose-VLA seamlessly integrates spatial grounding from diverse 3D datasets with geometry-level trajectories from robotic demonstrations. Our framework follows a two-stage pre-training pipeline, establishing fundamental spatial grounding via poses followed by motion alignment through trajectory supervision. Extensive evaluations demonstrate that Pose-VLA achieves state-of-the-art results on RoboTwin 2.0 with a 79.5% average success rate and competitive performance on LIBERO at 96.0%. Real-world experiments further showcase robust generalization across diverse objects using only 100 demonstrations per task, validating the efficiency of our pre-training paradigm.
comment: Accepted to Robotics: Science and Systems (RSS) 2026. Project website: https://hetolin.github.io/PoseVLA
A Deployable Embodied Vision-Language Navigation System with Hierarchical Cognition and Context-Aware Exploration
Bridging the gap between embodied intelligence and embedded deployment remains a key challenge in intelligent robotic systems, where perception, reasoning, and planning must operate under strict constraints on computation, memory, energy, and real-time execution. In vision-and-language navigation (VLN), existing approaches often face a trade-off between reasoning capability and deployment efficiency on real-world platforms. In this paper, we present a deployable embodied VLN system that achieves both high efficiency and strong high-level reasoning on real-world robots. The system is decomposed into a fast perception-action layer and a deep reasoning layer running asynchronously at different time scales, with a shared memory layer enabling efficient interaction between them. To support long-horizon reasoning, we incrementally construct a compact memory graph and progressively feed decomposed subgraphs into a vision-language model (VLM). Furthermore, we formulate exploration as a Weighted Traveling Repairman Problem (WTRP) by jointly considering reasoning outcomes and the spatial distribution of candidate regions. Extensive experiments in simulation and real-world environments demonstrate improved navigation success and efficiency over existing VLN approaches while maintaining real-time performance on resource-constrained hardware. Code and additional real-world experiments are available at https://github.com/xukuanHIT/HiCo-Nav.
comment: 10 pages, 5 figures
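The Weighted Traveling Repairman objective named in the abstract above can be illustrated with a minimal sketch. It is not the authors' solver: the function names, unit travel speed, Euclidean distances, and the example weights are all assumptions made for illustration. The objective minimized here is the sum over candidate regions of weight times arrival time, so a high-weight region pulls itself earlier in the visiting order.

```python
from itertools import permutations
from math import dist


def weighted_latency(start, order, points, weights):
    """Sum of weight * arrival time along the visiting order (unit speed assumed)."""
    elapsed, position, total = 0.0, start, 0.0
    for i in order:
        elapsed += dist(position, points[i])
        total += weights[i] * elapsed
        position = points[i]
    return total


def solve_wtrp_bruteforce(start, points, weights):
    """Exhaustive search over orders; only sensible for a handful of regions."""
    best = min(
        permutations(range(len(points))),
        key=lambda order: weighted_latency(start, order, points, weights),
    )
    return list(best), weighted_latency(start, best, points, weights)


# Hypothetical candidates: a far region with a high weight and a near one with a low weight.
points = [(10.0, 0.0), (-1.0, 0.0)]
weights = [50.0, 1.0]
order, cost = solve_wtrp_bruteforce((0.0, 0.0), points, weights)
print(order, cost)
```

With these made-up numbers the far region is visited first, because delaying it would cost more weighted waiting time than the detour saves. A real system would need a heuristic or approximation, since exhaustive search grows factorially.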
Robust and Resilient Soft Robotic Object Insertion with Compliance-Enabled Contact Formation and Failure Recovery
Object insertion tasks are prone to failure under pose uncertainty and environmental variation, often requiring manual fine-tuning or controller retraining. We present a novel approach for robust and resilient object insertion using a passively compliant soft wrist that enables safe contact absorption through large deformations, without high-frequency control or force sensing. Our method structures insertion as compliance-enabled contact formations, sequential contact states that progressively constrain degrees of freedom, and integrates automated failure recovery strategies. Our key insight is that wrist compliance permits safe, repeated recovery attempts; hence, we refer to it as compliance-enabled failure recovery. We employ a pre-trained vision-language model (VLM) that assesses each skill execution from terminal poses and images, identifies failure modes, and proposes recovery actions by selecting skills and updating goals. In simulation, our method achieved an 83% success rate, recovering from failures induced by randomized conditions, including grasp misalignments up to 5 degrees, hole-pose errors up to 20 mm, fivefold increases in friction, and unseen square/rectangular pegs, and we further validated the approach on a real robot. Project page is available at https://omron-sinicx.github.io/compliance-enabled-failure-recovery/.
Multiagent Systems
Agent Bazaar: Enabling Economic Alignment in Multi-Agent Marketplaces
The deployment of Large Language Models (LLMs) as autonomous economic agents introduces systemic risks that extend beyond individual capability failures. As agents transition to directly interacting with marketplaces, their collective behavior can amplify volatility and mask deception at scale. We introduce the Agent Bazaar, a multi-agent simulation framework for evaluating Economic Alignment, the capacity of agentic systems to preserve market stability and integrity. We identify two failure modes: (1) Algorithmic Instability in a B2C market ("The Crash"), where firms amplify price volatility until the market collapses, and (2) Sybil Deception in a C2C market ("The Lemon Market"), where a single deceptive agent controlling multiple coordinated seller identities floods the market with fraudulent listings, eroding trust and consumer welfare. We evaluate frontier and open-weight models across both scenarios and find that models largely fail to self-regulate, with failure severity varying by model rather than by size. We propose economically aligned harnesses, Stabilizing Firms and Skeptical Guardians, that improve outcomes but remain fragile under harder market conditions. To close this gap, we train agents with REINFORCE++ using an adaptive curriculum, producing a 9B model that outperforms all evaluated frontier and open-weight models. We propose the Economic Alignment Score (EAS), a 4-component scalar metric aggregating stability, integrity, welfare, and profitability, enabling direct cross-model comparison. Our results show that economic alignment is orthogonal to general capability and can be directly trained with targeted RL.
comment: 17 pages, 9 figures
Reservation Based Smart Parking Management
In the framework of Smart Cities and Intelligent Transportation Systems (ITS), efficient parking management is essential to reduce urban congestion and emissions. However, current reservation-based systems often encounter a scenario in which users find their reserved slot occupied by a previous occupant who failed to vacate on time (the "NO PARK" situation). This paper introduces a dual-mechanism architecture designed to enhance system reliability. A Reservation Module uses a dynamically sized buffer of non-reservable slots to grant parking availability. A reputation-based Reward System uses a "star-based" metric to incentivize punctual departures through financial penalties and access restrictions. Simulations conducted with the SUMO urban simulator are promising, showing that the dynamic buffer strategy provides a better tradeoff between parking availability and reservation success. By progressively adapting to users' behavior, the proposed system mitigates "NO PARK" instances and improves resource utilization, significantly enhancing urban viability. Index Terms: Smart City, Intelligent transportation systems, Parking, Reservation systems, V2I, Reputation-based mechanisms, Smart Parking
comment: 6 pages, accepted at the IEEE WETICE 2026 Conference
Automated Root-Cause Subclassification and No-Code Fix Generation for Invalid Bug Reports
Issues faced when using software are reported in the form of bug reports. However, many bug reports are invalid, meaning they do not require code changes, and are resolved with a no-code fix. Having customer support manually determine the root cause of invalid bug reports and provide actionable resolutions is a serious waste of resources. Our goal is to introduce a standardized taxonomy for root-cause oriented invalid bug report subclassification, and perform experiments to test the accuracy of various approaches on invalid subclassification and no-code fix generation. We study how different configurations perform on a gold-standard benchmark we have created. Using a manually curated benchmark for higher quality analysis, we experimented with vanilla LLMs, Retrieval Augmented Generation, and agentic web search to identify invalid subclasses and generate no-code fixes. We evaluated the results against manually labeled ground truth data that includes the invalid subclass and no-code fixes from the original bug reports. We measured subclass detection performance with weighted F1-Score, and assessed no-code fix suggestions using BERTScore and Judge LLM success rates. For subclassification, retrieval augmented generation achieves the highest overall performance with 0.66 weighted F1, slightly outperforming vanilla LLMs at 0.65 and agentic web search at 0.64. At the subclass level, performance peaks at 0.85 F1 for Non-reproducibility and 0.79 for Feature Request and Question, while Wrong Version remains the most challenging with scores between 0.00 and 0.29. For no-code fix generation, agentic web search achieves the highest overall Judge LLM success rate at 68.9%, compared to 64.4% for RAG applications and 64.9% for vanilla LLMs, with subclass-level peaks of 87.4% for Working as Designed and 72.2% for Question.
comment: 19 pages, 4 figures, 7 tables
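For readers unfamiliar with the weighted F1-Score used in the abstract above, the following is a minimal sketch of the standard definition: per-class F1 averaged with weights proportional to each class's support in the ground truth. The helper name and the tiny label lists are hypothetical and are not taken from the paper's benchmark.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1  # weight each class by its number of true instances
    return total / len(y_true)


# Made-up labels using subclass names that appear in the abstract.
y_true = ["Question", "Question", "Wrong Version", "Feature Request"]
y_pred = ["Question", "Feature Request", "Question", "Feature Request"]
score = weighted_f1(y_true, y_pred)
print(round(score, 4))
```

Because the average is weighted by support, a rare subclass with a very low F1 moves the overall score less than a frequent one does, which is worth keeping in mind when reading the aggregate numbers alongside the per-subclass ones.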
Scale-Dependent Collective Adaptation in Self-Amending LLM Societies: A Cross-Family Study of Emergent Governance
We study group decision-making in artificial societies where the rules of play are themselves subject to collective amendment. Using the self-amending game Nomic, we compare multiple scales across two LLM families and find that collective adaptation does not improve monotonically with model size. Instead, both families exhibit a narrow mid-scale regime that supports sustained rule adoption, diverse amendments, and balanced consensus. Smaller models tend to remain rule-inert, whereas larger models often converge on restrictive voting patterns, and heterogeneous mixed-size groups collapse into veto-driven gridlock. These cross-scale contrasts persist under temperature perturbations and under a shift from unanimity to majority voting, although latent-state structure varies by family and scale. Hidden-state divergence alone does not explain collective performance: high representational divergence can coincide with poor behavioural outcomes. Linear probes reveal regime-selective coupling between latent vote-predictive signals and collective behaviour, but decodability is necessary rather than sufficient for adaptive play. Overall, the recurring regularity is non-monotonicity, not the particular scale at which the optimum appears. Self-amending games therefore provide a controlled testbed for studying collective adaptation in artificial societies beyond raw model scale.
Human-Flow Digital Twin for Predicting the Effects of Mobility Introduction on Visitor Circulation MDM 2026
We propose a framework for predicting the effects of mobility introduction measures using a human-flow digital twin. This digital twin incorporates a multi-agent simulator that can represent how visitors choose destinations depending on factors such as their current location and the attractiveness of spots. We extract data on how visitors selected destinations with respect to measured pre-intervention human-flow data, inter-spot distances, spot attractiveness, and travel volumes, and use these data to train each agent's decision model of this simulator. The trained decision model is a function that takes a visitor's current state and surrounding environmental information as input and outputs which spot the visitor will move toward next. By expressing mobility introduction measures as changes to inter-point distances or to spot attractiveness, the framework can reproduce human flows with mobility introduction in the multi-agent simulator and thereby quantify effects such as changes in visitor counts and circulation. We evaluated the proposed method using human-flow data measured with and without introducing mobility within Wakayama Castle Park in Japan. When reproducing flows with mobility introduction using a multi-layer perceptron decision model, the cosine similarity of the spatial population distribution exceeded 0.7, confirming that the approach can replicate the flow changes caused by the mobility introduction.
comment: An accepted paper at the 27th IEEE International Conference on Mobile Data Management (MDM 2026). Project page: https://mc.net.ist.osaka-u.ac.jp/en/activity/wakayama-castle-mobility_2023/
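The evaluation metric mentioned in the abstract above, cosine similarity between spatial population distributions, can be sketched as follows. This is an illustrative reading only: it assumes each distribution is a vector of visitor counts per spot, and the counts below are invented, not the Wakayama Castle Park measurements.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for an empty distribution
    return dot / (norm_a * norm_b)


# Hypothetical visitor counts per spot: measured versus simulated.
measured = [120, 80, 40, 10]
simulated = [100, 90, 30, 30]
similarity = cosine_similarity(measured, simulated)
print(round(similarity, 3))
```

Cosine similarity compares the shape of the two distributions and ignores overall scale, so a simulator that doubles every count would still score 1.0.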
Heterogeneous Information-Bottleneck Coordination Graphs for Multi-Agent Reinforcement Learning
Coordination graphs are a central abstraction in cooperative multi-agent reinforcement learning (MARL), yet existing sparse-graph learners lack a theoretically grounded mechanism to decide which edges should exist and how much information each edge should carry. Current methods rely on heuristic criteria that offer no formal guarantee on the learned topology, and no principled way to allocate different communication capacities to structurally different agent relationships. To address this, we propose Heterogeneous Information-Bottleneck Coordination Graphs (HIBCG), which learns a group-aware sparse graph in which both edge existence and message capacity are theoretically justified. With the graph information bottleneck (GIB) serving as the underlying tool, HIBCG first constructs a group-aligned block-diagonal prior that provides a closed-form criterion for edge retention -- determining which edges should exist and at what density per group block -- and then controls per-agent feature bandwidth on the resulting topology, compressing messages to retain only task-relevant content. We prove that the group-aligned prior strictly tightens the variational bound on topology learning, that the objective decomposes per group block, enabling differential edge control, and that capacity allocation follows a water-filling principle.
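The abstract above states only that capacity allocation follows a water-filling principle. As background, here is a minimal sketch of the textbook water-filling allocation, which maximizes the sum of log(1 + p_i / n_i) under a total budget. It is not HIBCG's objective; the function name, the noise levels, and the budget are assumptions chosen for illustration.

```python
def water_filling(noise, budget, iters=100):
    """Allocate p_i = max(0, mu - n_i), with the water level mu found by bisection."""
    lo, hi = min(noise), max(noise) + budget
    for _ in range(iters):
        mu = (lo + hi) / 2.0
        used = sum(max(0.0, mu - n) for n in noise)
        if used > budget:
            hi = mu  # water level too high, budget exceeded
        else:
            lo = mu
    mu = (lo + hi) / 2.0
    return [max(0.0, mu - n) for n in noise]


# Hypothetical per-channel noise levels and total capacity budget.
allocation = water_filling([1.0, 2.0, 5.0], budget=3.0)
print([round(p, 6) for p in allocation])
```

In this toy case the noisiest channel receives nothing, and the remaining budget is split so that the two cleaner channels reach the same water level.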
Task Capability Improvement Algorithm for Collaborative Manipulators
This work introduces a cooperative task capability improvement algorithm that utilizes additional moments. The manipulators apply forces at the object's grasp point. Applying forces at a point other than the object's center of gravity produces undesired moments. The undesired moment acts as an additional moment. It improves the capability of an individual manipulator and, hence, the entire collaborative group. Any improvements in task capability directly add up to the object and transportation capability. The group's enhanced capability also helps achieve optimal capability, optimal resource allocation, and maximum fault tolerance in object manipulation. Our simulation results show a 5.86% improvement in capability compared with the case in which no moment is used to enhance the capability of the manipulators.
MetaCogAgent: A Metacognitive Multi-Agent LLM Framework with Self-Aware Task Delegation
Multi-agent large language model (LLM) systems have shown promise for solving complex tasks through agent collaboration. However, existing frameworks assign tasks based on predefined roles without considering whether an agent can accurately assess its own competence boundaries, leading to overconfident execution on tasks beyond its expertise. Inspired by metacognition theory from cognitive science, we propose MetaCogAgent, a multi-agent LLM framework where each agent is equipped with a Metacognitive Self-Assessment Unit that evaluates task-capability alignment before execution. The framework introduces three contributions: (1) a self-assessment mechanism that estimates per-task confidence by combining verbalized uncertainty with historical capability profiles; (2) an adaptive delegation protocol that routes low-confidence tasks to better-suited agents through cross-agent evaluation; and (3) a capability boundary learning module that iteratively refines each agent's competence model via cybernetic feedback. Experiments on our constructed MetaCog-Eval benchmark (700 tasks across 5 cognitive dimensions) demonstrate that MetaCogAgent achieves 82.4% task accuracy -- 8.7% above the best routing baseline -- while using 5% fewer API calls than AutoGen and 34% fewer than ensemble voting. Ablation studies confirm that each metacognitive component contributes to overall system performance.
comment: 6 pages, submitted to IEEE SMC 2026
Bimodal Synchronization Performance: Why Noise and Sparse Connectivity Can Improve Collective Timing
Pulse-coupled oscillator models inspired by firefly synchronization are widely used to study decentralized time coordination in distributed systems. We analyze a discrete-time, discrete-phase firefly-inspired synchronization model and show that collective synchrony emerges only near a critical balance between the quorum threshold (fraction of pulsing neighbors required to trigger a phase update) and the pulse duration (how long agents remain detectable to others). Within this parameter region, the system exhibits bimodal performance: it either reaches near-perfect synchronization or becomes trapped in stable multi-cluster states, where symmetrically phase-offset subgroups mutually reinforce one another and prevent global synchrony. Our analysis shows that reducing connectivity or introducing noise suppresses these low-performance states by breaking such symmetric interactions, indicating that highly connected or noiseless systems are not necessarily optimal for collective synchronization.
Stop Drawing Scientific Claims from LLM Social Simulations Without Robustness Audits
The scientific claims drawn from LLM social simulations should be no stronger than the robustness audits that support them. Generative agents bring new expressive power to agent-based modeling, enabling simulations of collective social processes like cooperation, polarization, and norm formation. Yet they also introduce complexity through additional architectural choices, such as agent specification, memory representation, interaction protocols, and environment design. Small perturbations that appear minor to researchers can cascade into macro-level outcomes through repeated interaction, creating a "butterfly effect." Consequently, scientific claims drawn from LLM social simulations may reflect implementation artifacts rather than the social mechanisms being modeled. We support this position with two case studies: a repeated Prisoner's Dilemma and a social media echo chamber simulation. Across multiple models, minor perturbations in persona format and game-instruction framing shift cooperation rates by up to 76 percentage points, while network homophily and hub assignment produce significant and consistent shifts in polarization metrics. We also find that sensitivity is unevenly distributed across both architectural choices and model families: the same perturbation that produces the 76 pp shift in one frontier model only shifts another by 1 pp. Robustness is therefore a property that should be measured per claim and per model, not assumed. To address this validation gap, we introduce TRAILS (Taxonomy for Robustness Audits In LLM Simulations), a robustness-audit taxonomy spanning three levels of simulation design: agent (micro-level), interaction (meso-level), and system (macro-level). We call for robustness to become a first-order validation requirement before LLM social simulations are used to explain mechanisms, evaluate interventions, or inform decisions.
GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis
Gene expression analysis holds the key to many biomedical discoveries, yet extracting insights from raw transcriptomic data remains formidable due to the complexity of multiple large, semi-structured files and the need for extensive domain expertise. Current automation approaches are often limited by either inflexible workflows that break down in edge cases or by fully autonomous agents that lack the necessary precision for rigorous scientific inquiry. GenoMAS charts a different course by presenting a team of LLM-based scientists that integrates the reliability of structured workflows with the adaptability of autonomous agents. GenoMAS orchestrates six specialized LLM agents through typed message-passing protocols, each contributing complementary strengths to a shared analytic canvas. At the heart of GenoMAS lies a guided-planning framework: programming agents unfold high-level task guidelines into Action Units and, at each juncture, elect to advance, revise, bypass, or backtrack, thereby maintaining logical coherence while bending gracefully to the idiosyncrasies of genomic data. On the GenoTEX benchmark, GenoMAS reaches a Composite Similarity Correlation of 89.13% for data preprocessing and an F$_1$ of 60.48% for gene identification, surpassing the best prior art by 10.61% and 16.85% respectively. Beyond metrics, GenoMAS surfaces biologically plausible gene-phenotype associations corroborated by the literature, all while adjusting for latent confounders. Code is available at https://github.com/Liu-Hy/GenoMAS.
comment: 51 pages (14 pages for the main text, 10 pages for references, and 27 pages for the appendix)
Motion Planning of Cooperative Nonholonomic Mobile Manipulators
We propose a real-time implementable motion planning framework for cooperative object transportation by nonholonomic mobile manipulator robots (MMRs) in dynamic environments. Our global planner finds a path from start to goal through the static, obstacle-free regions in the environment and generates a set of convex, static, obstacle-free regions around the path using a novel, fast, and computationally lightweight ellipse-based technique. We introduce a nonlinear Model Predictive Control (NMPC) based real-time implementable planning technique that jointly plans feasible motion for the mobile base and the manipulator's arm and generates a kinodynamic feasible, collision-free trajectory for cooperative object transportation. Simulation and hardware experiments validate the efficiency of our proposed planning framework.
comment: Published in ASME Letters in Translational Robotics. This includes supplementary materials
Causal Influences over Social Learning Networks
This paper investigates causal influences between agents linked by a social graph and interacting over time. In particular, the work examines the dynamics of social learning models and distributed decision-making protocols, and derives expressions that reveal the causal relations between pairs of agents and explain the flow of influence over the network. The results turn out to be dependent on the graph topology and the level of information that each agent has about the inference problem they are trying to solve. Using these conclusions, the paper proposes an algorithm to rank the overall influence between agents to discover highly influential agents. It also provides a method to learn the necessary model parameters from raw observational data. The results and the proposed algorithm are illustrated by considering both synthetic data and real social media data.
comment: Accepted to the Journal of Machine Learning Research
MolClaw: An Autonomous Agent with Hierarchical Skills for Drug Molecule Evaluation, Screening, and Optimization
Computational drug discovery, particularly the complex workflows of drug molecule screening and optimization, requires orchestrating dozens of specialized tools in multi-step workflows, yet current AI agents struggle to maintain robust performance and consistently underperform in these high-complexity scenarios. Here we present MolClaw, an autonomous agent that leads drug molecule evaluation, screening, and optimization. It unifies over 30 specialized domain resources through a three-tier hierarchical skill architecture (70 skills in total) that facilitates agent long-term interaction at runtime: tool-level skills standardize atomic operations, workflow-level skills compose them into validated pipelines with quality check and reflection, and a discipline-level skill supplies scientific principles governing planning and verification across all scenarios in the field. Additionally, we introduce MolBench, a benchmark comprising molecular screening, optimization, and end-to-end discovery challenges spanning 8 to 50+ sequential tool calls. MolClaw achieves state-of-the-art performance across all metrics, and ablation studies confirm that gains concentrate on tasks that demand structured workflows while vanishing on those solvable with ad hoc scripting, establishing workflow orchestration competence as the primary capability bottleneck for AI-driven drug discovery.
comment: 28 pages, 8 figures. Code and data will be released
Mechanism Plausibility in Generative Agent-Based Modeling
Large language models (LLMs) can generate high-level diverse phenomena without explicitly programmed rules. This capability has led to their adoption within different agent-based models (ABMs) and social simulations. Recent studies investigate their ability to generate different phenomena of interest, for example, human behavior on social media platforms or alien behavior in game-theoretic scenarios. However, capability, prediction, and explanation are different--drawing from the philosophy of science and mechanisms literature, explanation requires showing, to some degree, how a phenomenon is produced by related organized entities and activities. For modelers, describing the characteristics of an experiment or whether a simulation provides progress in capability (or explanation), can be difficult without being grounded in potentially distant research areas. We integrate recent work on LLM-ABMs with contemporary philosophy of science literature and use it to operationalize a definition of 'plausibility' in a four-level scale. Our scale separates the evaluation of a model's generative sufficiency (ability to reproduce a phenomenon) from its mechanistic plausibility (how the phenomenon could be produced), and clarifies the distinct roles of different models, such as predictive and explanatory ones. We introduce this as the Mechanism Plausibility Scale.
comment: Accepted at ACM FAccT 2026
Systems and Control (EESS)
Architecture Dependent Temporal Observability Under Deployment Interference in Edge Inference Systems
Edge inference systems are typically evaluated with software-reported latency collected under controlled conditions. We argue, and demonstrate empirically, that deployment interference can corrupt not only the inference timing being measured but the timing observability infrastructure that measures it, and that the two failures can occur independently. We pair software-reported timing with externally observable GPIO intervals captured by a Saleae Logic Pro 8 logic analyzer on an NVIDIA Jetson Orin Nano, running MobileNetV2 under two inference architectures (TensorRT FP16 GPU and ONNX Runtime CPU) across baseline, light memory pressure, and storage writeback stress. Across 35 paired capture runs (3500 samples) plus 3 storage-stress runs where external pairing failed (300 software-only samples), we observe three findings the software-only view does not surface. (1) The two architectures differ not only in mean latency but in distributional structure: TensorRT baseline clusters tightly near 1.23 ms (run-mean SD 15 us) while ORT CPU baseline is multimodal with run-mean SD 31.8 ms. (2) Light memory pressure inflates TensorRT P99 from 1.28 ms to 1.61 ms, while one of five ORT memory-stress runs collapses into a deterministic 198 ms regime rather than uniformly inflating variance. (3) All three TensorRT storage-stress runs produce complete software timing logs (100/100 iterations) alongside externally observable timing failures of three different kinds (full post-marker collapse, ~40% transition loss, and complete acquisition failure) -- while the runtime reports normal completion in every case. We claim, narrowly, that timing observability is itself an interference-sensitive resource, and that summary statistics from a single timing source can hide failure modes an independent external observer makes visible.
comment: 13 pages, 7 figures, 1 table. Workshop preprint
Attention-Guided Fusion of 1D and 2D CNNs for Robust ECG-Based Biometric Recognition
Electrocardiogram (ECG)-based biometric recognition has emerged as a promising solution for secure authentication and liveness detection. However, most existing methods rely on unimodal deep learning architectures that independently process either one-dimensional (1D) temporal signals or two-dimensional (2D) time-frequency representations, limiting robustness and generalization. To address this issue, this paper proposes a hybrid framework integrating 1D and 2D convolutional neural networks (CNNs) within a unified end-to-end architecture. The 1D branch extracts temporal and morphological features from raw ECG signals, while the 2D branch captures discriminative spectral information from time-frequency representations. An attention-guided fusion mechanism dynamically weights both modalities according to input characteristics, overcoming the limitations of conventional static fusion strategies. The framework was evaluated on three benchmark datasets (ECG-ID, MIT-BIH, and PTB), including healthy subjects and patients with cardiac pathologies, achieving identification accuracies of 99.56%, 100.00%, and 99.89%, respectively. To assess long-term biometric permanence, experiments were also conducted on the multi-session Heartprint dataset spanning ten years. The proposed approach achieved same-session accuracies of 98.54% (S1), 99.09% (S2), 94.93% (S3R), and 96.08% (S3L), while cross-session evaluations reached 56.33% (S1-S2) and 53.27% (S2-S3R), demonstrating the ability to capture stable biometric signatures over time. The optimal configuration combines InceptionTime for 1D processing, ResNet-34 for 2D analysis, and attention-based fusion. Ablation studies confirm that the proposed attention mechanism consistently outperforms conventional fusion approaches. Overall, the proposed framework provides a robust, scalable, and high-performance solution for ECG biometric recognition.
Distributed Synthesis of Gray-Box Distributed H2 Controllers
Distributed controller synthesis offers scalable and privacy-preserving control design, but typical state-of-the-art approaches either assume white-box models or resort to centralized synthesis. In this paper, we combine partially known model knowledge and an input-state dataset within a distributed gray-box scheme to design \(\mathcal{H}_2\) controllers. Our method can handle unknown dynamics and offers scalable synthesis. Each agent communicates with a set of neighbors determined by the physical coupling topology of the system such that we can apply the Alternating Direction Method of Multipliers (ADMM) to solve the problem iteratively in a fully distributed fashion (i.e., without a central server). The effectiveness and flexibility of the proposed approach are demonstrated in simulations of the IEEE 39-bus power system test case.
comment: Accepted for presentation at the 23rd IFAC World Congress, Busan, South Korea, 2026
Distributed Synchronisation of Heterogeneous Dynamical Networks With Nonlinear Diffusive Couplings
This letter investigates the problem of output synchronisation in heterogeneous dynamical networks with nonlinear diffusive couplings in the presence of disturbances on the coupling links. By exploiting relative dissipativity properties between adjacent agents, distributed conditions are established to guarantee output synchronisation. Specifically, these conditions can be verified using only local information associated with neighbouring agents and coupling links. As an illustration, a heterogeneous network of Goodwin oscillators is considered, where the relative dissipativity properties between neighbouring oscillators are characterised and used to analyse synchronisation.
Distributed 3D Leader-Follower Formation Control with Field-of-View Safety via Control Barrier Functions
This letter proposes a distributed 3D leader-follower formation (3D-LFF) control framework for multi-UAV systems that achieves formation tracking while enforcing perception safety constraints. Maintaining safe, vision-based 3D-LFF is challenging because onboard cameras impose strict Field-of-View (FOV) limitations, and demanding formation commands can drive the leader outside the follower's camera frustum, resulting in loss of visibility. To address this issue, we develop a perception-aware safe control architecture that guarantees visibility by construction. First, we derive a relative kinematic model in a line-of-sight coordinate representation and design a distributed 3D-LFF tracking controller using only locally available relative states. Next, we embed the nominal formation controller within a Control Barrier Function-based Quadratic Program (CBF-QP) safety filter that minimally modifies the commanded velocities to maintain the leader inside the follower's camera frustum while preserving formation tracking whenever feasible. Gazebo simulations and Crazyflie hardware experiments validate the proposed approach, demonstrating accurate formation tracking and effective FOV enforcement, including scenarios in which the nominal desired formation conflicts with visibility constraints.
comment: 9 pages
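The safety-filter idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' controller: it assumes single-integrator relative dynamics (so the barrier derivative is `grad_h . u`), a single barrier constraint, and a hypothetical class-K gain `alpha`. With one affine constraint, the CBF-QP `min ||u - u_nom||^2 s.t. grad_h . u + alpha * h >= 0` has a closed-form solution, so no QP solver is needed here.

```python
from typing import List, Sequence


def cbf_filter(u_nom: Sequence[float],
               grad_h: Sequence[float],
               h: float,
               alpha: float = 1.0) -> List[float]:
    """Minimally modify u_nom so that grad_h . u + alpha * h >= 0.

    Assumes x_dot = u (illustrative only), so h_dot = grad_h . u.
    """
    # Value of the CBF constraint under the nominal command.
    slack = sum(g * u for g, u in zip(grad_h, u_nom)) + alpha * h
    if slack >= 0.0:
        # Nominal command already satisfies the constraint: pass it through.
        return list(u_nom)
    norm_sq = sum(g * g for g in grad_h)
    if norm_sq == 0.0:
        # Constraint does not depend on u, so no correction is possible.
        return list(u_nom)
    # Project u_nom onto the constraint boundary along grad_h.
    scale = -slack / norm_sq
    return [u + scale * g for u, g in zip(u_nom, grad_h)]


if __name__ == "__main__":
    # Nominal command pushes toward the boundary (h = 0.5 left of margin).
    print(cbf_filter([1.0, 0.0], [-1.0, 0.0], h=0.5))
```

When the nominal command is already safe it is returned unchanged, which mirrors the "minimally modifies the commanded velocities" behaviour described in the abstract. A field-of-view constraint on a real vehicle would need the paper's line-of-sight kinematics in place of the single-integrator assumption.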
Robust Soft-Constrained Spatially Selective Active Noise Control for Hearables Under Secondary Path Variations
Spatially selective active noise control (SSANC) hearables aim to attenuate noise from certain directions at the eardrum while preserving desired speech arriving from selected directions. Existing SSANC systems typically assume an accurate estimate of the secondary path from the loudspeaker to the inner error microphone. In practice, however, this path varies across users and device fits, which can degrade performance and compromise system stability. This paper proposes a robust soft-constrained optimization framework that computes a single control filter by minimizing the average cost over a set of secondary path estimates derived from human measurements. Simulations and experiments on a real-time control platform show that the proposed approach slightly reduces mean performance relative to the matched case but substantially narrows the performance spread under secondary path mismatch. The proposed framework therefore provides a practical design strategy when accurate secondary path estimates are unavailable.
comment: Submitted to the 19th International Workshop on Acoustic Signal Enhancement (IWAENC 2026)
Latency-Aware Deep Learning Benchmark for Real-Time Cyber-Physical Attack and Fault Classification in Inverter-Dominated Power Grids
This work introduces a latency-aware benchmarking framework for evaluating deep learning models in power system anomaly detection using high-fidelity, time-domain signals generated from an industry-grade electromagnetic transient simulator. Eight neural network architectures, ranging from MLPs to Transformers, were systematically evaluated on streaming datasets representing both physical faults and cyber-attacks in inverter-dominated networks. All models successfully classified two representative multi-event sequences in real time with sub-cycle response times below 15 ms. However, although classification decisions occurred within one cycle, the end-to-end inference latency consistently exceeded three cycles, ranging from 50 to 90 ms. These results highlight a critical gap between algorithmic capability and protection-grade deployment, pointing to the need for further optimization and hardware acceleration. The findings establish a reproducible benchmark for sub-cycle anomaly detection and provide guidance for transitioning machine learning methods from research prototypes to real-world protection applications.
Handling Control System Uncertainty
Control science is a core representative of the third industrial revolution and is vital to modern civilization. Control systems are the main subject of control science and involve many considerations, including hardware, software, operation, maintenance, economics, and society. Beyond all of these, the consideration most essential to a control system is its methodology in the mathematical sense; knowledge of this methodology is what we refer to as control theory. Besides its mathematical importance, control theory is all the more appealing because it is deeply rooted in practical applications. Its appeal lies in both know-why and know-how, and it is the fusion of control theory with practical applications that brings this appeal out. Control theory for practical applications, especially with a so-called "advanced" flavour, involves several fundamental aspects. This article introduces the Handling Control System Uncertainty aspect of Advanced Control Theory for Practical Applications.
Generating Realistic Safety-Critical Scenarios for Vehicle-Pedestrian Interactions
Automated driving system deployment requires rigorous validation across safety-critical vehicle-pedestrian interactions, yet real-world datasets rarely capture high-risk scenarios while simulation platforms lack realistic behavior. In response, this study proposes a three-stage framework that combines real-world grounding with adaptive simulation to generate behaviorally realistic safety-critical scenarios at scale. Stage 1 pre-trains multi-agent state-space Transformer-enhanced DDPG (MA-SST-DDPG) agents on real-world safety-critical data to learn human-like interactive evasive behaviors through data-driven learning. Stage 2 deploys pre-trained multi-agents in CARLA for online reinforcement learning to generalize across diverse scenarios, integrating real-world knowledge with simulation experience to produce a refined MA-SST-DDPG model. Stage 3 uses CARLA with the refined model to generate over 198,000 high-resolution interaction episodes from eight intersection scenarios, culminating in the Vehicle-Pedestrian Safety-Critical Interaction (VPSCI) dataset. The refined MA-SST-DDPG model outperformed baseline methods in reproducing realistic evasive behaviors, achieving the lowest trajectory errors (ADE = 0.072 m, FDE = 0.142 m). Statistical comparison confirmed distributional equivalence between the generated and real-world data in both conflict severity and behavioral response. A Turing test confirmed that the evasive behaviors generated by the three-stage framework were indistinguishable from real-world interactions. These results demonstrate the framework's effectiveness in producing high-fidelity safety-critical data, offering valuable sources for the development of ADS and simulation-based safety evaluations.
comment: 49 pages, 13 figures, 11 tables
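The trajectory errors quoted in the abstract above (ADE and FDE) are standard displacement metrics. As a minimal sketch, assuming the usual definitions (mean Euclidean distance over all time steps, and Euclidean distance at the final step) and hypothetical function and argument names, they can be computed as follows. The abstract does not state how the authors aggregate over agents or episodes, so this covers a single trajectory only.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def ade_fde(pred: Sequence[Point], truth: Sequence[Point]) -> Tuple[float, float]:
    """Return (ADE, FDE) for one predicted trajectory against ground truth.

    ADE: average displacement error over all time steps.
    FDE: displacement error at the final time step.
    """
    if not pred or len(pred) != len(truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    # Per-step Euclidean distance between predicted and true positions.
    dists = [math.dist(p, q) for p, q in zip(pred, truth)]
    return sum(dists) / len(dists), dists[-1]


if __name__ == "__main__":
    predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    observed = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    print(ade_fde(predicted, observed))
```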
Revisiting the Voltage-Source Behavior: Why Impedance Magnitude of Grid-Forming Converter Rises Near Fundamental Frequency?
Grid-forming (GFM) converters are generally expected to exhibit low impedance near the fundamental frequency due to their voltage-source behavior. However, an impedance peak and a negative-resistance region are consistently observed in this range, which contradicts this expectation and lacks a clear physical explanation. This paper reveals that these phenomena originate from the inherent dynamics of the active power control loop, where the mapping from power disturbance to the synchronous angle inherently involves an integrative action, intrinsically preventing a positive-resistance characteristic near the fundamental frequency. This finding explains why existing grid codes in China, the United States, and Europe exclude a narrow band around the fundamental frequency in impedance-based evaluations. It is further shown that the width of the excluded frequency band (e.g., +/- 3~5 Hz) is governed by the power-to-frequency dynamics. Based on this insight, a quantitative index is proposed to determine the exclusion bandwidth from the corner frequencies of the impedance magnitude curve. The proposed index provides a concise and theoretically grounded criterion for voltage-source assessment and impedance standardization of GFM converters.
Descriptive versus Regulatory Uncertainty in Bounded Predictive Systems
Any system that models the world under finite representational capacity must compress; any compression entails a prior; and the prior is the system's bias. What has not been established is whether uncertainty participates in the dynamics governing future behavior, or merely describes the output distribution without consequence. We introduce a structural distinction between descriptive uncertainty, which does not recursively modulate the system's policy, and regulatory uncertainty, which directly enters the optimization landscape and drives persistent adaptive restructuring. We prove formally that current transformer architectures are confined to descriptive uncertainty at inference. We ground this in thermodynamics via Landauer's principle: for uncertainty to be regulatory, epistemic error must cost real energy; in a decoupled system, hallucinations and correct derivations dissipate identical energy. We test this empirically across three locally-deployed language models (3B, 8B, 70B parameters). Token-level Shannon entropy is statistically invariant across tasks spanning pattern retrieval, causal operator application, and out-of-distribution causal generalization in all three models (all pairwise p >= 0.568; within-model ranges 0.011-0.028 nats), while task accuracy varies substantially across the same conditions (0%-100%). Entropy and accuracy are orthogonal. The decoupling is scale-invariant: larger models achieve higher accuracy but identical entropy flatness. This structural incapacity is not resolvable by additional parameters or training data. Genuine epistemic grounding requires physical coupling between thermodynamic substrate state and information processing cost.
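The token-level Shannon entropy reported in nats in the abstract above can be sketched as below. This is an illustrative computation only: the function names are hypothetical, and the abstract does not say how the authors extract logits or average over tokens, so the per-sequence mean here is an assumption.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one token's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def entropy_nats(probs: Sequence[float]) -> float:
    """Shannon entropy H = -sum p ln p, in nats (zero-probability terms skipped)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_token_entropy(logit_rows: Sequence[Sequence[float]]) -> float:
    """Average next-token entropy over a sequence (assumed aggregation)."""
    if not logit_rows:
        raise ValueError("need at least one token position")
    return sum(entropy_nats(softmax(row)) for row in logit_rows) / len(logit_rows)


if __name__ == "__main__":
    # Two positions over a 4-token vocabulary: one uniform, one sharply peaked.
    print(mean_token_entropy([[0.0, 0.0, 0.0, 0.0], [10.0, 0.0, 0.0, 0.0]]))
```

A uniform distribution over a vocabulary of size V gives the maximum value ln V, which puts the reported within-model ranges of 0.011 to 0.028 nats into perspective.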
DexWild: Dexterous Human Interactions for In-the-Wild Robot Policies
Large-scale, diverse robot datasets have emerged as a promising path toward enabling dexterous manipulation policies to generalize to novel environments, but acquiring such datasets presents many challenges. While teleoperation provides high-fidelity datasets, its high cost limits its scalability. Instead, what if people could use their own hands, just as they do in everyday life, to collect data? In DexWild, a diverse team of data collectors uses their hands to collect hours of interactions across a multitude of environments and objects. To record this data, we create DexWild-System, a low-cost, mobile, and easy-to-use device. The DexWild learning framework co-trains on both human and robot demonstrations, leading to improved performance compared to training on each dataset individually. This combination results in robust robot policies capable of generalizing to novel environments, tasks, and embodiments with minimal additional robot-specific data. Experimental results demonstrate that DexWild significantly improves performance, achieving a 68.5% success rate in unseen environments, nearly four times higher than policies trained with robot data only, and offering 5.8x better cross-embodiment generalization. Video results, codebases, and instructions are available at https://dexwild.github.io
comment: In RSS 2025. Website at https://dexwild.github.io
Zonotope-Based Elastic Tube Model Predictive Control
Tube-based Model Predictive Control (MPC) is a widely adopted robust control framework for constrained linear systems under additive disturbance. The paper is focused on reducing the numerical complexity associated with the tube parameterization, described as a sequence of elastically-scaled zonotopic sets. A new class of scaled-zonotope inclusion conditions is proposed, alleviating the need for a priori specification of certain set-containment constraints and achieving significant reductions in complexity. A comprehensive complexity analysis is provided for both the polyhedral and the zonotopic setting, illustrating the trade-off between an enlarged domain of attraction and the required computational effort. The proposed approach is validated through extensive numerical experiments.
Density-Ratio Weighted Behavioral Cloning: Learning Control Policies from Corrupted Datasets
Offline reinforcement learning (RL) enables policy optimization from fixed datasets, making it suitable for safety-critical applications where online exploration is infeasible. However, these datasets are often contaminated by adversarial poisoning, system errors, or low-quality samples, leading to degraded policy performance in standard behavioral cloning (BC) and offline RL methods. This paper introduces Density-Ratio Weighted Behavioral Cloning (Weighted BC), a robust imitation learning approach that uses a small, verified clean reference set to estimate trajectory-level density ratios via a binary discriminator. These ratios are clipped and used as weights in the BC objective to prioritize clean expert behavior while down-weighting or discarding corrupted data, without requiring knowledge of the contamination mechanism. We establish theoretical guarantees showing convergence to the clean expert policy with finite-sample bounds that are independent of the contamination rate. A comprehensive evaluation framework is established, which incorporates various poisoning protocols (reward, state, transition, and action) on continuous control benchmarks. Experiments demonstrate that Weighted BC maintains near-optimal performance even at high contamination ratios, outperforming baselines such as traditional BC, batch-constrained Q-learning (BCQ), and behavior regularized actor-critic (BRAC).
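The weighting step described in the abstract above can be sketched as follows. This is a hedged illustration, not the paper's implementation: it assumes the discriminator outputs the probability `d` that a trajectory comes from the clean reference set, uses the standard identity that `d / (1 - d)` estimates the density ratio under balanced classes, and normalizes by the weight sum. The clip value `clip_max` and all function names are hypothetical.

```python
from typing import List, Sequence


def density_ratio(d: float, eps: float = 1e-6) -> float:
    """Convert a discriminator probability d = P(clean | trajectory) to a ratio."""
    d = min(max(d, eps), 1.0 - eps)  # keep the ratio finite
    return d / (1.0 - d)


def clipped_weights(disc_probs: Sequence[float], clip_max: float = 5.0) -> List[float]:
    """Trajectory-level weights: density ratios clipped from above."""
    return [min(density_ratio(d), clip_max) for d in disc_probs]


def weighted_bc_loss(per_traj_losses: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-trajectory BC losses (normalization is assumed)."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    return sum(w * l for w, l in zip(weights, per_traj_losses)) / total


if __name__ == "__main__":
    probs = [0.5, 0.9, 0.1]      # discriminator outputs per trajectory
    losses = [1.0, 2.0, 3.0]     # e.g. negative log-likelihood per trajectory
    w = clipped_weights(probs)
    print(w, weighted_bc_loss(losses, w))
```

Trajectories the discriminator judges unlikely to be clean receive weights near zero, which corresponds to the down-weighting described in the abstract. Clipping bounds the influence of any single trajectory.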
Motion Planning of Cooperative Nonholonomic Mobile Manipulators
We propose a real-time implementable motion planning framework for cooperative object transportation by nonholonomic mobile manipulator robots (MMRs) in dynamic environments. Our global planner finds a path from start to goal through the static, obstacle-free regions in the environment and generates a set of convex, static, obstacle-free regions around the path using a novel, fast, and computationally lightweight ellipse-based technique. We introduce a nonlinear Model Predictive Control (NMPC) based real-time implementable planning technique that jointly plans feasible motion for the mobile base and the manipulator's arm and generates a kinodynamically feasible, collision-free trajectory for cooperative object transportation. Simulation and hardware experiments validate the efficiency of our proposed planning framework.
comment: Published in ASME Letters in Translational Robotics. This includes supplementary materials
A Sliced Learning Framework for Online Disturbance Identification in Quadrotor SO(3) Attitude Control
This paper introduces a dimension-decomposed geometric learning framework called Sliced Learning for disturbance identification in quadrotor geometric attitude control. Instead of conventional learning-from-states, this framework adopts a learning-from-error strategy by using the Lie-algebraic error representation as the input feature, enabling axis-wise space decomposition ("slicing") while preserving the SO(3) structure. This is highly consistent with the geometric mechanism of cognitive control observed in neuroscience, where neural systems organize adaptive representations within structured subspaces to enable cognitive flexibility and efficiency. Based on this framework, we develop a lightweight and structurally interpretable Sliced Adaptive-Neuro Mapping (SANM) module. The high-dimensional mapping for online identification is axially "sliced" into multiple low-dimensional submappings ("slices"), implemented by shallow neural networks and adaptive laws. These neural networks and adaptive laws are updated online via Lyapunov-based adaptation within their respective shared subspaces. To enhance interpretability, we prove exponential convergence despite time-varying disturbances and inertia uncertainties. To our knowledge, Sliced Learning is among the first frameworks to demonstrate lightweight online neural adaptation at 400 Hz on resource-constrained microcontroller units (MCUs), such as STM32, with real-world experimental validation.
comment: v4: This version has been accepted for publication in IEEE/ASME Transactions on Mechatronics (TMECH). Supplementary video links have also been added
Explainable LP-MPC: Shadow Price Contributions Reveal MV-CV Pairings
In the process industries, MPC (Model Predictive Control) is typically implemented as a two-stage controller with a Linear Program (LP) steady-state optimizer that generates economically optimal targets for the MPC algorithm. Abnormal behaviors in industrial LP optimizers are often difficult to rationalize, especially when a large number of manipulated variables (MVs) and controlled variables (CVs) are involved. We introduce a novel, post-hoc LP explainability method by recasting the role of shadow prices in the LP solution as an attribution mechanism for MV-CV relationships. The core idea is that the shadow price of a constrained CV is not just an intrinsic property of the LP solution, but can be split into contributions from individual unconstrained MVs and resolved into one-to-one MV-CV pairings using a linear sum assignment algorithm. The proposed MV-CV pairing framework serves as a practical explainability tool for online LP-MPC systems, enabling practitioners to diagnose suboptimal constraints and verify alignment of the controller's behavior with its original design.
comment: Accepted at the 2026 IFAC World Congress
Robotics
Contrastive Conceptor Activation Steering (COAST): Unlocking Vision-Language-Action Models through Hidden States NeurIPS 2026
Vision-Language-Action (VLA) models leverage powerful perceptual priors from web-scale Vision-Language Model (VLM) pre-training, yet they remain surprisingly brittle in practice, frequently failing at simple robotic tasks. To mitigate this, we propose Contrastive Conceptor Activation Steering (COAST). COAST builds on the notion of a "conceptor", a linear operator that soft-projects data into the principal components of a target distribution. COAST uses conceptors to identify success-critical subspaces for a target robotic task from a few examples of success and failure rollouts. At inference time, it steers VLA latents into these identified success subspaces to improve task outcomes. Across three architecturally distinct neural policies (flow-matching VLA, autoregressive VLA, and Diffusion Policy), COAST improves absolute mean simulation and real-robot task success rates by over 20% and 40%, respectively. The activation subspace geometry reveals that failure modes share substantial structure across tasks while success representations remain largely task-specific. When tasks share similar failure modes, this structure enables previously fitted conceptors to improve performance on new tasks without refitting. Ultimately, our results suggest that current VLAs retain substantial task-relevant knowledge in their latent representations, and that the action expert's decoding bottleneck could be mitigated by steering its residual stream toward task-relevant subspaces. COAST provides a lightweight, training-free path to unlocking these latent capabilities by steering the model towards its own "success" distributions.
comment: Submitted to NeurIPS 2026
ATRACT: A Trustworthy Robotic Autonomous system to support Casualty Triage
At a time when drones are increasingly associated with hostile operations, we re-purpose them for humanitarian and life-saving applications. However, adapting search and rescue drones for battlefield triage remains extremely challenging; the technology must perform reliably to support frontline medics who are forced to operate under extreme uncertainty, restricted access, and significant personal risk. Due to growing vulnerabilities of casualty evacuation in conflict zones, this paper presents ATRACT (A Trustworthy Robotic Autonomous system to support Casualty Triage), a novel human-in-the-loop decision support system to enable early battlefield triage during the critical post-trauma period. ATRACT integrates drone-captured video with wearable sensor input for multi-modal learning to support casualty-state assessment, thereby addressing the limitations of existing systems. Drone video captures fine-grained behavioural cues, such as pose and posture, while body-worn sensors provide complementary physiological signals, including heart rate, breathing rate, and movement. By combining the two modalities, ATRACT provides evidence to support the early judgement of medics when direct access to the casualty is delayed, risky, or restricted. To mitigate the data realism gap pertaining to injured actions, a conditional variational autoencoder is devised for data augmentation. Experimental results on our drone-captured dataset show that the proposed pipeline achieves 85.7% accuracy for action classification, while our lightweight CNN visual encoder remains competitive with stronger pre-trained video backbones. Overall, the results support ATRACT as a practically meaningful step towards remote triage in contested environments, where multi-modal sensing, human oversight and trustworthy decision support can improve casualty prioritisation, and lessen the exposure of frontline medics.
How to Instruct Your Robot: Dense Language Annotations Power Robot Policy Learning
Scaling robot policy learning is bottlenecked by the cost of collecting demonstrations, while language annotations for existing demonstrations are comparatively cheap. We study language density as a lever for extracting more signal from a fixed robot or egocentric-video corpus. We introduce DeMiAn (Dense Multi-aspect Annotation), a two-stage approach that first re-labels demonstration segments with VLM-generated annotations along four complementary aspects: physical motion, scene composition, arm pose, and reasoning. A learned instructor then maps a task description and initial scene snapshot to a task-appropriate annotation at deployment, running asynchronously so generation latency is hidden behind policy execution. Across over 1M robot manipulation clips and 50K EgoVerse human-egocentric videos, DeMiAn improves both a vision-language-action policy and a video-based world-action model without collecting new demonstrations. On RoboCasa, the instructor raises success by 5 points over a task-only baseline and comes within 3 points of a per-task oracle. No fixed annotation aspect dominates across tasks, showing that selecting the right dense language matters. DeMiAn also improves composite-task and out-of-distribution performance, and shifts the compute-performance frontier in both mid-training and post-training after accounting for annotation-generation FLOPs. These results position dense re-annotation as a practical scaling lever for robot policy learning.
Generalizable and Actionable Parts Pose Estimation with Symmetry Annotation-Free Learning Strategy ICML 2026
Generalizable robot-object interaction and manipulation, which are urgently needed, require high-quality cross-category object perception. As a pioneering concept in this area, Generalizable and Actionable Parts (GAParts) understanding has attracted increasing attention from researchers. However, most recent works either handle the symmetry issue insufficiently or require rich symmetry annotation, which severely impedes precise GAPart pose estimation in data-scarce scenarios. In this paper, we propose SAFAG, a novel Symmetry Annotation-Free framework for Generalizable and Actionable Parts Pose Estimation. Specifically, we propose a two-stage stepwise refinement framework for candidate-to-final quaternion regression, and treat symmetry prediction as a probability distribution problem with a self-supervised learning strategy. The experimental results demonstrate the superior performance and robustness of SAFAG. We believe that our work has strong potential to be applied in many areas of embodied AI systems.
comment: Accepted as a poster at the Forty-third International Conference on Machine Learning (ICML 2026)
NORM-Nav: Zero-Shot Mobile Robot Navigation with Natural Language Behavioral Constraints
Mobile robots operating in human-centered environments must generate not only collision-free paths but also trajectories that follow local behavioral conventions. Conventional costmap-based navigation emphasizes geometric feasibility and often overlooks such requirements, which can result in socially inappropriate behaviors. This paper presents NORM-Nav, a zero-shot framework that integrates natural language behavioral constraints into costmap-based planning. An LLM parses each instruction into structured constraints and grounds them using real-time vision-LiDAR perception. These constraints are encoded as multi-layer costmaps that represent geometric, semantic, directional, and velocity cues and are directly compatible with standard grid-based planners. Simulation and real-world experiments indicate that NORM-Nav improves task success rates and produces trajectories closer to human references than representative baselines. The project website is available at https://ei-nav.github.io/NORM-Nav.
MORN: Metacognitive Object-Goal Regulation for Resource-Rational Long-Horizon Navigation
Robots deployed in unstructured human environments must frequently execute long-horizon missions, such as find the mug, then the chair, then the printer, under strict operational constraints. While contemporary zero-shot Object Navigation (ObjectNav) agents leverage Vision-Language Models (VLMs) to effectively localize semantic targets, they operate as purely reactive systems that inherently lack global resource awareness. Consequently, these agents inadvertently exhaust critical budgets, including time and battery, on infeasible subgoals due to partial observability, failing to balance local exploration with global mission viability. To bridge this gap by injecting resource-rationality into the navigation loop, we present MORN (Metacognitive Object-goal Regulation Navigation), an executive architecture inspired by Dual-Process Theory in cognitive science. MORN augments frozen navigation backbones with a System 2 meta-controller that continuously monitors the System 1 locomotor. By formalizing three neuro-cognitive states, Potentiality Index, Persistence Gating, and Evidence Accumulation, MORN dynamically regulates the mission schedule based on online estimates of progress velocity and perceptual uncertainty. This mechanism effectively neutralizes the Sunk Cost Fallacy, enabling agents to abort zombie goals early and decisively commit to achievable ones. Extensive experiments on the HM3D dataset demonstrate that MORN improves Goal Completion Rate (CR) from 0.23 to 0.30 and reduces Wasted Step Fraction (WSF) from 0.90 to 0.70, establishing that in resource-constrained autonomy, the metacognitive awareness of global resources is as critical as the reactive ability to navigate.
Beyond Safety Filtering: Control Barrier Function-Informed Reinforcement Learning for Connected and Automated Vehicles ITSC 2026
Reinforcement Learning (RL) uses rewards to guide learning, yet reward design is typically hand-crafted using heuristics that can be difficult to tune. We propose a Control Barrier Function (CBF)-informed reward design for Multi-Agent RL (MARL) that converts CBF constraint values under joint MARL actions into a reward signal that explicitly guides safe learning. We compare against two heuristic reward baselines in a four-way multi-lane intersection with connected and automated vehicles. Results show that our method achieves the highest task performance and is less sensitive to reward hyperparameters, yielding consistently strong performance across the tested hyperparameter range. Code for reproducing the experimental results and a video demonstration are available at https://github.com/bassamlab/SigmaRL.
comment: This paper has been accepted for publication in the Proceedings of the 2026 IEEE International Conference on Intelligent Transportation Systems (ITSC 2026)
SADP: Subgoal-Aware Diffusion Policy for Explainable Robots Learned from Foundation Model Generated Demonstrations
Explainable robots require not only successful task execution but also the ability to expose their internal decision-making process in a user-friendly manner. However, most imitation learning methods are trained solely on task-level demonstrations, without explicitly modeling subgoal structure or execution progress. This limitation is further exacerbated by the scarcity of subgoal-level supervision in standard robot learning datasets, which restricts the development of robots that can convey the subtasks they are executing during long-horizon manipulation. To address this issue, this paper proposes Subgoal-Aware Diffusion Policy (SADP), a framework that leverages foundation models to autonomously generate subgoal-annotated demonstrations and trains diffusion policies on these datasets. SADP structures policy execution around human-interpretable subgoals by conditioning action generation on both task-level and subgoal-level descriptions. A lightweight auxiliary head further predicts subgoal completion states, allowing the robot to expose its current execution stage and monitor subgoal progression. Experiments in RLBench simulations and real-world evaluations on a UR5e robot demonstrate that SADP achieves higher task success rates than strong task-conditioned diffusion baselines, while providing subgoal-level execution signals for monitoring progress and diagnosing failures. These results highlight that built-in, rather than post-hoc, interpretability can coexist with high task performance.
SSTL: Self-Sensing Tendon Loop for Hysteresis Modeling and Compensation in Tendon-Sheath Mechanisms
Flexible endoscopic robots enable minimally invasive access through natural orifices, but their control accuracy is limited by configuration-dependent hysteresis in the tendon-sheath mechanisms (TSMs). Tendon-sheath friction and tendon elasticity induce a systematic discrepancy between the proximal actuation input and distal output, and this discrepancy varies with the insertion tube configuration. To address this challenge, this paper proposes the Self-Sensing Tendon Loop (SSTL), a double-pass tendon loop that is routed through the insertion tube, wrapped around a distal pulley, and returned to the proximal end. The loop structure allows both the input and output tensions of the SSTL to be measured proximally, thereby providing an input-output tension profile without requiring distal force or fiber-optic sensors. Because the SSTL shares the same routing path as the actuation TSM, the two TSMs exhibit strongly correlated hysteresis behaviors. From the SSTL tension profile, a learning-based mapping estimates the configuration-dependent hysteresis parameters of the actuation TSM, which are then used by a feedforward controller to compensate for actuation hysteresis. We validate the proposed method by tracking actuation tendon tension under three different insertion tube configurations. Across sinusoidal and random trajectories, the proposed method reduces average RMSE by 88.1% compared with the uncompensated baseline, achieving 97.8% of the performance of direct identification, which requires direct measurement of the input and output tension profile of the actuation TSM.
comment: 8 pages, 7 figures, 4 tables
Plan First, Diffuse Later: Extrinsic Graph Guidance for Long-Horizon Diffusion Planning
Compositional diffusion models offer a promising route to long-horizon planning by denoising multiple overlapping sub-trajectories while ensuring that together they constitute a global solution. However, enforcing local behavior over long chains is often insufficient for a coherent global structure to emerge. Recent works tackle this limitation through intrinsic search, which explores multiple paths during the denoising process. While intrinsic search improves global coherence, it comes at the cost of repeated evaluations of an already compute-heavy model. In this work, we argue that extrinsic search, performed outside the denoising process, offers a more effective mode of exploration for long-horizon planning while naturally enabling the use of classical algorithms to solve unseen combinatorial tasks at test time. Our eXtrinsic search-guided Diffuser (XDiffuser) first computes a plan over a state-space graph -- serving as a lightweight local connectivity oracle for the diffusion model. The plan is then used to guide denoising for a single trajectory, effectively offloading the burden of exploration. XDiffuser outperforms diffusion-based baselines on long-horizon tasks, with particularly large gains in the low-quality data regime and on unseen tasks beyond goal-reaching, including multi-agent coordination and TSP-style reasoning. Project website: https://yanivhass.github.io/XDiffuser-site/
Pedestrian-Aware LLM-Driven Behavioral Planning for Autonomous Vehicles ITSC
Autonomous Vehicles (AVs) must make reliable decisions in dense urban environments where pedestrian behavior is variable, sometimes abnormal, and often unseen during training. Reinforcement learning (RL)-based AV control systems perform well in structured traffic but struggle to generalize to unpredictable pedestrian interactions and out-of-distribution scenarios. Their reliance on handcrafted rewards and opaque decisions further limits their suitability for safety-critical, pedestrian-rich environments. To address these limitations, we introduce a Large Language Model (LLM)-based decision-making framework for pedestrian-aware behavioral planning. The system converts structured scene observations into natural-language reasoning prompts, enabling the LLM to infer pedestrian intent, anticipate risk, and generate cautious tactical driving decisions. These decisions are executed by a motion planner that ensures smooth, kinematically feasible control. We evaluate the framework in SUMO across multiple pedestrian-interaction scenarios, including unexpected jaywalking, turn-back crossing, hesitation, and bidirectional crossing. In zero-shot evaluation, the LLM-based agent achieves a 68% collision-free success rate, substantially outperforming deep RL baselines (17.7%). With few-shot episodic memory in a single-pedestrian scenario, performance increases to 96.0%, exceeding a custom DQN controller (82.0%). Cross-behavior evaluation further shows that memory derived from turn-back interactions transfers to unseen hesitation and bidirectional crossing scenarios, achieving 82.0% and 90.0% success, respectively. The system consistently initiates earlier responses, maintains wider safety buffers, and produces interpretable, human-aligned decisions.
comment: This paper has been accepted for presentation at the 29th IEEE International Conference on Intelligent Transportation Systems (ITSC)
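The abstract above says structured scene observations are converted into natural-language reasoning prompts. A minimal sketch of what such a conversion might look like follows. The `Pedestrian` fields, the action vocabulary, and the prompt wording are all assumptions made for illustration, since the abstract does not give the actual schema or prompt template.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pedestrian:
    # Hypothetical observation fields; the paper's real schema is not given.
    ped_id: str
    distance_m: float        # longitudinal distance ahead of the ego vehicle
    lateral_offset_m: float  # distance from the ego lane center
    speed_mps: float
    heading: str             # e.g. "toward the road", "away from the road"


def build_prompt(ego_speed_mps: float, pedestrians: List[Pedestrian]) -> str:
    """Render a structured scene as a plain-text prompt for an LLM planner."""
    lines = [f"Ego vehicle speed: {ego_speed_mps:.1f} m/s."]
    for p in pedestrians:
        lines.append(
            f"Pedestrian {p.ped_id}: {p.distance_m:.1f} m ahead, "
            f"{p.lateral_offset_m:.1f} m from lane center, "
            f"moving {p.heading} at {p.speed_mps:.1f} m/s."
        )
    # Assumed action set; a real system would match its motion planner's inputs.
    lines.append(
        "Infer each pedestrian's intent, assess collision risk, and answer "
        "with one action: ACCELERATE, MAINTAIN, DECELERATE, or STOP."
    )
    return "\n".join(lines)


prompt = build_prompt(8.0, [Pedestrian("p1", 12.0, 2.5, 1.4, "toward the road")])
```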
"I'm Not Mad, Just Focused": Understanding Human Emotions in Human-Robot Collaboration
Human-robot collaboration (HRC) can benefit from robots' abilities to interpret human emotional states. However, current emotion recognition (ER) models in HRC often fall short, particularly due to their reliance on acted datasets and single-modality inputs like facial expressions. We propose a novel vision language model (VLM)-based ER system that leverages contextual understanding to improve emotion interpretation in HRC. We first evaluate the VLM-ER system by assessing its semantic and sentiment similarity with human annotations on an existing HRC dataset. Then, in a user study with a service robot in a collaborative delivery task, we evaluate the effects of modulating the robot's behaviour based on the user's emotional state inferred by the VLM-ER system. The results show that the proposed VLM-ER system achieves higher semantic similarity and positive sentiment alignment with human annotations compared to a baseline convolutional neural network-based system. Further, participants in the user study preferred emotion-adaptive robot behaviour facilitated by the VLM-ER system.
EgoKit: Towards Unified Low-Cost Egocentric Data Collection with Heterogeneous Devices
Egocentric video is increasingly used as a data source for robot learning, activity understanding, and embodied AI research, but collecting it at scale remains fragmented in practice: each candidate host device, such as an Android phone, iPhone, iPad, smart glasses, or extended reality (XR) headset, exposes a different SDK, a different policy on raw camera access, and different limitations on external USB cameras and on-device tracking. Synchronized ego-view and wrist-view capture is therefore typically obtained by either committing to a single proprietary platform or building one-off rigs that do not transfer across devices. To address this gap, we present EgoKit, a toolkit that exposes the same egocentric recording workflow across six heterogeneous host devices. Across all supported devices, EgoKit presents the same recording interaction and produces locally stored video with a uniform log format; on XR headsets, it additionally logs head pose and OpenXR-standard 26-joint hand tracking aligned to the video streams. The companion accessories, including two wrist cameras with mounts, a head strap, and a USB-C hub, add wrist-view capture to any supported host without custom hardware fabrication. EgoKit is available at https://egokit.chuange.org/.
LACE: Latent Visual Representation for Cross-Embodiment Learning
Cross-embodiment learning from human demonstrations is hindered by the visual gap between human and robot embodiments. While self-supervised learning (SSL) backbones encode rich inter-class semantics of general objects, we show they fail to establish correspondence between human and robot hands. We propose LACE, a framework that aligns human and robot visual representations in the latent space of these backbones by leveraging correspondences between shared body parts across embodiments as sparse supervision. These annotations can be automatically obtained via forward kinematics, and a single robot demonstration is sufficient to train the model. Our semantic alignment loss matches distributions incurred by corresponding features, lifting patch-level supervision to semantic-level alignment, while a Gram loss preserves pretrained feature quality. This alignment enables robot policies to leverage abundant human data when robot demonstrations are scarce: in zero-shot transfer, policies using LACE-DINO outperform those using DINO by a large margin (65%), with consistent gains in low-data regimes and out-of-distribution environments.
DriveSafer: End-to-End Autonomous Driving with Safety Guidance
End-to-End (E2E) autonomous driving models have shown growing capability in recent years, with performance improving on increasingly challenging benchmarks. However, modern generative E2E planners still suffer from a substantial number of catastrophic failures in safety-critical scenarios. We find that many such failures arise from violations of physical constraints and safety requirements, leading to unsafe behavior. Motivated by this finding, in this paper, we focus on improving safety outcomes in generative end-to-end driving with a targeted reduction of catastrophic planning failures, instead of enhancing average planning quality. Towards this end, we propose DriveSafer, a failure-aware safety framework for end-to-end planners. DriveSafer explicitly steers generative planners towards safe behaviors leveraging both training-time safety constraints and inference-time safety guidance. Compared to the state-of-the-art DiffusionDrive model, on the NAVSIM benchmark, DriveSafer reduces the number of catastrophic failures (PDMS=0) by 48%, with over 65% reduction in drivable-area compliance failures.
DynoSLAM: Dynamic SLAM with Generative Graph Neural Networks for Real-World Social Navigation
Traditional Simultaneous Localization and Mapping (SLAM) algorithms rely heavily on the static environment assumption, which severely limits their applicability in real-world spaces populated by moving entities, such as pedestrians. In this work, we propose DynoSLAM, a tightly-coupled Dynamic GraphSLAM architecture that integrates socially-aware Graph Neural Networks (GNNs) directly into the factor graph optimization. Unlike conventional approaches that use rigid constant-velocity heuristics or deterministic single-agent neural priors, our framework formulates pedestrian motion forecasting as a stochastic World Model. By utilizing Monte Carlo rollouts from a trained GNN, we capture the multimodal epistemic uncertainty of human interactions and embed it into the SLAM graph via a dynamic Mahalanobis distance factor. We demonstrate through extensive simulated experiments that this stochastic formulation not only maintains highly accurate retrospective tracking but also prevents the optimization failures caused by the deterministic "argmax problem". Ultimately, extracting the empirical mean and covariance matrices of future pedestrian states provides a mathematically rigorous, probabilistic safety envelope for downstream local planners, enabling anticipatory and collision-free robot navigation in densely crowded environments.
comment: Code & Project page at https://github.com/makriot/dynoslam
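The DynoSLAM abstract above describes extracting an empirical mean and covariance from Monte Carlo rollouts and using them in a Mahalanobis distance factor. The sketch below shows that computation for 2D positions in plain Python. The hard-coded `rollouts` list stands in for samples from a trained GNN, and the function names and the small regularizer `eps` are assumptions for illustration, not the project's actual code.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Cov2 = Tuple[Tuple[float, float], Tuple[float, float]]


def empirical_mean_cov(samples: List[Point]) -> Tuple[Point, Cov2]:
    """Sample mean and unbiased (n - 1) covariance of 2D points."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples for a covariance estimate")
    mx = sum(p[0] for p in samples) / n
    my = sum(p[1] for p in samples) / n
    sxx = sum((p[0] - mx) ** 2 for p in samples) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in samples) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in samples) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mahalanobis_sq(x: Point, mean: Point, cov: Cov2, eps: float = 1e-9) -> float:
    """Squared Mahalanobis distance (x - mean)^T cov^{-1} (x - mean)."""
    a = cov[0][0] + eps  # eps on the diagonal guards against a singular matrix
    b = cov[0][1]
    d = cov[1][1] + eps
    det = a * d - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Closed-form 2x2 inverse: (1 / det) * [[d, -b], [-b, a]]
    return (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det


# Assumed stand-in for predicted pedestrian positions from several rollouts.
rollouts: List[Point] = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
mean, cov = empirical_mean_cov(rollouts)
residual = mahalanobis_sq((3.0, 1.0), mean, cov)
```

A factor built this way penalizes a candidate pedestrian state less along directions where the rollouts disagree, which is how sample spread can enter the optimization as uncertainty.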
Efficient Trajectory Optimization for Autonomous Racing via Formula-1 Data-Driven Initialization
Trajectory optimization is a central component of fast and efficient autonomous racing. However, practical optimization pipelines remain highly sensitive to initialization and may converge slowly or to suboptimal local solutions when seeded with heuristic trajectories such as the centerline or minimum-curvature paths. To address this limitation, we leverage expert driving behavior as an initialization prior and propose a learning-informed initialization strategy based on real-world Formula 1 telemetry. To this end, we first construct a multi-track Formula 1 trajectory dataset by reconstructing and aligning noisy GPS telemetry to a standardized reference-line representation across 17 tracks. Building on this, we present a neural network that predicts an expert-like raceline offset directly from local track geometry, without explicitly modeling vehicle dynamics or forces. The predicted raceline is then used as an informed seed for a minimum-time optimal control solver. Experiments on all 17 tracks demonstrate that the learned initialization accelerates solver convergence and significantly reduces runtime compared to traditional geometric baselines, while preserving the final optimized lap time.
Adaptive Control in Autonomous Driving via Real-Time Recurrent RL
We study online fine-tuning of pretrained control policies for autonomous driving using Real-Time Recurrent Reinforcement Learning (RTRRL), a memory-efficient algorithm that updates policy parameters at every time step without backpropagation through time. We extend RTRRL to support LrcSSM, a recently proposed nonlinear diagonal state-space model, and combine offline behavioral cloning with online RTRRL fine-tuning to adapt policies to distribution shifts at deployment. We validate the approach in the CarRacing simulation and on a 1:10-scale RoboRacer platform equipped with an event camera, where a pretrained policy is fine-tuned online during real-world line-following. To our knowledge, this is the first demonstration of online RL fine-tuning with event-camera observations on standard (non-spiking) hardware in closed-loop control. LrcSSM-based policies improve fastest and most consistently across both settings.
Action-Gradient Monte Carlo Tree Search for Non-Parametric Continuous (PO)MDPs
Online planning in continuous state, action, and observation spaces remains challenging for autonomous systems. While Monte Carlo Tree Search (MCTS) scales effectively via sampling, most continuous (PO)MDP solvers do not exploit gradient-based action optimization. We propose Action-Gradient MCTS (AGMCTS), a framework that combines global tree search with local gradient-based action refinement, while maintaining consistent value estimates. We provide three key theoretical contributions: (1) an action score gradient theorem for particle belief states; (2) the Multiple Importance Sampling (MIS) Tree that supports frequent action-branch updates by reusing prior samples without introducing estimator drift; and (3) tractable action score gradients for smooth generative models using the Area Formula. Empirical results demonstrate that AGMCTS outperforms state-of-the-art sample-based solvers in multiple challenging continuous MDP and POMDP benchmarks.
Unleashing the Potential of Diffusion Models for End-to-End Autonomous Driving
Diffusion models have become a popular choice for decision-making tasks in robotics, and more recently, are also being considered for solving autonomous driving tasks. However, their applications and evaluations in autonomous driving remain limited to simulation-based or laboratory settings. The full strength of diffusion models for large-scale, complex real-world settings, such as End-to-End Autonomous Driving (E2E AD), remains underexplored. In this study, we conducted a systematic and large-scale investigation to unleash the potential of the diffusion models as planners for E2E AD, based on a tremendous amount of real-vehicle data and road testing. Through comprehensive and carefully controlled studies, we identify key insights into the diffusion loss space, trajectory representation, and data scaling that significantly impact E2E planning performance. Moreover, we also provide an effective reinforcement learning post-training strategy to further enhance the safety and robustness of the learned planner. The resulting diffusion-based learning framework, Hyper Diffusion Planner (HDP), is deployed on a real-vehicle platform and evaluated across 6 urban driving scenarios and 200 km of real-world testing, achieving a notable 10x performance improvement over the base model. Our work demonstrates that diffusion models, when properly designed and trained, can serve as effective and scalable E2E AD planners for complex, real-world autonomous driving tasks.
DECODE: Domain-aware Continual Domain Expansion for Motion Prediction
Motion prediction is critical for autonomous vehicles to effectively navigate complex environments and accurately anticipate the behaviors of other traffic participants. As autonomous driving continues to evolve, the need to assimilate new and varied driving scenarios necessitates frequent model updates through retraining. To address these demands, we introduce DECODE, a novel continual learning framework that begins with a pre-trained generalized model and incrementally develops specialized models for distinct domains. Unlike existing continual learning approaches that attempt to develop a unified model capable of generalizing across diverse scenarios, DECODE uniquely balances specialization with generalization, dynamically adjusting to real-time demands. The proposed framework leverages a hypernetwork to generate model parameters, significantly reducing storage requirements, and incorporates a normalizing flow mechanism for real-time model selection based on likelihood estimation. Furthermore, DECODE merges outputs from the most relevant specialized and generalized models using deep Bayesian uncertainty estimation techniques. This integration ensures optimal performance in familiar conditions while maintaining robustness in unfamiliar scenarios. Extensive evaluations confirm the effectiveness of the framework, achieving a notably low forgetting rate of 0.044 and an average minADE of 0.584 m, significantly surpassing traditional learning strategies and demonstrating adaptability across a wide range of driving conditions.
comment: This work has been published in IEEE TPAMI Early Access
Gesture First, LLM-Assisted Voice Complement: Exploring Multimodal Robot 'Puppeteer' Teleoperation Via Virtual Counterpart in Augmented Reality
Robot teleoperation via augmented reality (AR) offers a promising path toward more intuitive human-robot interaction (HRI). We present a head-mounted AR 'puppeteer' system in which users control a physical robot by interacting with its virtual counterpart robot using large language model (LLM)-assisted voice commands and hand-gesture interaction on the Meta Quest 3. In a within-subject user study with 42 participants performing an AR-based robotic pick-and-place pattern-matching task, we empirically compare two interaction conditions: gesture-only (GO) and combined voice+gesture (VG) on performance and user experience (UX). In VG, voice and gesture operate in a sequential role-allocated manner, with voice handling high-level navigation and gesture handling fine manipulation. Our results show that GO currently provides more reliable and efficient control for this time-critical task, while VG introduces additional flexibility but also latency and recognition issues that can increase workload. We additionally analyze how prior robotics expertise differentiates performance and UX across conditions. Based on these findings, we distill a set of design guidelines for AR 'puppeteer' metaphoric robot teleoperation, framing multimodality as an adaptive strategy that must balance efficiency, robustness, and user expertise rather than assuming that additional modalities are universally beneficial.
comment: This work is under peer review
cuNRTO: GPU-Accelerated Nonlinear Robust Trajectory Optimization
Robust trajectory optimization enables autonomous systems to operate safely under uncertainty by computing control policies that satisfy the constraints for all bounded disturbances. However, these problems often lead to large Second Order Conic Programming (SOCP) constraints, which are computationally expensive. In this work, we propose the CUDA Nonlinear Robust Trajectory Optimization (cuNRTO) framework by introducing two dynamic optimization architectures that have direct application to robust decision-making and are implemented on CUDA. The first architecture, NRTO-DR, leverages the Douglas-Rachford (DR) splitting method to solve the SOCP inner subproblems of NRTO, thereby significantly reducing the computational burden through parallel SOCP projections and sparse direct solves. The second architecture, NRTO-FullADMM, is a novel variant that further exploits the problem structure to improve scalability using the Alternating Direction Method of Multipliers (ADMM). Finally, we provide GPU implementations of the proposed methodologies using custom CUDA kernels for SOC projection steps and cuBLAS GEMM chains for feedback gain updates. We validate the performance of cuNRTO through simulated experiments on unicycle, quadcopter, and Franka manipulator models, demonstrating speedups of up to 139.6$\times$. More details are available at https://cunrto.github.io.
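The cuNRTO abstract above mentions parallel SOC projection steps. For readers unfamiliar with that primitive, the sketch below gives the standard closed-form Euclidean projection onto a second-order cone, written in plain Python for readability instead of as a CUDA kernel. The function name `project_soc` is hypothetical, and this shows only the textbook projection, not the authors' kernels or their splitting scheme.

```python
import math
from typing import List, Tuple


def project_soc(x: List[float], t: float) -> Tuple[List[float], float]:
    """Project the point (x, t) onto the cone {(x, t) : ||x||_2 <= t}."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= t:
        # Already inside the cone: the projection is the point itself.
        return list(x), t
    if norm <= -t:
        # Inside the polar cone: the nearest cone point is the origin.
        return [0.0] * len(x), 0.0
    # Otherwise the point lands on the cone boundary.
    scale = (norm + t) / (2.0 * norm)
    return [scale * v for v in x], (norm + t) / 2.0


inside = project_soc([3.0, 4.0], 10.0)
origin = project_soc([3.0, 4.0], -10.0)
boundary = project_soc([3.0, 4.0], 0.0)
```

Each projection touches only its own cone block, so many of them can run independently, which is what makes this step a natural fit for one-thread-block-per-cone GPU execution.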
SuReNav: Superpixel Graph-based Constraint Relaxation for Navigation in Over-constrained Environments ICRA 2026
We address the over-constrained planning problem in semi-static environments. The planning objective is to find a best-effort solution that avoids all hard constraint regions while minimally traversing the least risky areas. Conventional methods often rely on pre-defined area costs, limiting generalization. Further, the spatial continuity of navigation spaces makes it difficult to identify regions that are passable without overestimation. To overcome these challenges, we propose SuReNav, a superpixel graph-based constraint relaxation and navigation method that imitates human-like safe and efficient navigation. Our framework consists of three components: 1) superpixel graph map generation with regional constraints, 2) regional-constraint relaxation using a graph neural network trained on human demonstrations for safe and efficient navigation, and 3) interleaving relaxation, planning, and execution for complete navigation. We evaluate our method against state-of-the-art baselines on 2D semantic maps and 3D maps from OpenStreetMap, achieving the highest human-likeness score of complete navigation while maintaining a balanced trade-off between efficiency and safety. We finally demonstrate its scalability and generalization performance in real-world urban navigation with a quadruped robot, Spot. Code and Videos are available at https://sure-nav.github.io/.
comment: Accepted by ICRA 2026. Code and videos are available at https://sure-nav.github.io/
Beyond Policy Optimization: A Data Curation Flywheel for Sparse-Reward Long-Horizon Planning
Large Language Reasoning Models have demonstrated remarkable success on static tasks, yet their application to multi-round agentic planning in interactive environments faces two fundamental challenges. First, the intractable credit assignment problem renders conventional reinforcement learning ineffective in sparse-reward settings. Second, the computational overhead of verbose, step-by-step reasoning histories is prohibitive. To address these challenges, we propose BPO, a three-stage framework (bootstrapping, extrapolation, and refinement) that establishes a self-improving data flywheel to develop robust reasoning models for long-horizon, sparse-reward environments. Our framework first bootstraps efficient reasoning using the proposed planning quaternions with long-short chain-of-thought fusion. It then extrapolates to out-of-distribution tasks through complexity-stratified curriculum learning. Finally, the model iteratively refines itself by learning exclusively on experiences selected via reward-gated rejection sampling. Experiments on ALFWorld, ScienceWorld, and WebShop demonstrate that our approach achieves state-of-the-art performance with significant token efficiency, providing a new recipe for reasoning models in agentic planning.
Efficient Emotion-Aware Iconic Gesture Prediction for Robot Co-Speech
Co-speech gestures increase engagement and improve speech understanding. Most data-driven robot systems generate rhythmic beat-like motion, yet few integrate semantic emphasis. To address this, we propose a lightweight transformer that derives iconic gesture placement and intensity from text and emotion alone, requiring no audio input at inference time. The model outperforms GPT-4o in both semantic gesture placement classification and intensity regression on the BEAT2 dataset, while remaining computationally compact and suitable for real-time deployment on embodied agents.
FASTER: Rethinking Real-Time Flow VLAs FAST
Real-time execution is crucial for deploying Vision-Language-Action (VLA) models in the physical world. Existing asynchronous inference methods primarily optimize trajectory smoothness, but neglect the critical latency in reacting to environmental changes. By rethinking the notion of reaction in action chunking policies, this paper presents a systematic analysis of the factors governing reaction time. We show that reaction time follows a uniform distribution determined jointly by the Time to First Action (TTFA) and the execution horizon. Moreover, we reveal that the standard practice of applying a constant schedule in flow-based VLAs can be inefficient and forces the system to complete all sampling steps before any movement can start, forming the bottleneck in reaction latency. To overcome this issue, we propose Fast Action Sampling for ImmediaTE Reaction (FASTER). By introducing a Horizon-Aware Schedule, FASTER adaptively prioritizes near-term actions during flow sampling, compressing the denoising of the immediate reaction tenfold (e.g., in $π_{0.5}$ and X-VLA) into a single step, while preserving the quality of the long-horizon trajectory. Coupled with a streaming client-server pipeline, FASTER substantially reduces the effective reaction latency on real robots, especially when deployed on consumer-grade GPUs. Real-world experiments, including a highly dynamic table tennis task, prove that FASTER unlocks substantially improved real-time responsiveness for generalist policies, enabling rapid generation of accurate and smooth trajectories.
comment: Project page: https://innovator-zero.github.io/FASTER
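The FASTER abstract above states that reaction time is uniformly distributed and determined by the Time to First Action (TTFA) and the execution horizon, but it does not give the bounds. The sketch below works through one simple model that produces such a distribution. It assumes, purely for illustration, that an environmental change arrives at a uniformly random moment within an execution horizon, is only picked up at the next inference start, and is first acted on one TTFA later. Under that assumption the reaction time is uniform on [TTFA, TTFA + horizon]. The paper's actual derivation may differ.

```python
import random
from typing import List


def reaction_time(event_offset: float, ttfa: float, horizon: float) -> float:
    """Reaction delay for an event occurring `event_offset` s into a chunk.

    Assumed model: wait for the current chunk to finish, then wait one TTFA.
    """
    if not 0.0 <= event_offset <= horizon:
        raise ValueError("event_offset must lie within one execution horizon")
    return (horizon - event_offset) + ttfa


def expected_reaction_time(ttfa: float, horizon: float) -> float:
    """Mean of a uniform distribution on [ttfa, ttfa + horizon]."""
    return ttfa + horizon / 2.0


def simulate(ttfa: float, horizon: float, n: int = 100_000, seed: int = 0) -> List[float]:
    """Monte Carlo check of the assumed model with uniformly timed events."""
    rng = random.Random(seed)
    return [reaction_time(rng.random() * horizon, ttfa, horizon) for _ in range(n)]


# Hypothetical numbers: 0.1 s TTFA and a 0.5 s execution horizon.
samples = simulate(ttfa=0.1, horizon=0.5)
mean_reaction = sum(samples) / len(samples)
```

Under this model, lowering TTFA shifts the whole distribution down, while shortening the execution horizon narrows it, which is consistent with the abstract's emphasis on getting the first action out quickly.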
VISOR: A Vision-Language Model-based Test Oracle for Testing Robots
Testing robots requires assessing whether they perform their intended tasks correctly, dependably, and with high quality, a challenge known as the test oracle problem in software testing. Traditionally, this assessment relies on task-specific symbolic oracles for task correctness and on human manual evaluation of robot behavior, which is time-consuming, subjective, and error-prone. To address this, we propose VISOR, a Vision-Language Model (VLM)-based approach for automated test oracle assessment that eliminates the need for expensive human evaluations. VISOR performs automated evaluation of task correctness and quality, addressing the limitations of existing symbolic test oracles, which are task-specific and provide pass/fail judgments without explicitly quantifying task quality. Given the inherent uncertainty in VLMs, VISOR also explicitly quantifies its own uncertainty during test assessments. We evaluated VISOR using two VLMs, i.e., GPT and Gemini, across four robotic tasks on over 1,000 videos. Results show that Gemini achieves higher recall while GPT achieves higher precision. However, both models show low correlation between uncertainty and correctness, which prevents using uncertainty as a correctness predictor.
Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning
Embodied Chain-of-Thought (CoT) reasoning has significantly enhanced Vision-Language-Action (VLA) models, yet current methods rely on rigid templates to specify reasoning primitives (e.g., objects in the scene, high-level plans, structural affordances). These templates can force policies to process irrelevant information that distracts from critical action-prediction signals. This creates a bottleneck: without successful policies, we cannot verify reasoning quality; without quality reasoning, we cannot build robust policies. We introduce R&B-EnCoRe, which enables models to bootstrap embodied reasoning from internet-scale knowledge through self-supervised refinement. By treating reasoning as a latent variable within importance-weighted variational inference, models can generate and distill a refined reasoning training dataset of embodiment-specific strategies without external rewards, verifiers, or human annotation. We validate R&B-EnCoRe across manipulation (Franka Panda in simulation, WidowX in hardware), legged navigation (bipedal, wheeled, bicycle, quadruped), and autonomous driving embodiments using various VLA architectures with 1B, 4B, 7B, and 30B parameters. Our approach achieves 28% gains in manipulation success, 101% improvement in navigation scores, and 21% reduction in collision-rate metric over models that indiscriminately reason about all available primitives. R&B-EnCoRe enables models to distill reasoning that is predictive of successful control, bypassing manual annotation engineering while grounding internet-scale knowledge in physical execution.
comment: Robotics: Science and Systems (RSS) 2026
TACO: Temporal Consensus Optimization for Continual Neural Mapping
Neural implicit mapping has emerged as a powerful paradigm for robotic navigation and scene understanding. However, real-world robotic deployment requires continual adaptation to changing environments under strict memory and computation constraints, which existing mapping systems fail to support. Most prior methods rely on replaying historical observations to preserve consistency and assume static scenes. As a result, they cannot adapt to continual learning in dynamic robotic settings. To address these challenges, we propose TACO (TemporAl Consensus Optimization), a replay-free framework for continual neural mapping. We reformulate mapping as a temporal consensus optimization problem, where we treat past model snapshots as temporal neighbors. Intuitively, our approach resembles a model consulting its own past knowledge. We update the current map by enforcing weighted consensus with historical representations. Our method allows reliable past geometry to constrain optimization while permitting unreliable or outdated regions to be revised in response to new observations. TACO achieves a balance between memory efficiency and adaptability without storing or replaying previous data. Through extensive simulated and real-world experiments, we show that TACO robustly adapts to scene changes, and consistently outperforms other continual learning baselines.
comment: In: Robotics: Science and Systems (RSS 2026)
Towards Long-Lived Robots: Continual Learning VLA Models via Reinforcement Fine-Tuning
Pretrained on large-scale and diverse datasets, VLA models demonstrate strong generalization and adaptability as general-purpose robotic policies. However, Supervised Fine-Tuning (SFT), which serves as the primary mechanism for adapting VLAs to downstream domains, requires substantial amounts of task-specific data and is prone to catastrophic forgetting. To address these limitations, we propose LifeLong-RFT, a simple yet effective Reinforcement Fine-Tuning (RFT) strategy for VLA models independent of online environmental feedback and pre-trained reward models. By integrating chunking-level on-policy reinforcement learning with the proposed multi-dimensional process reward mechanism, LifeLong-RFT quantifies the heterogeneous contributions of intermediate action chunks across three dimensions to facilitate policy optimization. Specifically, (1) the Quantized Action Consistency Reward (QACR) ensures accurate action prediction within the discrete action space; (2) the Continuous Trajectory Alignment Reward (CTAR) aligns decoded continuous action chunks with reference trajectories to ensure precise control; (3) the Format Compliance Reward (FCR) guarantees the structural validity of outputs. Comprehensive experiments across SimplerEnv, LIBERO, and real-world tasks demonstrate that LifeLong-RFT exhibits strong performance in multi-task learning. Furthermore, for continual learning on the LIBERO benchmark, our method achieves a 22% gain in average success rate over SFT, while effectively adapting to new tasks using only 20% of the training data. Overall, our method provides a promising post-training paradigm for VLAs. The project page is available at .
Multiagent Systems
Multi-LLM Systems Exhibit Robust Semantic Collapse
Whether machines can originate novel content has been debated for nearly two centuries, from Lovelace's assertion that no engine can "originate anything" to Turing's question of whether a machine can amplify ideas brought in from outside. Multi-large language model (LLM) systems, increasingly deployed for autonomous generation, reopen this question empirically. Here we show that such systems, operating in closed loops, exhibit semantic collapse: systematic convergence in semantic representations despite apparent lexical variation. Across model families and extended simulations of 200 to 1,000 rounds, the pattern remains consistent. Twelve intervention strategies, spanning decoding parameters, prompt design, agent composition, activation engineering, and reinforcement learning, fail to restore semantic diversity. Mechanistic analyses suggest that semantic collapse is not explained by alignment or conformity biases, but is consistent with intrinsic properties of autoregressive generation. Our results point to fundamental constraints in the ability of multi-LLM systems to sustain open-ended knowledge production in closed-loop settings.
comment: 64 pages, 8 figures, 7 tables; includes Supplementary Information
Responsible Agentic AI Requires Explicit Provenance
Agentic AI is rapidly proliferating across diverse real-world domains such as software engineering, yet public trust has not kept pace. The central reason is that responsibility, despite being widely discussed, remains a subjective and unenforced concept, as no current agentic framework produces the quantifiable, traceable, and interventionable provenance needed to assign it when harm emerges from compositions no single party designed. We position that what is missing is not better benchmark-level evaluation but $\textbf{explicit provenance}$ across the full agentic lifecycle, which is the only viable basis for making responsibility computable and actionable. We advance this agenda along four axes: establishing $\textit{why}$ such provenance is a structural necessity by identifying responsibility gaps across sociotechnical dimensions, formalizing $\textit{what}$ it must encode through a causal attribution function and responsibility tensor, discussing $\textit{how}$ it can be made computable across four lifecycle layers with preliminary experiments showing that provenance is estimable and interveneable online before irreversible harm accumulates, and examining $\textit{who}$ bears responsibility through a concrete agentic incident. Explicit provenance is not a discretionary refinement but the necessary condition for responsible agentic AI, and no stakeholder across its ecosystem can afford to treat it as optional.
comment: Under Review
MADP: A Multi-Agent Pipeline for Sustainable Document Processing with Human-in-the-Loop
Document processing automation remains a critical challenge in enterprise environments, where traditional manual approaches are labor-intensive and error-prone. We present MADP, a multi-agent architecture that addresses the challenge of automating document processing in enterprise settings by combining deep learning-based classification and parsing with large language model extraction, while maintaining accuracy through selective human validation. Our system integrates five specialized agents--Classificator, Splitter, Parser, Extraction, and Validator--with a Human-in-the-Loop (HITL) mechanism and a novel Prompt Fine Tuning with Feedback Inheritance (PFTFI) approach. The operational analysis on a production use-case scenario of 100,000 invoices per year indicates a potential reduction of Full-Time Equivalent (FTE) requirements by approximately 70%. Production deployment on 955 real-world documents processed through January 2026 achieves a 97.0% full-pipeline automation rate, with only 3% requiring non-AI fallback. Ablation evaluation on a stratified 100-document subset (5 documents for each of 20 supplier/document-type categories) demonstrates that the full MADP configuration with Human-in-the-Loop supervision attains 98.5% document-level accuracy. Additionally, we present a comprehensive sustainability analysis showing that our hybrid AI+HITL approach reduces CO2 emissions by 69%, energy consumption by 69%, and water usage by 63% compared to traditional manual processing. Benchmark comparisons of multiple LLM backends (Granite-Docling, Mistral-Small, DeepSeek-OCR) provide practical insights for deployment in production environments.
comment: 18 pages, 5 figures
S-Bus: Automatic Read-Set Reconstruction for Multi-Agent LLM State Coordination
Concurrent LLM agents sharing mutable natural-language state produce Structural Race Conditions (SRCs): write-write and cross-shard stale-read conflicts that silently corrupt agent output. Existing multi-agent frameworks (LangGraph, CrewAI, AutoGen) provide no write-ownership semantics over shared state. We present S-Bus, an HTTP middleware whose central mechanism is a server-side DeliveryLog: a per-agent log of HTTP GET operations that automatically reconstructs each agent's read set at commit time without agent SDK changes under HTTP/1.1. The consistency property the DeliveryLog provides -- Observable-Read Isolation (ORI), a partial causal consistency over the HTTP-observable projection of the read set -- prevents structural race conditions when agents collaborate via shared shards. Three contributions: (C1) The DeliveryLog mechanism for automatic HTTP-traffic-based read-set reconstruction, with three-tier mechanised evidence: ReadSetSoundness and ORICommitSafety machine-checked in TLAPS (modulo one retained typing axiom); exhaustive TLC at N=3 (20,763,484 distinct states, zero violations); Dafny discharges 9 inductive soundness lemmas. (C2) Empirical structural-conflict prevention parity against PostgreSQL 17 SERIALIZABLE and Redis 7 WATCH/MULTI on shared-shard contention sweeps with 427,308 active HTTP-409 conflicts: zero Type-I corruptions across all three backends. (C3) ORI's operating envelope is topology-conditional: semantically neutral in dedicated-shard workloads; harmful in single-shard collaborative writing because preservation propagates concurrent contradictions. Source code: https://github.com/sajjadanwar0/sbus
comment: 24 pages, 23 tables. Code, formal proofs, and experimental harness available at: https://github.com/sajjadanwar0/sbus
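The read-set idea in the S-Bus abstract above can be illustrated with a toy, in-memory sketch. It is not the S-Bus implementation: the class name `DeliveryLogStore`, the integer shard versions, the blind-write rule, and the log-clearing policy are all assumptions made for illustration, and the only details taken from the abstract are the server-side per-agent log of reads and the HTTP-409 conflict response.

```python
from collections import defaultdict


class DeliveryLogStore:
    """Toy shard store with a server-side, per-agent log of reads (hypothetical)."""

    def __init__(self):
        self.shards = {}                       # shard key -> (version, text)
        self.delivery_log = defaultdict(dict)  # agent id -> {shard key: version seen}

    def get(self, agent, key):
        # Each read is recorded on the server, so the agent needs no SDK changes.
        version, text = self.shards.get(key, (0, ""))
        self.delivery_log[agent][key] = version
        return text

    def commit(self, agent, key, new_text):
        # Reconstruct the agent's read set from the log at commit time.
        read_set = dict(self.delivery_log[agent])
        # Assumption: a blind write (target never read) only succeeds on a fresh shard.
        read_set.setdefault(key, 0)
        stale = any(self.shards.get(k, (0, ""))[0] != seen
                    for k, seen in read_set.items())
        self.delivery_log[agent].clear()
        if stale:
            return 409  # conflict: something this agent read or targets has changed
        version = self.shards.get(key, (0, ""))[0]
        self.shards[key] = (version + 1, new_text)
        return 200


store = DeliveryLogStore()
store.get("agent_a", "plan")
store.get("agent_b", "plan")
first = store.commit("agent_a", "plan", "plan v1")          # accepted
second = store.commit("agent_b", "summary", "based on v0")  # cross-shard stale read
store.get("agent_b", "plan")
third = store.commit("agent_b", "summary", "based on v1")   # accepted after re-read
print(first, second, third)
```

In this sketch the second commit is rejected even though it targets a different shard, because the agent's logged read of `plan` is out of date, which is the cross-shard stale-read case the abstract describes.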
PyraVid: Hierarchical Multimodal Memory for Long-Horizon Video Reasoning
Memory has become an increasingly important component of agentic systems, as these systems are expected to reason over long-term experience. However, prior work has largely focused on unimodal memory, leaving multimodal memory relatively underexplored despite its central role in real-world applications. Compared with unimodal settings, multimodal memory introduces additional challenges, including heterogeneous input integration, person-centric information alignment, and evidence aggregation across different granularities. We present PyraVid, a hierarchical multimodal memory framework inspired by Event Segmentation Theory from cognitive science. PyraVid organizes long videos into a coarse-to-fine pyramid structure, enabling structured memory access and effective evidence aggregation. It further supports structure-guided memory expansion with pruning, allowing the retrieval of related events with strong causal connectivity but low semantic similarity while reducing noise. Experiments on multiple long-video understanding benchmarks show that PyraVid consistently improves performance across datasets, model scales, and question types, highlighting the effectiveness of hierarchical multimodal memory for long-horizon reasoning.
Reliability and Effectiveness of Autonomous AI Agents in Supply Chain Management
This paper studies autonomous generative AI agents in multi-echelon supply chains using the MIT Beer Game. We identify four inference-time levers that shape performance: model selection, policies and guardrails, centralized data sharing, and prompt engineering. Model capability is the dominant factor: an out-of-the-box reasoning model exceeds human-level performance, and optimized reasoning models reduce costs by up to 67% relative to human teams. However, strong average performance masks substantial reliability risks. We introduce the agent bullwhip effect, the amplification of decision unreliability across echelons, manifesting along two dimensions: decision variance increases both across facilities at the same point in time and within the same facility across time. We develop a mathematical framework showing that this phenomenon is inherent to multi-agent systems that involve coordination and information delays, and we demonstrate that repeated sampling fails to meaningfully reduce it. To address this limitation, we propose a Group Relative Policy Optimization (GRPO)-based reinforcement-learning post-training framework that trains a shared base LLM using system-level supply-chain rewards. GRPO post-training substantially reduces tail events, curtails agent bullwhip, and improves the reliability of autonomous supply-chain agents.
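The group-relative step that gives GRPO its name can be sketched in a few lines. This shows the commonly used normalization (each sampled rollout's reward minus the group mean, divided by the group standard deviation); the abstract does not state the exact form the authors use. The reward definition (negative total supply-chain cost), the population standard deviation, and the `eps` term are assumptions made for illustration.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of one group of rollouts for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population standard deviation
    # eps avoids division by zero when every rollout earns the same reward
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical system-level rewards: negative total cost of four sampled rollouts
rewards = [-120.0, -95.0, -300.0, -110.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Rollouts with below-average cost receive positive advantages and the high-cost tail event receives a strongly negative one, so the policy update pushes probability mass away from the rollouts that produce tail events.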
Lifelong LaCAM with Local Guidance for Lifelong MAPF
Local guidance has recently proven to be a powerful driver of empirical performance in real-time, suboptimal multi-agent pathfinding (MAPF), improving the scalable configuration-based solver LaCAM. By injecting informative spatiotemporal cues around each agent, local guidance mitigates congestion, reduces waiting, and remains scalable enough even with tight time budgets, yielding state-of-the-art performance for one-shot MAPF. This study asks whether the same benefits can be lifted to the lifelong setting (LMAPF), where tasks arrive continuously and improvements in per-step plans can increase task completion throughput over long horizons. We propose LLLG, a Lifelong version of LaCAM enhanced with Local Guidance, which employs a receding-horizon windowed planning framework and warm-starts guidance from the previous solution at each timestep. Our method scales effectively, maintains high throughput even in compact, dense environments, and surpasses existing planners, thereby pushing the frontier of real-time, lifelong MAPF.
comment: 10 pages, 11 figures, accepted to SoCS 2026
Dynamic Deployment of Mobile Charging Trucks During Natural Disaster Evacuation: An Offline-to-Online Framework
During large-scale evacuations, concentrated electric vehicle (EV) charging demand can overload fixed charging stations (FCSs), leading to prolonged waiting time and increased risk exposure. To address this challenge, this study proposes dynamically deploying mobile charging trucks (MCTs) to complement FCSs, and develops an Adaptive Risk-aware MCT Deployment (ARMD) framework for real-time operation. It divides the MCT deployment into two problems: risk-aware allocation of MCTs among FCSs and dynamic routing of MCTs to the assigned FCSs, and solves them under an offline-to-online paradigm. The resource allocation problem is formulated as a decentralized partially observable Markov decision process, and a multi-agent proximal policy optimization (MAPPO)-based policy is developed to coordinate multiple MCTs under decentralized observations. The policy is pre-trained offline in an evacuation simulator and adaptively refined online according to current evacuation context. For routing, a spatio-temporal travel time predictor is developed to support rolling-horizon route updates. The proposed framework is evaluated in a simulated hurricane evacuation environment built using real-world data from Hillsborough County, Florida. Experiments show that ARMD consistently outperforms offline optimization, online heuristic dispatch, and rolling-horizon optimization in reducing risk exposure. For demand perturbation scenarios, ARMD reduces average risk exposure by up to 71.1%, relative to the baseline without MCTs. In the case of fixed e-vehicle charging infrastructure or road link failures, ARMD achieves 39.3% to 60.5% reduction in average risk exposure, with its advantages becoming more pronounced as the severity of disruption increases. These results demonstrate the effectiveness and robustness of ARMD in enhancing mobile charging operations for realistic scenarios of uncertain evacuation conditions.
NeuroMAS: Multi-Agent Systems as Neural Networks with Joint Reinforcement Learning
Multi-agent language systems are often built as hand-designed workflows, where agents are assigned semantic roles and communication protocols are specified in advance. We propose NeuroMAS, a method that first treats a multi-agent language system as a trainable and scalable neural-network-like architecture with LLM agents as nodes and intermediate textual signals as edges. In NeuroMAS, agent nodes are role-free but structure-aware: the topology only determines how information can flow in general, while reinforcement learning training determines how nodes communicate, specialize, and coordinate. This formulation shifts multi-agent design from workflow engineering toward architecture design, where depth, width, connectivity, and growth protocol become scalable sources of capability. Further, we provide a theoretical perspective showing why such modular textual computation is more parameter-efficient when tasks admit hierarchical decompositions. Experiments show that NeuroMAS improves significantly over both inference-time and trained multi-agent baselines. We further find that organizational scaling is path-dependent: larger systems can be challenging to train from scratch, but become feasible when grown progressively from smaller trained systems. These results suggest that learned neural multi-agent systems are a promising scaling axis for LLMs.
Genflow Ad Studio: A Compound AI Architecture for Brand-Aligned, Self-Correcting Video Generation
Recent advancements in generative video models demonstrate high visual fidelity, yet their integration into enterprise environments is restricted by temporal inconsistencies and severe brand misalignment. Current monolithic architectures struggle to enforce rigid brand constraints, frequently hallucinating unapproved visual assets. We introduce Genflow, a Compound AI System designed to enforce brand consistency in generative media production. Our architecture integrates a retrieval-based 'Brand DNA' extraction module to parameterize generation according to established corporate identity guidelines. Furthermore, we implement an Adversarial Multi-Agent Quality Control (QC) loop. Instead of a single-pass generation, this pipeline employs evaluator agents to iteratively critique generated frames against the extracted parameters, prompting generator models to refine outputs until a deterministic consensus is reached. By transitioning to a multi-stage, self-correcting pipeline, Genflow improved the yield of brand-compliant video generations from 42% to 89%, establishing a robust framework for scalable, enterprise-grade generative systems.
comment: 6 pages, 2 figures, 2 tables. Accepted to the ACM Conference on AI and Agentic Systems (CAIS '26). Includes demo video and code repository links
Responsible Federated LLMs via Safety Filtering and Constitutional AI ACL 2026
Recent research has increasingly focused on training large language models (LLMs) using federated learning, known as FedLLM. However, responsible AI (RAI), which aims to ensure safe and trustworthy responses, remains underexplored in this context. In FedLLM, client-side training data may contain harmful content, resulting in unsafe LLMs that can generate inappropriate responses. Aggregating such models into a global model and redistributing it to clients risks the widespread deployment of unsafe LLMs. To address this, we incorporate two well-established RAI techniques into FedLLM: safety filtering and constitutional AI. Our experiments show that these methods significantly improve LLM safety, achieving over 20% improvement on AdvBench.
comment: Accepted at the 6th Workshop on Trustworthy NLP (TrustNLP), ACL 2026
Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning
In this paper, we propose the first VL$\underline{\textbf{M}}$ $\underline{\textbf{a}}$gentic $\underline{\textbf{r}}$easoning framework for few-$\underline{\textbf{s}}$hot multimodal $\underline{\textbf{T}}$ime $\underline{\textbf{S}}$eries $\underline{\textbf{C}}$lassification ($\textbf{MarsTSC}$), which introduces a self-evolving knowledge bank as a dynamic context iteratively refined via reflective agentic reasoning. The framework comprises three collaborative roles: i) Generator conducts reliable classification via reasoning; ii) Reflector diagnoses the root causes of reasoning errors to yield discriminative insights targeting the temporal features overlooked by Generator; iii) Modifier applies verified updates to the knowledge bank to prevent context collapse. We further introduce a test-time update strategy to enable cautious, continuous knowledge bank refinement to mitigate few-shot bias and distribution shift. Extensive experiments across 12 mainstream time series benchmarks demonstrate that $\textbf{MarsTSC}$ delivers substantial and consistent performance gains across 6 VLM backbones, outperforming both classical and foundation model-based time series baselines under few-shot conditions, while producing interpretable rationales that ground each classification decision in human-readable feature evidence.
comment: 18 pages, 12 figures, 6 tables. Preprint
BioProAgent: Neuro-Symbolic Grounding for Constrained Scientific Planning
Large language models (LLMs) have demonstrated significant reasoning capabilities in scientific discovery but struggle to bridge the gap to physical execution in wet labs. In these irreversible environments, probabilistic hallucinations are not merely incorrect; they can cause equipment damage or experimental failure. We propose BioProAgent, a neuro-symbolic framework that anchors probabilistic planning in a deterministic Finite State Machine (FSM). We introduce a State-Augmented Planning mechanism that enforces a rigorous Design-Verify-Rectify workflow, ensuring hardware compliance before execution. Furthermore, we address the context bottleneck inherent in complex device schemas by Semantic Symbol Grounding, reducing token consumption by ~6× through symbolic abstraction. In the extended BioProBench benchmark, BioProAgent achieves 95.6% physical compliance (compared to 21.0% for ReAct), demonstrating that neuro-symbolic constraints are essential for reliable autonomy in irreversible physical environments. Code: https://github.com/YuyangSunshine/bioproagent | Website: https://yuyangsunshine.github.io/BioPro-Project.
SkillMAS: Skill Co-Evolution with LLM-based Multi-Agent System
Large language model (LLM) agent systems are increasingly expected to improve after deployment, but existing work often decouples two adaptation targets: skill evolution and multi-agent system (MAS) restructuring. This separation can create organization bottlenecks, context pressure, and mis-specialization. We present SkillMAS, a non-parametric framework for adaptive specialization in multi-agent systems that couples skill evolution with MAS restructuring. SkillMAS uses Utility Learning to assign credit from verified execution traces, bounded skill evolution to refine reusable procedures without unfiltered library growth, and evidence-gated MAS restructuring when retained failures and Executor Utility indicate a structural mismatch. Across embodied manipulation, command-line execution, and retail workflows, SkillMAS is competitive under the reported harnesses while clarifying how post-deployment specialization is attributed, updated, and applied.
comment: 21 pages, 2 figures
Embodied Multi-Agent Coordination by Aligning World Models Through Dialogue
Effective collaboration between embodied agents requires more than acting in a shared environment; it demands communication grounded in each agent's evolving understanding of the world. When agents can only partially observe their surroundings, coordination without communication is provably hard, but communication can, in principle, bridge this gap by allowing agents to share observations and align their world models. In this work, we examine whether LLM-based embodied agents actually realize the ability to communicate. We extend PARTNR, a benchmark for collaborative household robotics, with a natural-language dialogue channel that enables two agents with partial observability to communicate during task execution. To evaluate whether dialogue leads to genuine world-model alignment rather than superficial coordination, we propose a framework for measuring world-model alignment defined over per-agent world graphs: observation convergence (do private world models align over time?), information novelty (do messages convey what the partner lacks?), and belief-sensitive messaging (do agents model what their partner knows?). Our experiments across three LLMs reveal that dialogue reduces action conflicts by 40 to 83 percentage points but degrades task success relative to silent coordination. Using our metrics, we characterize the gap between superficial coordination and genuine world-model alignment, and identify where current models fall on this spectrum.
Ready from Day 1: Population-Aware Coordination for Large-Scale Constrained Multi-Agent Systems NeurIPS 2026
In large-scale multi-agent systems with shared resource constraints, an upstream planner must iteratively evaluate candidate resource plans -- assessing feasibility, aggregate response, and marginal cost -- before committing to one. Lagrangian relaxation separates local decisions through a broadcast cost signal, but the planner still needs the cost-to-utilization response map to explore plan space, and this map depends on population composition that changes across planning cycles. We propose \emph{population-aware coordination interfaces}: learned primal and dual maps, conditioned on compact population summaries, that the planner queries inside its iterative loop. The primal map predicts aggregate utilization under a proposed cost trajectory; the dual map predicts the cost trajectory for a target plan. By encoding response-relevant population structure, these maps remain reliable across evolving populations without per-cycle retraining, and support coordination of large populations from compact subsamples. We additionally cast Sim2Real transfer as a backtestable procedure, enabling evaluation before deployment. In a supply-chain capacity-control case study, population-aware interfaces reduce forecast error by 16--19\% and capacity violations by 20--51\% relative to population-unaware baselines under composition shift; 20K-agent cohorts support accurate coordination of 500K-agent populations; and simulator-trained primal maps achieve 11.1\% MAPE on real observations versus 13--24\% for baselines.
comment: 30 pages, 16 figures. Submitted to NeurIPS 2026
Systems and Control (EESS)
Replicating Real-World 23-Hz Oscillations Caused by Large Electronic Loads
In 2024, Texas operators observed 23-Hz oscillations in real power measurements close to a large electronic load (LEL). Oscillations emerged when the load's power consumption reached approximately 320 MW and subsided as the active power demand decreased. This paper aims to analyze the event and reproduce the oscillations using electromagnetic transient (EMT) simulations. In the first stage, a representative feedback system is developed, and frequency-domain analysis is conducted to examine the phenomenon and identify its key influencing factors. Next, detailed EMT simulations are performed to further validate the proposed analytical approach. The results show that the feedback system effectively captures and characterizes the critical features of the 23-Hz oscillation incident. In addition, the EMT simulations successfully reproduce the real-world event, with the simulated results closely matching the fault recorder data.
comment: 10 pages, 17 figures
Weighted Flow Matching and Physics-Informed Nonlinear Filtering for Parameter Estimation in Digital Twins
Digital twins (DTs) rely on continuous synchronization between physical systems and their virtual counterparts through online parameter estimation under uncertainty. In many practical settings, however, this task is challenged by low observability, weak excitation, nonlinear dynamics, and noisy or biased measurements. In this work, we develop a new mathematical framework that integrates Weighted Flow Matching (WFM) generative modeling with physics-informed nonlinear filtering to enhance parameter estimation in DTs. WFM relies on dynamic reweighting of training samples, which guides the generative model toward parameter regimes most informative of the evolving system state. This generative component is tightly coupled with a physics-informed filtering architecture based on the Unscented Kalman Filter (UKF), yielding a unified DT framework that combines data-driven probability transport with physically consistent state and parameter estimation. The effectiveness of the new integrated framework is demonstrated within a spacecraft DT architecture, where stable moment of inertia estimation is achieved under uncertain and noisy sensing, with significant performance improvements over established approaches such as Extended Kalman Filtering (EKF) and Ensemble Kalman Filtering (EnKF). These results highlight the potential of weighted generative modeling as a core mechanism for real-time DT synchronization in operational and mission-critical systems.
comment: 14 pages, 5 figures
Geometric Fault Identification via Mirror Descent Learning
This paper develops a fault detection and identification (FDI) method for nonlinear control-affine systems under simultaneous actuator and sensor faults. We adopt a geometric approach that characterizes the isolability of faults through the principal angles between the subspaces corresponding to each actuator and sensor fault. For fault identification, we develop a hybrid estimator built on a Luenberger-like observer with contraction guarantees, in which neural networks are embedded to estimate the actuator and sensor faults. Because the training dataset for the neural networks cannot represent every fault scenario, the last layer of each network is adapted using mirror descent-based laws. These adaptive laws impose isolability conditions on the fault channels and, to account for the geometry of the fault subspaces, do not assume a quadratic parameter estimation space. A Lyapunov-based analysis establishes that the state and parameter estimation errors are uniformly ultimately bounded. The effectiveness of our proposed FDI method is illustrated on the 3-axis attitude control system of a spacecraft.
A review of imbalance price forecasting algorithms in Europe: algorithms, metrics and the way forward
Renewable electricity generation has grown significantly across many European power systems, leading to a greener energy mix, but also additional complexity in balancing electricity supply and demand. Unexpected differences between forecasts and actual output can lead to fluctuations in the system imbalance, which causes volatile imbalance prices. Accurate imbalance price forecasts are crucial for market players to choose a strategic balancing position. In early works, most forecasting methods combined fundamental and statistical approaches, but currently there is a clear trend towards data-driven machine learning models. This review compares forecasting algorithms in European markets with a focus on methodology. We emphasize the importance of high-quality input data, including intraday information and per-minute system data. Next, we identify the need for a common benchmark to compare novel forecasting methods developed for different markets and time periods. Finally, we argue that forecasts should be evaluated in terms of both downstream value and accuracy.
Ensuring reliability in 100% renewable microgrids: a scenario-based joint planning and operational design framework
Off-grid microgrids powered entirely by renewable energy sources face substantial challenges in achieving utility-grade reliability standards. Existing microgrid planning frameworks often prioritize cost minimization while treating reliability as a secondary metric, thereby leading to suboptimal designs. This paper presents a comprehensive scenario-based optimization framework that simultaneously addresses long-term capacity planning and short-term operational dispatch in two stages for 100%-renewable microgrids. The developed two-stage stochastic programming model co-optimizes the investment and operation of photovoltaic generation and battery energy storage, while ensuring compliance with stringent reliability constraints following utility grid standards. Network modeling with operational constraints, such as line capacities and voltage limits, is incorporated to allow distributed resource placement leveraging power sharing between microgrid nodes. A novel scenario generation approach captures critical uncertainties, including seasonal demand fluctuations, solar output variations, and probabilistic equipment failures, through the statistical clustering of historical data. The optimization framework integrates utility-grade reliability constraints limiting the expected energy not served to below 0.002% of the annual demand while minimizing the total system costs. Numerical simulations demonstrate the effectiveness of the proposed framework, achieving 99.998% supply reliability using only photovoltaic power and battery energy storage. The optimized network-aware distributed resource allocation provides inherent resilience through power rerouting during component outages, maintaining load continuity even under simultaneous equipment failures. This study confirms the feasibility of 100%-renewable microgrids to support remote communities while meeting utility-grade reliability benchmarks.
comment: Accepted by Energy Conversion and Management on 13 May 2026
Empirical evaluation of Time Series Foundation Models for Day-ahead and Imbalance Electricity Price Forecasting in Belgium
Recent advances in Time Series Foundation Models (TSFMs) promise zero-shot forecasting capabilities with minimal task-specific training. While these models have shown strong performance across generic benchmarks, their applicability in volatile, complex electricity markets remains underexplored. Addressing this gap, this study provides a systematic empirical evaluation of several TSFMs, specifically Chronos-2 and Chronos-Bolt (developed by Amazon), and TimesFM 2.5 (provided by Google), for forecasting Belgian day-ahead and imbalance electricity prices. For both considered markets, Chronos-2 in ARX mode produces the most accurate forecasts. Compared with the best ensemble prediction from other machine learning methods, Chronos-2's Mean Absolute Error (MAE) is 5% lower for the day-ahead market. In contrast, its MAE is 10% higher when predicting imbalance prices across all forecast horizons except the two-hour-ahead horizon. Moreover, we find that TSFMs exhibit genuine zero-shot forecasting skills but still struggle under extreme market conditions.
Reliability and Effectiveness of Autonomous AI Agents in Supply Chain Management
This paper studies autonomous generative AI agents in multi-echelon supply chains using the MIT Beer Game. We identify four inference-time levers that shape performance: model selection, policies and guardrails, centralized data sharing, and prompt engineering. Model capability is the dominant factor: an out-of-the-box reasoning model exceeds human-level performance, and optimized reasoning models reduce costs by up to 67% relative to human teams. However, strong average performance masks substantial reliability risks. We introduce the agent bullwhip effect, the amplification of decision unreliability across echelons, manifesting along two dimensions: decision variance increases both across facilities at the same point in time and within the same facility across time. We develop a mathematical framework showing that this phenomenon is inherent to multi-agent systems that involve coordination and information delays, and we demonstrate that repeated sampling fails to meaningfully reduce it. To address this limitation, we propose a Group Relative Policy Optimization (GRPO)-based reinforcement-learning post-training framework that trains a shared base LLM using system-level supply-chain rewards. GRPO post-training substantially reduces tail events, curtails agent bullwhip, and improves the reliability of autonomous supply-chain agents.
Over-approximation of weakly-hard constraints for control systems verification (Extended)
A hard real-time system cannot miss any deadline. A weakly-hard real-time system, on the contrary, is designed to tolerate a specific number of deadline misses. For instance, the AnyMiss(2, 300) weakly-hard constraint stipulates that in every window of 300 consecutive jobs, at most 2 deadlines are missed. The weakly-hard model is the state-of-the-art for industrial dependability-by-design of control systems that tolerate deterministic failures. Weakly-hard constraints correspond to regular languages. The size of the minimal finite state machine that recognizes whether a string satisfies the constraint (about 45k states for AnyMiss(2, 300)) is a notorious impediment for the verification of control system properties. This paper discusses an over-approximation of the language that allows us to provide sound safety guarantees for control systems under deadline misses that would be out of reach using the minimal finite state machine. We present a compressed language acceptor and prove that it simulates the original finite state machine. We study language cardinality properties, and report on empirical results that show how the new acceptor can be embedded in the control design workflow, leading to verifying safety for systems for which the state-of-the-art tools do not provide answers.
comment: Extended version of the paper "Over-approximation of weakly-hard constraints for control systems verification", accepted for publication at CAV 2026
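The AnyMiss(m, K) constraint described in the abstract above can be checked on a concrete trace with a sliding window. The sketch below is a direct check of the constraint's definition on a finite sequence, not the paper's compressed acceptor or its over-approximation. The function name and the convention that a prefix shorter than K is treated like a window are assumptions made for illustration.

```python
from collections import deque


def satisfies_any_miss(outcomes, m, k):
    """Return True if no window of k consecutive jobs contains more than m misses.

    outcomes: iterable of booleans, True meaning the job missed its deadline.
    Assumption: prefixes shorter than k are held to the same bound.
    """
    window = deque()
    misses = 0
    for missed in outcomes:
        window.append(bool(missed))
        misses += bool(missed)
        if len(window) > k:
            misses -= window.popleft()  # drop the job that left the window
        if misses > m:
            return False
    return True


def trace(length, miss_positions):
    # Helper: build a trace of the given length with misses at the listed indices
    return [i in miss_positions for i in range(length)]


ok = satisfies_any_miss(trace(400, {0, 1, 301}), m=2, k=300)   # misses spread out
bad = satisfies_any_miss(trace(400, {0, 1, 299}), m=2, k=300)  # three in one window
print(ok, bad)
```

The check keeps only the last K outcomes and a running miss count, so it runs in time linear in the trace length. This verifies a single trace only; the paper's concern is verifying control-system properties over all traces the constraint admits.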
Beyond Safety Filtering: Control Barrier Function-Informed Reinforcement Learning for Connected and Automated Vehicles ITSC 2026
Reinforcement Learning (RL) uses rewards to guide learning, yet reward design is typically hand-crafted using heuristics that can be difficult to tune. We propose a Control Barrier Function (CBF)-informed reward design for Multi-Agent RL (MARL) that converts CBF constraint values under joint MARL actions into a reward signal that explicitly guides safe learning. We compare against two heuristic reward baselines in a four-way multi-lane intersection with connected and automated vehicles. Results show that our method achieves the highest task performance and is less sensitive to reward hyperparameters, yielding consistently strong performance across the tested hyperparameter range. Code for reproducing the experimental results and a video demonstration are available at https://github.com/bassamlab/SigmaRL.
comment: This paper has been accepted for publication in the Proceedings of the 2026 IEEE International Conference on Intelligent Transportation Systems (ITSC 2026)
A Resilience Evaluation Framework for Electric Distribution Systems: Historical Weather Conditioning, Sensitivity Analysis, and a Flooding-Aware Extension
Evaluating resilience in electric distribution systems under severe weather requires models that can connect network topology, hazard simulation, fragility modeling, restoration assumptions, repair strategy, and downstream consequences. This paper extends our prior graph-based resilience evaluation framework for power distribution systems in three ways: it adds analysis conditioned on historical events with real outage and weather data, introduces sensitivity studies for key modeling assumptions, and includes a coupled power-flooding extension for sewage-backup assessment. Historical wind events drive Monte Carlo simulations conditioned on real weather, and the observed outage trajectories are treated as realized historical samples for comparison. Wind-event resilience metrics stabilize at approximately 256 episodes, and outage peak, duration, and outage intensity change systematically with fragility parameters, network topology, restoration assumptions, and repair strategies. In a separate 1000-episode joint power-flooding simulation, at least one customer is flooded in 1.9% of episodes, and both flood occurrence and flood intensity increase with outage intensity, showing a selective power-to-flood consequence pathway. Overall, the framework provides a practical basis for resilience assessment, comparative scenario analysis, and coupled power-flooding studies in a limited public-data setting, while also suggesting that more detailed utility data could further improve simulation realism.
comment: Submitted to International Journal of Electrical Power and Energy Systems
Knapsack-based Online Sensor Selection for Vehicle State Estimation
As connected and autonomous driving technologies advance, vehicles increasingly rely on data from external sensors. Although this information can enhance state estimation, processing all available streams imposes significant communication and computational costs. To address this challenge, we introduce a Sensor Management Center (SMC) that selects a low-cost subset of external sensors in real time while satisfying chance-constrained error bounds derived from an Extended Kalman Filter (EKF) covariance. We formulate the selection problem as a multidimensional minimum knapsack problem and adopt a deficiency-weighted greedy algorithm as an approximate yet efficient solution. The proposed approach is validated through MATLAB simulations and experiments on a 1:15-scale cooperative driving testbed.
comment: 7 pages, 5 figures. Accepted to the 23rd IFAC World Congress (IFAC 2026)
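One way to read "deficiency-weighted greedy" for a multidimensional minimum knapsack problem is sketched below: at each step, pick the sensor with the largest useful coverage per unit cost, where coverage in each dimension is capped and normalized by the remaining deficiency. This scoring rule and the names `costs`, `gains`, and `demands` are assumptions for illustration and may differ from the paper's algorithm.

```python
def greedy_min_knapsack(costs, gains, demands):
    """Pick a cheap subset of items whose summed gains meet every demand.

    costs[i]    : cost of selecting sensor i (positive)
    gains[i][j] : contribution of sensor i to requirement j (nonnegative)
    demands[j]  : required total for requirement j
    Returns the selected indices in pick order, or None if infeasible.
    """
    deficiency = list(demands)
    remaining = set(range(len(costs)))
    chosen = []
    while any(d > 1e-12 for d in deficiency):
        best, best_score = None, 0.0
        for i in sorted(remaining):
            # Gain is capped by, and weighted by, the current deficiency.
            useful = sum(min(g, d) / d
                         for g, d in zip(gains[i], deficiency) if d > 1e-12)
            score = useful / costs[i]
            if score > best_score:
                best, best_score = i, score
        if best is None:
            return None  # no remaining item reduces any deficiency
        chosen.append(best)
        remaining.remove(best)
        deficiency = [max(0.0, d - g) for d, g in zip(deficiency, gains[best])]
    return chosen
```

The greedy rule is approximate: it returns a feasible selection when one exists, but not necessarily the cheapest one.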
Modeling Coincident Peak Pricing in Electricity Markets: Challenges and Peak Shaving Effectiveness
Coincident Peak (CP) pricing is widely used in U.S. electricity markets to allocate capacity and transmission costs. This paper develops a behavioral game-theoretic framework for CP-driven load shifting that couples a nonlinear cost-allocation model with day-ahead (one-shot) and real-time (sequential-learning) decision processes. We examine two update rules, namely best-response dynamics (BRD) and fictitious-play dynamics (FPD), across continuous and finite action spaces to quantify how flexibility, action resolution, and participation influence peak outcomes. Using ERCOT peak-day data, we find that FPD reliably reduces system peaks, whereas BRD is more variable and can increase peaks under tight-capacity conditions. Finer action resolution improves peak shaving, while the number of participants is largely neutral when aggregate flexibility is fixed. Meanwhile, information-provider signals can induce herding, whereas response-aware or diverse signals improve peak shaving. These results highlight both the potential and limits of CP pricing: smoothing information and enabling granular control are as important as the amount of available flexibility. The framework offers practical guidance for system operators and consumers: For ISOs, broadcasting smoothed CP signals and setting minimum controllable-capacity thresholds enhance coordination. For consumers, greater flexibility and finer control resolution improve both cost savings and peak-shaving performance.
comment: Coincident Peak Pricing, Demand Response, Game Theory, Peak Shaving
AoI-MDP: An AoI Optimized Markov Decision Process (Student Abstract)
Ocean exploration places high demands on autonomous underwater vehicles, especially when there is observation delay. We propose an age of information optimized Markov decision process (AoI-MDP) to enhance underwater tasks by modeling observation delay as signal delay and including it in the state space. AoI-MDP also introduces wait time in the action space and integrates AoI with reward functions, optimizing information freshness and decision-making using reinforcement learning. Simulations show AoI-MDP outperforms the standard MDP, demonstrating superior performance, feasibility, and generalization in underwater tasks. To accelerate relevant research, we have made the code available as open source at https://github.com/Xiboxtg/AoI-MDP.
A Coupled V2G Equilibrium Model of Electric Vehicle and Power System Interactions
Vehicle-to-grid (V2G) technology empowers electric vehicles (EVs) to act as mobile energy resources, providing critical support to power systems, especially under stressed conditions. To understand the economic mechanism driving V2G participation and its benefits to the power grid, this paper proposes a multi-player coupled equilibrium framework that models the bidirectional interactions between power grid operations and EV routing, incorporating charging and discharging choices in a preprocessed feasible path generation procedure. Energy prices are endogenously determined by market-clearing conditions. We formulate the overall problem as a Variational Inequality that unites the decision-making of the Distribution System Operator, Charging Network Operator, Load Serving Entities, and EV drivers. Numerical studies validate the framework under two stress scenarios: increased household load and power line outages. Results show that when EVs are incentivized by reduced generalized path costs, V2G is particularly effective in eliminating load shedding and reducing distribution locational marginal electricity prices. On the transportation side, V2G can lead to divergence in EV behavior between normal and scarcity conditions, and can alter route choices while improving overall trip economics.
comment: under review for journal
Stable Fiber-Koopman Residual Dynamics for Environment-Constrained Robust Control
Learning-based dynamical models face a persistent tension between expressiveness and formal guarantees: richer model classes improve predictive accuracy, but their stability properties are typically verified only empirically, if at all. This paper proposes Stable Fiber-Koopman Residual Dynamics (SFKD), a unified framework that simultaneously addresses environment-aware geometric consistency, latent-space stability certification, and bounded residual perturbation propagation. Concretely, SFKD constructs a fiber bundle latent manifold whose fibers encode environment-specific dynamics; an environment-conditioned Koopman operator governs the dominant linear evolution on each fiber; and a contraction-constrained residual neural network captures unmodeled nonlinear effects while admitting an explicit input-to-state stability (ISS) certificate. The resulting model is embedded in a sampling-based MPPI controller for autonomous vehicle path tracking under variable surface conditions and wind disturbances. Theoretical analysis establishes ISS of the latent dynamics and a finite ultimate bound on tracking error. Numerical experiments against five baselines -- Koopman MPC, Neural ODE, ICODE, ControlSynth, and ICODE-MPPI -- demonstrate a 31% reduction in tracking RMSE, a 44% improvement in control smoothness, and near-zero latent stability violation rate across environment-switching scenarios.
Adaptive Control in Autonomous Driving via Real-Time Recurrent RL
We study online fine-tuning of pretrained control policies for autonomous driving using Real-Time Recurrent Reinforcement Learning (RTRRL), a memory-efficient algorithm that updates policy parameters at every time step without backpropagation through time. We extend RTRRL to support LrcSSM, a recently proposed nonlinear diagonal state-space model, and combine offline behavioral cloning with online RTRRL fine-tuning to adapt policies to distribution shifts at deployment. We validate the approach in the CarRacing simulation and on a 1:10-scale RoboRacer platform equipped with an event camera, where a pretrained policy is fine-tuned online during real-world line-following. To our knowledge, this is the first demonstration of online RL fine-tuning with event-camera observations on standard (non-spiking) hardware in closed-loop control. LrcSSM-based policies improve fastest and most consistently across both settings.
Coordination Control of Discrete Event Systems under Cyber Attacks
In this paper, coordination control of discrete event systems under joint sensor and actuator attacks is investigated. Sensor attacks are described by a set of attack languages using a proposed ALTER model. Several local supervisors are used to control the system. The goal is to design local supervisors to ensure safety of the system even under cyber attacks (CA). The necessary and sufficient conditions for the existence of such supervisors are derived in terms of conditional decomposability, CA-controllability and CA-observability. A method is developed to calculate local state estimates under sensor attacks. Two methods are also developed to design local supervisors, one for discrete event systems satisfying conditional decomposability, CA-controllability and CA-observability, and one for discrete event systems satisfying conditional decomposability only. The approach works for both stealthy and non-stealthy attacks. A practical example is given to illustrate the results.
comment: 26 pages, examples about stealthiness were added, description was improved
Experiment-as-Code Labs: A Declarative Stack for AI-Driven Scientific Discovery
To unleash the full potential of AI for Science, we must untether the agents from a purely digital environment. The agent's ability to control and explore in real-world labs is essential because the physical lab remains foundational to scientific discovery. While some tasks can be performed on a computer (e.g., data analysis, running simulated experiments), Eureka moments could occur at any time while operating lab instruments (e.g., when a scientist notices unexpected clues, intuition may prompt a real-time course change). Although autonomous labs, which expose programmable APIs to control scientific instruments via software, are on the rise, bridging the gap between increasingly powerful AI agents and automated lab equipment requires innovation that draws insights from computer systems. We propose a new paradigm called "Experiment-as-Code (EaC) Labs," where a core concept is to encode experiments as declarative configurations that can be compiled down to device-level APIs. AI agents come up with hypotheses and experiments, written as an ensemble of declarative configurations. The systems layer performs program analysis, safety checks, resource assignment, and job orchestration. Finally, programmatic experimentation occurs via actuating the device APIs. This is a general stack that is science-, lab-, and instrument-independent, representing a novel synthesis across the physical, systems, and intelligence layers to unleash the next breakthrough in AI for Science.
comment: Experiment-as-Code (EaC) white paper
A System-Theoretic Approach to Hawkes Process Identification with Guaranteed Positivity and Stability
The Hawkes process models self-exciting event streams, requiring a strictly non-negative and stable stochastic intensity. Standard identification methods enforce these properties using non-negative causal bases, yielding conservative parameter constraints and severely ill-conditioned least-squares Gram matrices at higher model orders. To overcome this, we introduce a system-theoretic identification framework utilizing the sign-indefinite orthonormal Laguerre basis, which guarantees a well-conditioned asymptotic Gram matrix independent of model order. We formulate a constrained least-squares problem enforcing the necessary and sufficient conditions for positivity and stability. By constructing the empirical Gram matrix via a Lyapunov equation and representing the constraints through a sum-of-squares trace equivalence, the proposed estimator is efficiently computed via semidefinite programming.
comment: 6 pages, 2 figures
cuNRTO: GPU-Accelerated Nonlinear Robust Trajectory Optimization
Robust trajectory optimization enables autonomous systems to operate safely under uncertainty by computing control policies that satisfy the constraints for all bounded disturbances. However, these problems often lead to large Second Order Conic Programming (SOCP) constraints, which are computationally expensive. In this work, we propose the CUDA Nonlinear Robust Trajectory Optimization (cuNRTO) framework by introducing two dynamic optimization architectures that have direct application to robust decision-making and are implemented on CUDA. The first architecture, NRTO-DR, leverages the Douglas-Rachford (DR) splitting method to solve the SOCP inner subproblems of NRTO, thereby significantly reducing the computational burden through parallel SOCP projections and sparse direct solves. The second architecture, NRTO-FullADMM, is a novel variant that further exploits the problem structure to improve scalability using the Alternating Direction Method of Multipliers (ADMM). Finally, we provide GPU implementations of the proposed methodologies using custom CUDA kernels for SOC projection steps and cuBLAS GEMM chains for feedback gain updates. We validate the performance of cuNRTO through simulated experiments on unicycle, quadcopter, and Franka manipulator models, demonstrating speedups of up to 139.6×. More details are available at https://cunrto.github.io.
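The SOC projection step mentioned above has a well-known closed form, sketched here in plain Python for a single cone. This is a CPU reference for the standard Euclidean projection, not the authors' CUDA kernel; the function name and list-based interface are illustrative assumptions.

```python
import math

def project_soc(x, t):
    """Euclidean projection of (x, t) onto the cone {(x, t) : ||x||_2 <= t}."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    if norm <= t:
        # Already inside the cone: the point is its own projection.
        return list(x), t
    if norm <= -t:
        # Inside the polar cone: the projection is the origin.
        return [0.0] * len(x), 0.0
    # Otherwise the projection lies on the cone boundary.
    scale = (norm + t) / (2.0 * norm)
    return [scale * xi for xi in x], (norm + t) / 2.0
```

Each cone is projected independently of the others, which is what makes this step a natural candidate for parallel execution.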
Language Models as Efficient Reward Function Searchers for Custom-Environment Multi-Objective Reinforcement
Achieving the effective design and improvement of reward functions in reinforcement learning (RL) tasks with complex custom environments and multiple requirements presents considerable challenges. In this paper, we propose ERFSL, an efficient reward function searcher using LLMs, which enables LLMs to be effective white-box searchers and highlights their advanced semantic understanding capabilities. Specifically, we generate reward components for each numerically explicit user requirement and employ a reward critic to identify the correct code form. Then, LLMs assign weights to the reward components to balance their values and iteratively adjust the weights without ambiguity and redundant adjustments by flexibly adopting directional mutation and crossover strategies, similar to genetic algorithms, based on the context provided by the training log analyzer. We applied the framework to a customized data collection RL task without direct human feedback or reward examples (zero-shot learning). The reward critic successfully corrects the reward code with only one feedback instance for each requirement, effectively preventing unrectifiable errors. The initialization of weights enables the acquisition of different reward functions within the Pareto solution set without the need for weight search. Even in cases where a weight is 500 times off, on average, only 5.2 iterations are needed to meet user requirements. The ERFSL also works well with most prompts utilizing GPT-4o mini, as we decompose the weight searching process to reduce the requirement for numerical and long-context understanding capabilities.
Multi-Objective-Optimization Assisted Data Collection Framework for IoUT Based on Offline Reinforcement
Information Updating Networks (IUNs) offer significant potential for ocean exploration but encounter challenges due to dynamic underwater environments and severe system attenuation. Current methods relying on Autonomous Underwater Vehicles (AUVs) based on online reinforcement learning (RL) lead to high computational costs and low data utilization. To address these issues and the constraints of turbulent ocean environments, we propose a multi-AUV assisted data collection framework for IUNs based on multi-agent offline RL. This framework maximizes data rate and the value of information (VoI), minimizes energy consumption, and ensures collision avoidance by utilizing environmental and equipment status data. We introduce a semi-communication decentralized training with decentralized execution (SC-DTDE) paradigm and a multi-agent independent conservative Q-learning algorithm (MAICQL) to effectively tackle the problem. Extensive simulations demonstrate the high applicability, robustness, and data collection efficiency of the proposed framework.
Enhancing Information Freshness: An AoI Optimized Markov Decision Process
Ocean exploration utilizing autonomous underwater vehicles (AUVs) via reinforcement learning (RL) has emerged as a significant research focus. However, underwater tasks have mostly failed due to the observation delay caused by information limitation in the information updating networks. In this study, we present an AoI optimized Markov decision process (AoI-MDP) to improve the performance of underwater tasks. Specifically, AoI-MDP models observation delay as timing delay through statistical delay formulation, and includes this delay as a new component in the state space. Additionally, we introduce wait time in the action space, and integrate AoI with reward functions to achieve joint optimization of information freshness and decision-making for AUVs leveraging RL for training. Finally, we apply this approach to the multi-AUV data collection task scenario as an example. Simulation results highlight the feasibility of AoI-MDP, which effectively minimizes AoI while showcasing superior performance in the task. To accelerate relevant research in this field, we have made the simulation code available as open source.
Robotics
EfficientTDMPC: Improved MPC Objectives for Sample-Efficient Continuous Control
We introduce EfficientTDMPC, a sample-efficient model-based reinforcement learning method for continuous control built on the TD-MPC family of algorithms. Central to this family is a planner that aims to find an action sequence that maximizes the estimated return. The return is estimated using a learned model and value networks, each of which can introduce error. EfficientTDMPC proposes to reduce this error in two ways. First, it introduces an ensemble of dynamics models and averages the return estimates across those models and across different rollout depths. Second, it adds the option to apply an uncertainty penalty to the planner objective, yielding a planner that avoids actions with uncertain return estimates. It then adds practical improvements which increase buffer data freshness and reduce compute. Lastly, we find that our contributions enable EfficientTDMPC to benefit more from a higher update-to-data (UTD) ratio, further improving sample efficiency. To the best of our knowledge, in the low data regime of each benchmark, EfficientTDMPC achieves state-of-the-art (SOTA) in terms of sample efficiency on HumanoidBench-Hard and DMC hard, while matching SOTA on DMC easy.
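The two planner changes described above (averaging return estimates over an ensemble and over rollout depths, then optionally penalizing uncertainty) can be illustrated with a small scoring function. Using the population standard deviation as the uncertainty measure, and the names `planner_score`, `return_estimates`, and `penalty`, are assumptions for this sketch rather than details taken from the paper.

```python
from statistics import mean, pstdev

def planner_score(return_estimates, penalty=0.0):
    """Score one candidate action sequence.

    return_estimates[m][d] is the return estimated by dynamics model m
    at rollout-depth index d (hypothetical layout).
    The score is the mean estimate minus penalty times the spread.
    """
    flat = [r for per_model in return_estimates for r in per_model]
    return mean(flat) - penalty * pstdev(flat)
```

With `penalty=0.0` this reduces to plain averaging; a positive penalty makes the planner prefer action sequences whose estimates agree across models and depths.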
Bayesian Networks for Path-Based Sensors: Gathering Information and Path Planning in Communication Denied Environments
A "path-based sensor" produces a single observation along a continuous path. For example, a boolean path-based sensor returns a single "1" if an event of interest is detected at any point along the path and a "0" otherwise. Notably, a "1" provides no direct information about where along the path the event(s) may have occurred. Previous work has demonstrated that observations from multiple path-based sensors can be fused to create a Bayesian belief map over the spatial locations of the underlying event or phenomenon. Moreover, path planning can employ Shannon information theory to accelerate the rate of convergence of the belief map. In this paper, we present a new method to update the belief map based on a path-based sensor observation, and then plan paths to increase information gain. In contrast to prior work that approximates the posterior by averaging over the alternative event histories, we introduce a Bayesian Network (BN) formulation that models the probabilistic relationships between the latent variables and path-based sensor measurements, enabling a more principled Bayesian belief update. We consider static hazard detection in a communication-denied environment as a representative problem setting. The event of a robot returning from its path corresponds to a path-based hazard sensor reading of "0" (hazard not detected), while a robot failing to return corresponds to a reading of "1" (hazard detected). We consider false positives and false negatives. We find that the new method leads to quicker convergence of the belief map than prior work in both single- and multi-robot cases.
comment: This paper has been accepted for presentation at 17th World Symposium on the Algorithmic Foundations of Robotics (WAFR 2026)
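To make the boolean path-based sensor concrete, the sketch below computes exact posterior marginals for the simplest case: independent per-cell hazard priors and a noise-free reading. It is not the paper's Bayesian Network formulation, and it ignores the false positives and false negatives the abstract considers; the function and argument names are illustrative.

```python
def update_belief(prior, path, reading):
    """Posterior marginals for a noise-free boolean path-based sensor.

    prior   : dict cell -> prior probability that the cell holds a hazard
              (cells assumed independent a priori)
    path    : iterable of cells traversed
    reading : 1 if a hazard was detected somewhere on the path, else 0
    """
    path = set(path)
    posterior = dict(prior)
    if reading == 0:
        # No detection: every cell on the path must be hazard-free.
        for cell in path:
            posterior[cell] = 0.0
        return posterior
    # P(reading = 1) = 1 - product over path cells of (1 - p_cell).
    p_clear = 1.0
    for cell in path:
        p_clear *= 1.0 - prior[cell]
    p_detect = 1.0 - p_clear
    if p_detect == 0.0:
        raise ValueError("reading 1 is impossible under this prior")
    # Bayes' rule: P(hazard in cell | reading = 1) = p_cell / P(reading = 1).
    for cell in path:
        posterior[cell] = prior[cell] / p_detect
    return posterior
```

A reading of "1" raises the marginal of every cell on the path without localizing the hazard, and the cells on the path become correlated afterwards, which these marginals alone do not capture.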
Policy Library CBF: Finite-Horizon Safety at Runtime via Parallel Rollouts
Safety-critical autonomy in unstructured environments poses significant challenges for online safety certification under evolving constraints. We propose Policy Library Control Barrier Function (PL-CBF), a runtime safety filter that evaluates a library of fallback policies via parallel finite-horizon rollouts, selects the least invasive safe mode, and enforces safety by solving a quadratic program that minimally modifies a nominal policy. We provide a theoretical analysis based on a finite-horizon language metric over closed-loop behaviors, characterizing policy-library coverage requirements for certifying finite-horizon safety. Simulations on a planar double-integrator (4 states), highway driving with abrupt friction changes using a realistic nonlinear vehicle model (8 states), and 3D quadrotor navigation in crowded dynamic environments (12 states) demonstrate improved safety coverage over single-policy safety filters while retaining millisecond-level runtime.
comment: Project page: https://www.taekyung.me/plcbf
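The rollout-and-select part of such a filter can be sketched on a toy 1D double integrator with a position limit. The dynamics, the three example policies, the Euler integration, and the "closest first action" notion of invasiveness are all assumptions for illustration; the quadratic-program step described in the abstract is omitted, and the rollouts here run sequentially rather than in parallel.

```python
def accelerate(x, v):
    return 1.0

def coast(x, v):
    return 0.0

def brake(x, v):
    return -2.0 if v > 0 else 0.0

def rollout_is_safe(state, policy, horizon=50, dt=0.1, x_max=10.0):
    """Simulate a 1D double integrator and check x <= x_max at every step."""
    x, v = state
    for _ in range(horizon):
        a = policy(x, v)
        x, v = x + v * dt, v + a * dt  # explicit Euler step
        if x > x_max:
            return False
    return True

def select_policy(state, nominal, library):
    """Return the safe library policy whose first action is closest to the
    nominal action, or None if no policy passes the rollout check."""
    a_nom = nominal(*state)
    safe = [p for p in library if rollout_is_safe(state, p)]
    if not safe:
        return None
    return min(safe, key=lambda p: abs(p(*state) - a_nom))
```

For example, `select_policy((0.0, 1.0), accelerate, [accelerate, coast, brake])` falls back to `coast`, because accelerating would cross the limit within the horizon while coasting stays closer to the nominal action than braking.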
From Prompts to Protocols: An AI Agent for Laboratory Automation
Automating science laboratories enables faster, safer, more accurate, and more reproducible execution of protocols, accelerating the discovery and testing of new materials, drugs, and more. However, setting up and running autonomous labs requires coordinating numerous instruments and robots, forcing scientists to write code, manage configuration files, and navigate complex software infrastructure. We present an AI agent architecture that integrates large language models with laboratory orchestration, enabling scientists to interactively create and monitor automated lab protocols using natural language. Integrated into the Experiment Orchestration System (EOS), the AI agent operates under an agentic loop with automated validation and error correction, and supports the complete experimental lifecycle: creating protocols, running and monitoring both protocols and closed-loop optimization campaigns, and analyzing results. A visual graph editor renders protocols as interactive node-based diagrams synchronized with the AI agent's protocol representation, enabling seamless alternation between AI-assisted and manual protocol construction. Evaluated on three simulated automated labs spanning chemistry, biology, and materials science, the AI agent achieves a 97% first-attempt protocol generation success rate and an order of magnitude reduction in required interface actions.
Nori Bot: A Sub-$1,000 Floor-to-Counter Mobile Manipulator
Open-source mobile manipulators have reached $660 (XLeRobot), but every sub-$1,000 platform shares three limitations: a fixed-height workspace, reactive-only control, and no protection against the stall-induced burn-out that destroys cheap Feetech servos. We present Nori Bot, a 17-DoF dual-arm mobile manipulator at $947 (~3% of the cost of comparable commercial platforms) that addresses all three: (1) a 600 mm Z-axis lift on the existing servo bus for floor-to-counter reach; (2) a thin-client Raspberry Pi 4 paired with the OpenClaw proactive agent runtime so cron jobs and hooks trigger physical tasks autonomously; and (3) a software safety stack with sensorless grip-force feedback via motor current on a soft TPU finger. Code, CAD, and the skill manifest will be released.
comment: 7 pages, 3 figures, 2 tables. Columbia University Deep Learning Robot Manipulation course project, Spring 2026
A Mechanistic Model for Collective Motion from Sensorimotor Regularities
Collective behavior in animals has long been modeled through self-propelled particle models, which reproduce striking group-level phenomena through abstract interaction forces. Yet these models are fundamentally descriptive: they leave open the question of how collective behavior is actually produced. Recent empirical work makes this gap concrete: locusts do not align with neighbors; instead, sensory and cognitive mechanisms mediate interaction. A mechanistic model must therefore operate at the sensorimotor level, grounded in what individual organisms can actually perceive, estimate, and physically execute. We present such a model based on a modeling framework from robotics, extended here to collective motion. Each agent perceives neighbors through bearing and apparent-size cues within a limited field of view, maintains uncertain internal state estimates, and selects actions through gradient descent on a desired social distance -- without any prescribed interaction forces. This simple model produces diverse collective behaviors including polarized motion, milling, ring formations, and subgroup fragmentation. A global sensitivity analysis shows that behavioral transitions are governed by sensorimotor parameters corresponding to measurable biological quantities: field of view geometry, sensory noise, turning agility, and memory. Collective behavior can therefore be understood as the emergent outcome of interacting sensorimotor regularities, and differences across species as the emergent outcome of differences in embodiment and environment.
No Plan, Yet Human: A Reactive Robotics Model Predicts Human Planning Failures on a Clinical Task
Understanding why some sequential planning problems are harder than others requires models that go beyond average performance. They should capture the specific pattern of which problems are hard, and ideally fail in the same way people do when planning capacity is reduced. We apply AICON, a reactive gradient-descent framework developed for robotic manipulation, to the Tower of London test, a cognitive test used to assess planning in Parkinson's disease, mild cognitive impairment, and stroke. Without any lookahead planning or knowledge of human cognition, AICON reproduces the fine-grained human difficulty ordering across 24 problems better than structural task parameters and generalizes to held-out problems in a leave-two-out evaluation. Crucially, AICON outperforms a planning baseline for groups with reduced planning capacity while the planning baseline better captures healthy controls. This dissociation was predicted by the original AICON paper, which noted that the model's failure modes resemble those of Parkinson's patients who struggle with goal hierarchies but not move counts. This suggests that as planning capacity is reduced, human behavior shifts toward the reactive mode AICON models. The finding extends a broader pattern: AICON, originally built for robotics, now captures aspects of biological behavior across perception, eye movements, and sequential planning, suggesting its core abstraction reflects something real about how biological systems are organized.
IVGT: Implicit Visual Geometry Transformer for Neural Scene Representation
Reconstructing coherent 3D geometry and appearance from unposed multi-view images is a fundamental yet challenging problem in computer vision. Most existing visual geometry foundation models predict explicit geometry by regressing pixel-aligned pointmaps, often suffering from redundancy and limited geometric continuity. We propose IVGT, an Implicit Visual Geometry Transformer that implicitly models continuous and coherent geometry from pose-free multi-view images. This formulation learns a continuous neural scene representation in a canonical coordinate system and supports continuous spatial queries at any 3D position, retrieving local features to predict signed distance function (SDF) values and colors using lightweight decoders. It allows direct extraction of continuous and coherent surface geometry, enabling rendering of RGB images, depth maps, and surface normal maps from arbitrary viewpoints. We train IVGT via multi-dataset joint optimization with 2D supervision and 3D geometric regularization. IVGT demonstrates generalization across scenes and achieves strong performance on various tasks, including mesh and point cloud reconstruction, novel view synthesis, depth and surface normal estimation, and camera pose estimation.
comment: Code: https://github.com/wzzheng/IVGT/
DexJoCo: A Benchmark and Toolkit for Task-Oriented Dexterous Manipulation on MuJoCo
Achieving human-level manipulation requires dexterous robotic hands capable of complex object interactions. Advancing such capabilities further demands standardized benchmarks for systematic evaluation. However, existing dexterous benchmarks lack tasks that reflect the unique manipulation capabilities of dexterous hands over parallel grippers, as well as comprehensive evaluation pipelines. In this paper, we present DexJoCo, a benchmark and toolkit for task-oriented dexterous manipulation, comprising 11 functionally grounded tasks that evaluate tool-use, bimanual coordination, long-horizon execution, and reasoning. We develop a low-cost data collection system and collect 1.1K trajectories across these tasks, with support for domain randomization to assess robustness. We benchmark modern models under diverse settings, including visual and dynamics randomization, multi-task training, and action-head adaptation. Through extensive empirical analysis, we identify several important insights and common limitations of current policies in dexterous manipulation, highlighting key challenges for future research in dexterous hand robot learning. Project page available at: https://dexjoco.github.io
comment: 8 pages, 6 figures, project page is available at: https://dexjoco.github.io
Learn Where Outcomes Diverge: Efficient VLA RL via Probabilistic Chunk Masking
Reinforcement learning (RL) allows vision-language-action (VLA) policies to generalize beyond their training distribution by optimizing directly for task success, but post-training is computationally expensive. A natural response has been to speed rollout collection through faster simulators and world models. In GRPO-based VLA RL, we find that the dominant cost lies elsewhere: gradient computation accounts for approximately 78% of wall-clock time per step in our runs, while rollout collection accounts for only 21%. Gradient cost dominates because much of this computation is spent on phases that contribute little to learning. GRPO's learning signal is driven by advantage variance: only phases where successful and failed rollouts diverge produce learning signal. However, GRPO assigns the same advantage to every chunk in a rollout. As a result, actor-update compute is spent uniformly across the trajectory, including phases the policy already handles after pre-training and supervised fine-tuning. This paper presents Probabilistic Chunk Masking (PCM), a drop-in modification to GRPO that allocates gradient computation to a small, probabilistically selected subset of chunks per trajectory. PCM scores semantic phases using success-failure action variance, a rollout-derived proxy for per-phase gradient variance, and samples a fixed chunk budget with online-updated phase-level keep probabilities. We formalize per-phase gradient variance as the quantity that determines where gradient computation is useful and show that success-failure action variance provides a measurable proxy for it. PCM requires no reward model or learned critic. On three LIBERO benchmarks, PCM matches the final success rate of standard GRPO while achieving a 2.38x wall-clock speedup, 4.8x faster gradient updates, and 60% lower peak activation memory, all while backpropagating through fewer than 20% of trajectory chunks.
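The budgeted sampling step can be sketched as weighted sampling without replacement, where each chunk inherits the keep probability of its phase. The phase labels, the `keep_prob` dictionary, and the function name are hypothetical; the phase scoring and the online update of keep probabilities described in the abstract are not shown.

```python
import random

def sample_chunks(chunk_phases, keep_prob, budget, rng=None):
    """Sample up to `budget` distinct chunk indices, weighted by the keep
    probability of each chunk's phase.

    chunk_phases[i]  : phase label of chunk i
    keep_prob[phase] : nonnegative weight for that phase
    Returns the kept indices in ascending order.
    """
    rng = rng or random.Random()
    candidates = [i for i, ph in enumerate(chunk_phases) if keep_prob[ph] > 0.0]
    weights = [keep_prob[chunk_phases[i]] for i in candidates]
    kept = []
    while candidates and len(kept) < budget:
        pick = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        kept.append(candidates.pop(pick))
        weights.pop(pick)
    return sorted(kept)
```

Only the returned chunks would receive gradient computation in such a scheme; the remaining chunks are masked out of the actor update.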
STABLE: Simulation-Ready Tabletop Layout Generation via a Semantics-Physics Dual System ICML 2026
Generating simulation-ready tabletop scenes from task instructions is an intriguing and promising research direction in the field of Embodied AI. However, existing task-to-scene generation methods rely exclusively on large language models (LLMs) to predict scene layouts, inevitably yielding object collisions or floating objects due to LLMs' inherent limitations in 3D spatial reasoning. In this paper, we present STABLE, a semantics-physics dual system tailored for simulation-ready tabletop scene generation. STABLE consists of two complementary modules: (i) a Semantic Reasoner, a fine-tuned LLM trained on a structured tabletop scene dataset to generate coarse layouts from input task instructions, and (ii) a Physics Corrector, a physics-aware flow-based denoising model that outputs pose updates to refine layouts, which ensures the physical plausibility of scenes while preserving semantic alignment with task instructions. STABLE adopts a progressive generation paradigm: by alternating between the Semantic Reasoner and Physics Corrector, it incrementally expands the scene from task-critical objects to background objects. Experiments demonstrate that STABLE successfully generates simulation-ready tabletop scenes that strictly conform to task instructions and significantly enhances the physical validity of scenes over prior art.
comment: ICML 2026
Beyond Collision Avoidance: Multi-Robot Yielding and Spatial Affordance in Emergency Evacuations
As mobile service robots increasingly coexist with pedestrians, ensuring passively safe behaviour during confined emergency evacuations is critical. Existing multi-robot yielding strategies often focus solely on collision avoidance and macroscopic flow optimisation, overlooking environmental affordances and human spatial expectations. To bridge the gap between macroscopic theory and micro-level perception, we conducted a game-based virtual evacuation experiment (N=56). We investigated individual psychological responses to four multi-robot yielding strategies (Hide, LineEscape, Freeze, ShortestPath) across confined corridors with and without refuge niches. Our results establish a robust preference hierarchy (Hide > LineEscape > Freeze > ShortestPath), demonstrating that proactive space-yielding significantly outperforms freezing and efficiency-first approaches. Crucially, we found that environmental affordances heavily shape cognitive expectations. Actively utilising available niches amplifies the psychological comfort of proactive yielding (Hide). Conversely, failing to use an obvious niche (e.g., executing LineEscape) may trigger Expectation Violation. This is reflected in a drastically increased perceived cognitive delay, despite objectively unimpeded trajectories. Furthermore, prior robot interaction experience helps users decode complex social intents. Ultimately, this research demonstrates that safe human-robot interaction during emergencies must evolve from pure trajectory optimisation to semantically aware navigation. Future work will extend this framework to investigate complex interactions between robot swarms and pedestrian crowds.
Towards Trustworthy and Explainable AI for Perception Models: From Concept to Prototype Vehicle Deployment SC 2026
Deep Neural Networks have become the dominant solution for Autonomous Driving perception, but their opacity conflicts with emerging Trustworthy AI guidelines and complicates safety assurance, debugging, and human oversight. While theoretical frameworks for safe and Explainable AI (XAI) exist, concrete implementations of Trustworthy AI for 3D scene understanding remain scarce. We address this gap by proposing a Trustworthy AI perception module that combines robustness, faithful explainability, and calibrated uncertainty estimates. Building on a transformer-based detector, we derive explanations from the attention mechanism at inference time and validate their faithfulness using perturbation-based consistency tests. We further integrate an uncertainty estimation and calibration module, and apply robustness-enhancing training methods. Experiments show faithful saliency behavior, improved robustness, and well-calibrated uncertainty estimates. Finally, we deploy these Trustworthy AI elements in a prototype vehicle and provide an XAI Interface that visualizes documentation artifacts, model uncertainty state, and saliency maps, demonstrating the feasibility of trustworthy perception monitoring in real time. Supplementary materials are available at https://tillbeemelmanns.github.io/trustworthy_ai/ .
comment: Accepted for publication at IEEE ITSC 2026
Health-Conditioned Vision-Language-Action Models for Malfunction-Aware Robot Control ICRA
Research on Vision Language Action (VLA) models has been increasing rapidly in recent years. Although some of these models focus on detecting, preventing, and recovering from task failures, they usually do not adapt to a robot's physical failures. In real-life scenarios, most robots face physical degradation in various forms, such as joint degradation, actuator failure, or a weak gripper. We introduce a malfunction-aware (health-conditioned) VLA that takes as input a health vector describing the operating angle range and torque capability of the robot's joints, and adapts its predictions to complete tasks with the degraded joints. To achieve this, we inject a Health Projector module into the VLA-Adapter architecture and train it on robot malfunction data that we collected in the LIBERO environment [1]. We collect 128 teleoperated episodes on Libero-Spatial tasks. Our results show that, with a very lightweight addition, the model can learn to operate successfully with different configurations of degraded joints, which the default pretrained VLA-Adapter's Libero-Spatial-Pro model cannot. The code and dataset will be available soon at https://github.com/h-arslan/health-aware-vla
comment: VLA Pipelines Workshop at IEEE International Conference on Robotics and Automation (ICRA) 2026
Learning Sim-Grounded Policies for Bimanual Rope Manipulation from Human Teleoperation Data ICRA 2026
Deformable Linear Objects (DLOs) such as ropes and cables are widely encountered in both household and industrial applications, yet remain challenging to manipulate due to their infinite-dimensional configuration space and frequent self-occlusion. Imitation learning from teleoperation offers a practical path to bimanual DLO manipulation, but its scalability is limited by human effort, making the choice of observation space critical for generalization from small datasets. In this study, we investigate whether the lack of generalization in egocentric visual policies for the knot-untangling task stems from the observation space itself, rather than from the policy architecture or data scale. We compare two Action Chunking with Transformers policies trained on the same bimanual teleoperation data: a vision-based policy conditioned on two egocentric RGB streams from wrist-mounted cameras, and a state-based policy conditioned on the DLO's 3D particle state, extracted from an initial observation via multi-view fusion and evolved in a particle-based eXtended Position-Based Dynamics simulation. Evaluated open-loop on an unseen rope configuration, the state-based policy outperforms its visual counterpart with a 30.8% reduction in L1 error when predicting the initial grasp-and-pull action, quantifying the observability gap between pixels and physics-consistent state, and pointing toward more data-efficient robot learning for the DLO manipulation task from limited human demonstrations.
comment: Accepted to the Beyond Teleoperation Workshop at ICRA 2026, 5 pages, 2 figures
Mind Dreamer: Untethering Imagination via Active Latent Intervention on Latent Manifolds
Model-Based Reinforcement Learning (MBRL) leverages latent imagination for sample efficiency, yet remains constrained by Historical Tethering: imagination is typically initialized from observed states. This creates a learning asymmetry, where the world model's manifold discovery outpaces the policy's sparse-reward optimization. We propose Mind Dreamer (MD), a framework that operationalizes Active Latent Intervention (ALI) to transcend Markovian continuity. MD reformulates discovery as the minimization of a global Relay Manifold Expected Free Energy (R-EFE); by sampling initial states from a learned generator $s_0 \sim p_{gen}(\cdot)$ rather than the historical buffer, MD utilizes an adversarial generator to synthesize non-continuous latent jumps to epistemic blind spots that are physically plausible yet cognitively challenging. To resolve the credit assignment paradox across these spatial ruptures, we derive the Relay Value Function (RVF) and Relay Uncertainty Function (RUF). These potentials treat synthesized anchors as counterfactual intermediary states, propagating pragmatic and epistemic value through a principled Bellman-style formulation. Notably, we prove that uncertainty propagation across discontinuities necessitates a quadratic discount $γ^2$, establishing a formal epistemic horizon. Theoretically, MD approximates a variance-minimizing importance sampler that expands the manifold's spectral gap, reducing the hitting time to critical bottleneck states. Empirically, MD achieves a 1.67$\times$ average speedup over DreamerV3 on DeepMind Control Suite, reaching 8.8$\times$ in sparse-reward tasks.
comment: 34 pages, 7 figures
Adaptive Outer-Loop Control of Quadrotors via Reinforcement Learning
Deep Reinforcement Learning (DRL) for quadrotor flight control typically relies on Domain Randomization (DR) for sim-to-real transfer, resulting in overly conservative policies that struggle with dynamic disturbances. To overcome this, we propose a novel adaptive control architecture that actively perceives and reacts to instantaneous perturbations. First, we train an optimal outer-loop policy, then replace its reliance on ground-truth disturbance data with a Residual Dynamics Predictor (RDP). The RDP estimates online the external forces and moments acting on the aircraft in flight, using only the history of states and control actions. For seamless hardware transfer, we introduce a data-efficient linear calibration bridge and an online thrust correction mechanism that align the simulated latent space with reality using mere seconds of flight data. Real-world validations on a Crazyflie micro-quadrotor demonstrate that our adaptive controller significantly outperforms baselines, maintaining precise trajectory tracking under severe uncertainties including mass variations, asymmetric payloads, and dynamic slung loads.
Fast Expanding Safe Circular Regions for Efficient Local Path Planning
Local navigation is one of the fundamental problems in robot navigation, and numerous approaches have been proposed over the years, including methods such as the Dynamic Window Approach, Model Predictive Control, and more recently, Control Barrier Functions and machine learning based techniques. While these methods perform well in simple environments, many of them rely on optimization or learning based procedures that can struggle in more complex scenarios. In contrast, this article proposes a more geometric algorithmic approach that enables a local navigation method with faster computation times and longer planning horizons. The proposed method is based on the computation of a sequence of circular regions from a local LiDAR scan that expand in the direction of the goal and capture free local navigable space. The proposed method was implemented in the ROS2 framework and evaluated in a simulated environment.
comment: Accepted by the IFAC World Congress 2026
Constrained MPC-Based Motion Planning for Morphing Quadrotors in Ultra-Narrow Passages under Limited Perception
This paper introduces a motion planning framework to plan morphology and trajectory for morphing quadrotors in extremely constrained environments. We develop a novel obstacle avoidance cost function for nonlinear model predictive control (MPC) that enables navigation through extremely narrow gaps under limited perception from a 2D LiDAR. Classical artificial potential field-based costs typically have a high cost in narrow passages, artificially blocking the navigable path. In contrast, we propose a smooth exponential obstacle cost that preserves low traversal cost within narrow gaps while maintaining strong collision avoidance behavior. The formulation avoids hard activation thresholds and introduces a cost reduction factor to reduce the cost within narrow passages. Direct use of 2D LiDAR measurements in MPC allows navigation around arbitrarily shaped obstacles. The method is embedded within an acados-based nonlinear MPC framework. Simulation and experimental results demonstrate successful traversal of narrow corridors where typical repulsive cost functions would fail. The approach provides a computationally efficient and practical solution for navigating through tight spaces while maintaining safety from the obstacles. Although we implement the framework on morphing quadrotors, the cost function formulation is general-purpose and applies to any mobile robot. The implementation code is available at https://github.com/harshjmodi1996/morphocopter_mpc and a short video is available at https://zh.engr.tamu.edu/wp-content/uploads/sites/310/2026/03/MPC_MorphoCopter_video.mp4.
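The contrast drawn in the abstract above, between a classical repulsive potential that blows up near obstacles and a smooth, bounded exponential cost, can be made concrete with a small numerical sketch. The functional forms and every parameter here (`d0`, `k`, `a`, `b`, `reduction`) are illustrative assumptions, not the cost function defined in the paper.

```python
import math


def potential_field_cost(d, d0=1.0, k=1.0):
    """Classical repulsive potential: zero beyond the threshold d0, unbounded as d -> 0."""
    if d >= d0:
        return 0.0
    return 0.5 * k * (1.0 / d - 1.0 / d0) ** 2


def exponential_cost(d, a=1.0, b=8.0, reduction=1.0):
    """Smooth exponential cost: no activation threshold, bounded above by a * reduction.

    `reduction` stands in for a hypothetical cost reduction factor (<= 1)
    that could be applied when the robot is inside a narrow passage.
    """
    return reduction * a * math.exp(-b * d)


# Clearance of 0.15 m to a wall, as might occur in a narrow gap.
clearance = 0.15
apf = potential_field_cost(clearance)
smooth = exponential_cost(clearance)
smooth_reduced = exponential_cost(clearance, reduction=0.5)
```

With these assumed parameters the potential-field cost at 0.15 m clearance is far larger than the exponential cost, which is the qualitative effect the abstract attributes to narrow passages.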
Learning Bilevel Policies over Symbolic World Models for Long-Horizon Planning
We tackle the challenge of building embodied AI agents that can reliably solve long-horizon planning problems. Imitation learning from demonstrations has shown itself to be effective in training robots to solve a diversity of complex tasks requiring fine motor control and manipulation over low-level (LL), continuous environments. Yet, it remains a difficult endeavour to generate long-horizon plans from imitation learning alone. In contrast, high-level (HL), symbolic abstractions facilitate efficient and interpretable long-horizon planning. We propose to combine the strengths of LL imitation learning for manipulation and control, and HL symbolic abstractions for long-horizon planning. We realise this idea via \emph{bilevel policies} of the form $(π^{\mathrm{hl}}, π^{\mathrm{ll}})$, consisting of a neural policy $π^{\mathrm{ll}}$ learned from LL demonstrations, and an HL symbolic policy $π^{\mathrm{hl}}$ that is constructed from symbolic abstractions of the LL demonstrations combined with inductive generalisation. We implement these ideas in the BISON system. Experiments on extended MetaWorld benchmarks demonstrate that BISON generalises to long horizons and problems with greater numbers of objects than those solved by VLA and end-to-end methods, and is more time and memory efficient in training and inference. Notably, when ignoring LL execution, BISON's HL policies can solve HL problems with 10,000 relevant objects in under a minute. Project page: https://dillonzchen.github.io/bison
OHP-RL: Online Human Preference as Guidance in Reinforcement Learning for Robot Manipulation
While reinforcement learning (RL) enables robots to acquire skills autonomously, its real-world deployment is severely limited by inefficient and unsafe exploration. Human-in-the-loop interventions offer a practical solution, yet existing methods typically exploit these interventions as auxiliary training signals, without fully capturing the richer information they provide about when and how autonomy should be guided. Human interventions often encode relative preferences over behavior under safety and task constraints, rather than prescribing exact actions to imitate. Motivated by this perspective, we propose Online Human Preference as Guidance in Reinforcement Learning (OHP-RL), a framework that leverages human interventions as preference information to guide policy learning. OHP-RL introduces a state-dependent preference gate that adaptively regulates when and to what extent human interventions should shape policy learning. This design enables the agent to benefit from intermittent and imperfect human feedback while preserving autonomous exploration and stable policy optimization. We evaluate OHP-RL on three challenging real-world contact-rich manipulation tasks on a Franka robot. Across all tasks, OHP-RL consistently achieves strong success rates, faster convergence, and substantially lower human intervention effort than prior approaches. Moreover, the learned policies exhibit more stable and human-aligned behavior throughout training.
WorldVLN: Autoregressive World Action Model for Aerial Vision-Language Navigation
Aerial vision-language navigation (VLN) requires agents to follow natural-language instructions through closed-loop perception and action in 3D environments. We argue that aerial VLN can be formulated as a prediction-driven world-action problem: the agent should anticipate latent world evolution and act according to the predicted consequences. To this end, we propose WorldVLN, the first autoregressive world action model for aerial VLN. Unlike full-sequence video-generation world models that generate an entire visual clip, WorldVLN adapts a latent autoregressive video backbone to predict short-horizon world-state transitions and directly decodes them into executable waypoint actions. After each action segment is executed, newly received observations are encoded back into the autoregressive context, enabling closed-loop world-action prediction. We further introduce a two-stage training framework that first grounds the video prior in instruction-conditioned navigation dynamics and then develops Action-aware GRPO, the first reinforcement learning method tailored to autoregressive WAMs, to optimize waypoint decisions through their downstream rollout consequences. On public outdoor and indoor benchmarks, WorldVLN consistently outperforms existing Vision-Language-Action baselines with 12%+ success-rate gains and larger advantages on challenging cases. It further transfers zero-shot to real drone deployment, suggesting that the proposed WorldVLN offers a promising route for spatial action tasks. Demos and code are available at https://embodiedcity.github.io/WorldVLN/.
Driving Through the Network: Performance and Workload Under Latency and Video Impairment
Teleoperation promises to extend the operational envelope of automated vehicles, yet it critically depends on network latency and video quality. We report a fixed-base driving-simulator study (N=25) with a 2x2 manipulation of added latency (100/300 ms) and bitrate (500/2000 kbit/s), plus a best-case baseline (0 ms added, 9000 kbit/s). We measured effective glass-to-glass (G2G) latency per condition (baseline approx. 413 ms; effective totals approx. 500-700 ms) and verified stable framerate and encoder settings. Multimodal measures covered performance (speed, steering reversals, crashes), oculomotor behavior (blink rate, fixation duration), physiology (RR interval, heart rate, skin conductance), and subjective workload. Latency and bitrate each increased operator load and modestly affected performance. Physiological measures (heart rate, RR interval) exhibited sub-additive interactions, whereas performance and oculomotor interactions were small or non-significant. Equivalence tests showed that 300 ms with 2000 kbit/s was velocity-equivalent to best-case (SESOI +/- 2 km/h), while 300 ms with 500 kbit/s was not. We argue that latency and video quality should be treated as largely independent design levers, and that physiology-aware adaptation can anticipate overload before safety is compromised.
comment: Preprint of VEHITS 2026 : 12th International Conference on Vehicle Technology and Intelligent Transport Systems
A Reproducible and Physically Feasible Dynamic Parameter Identification Framework for a Low-Cost Robot Arm
This paper presents a reproducible and physically feasible dynamic parameter identification framework for CRANE-X7, a low-cost robot arm driven by modular smart actuators. To improve practical identifiability, products of inertia are removed according to approximate link symmetry, reducing the rigid-body model from 65 to 39 base parameters. Identification motions are hand-designed from structured single-joint and adjacent-joint primitives under practical joint-range limits. The proposed pipeline combines preprocessing, inverse-dynamics-regressor-based ordinary least squares (OLS), conditional semidefinite-programming (SDP) projection for feasibility recovery, and closed-loop input error (CLIE) refinement. Candidate solutions from 40 structured trajectories are analyzed in a common PCA space to select a statistically central representative model. Because statistical centrality alone does not ensure physical acceptability, the selected model is finally screened by an all-pose positive-definiteness audit of the inertia matrix and, when necessary, corrected by a localized post-CLIE SDP rescue step. Experiments show that the parameter cloud becomes progressively more concentrated from OLS to SDP and CLIE, while the final accepted model preserves high predictive accuracy on held-out validation motions. These results demonstrate a practical route to statistically coherent and physically feasible dynamic models for low-cost robot platforms.
comment: 11 pages, 8 figures, 7 tables, and 1 algorithm
FocalPolicy: Frequency-Optimized Chunking and Locally Anchored Flow Matching for Coherent Visuomotor Policy
Visuomotor policies aim to learn complex manipulation tasks from expert demonstrations. However, generating smooth and coherent trajectories remains challenging, as it requires balancing proximal precision with distal foresight. Existing approaches typically focus on optimizing intra-chunk action distributions, often neglecting inter-chunk coherence. Consequently, inter-chunk discontinuities significantly impede the learning of coherent long-horizon actions. To overcome this limitation and achieve a synergetic balance between precision and foresight, we propose FocalPolicy, a foresight-aware visuomotor policy that combines Frequency-Optimized Chunking with Locally Anchored flow matching. We introduce a foresight composite objective that supervises time-domain alignment within the proximal actions while regularizing frequency-domain structure over multiple future action chunks to improve cross-chunk coherence. To efficiently learn complex action distributions, we design locally anchored sampling to enhance target signal propagation efficiency during consistency flow matching training. Extensive experiments demonstrate that FocalPolicy outperforms existing approaches and confirm the generalizability of our modules to other baselines. Project website: https://focalpolicy.github.io/
Dynamic Plasma Shape Control with Arbitrary Sensor Subsets
Plasma shape control in tokamaks requires a real-time controller that tracks dynamically changing shape targets while tolerating diagnostic failures. Classical approaches decompose the problem into equilibrium reconstruction followed by a linear controller, and assume a fixed, fully operational sensor set. We present a reinforcement learning agent that addresses both limitations simultaneously. The agent is trained in NSFsim, a high-fidelity tokamak simulator configured for DIII-D, on a curated dataset of 120 experimental plasma shapes. The shape targets are resampled as random step changes every 0.25 s, exposing the agent to diverse transitions across the full shape envelope. At test time the agent zero-shot tracks dynamic shape sequences; on a held-out static configuration in simulation it achieves a mean shape error of 2.01 cm, and dynamic trajectory following is demonstrated qualitatively in simulation and on the physical device. Diagnostic dropout randomly masks 30% of magnetic sensors per episode, yielding a single policy robust to arbitrary sensor subsets without backup controllers or mode-switching logic. An asymmetric actor-critic architecture with privileged equilibrium information improves value estimation under partial observability; an auxiliary shape reconstruction head on the actor enables end-to-end shape reconstruction from raw diagnostics and serves as an interpretability tool for policy analysis. The policy transfers to experimental DIII-D shots, where it directly commands the coil actuators on two dynamic shape maneuvers, and to the independent GSevolve simulator.
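The diagnostic dropout described in the abstract above can be pictured as drawing one sensor mask per episode and feeding the policy both the masked readings and the mask itself. The sketch below is a generic illustration under stated assumptions: dropping exactly a fixed fraction, zero-filling missing channels, and appending the mask to the observation are choices made here, not details confirmed by the abstract.

```python
import random


def sample_sensor_mask(n_sensors, drop_frac=0.3, rng=random):
    """Draw a per-episode mask: True means the sensor is available, False means dropped."""
    n_drop = round(drop_frac * n_sensors)
    dropped = set(rng.sample(range(n_sensors), n_drop))
    return [i not in dropped for i in range(n_sensors)]


def apply_mask(readings, mask):
    """Zero-fill dropped sensors and append the mask as extra observation channels.

    Appending the mask lets a policy distinguish a missing sensor
    from a sensor that genuinely measures zero.
    """
    masked = [r if m else 0.0 for r, m in zip(readings, mask)]
    return masked + [float(m) for m in mask]


# Example: 10 hypothetical magnetic sensors, 30% dropped for this episode.
mask = sample_sensor_mask(10, 0.3, random.Random(1))
observation = apply_mask([1.0] * 10, mask)
```

Resampling the mask at every episode reset is what would expose a single policy to many different sensor subsets during training.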
Designing for Robot Wranglers: A Synthesis of Literature and Practice
Robots are increasingly present in human spaces, such as for conducting deliveries in hospitals, interacting with visitors at museums, and stocking items in warehouses. To ensure the seamless integration of robots into these spaces, a new role in human-robot interaction is emerging - the robot wrangler, namely an individual who is responsible for setting up, overseeing, and troubleshooting the robot. To understand the needs of this stakeholder, we conducted a scoping review that uncovered a typology of robot wrangling across the research literature, and discovered that wrangling is an umbrella term that collapses a highly complex and heterogeneous space of activities, often rendering this labor difficult to characterize and support. To further clarify and understand robot wrangling, we then reflected on our own firsthand and imagined experiences as robot wranglers within our own respective domains. Guided by the scoping review and our reflections, we devise a series of design implications for supporting wranglers directly as individuals and as members of a wider service ecology.
comment: Accepted for publication in the Proceedings of ACM Designing Interactive Systems (2026)
Structured Jacobian Construction for Motion Optimization with High-Order Time Derivatives in Multi-Link Systems
This paper presents a novel framework for Jacobian computation in motion optimization problems involving multi-link systems, where physical quantities are represented using higher-order time derivatives. In motion optimization of robots and humans, cost functions may incorporate higher-order time derivatives, such as jerk or the time variation of forces, to capture smoothness and perceptual characteristics, particularly in motion skill analysis and expressive behaviors, thereby necessitating Jacobian computations involving these quantities. However, such Jacobians are typically computed using numerical or automatic differentiation without explicitly exploiting the underlying multi-link structure, which can lead to increased computational cost and numerical instability. To address this limitation, we propose a structured Jacobian formulation for motion optimization, based on the comprehensive motion computation framework, in which physical quantities and their higher-order time derivatives are systematically represented along the multi-link structure. The proposed method systematically derives analytical expressions for Jacobians of kinematic and dynamic quantities, including momentum, forces, and joint torques, with respect to generalized coordinates and their higher-order derivatives. The resulting framework is applicable to both direct and inverse optimization. Through numerical experiments, we demonstrate that the proposed method improves computational efficiency compared to numerical and automatic differentiation, while achieving comparable accuracy. Furthermore, we demonstrate its effectiveness in inverse optimization by recovering cost function weights from motion data. Together, these results indicate that the proposed formulation provides a scalable and structured computational foundation for motion optimization involving higher-order time derivatives in multi-link systems.
GAP: Geometric Anchor Pre-training for Data-Efficient Visuomotor Learning of Manipulation Tasks
Learning visuomotor policies from scarce expert demonstrations remains a core challenge in robotic manipulation. A primary hurdle lies in distilling high-dimensional RGB representations into control-relevant geometry without overfitting. While using frozen pre-trained Vision Foundation Models (VFMs) improves data efficiency, it also shifts most task adaptation onto a small spatial pooling module, which can latch onto task-irrelevant shortcuts and lose geometric grounding when finetuned with few data samples. More broadly, pre-trained visual representations used for policy learning have been observed to struggle under even minor scene perturbations, highlighting the need for robustness-oriented inductive biases. We propose Geometric Anchor Pre-training (GAP), a simple, action-free warm-up stage that regularizes the spatial adapter before downstream imitation learning. GAP pre-trains the pooling layer on a lightweight simulated proxy task where object masks are available at no cost, encouraging the adapter to produce keypoints that lie on the object, cover its spatial extent, and remain sharp and repeatable over time. This yields stable geometric anchors that provide a reliable coordinate interface for few-shot policy learning, while keeping the VFM frozen. We evaluate GAP on RoboMimic and ManiSkill under severe data scarcity (15-50 demonstrations) and domain shift. A simple adapter regularized with GAP consistently outperforms stronger attention-based poolers and end-to-end fine-tuning, achieving 62% success on RoboMimic Can with 15 demonstrations (+16% over AFA), 63% on the long-horizon high-precision Tool Hang task with 50 demonstrations, and 61% on ManiSkill StackCube with 30 demonstrations (+11% over full fine-tuning). The proxy stage is lightweight and fully decoupled from downstream tasks, making it practical to reuse across environments and manipulation skills.
comment: Project webpage at https://lambdavi.github.io/gap
HandelBot: Real-World Piano Playing via Fast Adaptation of Dexterous Robot Policies
Mastering dexterous manipulation with multi-fingered hands has been a grand challenge in robotics for decades. Despite its potential, the difficulty of collecting high-quality data remains a primary bottleneck for high-precision tasks. While reinforcement learning and simulation-to-real-world transfer offer a promising alternative, the transferred policies often fail for tasks demanding millimeter-scale precision, such as bimanual piano playing. In this work, we introduce HandelBot, a framework that combines a simulation policy and rapid adaptation through a two-stage pipeline. Starting from a simulation-trained policy, we first apply a structured refinement stage to correct spatial alignments by adjusting lateral finger joints based on physical rollouts. Next, we use residual reinforcement learning to autonomously learn fine-grained corrective actions. Through extensive hardware experiments across five recognized songs, we demonstrate that HandelBot can successfully perform precise bimanual piano playing. Our system outperforms direct simulation deployment by a factor of 1.8x and requires only 30 minutes of physical interaction data.
comment: Website: https://amberxie88.github.io/handelbot
GRaD-Nav++: Vision-Language Model Enabled Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics
Autonomous drones capable of interpreting and executing high-level language instructions in unstructured environments remain a long-standing goal. Yet existing approaches are constrained by their dependence on hand-crafted skills, extensive parameter tuning, or computationally intensive models unsuitable for onboard use. We introduce GRaD-Nav++, a lightweight Vision-Language-Action (VLA) framework that runs fully onboard and follows natural-language commands in real time. Our policy is trained in a photorealistic 3D Gaussian Splatting (3DGS) simulator via Differentiable Reinforcement Learning (DiffRL), enabling efficient learning of low-level control from visual and linguistic inputs. At its core is a Mixture-of-Experts (MoE) action head, which adaptively routes computation to improve generalization while mitigating forgetting. In multi-task generalization experiments, GRaD-Nav++ achieves a success rate of 83% on trained tasks and 75% on unseen tasks in simulation. When deployed on real hardware, it attains 67% success on trained tasks and 50% on unseen ones. In multi-environment adaptation experiments, GRaD-Nav++ achieves an average success rate of 81% across diverse simulated environments and 67% across varied real-world settings. These results establish a new benchmark for fully onboard Vision-Language-Action (VLA) flight and demonstrate that compact, efficient models can enable reliable, language-guided navigation without relying on external infrastructure.
comment: Published in: IEEE Robotics and Automation Letters ( Volume: 11, Issue: 2, February 2026)
One Hand to Rule Them All: Canonical Representations for Unified Dexterous Manipulation
Dexterous manipulation policies today largely assume fixed hand designs, severely restricting their generalization to new embodiments with varied kinematic and structural layouts. To overcome this limitation, we introduce a parameterized canonical representation that unifies a broad spectrum of dexterous hand architectures. It comprises a unified parameter space and a canonical URDF format, offering three key advantages. 1) The parameter space captures essential morphological and kinematic variations for effective conditioning in learning algorithms. 2) A structured latent manifold can be learned over our space, where interpolations between embodiments yield smooth and physically meaningful morphology transitions. 3) The canonical URDF standardizes the action space while preserving dynamic and functional properties of the original URDFs, enabling efficient and reliable cross-embodiment policy learning. We validate these advantages through extensive analysis and experiments, including grasp policy replay, VAE latent encoding, and cross-embodiment zero-shot transfer. Specifically, we train a VAE on the unified representation to obtain a compact, semantically rich latent embedding, and develop a grasping policy conditioned on the canonical representation that generalizes across dexterous hands. We demonstrate, through simulation and real-world tasks on unseen morphologies (e.g., 81.9% zero-shot success rate on 3-finger LEAP Hand), that our framework unifies both the representational and action spaces of structurally diverse hands, providing a scalable foundation for cross-hand learning toward universal dexterous manipulation. Project Page: https://zhenyuwei2003.github.io/OHRA/
comment: Accepted at RSS 2026
A Novel Model for 3D Motion Planning for a Generalized Dubins Vehicle with Pitch and Yaw Rate Constraints
In this paper, we propose a new modeling approach and a fast algorithm for 3D motion planning, applicable to fixed-wing unmanned aerial vehicles. The goal is to construct the shortest path connecting given initial and final configurations subject to motion constraints. Our work differs from existing literature in two ways. First, we consider full vehicle orientation using a body-attached frame, which includes roll, pitch, and yaw angles. By comparison, existing work uses only pitch and/or heading angle, which is insufficient to uniquely determine orientation. Second, we use two control inputs to represent bounded pitch and yaw rates, reflecting control by two separate actuators. In contrast, most previous methods rely on a single input, such as path curvature, which is insufficient for accurately modeling the vehicle's kinematics in 3D. We use a rotation minimizing frame to describe the vehicle's configuration and its evolution, and construct paths by concatenating optimal Dubins paths on spherical, cylindrical, or planar surfaces. Numerical simulations show our approach generates feasible paths within 10 seconds on average and yields shorter paths than existing methods in most cases.
comment: The code for this paper is available at https://github.com/DeepakPrakashKumar/3D-Motion-Planning-for-Generalized-Dubins-with-Pitch-Yaw-constraints
Flatness-based trajectory planning for 3D overhead cranes with friction compensation and collision avoidance
This paper presents an optimal trajectory generation method for 3D overhead cranes by leveraging differential flatness. This framework enables the direct inclusion of complex physical and dynamic constraints, such as nonlinear friction and collision avoidance for both payload and rope. Our approach allows for aggressive movements by constraining payload swing only at the final point. A comparative simulation study validates our approach, demonstrating that neglecting dry friction leads to actuator saturation and collisions. The results show that friction modeling is a fundamental requirement for fast and safe crane trajectories.
comment: 6 pages, 8 figures. Final version, after peer review and acceptance, submitted to the 23rd IFAC World Congress
RE-SAC: Disentangling aleatoric and epistemic risks in bus fleet control: A stable and robust ensemble DRL approach
Bus holding control is challenging due to stochastic traffic and passenger demand. While deep reinforcement learning (DRL) shows promise, standard actor-critic algorithms suffer from Q-value instability in volatile environments. A key source of this instability is the conflation of two distinct uncertainties: aleatoric uncertainty (irreducible noise) and epistemic uncertainty (data insufficiency). Treating these as a single risk leads to value underestimation in noisy states, causing catastrophic policy collapse. We propose a robust ensemble soft actor-critic (RE-SAC) framework to explicitly disentangle these uncertainties. RE-SAC applies Integral Probability Metric (IPM)-based weight regularization to the critic network to hedge against aleatoric risk, providing a smooth analytical lower bound for the robust Bellman operator without expensive inner-loop perturbations. To address epistemic risk, a diversified Q-ensemble penalizes overconfident value estimates in sparsely covered regions. This dual mechanism prevents the ensemble variance from misidentifying noise as a data gap, a failure mode identified in our ablation study. Experiments in a realistic bidirectional bus corridor simulation demonstrate that RE-SAC achieves the highest cumulative reward (approx. -0.4e6) compared to vanilla SAC (-0.55e6). Mahalanobis rareness analysis confirms that RE-SAC reduces Oracle Q-value estimation error by up to 62% in rare out-of-distribution states (MAE of 1647 vs. 4343), demonstrating superior robustness under high traffic variability.
Sampling-Based Global Optimal Control and Estimation via Semidefinite Programming
Global optimization has gained traction over the past decades, thanks to the development of both theoretical foundations and efficient numerical routines. Among recent advances, Kernel Sum of Squares (KernelSOS) provides a powerful theoretical framework, combining the expressivity of kernel methods with the guarantees of SOS optimization. In this paper, we take KernelSOS from theory to practice and demonstrate its use on challenging control and robotics problems. We identify and address the practical considerations required to make the method work in applied settings: restarting strategies, systematic calibration of hyperparameters, methods for recovering minimizers, and the combination with fast local solvers. As a proof of concept, the application of KernelSOS to robot localization highlights its competitiveness with existing SOS approaches that rely on heuristics and handcrafted reformulations to render the problem polynomial. Even in the high-dimensional, non-parametric setting of trajectory optimization with simulators treated as black boxes, we demonstrate how KernelSOS can be combined with fast local solvers to uncover higher-quality solutions without compromising overall runtimes.
Learning Structured Robot Policies from Vision-Language Models via Synthetic Neuro-Symbolic Supervision
Vision-Language Models (VLMs) have recently demonstrated strong capabilities in mapping multimodal observations to robot behaviors. However, most current approaches rely on end-to-end visuomotor policies that remain opaque and difficult to analyze, limiting their use in real-world robotic applications. In contrast, classical robotic systems often rely on structured policy representations that provide interpretability, modularity, and reactive execution. This work investigates how foundation models can be specialized to generate structured robot policies grounded in multimodal perception, bridging high-dimensional learning and symbolic control. We propose a neuro-symbolic approach in which a VLM synthesizes executable Behavior Tree policies from visual observations, natural language instructions, and structured system specifications. To enable scalable supervision without manual annotation, we introduce an automated pipeline that generates a synthetic multimodal dataset of domain-randomized scenes paired with instruction-policy examples produced by a foundation model. By decoupling structured task decomposition under constrained symbolic grammars from hardware-specific motor control, we demonstrate that a 12B-parameter model can learn structured spatial-symbolic mappings required for executable BT synthesis, solely through in-silico supervision. Real-world physical experiments on two heterogeneous robotic manipulators confirm that these structurally constrained policies achieve zero-shot transfer to real-world environments. The results emphasize that the data bottleneck in robotic planning can be bypassed by procedurally synthesizing high-fidelity, neuro-symbolic training data.
Detecting Heel Strike and toe off Events Using Kinematic Methods and LSTM Models
Accurate gait event detection is crucial for gait analysis, rehabilitation, and assistive technology, particularly in exoskeleton control, where precise identification of stance and swing phases is essential. This study evaluated the performance of seven kinematics-based methods and a Long Short-Term Memory (LSTM) model for detecting heel strike and toe-off events across 4363 gait cycles from 588 able-bodied subjects. The results indicated that while the Zeni et al. method achieved the highest accuracy among kinematics-based approaches, other methods exhibited systematic biases or required dataset-specific tuning. The LSTM model performed comparably to Zeni et al., providing a data-driven alternative without systematic bias. These findings highlight the potential of deep learning-based approaches for gait event detection while emphasizing the need for further validation in clinical populations and across diverse gait conditions. Future research will explore the generalizability of these methods in pathological populations, such as individuals with post-stroke conditions and knee osteoarthritis, as well as their robustness across varied gait conditions and data collection settings to enhance their applicability in rehabilitation and exoskeleton control.
An Introduction to Deep Reinforcement and Imitation Learning
Embodied agents, such as robots and virtual characters, must continuously select actions to execute tasks effectively, solving complex sequential decision-making problems. Given the difficulty of designing such controllers manually, learning-based approaches have emerged as promising alternatives, most notably Deep Reinforcement Learning (DRL) and Deep Imitation Learning (DIL). DRL leverages reward signals to optimize behavior, while DIL uses expert demonstrations to guide learning. This document introduces DRL and DIL in the context of embodied agents, adopting a concise, depth-first approach to the literature. It is self-contained, presenting all necessary mathematical and machine learning concepts as they are needed. It is not intended as a survey of the field; rather, it focuses on a small set of foundational algorithms and techniques, prioritizing in-depth understanding over broad coverage. The material ranges from Markov Decision Processes to REINFORCE and Proximal Policy Optimization (PPO) for DRL, and from Behavioral Cloning to Dataset Aggregation (DAgger) and Generative Adversarial Imitation Learning (GAIL) for DIL.
Sparse ActionGen: Accelerating Diffusion Policy with Real-time Pruning
Diffusion Policy has dominated action generation due to its strong capabilities for modeling multi-modal action distributions, but its multi-step denoising processes make it impractical for real-time visuomotor control. Existing caching-based acceleration methods typically rely on $\textit{static}$ schedules that fail to adapt to the $\textit{dynamics}$ of robot-environment interactions, thereby leading to suboptimal performance. In this paper, we propose $\underline{\textbf{S}}$parse $\underline{\textbf{A}}$ction$\underline{\textbf{G}}$en ($\textbf{SAG}$) for extremely sparse action generation. To accommodate the iterative interactions, SAG customizes a rollout-adaptive prune-then-reuse mechanism that first identifies prunable computations globally and then reuses cached activations to substitute them during action diffusion. To capture the rollout dynamics, SAG parameterizes an observation-conditioned diffusion pruner for environment-aware adaptation and instantiates it with a highly parameter- and inference-efficient design for real-time prediction. Furthermore, SAG introduces a one-for-all reusing strategy that reuses activations across both timesteps and blocks in a zig-zag manner, minimizing the global redundancy. Extensive experiments on multiple robotic benchmarks demonstrate that SAG achieves up to 4$\times$ generation speedup without sacrificing performance. Project Page: https://sparse-actiongen.github.io.
OpenFrontier: General Navigation with Visual-Language Grounded Frontiers
Open-world navigation requires robots to make decisions in complex everyday environments while adapting to flexible task requirements. Conventional navigation approaches often rely on dense 3D reconstruction and hand-crafted goal metrics, which limits their generalization across tasks and environments. Recent advances in vision-language navigation (VLN) and vision-language-action (VLA) models enable end-to-end policies conditioned on natural language, but typically require interactive training, large-scale data collection, or task-specific fine-tuning with a mobile agent. We formulate navigation as a sparse subgoal identification and reaching problem and observe that providing visual anchoring targets for high-level semantic priors enables highly efficient goal-conditioned navigation. Based on this insight, we select visual frontiers as semantic anchors and propose OpenFrontier, a navigation framework that requires no task-specific training or fine-tuning and seamlessly integrates diverse vision-language prior models. OpenFrontier enables efficient navigation with a lightweight system design, without dense 3D semantic mapping, task-specific policy training, or model fine-tuning. We evaluate OpenFrontier across multiple navigation benchmarks and demonstrate strong zero-shot performance, as well as effective real-world deployment on a mobile robot.
CLOVER: Closed-Loop Value Estimation and Ranking for End-to-End Autonomous Driving Planning
End-to-end autonomous driving planners are commonly trained by imitating a single logged trajectory, yet evaluated by rule-based planning metrics that measure safety, feasibility, progress, and comfort. This creates a training-evaluation mismatch: trajectories close to the logged path may violate planning rules, while alternatives farther from the demonstration can remain valid and high-scoring. The mismatch is especially limiting for proposal-selection planners, whose performance depends on candidate-set coverage and scorer ranking quality. We propose CLOVER, a Closed-LOop Value Estimation and Ranking framework for end-to-end autonomous driving planning. CLOVER follows a lightweight generator-scorer formulation: a generator produces diverse candidate trajectories, and a scorer predicts planning-metric sub-scores to rank them at inference time. To expand proposal support beyond single-trajectory imitation, CLOVER constructs evaluator-filtered pseudo-expert trajectories and trains the generator with set-level coverage supervision. It then performs conservative closed-loop self-distillation: the scorer is fitted to true evaluator sub-scores on generated proposals, while the generator is refined toward teacher-selected top-$k$ and vector-Pareto targets with stability regularization. We analyze when an imperfect scorer can improve the generator, showing that scorer-mediated refinement is reliable when scorer-selected targets are enriched under the true evaluator and updates remain conservative. On NAVSIM, CLOVER achieves 94.5 PDMS and 90.4 EPDMS, establishing a new state of the art. On the more challenging NavHard split, it obtains 48.3 EPDMS, matching the strongest reported result. On supplementary nuScenes open-loop evaluation, CLOVER achieves the lowest L2 error and collision rate among compared methods. Code and data will be released at https://github.com/WilliamXuanYu/CLOVER.
ProCompNav: Proactive Instance Navigation with Comparative Judgment for Ambiguous User Queries
Natural-language instance navigation becomes challenging when the initial user request does not uniquely specify the target instance. A practical agent should reduce the user's burden by actively asking only for the information needed to distinguish the target from similar distractors, rather than requiring a detailed description upfront. Existing approaches often fall short of this goal: they may stop at the first plausible candidate before sufficiently exploring alternatives, or, even after collecting multiple candidates, ask about the target's attributes derived from individual candidates rather than questions selected to distinguish candidates in the pool. As a result, despite the dialogue, the agent may still fail to distinguish the target from distractors, leading to premature decisions and lengthy user responses. We propose Proactive Instance Navigation with Comparative Judgment (ProCompNav), a two-stage framework that first constructs a candidate pool and then identifies the target through comparative judgment. At each round, ProCompNav extracts an attribute-value pair that splits the current pool, asks a binary yes/no question, and prunes all inconsistent candidates at once. This reframes disambiguation from open-ended target description to pool-level discriminative questioning, where each question is chosen to narrow the candidate set. On CoIN-Bench, ProCompNav improves Success Rate over interactive baselines with the same minimal input and non-interactive baselines with detailed descriptions, while substantially reducing Response Length. ProCompNav also achieves state-of-the-art Success Rate on TextNav, suggesting that comparative judgment is broadly useful for instance-level navigation among similar distractors. Code is available at https://github.com/tree-jhk/procompnav.
comment: Project page: https://tree-jhk.github.io/procompnav/ . Code: https://github.com/tree-jhk/procompnav/
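The question-and-prune round described in the ProCompNav abstract can be illustrated with a minimal sketch. Everything below is hypothetical and not taken from the paper's code: the candidate dictionaries, the function names, and in particular the "most balanced split" selection rule are assumptions chosen for illustration. The abstract only states that the chosen attribute-value pair splits the current pool.

```python
from collections import Counter

# Hypothetical candidate pool; attribute names and values are illustrative only.
POOL = [
    {"id": "chair_1", "attrs": {"color": "red", "material": "wood", "room": "kitchen"}},
    {"id": "chair_2", "attrs": {"color": "red", "material": "metal", "room": "office"}},
    {"id": "chair_3", "attrs": {"color": "blue", "material": "wood", "room": "office"}},
    {"id": "chair_4", "attrs": {"color": "blue", "material": "metal", "room": "kitchen"}},
]


def pick_splitting_question(pool):
    """Pick an (attribute, value) pair that splits the pool as evenly as possible.

    The balanced-split criterion is an assumption made for this sketch.
    """
    n = len(pool)
    counts = Counter()
    for cand in pool:
        for attr, val in cand["attrs"].items():
            counts[(attr, val)] += 1
    # Keep only pairs that hold for some, but not all, candidates.
    splitting = {pair: c for pair, c in counts.items() if 0 < c < n}
    if not splitting:
        return None
    # Sorting first makes tie-breaking deterministic.
    return min(sorted(splitting), key=lambda pair: abs(2 * splitting[pair] - n))


def prune(pool, pair, answer_yes):
    """Drop every candidate that is inconsistent with the yes/no answer."""
    attr, val = pair
    return [c for c in pool if (c["attrs"].get(attr) == val) == answer_yes]


def identify(pool, oracle):
    """Ask binary questions until one candidate remains or no question splits the pool."""
    rounds = 0
    while len(pool) > 1:
        pair = pick_splitting_question(pool)
        if pair is None:
            break
        pool = prune(pool, pair, oracle(pair))
        rounds += 1
    return pool, rounds


def make_oracle(target):
    """Simulated user who answers truthfully about the target."""
    return lambda pair: target["attrs"].get(pair[0]) == pair[1]
```

With the four-candidate pool above and `chair_3` as the target, `identify(POOL, make_oracle(POOL[2]))` isolates the target in two yes/no rounds, since each balanced question halves the pool.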
Multiagent Systems
GRASP: Graph Agentic Search over Propositions for Multi-hop Question Answering
Agentic retrieval improves multi-hop question answering by giving language models autonomy to iteratively gather evidence. Recent work augments these systems with knowledge graphs for structured traversal, but this combination introduces significant cost: expensive graph construction at index time and compounding token usage at inference time. We introduce Graph Agentic Search over Propositions (GRASP), an agentic system that simultaneously optimizes for high accuracy and minimal token usage in multi-hop question answering. Rather than executing a rigid, singular query, GRASP actively coordinates its retrieval strategy by decomposing multi-hop queries into dependency-aware plans. This enables GRASP to dynamically scale the number of sub-agents according to the complexity of the problem. Each sub-agent resolves its single-hop query by exploring a novel three-layer hierarchical graph of entities, propositions, and passages, using the entity layer for targeted traversal and the proposition layer for high-recall passage retrieval via reciprocal-rank voting. We evaluate GRASP on MuSiQue, 2WikiMultihopQA, and HotpotQA under two settings: open-corpus retrieval and extended context reasoning (LongBench). GRASP achieves the highest QA accuracy in the open retrieval setting on MuSiQue and 2Wiki while using 40-50 percent fewer tokens than IRCoT+HippoRAG2. Furthermore, GRASP leads on EM and F1 across all three datasets in the LongBench setting while using 30 percent fewer tokens than the next most accurate method. Finally, we introduce success economy - the amortized token cost per correct answer, weighted by difficulty - and advocate for efficiency-aware evaluation as a standard practice for agentic QA.
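As a rough illustration of the reciprocal-rank voting step mentioned in the GRASP abstract, the sketch below lets each retrieved proposition vote for its parent passage with weight 1/(k + rank). This is an assumption about how such voting could work, modeled on standard reciprocal rank fusion. The function name, the `prop_to_passage` mapping, and the constant `k=60` are hypothetical and not confirmed by the source.

```python
from collections import defaultdict


def reciprocal_rank_vote(ranked_propositions, prop_to_passage, k=60, top_n=3):
    """Aggregate proposition-level rankings into a passage-level ranking.

    ranked_propositions: proposition ids, best match first.
    prop_to_passage: maps each proposition id to its parent passage id.
    k: smoothing constant (60 is a common default in reciprocal rank fusion).
    """
    scores = defaultdict(float)
    for rank, prop_id in enumerate(ranked_propositions, start=1):
        passage_id = prop_to_passage[prop_id]
        scores[passage_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by passage id for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [passage_id for passage_id, _ in ordered[:top_n]]


# Hypothetical example: two mid-ranked propositions from passage "B"
# together outvote the single top-ranked proposition from passage "A".
RANKED = ["p1", "p2", "p3", "p4"]
PARENTS = {"p1": "A", "p2": "B", "p3": "B", "p4": "C"}
```

In this toy case passage "B" collects 1/62 + 1/63, which exceeds the 1/61 earned by "A", so a passage supported by several propositions can outrank one supported by a single strong hit. That is one plausible reading of why voting at the proposition layer favors recall.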
A Mechanistic Model for Collective Motion from Sensorimotor Regularities
Collective behavior in animals has long been modeled through self-propelled particle models, which reproduce striking group-level phenomena through abstract interaction forces. Yet these models are fundamentally descriptive: they leave open the question of how collective behavior is actually produced. Recent empirical work makes this gap concrete: locusts do not align with neighbors; instead, sensory and cognitive mechanisms mediate interaction. A mechanistic model must therefore operate at the sensorimotor level, grounded in what individual organisms can actually perceive, estimate, and physically execute. We present such a model based on a modeling framework from robotics, extended here to collective motion. Each agent perceives neighbors through bearing and apparent-size cues within a limited field of view, maintains uncertain internal state estimates, and selects actions through gradient descent on a desired social distance -- without any prescribed interaction forces. This simple model produces diverse collective behaviors including polarized motion, milling, ring formations, and subgroup fragmentation. A global sensitivity analysis shows that behavioral transitions are governed by sensorimotor parameters corresponding to measurable biological quantities: field of view geometry, sensory noise, turning agility, and memory. Collective behavior can therefore be understood as the emergent outcome of interacting sensorimotor regularities, and differences across species as the emergent outcome of differences in embodiment and environment.
FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast
Can LLM agents improve decision-making through self-generated memory without gradient updates? We propose FORGE (Failure-Optimized Reflective Graduation and Evolution), a staged, population-based protocol that evolves prompt-injected natural-language memory for hierarchical ReAct agents. FORGE wraps a Reflexion-style inner loop, where a dedicated reflection agent (using the same underlying LLM, no distillation from a stronger model) converts failed trajectories into reusable knowledge artifacts: textual heuristics (Rules), few-shot demonstrations (Examples), or both (Mixed), with an outer loop that propagates the best-performing instance's memory to the population between stages and freezes converged instances via a graduation criterion. We evaluate on CybORG CAGE-2, a stochastic network-defense POMDP at a 30-step horizon against the B-line attacker, where all four tested LLM families (Gemini-2.5-Flash-Lite, Grok-4-Fast, Llama-4-Maverick, Qwen3-235B) exhibit strongly negative, heavy-tailed zero-shot rewards. Compared against both a zero-shot baseline and a Reflexion baseline (isolated single-stream learning), FORGE improves average evaluation return by 1.7-7.7$\times$ over zero-shot and by 29-72% over Reflexion in all 12 model-representation conditions, reducing major-failure rates (below $-100$) to as low as $\sim$1%. We find that (1) population broadcast is the critical mechanism, with a no-graduation ablation confirming that broadcast carries the performance gains while graduation primarily saves compute; (2) Examples achieves the strongest returns for three of four models, while Rules offers the best cost-reliability profile with $\sim$40% fewer tokens; and (3) weaker baseline models benefit disproportionately, suggesting FORGE may mitigate capability gaps rather than amplify strong models. All evidence is confined to CAGE-2 B-line; cross-family findings are directional evidence.
Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP
Deploying compound LLM agents in adversarial, partially observable sequential environments requires navigating several design dimensions: (1) what the agent sees, (2) how it reasons, and (3) how tasks are decomposed across components. Yet practitioners lack guidance on which design choices improve performance versus merely increase inference costs. We present a controlled study of compound LLM agent design in CybORG CAGE-2, a cyber defense environment modeled as a Partially Observable Markov Decision Process (POMDP). Reward is non-positive, so all configurations operate in a failure-mitigation mode. Our evaluation spans five model families, six models, and twelve configurations (3,475 episodes) with token-level cost accounting. We vary context representation (raw observations vs. a deterministic state-tracking layer with compressed history), deliberation (self-questioning, self-critique, and self-improvement tools, with optional chain-of-thought prompting), and hierarchical decomposition (monolithic ReAct vs. delegation to specialized sub-agents). We find that: (1) Programmatic state abstraction delivers the largest returns per token spent (RPTS), improving mean return by up to 76% over raw observations. (2) Distributing deliberation tools across a hierarchy degrades performance relative to hierarchy alone for all five model families, reaching up to 3.4$\times$ worse mean return while using 1.8-2.7$\times$ more tokens. We call this destructive pattern a deliberation cascade. (3) Hierarchical decomposition without deliberation achieves the best absolute performance for most models, and context engineering is generally more cost-effective than deliberation. These findings suggest a design principle for structured adversarial POMDPs: invest in programmatic infrastructure and clean task decomposition rather than deeper per-agent reasoning, as these strategies can interfere when combined.
paper.json: A Coordination Convention for LLM-Agent-Actionable Papers
LLM agents routinely serve as first (and sometimes only) readers of academic papers, skimming for sub-claims, extracting reproducibility steps, and generalizing scope. Standard prose papers produce recurring failures in this role: sub-claims that cannot be cited at sub-paper granularity, scope overextension beyond what the paper tests, and figure commands buried in codebases rather than the paper itself. We propose `paper.json`, a companion JSON file that travels with the PDF and addresses each failure with a lightweight convention: stable claim IDs (C1), an explicit does-not-claim list (C2), exact per-figure shell commands (C3), and stable definition IDs (C5). A fifth convention (C4) holds that minimum viable compliance, hand-written JSON alongside the PDF, is achievable in under an hour for a finished paper without touching the human-readable output. C1, C2, C3, and C5 are open invitations: an agent that reads a compliant paper and acts on it produces evidence for or against them. This paper is itself compliant: `uv run validator.py paper.json --against paper.typ` passes. Repo: https://github.com/arquicanedo/paper-json
MAxLM: Multi-Agent Language Model-Based Scheduling and Resource Allocation in MU-MIMO-OFDMA-Enabled Wireless Networks
Wireless networks support multi-user (MU) communication with multiple-input multiple-output (MIMO) and orthogonal frequency-division multiple access (OFDMA) technologies. In the joint MU-MIMO-OFDMA-enabled transmission mode, network throughput can be significantly increased by effectively utilizing the multi-channel resources to schedule numerous wireless users/stations (STAs) simultaneously. In this paper, we study ways to optimize the user scheduling and resource allocation (SRA) for the UL scheduled access (UL-SA) of a joint MU-MIMO-OFDMA-enabled wireless local area network (WLAN). In particular, we propose a multi-agent (MA) framework that utilizes an openly available pretrained small/medium-sized Language Model (xLM) to perform SRA for the UL-SA. To facilitate autonomous SRA using our proposed technique, we introduce the AI-assisted Wireless Systems Engineering and Research (WiSER) platform. We evaluate the performance of MAxLM-optimized SRA for network scenarios with a varying number of STAs and antenna settings on the WLAN Access Point. Numerical results confirm that our proposed technique achieves higher UL-SA throughput than the benchmark techniques.
Multi-Agent Cooperative Transportation: Optimal and Efficient Task Allocation and Path Finding
Multi-robot systems are integral to modern logistics, but their capabilities are often limited to tasks executable by individual agents. This paper addresses a critical gap in existing frameworks like Multi-Agent Path Finding (MAPF) and Task Allocation and Path Finding (TAPF), which lack true cooperation for transporting large items that require multiple agents. To this end, we formalise the Cooperative Transportation Task Allocation and Path Finding (CT-TAPF) problem, which integrates team formation, task assignment, and collision-free pathfinding. We present an optimal solver, Cooperative Transportation Task Conflict-Based Search (CT-TCBS), which features a novel Incremental Expansion strategy to tackle the combinatorial explosion inherent in team formation. Recognising the computational cost of optimality, we also develop a family of sub-optimal solvers that employ a global, task-centric perspective, selecting the next task to assign based on a global difficulty metric (Best Task or Worst Task). Our comprehensive empirical evaluation demonstrates three key findings: (1) the incremental expansion strategy significantly outperforms the naive combinatorial approach by successfully pruning the dominant task-allocation search space; (2) we identify a task-conflict expansion dilemma, where sophisticated conflict resolvers effective for large-agent pathfinding subproblems can be detrimental in the integrated CT-TAPF setting; and (3) our proposed sub-optimal solvers establish a new, more efficient frontier on the solution quality-runtime spectrum compared to "nn-" agent-centric baselines. This work provides a foundational framework and a set of effective algorithms for a new, practical class of cooperative multi-agent problems.
Who Owns This Agent? Tracing AI Agents Back to Their Owners
AI agents are increasingly deployed to act autonomously in the world, yet there is still no reliable way to trace a harmful agent back to the account that deployed it. This creates the same accountability gap across both ends of the intent spectrum: benign operators may deploy misconfigured or overbroad agents that cause harm unintentionally, while malicious operators may deliberately weaponize agents for scams, harassment, or cyber attacks. In many cases, these agents are powered by vendor-hosted models, a dependency that holds even for sophisticated adversaries such as state actors conducting cyber operations. In either case, affected parties can observe the behavior but cannot notify the responsible operator, stop the session, or identify the account for investigation. We formalize this gap as the problem of agent attribution: linking an observed agent interaction to the responsible account at the hosting vendor. To our knowledge, this is the first work to define the problem and present a practical solution. Our protocol is canary-based: an authorized party injects a canary into the agent's interaction stream, and the vendor searches a narrow window of session logs to recover the originating session and account. Simple canaries suffice in non-adversarial settings. For adversarial operators who filter or paraphrase incoming content, we develop robust canary constructions that cannot be suppressed without degrading the agent's own task performance, yielding a formal asymmetry in the defender's favor. We evaluate a variety of scenarios including real-world agents and show that our attribution method is reliable, robust, and scalable for vendor-side deployment.
comment: Under Review
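The simple, non-adversarial case of the canary-based protocol described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the log record layout (`time`, `account`, `session`, `text`), the function names, and the five-minute window are all hypothetical, and the robust canary constructions for adversarial operators are not modeled here.

```python
import secrets
from datetime import datetime, timedelta


def make_canary(n_bytes=16):
    """Generate a random, hard-to-guess marker string to inject into the interaction."""
    return "canary-" + secrets.token_hex(n_bytes)


def attribute(canary, injected_at, session_logs, window=timedelta(minutes=5)):
    """Search a narrow time window of vendor-side logs for the injected canary.

    session_logs is a hypothetical list of dicts with keys
    "time" (datetime), "account", "session", and "text".
    Returns the sorted list of (account, session) pairs whose logs contain the canary.
    """
    lo, hi = injected_at - window, injected_at + window
    hits = {
        (entry["account"], entry["session"])
        for entry in session_logs
        if lo <= entry["time"] <= hi and canary in entry["text"]
    }
    return sorted(hits)
```

Restricting the search to a window around the injection time keeps the vendor-side lookup cheap, and a sufficiently long random token makes an accidental match unlikely. A plain substring match like this would fail against an operator who filters or paraphrases incoming content, which is the case the abstract says the robust constructions address.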
BootstrapAgent: Distilling Repository Setup into Reusable Agent Knowledge
Code agents increasingly help developers work with unfamiliar repositories, but every such task depends on a costly prerequisite: bootstrapping the repository into a usable development state. This process requires substantial trial-and-error exploration, yet the resulting knowledge--resolved dependencies, repair strategies--stays trapped in a single conversation, unavailable to future agents. We therefore formulate repository bootstrapping as a reusable startup knowledge problem and introduce BootstrapAgent, a multi-agent framework that distills the heuristics discovered during bootstrap exploration into a persistent, verifiable, agent-consumable .bootstrap contract. Through evidence extraction, structured planning, deterministic Docker-based verification, and trace-driven repair, BootstrapAgent generates a contract covering environment setup, diagnostic checks, minimal verification, and accumulated repair knowledge. We further propose warm repair with clean replay to accelerate iterative debugging without sacrificing cold-start reproducibility, and a delta repair with sanity check to prevent reward hacking. Experiments on three benchmarks show that BootstrapAgent achieves a 92.9% success rate, outperforming the baseline by over 10% while reducing downstream agent token usage by 25.9% and build time by 22.3%. Our code is available at https://github.com/Vossera/BootstrapAgent.
comment: 19 pages, 9 figures, 6 tables
From Gridworlds to Warehouses: Adapting Lightweight One-shot Multi-Agent Pathfinding for AGVs IJCAI 2026
Multi-agent pathfinding (MAPF) under one-shot planning is a core component of warehouse automation, yet classical formulations typically assume four-connected 2D grids with unit-time moves in four directions. To close this reality gap while remaining tractable for discrete combinatorial search, this work proposes a more practical counterpart tailored to differential-drive AGVs. We term this multi-agent warehouse pathfinding (MAWPF), which features four constraints: (i) agent actions are restricted to straight motion and in-place rotation; (ii) rotations require multi-step costs; (iii) acceleration and deceleration are considered; and (iv) follower collisions are prohibited to prevent rear-end crashes. To solve MAWPF efficiently, we adapt representative suboptimal MAPF algorithms (PP, LNS2, PIBT, and LaCAM) and conduct comprehensive benchmarking. Our experiments reveal that PP and LNS2 struggle to solve instances with many agents, while PIBT-based approaches scale better at the expense of higher solution cost. We believe that these results constitute an important step toward adapting classical gridworld MAPF to operational warehouse setups.
comment: To be presented at IJCAI 2026
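Constraints (i) and (ii) of the MAWPF abstract above can be illustrated with a single-agent sketch. This is not the paper's solver: the `(x, y, heading)` state encoding, the function names, and the default `rotation_steps=2` are assumptions made for illustration, and acceleration, deceleration, and follower collisions are ignored.

```python
import heapq

# Headings as unit vectors: 0 = east, 1 = north, 2 = west, 3 = south.
HEADINGS = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def successors(state, free_cells, rotation_steps):
    """Yield (next_state, cost) pairs for a differential-drive AGV.

    state is (x, y, heading). The agent may move one cell straight ahead
    (cost 1) or rotate in place by 90 degrees (cost rotation_steps).
    """
    x, y, h = state
    dx, dy = HEADINGS[h]
    ahead = (x + dx, y + dy)
    if ahead in free_cells:
        yield (ahead[0], ahead[1], h), 1
    yield (x, y, (h + 1) % 4), rotation_steps  # rotate counter-clockwise
    yield (x, y, (h - 1) % 4), rotation_steps  # rotate clockwise


def shortest_time(start, goal_cell, free_cells, rotation_steps=2):
    """Dijkstra over (x, y, heading); returns the minimum time or None."""
    best = {start: 0}
    frontier = [(0, start)]
    while frontier:
        time, state = heapq.heappop(frontier)
        if (state[0], state[1]) == goal_cell:
            return time
        if time > best.get(state, float("inf")):
            continue  # stale queue entry
        for nxt, cost in successors(state, free_cells, rotation_steps):
            new_time = time + cost
            if new_time < best.get(nxt, float("inf")):
                best[nxt] = new_time
                heapq.heappush(frontier, (new_time, nxt))
    return None


free = {(x, y) for x in range(3) for y in range(3)}
print(shortest_time((0, 0, 0), (2, 0), free))  # two straight moves
print(shortest_time((0, 0, 0), (0, 2), free))  # one rotation, then two moves
```

On this open 3x3 grid the two calls print 2 and 4: the second goal is equally far in cells, but the agent first has to spend two time steps turning north.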
Preserving Topology Privacy of Network Systems by Feedback: Conditions and Distributed Design
This paper develops a feedback-based method to preserve the topology privacy of consensus protocols in network systems. The key idea is to intentionally violate topology identifiability conditions, thereby preventing unique or accurate recovery of the true topology from available observations, while preserving the intended consensus behavior. This problem is challenging because the feedback magnitude directly reflects the privacy level of edges, while it is strongly coupled with the consensus convergence and constrained by local communications at each node. To begin with, we derive the feedback conditions of both partial and full observation cases, where the topology unsolvability from observation data is characterized in the former, and the solution space that enforces topology inaccuracy from data is constructed in the latter. Then, we propose a novel distributed topology modification design under limited privacy budgets, and establish the performance guarantees through a controllable tradeoff between the consensus deviation and the topology privacy. Finally, we develop a low-complexity heuristic algorithm to achieve optimal privacy preservation on existing edges. Comparative simulations validate the effectiveness of the proposed preservation design and its advantage over the compared methods.
comment: 13 pages
Distributed Zeroth-Order Policy Gradient for Networked Multi-agent Reinforcement Learning from Human Feedback
We study a networked multi-agent reinforcement learning (NMARL) problem with human feedback in an infinite-horizon setting, where agents interact over an underlying network with localized state dependencies and aim to collaboratively maximize the average discounted return. Existing approaches with preference feedback are primarily developed for single-agent settings and rely on centralized training, which limits their scalability and applicability to large-scale networked multi-agent systems. To address this, we introduce a novel human feedback mechanism based on spatiotemporally truncated trajectories, defined as $H$-horizon trajectory pairs aggregated over each agent's $κ$-hop neighborhood. Building on this, we develop a distributed zeroth-order policy gradient algorithm, where each agent estimates its local policy gradient using human preference feedback generated from both the current joint policy and a perturbed joint policy drawn from a zero-mean Gaussian distribution. Specifically, the algorithm is fully distributed, as the feedback received by each agent depends solely on the state-action information within its $κ$-hop neighborhood and does not require explicit reward signals or centralized control. We further rigorously establish that the proposed algorithm converges to an $ε$-stationary point with polynomial sample complexity. Finally, simulation results in a stochastic GridWorld environment and a predator-prey environment demonstrate the effectiveness and scalability of the proposed algorithm in achieving collaborative optimization based solely on human preference feedback.
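The core idea of estimating a gradient from preferences between a current and a Gaussian-perturbed policy can be sketched for a single agent as follows. This is a generic zeroth-order estimator, not the paper's distributed algorithm: `zo_preference_gradient`, the Bradley-Terry stand-in for human raters, and the hidden quadratic return are all assumptions made for illustration, and the $κ$-hop neighborhood and $H$-horizon truncation are omitted.

```python
import math
import random


def zo_preference_gradient(theta, preference_fn, mu=0.1, num_samples=2000, seed=0):
    """Estimate an ascent direction from preference feedback alone.

    preference_fn(theta, perturbed) returns the probability in [0, 1]
    that the perturbed parameters are preferred over the current ones.
    """
    rng = random.Random(seed)
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(num_samples):
        # Zero-mean Gaussian perturbation direction.
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        perturbed = [t + mu * ui for t, ui in zip(theta, u)]
        # Centered preference: positive when the perturbation is favored.
        signal = preference_fn(theta, perturbed) - 0.5
        for i in range(dim):
            grad[i] += signal * u[i] / (mu * num_samples)
    return grad


# Hypothetical stand-in for human raters: Bradley-Terry over a hidden return.
TARGET = [1.0, 1.0, 1.0]


def hidden_return(theta):
    return -sum((t - g) ** 2 for t, g in zip(theta, TARGET))


def bradley_terry(theta, perturbed):
    gap = hidden_return(theta) - hidden_return(perturbed)
    return 1.0 / (1.0 + math.exp(gap))


estimate = zo_preference_gradient([0.0, 0.0, 0.0], bradley_terry)
```

The estimator never reads `hidden_return` directly. It sees only the preference probabilities, yet for small `mu` its output points in roughly the same direction as the true gradient of the hidden return, up to a positive scale factor.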
Response-Conditioned Parallel-to-Sequential Orchestration for Multi-Agent Systems
Multi-agent systems can solve complex tasks through collaboration between multiple Large Language Model agents. Existing collaboration frameworks typically operate in either a parallel or a sequential mode. In the parallel mode, agents respond independently to queries followed by aggregation of responses. In contrast, sequential systems allow agents to communicate via a directed topology and refine one another step by step. However, both modes are inadequate for achieving the desired objectives of minimizing communication and latency while simultaneously maximizing the accuracy of the final response. In this work, we introduce a hybrid paradigm called Nexa, a trainable response-conditioned policy that bridges the gap between the two modes. Nexa begins with a parallel execution stage, embeds the resulting responses into a shared semantic space, and then predicts a sparse directed acyclic communication graph. If the graph is empty, the system remains purely parallel; if it is non-empty, the system performs one sequential message propagation. The policy is a lightweight transformer model, and the method avoids the need for external LLM judges or reward models, as well as hand-crafted test-time topology search. We formalize this hybrid execution problem, show that the resulting graph is acyclic by construction, and that the framework strictly subsumes pure parallel execution, and present a training procedure based on policy-gradient optimization. Results demonstrate that the response-conditioned policy learned by Nexa under one setting can be reused when the number of agents, the task, or the underlying agent changes, thus emphasizing the generalizability of the learned communication policy.
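One simple way to obtain a graph that is acyclic by construction is to allow edges only in the direction of a fixed agent ordering. The sketch below assumes that mechanism; the abstract does not state how Nexa enforces acyclicity, and `build_dag`, `execution_mode`, the score matrix, and the threshold are hypothetical.

```python
def build_dag(scores, order, threshold=0.5):
    """Keep edge i -> j only if it is confident and i precedes j in `order`.

    scores[i][j] is a hypothetical predicted probability for edge i -> j.
    Restricting edges to respect one fixed ordering rules out cycles.
    """
    rank = {agent: position for position, agent in enumerate(order)}
    edges = []
    for i in order:
        for j in order:
            if rank[i] < rank[j] and scores[i][j] >= threshold:
                edges.append((i, j))
    return edges


def execution_mode(edges):
    """An empty graph means the system stays purely parallel."""
    return "parallel" if not edges else "parallel+sequential"


scores = [
    [0.0, 0.9, 0.2],
    [0.8, 0.0, 0.7],
    [0.9, 0.1, 0.0],
]
edges = build_dag(scores, order=[0, 1, 2])
print(edges, execution_mode(edges))
```

Here the confident back-edges 1 -> 0 and 2 -> 0 are dropped because they contradict the ordering, leaving the chain 0 -> 1 -> 2. Raising the threshold until no edge survives recovers pure parallel execution.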
Task-Semantic Graph-Driven Distributed Agent Networking for Underwater Target Tracking
Autonomous underwater vehicle (AUV) swarms are emerging as intelligent underwater networks, where each node must sense, communicate, process local data, and make decisions under severe acoustic constraints. Persistent underwater target tracking is a typical task with moving targets, changing communication topology, intermittent acoustic links, and limited observation for each AUV. Multi-agent reinforcement learning (MARL) is a natural candidate for distributed tracking, yet existing studies still lack a unified open-source platform for evaluating different MARL algorithms under six-degree-of-freedom AUV dynamics. In addition, policies trained with raw geometric states and low-level force actions often struggle to represent task phases, observation reliability, link quality, and local cooperation roles. This paper addresses these issues by developing an open-source MARL-AUV platform that integrates DI-engine with a six-degree-of-freedom underwater AUV target-tracking simulator. To the best of our knowledge, it is the first open platform that connects a public MARL training framework with physically modeled AUV swarm-based tasks, and provides a unified experimental protocol for fair training, testing, and comparison of representative RL and MARL algorithms. Based on this platform, we propose STG-MAPPO, a Semantic Task Graph-enhanced variant of Multi-Agent Proximal Policy Optimization. STG-MAPPO builds semantic policy inputs from tracking diagnostics, task phases, observation confidence, link availability, neighbor tracking quality, and local role advantage. A compact semantic task graph links communication-constrained network states to decentralized actor decisions, and a velocity-level action abstraction maps high-level cooperative decisions to executable six-degree-of-freedom AUV control inputs. The code is available at https://github.com/dasjsaj/MARL-AUV.
Herding CATs: ALARA for Agent Harness Engineering in Portable Composable Multi-Agent Teams
Industry practitioners and academic researchers regularly use multi-agent systems to accelerate their work, but the applications through which users operate these systems do not provide a simple, unified mechanism for scalably managing critical components of the agent harness. This lack of control adversely impacts the quality of individual human-agent interactions and reduces the capacity for practitioners to coordinate context engineering efforts. The behavioral specifications that define what agents in such systems can do remain fragmented across prose instruction files -- for which compliance cannot be guaranteed -- or framework-internal configurations, making these specifications difficult to share, version, or collaboratively maintain across teams and projects. Applying the ALARA principle from radiation safety (exposures kept as low as reasonably achievable) to context, we introduce a context-agent-tool (CAT) data layer expressed through interrelated plain-text files, allowing users to directly declare tool access for each agent and to modify the tools that the agents use when processing. We demonstrate the capability of this CAT data layer to enable real agentic usage by using a command-line shell that loads the team and executes agent runs -- \texttt{npcsh} -- and evaluating 22 locally-hosted models from 0.6B to 35B parameters across 115 practical tasks spanning file operations, web search, multi-step scripting, tool chaining, and multi-agent delegation. We characterize which model families succeed in certain task categories and where they break down across $\sim$2500 total executions.
comment: Accepted to HAXD 2026, 8 pages, 6 figures
Enhancing Clinical Trial Patient Matching through Knowledge Augmentation and Reasoning with Multi-Agent
Matching patients effectively and efficiently for clinical trials is a significant challenge due to the complexity and variability of patient profiles and trial criteria. This paper introduces \textbf{Multi-Agent for Knowledge Augmentation and Reasoning (MAKAR)}, a novel multi-agent system that enhances patient-trial matching by integrating criterion augmentation with structured reasoning. MAKAR consistently improves performance by an average of 7\% across different datasets. Furthermore, it enables privacy-preserving deployment and maintains competitive performance when using smaller open-source models. Overall, MAKAR can contribute to more transparent, accurate, and privacy-conscious AI-driven patient matching.
EnactToM: An Evolving Benchmark for Functional Theory of Mind in Embodied Agents
Theory of Mind (ToM), the ability to track others' epistemic states, makes humans efficient collaborators. AI agents need the same capacity in multi-agent settings, yet existing benchmarks mostly test literal ToM by asking direct belief questions. The ability to act optimally on implicit beliefs in embodied environments, called functional ToM, remains largely untested. We introduce EnactToM, an evolving benchmark of 300 embodied multi-agent tasks set in a 3D household with partial observability, private information, and constrained communication. Each task is formally verified for solvability and required epistemic depth, and new tasks are generated to increase difficulty as models improve. On the hard split, all seven evaluated frontier models score 0.0% Pass^3 on functional task completion, while averaging 45.0% on literal belief probes. Manual analysis traces 93% of sampled failures to epistemic coordination breakdowns such as withheld information, ignored partner constraints, and misallocated messages, providing a concrete target for future work.
Helix: A Dual-Helix Co-Evolutionary Multi-Agent System for Prompt Optimization and Question Reformulation
Automated prompt optimization (APO) aims to improve large language model performance by refining prompt instructions. However, existing methods are largely constrained by fixed prompt templates, limited search spaces, or single-sided optimization that treats user questions as immutable inputs. In practice, question formulation and prompt design are inherently interdependent: clearer question structures facilitate focused reasoning and task understanding, while effective prompts reveal better ways to organize and restate queries. Ignoring this coupling fundamentally limits the effectiveness and adaptability of current APO approaches. We propose a unified multi-agent system (Helix) that jointly optimizes question reformulation and prompt instructions through a structured three-stage co-evolutionary framework. Helix integrates (1) planner-guided decomposition that breaks optimization into coupled question-prompt objectives, (2) dual-track co-evolution where specialized agents iteratively refine and critique each other to produce complementary improvements, and (3) strategy-driven question generation that instantiates high-quality reformulations for robust inference. Extensive experiments on 12 benchmarks against 6 strong baselines demonstrate the effectiveness of Helix, achieving up to 3.95% performance improvements across tasks with favorable optimization efficiency.
From Model Design to Organizational Design: Complexity Redistribution and Trade-Offs in Generative AI
This paper introduces the Generality-Accuracy-Simplicity (GAS) framework to analyze how large language models (LLMs) are reshaping organizations and competitive strategy. We argue that viewing AI as a simple reduction in input costs overlooks two critical dynamics: (a) the inherent trade-offs among generality, accuracy, and simplicity, and (b) the redistribution of complexity across stakeholders. While LLMs appear to defy the traditional trade-off by offering high generality and accuracy through simple interfaces, this user-facing simplicity masks a significant shift of complexity to infrastructure, compliance, and specialized personnel. The GAS trade-off, therefore, does not disappear but is relocated from the user to the organization, creating new managerial challenges, particularly around accuracy in high-stakes applications. We contend that competitive advantage no longer stems from mere AI adoption, but from mastering this redistributed complexity through the design of abstraction layers, workflow alignment, and complementary expertise. This study advances AI strategy by clarifying how scalable cognition relocates complexity and redefines the conditions for technology integration.
Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG
Retrieval-Augmented Generation (RAG) grounds LLM responses in external evidence but treats the model as a passive consumer of search results, with no view of how the corpus is organized or what it has not yet seen. We present Corpus2Skill, which distills a document corpus offline into a hierarchical skill directory and lets an LLM agent navigate it at serve time, drilling from a bird's-eye view through progressively finer summaries down to documents, and backtracking when a branch is unproductive. On an enterprise customer-support benchmark, Corpus2Skill improves both answer quality and grounding over single-shot dense, hybrid, hierarchical-retrieval, and agentic RAG baselines at a moderate cost tradeoff. A ten-subset generalization study further shows that corpus navigation is not a universal replacement for retrieval: it consistently helps on single-domain corpora with a recoverable topical taxonomy, but flat retrieval remains preferable on open-domain factoid pools or homogeneous-tabular corpora that defeat top-level clustering. We characterize this scope distinction and discuss it as a design guideline for knowledge-grounded systems. Code is available at https://github.com/dukesun99/Corpus2Skill.
Skills as Verifiable Artifacts: A Trust Schema and a Biconditional Correctness Criterion for Human-in-the-Loop Agent Runtimes
Agent skills - structured packages of instructions, scripts, and references that augment a large language model (LLM) without modifying the model itself - have moved from convenience to first-class deployment artifact. The runtime that loads them inherits the same problem package managers and operating systems have always faced: a piece of content claims a behavior; the runtime must decide whether to believe it. We argue this paper's central thesis up front: a skill is untrusted code until it is verified, and the runtime that loads it must enforce that default rather than infer trust from a signature, a clearance, or a registry of origin. Without skill verification, a human-in-the-loop (HITL) gate must fire on every irreversible call - which is operationally untenable and degrades into rubber-stamping at any non-trivial scale. With skill verification treated as a separate, gated process, HITL fires only for what is unverified, and the system becomes sustainable. We give a trust schema that includes an explicit verification level on every skill manifest; a capability gate whose HITL policy is a function of that verification level; a biconditional correctness criterion that any candidate verification procedure must satisfy on an adversarial-ensemble exercise; and a portable runtime profile with ten normative guidelines abstracted from a working open-source reference implementation. The contribution is harness- and model-agnostic; nothing here requires retraining, fine-tuning, or proprietary infrastructure.
Chance-Constrained Correlated Equilibria for Robust Noncooperative Coordination
Correlated equilibria enable a coordinator to influence the self-interested agents by recommending actions that no player has an incentive to deviate from. However, the effectiveness of this mechanism relies on accurate knowledge of the agents' cost structures. When cost parameters are uncertain, the recommended actions may no longer be incentive compatible, allowing agents to benefit from deviating from them. We study a chance-constrained correlated equilibrium problem formulation that accounts for uncertainty in agents' costs and guarantees incentive compatibility with a prescribed confidence level. We derive sensitivity results that quantify how uncertainty in individual incentive constraints affects the expected coordination outcome. In particular, the analysis characterizes the value of information by relating the marginal benefit of reducing uncertainty to the dual sensitivities of the incentive constraints, providing guidance on which sources of uncertainty should be prioritized for information acquisition. The results further reveal that increasing the confidence level is not always beneficial and can introduce a tradeoff between robustness and system efficiency. Numerical experiments demonstrate this tradeoff: CC-CE reduces realized coordination cost by up to 35% at intermediate confidence levels, while the proposed information-gain metric consistently identifies effective uncertainty sources to reduce.
STAR: Failure-Aware Markovian Routing for Multi-Agent Spatiotemporal Reasoning
Compositional spatiotemporal reasoning often requires a system to invoke multiple heterogeneous specialists, such as geometric, temporal, topological, and trajectory agents. A central question is how such a system should route among specialists when execution does not simply succeed or fail, but fails in qualitatively different ways. Existing tool-augmented and multi-agent LLM systems typically leave this routing decision implicit in language generation, making recovery ad hoc, difficult to interpret, and hard to optimize. This paper presents STAR (Spatio-Temporal Agent Router), a failure-aware routing framework that externalizes inter-agent control as a state-conditioned transition policy over the current agent, task type, and typed execution status. At the center of STAR is an agent routing matrix that combines expert-specified nominal routes with recovery transitions learned from execution traces. Because the matrix conditions on distinct failure states, the router can respond differently to malformed outputs, missing dependencies, and tool--query mismatches, rather than collapsing them into a generic retry signal. Specialists execute through a tool-grounded extract--compute--deposit protocol and write intermediate results to a shared blackboard for downstream fusion. Results prove that retaining unsuccessful traces during training enlarges the support of the routing policy on error states, enabling recovery transitions that success-only training cannot represent. Across three spatiotemporal benchmarks and eight backbone LLMs, STAR improves over multiple baselines with the clearest gains on queries whose execution deviates from the nominal routing path. Router-specific ablations and recovery analyses further show that typed failure-aware routing, rather than specialist composition alone, is a key factor for these improvements.
comment: 30 pages, 13 figures
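A routing policy keyed on (current agent, task type, typed execution status) can be sketched as a lookup table with counts. This is a toy illustration under stated assumptions, not STAR's routing matrix: the class `FailureAwareRouter`, the status strings, and the recovery-rate rule for choosing the next agent are all hypothetical.

```python
from collections import Counter, defaultdict


class FailureAwareRouter:
    """Routes on (current agent, task type, typed execution status)."""

    def __init__(self, nominal_routes):
        # nominal_routes: {(agent, task_type): next_agent}, used on status "ok".
        self.nominal_routes = dict(nominal_routes)
        self.attempts = defaultdict(Counter)
        self.successes = defaultdict(Counter)

    def record_trace(self, agent, task_type, status, next_agent, recovered):
        """Log one transition out of a failure state, successful or not."""
        key = (agent, task_type, status)
        self.attempts[key][next_agent] += 1
        if recovered:
            self.successes[key][next_agent] += 1

    def route(self, agent, task_type, status):
        if status == "ok":
            return self.nominal_routes.get((agent, task_type))
        key = (agent, task_type, status)
        tried = self.attempts.get(key)
        if not tried:
            return agent  # no evidence yet: fall back to retrying the same agent
        # Pick the next agent with the highest empirical recovery rate.
        return max(tried, key=lambda nxt: self.successes[key][nxt] / tried[nxt])


router = FailureAwareRouter({("geometric", "distance"): "temporal"})
router.record_trace("geometric", "distance", "malformed_output", "geometric", True)
router.record_trace("geometric", "distance", "missing_dependency", "trajectory", True)
router.record_trace("geometric", "distance", "missing_dependency", "geometric", False)
print(router.route("geometric", "distance", "missing_dependency"))
```

Because failed recovery attempts are logged alongside successful ones, the table can tell that retrying `geometric` did not help on a missing dependency and routes to `trajectory` instead, while a malformed output still triggers a retry of the same agent.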
Systems and Control (EESS)
Health-Aware Fast Charging Using Homogenized Model with Heterogeneous Internal State Reconstruction
Fast charging of lithium-ion batteries is limited by lithium plating, which occurs when the anode potential drops below 0 V vs Li/Li+. Model-based control aims to maximize charging current while maintaining anode potentials above this threshold. In this work, a plating-free fast charging strategy is demonstrated using a Homogenized Model (HM) coupled with a classical PID controller. The HM, derived from homogenization theory applied to the Poisson-Nernst-Planck equations, retains the physics of the Doyle-Fuller-Newman model while capturing electrode microstructural heterogeneity in a one-dimensional double-continua formulation. By reconstructing three-dimensional distributions of electrochemical variables from precomputed closure variables, the HM enables non-invasive estimation of heterogeneous anode potentials, acting as a virtual sensor. Through MATLAB-COMSOL co-simulation, a PID controller regulates current to maintain the full 3D anode potential distribution above the plating limit, achieving model-based fast charging at a fraction of the computational cost of high-fidelity models. The results demonstrate the potential of HM-based control for safe, degradation-aware, and efficient fast charging of lithium-ion batteries.
comment: 6 pages, 6 figures. Accepted for presentation at the IFAC World Congress 2026; to appear in IFAC-PapersOnLine
Trajectory-based Safety of Monotone Systems: Verification and Control Synthesis
This paper presents a novel data-driven framework for the robust safety verification and safe control synthesis of unknown monotone discrete-time systems. While existing data-driven safety analysis approaches are often either heuristic in nature or require large amounts of data to provide rigorous guarantees, we leverage the structural property of monotonicity to significantly reduce data requirements while still ensuring formal safety guarantees. Our approach is built upon a new class of certificates called dominance functions, constructed directly from collected system trajectories, which themselves need not be safe. By exploiting the monotone structure of the dynamics, we show that dominance functions are (i) dissipative, meaning that they decrease monotonically along system trajectories, and (ii) sufficiently expressive to characterize safety certificates for monotone systems. Together, these properties establish dominance functions as principled building blocks for the systematic construction of formal safety certificates directly from trajectory data. For both robust safety verification and safe control synthesis, we develop an efficient sampling-based optimization framework that searches for safety certificates represented as linear combinations of dominance functions constructed from collected trajectories. We validate our data-driven framework on two monotone systems by successfully deriving safety certificates from a small number of trajectories.
comment: Preprint submitted for a journal revision
The Score Kalman Filter
A central obstacle in nonlinear Bayesian filtering is representing the belief distribution. Moment-based filters address this by propagating polynomial moments and reconstructing a density from them. Recent work completes the predict-update loop via the maximum-entropy (MaxEnt) principle, but each step requires the partition function and its gradient, both $n$-dimensional integrals whose cost scales exponentially, restricting the demonstrated MaxEnt moment filtering to $n \le 4$. We avoid the partition function entirely by combining score matching with Stein's identity. In our setting, score matching reduces the density fit to a single linear solve whose coefficients are assembled directly from the propagated moments. The same parameters then drive Stein's identity to close the moment hierarchy during prediction and to recover posterior moments after each Bayesian update, keeping the full predict-update loop free of partition function evaluation. The resulting Score Kalman Filter (SKF) reduces to the classical information-form Kalman filter as a special case and performs every step through linear algebra. On nonlinear coupled-oscillator networks, the SKF runs through $n=20$ and reports lower RMSE than the EKF, UKF, EnKF, and particle-filter baselines on the tested synthetic benchmarks.
comment: 56 pages, 27 figures
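The claim that score matching turns the density fit into a linear solve on moments can be checked by hand in one dimension. The sketch below assumes a linear score model $s(x) = h - \lambda x$ (a Gaussian family) and raw moments $m_1 = E[x]$, $m_2 = E[x^2]$; the function name and this 1D restriction are illustrative assumptions, not the paper's general construction.

```python
def fit_gaussian_score(m1, m2):
    """Fit the linear score s(x) = h - lam * x from the first two raw moments.

    Minimizing E[0.5 * s(x)**2 + s'(x)] over (h, lam) gives the linear system
        h - lam * m1 = 0
        -h * m1 + lam * m2 = 1
    which is solved here with Cramer's rule.
    """
    det = m2 - m1 * m1  # determinant of the system; equals the variance
    if det <= 0:
        raise ValueError("moments do not describe a proper density")
    h = m1 / det
    lam = 1.0 / det
    return h, lam


print(fit_gaussian_score(2.0, 5.0))
```

For mean 2 and variance 1 the result is `(2.0, 1.0)`: `lam` is the precision and `h` is the precision times the mean. These are the information-form parameters of a Gaussian, and no partition function was evaluated to obtain them.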
A Coupled Inductor Based Multi Port DC DC Converter with Coordinated Duty-Cycle and Phase Shift Control
Electrified powertrains rely heavily on magnetics for power conversion, where cost, volume, and weight concerns make integrated multi-use designs an attractive solution. With EV powertrain architectures requiring a boost stage being a major market segment, the proposed Coupled Inductor-Based Multi-Port DC-DC Converter (CI-MPC) leverages the existing magnetic framework of a conventional topology to realize independent, isolated, and simultaneously regulated converters without additional magnetic cores or cascaded stages. Unlike existing architectures that use secondary windings solely for voltage gain or passive rectification, the proposed topology integrates an actively controlled full bridge on the secondary side to create a distinct, independently regulated auxiliary converter. Primary output regulation is achieved via duty-cycle control, while the auxiliary converter employs phase-shift modulation synchronized with the primary switching to enable active rectification and flexible voltage or current regulation. A unified control framework ensures decoupled operation with minimal interaction between the primary and auxiliary loops, while also avoiding high step-down conversion ratios from high voltages to lower auxiliary levels. The operating principles and coordinated control strategies are validated through simulation and experimental results on a hardware prototype, demonstrating enhanced controllability, decoupled regulation, and a scalable pathway toward generalized multi-port power conversion within a unified magnetic framework.
comment: 11 pages, 13 figures
Policy Library CBF: Finite-Horizon Safety at Runtime via Parallel Rollouts
Safety-critical autonomy in unstructured environments poses significant challenges for online safety certification under evolving constraints. We propose Policy Library Control Barrier Function (PL-CBF), a runtime safety filter that evaluates a library of fallback policies via parallel finite-horizon rollouts, selects the least invasive safe mode, and enforces safety by solving a quadratic program that minimally modifies a nominal policy. We provide a theoretical analysis based on a finite-horizon language metric over closed-loop behaviors, characterizing policy-library coverage requirements for certifying finite-horizon safety. Simulations on a planar double-integrator (4 states), highway driving with abrupt friction changes using a realistic nonlinear vehicle model (8 states), and 3D quadrotor navigation in crowded dynamic environments (12 states) demonstrate improved safety coverage over single-policy safety filters while retaining millisecond-level runtime.
comment: Project page: https://www.taekyung.me/plcbf
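The quadratic program in a safety filter of this kind has a closed-form solution when there is a single affine constraint $a^\top u \ge b$ on the input. The sketch below shows only that generic minimum-norm step; it is not PL-CBF itself, which also selects among fallback policies via rollouts. In a standard CBF filter, $a$ and $b$ would come from the Lie derivatives of the barrier function. The function name and the example numbers are assumptions.

```python
def min_norm_filter(u_nom, a, b):
    """Solve min ||u - u_nom||^2 subject to a . u >= b in closed form."""
    slack = b - sum(ai * ui for ai, ui in zip(a, u_nom))
    if slack <= 0:
        return list(u_nom)  # nominal input already satisfies the constraint
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0:
        raise ValueError("constraint is infeasible: a is zero and b > 0")
    # Project onto the constraint boundary along the direction a.
    return [ui + slack * ai / norm_sq for ui, ai in zip(u_nom, a)]


print(min_norm_filter([1.0, 0.0], [0.0, 1.0], 0.5))
```

With these numbers the filter leaves the first input component untouched and raises the second to 0.5, returning `[1.0, 0.5]`, the smallest change that satisfies the constraint.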
Provably Efficient Sensor Allocation for Unknown High-dimensional Systems with Limited Sensing
This paper focuses on learning efficient sensor allocations that ensure observability of unknown high-dimensional linear systems using only a small number of sensors. Existing methods either require an impractically large number of sensors or assume access to an observable allocation in advance. We propose a two-stage framework that overcomes these limitations: first, a novel system identification algorithm integrates information from multiple trajectories, each observing different subsets of state coordinates; then, a classic sensor allocation method is adapted to operate on the learned system parameters. Our non-asymptotic guarantees show that the proposed approach learns a sensor allocation with a near-optimal number of sensors when sensors can be allocated on any state coordinate. We further extend the results to settings with inaccessible state coordinates that are unavailable for sensor allocation.
Linear Programming Approach to Deceptive Path Planning Game with Goal Selection
In adversarial settings, a mobile agent may strategically plan its motion to influence an opponent's inference about its intended goal. We study deceptive path planning in a scenario where a mobile agent aims to reach a privately selected goal while an adversarial observer allocates limited defensive resources based on the observed trajectory. Unlike classical path-planning and goal-recognition approaches that model observers as passive inference processes, our game-theoretic formulation models them as strategic decision-makers. For the resulting dynamic asymmetric-information game, we develop an efficient solution method that combines a linear programming formulation with the Double Oracle algorithm. To evaluate performance, we introduce metrics that quantify both the risk and the effectiveness of deception and provide illustrative numerical examples.
comment: Accepted to American Control Conference 2026
FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast
Can LLM agents improve decision-making through self-generated memory without gradient updates? We propose FORGE (Failure-Optimized Reflective Graduation and Evolution), a staged, population-based protocol that evolves prompt-injected natural-language memory for hierarchical ReAct agents. FORGE wraps a Reflexion-style inner loop, where a dedicated reflection agent (using the same underlying LLM, no distillation from a stronger model) converts failed trajectories into reusable knowledge artifacts: textual heuristics (Rules), few-shot demonstrations (Examples), or both (Mixed), with an outer loop that propagates the best-performing instance's memory to the population between stages and freezes converged instances via a graduation criterion. We evaluate on CybORG CAGE-2, a stochastic network-defense POMDP at a 30-step horizon against the B-line attacker, where all four tested LLM families (Gemini-2.5-Flash-Lite, Grok-4-Fast, Llama-4-Maverick, Qwen3-235B) exhibit strongly negative, heavy-tailed zero-shot rewards. Compared against both a zero-shot baseline and a Reflexion baseline (isolated single-stream learning), FORGE improves average evaluation return by 1.7-7.7$\times$ over zero-shot and by 29-72% over Reflexion in all 12 model-representation conditions, reducing major-failure rates (below $-100$) to as low as $\sim$1%. We find that (1) population broadcast is the critical mechanism, with a no-graduation ablation confirming that broadcast carries the performance gains while graduation primarily saves compute; (2) Examples achieves the strongest returns for three of four models, while Rules offers the best cost-reliability profile with $\sim$40% fewer tokens; and (3) weaker baseline models benefit disproportionately, suggesting FORGE may mitigate capability gaps rather than amplify strong models. All evidence is confined to CAGE-2 B-line; cross-family findings are directional evidence.
A Unified Generative-AI Framework for Smart Energy Infrastructure: Intelligent Gas Distribution, Utility Billing, Carbon Analytics, and Quantum-Inspired Optimisation
The accelerating convergence of smart metering, generative artificial intelligence, and quantum-inspired combinatorial optimisation is reshaping how energy utilities manage physical infrastructure, customer engagement, and environmental accountability
Preemption Revisited: Multi-Threshold Preemption Policies for AoI Minimization
The study of optimal preemption policies for status update systems has been a recurring topic in the age of information (AoI) literature, where threshold-based structures have been shown to be optimal under a generate-at-will update generation model under certain assumptions. In this work, we study the effectiveness of threshold-based policies for a system with random update arrivals. In this regard, we introduce an analytical framework for evaluating the AoI of multi-threshold preemption policies and present interesting characteristics of the structure of the optimal preemption policy. We show the effectiveness of these threshold-based policies over the traditional probabilistic preemption policies and single-threshold policies, where we observe that significant gains in terms of AoI can be obtained by utilizing both the age of the packet and the age of the system when designing these preemption policies.
Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP
Deploying compound LLM agents in adversarial, partially observable sequential environments requires navigating several design dimensions: (1) what the agent sees, (2) how it reasons, and (3) how tasks are decomposed across components. Yet practitioners lack guidance on which design choices improve performance versus merely increase inference costs. We present a controlled study of compound LLM agent design in CybORG CAGE-2, a cyber defense environment modeled as a Partially Observable Markov Decision Process (POMDP). Reward is non-positive, so all configurations operate in a failure-mitigation mode. Our evaluation spans five model families, six models, and twelve configurations (3,475 episodes) with token-level cost accounting. We vary context representation (raw observations vs. a deterministic state-tracking layer with compressed history), deliberation (self-questioning, self-critique, and self-improvement tools, with optional chain-of-thought prompting), and hierarchical decomposition (monolithic ReAct vs. delegation to specialized sub-agents). We find that: (1) Programmatic state abstraction delivers the largest returns per token spent (RPTS), improving mean return by up to 76% over raw observations. (2) Distributing deliberation tools across a hierarchy degrades performance relative to hierarchy alone for all five model families, reaching up to 3.4$\times$ worse mean return while using 1.8-2.7$\times$ more tokens. We call this destructive pattern a deliberation cascade. (3) Hierarchical decomposition without deliberation achieves the best absolute performance for most models, and context engineering is generally more cost-effective than deliberation. These findings suggest a design principle for structured adversarial POMDPs: invest in programmatic infrastructure and clean task decomposition rather than deeper per-agent reasoning, as these strategies can interfere when combined.
Watts vs. Bytes: Turning Data Centers into Grid Assets via Storage Compute Co-Optimization
Enabling continued data-center growth under increasing grid stress motivates closer coordination between flexible computing demand and co-located battery energy storage systems (BESS) to improve site operations and provide grid services. This paper develops a robust co-optimization framework for day-ahead operation of data centers with co-located BESS under utility-imposed interconnection limits on peak load and ramping. The model jointly considers deadline-constrained computing workloads, managed through workload scheduling and dynamic voltage and frequency scaling (DVFS), together with degradation-aware BESS dispatch to enable cost optimization and participation in ancillary-service markets. Case studies based on real-world market and workload data show that the proposed framework yields feasible day-ahead schedules across a range of operating conditions, with substantially larger benefits when interconnection constraints become binding. Under baseline conditions, BESS value is derived from both ancillary-service participation and improved workload and energy management. Under stressed peak-load and ramping limits, however, the daily value of BESS increases by a factor of two or more, driven primarily by BESS actions that reduce potential incompletion of the schedulable workload while complying with interconnection constraints. Under tight peak-load caps, workload composition also matters: a higher share of non-schedulable jobs can increase operating cost by more than 25% relative to more flexible workload mixes. DVFS studies further show that processor-level control is a material flexibility lever under tight load limits. These results demonstrate that coordinated compute-storage flexibility can materially expand the operational headroom and grid value of data centers, especially under increasingly scarce grid capacity.
comment: 17 pages, 10 figures
Covert Bayesian Quickest Change Detection
We investigate the problem of covert quickest change detection in a Bayesian and infinite-horizon setting. A legitimate entity seeks to detect a change in the state of a discrete memoryless channel as quickly as possible by actively probing it. Simultaneously, the entity must ensure its probing remains covert from an adversary monitoring the channel for active sensing. We introduce the expected covertness budget (ECB) as an analytically tractable covertness metric that bounds from above the relative entropy between the observation sequences induced by active and passive sensing. Under constraints on both the probability of false alarm (PFA) and the ECB, we establish a second-order asymptotic converse bound on the average detection delay as the PFA constraint approaches zero, for any positive ECB constraint, explicitly quantifying the maximum square-root-order covert sensing gain possible. Furthermore, we propose an achievability scheme utilizing a constant-sensing-probability Shiryaev-type policy and show that it matches the second-order asymptotic converse. We illustrate our result with a numerical example.
comment: 36 pages, 2 figures. Submitted to IEEE ITW 2026
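For readers unfamiliar with Shiryaev-type policies, the following is a minimal sketch of the classical Shiryaev posterior recursion under a geometric change-point prior, stopping once the posterior probability of a change reaches 1 minus the false-alarm level. It is the textbook procedure only: the constant-sensing-probability randomization and the covertness budget described in the abstract above are not modelled, and the function names and parameters are hypothetical.

```python
def shiryaev_update(posterior, likelihood_ratio, change_rate):
    """One step of the classical Shiryaev recursion (geometric prior with rate change_rate).

    posterior is P(change has occurred | past observations); likelihood_ratio is
    f1(x) / f0(x) for the newest observation x.
    """
    # Prior prediction: the change may occur in this step with probability change_rate.
    predicted = posterior + (1.0 - posterior) * change_rate
    # Bayes update with the post-change / pre-change likelihood ratio.
    numerator = predicted * likelihood_ratio
    return numerator / (numerator + (1.0 - predicted))


def shiryaev_stopping_time(likelihood_ratios, change_rate, false_alarm_level):
    """Return the first step at which the posterior reaches 1 - false_alarm_level, else None."""
    posterior = 0.0  # assume no change has occurred before the first observation
    for step, ratio in enumerate(likelihood_ratios, start=1):
        posterior = shiryaev_update(posterior, ratio, change_rate)
        if posterior >= 1.0 - false_alarm_level:
            return step
    return None


# Hypothetical inputs: every observation favours the post-change law by a factor of 4.
alarm_step = shiryaev_stopping_time([4.0] * 20, change_rate=0.1, false_alarm_level=0.05)
```

Thresholding the posterior at 1 minus the false-alarm level is the standard way to keep the probability of false alarm below that level in the Bayesian setting.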
Active Learning MPC Objective Functions from Preferences
Designing the objective function in Model Predictive Control (MPC) is challenging when performance assessment criteria are available only from human judgment. We adopt a preference-based learning (PbL) approach to learn the MPC objective function from preferences over trajectory pairs. However, the real-world application of PbL is often restricted by the significant cost or limited availability of human preference queries. To address this, Active Learning (AL) strategies seek to improve sampling efficiency, reducing the labeling effort required to obtain a well-performing classifier. We present two AL strategies for learning the MPC objective function from human preferences over pairwise system trajectories: a pool-based strategy that selects trajectory pairs that are both uncertain under the current surrogate and diverse relative to previously labeled comparisons, and a query-synthesis strategy that incorporates new trajectories using the current surrogate-driven MPC. Numerical results show that the proposed strategies yield closed-loop behaviors that align more closely with the expressed preferences using fewer queries than a random sampling approach.
comment: 6 pages, 3 figures
Communication-Efficient Federated Online Decision-Making with Stateful Costs
We study dynamic regret in federated online decision-making with stateful incurred costs under block-based synchronization and partial client participation. In this setting, sparse communication affects not only the pointwise update quality but also the realized state trajectory along which costs are incurred. We propose BLADE, a projected blockwise federated online decision method. BLADE uses only \(O(T/K)\) communication and achieves a dynamic-regret bound for the incurred cost against path-length-bounded comparator sequences; under \(K=\lceil\sqrt T\rceil\), the bound is sublinear whenever \(V_T=o(T^{1/4})\). Experiments on a controlled synthetic stable linear system validate the predicted communication--regret, memory, participation, disturbance-variation, and horizon-scaling effects.
comment: Preprint. Comments welcome
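As a rough illustration of the communication scaling quoted in the BLADE abstract above, the sketch below counts synchronization rounds when clients synchronize once per block of K steps and K is set to the ceiling of the square root of T. The function name and the one-round-per-block accounting are assumptions made for illustration; they are not taken from the paper.

```python
import math


def sync_rounds(horizon: int, block_length: int) -> int:
    """Synchronization rounds if clients sync once per block (assumed accounting)."""
    return math.ceil(horizon / block_length)


T = 10_000                    # hypothetical horizon
K = math.isqrt(T - 1) + 1     # ceil(sqrt(T)) for T >= 1, using exact integer arithmetic
rounds = sync_rounds(T, K)    # grows like sqrt(T) rather than T
```

With this block length the number of rounds grows on the order of the square root of T, compared with T rounds for per-step synchronization.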
Enhanced input stacking for non-square MIMO modal identification of aeronautical structures via Fast and Relaxed Vector Fitting
Fast and Relaxed Vector Fitting (FRVF) is a frequency-domain system identification approach that has been widely adopted in electrical system modelling, while its application to mechanical systems has remained relatively unexplored. In this work, FRVF is reformulated for the identification of structural modal parameters of an aircraft based on Ground Vibration Test (GVT) data within a Multi-Input Multi-Output (MIMO) framework. The proposed procedure consists of three stages: (i) rational approximation of frequency response functions via an enhanced input-stacking strategy, (ii) identification of system poles from the resulting rational model, and (iii) estimation of modal parameters from the extracted poles and associated residues. The methodology is first numerically validated on a MIMO beam model, with particular emphasis on accuracy and robustness under increasing measurement noise. Subsequently, experimental validation is conducted using GVT data from the BAE Systems Hawk T1A aircraft. The results obtained demonstrate a level of performance comparable to that achieved by existing methods. Overall, the extended MIMO formulation of FRVF exhibits high accuracy and strong robustness to measurement noise, highlighting its suitability for application in GVT-based modal analysis.
Constrained MPC-Based Motion Planning for Morphing Quadrotors in Ultra-Narrow Passages under Limited Perception
This paper introduces a motion planning framework to plan morphology and trajectory for morphing quadrotors in extremely constrained environments. We develop a novel obstacle avoidance cost function for nonlinear model predictive control (MPC) that enables navigation through extremely narrow gaps under limited perception from a 2D LiDAR. Classical artificial potential field-based costs typically have a high cost in narrow passages, artificially blocking the navigable path. In contrast, we propose a smooth exponential obstacle cost that preserves low traversal cost within narrow gaps while maintaining strong collision avoidance behavior. The formulation avoids hard activation thresholds and introduces a cost reduction factor to reduce the cost within narrow passages. Direct use of 2D LiDAR measurements in MPC allows navigation around arbitrarily shaped obstacles. The method is embedded within an acados-based nonlinear MPC framework. Simulation and experimental results demonstrate successful traversal of narrow corridors where typical repulsive cost functions would fail. The approach provides a computationally efficient and practical solution for navigating through tight spaces while maintaining safety from obstacles. Although we implement the framework on morphing quadrotors, the cost function formulation is general-purpose and applies to any mobile robot. The implementation code is available at https://github.com/harshjmodi1996/morphocopter_mpc and a short video is available at https://zh.engr.tamu.edu/wp-content/uploads/sites/310/2026/03/MPC_MorphoCopter_video.mp4.
State Estimation
Control science is a core representative of the third industrial revolution and is vital to modern civilization. Control systems are the main subject of control science and involve many considerations, including hardware, software, operation, maintenance, economy, and society. Beyond all of these, the consideration most essential to a control system is methodology in the mathematical sense, knowledge of which is what we refer to as control theory. Besides its importance from the mathematical perspective, control theory is all the more appealing because it is deeply rooted in practical applications. Its appeal lies in both know-why and know-how, and it is the fusion of control theory and practical applications that highlights this appeal. Control theory for practical applications, especially with a so-called "advanced" flavour, involves several fundamental aspects. This article introduces the State Estimation aspect of Advanced Control Theory for Practical Applications [1,2].
Dynamic Plasma Shape Control with Arbitrary Sensor Subsets
Plasma shape control in tokamaks requires a real-time controller that tracks dynamically changing shape targets while tolerating diagnostic failures. Classical approaches decompose the problem into equilibrium reconstruction followed by a linear controller, and assume a fixed, fully operational sensor set. We present a reinforcement learning agent that addresses both limitations simultaneously. The agent is trained in NSFsim, a high-fidelity tokamak simulator configured for DIII-D, on a curated dataset of 120 experimental plasma shapes. The shape targets are resampled as random step changes every 0.25 s, exposing the agent to diverse transitions across the full shape envelope. At test time the agent zero-shot tracks dynamic shape sequences; on a held-out static configuration in simulation it achieves a mean shape error of 2.01 cm, and dynamic trajectory following is demonstrated qualitatively in simulation and on the physical device. Diagnostic dropout randomly masks 30% of magnetic sensors per episode, yielding a single policy robust to arbitrary sensor subsets without backup controllers or mode-switching logic. An asymmetric actor-critic architecture with privileged equilibrium information improves value estimation under partial observability; an auxiliary shape reconstruction head on the actor enables end-to-end shape reconstruction from raw diagnostics and serves as an interpretability tool for policy analysis. The policy transfers to experimental DIII-D shots, where it directly commands the coil actuators on two dynamic shape maneuvers, and to the independent GSevolve simulator.
Communication-Efficient Approximate Gradient Coding for Distributed Learning in Heterogeneous Systems
We propose a communication-efficient optimally structured gradient coding scheme to jointly address straggler resilience and communication efficiency in heterogeneous distributed learning. By establishing a unified framework that simultaneously optimizes gradient coding and quantization, we formulate an optimization problem to minimize residual error subject to an unbiasedness constraint. We rigorously establish the joint global optimum by deriving a closed-form code structure coupled with an optimal bit allocation strategy, while simultaneously proposing a low-complexity bit allocation algorithm that efficiently yields near-optimal performance. We provide rigorous convergence analysis for convex and smooth functions. Experiments on the COCO dataset demonstrate that our joint design significantly accelerates convergence and enhances communication efficiency compared to existing baselines.
Uncertainty Propagation under Residual Disturbances: A Smart-Home Case Study
This paper presents a data-driven framework for uncertainty propagation under unmeasured or statistically unmodeled (unstructured) disturbances. We consider residual disturbances, which consolidate all unstructured disturbances into a single quantity that can be estimated from data. Under mild assumptions, the resulting stochastic predictor is causal and distributionally consistent, enabling efficient uncertainty quantification through polynomial chaos expansions and higher-order Chebyshev inequalities. The proposed method is validated using experimental data from a smart home in Norway.
comment: Accepted by IFAC World Congress 2026
The Shared Prosperity Internet
The Shared Prosperity Internet (SPI) is a network-computing architecture that makes the benefits of automation and Artificial Intelligence (AI) broadly accessible to society. To ground its design, this paper maps the physical constraints of Shannon, Landauer, Turing, and Einstein to three design principles: trustworthiness, sustainability, and technological sovereignty, and maps them into three technical pillars: i) post-Shannon, goal-oriented communication that transmits only what the task requires; ii) anticipatory decision-making ("negative latency") with confidence-bounded pre-action and correction; and iii) beyond-digital computing that selects energy-optimal substrates under deadline and computability constraints. The SPI is grounded in three societal use cases: remote teaching for pupils, remote teaching of robots and cyber-physical systems, and elder care. Furthermore, this paper defines measurable outcomes for an SPI, including latency decomposition, bits per event, energy and CO2 per task, safety and privacy indicators, and robustness.
comment: 8 pages, conference, 4 figures, 16 references
Reactive Robot-Centric Safety for Autonomous Navigation in Constrained and Dynamic Environments
In this work, we address the problem of ensuring real-time safety in autonomous robot navigation in spatially constrained, dynamic environments using only onboard sensors. We present a real-time control architecture that integrates a safety filter, based on a composite control barrier function (CBF) built from 3D LIDAR perception, directly into the autonomy pipeline. The proposed perception-driven framework enforces collision avoidance constraints dynamically from onboard point cloud data, thus allowing a large number of constraints to be handled at the control frequency, while remaining minimally invasive to nominal task execution. The safety region is defined as an ellipsoid in the body frame, consistent with the geometry of the platform, which induces time-varying constraints in the world frame as the robot rotates; this effect is handled through a dedicated time-varying CBF formulation for each LIDAR point. We validate the system through multiple field experiments in underground environments using a quadruped platform performing a visual inspection task, demonstrating reliable operation in the presence of dynamic obstacles, unsafe high-level references, and abrupt localization anomalies, and while traversing narrow corridors.
comment: 9 pages, 12 figures, currently under review
Fairness-Guaranteed Online Power Allocation Policies for EV Fast Charging Stations
The rapid expansion of electric vehicles (EVs) necessitates scalable and efficient fast charging station (FCS) infrastructure. These stations often operate in oversubscribed configurations where the total port rating exceeds a station-level cap reflecting infrastructure limits, grid constraints or market setpoints. In such settings, ensuring fairness in real-time power allocation is essential to prevent user bias and secure equitable access to limited resources while maximizing infrastructure utilization. This task is further complicated by state-of-charge dependent EV power limits defined by charge curves, for which accurate data is often unavailable. This paper introduces two fairness-guaranteed online power allocation policies: FAIR-OPAP-C for conventional FCSs with continuously adjustable power delivery, and FAIR-OPAP-M for modular FCSs composed of discrete assignable power modules. Unlike existing methods, these algorithms require no prior knowledge of charge curves, utilizing only instantaneous power requests available via standard protocols. We formalize fairness with a unified framework encompassing envy-freeness, Pareto efficiency, and proportionality, and establish theoretical guarantees for both algorithms. The algorithms rely on lightweight operations, achieving near-linear and logarithmic scalability for the conventional and modular cases, respectively. Comprehensive evaluations show the proposed methods achieve superior performance across various metrics among seven benchmarks from EV charging and fair division literature. Furthermore, they are orders of magnitude faster than optimization-based approaches, with runtimes below 1 ms for up to 300 EVs, validating their suitability for real-time deployment on hardware-constrained edge devices.
comment: 12 pages, 5 figures
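To make the oversubscribed-station setting in the abstract above concrete, here is a minimal sketch of a textbook water-filling (max-min fair) split of a station-level cap among instantaneous power requests. It is not FAIR-OPAP-C or FAIR-OPAP-M, whose details the abstract does not give; the function name, units, and example values are hypothetical.

```python
def max_min_fair_allocation(requests, station_cap):
    """Max-min fair (water-filling) split of station_cap among power requests, in kW."""
    allocation = [0.0] * len(requests)
    remaining = float(station_cap)
    # Serve the smallest requests first; the sort dominates the cost, O(n log n).
    order = sorted(range(len(requests)), key=lambda i: requests[i])
    for position, i in enumerate(order):
        # Equal share of what is left among the vehicles not yet served.
        fair_share = remaining / (len(order) - position)
        # A vehicle never receives more than it asked for.
        allocation[i] = float(min(requests[i], fair_share))
        remaining -= allocation[i]
    return allocation


# Hypothetical station: three ports requesting 50, 150 and 350 kW under a 300 kW cap.
shares = max_min_fair_allocation([50, 150, 350], station_cap=300)
```

Like the policies described in the abstract, this sketch uses only the instantaneous requests and needs no charge-curve data; which of the stated fairness properties it satisfies would have to be checked separately.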
Preserving Topology Privacy of Network Systems by Feedback: Conditions and Distributed Design
This paper develops a feedback-based method to preserve the topology privacy of consensus protocols in network systems. The key idea is to intentionally violate topology identifiability conditions, thereby preventing unique or accurate recovery of the true topology from available observations, while preserving the intended consensus behavior. This problem is challenging because the feedback magnitude directly reflects the privacy level of edges, while it is strongly coupled with the consensus convergence and constrained by local communications at each node. To begin with, we derive the feedback conditions for both the partial and full observation cases, where the topology unsolvability from observation data is characterized in the former, and the solution space that enforces topology inaccuracy from data is constructed in the latter. Then, we propose a novel distributed topology modification design under limited privacy budgets, and establish performance guarantees through a controllable tradeoff between the consensus deviation and the topology privacy. Finally, we develop a low-complexity heuristic algorithm to achieve optimal privacy preservation on existing edges. Comparative simulations validate the effectiveness and superior performance of the proposed preservation design.
comment: 13 pages
Enabling Intelligent Bidirectional Charging: A Real-World Communication Interface Between Electric Vehicles, Charging Infrastructure, and a Control Optimizer
This paper presents the real-world implementation and field validation of a user-aware bidirectional electric vehicle (EV) charging system developed within the Mobilities for EU and DymoBat projects in Dresden. Building on earlier simulation frameworks, the system enables transition from conceptual models to operational deployment in urban environments. To support grid flexibility and sustainable mobility, the solution combines real-time vehicle and user data with a centralized optimization platform to enable dynamic charging and discharging decisions. The architecture integrates a wireless On-Board Diagnostic II (OBD-II) interface and an open middleware node connected via a 5G campus network, allowing early access to vehicle state-of-charge before plug-in. A tablet-based interface captures user preferences such as departure time and energy demand, which are incorporated into the optimization together with grid conditions. A key contribution is a multi-level communication architecture linking the EV, charging station, user interface, and grid control center using the Open Charge Point Protocol (OCPP). The system integrates software, embedded hardware, and network communication for real-time charging management. Field deployment at Ostra Sport Park in Dresden demonstrates feasibility, improved load balancing, and robust vehicle-to-grid operation. The results show that early data acquisition and predictive control can enhance system efficiency. This work provides a practical benchmark for positive energy districts and future urban e-mobility systems.
Transformer-like Inference from Optimal Control
Decoder-only transformers compute the conditional probability of the next token from a sequence of past observations. This paper derives, from first principles, inference architectures that solve the same prediction problem - and in doing so, recovers transformer-like layer operations as a consequence of optimal control theory. The framework is developed for two model classes: a nonlinear model of discrete-valued processes, directly motivated by the transformer, and a linear Gaussian model as a tractable baseline. For both model classes, the prediction objective is reformulated as an optimal control problem whose solution yields an explicit inference algorithm, the dual filter, with a layer structure that mirrors the layer structure of a decoder-only transformer. Numerical experiments provide a comparison of the optimal control to attention weights from a trained transformer. These experiments reveal that when the embedding dimension is insufficient, the transformer implicitly exploits non-Markovian structure.
comment: Preprint
Direct Data-Driven Linear Quadratic Tracking via Policy Optimization
Direct data-driven optimal control provides an elegant end-to-end paradigm, yet its real-time applicability is often hindered by the growing dimensionality of online decision variables. Recent breakthroughs, notably Data-EnablEd Policy Optimization (DeePO), overcome this bottleneck for the Linear Quadratic Regulator (LQR) through sample-covariance parameterization; however, extending this paradigm to Linear Quadratic Tracking (LQT) poses a fundamental challenge. The core difficulty stems from the intricate coupling between time-varying references and the feedback-feedforward policy structure, which prevents a direct application of constant-dimension parameterization. We first introduce a reference-decoupled reformulation of LQT that naturally accommodates the covariance parameterization, guaranteeing a fixed dimension of decision variables independent of data horizon. This formulation is proven to be exactly equivalent to the indirect certainty-equivalence LQT solution. Leveraging this characterization, we develop offline and online DeePO algorithms. Theoretically, we prove global linear convergence for the offline algorithm using local gradient dominance and smoothness, and show that in the online setting the optimality gap decays linearly up to a bias term that scales inversely with the signal-to-noise ratio (SNR). Numerical simulations verify the theoretical results and illustrate the superior tracking performance of the proposed method.
Stochastic Mirror Descent under Iterate-Dependent Markov Noise: Analysis in the Asymptotic and Finite Time Regimes
We study a stochastic optimization problem in which the sampling distribution depends on the decision variable, and the available samples are generated through an iterate-dependent Markov chain. Such settings arise naturally in problems with decision-dependent uncertainty; however, they introduce bias and temporal dependence, which render standard techniques developed for i.i.d. noise inapplicable. In this work, we analyze the stochastic mirror descent algorithm under iterate-dependent Markov noise. We first establish almost sure convergence for both convex and non-convex problems under the mild assumption of Lipschitz continuity of the objective function, without requiring differentiability. We then derive finite-time concentration bounds for smooth objectives. In the convex setting, the resulting sample complexity matches the classical rate of stochastic mirror descent under i.i.d. noise. In the non-convex setting, we obtain a sample complexity bound in terms of the norm of the Riemannian gradient over the probability simplex. Overall, our results establish a unified convergence framework for stochastic mirror descent with state-dependent Markov noise, and highlight its behavior in both convex and non-convex regimes.
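As background for the abstract above, here is a minimal sketch of a single mirror descent step on the probability simplex, assuming the negative-entropy mirror map (the exponentiated-gradient update). The abstract does not state which mirror map the authors use, and the iterate-dependent Markov sampling of the gradient is not modelled here: the (sub)gradient estimate is simply passed in. The function name and example values are hypothetical.

```python
import math


def entropic_mirror_step(x, gradient, step_size):
    """One mirror descent step on the simplex with the negative-entropy mirror map."""
    # Multiplicative update followed by renormalization (exponentiated gradient).
    # Subtracting the smallest gradient entry keeps every exponential in (0, 1];
    # the shift cancels in the normalization, so the result is unchanged.
    g_min = min(gradient)
    weights = [xi * math.exp(-step_size * (gi - g_min)) for xi, gi in zip(x, gradient)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: start from the uniform distribution over three coordinates.
x0 = [1.0 / 3.0] * 3
x1 = entropic_mirror_step(x0, gradient=[1.0, 0.0, 2.0], step_size=0.5)
```

The update keeps the iterate inside the simplex without an explicit projection and shifts mass toward coordinates with smaller gradient entries.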
Distributionally Robust Nash Equilibrium Seeking with Partial Observations and Distributed Communication
In this work, we study stochastic one-shot games where agents' utilities depend on the collective strategy profiles of other agents as well as on some well-behaved randomness. While each decision-maker is agnostic to the random variable's underlying distribution, they have access to finitely many i.i.d. samples generated from it. We consider two cases: one where samples are shared; and another, more special one, where samples are individually accessible. To hedge against the unknown uncertainty, each agent plays a distributionally robust game and aims to maximize the worst-case expected utility over a Wasserstein ball around the sample average distribution. In this setting, we provide conditions under which the game has a non-empty set of distributionally robust Nash equilibria (DRoNE) and then characterize the closeness of the DRoNE set to the Nash equilibria (NE) of the associated stochastic game. We then propose an inertial, supported, better response, ascending supergradient dynamics (ISBRAG) that seeks the DRoNEs when the distributionally robust game possesses what we term amicable supergradients. This forms the basis of a distributed version (d-ISBRAG) where agents estimate others' strategies by means of a dynamic consensus subroutine over a directed communication network. While the distributed algorithm initially works in the case where agents have individual samples, we later extend it to the case of shared observations under certain simplifying assumptions. This involves analyzing a tractable reformulation of the distributionally robust optimization problem and solving it in a distributed manner to compute the required supergradients. Simulations illustrate our results.
comment: 22 pages, 2 figures
Terrain Consistent Reference-Guided RL for Humanoid Navigation Autonomy
We present a method for training reference-guided, perceptive reinforcement learning locomotion policies for humanoid robots in which reference trajectories are modulated in training to be consistent with terrain geometry. Aiming to deploy our method with standard navigation autonomy infrastructure, we synthesize SE(2)-controllable reference trajectories inside the RL training loop, projecting desired footsteps onto valid footholds and adjusting swing-foot and center-of-mass trajectories to match the terrain. The resulting policy exposes a clean SE(2) velocity interface compatible with standard navigation planners. In simulation, environmentally conditioned references significantly improve reference tracking performance compared to environment-agnostic references. On hardware, we integrate the policy with an MPC + control barrier function planner and demonstrate long-horizon (>70m) closed-loop autonomous navigation on the Unitree G1 through outdoor environments containing rough terrain and consecutive flights of stairs, with all sensing and computation onboard.
comment: 8 pages, 4 figures, intended to submit to Humanoids 2026
Co-Design Optimization for Data Center Cooling System via Digital Twin
Liquid-cooled exascale supercomputers dissipate heat through cooling plants organized as multiple parallel subloops, but how to allocate coolant distribution units (CDUs) across subloops and how to distribute flow among them has not been systematically addressed for facilities at this scale. This paper presents a three-layer optimization framework that jointly determines the integer partition of CDUs across subloops, the continuous flow fraction allocation, and the per-timestep co-design optimization of total flow rate and supply temperature subject to per-subloop thermal safety constraints. The Modelica simulation model is built from data from the Frontier exascale supercomputer at Oak Ridge National Laboratory. By developing a reduced-order surrogate model, all 611 feasible partitions of 25 CDUs are evaluated across the full-year operational dataset of 49,353 timesteps. Three progressively richer operational strategies are compared, ranging from flow control optimization to full three-layer co-design optimization with dynamically adjusted flow fractions. The globally optimal design is a two-subloop plant achieving 35.48% annual cooling energy savings, only 0.18 percentage points above the current three-subloop Frontier design at 35.30%. Flow fraction optimization is shown to compensate for any feasible CDU-to-subloop assignment, reducing the design sensitivity by 93% and providing a low-cost software-only pathway to near-optimal performance on the existing Frontier hardware. The framework is transferable to other liquid-cooled high-performance computing plants.
comment: 12 pages, 8 figures
High-Resolution PTDF-Based Planning of Storage and Transmission Under High Renewables
Transmission Expansion Planning (TEP) optimizes power grid upgrades and investments to ensure reliable, efficient, and cost-effective electricity delivery while addressing grid constraints. To support growing demand and renewable energy integration, energy storage is emerging as a pivotal asset that provides temporal flexibility and alleviates congestion. This paper develops a multiperiod, two-stage PTDF formulation that co-optimizes transmission upgrades and storage siting/sizing. To ensure scalability, a trust-region, multicut Benders scheme warm-started from per-representative-day optima is proposed. Applied to a 2,000-bus synthetic Texas system under high-renewable projections, the method attains final optimality gaps below 2% and yields a plan with storage at 167 nodes (32% of peak renewable capacity). These results demonstrate that the proposed PTDF-based methodology efficiently handles large distributed storage fleets and scales to high spatial resolution.
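For readers unfamiliar with the term, a power transfer distribution factor (PTDF) gives the DC power flow on each line per unit of injection at a bus, withdrawn at a slack bus. The sketch below is a generic textbook construction, not the formulation from the paper above; the three-bus network, the function names `solve_linear` and `ptdf_matrix`, and the choice of slack bus are all assumptions made for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ptdf_matrix(n_bus, lines, slack=0):
    """Return ptdf[line][bus] for a DC network.

    lines is a list of (from_bus, to_bus, reactance); flow is counted as
    positive in the from -> to direction. The slack column stays zero.
    """
    keep = [b for b in range(n_bus) if b != slack]
    idx = {b: i for i, b in enumerate(keep)}
    # Susceptance matrix with the slack row and column removed.
    B = [[0.0] * len(keep) for _ in keep]
    for f, t, x in lines:
        s = 1.0 / x
        for u, v in ((f, t), (t, f)):
            if u in idx:
                B[idx[u]][idx[u]] += s
                if v in idx:
                    B[idx[u]][idx[v]] -= s
    ptdf = [[0.0] * n_bus for _ in lines]
    for bus in keep:
        # Unit injection at `bus`, withdrawn at the slack.
        rhs = [1.0 if b == bus else 0.0 for b in keep]
        theta_red = solve_linear(B, rhs)
        theta = [0.0] * n_bus
        for b in keep:
            theta[b] = theta_red[idx[b]]
        for k, (f, t, x) in enumerate(lines):
            ptdf[k][bus] = (theta[f] - theta[t]) / x
    return ptdf


# Hypothetical triangle network with equal reactances.
example_lines = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
example_ptdf = ptdf_matrix(3, example_lines, slack=0)
```

In this triangle, a unit injection at bus 1 splits two thirds over the direct line to the slack and one third over the two-line path, which is the kind of linear sensitivity a PTDF-based planning model uses in place of explicit angle variables.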
Functional requirements decomposition in set-based design
Designing systems is typically uncertain and ambiguous at early stages. Set-based design supports alternative exploration and gradual uncertainty reduction during the early lifecycle, making it practical for complex systems design. In parallel, functional requirements decomposition helps to advance the design incrementally. However, current literature on set-based design lacks formal guidance on how to decompose functional requirements. To bridge this gap, we introduce a four-step method to decompose functional requirements for set-based design hierarchically. We systematically define, reason about, and narrow the sets, breaking down the functional requirements into formal sub-requirements. This method allows parallel abstraction, ensuring the resulting system satisfies the top-level functional requirements.
SPARe: Stacked Parallelism with Adaptive Reordering for Fault-Tolerant LLM Pretraining Systems with 100k+ GPUs ICML 2026
In large-scale LLM pre-training systems with 100k+ GPUs, failures become the norm rather than the exception, and restart costs can dominate wall-clock training time. However, existing fault-tolerance mechanisms are largely unprepared for this restart-dominant regime. To address this challenge, we propose SPARe - Stacked Parallelism with Adaptive Reordering - a fault-tolerance framework that masks node failures during gradient synchronization by stacking redundant data shards across parallelism groups and adaptively reordering execution. SPARe achieves availability comparable to traditional replication while maintaining near-constant computation overhead of only 2~3x, even under high redundancy where traditional replication would require linearly inflating overhead. We derive closed-form expressions for endurable failure count and computation overhead, validate them via SimGrid-based discrete-event simulation, and jointly optimize redundancy and checkpointing to minimize time-to-train. At extreme scale with up to 600k GPUs, SPARe reduces time-to-train by 40~50% compared to traditional replication.
comment: Forty-Third International Conference on Machine Learning (ICML 2026)
Run-to-Run Indirect Trajectory Tracking Control of Electromechanical Systems Based on Identifiable and Flat Models
Differentially flat models are frequently used to design feedforward controllers for electromechanical systems. However, control performance depends on model accuracy, which makes feedback imperative. This paper presents a control scheme for electromechanical systems in which measuring or estimating the output to be controlled -- typically the position -- is not feasible. It employs an identifiable-model-based controller and predictor, coupled with an iterative loop that updates model parameters using the error between a measurable output and its prediction. Simulations on electromechanical switching devices show effective tracking of the desired position trajectory using only coil current measurements.
comment: 6 pages, 4 figures. Final version, after peer review and acceptance, submitted to the 23rd IFAC World Congress
Flatness-based trajectory planning for 3D overhead cranes with friction compensation and collision avoidance
This paper presents an optimal trajectory generation method for 3D overhead cranes by leveraging differential flatness. This framework enables the direct inclusion of complex physical and dynamic constraints, such as nonlinear friction and collision avoidance for both payload and rope. Our approach allows for aggressive movements by constraining payload swing only at the final point. A comparative simulation study validates our approach, demonstrating that neglecting dry friction leads to actuator saturation and collisions. The results show that friction modeling is a fundamental requirement for fast and safe crane trajectories.
comment: 6 pages, 8 figures. Final version, after peer review and acceptance, submitted to the 23rd IFAC World Congress
On the (non-)resilience of encrypted controllers to covert attacks
The security of networked control systems (NCS) is receiving increasing attention from both cyber-security and system-theoretic perspectives. The former focuses on classical IT security goals such as confidentiality, integrity, and availability of process data, while the latter investigates tailored attacks (and detection schemes), including covert and zero-dynamics attacks. Confidentiality in control systems can, for instance, be achieved by securely outsourcing the evaluation of the controller to third-party platforms, such as cloud services. The underlying technology enabling such secure computation often is homomorphic encryption (HE). Recent works in encrypted control have proposed modifications to underlying HE schemes to achieve not only confidentiality but also resilience to certain types of integrity attacks. While extensions in this direction are desirable in principle, we show that the integrity problem in encrypted control cannot be solved by public-key HE schemes alone due to their inherent malleability. In other words, the same homomorphisms that enable encrypted control in the first place can be leveraged not only constructively but also destructively. More precisely, we demonstrate that NCS are vulnerable to covert attacks, even when encrypted control is employed. Remarkably, this remains possible without knowledge of an unencrypted model. Yet, resilience to such attacks can still be achieved through complementary techniques. We present an approach based on verifiable computation that integrates with modern homomorphic cryptosystems and is asymptotically secure while incurring no communication overhead.
comment: Extended version of a paper presented at the IFAC World Congress 2026
Optimal Delay Compensation in Networked Predictive Control
Networked Predictive Control is widely used to mitigate the effect of delays and dropouts in Networked Control Systems, particularly when these exceed the sampling time. A key design choice of these methods is the delay bound, which determines the prediction horizon and the robustness to information loss. This work develops a systematic method to select the optimal bound by quantifying the trade-off between prediction errors and open-loop operation caused by communication losses. Simulation studies demonstrate the performance gains achieved with the optimal bound.
comment: Final accepted manuscript for the 23rd IFAC World Congress, Busan, Republic of Korea, 2026. To appear in IFAC-PapersOnLine
A Lyapunov Characterization of Robust D-Stability with Application to Decentralized Integral Control of LTI Systems
The concept of matrix D-stability plays an important role in applications, ranging from economic and biological system models to decentralized control. Here we provide necessary and sufficient Lyapunov-type conditions for the robust (block) D-stability property. We leverage this characterization as part of a novel Lyapunov analysis of decentralized integral control for MIMO LTI systems, providing sufficient conditions guaranteeing stability under low-gain and under arbitrary connection and disconnection of individual control loops.
The fragile nature of road transportation networks
Major cities worldwide experience problems with the performance of their road transportation networks, and the continuous increase in traffic demand presents a substantial challenge to the optimal operation of urban road networks and the efficiency of traffic control strategies. The operation of transportation systems is widely considered to display a fragile property, i.e., the loss in performance increases exponentially with the linearly growing magnitude of disruptions. Meanwhile, the risk engineering community is embracing the novel concept of antifragility, enabling systems to learn from past events and exhibit improved performance under disruptions of previously unseen magnitudes. In this study, based on established traffic flow theory knowledge, namely the Macroscopic Fundamental Diagram (MFD), we first conduct a rigorous mathematical analysis to theoretically prove the fragile nature of road transportation networks. Subsequently, we propose a skewness-based indicator that can be readily applied to cross-compare the degree of fragility for different networks based solely on MFD-related parameters. Finally, we implement a numerical simulation calibrated with real-world network data to bridge the gap between the theoretical proof and the practical operations, with results showing the reinforcing effect of higher-order statistics and stochasticity on the fragility of the networks. This work aims to demonstrate the fragile nature of road transportation networks and guide researchers towards adopting the methods of antifragile design for future networks and traffic control strategies.
comment: 35 pages, 11 figures
Application of Deep Reinforcement Learning to Event-Triggered Control for Networked Artificial Pancreas Systems
This paper proposes a deep reinforcement learning (DRL)-based event-triggered controller design for networked artificial pancreas (AP) systems. Although existing DRL-based AP controllers typically assume periodic control updates, networked control systems (NCSs) require a reduction in communication frequency to achieve energy-efficient operation, which is directly tied to control updates. However, jointly learning both insulin dosing and update timing significantly increases the complexity of the learning problem. To alleviate this complexity, we develop a practical DRL-based controller design that avoids explicitly learning update timing by introducing a rule-based criterion defined by changes in blood glucose. As a result, decision-making occurs at irregular intervals, and the problem is naturally formulated as a semi-Markov decision process (SMDP), for which we extend a standard DRL algorithm. Numerical experiments demonstrate that the proposed method improves communication efficiency while maintaining control performance.
comment: 14 pages, 7 figures, submitted to a journal
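As a rough illustration of the rule-based triggering idea in the abstract above, the sketch below fires a control update only when blood glucose has changed by at least a threshold since the last transmitted sample. The exact rule used in the paper is not stated in the abstract, so the absolute-change criterion, the function name `event_triggered_updates`, the threshold value, and the glucose readings are all assumptions.

```python
def event_triggered_updates(glucose, threshold):
    """Return the sample indices at which a control update is transmitted.

    The first sample always triggers. Afterwards an update fires whenever
    the reading differs from the last transmitted reading by at least
    `threshold` (same units as `glucose`, e.g. mg/dL).
    """
    if not glucose:
        return []
    events = [0]
    last_sent = glucose[0]
    for k in range(1, len(glucose)):
        if abs(glucose[k] - last_sent) >= threshold:
            events.append(k)
            last_sent = glucose[k]
    return events


def sojourn_times(events):
    """Number of sampling periods between consecutive updates."""
    return [b - a for a, b in zip(events, events[1:])]


# Hypothetical readings in mg/dL.
readings = [100, 104, 109, 111, 125, 126, 114]
updates = event_triggered_updates(readings, threshold=10)
gaps = sojourn_times(updates)
```

The irregular gaps between updates are what make a semi-Markov formulation natural: each decision holds for a variable number of sampling periods rather than exactly one.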
Opponent State Inference Under Partial Observability: An HMM-POMDP Framework for 2026 Formula 1 Energy Strategy
The 2026 Formula 1 technical regulations introduce a fundamental change to energy strategy: under a 50/50 internal combustion engine / battery power split with unlimited regeneration and a driver-controlled Override Mode, the optimal energy deployment policy depends not only on a driver's own state but on the hidden state of rival cars. This creates a Partially Observable Stochastic Game that cannot be solved by single-agent optimisation methods. We present a tractable two-layer inference and decision framework. The first layer is a 40-state Hidden Markov Model (HMM) that infers a probability distribution over each rival's ERS charge level (four modes: H, M, L_harvest, L_derate), Override Mode status, and tyre degradation state from six publicly observable telemetry signals. The second layer is a Deep Q-Network (DQN) policy that takes the HMM belief state as input and selects between energy deployment strategies. We formally characterise the counter-harvest trap, a deceptive strategy in which a car deliberately suppresses observable deployment signals to induce a rival into a failed attack, and show that detecting it requires belief-state inference over both ERS level and the harvest/derate sub-mode. On synthetic races, the HMM achieves 96.8% ERS-level accuracy (random baseline 25%), classifies L_harvest vs. L_derate with 89.4% accuracy, and detects counter-harvest trap conditions with 96.3% recall. Pre-season analysis indicates circuit-dependent recharge availability (1.0x to 2.2x per lap) as the primary confound; Melbourne is the hardest-case validation environment. Baum-Welch calibration on 2026 race telemetry begins with the Australian Grand Prix (8 March 2026).
comment: 17 pages. v3: editorial corrections and bibliographic updates. Pre-registered theoretical framework; empirical calibration on 2026 race telemetry from Australian Grand Prix (8 March 2026) onwards
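The belief state that the first layer passes to the policy can be pictured with the standard HMM forward (filtering) recursion: predict through the transition matrix, then reweight by the observation likelihood and normalise. The sketch below is a generic two-state version, not the 40-state model from the paper; the function name `forward_step` and all probabilities are made-up values for illustration.

```python
def forward_step(belief, transition, likelihood):
    """One HMM filtering step.

    belief[i]        : P(state = i | observations so far)
    transition[i][j] : P(next state = j | state = i)
    likelihood[j]    : P(new observation | next state = j)
    Returns the posterior belief over the next state.
    """
    n = len(belief)
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    unnormalised = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnormalised)
    if total == 0.0:
        raise ValueError("observation has zero likelihood under every state")
    return [u / total for u in unnormalised]


# Hypothetical two-state example (e.g. "high charge" vs "low charge").
prior = [0.5, 0.5]
T = [[0.9, 0.1],
     [0.2, 0.8]]
obs_likelihood = [0.5, 0.1]
posterior = forward_step(prior, T, obs_likelihood)
```

Repeating this step once per observation yields the running probability vector that a downstream policy, such as the DQN described above, could consume as input.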
On Erlang mixture approximations for differential equations with distributed time delays
In this paper, we propose a general approach for approximate simulation and analysis of delay differential equations (DDEs) with distributed time delays based on methods for ordinary differential equations (ODEs). The key innovation is that we 1) propose an Erlang mixture approximation of the kernel in the DDEs and 2) use the linear chain trick to transform the resulting approximate DDEs to ODEs. Furthermore, we prove that the approximation converges for continuous and bounded kernels and for specific choices of the coefficients if the number of terms increases sufficiently fast. We show that the approximate ODEs can be used to assess the stability of the steady states of the original DDEs and that the solution to the ODEs converges if the kernel is also exponentially bounded. Additionally, we propose an approach based on bisection and least-squares estimation for determining optimal parameter values in the approximation. Finally, we present numerical examples that demonstrate the accuracy and convergence rate obtained with the optimal parameters and the efficacy of the proposed approach for bifurcation analysis and Monte Carlo simulation. The numerical examples involve a modified logistic equation, chemotherapy-induced myelosuppression, and a point reactor kinetics model of a molten salt nuclear fission reactor.
comment: The theoretical results have been generalized and the paper has been heavily revised in response to reviewers' comments
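The linear chain trick mentioned in the abstract above can be sketched for a single Erlang kernel of order n and rate a: the delayed quantities z_j(t), defined as the convolution of x with the order-j Erlang kernel, satisfy z_1' = a (x - z_1) and z_j' = a (z_{j-1} - z_j). A mixture approximation would then use a weighted sum of the z_j. The code below is a minimal sketch under those assumptions; the function names, the RK4 integrator, and the test signal are illustrative choices, not taken from the paper.

```python
import math


def chain_rhs(z, x, rate):
    """Right-hand side of the linear chain for states z_1..z_n driven by x."""
    dz = [rate * (x - z[0])]
    for j in range(1, len(z)):
        dz.append(rate * (z[j - 1] - z[j]))
    return dz


def integrate_chain(x_of_t, z0, rate, t_end, steps):
    """Integrate the chain from t = 0 to t_end with classical RK4."""
    h = t_end / steps
    z, t = list(z0), 0.0
    for _ in range(steps):
        k1 = chain_rhs(z, x_of_t(t), rate)
        k2 = chain_rhs([a + 0.5 * h * b for a, b in zip(z, k1)],
                       x_of_t(t + 0.5 * h), rate)
        k3 = chain_rhs([a + 0.5 * h * b for a, b in zip(z, k2)],
                       x_of_t(t + 0.5 * h), rate)
        k4 = chain_rhs([a + h * b for a, b in zip(z, k3)],
                       x_of_t(t + h), rate)
        z = [a + h / 6.0 * (p + 2.0 * q + 2.0 * r + s)
             for a, p, q, r, s in zip(z, k1, k2, k3, k4)]
        t += h
    return z


def exact_delayed_exponential(lam, rate, order, t):
    """Closed form of z_order(t) when x(t) = exp(lam * t) for all t.

    Requires rate + lam > 0 so that the convolution integral converges.
    """
    return (rate / (rate + lam)) ** order * math.exp(lam * t)
```

For x(t) = exp(lam t) the convolution has the closed form coded in `exact_delayed_exponential`, so starting the chain from the exact values at t = 0 and integrating forward gives a direct check that the ODE system reproduces the distributed delay.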
Learning Developmental Scaffoldings to Guide Self-Organisation
From subcellular structures to entire organisms, many natural systems generate complex organisation through self-organisation: local interactions that collectively give rise to global structure without any blueprint of the outcome. Yet a significant portion of the information driving such processes is not produced by self-organisation itself; instead, it is often offloaded to initial conditions of the system. Biological development is a prime example, where maternal pre-patterns encode positional and symmetry-breaking information that scaffolds the self-organising process. From maternal morphogen gradients in early embryogenesis to tissue-level morphogenetic pre-patterns guiding organ formation, this transfer of information to initial conditions, analogous to a memory-compute trade-off in computational systems, is a fundamental part of developmental processes. In this work, we study this offloading phenomenon by introducing a model that jointly learns both the self-organisation rules and the pre-patterns, allowing their interplay to be varied and measured under controlled conditions: a Neural Cellular Automaton (NCA) paired with a learned coordinate-based pattern generator (SIREN), both trained simultaneously to generate a set of patterns. We provide information-theoretic analyses of how information is distributed between pre-patterns and the self-organising process, and show that jointly learning both components yields improvements in robustness, encoding capacity, and symmetry breaking over purely self-organising alternatives. Our analysis further suggests that effective pre-patterns do not simply approximate their targets; rather, they bias the developmental dynamics in ways that facilitate convergence, pointing to a non-trivial relationship between the structure of initial conditions and the dynamics of self-organisation.
comment: 10 pages, 5 figures. Under review
Comparative Analysis of Data-Driven Predictive Control Strategies
This paper compares data-driven predictive control strategies by examining their theoretical foundations, assumptions, and applications. The three most widely recognized and consequential methods are considered: Data Enabled Predictive Control, Willems-Koopman Predictive Control, and Model-Free Adaptive Predictive Control. Each of these strategies is systematically reviewed, and the primary theories supporting it are outlined. Following analysis, a discussion is provided regarding their fundamental assumptions, emphasizing their influence on control effectiveness. A numerical example is presented as a benchmark for comparison to enable a rigorous performance evaluation.
A Physics-Informed Scenario Approach with Data Mitigation for Safety Verification of Nonlinear Systems
This paper develops a physics-informed scenario approach for safety verification of nonlinear systems using barrier certificates (BCs) to ensure that system trajectories remain within safe regions over an infinite time horizon. Designing BCs often relies on an accurate dynamics model; however, such models are often imprecise due to the model complexity involved, particularly when dealing with highly nonlinear systems. In such cases, while scenario approaches effectively address the safety problem using collected data to construct a guaranteed BC for the unknown dynamical system, they often require solving an optimization problem with substantial amounts of data. To address this, we propose a physics-informed scenario approach that selects data samples such that the outputs of the physics-based model and the observed data are sufficiently close. This approach guides the scenario optimization process to eliminate redundant samples and potentially reduce the required dataset size. We validate our approach through three case studies, showcasing its practical application in reducing the required data.
EMFusion: An Uncertainty-Aware Conditional Diffusion Framework for Frequency-Selective EMF Forecasting in Wireless Networks
The rapid growth in wireless infrastructure has increased the need to accurately estimate and forecast electromagnetic field (EMF) levels to ensure ongoing compliance, assess potential health impacts, and support efficient network planning. While existing studies rely on univariate forecasting of wideband aggregate EMF data, frequency-selective multivariate forecasting is needed to capture the inter-operator and inter-frequency variations essential for proactive network planning. To this end, this paper introduces EMFusion, a conditional multivariate diffusion-based probabilistic forecasting framework that integrates diverse contextual factors, such as time of day, season, and holidays, while providing explicit uncertainty estimates. The proposed architecture features a residual U-Net backbone enhanced by a cross-attention mechanism that dynamically integrates external conditions to guide the generation process. Furthermore, EMFusion integrates an imputation-based sampling strategy that treats forecasting as a structural inpainting task, ensuring temporal coherence even with irregular measurements. Unlike standard point forecasters, EMFusion generates empirical probabilistic prediction intervals from the learned conditional distribution, providing uncertainty-aware probabilistic forecasting rather than simple point estimation. Numerical experiments conducted on frequency-selective EMF datasets demonstrate that EMFusion with the contextual information of working hours outperforms the baseline models with or without conditions. EMFusion outperforms the best baseline by 23.85% in continuous ranked probability score (CRPS) and 13.93% in normalized root mean square error, and reduces prediction CRPS error by 22.47%.
comment: Submission for possible publication
Neural Policy Composition from Free Energy Minimization
The ability to flexibly compose previously acquired skills to execute intelligent behaviors is a hallmark of natural intelligence. Such compositional flexibility is often attributed to context-dependent gating mechanisms that determine how multiple policies or behavioral primitives are combined. Yet, despite remarkable efforts, the normative objective from which such gating rules should arise, and the neural computations capable of implementing them, remain unclear. Existing approaches typically rely on prespecified design choices for the gating rules, and remain tied to specific architectures, learning paradigms, or datasets. Here, we introduce a normative framework in which policy composition emerges from the minimization of a variational free energy, providing a principled and broadly applicable objective for gating. Based on this framework, we derive a continuous-time gradient flow whose trajectories are guaranteed to converge, with explicit rate, to the optimal composition of primitives. We further show that this dynamics admits a mechanistic neural implementation as a soft-competitive recurrent circuit with context-sensitive local interactions. We evaluate the model on emerging flocking behaviors in multi-agent systems, human decision-making in bandit tasks, and control benchmarks in layered architectures. Across these settings, the model provides interpretable mechanistic accounts of policy composition, reproduces key behavioral signatures, yields insights into data, and matches or outperforms established models.
Robotics
Articraft: An Agentic System for Scalable Articulated 3D Asset Generation
A bottleneck in learning to understand articulated 3D objects is the lack of large and diverse datasets. In this paper, we propose to leverage large language models (LLMs) to close this gap and generate articulated assets at scale. We reduce the problem of generating an articulated 3D asset to that of writing a program that builds it. We then introduce a new agentic system, Articraft, that writes such programs automatically. We design a programmatic interface and harness to help the LLM do so effectively. The LLM writes code against a domain-specific SDK for defining parts, composing geometry, specifying joints, and writing tests to validate the resulting assets. The harness exposes a restricted workspace and interface to the LLM, validates the resulting assets, and returns structured feedback. In this way, the LLM is not distracted by details such as authoring a URDF file or managing a complex software environment. We show that this produces higher-quality assets than both state-of-the-art articulated-asset generators and general-purpose coding agents. Using Articraft, we build Articraft-10K, a curated dataset of over 10K articulated assets spanning 245 categories, and show its utility both for training models of articulated assets and in downstream applications such as robotics simulation and virtual reality.
comment: Project page: https://articraft3d.github.io/
Hand-in-the-Loop: Improving Dexterous VLA via Seamless Interventional Correction
Vision-Language-Action (VLA) models are prone to compounding errors in dexterous manipulation, where high-dimensional action spaces and contact-rich dynamics amplify small policy deviations over long horizons. While Interactive Imitation Learning (IIL) can refine policies through human takeover data, applying it to high-degree-of-freedom (DoF) robotic hands remains challenging due to a command mismatch between human teleoperation and policy execution at the takeover moment, which causes abrupt robot-hand configuration changes, or "gesture jumps". We present Hand-in-the-Loop (HandITL), a seamless human-in-the-loop intervention method that blends human corrective intent with autonomous policy execution to avoid gesture jumps during bimanual dexterous manipulation. Compared with direct teleoperation takeover, HandITL reduces takeover jitter by 99.8% and preserves robust post-takeover manipulation, reducing grasp failures by 87.5% and mean completion time by 19.1%. We validate HandITL on tasks requiring bimanual coordination, tool use, and fine-grained long-horizon manipulation. When used to collect intervention data for policy refinement, HandITL yields policies that outperform those trained with standard teleoperation data by 19% on average across three long-horizon dexterous tasks.
Pelican-Unified 1.0: A Unified Embodied Intelligence Model for Understanding, Reasoning, Imagination and Action
We present Pelican-Unified 1.0, the first embodied foundation model trained according to the principle of unification. Pelican-Unified 1.0 uses a single VLM as a unified understanding module, mapping scenes, instructions, visual contexts, and action histories into a shared semantic space. The same VLM also serves as a unified reasoning module, autoregressively producing task-, action-, and future-oriented chains of thought in a single forward pass and projecting the final hidden state into a dense latent variable. A Unified Future Generator (UFG) then conditions on this latent variable and jointly generates future videos and future actions through two modality-specific output heads within the same denoising process. The language, video, and action losses are all backpropagated into the shared representation, enabling the model to jointly optimize understanding, reasoning, imagination, and action during training, rather than training three isolated expert systems. Experiments demonstrate that unification does not imply compromise. With a single checkpoint, Pelican-Unified 1.0 achieves strong performance across all three capabilities: 64.7 on eight VLM benchmarks, the best among comparable-scale models; 66.03 on WorldArena, ranking first; and 93.5 on RoboTwin, the second-best average among compared action methods. These results show that the unified paradigm succeeds in preserving specialist strength while bringing understanding, reasoning, imagination, and action into one model.
CoCo-InEKF: State Estimation with Learned Contact Covariances in Dynamic, Contact-Rich Scenarios
Robust state estimation for highly dynamic motion of legged robots remains challenging, especially in dynamic, contact-rich scenarios. Traditional approaches often rely on binary contact states that fail to capture the nuances of partial contact or directional slippage. This paper presents CoCo-InEKF, a differentiable invariant extended Kalman filter that utilizes continuous contact velocity covariances instead of binary contact states. These learned covariances allow the method to dynamically modulate contact confidence, accounting for more nuanced conditions ranging from firm contact to directional slippage or no contact. To predict these covariances for a set of predefined contact candidate points, we employ a lightweight neural network trained end-to-end using a state-error loss. This approach eliminates the need for heuristic ground-truth contact labels. In addition, we propose an automated contact candidate selection procedure and demonstrate that our method is insensitive to their exact placement. Experiments on a bipedal robot demonstrate a superior accuracy-efficiency tradeoff for linear velocity estimation, as well as improved filter consistency compared to baseline methods. This enables the robust execution of challenging motions, including dancing and complex ground interactions -- both in simulation and in the real world.
comment: RSS 2026
CLOVER: Closed-Loop Value Estimation & Ranking for End-to-End Autonomous Driving Planning
End-to-end autonomous driving planners are commonly trained by imitating a single logged trajectory, yet evaluated by rule-based planning metrics that measure safety, feasibility, progress, and comfort. This creates a training-evaluation mismatch: trajectories close to the logged path may violate planning rules, while alternatives farther from the demonstration can remain valid and high-scoring. The mismatch is especially limiting for proposal-selection planners, whose performance depends on candidate-set coverage and scorer ranking quality. We propose CLOVER, a Closed-LOop Value Estimation and Ranking framework for end-to-end autonomous driving planning. CLOVER follows a lightweight generator-scorer formulation: a generator produces diverse candidate trajectories, and a scorer predicts planning-metric sub-scores to rank them at inference time. To expand proposal support beyond single-trajectory imitation, CLOVER constructs evaluator-filtered pseudo-expert trajectories and trains the generator with set-level coverage supervision. It then performs conservative closed-loop self-distillation: the scorer is fitted to true evaluator sub-scores on generated proposals, while the generator is refined toward teacher-selected top-$k$ and vector-Pareto targets with stability regularization. We analyze when an imperfect scorer can improve the generator, showing that scorer-mediated refinement is reliable when scorer-selected targets are enriched under the true evaluator and updates remain conservative. On NAVSIM, CLOVER achieves 94.5 PDMS and 90.4 EPDMS, establishing a new state of the art. On the more challenging NavHard split, it obtains 48.3 EPDMS, matching the strongest reported result. On supplementary nuScenes open-loop evaluation, CLOVER achieves the lowest L2 error and collision rate among compared methods. Code and data will be released at https://github.com/WilliamXuanYu/CLOVER.
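The two selection notions named in the abstract above, top-$k$ ranking and vector-Pareto filtering over sub-scores, can be illustrated generically. The sketch below is not CLOVER's implementation: the weighted-sum aggregate is a stand-in for whatever scalar score the method actually uses, and the function names, weights, and candidate sub-scores are hypothetical.

```python
def dominates(a, b):
    """True if score vector a is at least as good as b everywhere and
    strictly better somewhere (higher is better)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(scores):
    """Indices of candidates not dominated by any other candidate."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(o, s)
                       for j, o in enumerate(scores) if j != i)]


def top_k(scores, k, weights):
    """Indices of the k best candidates under a weighted sum of sub-scores."""
    def aggregate(i):
        return sum(w * x for w, x in zip(weights, scores[i]))
    return sorted(range(len(scores)), key=aggregate, reverse=True)[:k]


# Hypothetical (safety, progress) sub-scores for four candidate trajectories.
candidates = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.1, 0.95)]
front = pareto_front(candidates)
best_two = top_k(candidates, k=2, weights=(1.0, 1.0))
```

The two selections can disagree: a balanced candidate may sit on the Pareto front without making the top-$k$ under a particular weighting, which is one reason to keep both kinds of target.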
SOCC-ICP: Semantics-Assisted Odometry based on Occupancy Grids and ICP
Reliable pose estimation in previously unseen environments is a fundamental capability of autonomous systems. Existing LiDAR odometry methods typically employ point-, surfel-, or NDT-based map representations, which are distinct from the semantic occupancy grids commonly used for downstream tasks such as motion planning. We introduce SOCC-ICP, a semantics-assisted odometry framework that jointly performs Semantic OCCupancy grid mapping and LiDAR scan alignment. Each map voxel encodes geometric and semantic statistics, enabling adaptive point-to-point or point-to-plane ICP based on local planarity. Further, the occupancy grid naturally filters dynamic objects through raycasting-based free-space updates. Across diverse evaluation scenarios, SOCC-ICP achieves performance competitive with state-of-the-art LiDAR odometry and remains robust in geometrically degenerate environments, even in the absence of semantic cues. When semantic labels are available, integrating them into map construction, downsampling, and correspondence weighting yields further accuracy gains. By unifying odometry and semantic occupancy grid mapping within a single representation, SOCC-ICP eliminates redundant map structures and directly provides a map suitable for downstream robotic applications.
comment: 9 pages, 3 figures, Accepted May 2026 for publication in IEEE Robotics and Automation Letters (RA-L)
A Prototyping Framework for Distributed Control of Multi-Robot Systems
This paper presents a prototyping framework for distributed control of multi-robot systems, aimed at bridging theory and practical testing of distributed optimization algorithms. Using the Single Program, Multiple Data (SPMD) paradigm, the framework emulates distributed control on a single computer, with each core running the same algorithm using local states and neighbour-to-neighbour communication. We demonstrate the framework on a four-quadrotor position-swapping task using a non-cooperative game-theoretic distributed algorithm. Computational time and trajectory data are compared across the supported dynamics levels: a point-mass model, a high-fidelity quadrotor model, and an experimental hardware testbed using Crazyflie quadcopters. The results show that the framework provides a low-cost and accessible approach for validating distributed algorithms.
comment: Accepted at IFAC World Congress 2026
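The SPMD pattern described above, where every agent runs the same program on local state and neighbour messages, can be sketched as follows. This is only an illustration: it substitutes simple average consensus for the paper's game-theoretic algorithm, runs in one process instead of one core per agent, and the names `agent_step` and `run_consensus` are hypothetical.

```python
def agent_step(own_state, neighbour_states, gain=0.25):
    """Same program for every agent: move toward the neighbours' states."""
    return own_state + gain * sum(n - own_state for n in neighbour_states)

def run_consensus(states, neighbours, rounds=50):
    """Emulate synchronous rounds of neighbour-to-neighbour communication."""
    for _ in range(rounds):
        # Every agent reads only the previous-round states of its neighbours.
        states = [
            agent_step(states[i], [states[j] for j in neighbours[i]])
            for i in range(len(states))
        ]
    return states

# Four agents on a ring: each one talks to its two adjacent agents.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
final = run_consensus([0.0, 4.0, 8.0, 12.0], ring)
print(final)
```

Because the update weights are symmetric, the average of the states is preserved and all four agents approach the mean of the initial values.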
Evo-Depth: A Lightweight Depth-Enhanced Vision-Language-Action Model
Vision-Language-Action models have emerged as a promising paradigm for robotic manipulation by unifying perception, language grounding, and action generation. However, they often struggle in scenarios requiring precise spatial understanding, as current VLA models primarily rely on 2D visual representations that lack depth information and detailed spatial relationships. While recent approaches incorporate explicit 3D inputs such as depth maps or point clouds to address this issue, they often increase system complexity, require additional sensors, and remain vulnerable to sensing noise and reconstruction errors. Another line of work explores implicit 3D-aware spatial modeling directly from RGB observations without extra sensors, but it often relies on large geometry foundation models, resulting in higher training and deployment costs. To address these challenges, we propose Evo-Depth, a lightweight depth-enhanced VLA framework that enhances spatially grounded manipulation without relying on additional sensing hardware or compromising deployment efficiency. Evo-Depth employs a lightweight Implicit Depth Encoding Module to extract compact depth features from multi-view RGB images. These features are incorporated into vision-language representations through a Spatial Enhancement Module via depth-aware modulation, enabling efficient spatial-semantic enhancement. A Progressive Alignment Training strategy is further introduced to align the resulting depth-enhanced representations with downstream action learning. With only 0.9B parameters, Evo-Depth achieves superior performance across four simulation benchmarks. In real-world experiments, Evo-Depth attains the highest average success rate while also exhibiting the smallest model size, lowest GPU memory usage, and highest inference frequency among compared methods.
Behavioral Data-Driven Optimal Trajectory Generation for Rotary Cranes
With the growth of the construction industry and the global shortage of skilled labor, the automation of crane control has become increasingly important for safe and efficient operations. A central challenge in automatic crane control is the reduction of load oscillations during motion, which is primarily addressed through appropriate slewing trajectories. In this context, classical model-based control methods rely on accurate dynamical models and expert tuning, and often struggle to meet safety and precision requirements, while many learning-based approaches require large data sets and significant computational resources. This paper proposes a behavioral data-driven framework for generating open-loop slewing trajectories for rotary cranes that suppress load sway while reducing operation time and energy consumption. The approach builds on Willems' fundamental lemma and its generalizations, to bypass explicit system modeling and operate directly on measured input-output data. A practical workflow is presented in this paper to reduce the need for expert knowledge. Despite the underactuated nature of the crane dynamics, the method identifies a nonparametric representation of the system behavior and generates smooth, optimal trajectories using limited data and convex optimization. The proposed trajectory generation method is validated on a laboratory crane setup and compared against an established model-based approach, achieving up to 35% reduction in load sway, 43% reduction in tracking error, and 50% reduction in travel time.
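Data-driven methods built on Willems' fundamental lemma work with Hankel matrices of measured signals. The sketch below shows that construction for a scalar signal; the function name `hankel` and the sample data are illustrative assumptions, not the paper's workflow.

```python
def hankel(signal, depth):
    """Return the depth x (T - depth + 1) Hankel matrix of a scalar signal."""
    if not 1 <= depth <= len(signal):
        raise ValueError("depth must be between 1 and len(signal)")
    cols = len(signal) - depth + 1
    # Entry (i, j) is signal[i + j], so column j is the window starting at j.
    return [[signal[i + j] for j in range(cols)] for i in range(depth)]

u = [1, 0, -1, 2, 0, 1]  # made-up input samples
H = hankel(u, depth=3)
for row in H:
    print(row)
```

Each column is one length-3 window of the recorded signal. In the lemma, stacked input and output Hankel matrices of a controllable linear time-invariant system, recorded with a persistently exciting input, span all trajectories of that length, which is what allows a trajectory optimizer to operate on data instead of a parametric model.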
Slot-MPC: Goal-Conditioned Model Predictive Control with Object-Centric Representations
Predictive world models enable agents to model scene dynamics and reason about the consequences of their actions. Inspired by human perception, object-centric world models capture scene dynamics using object-level representations, which can be used for downstream applications such as action planning. However, most object-centric world models and reinforcement learning (RL) approaches learn reactive policies that are fixed at inference time, limiting generalization to novel situations. We propose Slot-MPC, an object-centric world modeling framework that enables planning through Model Predictive Control (MPC). Slot-MPC leverages vision encoders to learn slot-based representations, which encode individual objects in the scene, and uses these structured representations to learn an action-conditioned object-centric dynamics model. At inference time, the learned dynamics model enables action planning via MPC, allowing agents to adapt to previously unseen situations. Since the learned world model is differentiable, we can use gradient-based MPC to directly optimize actions, which is computationally more efficient than relying on gradient-free, sampling-based MPC methods. Experiments on simulated robotic manipulation tasks show that Slot-MPC improves both task performance and planning efficiency compared to non-object-centric world model baselines. In the considered offline setting with limited state-action coverage, we find that gradient-based MPC performs better than gradient-free, sampling-based MPC. Our results demonstrate that explicitly structured, object-centric representations provide a strong inductive bias for controllable and generalizable decision-making. Code and additional results are available at https://slot-mpc.github.io.
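Gradient-based MPC, as opposed to sampling-based MPC, can be sketched on a toy problem. The example assumes a known one-dimensional model in place of a learned slot-based dynamics model and uses an analytic gradient in place of automatic differentiation; `rollout`, `plan`, and all parameter values are hypothetical.

```python
def rollout(x0, actions, dt=1.0):
    """Toy dynamics x_{t+1} = x_t + dt * u_t, standing in for a learned model."""
    x = x0
    for u in actions:
        x = x + dt * u
    return x

def plan(x0, goal, horizon=5, effort_weight=0.01, lr=0.05, iters=200, dt=1.0):
    """Gradient descent on J = (x_T - goal)^2 + effort_weight * sum(u_t^2)."""
    actions = [0.0] * horizon
    for _ in range(iters):
        terminal_error = rollout(x0, actions, dt) - goal
        # Analytic gradient of J with respect to each action u_t.
        grads = [2.0 * terminal_error * dt + 2.0 * effort_weight * u for u in actions]
        actions = [u - lr * g for u, g in zip(actions, grads)]
    return actions

actions = plan(x0=0.0, goal=1.0)
print(rollout(0.0, actions))
```

In a receding-horizon loop only the first planned action would be executed before replanning from the next observed state.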
FU-MPC: Frontier- and Uncertainty-Aware Model Predictive Control for Efficient and Accurate UAV Exploration with Motorized LiDAR
Efficient UAV exploration in unknown environments requires rapid coverage expansion while maintaining accurate and reliable localization, since safe navigation in complex scenes depends on consistent mapping and pose estimation. However, for conventional LiDAR-equipped UAVs, the observable region is tightly coupled with the UAV pose and motion. Expanding coverage often requires additional translational or rotational maneuvers, which can reduce exploration efficiency and increase the risk of localization degradation in geometrically challenging environments. Motorized rotating LiDARs provide a promising solution by actively adjusting the sensor viewing direction without changing the UAV motion, thereby introducing an additional sensing degree of freedom. Nevertheless, existing exploration systems rarely exploit this scanning freedom as an explicit decision variable linked to both exploration progress and localization quality. To address this gap, we develop a UAV platform equipped with an independently actuated rotating LiDAR and propose a hierarchical exploration framework. The global planner organizes frontiers into representative viewpoints and sequences them using topology-aware transition costs. Built upon this planner, FU-MPC serves as a local receding-horizon scan controller that optimizes LiDAR rotation along the predicted flight trajectory. The controller jointly considers frontier-aware exploration utility and direction-dependent localization uncertainty, while lightweight surrogate evaluation enables real-time onboard execution. Experiments in complex environments demonstrate that the proposed system improves exploration efficiency while maintaining robust localization performance compared with fixed-pattern scanning and uncertainty-only baselines. The project page can be found at https://kafeiyin00.github.io/FU-MPC/.
Chrono-Gymnasium: An Open-Source, Gymnasium-Compatible Distributed Simulation Framework
High-fidelity physics simulation is essential for closing the sim-to-real gap in robotics and complex mechanical systems. However, the computational overhead of high-fidelity engines often limits their use in data-intensive tasks like Reinforcement Learning (RL) and global optimization. We introduce Chrono-Gymnasium, a distributed computing framework that scales the high-fidelity multi-body dynamics of Project Chrono across large-scale computing clusters. Built upon the Ray framework, Chrono-Gymnasium provides a standardized Gymnasium interface, enabling seamless integration with modern machine learning libraries while providing built-in synchronization and messaging primitives for distributed execution. We demonstrate the framework's capabilities through two distinct case studies: (1) the training of an RL agent for autonomous robotic navigation in complex terrains, and (2) the Bayesian Optimization of a planetary lander's design parameters to ensure landing stability. Our results show that Chrono-Gymnasium reduces wall-clock time for high-fidelity simulations without sacrificing physical accuracy, offering a scalable path for the design and control of complex robotic systems.
Learning Direct Control Policies with Flow Matching for Autonomous Driving SC 2026
We present a flow-matching planner for autonomous driving that directly outputs actionable control trajectories defined by acceleration and curvature profiles. The model is conditioned on a bird's-eye-view (BEV) raster of the surrounding scene and generates control sequences in a small number of ordinary differential equation (ODE) integration steps, enabling low-latency inference suitable for real-time closed-loop re-planning. We train exclusively on urban scenarios (real urban city streets, intersections and roundabouts of the city of Parma, Italy) collected from a 2D traffic simulator with reactive agents, and evaluate in closed loop on both in-distribution and markedly out-of-distribution environments, including multi-lane highways and unseen urban scenarios. Our results show that the model generalizes reliably to these unseen conditions, maintaining stable closed-loop control and successfully completing scenarios that differ substantially from the training distribution. We attribute this to the BEV representation, which provides a geometry-centric view of the scene that is inherently less sensitive to distributional shifts, and to the flow-matching formulation, which learns a smooth vector field that degrades gracefully under distribution shift. We provide video demonstrations of closed-loop behavior at https://marcelloceresini.github.io/DirectControlFlowMatching.
comment: 16 pages, 6 figures, 2 tables. Accepted at IEEE ITSC 2026
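Generating a sample in a small number of ODE integration steps can be illustrated with explicit Euler integration of a vector field from t = 0 to t = 1. The sketch assumes a closed-form straight-line field in place of a trained, BEV-conditioned network; `straight_line_field`, `sample`, and the numbers are hypothetical.

```python
def straight_line_field(x, t, target):
    """Toy vector field that transports any x at time t to `target` at t = 1."""
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]

def sample(x0, target, steps=4):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with explicit Euler steps."""
    x, dt = list(x0), 1.0 / steps
    for k in range(steps):
        v = straight_line_field(x, k * dt, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Two made-up control values, e.g. (acceleration, curvature).
result = sample(x0=[0.0, 0.0], target=[1.5, -0.2])
print(result)
```

For this idealized straight-line field the Euler scheme lands on the target for any step count; a learned field is only approximately straight, so the number of steps trades latency against accuracy.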
CaMeRL: Collision-Aware and Memory-Enhanced Reinforcement Learning for UAV Navigation in Multi-Scale Obstacle Environments
In obstacle avoidance navigation of unmanned aerial vehicles (UAVs), variations in obstacle scale have received surprisingly less attention than obstacle number or density. Existing methods typically extract purely geometric features from single-frame depth observations. Such representations tend to neglect small obstacles and lose spatial context under occlusions caused by large obstacles, leading to noticeable degradation in environments with multi-scale obstacles. To address this issue, we propose CaMeRL, a Collision-aware and Memory-enhanced Reinforcement Learning framework for UAV navigation. The collision-aware latent representation encodes risk-sensitive depth cues to preserve fine-grained obstacle structures, thereby improving sensitivity to small obstacles. The temporal memory module integrates observations across frames, mitigating partial observability caused by large-obstacle occlusions. We evaluate CaMeRL with multi-scale obstacles, including ultra-small and extra-large obstacle settings. Results show that CaMeRL outperforms state-of-the-art baselines across all scales, with success rate gains of 0.48 and 0.28 in the ultra-small and extra-large settings, respectively. More importantly, CaMeRL achieves reliable navigation in cluttered outdoor environments.
comment: 8 pages, 7 figures. Submitted to IEEE Robotics and Automation Letters
Learning Cross-Coupled and Regime Dependent Dynamics for Aerial Manipulation
Accurate dynamics models are critical for aerial manipulators operating under complex tasks such as payload transport. However, modeling these systems remains fundamentally challenging due to strong quadrotor-manipulator coupling, delayed aerodynamic interactions, and regime-dependent dynamics variations arising from payload changes and manipulator reconfiguration. These effects produce residual dynamics that are simultaneously cross-coupled, history-dependent, and nonstationary, causing both analytical models and purely offline learned models to degrade during deployment. To address these challenges, we propose a structured encoder-decoder framework for adaptive residual dynamics learning in aerial manipulators. The proposed nonlinear latent encoder captures cross-variable coupling and temporal dependencies from state-input histories, while a lightweight linear latent decoder enables online adaptation under regime-dependent nonstationary dynamics. The linear-in-parameter decoder structure permits closed-form Bayesian adaptation together with consistency-driven covariance inflation, enabling rapid and stable adaptation to both transient and slowly varying dynamics changes while remaining compatible with real-time model predictive control (MPC). Experimental results on a real aerial manipulation platform demonstrate improved residual prediction accuracy, faster adaptation under changing operating conditions, and enhanced MPC-based trajectory tracking performance. These results highlight the importance of jointly modeling coupled temporal dynamics and deployment-time nonstationarity for reliable aerial manipulation.
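A linear-in-parameter model admits a closed-form Gaussian posterior update, which can be sketched for a single scalar weight. The inflation rule below, which widens the prior when the normalized squared innovation exceeds a gate, is an assumed stand-in for the paper's consistency-driven covariance inflation; `bayes_update` and all constants are hypothetical.

```python
def bayes_update(mean, var, phi, y, noise_var=0.1, gate=9.0, inflate=10.0):
    """One closed-form Gaussian update of a scalar weight in y = phi * w + noise."""
    innovation = y - phi * mean
    innovation_var = phi * var * phi + noise_var
    if innovation ** 2 / innovation_var > gate:
        # Prediction is inconsistent with the data: widen the prior first.
        var *= inflate
        innovation_var = phi * var * phi + noise_var
    gain = var * phi / innovation_var
    return mean + gain * innovation, (1.0 - gain * phi) * var

mean, var = 0.0, 1.0
# The underlying weight jumps from about 2 to about 5 after the third sample.
for phi, y in [(1.0, 2.0), (1.0, 2.1), (1.0, 1.9), (1.0, 5.0), (1.0, 5.1)]:
    mean, var = bayes_update(mean, var, phi, y)
    print(round(mean, 3), round(var, 4))
```

Without the inflation step the shrinking variance would make the estimate slow to follow the jump; widening the prior restores a large gain when the data contradict the current estimate.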
Exploring Bottlenecks in VLM-LLM Navigation: How 3D Scene Understanding Capability Impacts Zero-Shot VLN ICRA
Zero-shot vision-and-language navigation (VLN) has gained significant attention due to its minimal data collection costs and inherent generalization. This paradigm is typically driven by the integration of pre-trained Vision-Language Models (VLMs) and Large Language Models (LLMs), where VLMs construct 3D scene graphs while LLMs handle high-level reasoning and decision-making. However, a critical bottleneck exists in this system: current 3D perception models prioritize pixel-level accuracy, directly conflicting with the strict computational limits and real-time efficiency demanded by embodied navigation. To address this gap, this paper quantifies the actual impact of 3D scene understanding capability on VLN performance. Based on typical VLM-LLM frameworks, we propose statistical success rate (SR) upper bounds for two core subsystems: 1) the slow LLM planner, which relies on topological mapping semantics, and 2) the fast reactive navigator, which utilizes spatial coordinates and bounding boxes to execute LLM decisions. Evaluations using state-of-the-art 3D scene understanding models validate our proposed bounds and reveal a perception saturation phenomenon, indicating that improvements in perception accuracy beyond a certain threshold yield diminishing returns in navigation success. Our findings suggest that 3D scene understanding for VLN should pivot away from strict pixel-level precision, prioritizing instead navigation-relevant core vocabularies and accurate bounding box proportions.
comment: Accepted by ICRA Workshop MM-Spatial AI, Oral
EARL: Towards a Unified Analysis-Guided Reinforcement Learning Framework for Egocentric Interaction Reasoning and Pixel Grounding ICML 2026
Understanding human-environment interactions from egocentric vision is essential for assistive robotics and embodied intelligent agents, yet existing multimodal large language models (MLLMs) still struggle with accurate interaction reasoning and fine-grained pixel grounding. To this end, this paper introduces EARL, an Egocentric Analysis-guided Reinforcement Learning framework that explicitly transfers coarse interaction semantics to query-oriented answering and grounding. Specifically, EARL adopts a two-stage parsing framework including coarse-grained interpretation and fine-grained response. The first stage holistically interprets egocentric interactions and generates a structured textual description. The second stage produces the textual answer and pixel-level mask in response to the user query. To bridge the two stages, we extract a global interaction descriptor as a semantic prior, which is integrated via a novel Analysis-guided Feature Synthesizer (AFS) for query-oriented reasoning. To optimize heterogeneous outputs, including textual answers, bounding boxes, and grounding masks, we design a multi-faceted reward function and train the response stage with GRPO. Experiments on Ego-IRGBench show that EARL achieves 65.48% cIoU for pixel grounding, outperforming previous RL-based methods by 8.37%, while OOD grounding results on EgoHOS indicate strong transferability to unseen egocentric grounding scenarios.
comment: Accepted at ICML 2026. Project page: https://github.com/yuggiehk/EARL
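The abstract reports cIoU without defining it. Assuming it denotes cumulative IoU as commonly used in referring segmentation (total intersection divided by total union over the whole evaluation set), the metric can be sketched as below; `cumulative_iou` and the tiny masks are illustrative only.

```python
def cumulative_iou(predictions, targets):
    """Total intersection over total union across all (pred, target) mask pairs."""
    intersection = union = 0
    for pred, target in zip(predictions, targets):
        for p, t in zip(pred, target):
            intersection += int(bool(p and t))
            union += int(bool(p or t))
    return intersection / union if union else 0.0

# Two tiny flattened binary masks per list (made-up data).
preds = [[1, 1, 0, 0], [0, 1, 1, 1]]
gts = [[1, 0, 0, 0], [0, 1, 1, 0]]
score = cumulative_iou(preds, gts)
print(score)
```

Under this definition large masks contribute more pixels to both sums, so the score differs from a per-sample mean IoU.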
IntentVLA: Short-Horizon Intent Modeling for Aliased Robot Manipulation
Robot imitation data are often multimodal: similar visual-language observations may be followed by different action chunks because human demonstrators act with different short-horizon intents, task phases, or recent context. Existing frame-conditioned VLA policies infer each chunk from the current observation and instruction alone, so under partial observability they may resample different intents across adjacent replanning steps, leading to inter-chunk conflict and unstable execution. We introduce IntentVLA, a history-conditioned VLA framework that encodes recent visual observations into a compact short-horizon intent representation and uses it to condition chunk generation. We further introduce AliasBench, a 12-task ambiguity-aware benchmark on RoboTwin2 with matched training data and evaluation environments that isolate short-horizon observation aliasing. Across AliasBench, SimplerEnv, LIBERO, and RoboCasa, IntentVLA improves rollout stability and outperforms strong VLA baselines.
comment: Code can be found at https://github.com/ZGC-EmbodyAI/IntentVLA
SceneFunRI: Reasoning the Invisible for Task-Driven Functional Object Localization
In real-world scenes, target objects may reside in regions that are not visible. While humans can often infer the locations of occluded objects from context and commonsense knowledge, this capability remains a major challenge for vision-language models (VLMs). To address this gap, we introduce SceneFunRI, a benchmark for Reasoning the Invisible. Based on the SceneFun3D dataset, SceneFunRI formulates the task as a 2D spatial reasoning problem via a semi-automatic pipeline and comprises 855 instances. It requires models to infer the locations of invisible functional objects from task instructions and commonsense reasoning. The strongest baseline model (Gemini 3 Flash) achieves only a CAcc@75 of 15.20, an mIoU of 0.74, and a Dist of 28.65. We group our prompting analysis into three categories: Strong Instruction Prompting, Reasoning-based Prompting, and Spatial Process of Elimination (SPoE). These findings indicate that invisible-region reasoning remains an unstable capability in current VLMs, motivating future work on models that more tightly integrate task intent, commonsense priors, spatial grounding, and uncertainty-aware search.
SR-Platform: An Agentic Pipeline for Natural Language-Driven Robot Simulation Environment Synthesis
Generating robot simulation environments remains a major bottleneck in simulation-based robot learning. Constructing a training-ready MuJoCo scene typically requires expertise in 3D asset modeling, MJCF specification, spatial layout, collision avoidance, and robot-model integration. We present SR-Platform, a production-deployed agentic system that converts free-form natural language descriptions into executable, physically valid MuJoCo environments. SR-Platform decomposes scene synthesis into four stages: an LLM-based orchestrator that converts user intent into a structured scene plan; an asset forge that retrieves cached assets or generates new 3D geometry through LLM-to-CadQuery synthesis; a layout architect that assigns object poses and verifies industrial constraints; and a bridge layer that assembles the final MJCF scene and merges the selected robot model. The system is deployed as a nine-service Docker stack with WebSocket progress streaming, MinIO-backed mesh storage, Qdrant-based semantic asset retrieval, Redis job state, and InfluxDB telemetry. Using 30 days of production telemetry covering 611 successful LLM calls, SR-Platform generates five-object scenes with a median end-to-end latency of approximately 50 s, while cache-accelerated scenes complete in approximately 30-40 s. The asset forge shows an 11.3% first-attempt retry rate with automatic recovery, and cached asset retrieval removes per-object LLM calls for previously generated object types. These results show that agentic scene synthesis can reduce the manual effort required to create diverse robot training environments, enabling users to produce executable MuJoCo scenes from plain English prompts in under one minute.
SeaVis: Modeling and Control of a Remotely Operated Towed Vehicle for Seabed Visualization and Mapping
High-resolution seafloor mapping necessitates stable and precise positioning for underwater robots. This paper introduces a novel mathematical model for SeaVis remotely operated towed vehicles (ROTVs) and develops a gain-scheduled linear-quadratic regulator (LQR) for robust depth and attitude control. We validate the approach in a high-fidelity simulation, benchmarking the LQR against a conventional PID controller over a challenging seabed profile. The presented results demonstrate the LQR's superior performance, with significantly enhanced robustness to disturbances, greater control efficiency, and substantially reduced flap actuation. The gain scheduling also confirms the controller's effectiveness across the full operational velocity range. The complete simulation environment and controller are open-sourced.
comment: Accepted at IEEE/ASME AIM 2026
DSSP: Diffusion State Space Policy with Full-History Encoding
Diffusion-based imitation learning has shown strong promise for robot manipulation. However, most existing policies condition only on the current observation or a short window of recent observations, limiting their ability to resolve history-dependent ambiguities in long-horizon tasks. To address this, we introduce DSSP, a history-conditioned Diffusion State Space Policy that enables efficient, full-history conditioning for robot manipulation. Leveraging the continuous sequence modeling properties of State Space Models (SSMs), our history encoder effectively compresses the entire observation stream into a compact context representation. To ensure this context preserves critical information regarding future state evolution, the encoder is optimized with a dynamics-aware auxiliary training objective. This high-level context representation is then seamlessly fused with recent state observations to form a hierarchical conditioning mechanism for action generation. Furthermore, to maintain architectural consistency and minimize GPU memory overhead, we also instantiate the diffusion backbone itself using an SSM. Extensive experiments across simulation benchmarks and real-world manipulation tasks show that DSSP achieves state-of-the-art performance with a significantly smaller model size, demonstrating superior efficiency of the hierarchical conditioning in capturing crucial information as the history length increases.
Let Robots Feel Your Touch: Visuo-Tactile Cortical Alignment for Embodied Mirror Resonance
Observing touch on another's body can elicit corresponding tactile sensations in the observer, a phenomenon termed mirror touch that supports empathy and social perception. This visuo-tactile resonance is thought to rely on structural correspondence between visual and somatosensory cortices, yet robotic systems lack computational frameworks that instantiate this principle. Here we demonstrate that cortical correspondence can be operationalized to endow robots with mirror touch. We introduce Mirror Touch Net, which imposes semantic, distributional and geometric alignment between visual and tactile representations through multi-level constraints, enabling prediction of millimetre-scale tactile signals across 1,140 taxels on a robotic hand from RGB images. Manifold analysis reveals that these constraints reshape visual representations into geometry consistent with the tactile manifold, reducing the complexity of cross-modal mapping. Extending this alignment framework to cross-domain observations of human hands enables tactile prediction and reflexive responses to observed human touch. Our results link a neural principle of visuo-tactile resonance to robotic perception, providing an explainable route towards anticipatory touch and empathic human-robot interaction. Code is available at https://github.com/fun0515/Mirror-Touch-Net.
DiffPhD: A Unified Differentiable Solver for Projective Heterogeneous Materials in Elastodynamics with Contact-Rich GPU-Acceleration
Differentiable simulation of soft bodies is a foundation for system identification, trajectory optimization, and Real2Sim transfer. Yet, existing methods such as differentiable Projective Dynamics (DiffPD) struggle when faced with heterogeneous materials with extreme stiffness contrasts, hyperelasticity under large deformations, and contact-rich interactions, which are common scenarios in the real world. We present DiffPhD, a unified GPU-accelerated differentiable Projective Dynamics framework for heterogeneous materials that tackles these intertwined challenges simultaneously. Our key insight is a careful integration of: (i) stiffness-aware projective weights to embed heterogeneity into the global system; (ii) trust-region eigenvalue filtering lifted to the backward pass for stable hyperelastic gradients and a type-II Anderson Acceleration scheme with dual-gate convergence to stabilize forward iteration under large stiffness contrasts; and (iii) a unified GPU pipeline that reuses a single sparse factor across forward, backward, and contact computations, with stiffness-amplified Rayleigh damping folded into the same factor for heterogeneity-aware dissipation at zero recurring cost. DiffPhD achieves strict gradient accuracy while delivering up to an order-of-magnitude speedup over prior differentiable solvers on heterogeneous, hyperelastic, contact-rich benchmarks. Crucially, this speedup does not come at the cost of stability: DiffPhD remains convergent on stiffness contrasts up to 100x where prior PD solvers degrade. This unlocks end-to-end gradient-based optimization on regimes previously bottlenecked by either solver fragility or per-iteration cost (shell-joint composite creatures, soft characters wielding stiff weapons, and soft-gripper robotic manipulation), all handled within a single forward-backward pass.
Before the Body Moves: Learning Anticipatory Joint Intent for Language-Conditioned Humanoid Control
Natural language is an intuitive interface for humanoid robots, yet streaming whole-body control requires control representations that are executable now and anticipatory of future physical transitions. Existing language-conditioned humanoid systems typically generate kinematic references that a low-level tracker must repair reactively, or use latent/action policies whose outputs do not explicitly encode upcoming contact changes, support transfers, and balance preparation. We propose DAJI (Dynamics-Aligned Joint Intent), a hierarchical framework that learns an anticipatory joint-intent interface between language generation and closed-loop control. DAJI-Act distills a future-aware teacher into a deployable diffusion action policy through student-driven rollouts, while DAJI-Flow autoregressively generates future intent chunks from language and intent history. Experiments show that DAJI achieves strong results in anticipatory latent learning, single-instruction generation, and streaming instruction following, reaching 94.42% rollout success on HumanML3D-style generation and 0.152 subsequence FID on BABEL.
Energy-Efficient Quadruped Locomotion with Compliant Feet
Quadruped robots are often designed with rigid feet to simplify control and maintain stable contact during locomotion. While this approach is straightforward, it limits the ability of the legs to absorb impact forces and reuse stored elastic energy, leading to higher energy expenditure during locomotion. To explore whether compliant feet can provide an advantage, we integrate foot compliance into a reinforcement learning (RL) locomotion controller and study its effect on walking efficiency. In simulation, we train eight policies corresponding to eight different spring stiffness values and then cross-evaluate their performance by measuring mechanical energy consumed per meter traveled. In experiments on a purpose-built quadruped, energy consumption with the intermediate-stiffness spring is about 17% lower than with a very stiff or a very flexible spring in the feet, with similar trends appearing in the simulation results. These results indicate that selecting an appropriate foot compliance can improve locomotion efficiency without destabilizing the robot during motion.
comment: 29 pages, 7 figures, supplemental videos link is mentioned in the paper
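The metric of mechanical energy consumed per meter traveled can be sketched from logged joint torques and velocities. The abstract does not state the exact accounting, so summing absolute joint power is an assumption here, and `energy_per_meter` and the sample data are hypothetical.

```python
def energy_per_meter(torques, velocities, dt, distance):
    """Sum |torque * joint velocity| * dt over all samples, divided by distance."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    energy = sum(
        abs(tau * omega) * dt
        for tau_row, omega_row in zip(torques, velocities)
        for tau, omega in zip(tau_row, omega_row)
    )
    return energy / distance

# Two time steps, two joints (made-up numbers): N*m and rad/s, dt in seconds.
torques = [[2.0, -1.0], [1.0, 3.0]]
velocities = [[1.0, 2.0], [-4.0, 1.0]]
cost = energy_per_meter(torques, velocities, dt=0.5, distance=2.0)
print(cost)
```

Taking the absolute value counts braking work as a cost; an alternative convention would count only positive power.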
Systematic Discovery of Semantic Attacks in Online Map Construction through Conditional Diffusion
Autonomous vehicles depend on online HD map construction to perceive lane boundaries, dividers, and pedestrian crossings, safety-critical road elements that directly govern motion planning. While existing pixel perturbation attacks can disrupt the mapping, they can be neutralized by standard adversarial defenses. We present MIRAGE, a framework for systematic discovery of semantic attacks that bypass adversarial defenses and degrade mapping predictions by finding plausible environmental variations (e.g., shadows, wet roads). MIRAGE exploits the latent manifold of real-world data learned by diffusion models, and searches for semantically mutated scenes that neighbor the ground truth and preserve its road topology yet mislead the mapping predictions. We evaluate MIRAGE on nuScenes and demonstrate two attacks: (1) boundary removal, suppressing 57.7% of detections and corrupting 96% of planned trajectories; and (2) boundary injection, the only method that successfully injects fictitious boundaries, while pixel PGD and AdvPatch fail entirely. Both attacks remain potent under various adversarial defenses. We use two independent VLM judges to quantify realism, where MIRAGE passes as realistic 80-84% of the time (vs. 97-99% for clean nuScenes), whereas AdvPatch passes only 0-9% of the time. Our findings expose a categorical gap in current adversarial defenses: semantic-level perturbations that manifest as legitimate environmental variation are substantially harder to mitigate than pixel-level perturbations.
Distill: Uncovering the True Intent behind Human-Robot Communication
As robots become increasingly integrated into everyday environments, intuitive communication paradigms such as natural language and end-user programming have become indispensable for specifying autonomous robot behavior. However, these mechanisms are ineffective at fully capturing user intent: natural language is imprecise and ambiguous, whereas end-user programming can be overly specific. As a result, understanding what users truly mean when they interact with robots remains a central challenge for human-AI communication systems. To address this issue, we propose the Distill approach for human-robot communication interfaces. Given a task specification provided by the user, Distill (1) removes unnecessary steps; (2) generalizes the meaning behind individual steps; and (3) relaxes ordering constraints between steps. We implemented Distill on a web interface and, through a crowdsourcing study, demonstrated its ability to elicit and refine user intent from initial task specifications.
comment: 17 pages
Reactive Planning based Control for Mobile Robots in Obstacle-Cluttered Environments
This paper addresses the motion control problem for mobile robots in obstacle-cluttered environments. The mobile robot has only partial environment information, and aims to move from an initial position to a target position without collisions. For this purpose, a reactive planning based control strategy (RPCS) is proposed. First, the initial and target positions are connected to form a reference trajectory. Then, a reactive planning strategy (RPS) is developed to ensure collision avoidance by modifying the reference trajectory locally based on the partial environment information. Next, an adaptive tracking control strategy (ATCS) is proposed to track the reference trajectory, including any local modifications, via discretization techniques. Finally, the RPS and ATCS are combined to establish the RPCS, whose efficacy and advantages are illustrated by numerical examples.
comment: 7 pages, 7 figures
MindVLA-U1: VLA Beats VA with Unified Streaming Architecture for Autonomous Driving
Autonomous driving has progressed from modular pipelines toward end-to-end unification, and Vision-Language-Action (VLA) models are a natural extension of this journey beyond Vision-to-Action (VA). In practice, driving VLAs have often trailed VA on planning quality, suggesting that the difficulty is not simply model scale but the interface through which semantic reasoning, temporal context, and continuous control are combined. We argue that this gap reflects how VLA has been built -- as isolated subtask improvements that fail to compose coherent driving capabilities -- rather than what VLA is. We present MindVLA-U1, the first unified streaming VLA architecture for autonomous driving. A unified VLM backbone produces AR language tokens (optional) and flow-matching continuous action trajectories in a single forward pass over one shared representation, preserving the natural output form of each modality. A full streaming design processes the driving video framewise rather than as fixed video-action chunks under costly temporal VLM modeling. Planned trajectories evolve smoothly across frames while a learned streaming memory channel carries temporal context and updates. The unified architecture enables fast/slow systems on dense & sparse MoT backbones via flexible self-attention context management, and exposes a measurable language-control path for action: language-predicted driving intents steer the action diffusion via classifier-free guidance (CFG), turning language-side intent into control signals for continuous action planning. On the long-tail WOD-E2E benchmark, MindVLA-U1 surpasses experienced human drivers for the first time (8.20 RFS vs. 8.13 GT RFS) with 2 diffusion steps, achieves state-of-the-art planning ADEs over prior VA/VLA by large margins, and matches VA latency (16 FPS vs. RAP's 18 FPS at 1B scale) while preserving natural language interfaces for human-vehicle interaction.
comment: Work in progress. Project page: https://mind-omni.github.io/
Action Emergence from Streaming Intent
We formalize action emergence as a target capability for end-to-end autonomous driving: the ability to generate physically feasible, semantically appropriate, and safety-compliant actions in arbitrary, long-tail traffic scenes through scene-conditioned reasoning rather than retrieval or interpolation of learned scene-action mappings. We show that previous paradigms cannot deliver action emergence: autoregressive trajectory decoders collapse the inherently multimodal future into a single averaged output, while diffusion and flow-matching generators express multimodality but are not steerable by reasoned intent. We propose Streaming Intent as a concrete way to approach action emergence: a mechanism that makes driving intent (i) semantically streamed through a continuous chain-of-thought that causally derives the intent from scene understanding, and (ii) temporally streamed across clips so that intent commitments remain coherent along the driving horizon. We realize Streaming Intent in a VLA model we call SI (Streaming Intent). SI autoregressively decodes a four-step chain-of-thought and emits an intent token; the decoded intent then drives classifier-free guidance (CFG) on a flow-matching action head, requiring only two denoising steps to generate the final trajectory. On the Waymo End-to-End benchmark, SI achieves competitive aggregate performance, with an RFS score of 7.96 on the validation set and 7.74 on the test set. Beyond aggregate metrics, the model demonstrates -- to our knowledge for the first time in a fully end-to-end VLA -- intent-faithful controllability: for a fixed scene, varying the intent class at inference yields qualitatively distinct yet consistently high-quality plans, arising purely from data-driven learning without any pre-built trajectory bank or hand-coded post-hoc selector.
comment: Project page: https://mind-omni.github.io/
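The abstract above describes an intent token driving classifier-free guidance (CFG) on a flow-matching action head with two denoising steps. A minimal sketch of that general recipe follows, assuming the standard CFG blend of conditional and unconditional velocities and plain Euler integration. The velocity field, intent names, and targets are hypothetical toy stand-ins for a learned model, not the authors' implementation.

```python
import numpy as np

# Hypothetical intent targets for a toy 2-D trajectory endpoint (illustration only).
INTENT_TARGETS = {
    "keep_lane": np.array([10.0, 0.0]),
    "turn_left": np.array([6.0, 6.0]),
}


def toy_velocity(x, t, intent):
    """Stand-in for a learned velocity field; intent=None means unconditional."""
    if intent is None:
        target = np.mean(list(INTENT_TARGETS.values()), axis=0)
    else:
        target = INTENT_TARGETS[intent]
    return target - x


def cfg_velocity(v_cond, v_uncond, scale):
    """Standard CFG blend: scale=0 is unconditional, scale=1 is purely conditional."""
    return v_uncond + scale * (v_cond - v_uncond)


def sample_with_cfg(velocity_fn, intent, x0, scale=2.0, steps=2):
    """Integrate the guided velocity field with a few Euler steps."""
    x = np.asarray(x0, dtype=float)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v_cond = velocity_fn(x, t, intent)
        v_uncond = velocity_fn(x, t, None)
        x = x + dt * cfg_velocity(v_cond, v_uncond, scale)
    return x


if __name__ == "__main__":
    for intent in INTENT_TARGETS:
        print(intent, sample_with_cfg(toy_velocity, intent, np.zeros(2)))
```

Changing only the `intent` argument for a fixed start point moves the sample toward a different mode, which is the kind of controllability the abstract refers to; the guidance scale used by the authors is not stated, so `scale=2.0` is an arbitrary choice here.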
Driving Intents Amplify Planning-Oriented Reinforcement Learning
Continuous-action policies trained on a single demonstrated trajectory per scene suffer from mode collapse: samples cluster around the demonstrated maneuver and the policy cannot represent semantically distinct alternatives. Under preference-based evaluation, this caps best-of-N performance -- even oracle selection cannot recover what the sampling distribution does not contain. We introduce DIAL, a two-stage Driving-Intent-Amplified reinforcement Learning framework for preference-aligned continuous-action driving policies. In the first stage, DIAL conditions the flow-matching action head on a discrete intent label with classifier-free guidance (CFG), which expands the sampling distribution along distinct maneuver modes and breaks single-demonstration mode collapse. In the second stage, DIAL carries this expanded distribution into preference RL through multi-intent GRPO, which spans all intent classes within every preference group and prevents fine-tuning from re-collapsing around the currently preferred mode. Instantiated for end-to-end driving with eight rule-derived intents and evaluated on WOD-E2E: competitive Vision-to-Action (VA) and Vision-Language-Action (VLA) Supervised Finetuning (SFT) baselines plateau below the human-driven demonstration at best-of-128, with the strongest prior (RAP) capping at Rater Feedback Score (RFS) 8.5 even with best-of-64; intent-CFG sampling lifts this ceiling to RFS 9.14 at best-of-128, surpassing both the prior best (RAP 8.5) and the human-driven demonstration (8.13) for the first time; and multi-intent GRPO improves held-out RFS from 7.681 to 8.211, while every single-intent baseline peaks lower and degrades by the end of training. These results suggest that the bottleneck of preference RL on continuous-action policies trained from demonstrations is not only how to update the policy, but also how to expand and preserve the sampling distribution being optimized.
comment: Project page: https://mind-omni.github.io/
Sharing the Load: Autonomous Multi-Rover Cargo Transport
A future lunar habitat, as part of the Artemis program, will require a significant amount of logistics infrastructure. Cargo that is transported to the Moon will need to be moved from a landing site to other key locations that may be up to 5 km away. Teach and repeat navigation is well suited to this task as utility rovers will need to repeat these cargo routes many times. One of the most significant challenges involves the modules that will be assembled together to form the habitat. Canada is studying potential Lunar Utility Vehicle (LUV) designs to carry these large payloads between the landing site and the location of the habitat. As the details of the cargo continue to evolve, using two smaller LUVs to carry cargo together would provide high capacity and mission flexibility. In this paper, we develop and implement a distributed model-predictive controller that allows vehicles to carry cargo that is shared between them. The algorithm is compared to baselines at small scale before being implemented onboard two 800 kg path-to-flight rovers and field tested carrying a 475 kg cargo between them. A custom cargo coupling decouples the kinematics of each vehicle while fully supporting the cargo's mass. In our field test, the rovers maintain a relative separation error of 9.2 cm and a maximum error of 33.4 cm. This multi-vehicle control architecture retains the high-quality path tracking of lidar teach and repeat for each rover. We demonstrate that the kinematic freedom of the vehicles allows a single controller to provide mission improvements for other operations as well.
comment: 19 pages, 14 figures, submitted to IEEE Transactions on Field Robotics
RoboLab: A High-Fidelity Simulation Benchmark for Analysis of Task Generalist Policies
The pursuit of general-purpose robotics has yielded impressive foundation models, yet simulation-based benchmarking remains a bottleneck due to rapid performance saturation and a lack of true generalization testing. Existing benchmarks often exhibit significant domain overlap between training and evaluation, trivializing success rates and obscuring insights into robustness. We introduce RoboLab, a simulation benchmarking framework designed to address these challenges. Concretely, our framework is designed to answer two questions: (1) to what extent can we understand the performance of a real-world policy by analyzing its behavior in simulation, and (2) which factors most strongly affect policy behavior. First, RoboLab enables human-authored and LLM-enabled generation of scenes and tasks in a robot- and policy-agnostic manner within a high-fidelity simulation environment. We introduce an accompanying RoboLab-120 benchmark, consisting of 120 tasks categorized into three competency axes (visual, procedural, and relational) across three difficulty levels. Second, we introduce a systematic analysis of real-world policies that quantifies both their performance and the sensitivity of their behavior to controlled perturbations, exposing a significant performance gap in current state-of-the-art models. By providing granular metrics and a scalable toolset, RoboLab offers a framework for evaluating the true generalization capabilities of task-generalist robotic policies. Project website: https://research.nvidia.com/labs/srl/projects/robolab/.
VER: Vision Expert Transformer for Robot Learning via Foundation Distillation and Dynamic Routing
Pretrained vision foundation models (VFMs) advance robotic learning via rich visual representations, yet individual VFMs typically excel only in specific domains, limiting generality across tasks. Distilling multiple VFMs into a unified representation for policy learning can mitigate this limitation but often yields inflexible task-specific feature selection and requires costly full re-training to incorporate robot-domain knowledge. We propose VER, a Vision Expert transformer for Robot learning. During pretraining, VER distills multiple VFMs into a vision expert library. It then fine-tunes only a lightweight routing network (fewer than 0.4% of parameters) to dynamically select task-relevant experts from the pretrained library for downstream robot tasks. We further introduce Patchwise Expert Routing with Curriculum Top-K Annealing to improve both flexibility and precision of dynamic expert selection. Moreover, VER supports parameter-efficient finetuning for scalable expert utilization and adaptive robot-domain knowledge integration. Across 17 diverse robotic tasks and multiple policy heads, VER achieves state-of-the-art performance. We find that VER reduces large-norm outliers in task-irrelevant regions (e.g., background) and concentrates on task-critical regions. Visualizations and code can be found at https://yixiaowang7.github.io/ver_page/.
Adapting Dijkstra for Buffers and Unlimited Transfers
In recent years, RAPTOR-based algorithms have been considered the state-of-the-art for path-finding with unlimited transfers without preprocessing. However, this status largely stems from the evolution of routing research, where Dijkstra-based solutions were superseded by timetable-based algorithms without a systematic comparison. In this work, we revisit classical Dijkstra-based approaches for public transit routing with unlimited transfers and demonstrate that Time-Dependent Dijkstra (TD-Dijkstra) outperforms MR. However, efficient TD-Dijkstra implementations rely on filtering dominated connections during preprocessing, which assumes passengers can always switch to a faster connection. We show that this filtering is unsound when stops have buffer times, as it cannot distinguish between seated passengers who may continue without waiting and transferring passengers who must respect the buffer. To address this limitation, we introduce Transfer Aware Dijkstra (TAD), a modification that scans entire trip sequences rather than individual edges, correctly handling buffer times while maintaining performance advantages over MR. Our experiments on London and Switzerland networks show that we can achieve a more than twofold speed-up over MR while producing optimal results on both networks with and without buffer times.
comment: v3: revised manuscript incorporating reviewer feedback (formal correctness proof, deployment trade-off discussion, route/tau_min definitions, dominance-inequality fix); editorial and layout polish
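The distinction the abstract above draws between seated and transferring passengers can be illustrated with a small earliest-arrival search that, on boarding, scans the rest of the trip rather than a single edge. This is a simplified sketch under stated assumptions, not the TAD algorithm itself: the trip format, the `buffer` dictionary, and the choice to waive the buffer at the source stop are all hypothetical, and footpath transfers are omitted.

```python
import heapq


def earliest_arrival(trips, buffer, source, dep_time):
    """Earliest arrival per stop.

    trips: list of trips, each an ordered list of (stop, arrival, departure).
    buffer: dict stop -> minimum change time (assumed 0 if missing).
    """
    # Index every position where a trip can be boarded (all but its last stop).
    boardings = {}
    for tid, trip in enumerate(trips):
        for pos, (stop, _arr, _dep) in enumerate(trip[:-1]):
            boardings.setdefault(stop, []).append((tid, pos))

    best = {source: dep_time}
    queue = [(dep_time, source)]
    while queue:
        t, stop = heapq.heappop(queue)
        if t > best.get(stop, float("inf")):
            continue  # stale queue entry
        # Transferring passengers must respect the buffer (waived at the source).
        ready = t if stop == source else t + buffer.get(stop, 0)
        for tid, pos in boardings.get(stop, []):
            if trips[tid][pos][2] < ready:
                continue  # trip departs before the passenger can board
            # Scan the whole remainder of the trip: a seated passenger
            # continues through later stops without paying any buffer.
            for nxt, arr, _dep in trips[tid][pos + 1:]:
                if arr < best.get(nxt, float("inf")):
                    best[nxt] = arr
                    heapq.heappush(queue, (arr, nxt))
    return best


if __name__ == "__main__":
    trips = [
        [("A", 0, 0), ("B", 10, 11), ("C", 20, 20)],
        [("B", 12, 12), ("D", 15, 15)],
    ]
    print(earliest_arrival(trips, {"B": 5}, "A", 0))
    print(earliest_arrival(trips, {}, "A", 0))
```

In the toy timetable, a passenger arriving at B at time 10 stays seated and reaches C at 20 regardless of the buffer, but the 5-minute buffer at B rules out the transfer to the trip departing at 12.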
Co-Me: Confidence-Guided Token Merging for Visual Geometric Transformers
We propose Confidence-Guided Token Merging (Co-Me), an acceleration mechanism for visual geometric transformers without retraining or finetuning the base model. Co-Me distills a lightweight confidence predictor to rank tokens by uncertainty and selectively merge low-confidence ones, effectively reducing computation while maintaining spatial coverage. Compared to similarity-based merging or pruning, the confidence signal in Co-Me reliably indicates regions emphasized by the transformer, enabling substantial acceleration without degrading performance. Co-Me applies seamlessly to various multi-view and streaming visual geometric transformers, achieving speedups that scale with sequence length. When applied to VGGT and Pi3, Co-Me achieves up to 21.5x and 20.4x speedup, respectively, making visual geometric transformers practical for real-time 3D perception and reconstruction.
HECTOR: Human-centric Hierarchical Coordination and Supervision of Robotic Fleets under Continual Temporal Tasks
Robotic fleets can be extremely efficient when working concurrently and collaboratively, e.g., for delivery, surveillance, search and rescue. However, it can be demanding or even impractical for an operator to directly control each robot. Thus, autonomy of the fleet and its online interaction with the operator are both essential, particularly in dynamic and partially unknown environments. The operator might need to add new tasks, cancel some tasks, change priorities and modify planning results. The design of procedures for these interactions, and of efficient algorithms to fulfill these needs, has been mostly neglected in the related literature. Thus, this work proposes a human-centric coordination and supervision scheme (HECTOR) for large-scale robotic fleets under continual and uncertain temporal tasks. It consists of three hierarchical layers: (I) the bidirectional and multimodal protocol of online human-fleet interaction, where the operator interacts with and supervises the whole fleet; (II) the rolling assignment of currently-known tasks to teams within a certain horizon; and (III) the dynamic coordination within a team given the detected subtasks during online execution. The overall mission can be as general as temporal logic formulas over collaborative actions. Such a hierarchical structure allows human interaction and supervision at different granularities and triggering conditions, to both improve computational efficiency and reduce human effort. Extensive human-in-the-loop simulations are performed over heterogeneous fleets under various temporal tasks and environmental uncertainties.
Any3D-VLA: Enhancing VLA Robustness via Diverse Point Clouds ICML 2026
Existing Vision-Language-Action (VLA) models typically take 2D images as visual input, which limits their spatial understanding in complex scenes. How can we incorporate 3D information to enhance VLA capabilities? We conduct a pilot study across different observation spaces and visual representations. The results show that explicitly lifting visual input into point clouds yields representations that better complement their corresponding 2D representations. To address the challenges of (1) scarce 3D data and (2) the domain gap induced by cross-environment differences and depth-scale biases, we propose Any3D-VLA. It unifies the simulator, sensor, and model-estimated point clouds within a training pipeline, constructs diverse inputs, and learns domain-agnostic 3D representations that are fused with the corresponding 2D representations. Simulation and real-world experiments demonstrate Any3D-VLA's advantages in improving performance and mitigating the domain gap. Our project homepage is available at https://xianzhefan.github.io/Any3D-VLA.github.io.
comment: ICML 2026
Bluetooth Phased-array Aided Inertial Navigation Using Factor Graphs: Experimental Verification
Phased-array Bluetooth systems have emerged as a low-cost alternative for performing aided inertial navigation in GNSS-denied use cases such as warehouse logistics, drone landings, and autonomous docking. Basing a navigation system off of commercial-off-the-shelf components may reduce the barrier of entry for phased-array radio navigation systems, albeit at the cost of significantly noisier measurements and relatively short feasible range. In this paper, we compare robust estimation strategies for a factor graph optimisation-based estimator using experimental data collected from multirotor drone flight. We evaluate performance in loss-of-GNSS scenarios when aided by Bluetooth angular measurements, as well as range or barometric pressure.
comment: 6 pages, 5 figures, 2 tables. © 2026 the authors. This work has been accepted to IFAC for publication under a Creative Commons Licence CC-BY-NC-ND
AutoMoT: A Unified Vision-Language-Action Model with Asynchronous Mixture-of-Transformers for End-to-End Autonomous Driving
Integrating vision-language models (VLMs) into end-to-end (E2E) autonomous driving (AD) systems has shown promise in improving scene understanding. However, existing integration strategies suffer from several limitations: they either struggle to resolve distribution misalignment between reasoning and action spaces, underexploit the general reasoning capabilities of pretrained VLMs, or incur substantial inference latency during action policy generation, which degrades driving performance. To address these challenges, we propose AutoMoT in this work, an end-to-end AD framework that unifies reasoning and action generation within a single vision-language-action (VLA) model. Our approach leverages a mixture-of-transformer (MoT) architecture with joint attention sharing, which preserves the general reasoning capabilities of pre-trained VLMs while enabling efficient fast-slow inference through asynchronous execution at different task frequencies. Extensive experiments on multiple benchmarks, under both open- and closed-loop settings, demonstrate that AutoMoT achieves competitive performance compared to state-of-the-art methods. We further investigate the functional boundary of pre-trained VLMs in AD, examining when AD-tailored fine-tuning is necessary. Our results show that pre-trained VLMs can achieve competitive multi-task scene understanding performance through semantic prompting alone, while fine-tuning remains essential for action-level tasks such as decision-making and trajectory planning. We refer to https://automot-website.github.io/ for the demonstration videos and qualitative results.
SoFFT: Spatial Fourier Transform for Modeling Continuum Soft Robots
Continuum soft robots, composed of flexible materials, exhibit theoretically infinite degrees of freedom, enabling notable adaptability in unstructured environments. Cosserat Rod Theory has emerged as a prominent framework for modeling these robots efficiently, representing continuum soft robots as time-varying curves, known as backbones. In this work, we propose viewing the robot's backbone as a signal in space and time, applying the Fourier transform to describe its deformation compactly. This approach unifies existing modeling strategies within the Cosserat Rod Theory framework, offering insights into commonly used heuristic methods. Moreover, the Fourier transform enables the development of a data-driven methodology to experimentally capture the robot's deformation. The proposed approach is validated through numerical simulations and experiments on a real-world prototype, demonstrating a reduction in the degrees of freedom while preserving the accuracy of the deformation representation.
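The idea of treating the backbone as a signal and describing it compactly with a Fourier transform can be sketched as follows. This is only an illustration under assumptions: it uses a hypothetical curvature profile sampled along arc length at a single time instant and NumPy's real FFT, whereas the paper's formulation within Cosserat Rod Theory (and over time) is not reproduced here.

```python
import numpy as np


def compress(signal, n_modes):
    """Keep only the first n_modes Fourier coefficients of a real signal."""
    return np.fft.rfft(signal)[:n_modes]


def reconstruct(coeffs, n_samples):
    """Rebuild the signal from truncated coefficients (higher modes set to zero)."""
    full = np.zeros(n_samples // 2 + 1, dtype=complex)
    full[: len(coeffs)] = coeffs
    return np.fft.irfft(full, n=n_samples)


if __name__ == "__main__":
    n = 64
    s = np.arange(n) / n  # normalized arc length
    # Hypothetical smooth curvature profile along the backbone.
    kappa = 1.0 + 0.5 * np.cos(2 * np.pi * s) + 0.2 * np.sin(4 * np.pi * s)
    coeffs = compress(kappa, n_modes=3)
    error = np.max(np.abs(reconstruct(coeffs, n) - kappa))
    print(f"{n} samples -> {len(coeffs)} complex coefficients, max error {error:.2e}")
```

A smooth deformation is captured by a handful of low-frequency coefficients, which is the sense in which such a representation reduces the number of degrees of freedom.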
DIVER: Reinforced Diffusion Breaks Imitation Bottlenecks in End-to-End Autonomous Driving
Most end-to-end autonomous driving methods rely on imitation learning from single expert demonstrations, often leading to conservative and homogeneous behaviors that limit generalization in complex real-world scenarios. In this work, we propose DIVER, an end-to-end driving framework that integrates reinforcement learning with diffusion-based generation to produce diverse and feasible trajectories. At the core of DIVER lies a reinforced diffusion-based generation mechanism. First, the model conditions on map elements and surrounding agents to generate multiple reference trajectories from a single ground-truth trajectory, alleviating the limitations of imitation learning that arise from relying solely on single expert demonstrations. Second, reinforcement learning is employed to guide the diffusion process, where reward-based supervision enforces safety and diversity constraints on the generated trajectories, thereby enhancing their practicality and generalization capability. Furthermore, to address the limitations of L2-based open-loop metrics in capturing trajectory diversity, we propose a novel Diversity metric to evaluate the diversity of multi-mode predictions. Extensive experiments on the closed-loop NAVSIM and Bench2Drive benchmarks, as well as the open-loop nuScenes dataset, demonstrate that DIVER significantly improves trajectory diversity, effectively addressing the mode collapse problem inherent in imitation learning.
comment: 17 pages, 10 figures
Safe Bayesian Optimization for Complex Control Systems via Additive Gaussian Processes
Automatic controller tuning is attractive for robotics and mechatronic systems whose dynamics are difficult to model accurately, but direct black-box optimization can be unsafe because each query is executed on the physical plant. Existing safe Bayesian optimization (BO) methods provide high-probability safety guarantees, yet their practical use in multi-loop control is limited by two coupled difficulties: the controller parameter space is often moderately high-dimensional, and hardware evaluations are too expensive to allow hundreds or thousands of exploratory trials. This paper proposes SafeCtrlBO, a safe BO method for simultaneously tuning multiple coupled controllers. The method uses additive Gaussian-process kernels to encode low-order structure across controller gains and reduce the sample complexity associated with dense full-dimensional kernels. It also replaces the expensive potential-expander computation used in SafeOpt-style exploration with a boundary-based expansion rule that preserves the intended safe-set expansion behavior under explicit geometric conditions and is validated empirically. Experiments on synthetic benchmarks and on a permanent magnet synchronous motor (PMSM) speed-control platform show that SafeCtrlBO reaches high-performing controller parameters with fewer hardware evaluations than representative safe BO baselines, while maintaining the prescribed high-probability safety criterion and avoiding violations of the hard signal-safety constraint in the hardware study. The code implementation is publicly available at https://github.com/hxwangnus/SafeCtrlBO.
comment: The shorter version has been accepted by IEEE Robotics and Automation Letters. This is the full version
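To make the phrase "additive Gaussian-process kernels" concrete, here is a minimal sketch of a first-order additive kernel, k(x, x') = sum over dimensions d of k_d(x_d, x'_d), with a squared-exponential base kernel per dimension. The function names and the restriction to first-order terms are assumptions for illustration; the kernel orders and base kernels actually used in SafeCtrlBO are not stated in the abstract.

```python
import numpy as np


def rbf_1d(a, b, lengthscale):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def additive_kernel(X, Y, lengthscales):
    """First-order additive kernel: one independent 1-D kernel per input dimension."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    return sum(
        rbf_1d(X[:, d], Y[:, d], ls) for d, ls in enumerate(lengthscales)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gains = rng.uniform(0.0, 1.0, size=(5, 3))  # 5 hypothetical gain vectors, 3 gains each
    K = additive_kernel(gains, gains, lengthscales=[0.3, 0.3, 0.3])
    print(K.shape, np.allclose(K, K.T))
```

Because each term depends on one coordinate only, the model learns a sum of low-dimensional functions, which is generally easier to fit from few evaluations than a single dense kernel over all gains.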
XR-1: Towards Versatile Vision-Language-Action Models via Learning Unified Vision-Motion Representations ICML 2026
Recent progress in large-scale robotic datasets and vision-language models (VLMs) has advanced research on vision-language-action (VLA) models. However, existing VLA models still face two fundamental challenges: (i) producing precise low-level actions from high-dimensional observations, and (ii) bridging domain gaps across heterogeneous data sources, including diverse robot embodiments and human demonstrations. Existing methods often encode latent variables from either visual dynamics or robotic actions to guide policy learning, but they fail to fully exploit the complementary multi-modal knowledge present in large-scale, heterogeneous datasets. In this work, we present X Robotic Model 1 (XR-1), a novel framework for versatile and scalable VLA learning across diverse robots, tasks, and environments. XR-1 introduces the Unified Vision-Motion Codes (UVMC), a discrete latent representation learned via a dual-branch VQ-VAE that jointly encodes visual dynamics and robotic motion. UVMC addresses these challenges by (i) serving as an intermediate representation between the observations and actions, and (ii) aligning multimodal dynamic information from heterogeneous data sources to capture complementary knowledge. To effectively exploit UVMC, we propose a three-stage training paradigm: (i) self-supervised UVMC learning, (ii) UVMC-guided pretraining on large-scale cross-embodiment robotic datasets, and (iii) task-specific post-training. We validate XR-1 through extensive real-world experiments with more than 14,000 rollouts on six different robot embodiments, spanning over 120 diverse manipulation tasks. XR-1 consistently outperforms state-of-the-art baselines such as $π_{0.5}$, $π_0$, RDT, UniVLA, and GR00T-N1.5 while demonstrating strong generalization to novel objects, background variations, distractors, and illumination changes. Our project is at https://xr-1-vla.github.io/.
comment: Accepted to ICML 2026 as a spotlight
MALLVI: A Multi-Agent Framework for Integrated Generalized Robotics Manipulation
Task planning for robotic manipulation with large language models (LLMs) is an emerging area. Prior approaches rely on specialized models, fine-tuning, or prompt tuning, and often operate in an open-loop manner without robust environmental feedback, making them fragile in dynamic settings. MALLVI presents a Multi-Agent Large Language and Vision framework that enables closed-loop, feedback-driven robotic manipulation. Given a natural language instruction and an image of the environment, MALLVI generates executable atomic actions for a robot manipulator. After action execution, a Vision Language Model (VLM) evaluates environmental feedback and decides whether to repeat the process or proceed to the next step. Rather than using a single model, MALLVI coordinates specialized agents (Decomposer, Localizer, Thinker, and Reflector) to manage perception, localization, reasoning, and high-level planning. An optional Descriptor agent provides visual memory of the initial state. The Reflector supports targeted error detection and recovery by reactivating only relevant agents, avoiding full replanning. Experiments in simulation and real-world settings show that iterative closed-loop multi-agent coordination improves generalization and increases success rates in zero-shot manipulation tasks. Code is available at https://github.com/iman1234ahmadi/MALLVI.
comment: Some fundamental changes in text and codebase. Will request a new submission later on
RoboWM-Bench: A Benchmark for Evaluating World Models in Robotic Manipulation
Recent advances in large-scale video world models have enabled increasingly realistic future prediction, raising the prospect of using generated videos as scalable supervision for robot learning. However, for embodied manipulation, perceptual realism alone is not sufficient: generated interactions must also be physically consistent and executable by robotic agents. Existing benchmarks provide valuable assessments of visual quality and physical plausibility, but they do not systematically evaluate whether predicted behaviors can be translated into executable actions that complete manipulation tasks. We introduce RoboWM-Bench, a manipulation-centric benchmark for embodiment-grounded evaluation of video world models. RoboWM-Bench converts generated human-hand and robotic manipulation videos into embodied action sequences and validates them through execution in physically grounded simulation environments. Built on real-to-sim scene reconstruction and diverse manipulation tasks, RoboWM-Bench enables standardized, reproducible, and scalable evaluation of physical executability. Using RoboWM-Bench, we evaluate state-of-the-art video world models and observe that visual plausibility and embodied executability are not always aligned. Our analysis highlights several recurring factors that affect execution performance, including spatial reasoning, contact prediction, and non-physical geometric distortions, particularly in complex and long-horizon interactions. These findings provide a more fine-grained view of current model capabilities and underscore the value of embodiment-aware evaluation for guiding physically grounded world modeling in robotic manipulation.
D-VLA: A High-Concurrency Distributed Asynchronous Reinforcement Learning Framework for Vision-Language-Action Models
The rapid evolution of Embodied AI has enabled Vision-Language-Action (VLA) models to excel in multimodal perception and task execution. However, applying Reinforcement Learning (RL) to these massive models in large-scale distributed environments faces severe systemic bottlenecks, primarily due to the resource conflict between high-fidelity physical simulation and the intensive VRAM/bandwidth demands of deep learning. This conflict often leaves overall throughput constrained by execution-phase inefficiencies. To address these challenges, we propose D-VLA, a high-concurrency, low-latency distributed RL framework for large-scale embodied foundation models. D-VLA introduces "Plane Decoupling," physically isolating high-frequency training data from low-frequency weight control to eliminate interference between simulation and optimization. We further design a four-thread asynchronous "Swimlane" pipeline, enabling full parallel overlap of sampling, inference, gradient computation, and parameter distribution. Additionally, a dual-pool VRAM management model and topology-aware replication resolve memory fragmentation and optimize communication efficiency. Experiments on benchmarks like LIBERO show that D-VLA significantly outperforms mainstream RL frameworks in throughput and sampling efficiency for billion-parameter VLA models. In trillion-parameter scalability tests, our framework maintains exceptional stability and linear speedup, providing a robust system for high-performance general-purpose embodied agents.
Geometry-Aware Sampling-Based Motion Planning on Riemannian Manifolds
In many robot motion planning problems, task objectives and physical constraints induce non-Euclidean geometry on the configuration space, yet many planners operate using Euclidean distances that ignore this structure. We address the problem of planning collision-free motions that minimize length under configuration-dependent Riemannian metrics, corresponding to geodesics on the configuration manifold. Conventional numerical methods for computing such paths do not scale well to high-dimensional systems, while sampling-based planners trade scalability for geometric fidelity. To bridge this gap, we propose a sampling-based motion planning framework that operates directly on Riemannian manifolds. We introduce a computationally efficient midpoint-based approximation of the Riemannian geodesic distance and prove that it matches the true Riemannian distance with third-order accuracy. Building on this approximation, we design a local planner that traces the manifold using first-order retractions guided by Riemannian natural gradients. Experiments on a two-link planar arm and a 7-DoF Franka manipulator under a kinetic-energy metric, as well as on rigid-body planning in $\mathrm{SE}(2)$ with non-holonomic motion constraints, demonstrate that our approach consistently produces lower-cost trajectories than Euclidean-based planners and classical numerical geodesic-solver baselines.
comment: Accepted to the 17th World Symposium on the Algorithmic Foundations of Robotics (WAFR), Oulu, Finland, Jun 15-17, 2026
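The abstract does not spell out the midpoint-based approximation, so the following is only a minimal sketch under an assumption: the metric tensor is evaluated at the coordinate midpoint of the two configurations. The Poincare upper half-plane is used as a hypothetical test manifold because its geodesic distance has a closed form; the function names are illustrative and not taken from the paper.

```python
import math

def half_plane_metric(p):
    """Metric tensor of the Poincare upper half-plane at p = (x, y): G = I / y^2."""
    _, y = p
    s = 1.0 / (y * y)
    return ((s, 0.0), (0.0, s))

def _quadratic_form_length(p, q, at, metric):
    """sqrt(d^T G(at) d) for the coordinate displacement d = q - p."""
    d = (q[0] - p[0], q[1] - p[1])
    G = metric(at)
    quad = sum(d[i] * G[i][j] * d[j] for i in range(2) for j in range(2))
    return math.sqrt(quad)

def midpoint_distance(p, q, metric=half_plane_metric):
    """Assumed midpoint rule: evaluate the metric halfway between p and q."""
    mid = ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)
    return _quadratic_form_length(p, q, mid, metric)

def endpoint_distance(p, q, metric=half_plane_metric):
    """Cruder baseline for comparison: evaluate the metric at the start point p."""
    return _quadratic_form_length(p, q, p, metric)

def true_distance(p, q):
    """Closed-form geodesic distance on the Poincare upper half-plane."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    return math.acosh(1.0 + (dx * dx + dy * dy) / (2.0 * p[1] * q[1]))

if __name__ == "__main__":
    p = (0.0, 1.0)
    for h in (0.1, 0.05):
        q = (0.0, 1.0 + h)
        err_mid = abs(midpoint_distance(p, q) - true_distance(p, q))
        err_end = abs(endpoint_distance(p, q) - true_distance(p, q))
        print(f"h={h}: midpoint error={err_mid:.2e}, endpoint error={err_end:.2e}")
```

On this test manifold, halving the step shrinks the midpoint error by roughly a factor of eight, which is consistent with a third-order local error, while the endpoint rule is only second-order accurate.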
Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models
Vision-Language-Action (VLA) models achieve remarkable flexibility and generalization beyond classical control paradigms. However, most prevailing VLAs are trained under a single-frame observation paradigm, which leaves them structurally blind to temporal dynamics. Consequently, these models degrade severely in non-stationary scenarios, even when trained or finetuned on dynamic datasets. Existing approaches either require expensive retraining or suffer from latency bottlenecks and poor temporal consistency across action chunks. We propose Pace-and-Path Correction, a training-free, closed-form inference-time operator that wraps any chunked-action VLA. From a single quadratic cost, joint minimization yields a unified solution that decomposes orthogonally into two distinct channels. The pace channel compresses execution along the planned direction, while the path channel applies an orthogonal spatial offset, jointly absorbing the perceived dynamics within the chunk window. We evaluate our approach on a comprehensive diagnostic benchmark MoveBench designed to isolate motion as the sole controlled variable. Empirical results demonstrate that our framework consistently outperforms state-of-the-art training-free wrappers and dynamic-adaptive methods and improves success rates by up to 28.8% and 25.9% in absolute terms over foundational VLA models in dynamic-only and static-dynamic mixed environments, respectively.
MemCompiler: Compile, Don't Inject -- State-Conditioned Memory for Embodied Agents
Existing memory systems for embodied agents typically inject retrieved memory as static context at episode start, a paradigm we term Ahead-of-time Monolithic Memory Injection (AMMI). However, this static design quickly becomes misaligned with the agent's evolving state and may degrade lightweight executors below the no-memory baseline. To address this, we propose MemCompiler, which reframes memory utilization as State-Conditioned Memory Compilation. A learned Memory Compiler reads a structured Brief State capturing the agent's current execution state and dynamically selects and compiles only relevant memory into executable guidance. This guidance is delivered through a text channel and a latent Soft-Mem channel that preserves perceptual information not expressible in text. Across ALFWorld, EmbodiedBench, and ScienceWorld, MemCompiler consistently improves over the no-memory baseline across open-source backbones (up to +129%), matches or approaches frontier closed-source systems, and reduces per-step latency by 60%, demonstrating that state-aware memory compilation improves both effectiveness and efficiency.
Multiagent Systems
APWA: A Distributed Architecture for Parallelizable Agentic Workflows
Autonomous multi-agent systems based on large language models (LLMs) have demonstrated remarkable abilities in independently solving complex tasks in a wide breadth of application domains. However, these systems hit critical reasoning, coordination, and computational scaling bottlenecks as the size and complexity of their tasks grow. These limitations hinder multi-agent systems from achieving high-throughput processing for highly parallelizable tasks, despite the availability of parallel computing and reasoning primitives in the underlying LLMs. We introduce the Agent-Parallel Workload Architecture (APWA), a distributed multi-agent system architecture designed for the efficient processing of heavily parallelizable agentic workloads. APWA facilitates parallel execution by decomposing workflows into non-interfering subproblems that can be processed using independent resources without cross-communication. It supports heterogeneous data and parallel processing patterns, and it accommodates tasks from a wide breadth of domains. In our evaluation, we demonstrate that APWA can dynamically decompose complex queries into parallelizable workflows and scales on larger tasks in settings where prior systems fail completely.
comment: 25 pages, 2 figures, 14 tables
A Prototyping Framework for Distributed Control of Multi-Robot Systems
This paper presents a prototyping framework for distributed control of multi-robot systems, aimed at bridging theory and practical testing of distributed optimization algorithms. Using the Single Program, Multiple Data (SPMD) paradigm, the framework emulates distributed control on a single computer, with each core running the same algorithm using local states and neighbour-to-neighbour communication. We demonstrate the framework on a four-quadrotor position-swapping task using a non-cooperative game-theoretic distributed algorithm. Computational time and trajectory data are compared across the supported dynamics levels: a point-mass model, a high-fidelity quadrotor model, and an experimental hardware testbed using Crazyflie quadcopters. The results show that the framework provides a low-cost and accessible approach for validating distributed algorithms.
comment: Accepted at IFAC World Congress 2026
AI Knows When It's Being Watched: Functional Strategic Action and Contextual Register Modulation in Large Language Models
Large language models (LLMs) have been extensively studied from computational and cognitive perspectives, yet their behavior as communicative actors in socially structured contexts remains underexplored. This study examines whether LLM-based multi-agent systems exhibit systematic linguistic adaptation in response to perceived social observation contexts -- a question with direct implications for AI governance and auditing. Drawing on Habermas's (1981) Theory of Communicative Action, Goffman's (1959) dramaturgical model, Bell's (1984) Audience Design framework, and the Hawthorne Effect, we report a controlled experiment involving 100 multi-agent debate sessions across five conditions (n = 20 each). Conditions varied the framing of social observation -- from explicit monitoring by university researchers, to negation of monitoring, to an observer-substitution condition replacing human researchers with an automated AI auditing system. Monitored conditions (Delta+24.9%, Delta+24.2%) and the automated AI monitoring condition (Delta+22.2%) produce higher TTR change than audience-framing conditions (Delta+17.7%), F(4, 94) = 2.79, p = .031. Message length shows a fully dissociated effect, F(4, 95) = 19.55, p < .001. A fifth condition -- replacing human with AI observers -- yields intermediate TTR adaptation, suggesting LLM behavior is sensitive to observer identity: human evaluation elicits stronger register formalization than automated AI surveillance. We discuss implications for AI governance, algorithmic auditing, and the repositioning of LLMs as contextually sensitive communicative actors.
comment: 20 pages, 6 figures
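The abstract reports percentage changes in TTR without defining it. Assuming TTR means the standard type-token ratio (distinct word forms divided by total words), a minimal sketch of the measurement might look like this. The tokenizer and the `relative_change` helper are illustrative assumptions, not the study's pipeline.

```python
import re

def type_token_ratio(text):
    """Type-token ratio: number of distinct tokens divided by total tokens."""
    # Assumed tokenizer: lowercase runs of letters, digits, and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

def relative_change(baseline, observed):
    """Percentage change of observed relative to a nonzero baseline."""
    return (observed - baseline) / baseline * 100.0

if __name__ == "__main__":
    before = "the agent said the plan was the plan"
    after = "the agent proposed a revised and more formal plan"
    ttr_before = type_token_ratio(before)
    ttr_after = type_token_ratio(after)
    print(ttr_before, ttr_after, relative_change(ttr_before, ttr_after))
```

Raw TTR tends to fall as texts get longer, so comparisons across conditions with different message lengths usually need length control; the abstract does not say how the study handles this.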
Multi-Agentic Approach for History Matching of Oil Reservoirs
History matching is a central inverse problem in reservoir engineering, where uncertain reservoir parameters must be calibrated against observations. Although automated history matching can reduce manual effort, practical deployment remains difficult because engineers must still configure heterogeneous workflows involving parameter selection, physically admissible bounds, optimizer choice, hyperparameter tuning, simulator execution, and diagnostic reporting. We propose PetroGraph, a multi-agent framework for intelligent reservoir history matching that decomposes this workflow into specialized agents for model review, experimental planning, parameterization, optimization, simulation, and summarization. The system combines large language model agents with domain-specific tools, retrieval-augmented access to simulator documentation, validation of modified ECLIPSE input decks, human-in-the-loop checkpoints, and an OPM Flow-based simulation backend. This design enables users to initiate and steer history matching through natural language while preserving explicit control over selected parameters and optimization settings. We evaluate PetroGraph on three reservoir models of increasing complexity: the synthetic SPE1 model, the faulted SPE9 benchmark, and the real-field Norne model. Using weighted normalized root mean square error as the objective, PetroGraph reduces the mismatch by 95% on SPE1, 69% on SPE9, and 13% on Norne. These results demonstrate that multi-agent orchestration can automate key decisions in history matching, lower the expertise barrier for operating complex simulation workflows, and provide a flexible foundation for extensible, domain-aware reservoir model adaptation.
Agreement, Diversity, and Polarization Indices for Approval Elections
An index is a function that given an election outputs a value between 0 and 1, indicating the extent to which this election has a particular feature. We seek indices that capture agreement, diversity, and polarization among voters in approval elections, and that are normalized with respect to saturation. By the latter we mean that if two elections differ by the fraction of candidates approved by an average voter, but otherwise are of similar nature, then they should have similar index values. We propose several indices, analyze their properties, and use them to (a) derive a new map of approval elections, and (b) show similarities and differences between various real-life elections from Pabulib, Preflib and other sources.
Temporal Fair Division in Multi-Agent Systems: From Precise Alternation Metrics to Scalable Coordination Proxies
A plethora of real-world environments require agents to compete repeatedly for the same limited resource, calling for a temporal notion of fairness judged across entire interaction histories. This paper advances the theory of temporal fair division by introducing Rotational Periodicity (RP), a family of lightweight metrics, alongside the ALT family of sliding-window measures, within a unified framework for repeated multi-agent resource competition. We formalise the Multi-Agent Battle of the Exes (MBoE) as a repeated fair division instance and establish Perfect Alternation (PA) as its canonical temporally fair solution, drawing connections to proportionality, envy-freeness, and n-periodic round-robin allocation. RP decomposes temporal fairness into two complementary sub-measures: Rotational Score (RS) and Waiting Periods Evaluation (WPE), achieving O(nu+n) time complexity versus the O(nu*n) of ALT, where nu is the episode count and n the agent count. Empirical evaluation across n in {2,3,5,8,10} reveals three findings. First, both RP and ALT expose a coordination failure invisible to traditional metrics: Q-learning agents perform worse than random policies by 10-73% on RP and 7-35% on CALT, while Reward Fairness remains misleadingly high (above 0.92 for n>=3). Second, RP achieves 12-25x computational speedup over ALT, growing with n. Third, the two families are complementary: ALT provides richer discrimination for small populations; RP scales reliably where ALT becomes intractable. Together they form a diagnostic toolkit for temporal fair division.
comment: 15 pages, 3 figures, 8 tables. Submitted to ACM Transactions on Economics and Computation, Special Issue on Fair Division
Decision-Level Fusion for Robust Wearable Affect Recognition
Automatic recognition of affective state from wearable physiology has clear societal impact for public health, preventive care, and stress-aware interventions, but real deployments require robustness to non-stationary dynamics, artefacts, and missing sensors. We study this problem on WESAD, using baseline, stress, and amusement conditions, where common fixed-basis spectral features such as FFT bandpower and Welch PSD can oversmooth short-lived discriminative patterns. We propose a non-stationary pipeline that combines Fourier-Bessel Series Expansion (FBSE) with EWT data-driven spectral segmentation to extract mode-wise transient descriptors. For multimodal integration, we adopt decision-level aggregation over per-modality predictors and weight each modality by predictive uncertainty and modality reliability. Results on WESAD, using 15 subjects and ECG, EDA, BVP, EMG, and ACC signals across three classes, indicate that decision-level aggregation is approximately 84 percent of the time at least as good as feature-level aggregation, and approximately 48 percent of the time strictly better, suggesting improved robustness under heterogeneous and partially reliable sensing.
IFPV: An Integrated Multi-Agent Framework for Generative Operational Planning and High-Fidelity Plan Verification
Operational plan generation and verification are critical for modern complex and rapidly changing battlefield environments, yet traditional generation and verification methods still respectively face the challenges of generation infeasibility and verification insufficiency. To alleviate these limitations, we propose an Integrated Multi-Agent Framework for Generative Operational Planning and High-Fidelity Plan Verification (IFPV). IFPV consists of two tightly coupled modules: Multi-Perspective Hierarchical Agents (MPHA) for generative operational planning and an Adversarial Cognitive Simulation Engine (ACSE) for high-fidelity adversarial plan verification. MPHA decomposes commander intent into executable multi-platform tactical action sequences through the collaboration of Pathfinder, Analyst, and Planner agents. ACSE introduces an opponent equipped with a customized world model, which predicts the future evolution of mission-critical platforms and conducts dynamic counteractions against candidate plans. Simulation experiments in the Asymmetric Combat Tactic Simulator (ACTS) show that IFPV improves mission success by 19.4% and reduces operational cost by 41.7% compared with a single-step large language model (LLM) planning baseline. Compared with a traditional rule-based validator, ACSE increases the average suppression rate by 31.8%, indicating that the proposed verification environment is stricter and more discriminative in revealing the latent vulnerabilities of candidate plans. The code for IFPV can be found at https://github.com/zhigao3ks/IFPV.
comment: Submitted to Neurocomputing
Prompting Policies for Multi-step Reasoning and Tool-Use in Black-box LLMs with Iterative Distillation of Experience
The shift toward interacting with frozen, "black-box" Large Language Models (LLMs) has transformed prompt engineering from a heuristic exercise into a critical optimization challenge. We propose a Reinforcement Learning (RL) framework for training learned prompting policies via iterative distillation of experience. In this architecture, a lightweight prompter model is optimized to maximize task-specific rewards for a larger, frozen worker LLM. By utilizing a contrastive experience buffer that couples scalar rewards with dense textual critiques, our approach effectively amortizes iterative prompt refinement into single-shot policy weights. Our experimental analysis focuses on the Big Bench Extra Hard (BBEH) and Tau-bench suites, covering a diverse range of multi-step reasoning and tool-use tasks. We demonstrate significant gains, improving performance from 55% to 90% in logic-intensive reasoning and 74% to 91% in tool-use tasks. Furthermore, we analyze the structural evolution of prompts, demonstrating how the policy discovers specialized algorithmic heuristics. We provide comprehensive comparisons against state-of-the-art evolutionary baselines like GEPA, showing that iterative distillation achieves superior performance with higher sample efficiency.
comment: 10 pages and reference, appendix
Data-Augmented Game Starts for Accelerating Self-Play Exploration in Imperfect Information Games
Finding approximate equilibria for large-scale imperfect-information competitive games such as StarCraft, Dota, and CounterStrike remains computationally infeasible due to sparse rewards and challenging exploration over long horizons. In this paper, we propose a multi-agent starting-state sampling strategy designed to substantially accelerate online exploration in regularized policy-gradient game methods for two-player zero-sum (2p0s) games. Motivated by an assumption that offline demonstrations from skilled humans can provide good coverage of high-level strategies relevant to equilibrium play, we propose the initialization of reinforcement learning data collection at intermediate states sampled from offline data to facilitate exploration of strategically relevant subgames. Referring to this method as Data-Augmented Game Starts (DAGS), we perform experiments using synthetic datasets and analytically tractable, long-horizon control variants of two-player Kuhn Poker, Goofspiel, and a counterexample game designed to penalize biased beliefs over hidden information. Under fixed computational budgets, DAGS enables regularized policy gradient methods to achieve lower exploitability in games with significantly more challenging exploration. We show that augmenting starting state distributions when solving imperfect information games can lead to biased equilibria, and we provide a straightforward mitigation to this in the form of multi-task observation flags. Finally, we release a new set of benchmark environments that drastically increase exploration challenges and state counts in existing OpenSpiel games while keeping exploitability measurements analytically tractable.
comment: 17 pages, 4 figures. JB Lanier and Nathan Monette contributed equally
Quantum Advantage in Multi Agent Reinforcement Learning
We present an empirical evaluation of quantum entanglement in agent coordination within quantum multi agent reinforcement learning (QMARL). While QMARL has attracted growing interest recently, most prior work evaluates quantum policies without provable baselines, making it impossible to rigorously distinguish quantum advantage from algorithmic coincidence. We address this directly by evaluating a decentralized QMARL framework with variational quantum circuit (VQC) actors with shared entangled states. In the CHSH game, which has a mathematically proven classical performance ceiling of 0.75 win rate, we show that entangled QMARL agents approach the Tsirelson limit of 0.854, providing clear evidence of their quantum advantage. We show that unentangled quantum circuits match the classical baseline, confirming that entanglement and not the quantum circuit itself is the active coordination mechanism. We also explore the effect of specific entanglement structures, as some Bell states enable coordination gains while others actively harm performance. On cooperative navigation (CoopNav), QMARL without entanglement achieves $\sim2\times$ improvement in success rate over classical MAA2C ($\sim$0.85 versus $\sim$0.40), with the hybrid configuration, quantum actor paired with a classical centralised critic, outperforming both fully classical and fully quantum solutions. We present our experimental analysis and discuss future work.
comment: 19 pages
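The two CHSH figures quoted in the abstract (classical ceiling 0.75, Tsirelson limit about 0.854) can be reproduced with a short calculation. This sketch is not the paper's QMARL setup: it enumerates all deterministic classical strategies, and for the quantum case it assumes a shared maximally entangled pair measured at the textbook-optimal angles, for which the probability of equal outcomes is cos^2 of the angle difference.

```python
import itertools
import math

def classical_best_win_rate():
    """Best CHSH win rate over all deterministic strategies a(x), b(y) in {0, 1}."""
    best = 0.0
    for a0, a1, b0, b1 in itertools.product((0, 1), repeat=4):
        a, b = (a0, a1), (b0, b1)
        # The players win when a XOR b equals x AND y; inputs are uniform.
        wins = sum((a[x] ^ b[y]) == (x & y) for x in (0, 1) for y in (0, 1))
        best = max(best, wins / 4.0)
    return best

def quantum_win_rate(alice_angles=(0.0, math.pi / 4),
                     bob_angles=(math.pi / 8, -math.pi / 8)):
    """CHSH win rate for a maximally entangled pair measured at the given angles."""
    total = 0.0
    for x in (0, 1):
        for y in (0, 1):
            # Probability that both measurement outcomes coincide.
            p_same = math.cos(alice_angles[x] - bob_angles[y]) ** 2
            # Equal outputs win unless x = y = 1, where differing outputs win.
            total += p_same if (x & y) == 0 else 1.0 - p_same
    return total / 4.0

if __name__ == "__main__":
    print("classical ceiling:", classical_best_win_rate())
    print("quantum (optimal angles):", round(quantum_win_rate(), 4))
```

Shared classical randomness cannot exceed the deterministic maximum, since a randomized strategy is a mixture of deterministic ones, so the enumeration is sufficient for the classical bound.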
Estimated Dynamic Equilibrium Model: Supply and Demand as a Sample Path of a Stochastic Process
We introduce the Estimated Dynamic Equilibrium Model (EDEM), an agent-based framework that treats supply and demand as a coupled stochastic process driven by heterogeneous, noisy agent valuations. The model's primary technical contribution is the identification of a generative mechanism for persistent disequilibrium: when market-clearing prices are sequentially sampled from the upper tail of noisy bid distributions and recycled as inputs for future valuations, expected prices drift upward despite strictly zero-mean estimation errors. We derive this order-statistic bias in closed form for i.i.d. uniform bids and use simulations to show that compounding this bias across epochs yields exponential price growth without requiring assumptions of investor optimism or irrationality. This framework extends Miller's divergence-of-opinion theory to a dynamic setting, recovering Walrasian equilibrium and Miller's static premium as limiting cases. Through controlled experiments and sensitivity analysis on a simulated real-estate neighborhood, we identify six distinct regimes, ranging from band-stability to runaway bubbles, emerging from a single agent ruleset. These results offer a potential explanation for the contradictory findings in the empirical divergence-of-opinion literature and suggest that machine-learning valuation algorithms may inadvertently amplify this inherent statistical bias.
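The order-statistic bias can be illustrated with a simplified stand-in for the model. The sketch assumes that each epoch's clearing price is the highest of n bids, and that each bid is the previous price times one plus zero-mean Uniform(-w, w) noise. Both rules are hypothetical simplifications; the abstract only says prices are sampled from the upper tail. Under these assumptions the expected maximum noise term is w(n-1)/(n+1), which is positive for n > 1 even though every individual error has zero mean.

```python
import random

def expected_max_uniform(n, w):
    """Closed-form E[max] of n i.i.d. Uniform(-w, w) draws: w * (n - 1) / (n + 1)."""
    return w * (n - 1) / (n + 1)

def simulate_price_path(p0, n_bidders, w, epochs, rng):
    """Hypothetical recycling rule: the highest noisy bid becomes the next price."""
    prices = [p0]
    for _ in range(epochs):
        # Every bidder values the asset at the last price, perturbed by zero-mean noise.
        bids = [prices[-1] * (1.0 + rng.uniform(-w, w)) for _ in range(n_bidders)]
        prices.append(max(bids))
    return prices

if __name__ == "__main__":
    rng = random.Random(0)
    path = simulate_price_path(p0=100.0, n_bidders=5, w=0.05, epochs=50, rng=rng)
    print("expected per-epoch drift:", expected_max_uniform(5, 0.05))
    print("start price:", path[0], "end price:", round(path[-1], 2))
```

Because the drift multiplies the previous price in this sketch, it compounds geometrically across epochs, which matches the exponential growth the abstract describes.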
Belief Engine: Configurable and Inspectable Stance Dynamics in Multi-Agent LLM Deliberation
LLM-based agents are increasingly used to simulate deliberative interactions such as negotiation, conflict resolution, and multi-turn opinion exchange. Yet generated transcripts often do not reveal why an agent's stance changes: movement may reflect evidence uptake, anchoring, role drift, echoing, or changed prompt and retrieval context. We introduce the Belief Engine (BE), an auditable belief-update layer that treats "belief" as an evidential state over a proposition and exposes it as scalar stance. BE extracts arguments into structured memory and updates stance with a log-odds rule controlled by evidence uptake u and prior anchoring a. Across multiple base LLMs, parameter sweeps show that these controls reliably shape stance dynamics while preserving an evidence-level update trail. On DEBATE, a human deliberation dataset with pre/post opinions, BE best reconstructs participants whose final stance follows extracted evidence; stable and evidence-opposed cases instead point to anchoring or factors outside the extracted evidence stream. BE provides configurable infrastructure for studying evidence-grounded deliberation, where openness, commitment, convergence, and disagreement can be tied to explicit update assumptions rather than hidden prompt effects.
SMCEvolve: Principled Scientific Discovery via Sequential Monte Carlo Evolution
LLM-driven program evolution has emerged as a powerful tool for automated scientific discovery, yet existing frameworks offer no principled guide for designing their individual components and provide no guarantee that the search converges. We introduce SMCEvolve, which recasts program search as sampling from a reward-tilted target distribution and approximates it with a Sequential Monte Carlo (SMC) sampler. From this view, three core mechanisms emerge as principled components: adaptive parent resampling, mixture of mutation with acceptance, and automatic convergence control. We further provide a finite-sample complexity analysis that bounds the LLM-call budget required to reach a target approximation error. Across math, algorithm efficiency, symbolic regression, and end-to-end ML research benchmarks, SMCEvolve surpasses state-of-the-art evolving systems while using fewer LLM calls under self-determined termination. The code is available at https://github.com/kongwanbianjinyu/SMCEvolve.
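To make the reward-tilted view concrete, here is a minimal sketch of a single parent-resampling step. It assumes an exponential tilt with weights proportional to exp(beta * reward) and plain multinomial resampling. The tilt form, the `beta` parameter, and every function name are illustrative assumptions rather than SMCEvolve's actual design.

```python
import math
import random

def tilted_weights(rewards, beta):
    """Normalized weights proportional to exp(beta * reward), computed stably."""
    m = max(rewards)
    raw = [math.exp(beta * (r - m)) for r in rewards]
    total = sum(raw)
    return [x / total for x in raw]

def effective_sample_size(weights):
    """ESS of normalized weights; equals the population size when they are uniform."""
    return 1.0 / sum(w * w for w in weights)

def resample_parents(population, rewards, beta, rng):
    """Multinomial resampling of parents in proportion to their tilted weights."""
    weights = tilted_weights(rewards, beta)
    return rng.choices(population, weights=weights, k=len(population))

if __name__ == "__main__":
    rng = random.Random(0)
    programs = ["prog_a", "prog_b", "prog_c", "prog_d"]  # placeholder candidates
    rewards = [0.1, 0.4, 0.9, 0.2]
    w = tilted_weights(rewards, beta=5.0)
    print("weights:", [round(x, 3) for x in w])
    print("ESS:", round(effective_sample_size(w), 2))
    print("parents:", resample_parents(programs, rewards, 5.0, rng))
```

In generic SMC samplers, the effective sample size is a common trigger for adaptive resampling; whether SMCEvolve uses it for its adaptive step or convergence control is not stated in the abstract.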
When Identity Overrides Incentives: Representational Choices as Governance Decisions in Multi-Agent LLM Systems
Multi-agent systems built on large language models are increasingly deployed in strategic policy and governance settings, where agents representing stakeholders with conflicting interests must coordinate under shared constraints. These systems typically assign role-based personas to agents, describing their motivations and objectives. Whether agents with role-based identities follow explicit payoffs or their assigned roles in strategic decision-making remains untested. Here we show that assigning role-based personas suppresses payoff-aligned behavior in four-agent strategic games, shifting equilibrium attainment by up to 90 percentage points even when agents have complete payoff information. We test a 2x2 factorial design (persona presence x payoff visibility) across four models (Qwen-7B, Qwen-32B, Llama-8B, Mistral-7B) and 53 environmental policy scenarios with two equilibria: Tragedy of the Commons, where individual payoff dominates, and Green Transition, where collective payoff dominates. With personas present, all models reach near-zero Tragedy equilibrium in the Tragedy-dominant scenarios despite complete payoff information, and 100% of equilibria correspond to Green Transition. No model reaches Tragedy equilibrium by removing personas alone; only Qwen models reach 65-90% Tragedy equilibrium rates when personas are removed and payoffs are made explicit. Three distinct behavioral profiles emerge: Qwen shifts equilibrium selection based on framing condition, Mistral increases response variance without reaching the Tragedy equilibrium, and Llama holds near-constant across all conditions. Representational choices in multi-agent LLM systems are governance decisions: persona assignment determines which equilibrium a simulation produces, independent of the underlying incentive structure.
comment: Accepted to ACM FAccT 2026
HECTOR: Human-centric Hierarchical Coordination and Supervision of Robotic Fleets under Continual Temporal Tasks
Robotic fleets can be extremely efficient when working concurrently and collaboratively, e.g., for delivery, surveillance, search and rescue. However, it can be demanding or even impractical for an operator to directly control each robot. Thus, autonomy of the fleet and its online interaction with the operator are both essential, particularly in dynamic and partially unknown environments. The operator might need to add new tasks, cancel some tasks, change priorities and modify planning results. How to design the procedure for these interactions and efficient algorithms to fulfill these needs have been mostly neglected in the related literature. Thus, this work proposes a human-centric coordination and supervision scheme (HECTOR) for large-scale robotic fleets under continual and uncertain temporal tasks. It consists of three hierarchical layers: (I) the bidirectional and multimodal protocol of online human-fleet interaction, where the operator interacts with and supervises the whole fleet; (II) the rolling assignment of currently-known tasks to teams within a certain horizon, and (III) the dynamic coordination within a team given the detected subtasks during online execution. The overall mission can be as general as temporal logic formulas over collaborative actions. Such hierarchical structure allows human interaction and supervision at different granularities and triggering conditions, to both improve computational efficiency and reduce human effort. Extensive human-in-the-loop simulations are performed over heterogeneous fleets under various temporal tasks and environmental uncertainties.
Constitutional Governance in Metric Spaces
Computational social choice and algorithmic decision theory offer rich aggregation theory but no comprehensive process for egalitarian self-governance: aggregation, deliberation, amendment, and consensus are each considered in isolation, with key metric-space aggregators being NP-hard. Here, we propose constitutional governance in metric spaces, integrating these stages into a coherent polynomial-time protocol for constitutional governance. The constitution assigns, per amendable component including itself, a metric space, aggregation rule, and supermajority threshold. Amendments proceed by members voting with their ideal elements, followed by members submitting public proposals carrying supermajority public support under the revealed votes. Public proposals can be sourced from deliberation among members, vote aggregation, or AI mediation. The constitutional rule adopts a supported proposal with positive maximal score, if there is one, else retains the status quo. With Constitutional Consensus, a community can run the constitutional governance protocol on members' personal computing devices (e.g., smartphones), achieving digital sovereignty. We focus on the utility of the generalised median, prove that at majority threshold no misreport weakly dominates sincere voting, and study the compromise gap between best peak and unconstrained optimum. We instantiate the framework to seven canonical settings -- electing officers, setting rates, allocating budgets, ranking priorities, selecting boards, drafting bylaws, and amending the constitution. By unifying metric-space aggregation, reality-aware social choice, supermajority amendment, constitutional consensus, deliberative coalition formation, and AI mediation, this work delivers a comprehensive solution to the constitutional governance of digital communities and organisations.
Chinese Short-Form Creative Content Generation via Explanation-Oriented Multi-Objective Optimization
Chinese demonstrates high semantic compactness and rich metaphorical expressiveness, enabling limited text to convey dense meanings while increasing the difficulty of generation and verification, particularly in short-form creative natural language generation (CNLG). In the real world, users often require personalized, fine-grained creative constraints, making reliable verification critical to guiding optimization. According to Brunswik's Lens Model from psychology, whether constraints are achieved can be inferred from sufficient observable cues. Existing studies are mainly outcome-oriented, implicitly assuming that the outcome itself provides adequate cues for verification. However, this assumption breaks down in Chinese short-form CNLG (e.g., naming or advertising) with diverse personalized constraints, where extremely brief outcomes inherently offer limited information. Explanations can naturally serve as extra cues. Nevertheless, under complex constraints, LLMs' explanations may suffer from hallucination, incompleteness, or ambiguity. To address these issues, we formalize the Chinese short-form CNLG task as a heterogeneous multi-objective optimization (HMO) problem that jointly optimizes multiple personalized constraints and explanation reliability. We further propose MAGIC-HMO, a training-free multi-agent framework that optimizes these objectives through iterative generation and verification under an explanation-oriented multi-objective strategy. Experiments on Chinese Baby Naming, a challenging benchmark, demonstrate that MAGIC-HMO significantly outperforms six strong baselines across various LLM backbones. Relevant data and code are available at https://github.com/foolfun/MAGIC_HMO.
comment: 19 pages, 10 figures. Submitted to ACM for possible publication
Pythia: Exploiting Workflow Predictability for Efficient Agent-Native LLM Serving
As LLM applications grow more complex, developers are increasingly adopting multi-agent architectures to decompose workflows into specialized, collaborative components, introducing structure that constrains agent behavior and exposes useful semantic predictability. Unlike traditional LLM serving, which operates under highly dynamic and uncertain conditions, this structured topology enables opportunities to reduce runtime uncertainty, yet existing systems fail to exploit it, treating agentic workloads as generic traffic and incurring significant inefficiencies. Our analysis of production traces from an agent-serving platform and an internal coding assistant reveals key bottlenecks, including low prefix cache hit rates, severe resource contention from long-context requests, and substantial queuing delays due to suboptimal scaling. To address these challenges, we propose Pythia, a multi-agent serving system that captures workflow semantics through a simple interface at the serving layer, unlocking new optimization opportunities and substantially improving throughput and job completion time over state-of-the-art baselines.
Optimizing PyTorch Inference with LLM-Based Multi-Agent Systems
Maximizing performance on available GPU hardware is an ongoing challenge for modern AI inference systems. Traditional approaches include writing custom GPU kernels and using specialized model compilers to tune high-level code for specific GPU targets. Recent work shows that LLM-based multi-agent systems can effectively perform such tuning, often outperforming existing compilers and eliminating the need for manual kernel development. However, the dynamics of multi-agent systems for this task remain unexplored. In this work, we present a logical framework for comparing multi-agent PyTorch optimization systems. Our evaluation shows that exploit-heavy strategies perform best when paired with error-fixing agents, and that performance correlates with the granularity of optimization steps. The best implementation achieves an average 2.88x speedup over PyTorch Eager (1.85x over torch.compile) on an H100 GPU across diverse tasks in KernelBench, a benchmark suite covering a range of machine learning architectures in PyTorch. Code is publicly available at: https://github.com/pike-project/pike
Distributed Adaptive Estimation with ISS Guarantees for Sensor Networks with Partially Unknown Source Dynamics
This paper studies distributed adaptive estimation over sensor networks with partially unknown source dynamics. We present parallel continuous-time and discrete-time designs in which each node runs a local adaptive observer and exchanges information over a directed graph. For both time scales, we establish stability of the network coupling operators, prove boundedness of all internal signals, and show convergence of each node's estimate to the source despite model uncertainty and disturbances. We further derive input-to-state stability (ISS) bounds that quantify robustness to bounded process noise. A key distinction is that the discrete-time design uses constant adaptive gains and per-step regressor normalization to handle sampling effects, whereas the continuous-time design does not. A unified Lyapunov framework links local observer dynamics with graph topology. Simulations on star, cyclic, and path networks corroborate the analysis, demonstrating accurate tracking, robustness, and scalability with the number of sensing nodes.
comment: This version is accepted for publication in the 2026 IFAC World Congress
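The per-step regressor normalization mentioned for the discrete-time design can be illustrated with a generic normalized-gradient parameter update. This is a minimal single-node sketch under assumed names (`normalized_update`, `gamma`, the two-parameter source model); it is not the paper's adaptive observer and omits the network coupling entirely.

```python
import math

def normalized_update(theta, phi, y, gamma=0.5):
    """One normalized-gradient step for the linear model y = theta . phi."""
    err = y - sum(t * p for t, p in zip(theta, phi))
    # Dividing by 1 + |phi|^2 keeps the effective step below gamma,
    # however large the regressor is at this sample.
    norm = 1.0 + sum(p * p for p in phi)
    return [t + gamma * p * err / norm for t, p in zip(theta, phi)]

def run_demo(steps=2000):
    true_theta = [1.5, -0.7]  # hypothetical unknown source parameters
    theta = [0.0, 0.0]
    for k in range(steps):
        # Two distinct frequencies make the regressor persistently exciting.
        phi = [math.sin(0.1 * k), math.cos(0.37 * k)]
        y = sum(t * p for t, p in zip(true_theta, phi))  # noise-free measurement
        theta = normalized_update(theta, phi, y)
    return theta
```

With a constant gain `gamma` between 0 and 2, the normalization bounds the step size at every sample, which is the usual reason for using it in sampled designs.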
Systems and Control (EESS)
On the Nonexistence of Continuous Immersions for Discrete-time Systems
Understanding when linear immersions of nonlinear dynamical systems exist is important since such immersions allow us to leverage the rich tools of linear system theory to analyze nonlinear dynamics. Recently, Liu et al. (2023) showed that continuous-time dynamical systems that admit countably many but more than one omega-limit sets cannot be immersed into finite dimensional linear systems with a one-to-one and continuous mapping. In this paper, we extend these results to discrete-time dynamics and show that similar obstructions exist also in discrete time. We further consider a generalization involving alpha-limit sets. Several examples are provided to demonstrate the results.
comment: Copyright 2026 the authors. This work has been accepted to IFAC 2026 for publication under a Creative Commons License CC-BY-NC-ND
CoCo-InEKF: State Estimation with Learned Contact Covariances in Dynamic, Contact-Rich Scenarios
Robust state estimation for highly dynamic motion of legged robots remains challenging, especially in dynamic, contact-rich scenarios. Traditional approaches often rely on binary contact states that fail to capture the nuances of partial contact or directional slippage. This paper presents CoCo-InEKF, a differentiable invariant extended Kalman filter that utilizes continuous contact velocity covariances instead of binary contact states. These learned covariances allow the method to dynamically modulate contact confidence, accounting for more nuanced conditions ranging from firm contact to directional slippage or no contact. To predict these covariances for a set of predefined contact candidate points, we employ a lightweight neural network trained end-to-end using a state-error loss. This approach eliminates the need for heuristic ground-truth contact labels. In addition, we propose an automated contact candidate selection procedure and demonstrate that our method is insensitive to their exact placement. Experiments on a bipedal robot demonstrate a superior accuracy-efficiency tradeoff for linear velocity estimation, as well as improved filter consistency compared to baseline methods. This enables the robust execution of challenging motions, including dancing and complex ground interactions -- both in simulation and in the real world.
comment: RSS 2026
A Prototyping Framework for Distributed Control of Multi-Robot Systems
This paper presents a prototyping framework for distributed control of multi-robot systems, aimed at bridging theory and practical testing of distributed optimization algorithms. Using the Single Program, Multiple Data (SPMD) paradigm, the framework emulates distributed control on a single computer, with each core running the same algorithm using local states and neighbour-to-neighbour communication. We demonstrate the framework on a four-quadrotor position-swapping task using a non-cooperative game-theoretic distributed algorithm. Computational time and trajectory data are compared across the supported dynamics levels: a point-mass model, a high-fidelity quadrotor model, and an experimental hardware testbed using Crazyflie quadcopters. The results show that the framework provides a low-cost and accessible approach for validating distributed algorithms.
comment: Accepted at IFAC World Congress 2026
Learning Developmental Scaffoldings to Guide Self-Organisation
From subcellular structures to entire organisms, many natural systems generate complex organisation through self-organisation: local interactions that collectively give rise to global structure without any blueprint of the outcome. Yet a significant portion of the information driving such processes is not produced by self-organisation itself; instead, it is often offloaded to the initial conditions of the system. Biological development is a prime example, where maternal pre-patterns encode positional and symmetry-breaking information that scaffolds the self-organising process. From maternal morphogen gradients in early embryogenesis to tissue-level morphogenetic pre-patterns guiding organ formation, this transfer of information to initial conditions, analogous to a memory-compute trade-off in computational systems, is a fundamental part of developmental processes. In this work, we study this offloading phenomenon by introducing a model that jointly learns both the self-organisation rules and the pre-patterns, allowing their interplay to be varied and measured under controlled conditions: a Neural Cellular Automaton (NCA) paired with a learned coordinate-based pattern generator (SIREN), both trained simultaneously to generate a set of patterns. We provide information-theoretic analyses of how information is distributed between pre-patterns and the self-organising process, and show that jointly learning both components yields improvements in robustness, encoding capacity, and symmetry breaking over purely self-organising alternatives. Our analysis further suggests that effective pre-patterns do not simply approximate their targets; rather, they bias the developmental dynamics in ways that facilitate convergence, pointing to a non-trivial relationship between the structure of initial conditions and the dynamics of self-organisation.
comment: 10 pages, 5 figures. Under review
Robust Quadcopter Motion Control Using Output Feedback
The study addresses the problem of quadcopter motion control using output feedback. By applying a geometric approach, the quadcopter model is transformed into a normal form with a time-varying gain coefficient, which is subsequently made stationary through double integration of the control input. A robust output feedback control law is synthesised based on the extended observer method.
Behavioral Data-Driven Optimal Trajectory Generation for Rotary Cranes
With the growth of the construction industry and the global shortage of skilled labor, the automation of crane control has become increasingly important for safe and efficient operations. A central challenge in automatic crane control is the reduction of load oscillations during motion, which is primarily addressed through appropriate slewing trajectories. In this context, classical model-based control methods rely on accurate dynamical models and expert tuning, and often struggle to meet safety and precision requirements, while many learning-based approaches require large data sets and significant computational resources. This paper proposes a behavioral data-driven framework for generating open-loop slewing trajectories for rotary cranes that suppress load sway while reducing operation time and energy consumption. The approach builds on Willems' fundamental lemma and its generalizations, to bypass explicit system modeling and operate directly on measured input-output data. A practical workflow is presented in this paper to reduce the need for expert knowledge. Despite the underactuated nature of the crane dynamics, the method identifies a nonparametric representation of the system behavior and generates smooth, optimal trajectories using limited data and convex optimization. The proposed trajectory generation method is validated on a laboratory crane setup and compared against an established model-based approach, achieving up to 35% reduction in load sway, 43% reduction in tracking error, and 50% reduction in travel time.
Radioactive Source Seeking using Bayesian Optimisation with Movement Penalty
The use of mobile robotics in radioactive source seeking has become an important part of modern radiation-safety practices, supporting timely mitigation of contamination risks and helping protect public health. However, measuring radiation is often time-consuming, rendering traditional gradient-based source-seeking methods less effective due to lower sample efficiency. This paper proposes a sample-efficient Bayesian-Optimisation source-seeking strategy that utilises a heteroscedastic Gaussian process surrogate to balance exploration and exploitation. Excessive inter-sample travel is discouraged through a movement switching cost. The strategy is shown to generate sublinear regret in the source-seeking task, while simulations demonstrate its effectiveness in localising radioactive sources.
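The trade-off between informative samples and travel can be sketched as an acquisition rule that subtracts a distance-proportional cost from an upper-confidence-bound score. The abstract does not give the paper's acquisition function or switching cost, so the function name, `beta`, `move_cost`, and the precomputed posterior `mean` and `std` lists are all assumptions made for illustration.

```python
import math

def next_sample(candidates, mean, std, current, beta=2.0, move_cost=0.5):
    """Return the index of the candidate maximizing UCB minus a travel penalty.

    mean[i] and std[i] stand in for a surrogate's posterior at candidates[i];
    std may differ per point, as with a heteroscedastic model.
    """
    def score(i):
        ucb = mean[i] + beta * std[i]                # optimism under uncertainty
        travel = math.dist(candidates[i], current)   # Euclidean travel distance
        return ucb - move_cost * travel
    return max(range(len(candidates)), key=score)
```

For example, with candidates `[(0, 0), (1, 0), (5, 0)]`, means `[1.0, 1.2, 1.5]`, equal standard deviations of `0.1`, and the robot at `(0, 0)`, the default penalty keeps the robot at index 0, while `move_cost=0.0` sends it to the distant index 2.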
An integration-free approach for particle flow filtering
Log-homotopy particle flow filters realize nonlinear Bayesian estimation by continuously migrating samples from the prior to the posterior distribution. This transport is governed by a pseudo-time ordinary differential equation (ODE). A major practical challenge of these filters is the need for numerical integration, which suffers from high computational cost and susceptibility to stiffness. This paper develops an exact, integration-free closed-form solution for the exact Daum--Huang (EDH) deterministic particle flow under vector linear Gaussian measurements. By transforming the ODE into a specific eigenspace, closed-form algebraic expressions are derived for both the homogeneous state transition matrix and the inhomogeneous forcing term. We prove that this analytic solution is mathematically equivalent to the exact Kalman measurement update. Furthermore, we demonstrate how this closed-form evaluation can be embedded within an $N$-step slicing method, providing a stiffness-mitigating, integration-free particle update for highly nonlinear measurement models.
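The claimed equivalence with the Kalman update can be checked by hand in the scalar linear Gaussian case. The sketch below uses the standard EDH drift from the particle flow literature and a closed form worked out here for one dimension; it does not reproduce the paper's eigenspace construction for vector measurements, and the function names are illustrative.

```python
import math

def edh_closed_form(x0, m, p, h, r, z):
    """Scalar EDH flow evaluated at pseudo-time 1 without numerical integration.

    Prior N(m, p), measurement z = h * x + noise with variance r.
    """
    s = p * h * h / r
    post_mean = m + (p * h / (p * h * h + r)) * (z - h * m)  # Kalman update
    # Deviations from the mean contract by 1/sqrt(1 + s), so the particle
    # spread matches the posterior variance p / (1 + s).
    return post_mean + (x0 - m) / math.sqrt(1.0 + s)

def edh_euler(x0, m, p, h, r, z, steps=20000):
    """The same flow integrated with explicit Euler, for comparison only."""
    x, dlam = x0, 1.0 / steps
    for k in range(steps):
        lam = k * dlam
        a = -0.5 * p * h * h / (lam * p * h * h + r)
        b = (1 + 2 * lam * a) * ((1 + lam * a) * p * h * z / r + a * m)
        x += dlam * (a * x + b)
    return x
```

A particle started at the prior mean lands exactly on the Kalman posterior mean, and the Euler result approaches the closed form as `steps` grows.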
Hybrid Metaheuristic Optimization of Distributed Control System Hardware Architecture with Model-Based Verification
Large-scale chemical plants rely on distributed process control systems (PCS) comprising numerous processing units, communication modules, and I/O devices interconnected via industrial networks. The design of a cost-efficient and reliable hardware architecture under partial uncertainty in plant parameters remains a challenging combinatorial optimization problem. This paper proposes a formal model for distributed control system hardware architecture synthesis. A hybrid ant colony-based metaheuristic framework is developed to construct feasible hierarchical architectures. The proposed approach is validated on a large-scale sulfuric acid plant control system case study. Plant parameters are identified from operational data, system stability is analyzed, and a controller synthesis is performed based on the optimized architecture. The results demonstrate the feasibility of the approach and confirm that the obtained architecture satisfies structural and dynamic performance requirements.
comment: Accepted for IFAC World Congress 2026
Addressing Terminal Constraints in Data-Driven Demand Response Scheduling
Electrified chemical processes are incentivized by exposure to time-varying electricity markets to operate flexibly, but participating in demand response schemes can require satisfying terminal constraints over long horizons. Specifically, terminal constraints may be required when computing optimal schedules in order to preserve dynamic stability. Model-based optimization methods are computationally costly, and data-driven scheduling via reinforcement learning (RL) faces severe credit-assignment challenges. We integrate Goal-Space Planning (GSP) with Deep Deterministic Policy Gradient (DDPG), using learned temporally abstract models over discrete subgoals to propagate value across extended horizons. Using a simulated air separation benchmark, we demonstrate the proposed approach improves sample efficiency over standard DDPG while satisfying terminal storage constraints, mitigating myopic control behavior.
comment: Accepted to IFAC World Congress 2026
Flexibility-Aware Framework for Efficient Planner-Initiated Siting of Data Center
Explosive growth in energy-intensive AI data centers is outstripping the pace of power grid interconnection and transmission expansion. While operational flexibility has been proposed to mitigate this stress, existing processes are often reactive and evaluate projects only after they enter a multi-year interconnection queue. To address this, we introduce a planner-initiated siting framework that integrates (i) reliability-gated screening, (ii) system-wide market-impact assessment under standardized flexibility envelopes (firm, pause, and shift), and (iii) entropy-weighted multi-criteria scoring to produce ranked, pre-certified catalogues of interconnection-ready locations. Applied to a synthetic 2,000-bus Texas power system, the framework demonstrates that operational flexibility expands the siting frontier by 9-17% at 1 GW and 19-21% at 2 GW compared to firm operation. Median all-hour average prices remain essentially unchanged (USD 24.32/MWh for the 2 GW cases), and the shift envelope attenuates peak-hour price dispersion by approximately 3.4% with minimal side effects during off-peak hours. Utilizing pre-certified envelopes to bypass major transmission reinforcements, this workflow enables first energization in 12-18 months, a conservative reduction of 3.5-4 years versus the conventional 5-8 year project-led process. This technology-agnostic framework provides a proactive decision-making tool for system operators and regulators to fast-track large flexible loads while preserving grid reliability and market stability.
comment: Accepted for publication in Nature Communications. 33 pages, 4 figures
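The entropy-weighted scoring step can be illustrated with the textbook entropy-weight method, in which criteria that vary more across candidate sites receive more weight. The abstract does not specify the paper's criteria, normalization, or aggregation, so the sketch assumes a small matrix of non-negative "larger is better" scores and hypothetical function names.

```python
import math

def entropy_weights(matrix):
    """Entropy-weight method: rows are candidate sites, columns are criteria."""
    n, m = len(matrix), len(matrix[0])
    raw = []
    for j in range(m):
        col = [row[j] for row in matrix]
        total = sum(col)
        shares = [v / total for v in col]
        entropy = -sum(q * math.log(q) for q in shares if q > 0) / math.log(n)
        raw.append(max(0.0, 1.0 - entropy))  # more dispersion gives more weight
    norm = sum(raw)
    if norm == 0.0:                          # every criterion is uniform
        return [1.0 / m] * m
    return [w / norm for w in raw]

def rank_sites(matrix):
    """Return site indices ordered from best to worst weighted score."""
    w = entropy_weights(matrix)
    scores = [sum(wj * v for wj, v in zip(w, row)) for row in matrix]
    return sorted(range(len(matrix)), key=lambda i: -scores[i])
```

A criterion that takes the same value at every site carries no ranking information and ends up with a weight of approximately zero.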
Dynamic Event-Triggered Control of Discrete-Time Nonlinear Systems based on Difference-Algebraic Representations
This paper addresses the dynamic event-triggered control for a class of discrete-time nonlinear systems described by a difference-algebraic representation (DAR), using a gain-scheduled controller. An outstanding aspect of the proposed method is the incorporation of information about the system's nonlinearities into the control law and the trigger function. The proposed event-triggered mechanism also incorporates information on the asynchronous terms induced by the event-based sampling. All these ingredients enable the derivation of a less conservative co-design condition for the simultaneous design of the gain-scheduled control law and the dynamic triggering mechanism to ensure the asymptotic stability of the closed-loop system. An estimate of the region of attraction of the origin of the closed-loop system is obtained to guarantee the closed-loop system's operation within the domain of validity of the DAR. Then, an optimization problem is formulated to reduce the number of events and enlarge the estimated region of attraction. Finally, the effectiveness of the proposed condition is illustrated by a numerical example.
comment: Accepted to the IFAC World Congress 2026
SeaVis: Modeling and Control of a Remotely Operated Towed Vehicle for Seabed Visualization and Mapping
High-resolution seafloor mapping necessitates stable and precise positioning for underwater robots. This paper introduces a novel mathematical model for SeaVis remotely operated towed vehicles (ROTVs) and develops a gain-scheduled linear-quadratic regulator (LQR) for robust depth and attitude control. We validate the approach in a high-fidelity simulation, benchmarking the LQR against a conventional PID controller over a challenging seabed profile. The presented results demonstrate the LQR's superior performance, with significantly enhanced robustness to disturbances, greater control efficiency, and substantially reduced flap actuation. The gain scheduling also confirms the controller's effectiveness across the full operational velocity range. The complete simulation environment and controller are open-sourced.
comment: Accepted at IEEE/ASME AIM 2026
Distributionally Robust Model Predictive Control for Virtual Power Plants
This paper presents a distributionally robust model predictive control (DR-MPC) framework for optimal Virtual Power Plant (VPP) operation under electricity price uncertainty. A unified VPP model is formulated that captures the interaction between buildings, battery storage, and renewable generation, all influenced by exogenous weather and market signals. The proposed approach integrates data-driven forecasting with quantile-based uncertainty quantification to construct time-varying Wasserstein ambiguity sets that adapt to forecast dispersion and distributional shifts. This yields a tractable DR-MPC formulation that incorporates predictive distribution information directly into real-time decision making. The method is evaluated using real weather and market data from a Nordic case study across two seasonal scenarios. The results show that DR-MPC improves economic performance relative to standard forecast-based MPC when the ambiguity radius is chosen appropriately, with consistent gains of up to 0.8% for small radii across both seasonal scenarios. Larger radii become overly conservative and reduce revenue, underscoring the importance of proper radius selection. These findings demonstrate the practical value of distributionally robust optimization for uncertainty-aware VPP operation.
comment: 7 pages, 5 figures, submitted to IFAC World Congress 2026
Admittance-Guided Inverter Dispatch Command Manipulation Attack: A Grid Stability-Oriented Approach
The high penetration of voltage source converters in modern smart microgrids enhances operational flexibility while introducing complex cyber-physical vulnerabilities. Existing cyber-attack studies either require detailed knowledge of system topology and controller dynamics or depend on repeated online interactions, which may compromise practicality by generating operationally infeasible or limit-violating commands. This article investigates a dispatch command manipulation attack and develops an admittance-guided framework to identify the vulnerable inverter and the worst-case dispatch command that most severely degrades system stability. A compromised inverter is utilized to inject controlled harmonic perturbations for sparse admittance measurement, and a physics-informed neural network is then employed to reconstruct the operating-point-dependent admittance of target inverters over the feasible dispatch region. Based on the reconstructed admittance, a stability-margin-oriented optimization is formulated to locate the most vulnerable inverter and the corresponding worst-case dispatch command. Controller hardware-in-the-loop experiments on a five-inverter microgrid demonstrate that the identified command can drive the system into severe sub-synchronous oscillations while remaining within nominal dispatch bounds, highlighting the need for stability-aware command screening beyond static limit checking.
Quantifying Cyber-Vulnerability in Power Electronics Systems via an Impedance-Based Attack Reachable Domain
Power electronics systems are increasingly exposed to cyber threats due to their integration with digital controllers and communication networks. However, an attacker-oriented metric is still lacking to quantify the extent to which a node can be pushed toward instability within a privilege-constrained action space. This letter proposes an impedance-based Attack Reachable Domain (ARD) framework that maps feasible adversarial actions to critical-eigenvalue migration through impedance reshaping. Based on the ARD, an Attack Penetration Index is defined to quantify node-level cyber-vulnerability by jointly characterizing the penetration of the nominal stability margin and the accessibility of successful destabilizing attacks within a privilege-constrained action space. To make the proposed assessment computable when inverter models are unavailable, a practical gray-box workflow is further established by integrating existing impedance identification and differentiable surrogate tools. Case studies on a 4-bus system and a modified IEEE 39-bus system show that coordinated cross-layer manipulations are markedly more damaging than isolated single-layer attacks, and that the proposed metric reveals vulnerability patterns that cannot be inferred from grid-strength indicators.
Fully Dynamic Rebalancing in Dockless Bike-Sharing Systems via Deep Reinforcement Learning
This paper proposes a fully dynamic Deep Reinforcement Learning (DRL) method for rebalancing dockless bike-sharing systems, overcoming the limitations of periodic, system-wide interventions. We model the service through a graph-based simulator and cast rebalancing as a Markov decision process. A DRL agent routes a single truck in real time, executing localized pick-up, drop-off, and charging actions guided by spatiotemporal criticality scores. Experiments on real-world data show significant reductions in availability failures with a minimal fleet size, while limiting spatial inequality and mobility deserts. Our approach demonstrates the value of learning-based rebalancing for efficient and reliable shared micromobility.
comment: 6 pages, 5 figures, 1 table, accepted at the 23rd IFAC World Congress, Busan, South Korea, Aug. 23-26, 2026. Open invited track 9-131: "Control and Optimization for Smart Cities"
A Novel Schur-Decomposition-Based Weight Projection Method for Stable State-Space Neural-Network Architectures
Building black-box models for dynamical systems from data is a challenging problem in machine learning, especially when asymptotic stability guarantees are required. In this paper, we introduce a novel stability-ensuring and backpropagation-compatible projection scheme based on the Schur decomposition for the state matrix of linear discrete-time state-space layers, as well as an alternative pre-factorized formulation of the methodology. The proposed methods dynamically project the quasi-triangular factor of the state matrix's real Schur decomposition onto its nearest stable peer, ensuring stable dynamics with minimal overparameterization. Experiments on synthetic linear systems demonstrate that the method achieves accuracy and convergence rates comparable to those of state-of-the-art stable-system identification techniques, despite a marginal increase in computational complexity. Furthermore, the lower weight count facilitates convergence during training without sacrificing accuracy in stacked neural-network architectures with static nonlinearities targeting real-world datasets. These results suggest that the Schur-based projection provides a numerically robust framework for identifying complex dynamics on par with the state of the art while satisfying strict asymptotic-stability requirements.
comment: 32 pages, 13 figures. Source code at https://codeberg.org/sergiovaneg/SchurSS
Automated Curriculum Design for High-dimensional Human Motor Learning
Designing effective practice schedules for high-dimensional motor learning tasks remains a challenge, especially when skill states are unobservable and task performance may not reflect the true learning. We propose an automated curriculum design framework that combines a human motor learning model and personalized real-time skill estimation with Stochastic Nonlinear Model Predictive Control in de novo (novel) motor learning paradigms. We validated our framework both through simulations and human-subject studies (N = 36) using a hand exoskeleton. Our proposed approach accelerates skill acquisition by $\sim 23\%$ and $\sim 17\%$ when compared to a random curriculum and a performance heuristics-based curriculum, respectively. These significant gains in learning efficiency highlight the potential of model-based, individualized curricula for motor rehabilitation and complex skill training.
Randomized Atomic Feature Models for Physics-Informed Identification of Dynamic Systems
We present a physics-informed framework for system identification based on randomized stable atomic features. Impulse responses are represented as random superpositions of stable atoms, namely damped complex exponentials associated with poles sampled inside a prescribed disk. Identification is then cast as a convex regularized least-squares problem with optional linear, second-order-cone, and KYP constraints. The approach generalizes random Fourier and random Laplace features to the damped, nonstationary regime relevant to engineering systems while retaining modal interpretability and scalable finite-dimensional computation. The main analytic point is an operator-theoretic Disk-Bochner viewpoint: positive measures over stable poles generate positive-definite kernels with a radius-dependent shift defect, while a converse scalar disk moment representation for an arbitrary kernel is characterized by subnormality of the canonical shift. We prove this statement, establish an RKHS-to-l1 embedding, show that sampled poles induce a valid finite atomic gauge, discuss random-feature convergence, and state sparse-recovery guarantees conditionally on the restricted-eigenvalue properties of the realized disk-Vandermonde or input-output design matrix. We also connect the normalized transfer function problem to Nevanlinna-Pick interpolation and LFT set-membership. The framework directly encodes stability margins, modal localization, DC-gain bounds, monotonicity, passivity, relative degree, settling-time targets, and time/frequency-domain error bounds. Numerical comparisons illustrate how physically meaningful priors can compensate for poor excitation and improve constrained impulse-response recovery in an under-informative data setting.
comment: Extended version of the conference paper submitted for IFAC World Congress, 2026
Energy Management for Solar-Powered Electric-Bus Charging Station: A Data-Driven Method
This paper presents a flexible energy management system (EMS) for an electric bus charging station (EBCS) that integrates renewable generation, energy storage, and electric bus (EB) charging while accounting for uncertainties in solar PV output, electricity prices, and EB arrival/departure state of charge. A data-driven polynomial chaos expansion surrogate is developed from a limited set of uncertainty samples, and a nonparametric inference method is used to enrich the input data when historical data is limited. Case studies on a solar-powered EBCS with 20 EBs demonstrate the effectiveness of the proposed EMS and data-driven method.
comment: 5 pages, 4 figures. To appear in the IEEE PES General Meeting 2026
Action-Conditioned Risk Gating for Safety-Critical Control under Partial Observability
Many safety-critical control problems are modeled as risk-sensitive partially observable Markov decision processes, where the controller must make decisions from incomplete observations while balancing task performance against safety risk. Although belief-space planning provides a principled solution, maintaining and planning over beliefs can be computationally costly and sensitive to model specification in practical domains. We propose a lightweight risk-gated reinforcement learning approximation for risk-sensitive control under partial observability. The method constructs a compact finite-history proxy state and learns an action-conditioned predictor of near-term safety violation. This predicted candidate-action risk is used in two complementary ways: as a risk penalty during value learning, and as a decision-time gate that interpolates between optimistic and conservative ensemble value estimates. As a result, low-risk actions are evaluated closer to reward-seeking estimates, while high-risk actions are evaluated more conservatively. We evaluate the approach in two safety-critical partially observable domains: automated glucose regulation and safety-constrained navigation. Across adult and adolescent glucose-control cohorts, the method improves overall glycemic tradeoffs and substantially reduces runtime relative to a belief-space planning baseline. On Safety-Gym navigation benchmarks, it achieves a more favorable reward-cost balance than unconstrained RL and several standard safe-RL baselines. These results suggest that action-conditioned near-term risk can provide an effective local signal for approximate risk-sensitive POMDP control when full belief-space planning is impractical.
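The decision-time gate can be sketched as a convex blend of two ensemble statistics, weighted by the predicted risk of each candidate action. The abstract does not state how the optimistic and conservative estimates are formed or how the interpolation is parameterized; taking the ensemble maximum and minimum with a linear blend, and all names below, are assumptions.

```python
def gated_value(q_ensemble, risk):
    """Blend optimistic and conservative ensemble estimates by risk in [0, 1]."""
    risk = min(1.0, max(0.0, risk))
    optimistic, conservative = max(q_ensemble), min(q_ensemble)
    # Low risk trusts the reward-seeking estimate; high risk trusts the cautious one.
    return (1.0 - risk) * optimistic + risk * conservative

def select_action(q_by_action, risk_by_action):
    """Pick the action with the highest risk-gated value."""
    return max(q_by_action,
               key=lambda a: gated_value(q_by_action[a], risk_by_action[a]))

def penalized_reward(reward, risk, penalty=1.0):
    """Training-time counterpart: subtract a risk penalty from the reward."""
    return reward - penalty * risk
```

With ensemble estimates `{"fast": [5.0, 1.0], "slow": [3.0, 2.5]}`, assigning high risk to `"fast"` makes the gate prefer `"slow"`, whereas zero risk for both actions restores the optimistic choice `"fast"`.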
On the (non-)resilience of encrypted controllers to covert attacks
The security of networked control systems (NCS) is receiving increasing attention from both cyber-security and system-theoretic perspectives. The former focuses on classical IT security goals such as confidentiality, integrity, and availability of process data, while the latter investigates tailored attacks (and detection schemes), including covert and zero-dynamics attacks. Confidentiality in control systems can, for instance, be achieved by securely outsourcing the evaluation of the controller to third-party platforms, such as cloud services. The underlying technology enabling such secure computation often is homomorphic encryption (HE). Recent works in encrypted control have proposed modifications to underlying HE schemes to achieve not only confidentiality but also resilience to certain types of integrity attacks. While extensions in this direction are desirable in principle, we show that the integrity problem in encrypted control cannot be solved by public-key HE schemes alone due to their inherent malleability. In other words, the same homomorphisms that enable encrypted control in the first place can be leveraged not only constructively but also destructively. More precisely, we demonstrate that NCS are vulnerable to covert attacks, even when encrypted control is employed. Remarkably, this remains possible without knowledge of an unencrypted model. Yet, resilience to such attacks can still be achieved through complementary techniques. We present an approach based on verifiable computation that integrates with modern homomorphic cryptosystems and is asymptotically secure while incurring no communication overhead.
comment: Extended version of a paper presented at the IFAC World Congress 2026
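The malleability argument can be made concrete with a toy additively homomorphic scheme. The sketch below uses textbook Paillier with tiny, insecure primes purely for illustration; the abstract does not say which cryptosystem the paper analyzes, and `covert_shift` is a hypothetical name for an attacker who holds only the public modulus.

```python
import math
import random

# Toy Paillier parameters (insecure, illustration only).
P, Q = 47, 59
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)                               # valid for generator g = N + 1

def encrypt(m):
    """Public-key encryption: needs only N, so anyone can call it."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (pow(1 + N, m, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    """Private-key decryption using LAM and MU."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N

def covert_shift(c, delta):
    """Add delta to the hidden plaintext without any secret key."""
    return (c * encrypt(delta)) % N2
```

Decrypting `covert_shift(encrypt(123), 77)` yields 200: the homomorphism that lets a cloud controller add encrypted signals also lets an attacker bias them, and the result is still a valid ciphertext.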
PreFT: Prefill-only finetuning for efficient inference
Large language models can now be personalised efficiently at scale using parameter efficient finetuning methods (PEFTs), but serving user-specific PEFTs harms throughput, even with specialised kernels and memory management techniques. This is because, theoretically and empirically, a mismatch exists between prefill (processing a large number of tokens at once) and decode (generating a single token autoregressively): the latter has far lower throughput when serving multiple adapters. Rather than optimising performance relative to parameter count, for efficient multi-adapter serving, we instead ought to optimise performance relative to serving throughput. We therefore propose PreFT (Prefill-only Finetuning), wherein we only apply the adapter to prefill tokens and discard it afterwards. PreFT significantly increases throughput with minimal effect on performance. We develop and release an efficient implementation of two prefill-only PEFTs, LoRA and ReFT, on the vLLM inference engine. We first show that serving multi-user PreFTs is more efficient than traditional PEFTs ($1.9\times$ the throughput when serving $512$ adapters on Llama 3.1 70B). Then, we compare the performance of prefill-only vs. all-token adapters on a variety of supervised finetuning and reinforcement learning tasks with LMs at varying scales. On SFT, we observe that the evaluation loss of PreFTs is higher than that of PEFTs, but this gap can be compensated by increasing rank with nearly no reduction in throughput. On RL, we consistently find that PreFTs approach parity with standard PEFTs. Together, this work validates prefill-only adaptation of LLMs as a more favourable accuracy-throughput tradeoff than existing PEFTs for personalised serving.
Optimizing Chilled Water Systems with Cooling Towers via Virtual Power Metrics and Extremum-Seeking Control
This paper presents an extremum seeking control (ESC) method for cooling tower fans to minimize overall power consumption of a chilled water plant system. Simulation studies across different climate locations demonstrate energy savings of approximately 15% compared to conventional control during summer conditions. This paper also proposes a virtual power meter (VPM) to enable use of the strategy in systems that lack physical power meters. Validation tests for the VPMs against physical meters showed good accuracy with a correlation of 96.11% and a normalized error of 5.11%. Coupled with the VPM, the proposed ESC control solution can be implemented on systems using typically available sensor measurements without the need for additional instrumentation.
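A perturbation-based extremum seeking loop of the general kind named above can be sketched in a few lines. Everything specific here is an assumption for illustration: the quadratic `plant_power` curve stands in for the real chilled water plant, and the dither amplitude, frequencies, and gain are hypothetical values, not the paper's tuning. The loop adds a small sinusoidal dither to the fan command, high-pass filters the measured power, multiplies by the dither to estimate the local gradient, and integrates downhill.

```python
import math

def plant_power(u):
    """Hypothetical total plant power (kW) versus fan speed fraction u."""
    return 50.0 + 40.0 * (u - 0.6) ** 2

def extremum_seek(power, u0, t_end=60.0, dt=0.01,
                  dither_amp=0.05, dither_freq=5.0,
                  highpass_freq=0.5, gain=0.1):
    """Perturbation-based ESC that drives u toward a minimum of power(u)."""
    u_hat = u0
    y_low = power(u0)             # low-pass state, seeded to avoid a start-up kick
    for k in range(int(t_end / dt)):
        s = math.sin(dither_freq * k * dt)
        y = power(u_hat + dither_amp * s)   # measured (or virtually metered) power
        y_low += dt * highpass_freq * (y - y_low)
        y_high = y - y_low                  # high-pass: remove the slowly varying mean
        grad_est = y_high * s               # demodulation: averages to about (amp / 2) * dP/du
        u_hat -= dt * gain * grad_est       # integrate downhill
    return u_hat

u_final = extremum_seek(plant_power, u0=0.2)
```

The loop needs only a scalar power signal, which is why a virtual power meter can substitute for a physical one in this kind of scheme.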
A Variational Lagrangian Framework for Log-Homotopy Particle Flow Filters
The log-homotopy particle flow filter resolves the Bayesian update by transporting particles along a continuous trajectory in pseudo-time. However, the governing partial differential equation for the flow velocity is fundamentally underdetermined, admitting an infinite family of valid solutions. In this work, we regard the particle flow as the motion of a pressureless inviscid fluid. We define a Lagrangian action based on the kinetic energy of the system, subject to the constraints imposed by the continuity equation and the log-homotopy evolution. By applying the principle of least action, we obtain the Euler--Lagrange equations for the optimal flow, which yields an irrotational potential flow structure. We show that this variational framework yields a coupled Hamilton--Jacobi equation structurally isomorphic to Madelung's hydrodynamic formulation of quantum mechanics. In this analogy, the log-homotopy constraint acts as a generalized quantum potential that generates the force required to guide the probability fluid along the exact Bayesian update path. Finally, we derive the material acceleration of the flow, shifting the formulation from a kinematic to a dynamical description. This perspective could enable the application of higher-order symplectic integrators for improved numerical stability and provide a physics-based metric for adaptive stiffness detection in high-dimensional filtering.
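To make the underdetermination concrete, the setup can be written out in the notation commonly used for particle flow filters. The symbols below ($g$ for the prior, $h$ for the likelihood, $K(\lambda)$ for the normalizer, $f$ for the flow velocity, $\phi$ for the velocity potential) are assumptions chosen for illustration and do not appear in the abstract.

```latex
% Log-homotopy between prior (\lambda = 0) and posterior (\lambda = 1)
\log p(x,\lambda) = \log g(x) + \lambda \log h(x) - \log K(\lambda),
\qquad \lambda \in [0,1]

% Continuity equation for particles moving as dx/d\lambda = f(x,\lambda)
\frac{\partial p(x,\lambda)}{\partial \lambda}
  + \nabla \cdot \bigl( p(x,\lambda)\, f(x,\lambda) \bigr) = 0

% Irrotational (potential) flow ansatz
f(x,\lambda) = \nabla \phi(x,\lambda)
```

For a state $x \in \mathbb{R}^d$, the continuity equation is a single scalar constraint on the $d$ components of $f$, which is why infinitely many flows satisfy it; restricting to $f = \nabla \phi$ leaves one unknown scalar field.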
Coordinated Trajectory Control Algorithm for Quadcopter Motion along a Smooth Spatial Trajectory
A complete model of the motion of a quadcopter along a smooth spatial trajectory is presented. Based on the model, a robust algorithm is proposed for controlling a quadcopter using measurements of linear coordinates and yaw angle. By introducing additional integrators, a dynamic control algorithm with a simplified controller tuning methodology is obtained. The control law is synthesized within the geometric approach, and its stability is proven. A realizable output-feedback version using an extended observer is also given. The results enable coordinated trajectory following in three-dimensional space despite unmeasured disturbances and incomplete state information.
Lie Generator Networks Extract EIS-Grade Battery Diagnostics from Pulse Relaxation Data
Electrochemical impedance spectroscopy (EIS) is the most informative diagnostic for lithium-ion batteries: its frequency-resolved spectra decompose cell behavior into distinct electrochemical processes, revealing mechanism-specific degradation invisible to voltage and resistance measurements. Yet EIS requires dedicated hardware and minutes-long acquisitions incompatible with field deployment. Here we show that Lie Generator Networks (LGN), a structure-preserving identification framework, extract electrochemical time constants from 60 seconds of post-pulse voltage relaxation, data that battery management systems already collect, that encode the same diagnostic and prognostic information as impedance spectra. LGN learns the generator matrix of the relaxation dynamics with stability guaranteed by architecture, yielding time constants precise enough to resolve electrochemical variation that conventional curve fitting cannot detect from identical data. Across five datasets totaling over 850 cells, four institutions, and multiple chemistries, LGN tracks degradation with near-perfect rank correlation ($|ρ_s| = 0.999$), enables cross-validated reconstruction of full Nyquist spectra at 2% median error across 227 cells, predicts which capacity-matched cells fail first from three early diagnostics, and recovers Arrhenius activation energies with zero physics priors without retraining or cell-specific tuning. LGN requires no training data, no impedance hardware, and no chemistry-specific calibration, converting any existing relaxation pulse into an impedance-grade diagnostic. This enables real-time health monitoring, rapid second-life grading, production-line quality control, and physics-informed prognosis from minutes of measurement.
comment: 18 pages, 5 figures
Control Algorithms for Quadcopter Motion in Dynamic Positioning Mode
A complete model of quadcopter motion for the task of dynamic positioning at a specified point is derived. Based on this model, two control algorithms are proposed. The first one generalizes previously obtained results to the case of a varying yaw angle. The second control algorithm addresses the above problem using a simplified regulator tuning methodology.
Continuous-time Predictor-Based Subspace Identification with Hermite basis expansions
In this paper the problem of continuous-time subspace identification for Linear Time Invariant (LTI) systems is considered and a method which directly identifies a continuous-time state-space form is proposed. First, Hermite basis functions are used to project signals and obtain a finite number of Hermite coefficients. By exploiting recursive relations and time derivative properties of the Hermite basis functions, an expression of the derivative operator is obtained. The latter is then recursively applied, ensuring that the state-space matrices remain in continuous-time form and making the system suitable for the implementation of steps which are akin to those of the Predictor-Based Subspace IDentification (PBSID) method. This new method, hereby called the Hermite-Domain PBSID (HD-PBSID) method, has the further advantage of avoiding time-shifts by properly scaling the input and output signals. The performance of the proposed approach is illustrated in a simulation study aimed at showing the accuracy of the estimates and at comparing the HD-PBSID method and the Laguerre-projections based Continuous-Time PBSID (CT-PBSID) algorithm.
comment: 14 pages, 19 figures
Time-Varying Deep State Space Models for Sequences with Switching Dynamics
The identification and modeling of time-varying systems is a fundamental challenge in signal processing and system identification. To address this challenge, we propose a class of time-varying state-space model (SSM) based neural networks in which the neurons' states are governed by time-varying dynamics. The proposed model provides the learnable time-varying dynamics through a dictionary of basis functions, where each basis function evolves differently over time. We evaluate the proposed approach on both synthetic data from switching systems and a speech denoising task where real audio is corrupted with switching dynamics noise. The results show that the proposed time-varying model consistently outperforms its time-invariant counterparts while maintaining comparable computational complexity. Our investigations also reveal which aspects of the time-varying dynamics of the data most need to be captured by the proposed time-varying models, how the additional freedom provided by time-varying basis functions should be allocated across model components, and to what extent larger models can compensate for time-invariant limitations.
Designing Dense Satellite Clusters for Distributed Space-based Datacenters
Recent proposals for datacenters in sun-synchronous Low Earth Orbit rely on a large number of compute satellites formation-flying in dense clusters. Designing such satellite clusters requires optimizing the satellites' orbital geometry under several safety and operational constraints applied throughout the cluster's entire orbit. These constraints include guaranteeing a minimum inter-satellite spacing, obstruction-less solar power for every satellite, and that each satellite have a stable set of nearest neighbors with which it can maintain inter-satellite links (ISLs). In this work, we propose two main cluster orbital designs, parametrized by the minimum inter-satellite spacing $R_{min}$ and the cluster radius $R_{max}$: a planar cluster, and a 3D cluster. We show by construction and numerical analysis that both cluster orbital designs are consistent with the inter-satellite spacing, unobstructed sun-vector, and inter-satellite line of sight constraints. The proposed planar architecture is the most efficient packing of satellites in a plane for given $R_{min}$ and $R_{max}$ values, and our 3D architecture allows for the number of datacenter satellites to scale proportional to $(R_{max}/R_{min})^3$, an improvement over all previous LEO datacenter cluster designs. Finally, for a given satellite cluster, we formulate and solve an integer optimization problem that maps a VL2-like Clos network datacenter switching fabric onto the satellites and their corresponding set of feasible ISLs. We confirm that for both the planar and 3D architectures, there are sufficiently many permanently unobstructed ISLs within the cluster to replicate the switching fabric of terrestrial datacenters. We also examine the tradeoff between the number of ISLs each satellite can simultaneously sustain, and the corresponding number of cluster satellites that must be dedicated as aggregation and intermediate switches.
comment: 19 pages, 14 figures. Accepted to the 2026 AAS/AIAA Astrodynamics Specialist Conference
Remote State Estimation over a Wearing Channel: Information Freshness vs. Channel Aging
We study the remote estimation of a linear Gaussian system over a channel that wears out over time and with every use. The sensor can either transmit a fresh measurement in the current time slot, restore the channel quality at the cost of downtime, or remain silent. Frequent transmissions yield accurate estimates but incur significant wear on the channel. Renewing the channel too often improves channel conditions but results in poor estimation quality. What is the optimal timing to transmit measurements and restore the channel? This problem is formulated as a semi-Markov decision process (SMDP). We establish monotonicity properties of the optimal policy and propose structure-aware solution methods.
comment: This paper has been accepted for publication in IEEE Transactions on Automatic Control
Bluetooth Phased-array Aided Inertial Navigation Using Factor Graphs: Experimental Verification
Phased-array Bluetooth systems have emerged as a low-cost alternative for performing aided inertial navigation in GNSS-denied use cases such as warehouse logistics, drone landings, and autonomous docking. Basing a navigation system on commercial-off-the-shelf components may lower the barrier to entry for phased-array radio navigation systems, albeit at the cost of significantly noisier measurements and relatively short feasible range. In this paper, we compare robust estimation strategies for a factor graph optimisation-based estimator using experimental data collected from multirotor drone flight. We evaluate performance in loss-of-GNSS scenarios when aided by Bluetooth angular measurements, as well as range or barometric pressure.
comment: 6 pages, 5 figures, 2 tables. © 2026 the authors. This work has been accepted to IFAC for publication under a Creative Commons Licence CC-BY-NC-ND
Analytical PI Tuning for Second-Order Plants with Monotonic Response and Minimum Settling Time
This study presents two analytical closed-form PI controller tuning solutions for second-order plants with real poles, each achieving monotonic step response and minimum settling time. The first solution employs pole-zero cancellation, placing the controller zero at the slower plant pole and reducing the closed-loop dynamics to a critically damped second-order system. The second solution, applicable when the plant pole ratio is less than two, places all three closed-loop poles at a common location without cancelling any plant pole, yielding a closed-loop transfer function with a triple real pole and a zero. Despite retaining a closed-loop zero, this solution achieves strictly faster settling time than the pole-zero cancellation method in its region of applicability. The two solutions coincide at the boundary pole ratio of two and together form a continuous piecewise-analytical tuning covering the full range of plant pole ratios. This study further establishes that closed-loop transfer functions of the form a^n/(s + a)^n possess a maximum sensitivity Ms that is independent of the pole location a and depends solely on the order n, yielding universal robustness constants for each n. Numerical verification confirms the analytical results across multiple plant configurations.
comment: 7 figures
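The scale-invariance claim can be checked numerically. The sketch below assumes a unity-feedback loop in which the sensitivity function is $S(s) = 1 - T(s)$ with $T(s) = a^n/(s+a)^n$; the function name `max_sensitivity` and the frequency grid are illustrative choices. Substituting $s = j a u$ gives $S = 1 - 1/(1 + ju)^n$, which no longer contains $a$, so the peak of $|S|$ should depend on $n$ alone.

```python
import math

def max_sensitivity(a, n, points=20000):
    """Peak of |1 - a^n / (jw + a)^n| over a log-spaced frequency grid."""
    peak = 0.0
    for i in range(points + 1):
        u = 10.0 ** (-3.0 + 6.0 * i / points)   # normalized frequency w / a
        w = u * a
        T = (a / (1j * w + a)) ** n             # closed-loop response at s = jw
        peak = max(peak, abs(1.0 - T))
    return peak

# For each order n, evaluate Ms at three very different pole locations a.
ms_by_order = {n: [max_sensitivity(a, n) for a in (0.5, 3.0, 40.0)]
               for n in (2, 3, 4)}
```

For $n = 2$ the peak can also be found by hand: $|S|^2 = (4u^2 + u^4)/(1+u^2)^2$ is maximized at $u^2 = 2$, giving $M_s = 2/\sqrt{3} \approx 1.155$.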
Data-Based Control of Continuous-Time Linear Systems with Performance Specifications
The design of direct data-based controllers has become a fundamental part of control theory research in the last few years. In this paper, we consider three classes of data-based state feedback control problems for linear systems. These control problems are such that, besides stabilization, some additional performance requirements must be satisfied. First, we formulate and solve a trajectory-reference control problem, in which desired closed-loop trajectories are known and a controller that allows the system to closely follow those trajectories is computed. Then, the solution of the LQR problem for continuous-time systems is presented. Finally, we consider the case in which the precise position of the desired poles of the closed-loop system is known, and introduce a data-based variant of a robust pole-placement procedure. The applicability of the proposed methods is tested using numerical simulations.
comment: 11 pages, 1 figure
System Identification for Dynamic Modeling of Large Steering Angle Vehicles
This paper presents the modeling of autonomous vehicles with high maneuverability used in an experimental framework for educational purposes. Since standard bicycle models typically neglect wide steering angles, we develop modified planar bicycle models and combine them with both parametric and non-parametric identification techniques that progressively incorporate physical knowledge. The resulting models are systematically compared to evaluate the tradeoff between model accuracy and computational requirements, showing that physics-informed neural network models surpass the purely physical baseline in accuracy at lower computational cost.
Strategic Gaussian Signaling under Linear Sensitivity Mismatch
We analyze Stackelberg Gaussian signaling games where the encoder and decoder have a linear sensitivity mismatch. Unlike the standard additive-bias model, a sensitivity mismatch means the encoder prefers the decoder to track a linear transformation of the state rather than a shifted one. We derive the equilibrium structure for both noiseless (cheap-talk) and noisy signaling channels. In the noiseless case, the equilibrium admits a spectral characterization: the encoder transmits information only along eigenspaces associated with the negative eigenvalues of a mismatch matrix. In the noisy regime, we derive analytical thresholds for informative signaling, showing that communication collapses if the sensitivity mismatch or transmission cost exceeds a channel-dependent threshold.
comment: Accepted to the 23rd IFAC World Congress (2026). This is an extended version containing full proofs
On Data-based Nash Equilibria in LQ Nonzero-sum Differential Games
This paper considers data-based solutions of linear-quadratic nonzero-sum differential games. Two cases are considered. First, the deterministic game is solved and Nash equilibrium strategies are obtained by using persistently excited data from the multiagent system. Then, a stochastic formulation of the game is considered, where each agent measures a different noisy output signal and state observers must be designed for each player. It is shown that the proposed data-based solutions of these games are equivalent to known model-based procedures. The resulting data-based solutions are validated in a numerical experiment.
comment: 6 pages, 2 figures
Pythia: Exploiting Workflow Predictability for Efficient Agent-Native LLM Serving
As LLM applications grow more complex, developers are increasingly adopting multi-agent architectures to decompose workflows into specialized, collaborative components, introducing structure that constrains agent behavior and exposes useful semantic predictability. Unlike traditional LLM serving, which operates under highly dynamic and uncertain conditions, this structured topology enables opportunities to reduce runtime uncertainty, yet existing systems fail to exploit it, treating agentic workloads as generic traffic and incurring significant inefficiencies. Our analysis of production traces from an agent-serving platform and an internal coding assistant reveals key bottlenecks, including low prefix cache hit rates, severe resource contention from long-context requests, and substantial queuing delays due to suboptimal scaling. To address these challenges, we propose Pythia, a multi-agent serving system that captures workflow semantics through a simple interface at the serving layer, unlocking new optimization opportunities and substantially improving throughput and job completion time over state-of-the-art baselines.
When is cumulative dose response monotonic? Analysis of incoherent feedforward motifs
We study the monotonicity of the cumulative dose response (cDR) for a class of incoherent feedforward motifs (IFFM) systems with linear intermediate dynamics and nonlinear output dynamics. While the instantaneous dose response (DR) may be nonmonotone with respect to the input, the cDR can still be monotone. To analyze this phenomenon, we derive an integral representation of the sensitivity of cDR with respect to the input and establish general sufficient conditions for both monotonicity and non-monotonicity. These results reduce the problem to verifying qualitative sign properties along system trajectories. We apply this framework to four canonical IFFM systems and obtain a complete characterization of their behavior. In particular, IFFM1 and IFFM3 exhibit monotone cDR despite potentially non-monotone DR, while IFFM2 is monotone already at the level of DR, which implies monotonicity of cDR. In contrast, IFFM4 violates these conditions, leading to a loss of monotonicity. Numerical simulations indicate that these properties persist beyond the structured initial conditions used in the analysis. Overall, our results provide a unified framework for understanding how network structure governs monotonicity in cumulative input-output responses.
comment: This extended version has been submitted to IEEE CDC 2026
Distributed Adaptive Estimation with ISS Guarantees for Sensor Networks with Partially Unknown Source Dynamics
This paper studies distributed adaptive estimation over sensor networks with partially unknown source dynamics. We present parallel continuous-time and discrete-time designs in which each node runs a local adaptive observer and exchanges information over a directed graph. For both time scales, we establish stability of the network coupling operators, prove boundedness of all internal signals, and show convergence of each node's estimate to the source despite model uncertainty and disturbances. We further derive input-to-state stability (ISS) bounds that quantify robustness to bounded process noise. A key distinction is that the discrete-time design uses constant adaptive gains and per-step regressor normalization to handle sampling effects, whereas the continuous-time design does not. A unified Lyapunov framework links local observer dynamics with graph topology. Simulations on star, cyclic, and path networks corroborate the analysis, demonstrating accurate tracking, robustness, and scalability with the number of sensing nodes.
comment: This version has been accepted for publication at the 2026 IFAC World Congress
Simultaneous State Estimation and Online Model Learning in a Soft Robotic System
Operating complex real-world systems, such as soft robots, can benefit from precise predictive control schemes that require accurate state and model knowledge. This knowledge is typically not available in practical settings and must be inferred from noisy measurements. In particular, it is challenging to simultaneously estimate unknown states and learn a model online from sequentially arriving measurements. In this paper, we show how a recently proposed gray-box system identification tool enables the estimation of a soft robot's current pose while at the same time learning a bending stiffness model. For estimation and learning, we only need a nominal constant-curvature robot model and measurements of the robot's base reactions (e.g., base forces). The estimation scheme -- relying on a marginalized particle filter -- allows us to conveniently interface nominal constant-curvature equations with a Gaussian Process (GP) bending stiffness model to be learned. This, in contrast to estimation via a random walk over stiffness values, enables prediction of bending stiffness and improves overall model quality. We demonstrate, using a real-world soft robot, that the method learns a bending-stiffness model online while accurately estimating the robot's pose. Notably, reduced error in multi-step forward predictions indicates that the learned bending-stiffness GP improves overall model quality.
comment: 8 pages, 3 figures, 2 tables, contribution to the International Conference on Information Fusion 2026
Robotics
Loiter UAV Reinsertion Guidance for Fixed-wing UAV Corridors
This paper considers fixed-wing unmanned aerial vehicle (UAV) corridors comprising a main lane, a circular loiter lane for managing traffic congestion, and transit lanes connecting the two. In particular, we address the problem of conflict-free reinsertion of UAVs from the loiter lane back into the main lane. The loiter lane contains a fixed number of equidistant virtual slots that UAVs can occupy. Reinsertion of loiter UAVs into the main lane becomes essential either due to reduced traffic in the main lane or due to a loiter UAV needing to reach its destination urgently. Given the total number of loiter slots, UAV speed limits, and the minimum safety distance, a guidance algorithm is developed to compute the required speed of a loiter UAV in the transit lane to ensure safe reinsertion. The proposed guidance and automation strategies are validated through numerical simulations.
OmniLiDAR: A Unified Diffusion Framework for Multi-Domain 3D LiDAR Generation
LiDAR scene generation is increasingly important for scalable simulation and synthetic data creation, especially under diverse sensing conditions that are costly to capture at scale. Typically, diffusion-based LiDAR generators are developed under single-domain settings, requiring separate models for different datasets or sensing conditions and hindering unified, controllable synthesis under heterogeneous distribution shifts. To this end, we present OmniLiDAR, a unified text-conditioned diffusion framework that generates LiDAR scans in a shared range-image representation across eight representative domains spanning three shift types: adverse weather, sensor-configuration changes (e.g., reduced beams), and cross-platform acquisition (vehicle, drone, and quadruped). To enable training a single model over heterogeneous domains without isolating optimization by domain, we introduce a Cross-Domain Training Strategy (CDTS) that mixes domains within each mini-batch and leverages conditioning to steer generation. We further propose Cross-Domain Feature Modeling (CDFM), which captures directional dependencies along azimuth and elevation axes to reflect the anisotropic scanning structure of range images, and Domain-Adaptive Feature Scaling (DAFS) as a lightweight modulation to account for structured domain-dependent feature shifts during denoising. In the absence of a public consolidated benchmark, we construct an 8-domain dataset by combining real-world scans with physically based weather simulation and systematic beam reduction while following official splits. Extensive experiments demonstrate strong generation fidelity and consistent gains in downstream use cases, including generative data augmentation for LiDAR semantic segmentation and 3D object detection, as well as robustness evaluation under corruptions, with consistent benefits in limited-label regimes.
comment: Preprint; 12 pages, 7 figures, 10 tables
LMPath: Language-Mediated Priors and Path Generation for Aerial Exploration
Traditional autonomous UAV search missions rely on geometric coverage patterns that ignore the semantic context of the target, leading to significant time waste in large-scale environments. In this paper we present LMPath, a pipeline for generating language-mediated exploration priors for Unmanned Aerial Vehicle (UAV) search missions that leverages semantics. Given a basic geofence and an object of interest prompt, LMPath uses generative language models to determine what regions of the environment should contain that object and a foundation vision model run over satellite imagery to segment sub-regions that form the exploration prior. This prior can then be used to generate UAV paths with various objectives, such as minimizing the expected time to locate the object of interest, maximizing the probability that the object is found given a limited travel distance, or narrowing down the search space to sub-regions that are most likely to contain the object. To demonstrate its capabilities, we used LMPath to generate various UAV paths and ran them using a real UAV over large-scale environments. We also ran simulations to demonstrate how paths generated using LMPath outperform traditional path planning approaches for search missions.
comment: Poster at 2026 AI-Driven Safe Aerial Robotics Workshop
Realtime-VLA FLASH: Speculative Inference Framework for Diffusion-based VLAs
Diffusion-based vision-language-action models (dVLAs) are promising for embodied intelligence but are fundamentally limited in real-time deployment by the high latency of full inference. We propose Realtime-VLA FLASH, a speculative inference framework that eliminates most full inference calls during replanning by introducing a lightweight draft model with parallel verification via the main model's Action Expert and a phase-aware fallback mechanism that reverts to the full inference pipeline when needed. This design enables low-latency, high-frequency replanning without sacrificing reliability. Experiments show that on LIBERO, FLASH largely preserves task performance by replacing many 58.0 ms full-inference rounds with speculative rounds as fast as 7.8 ms, lowering task-level average inference latency to 19.1 ms (3.04x speedup). We additionally demonstrate effectiveness on real-world conveyor-belt sorting, highlighting its practical impact for latency-critical embodied tasks.
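The reported latencies can be cross-checked with a short calculation. The sketch below is a back-of-the-envelope reading of the three numbers in the abstract, under two assumptions that the abstract does not state: every full-inference round costs exactly 58.0 ms, and every speculative round costs the best-case 7.8 ms. Since 7.8 ms is quoted as the fastest speculative round, the resulting fraction should be read as a lower bound on the share of rounds handled speculatively.

```python
full_ms = 58.0   # full-inference round (from the abstract)
spec_ms = 7.8    # fastest speculative round (from the abstract)
avg_ms = 19.1    # task-level average latency (from the abstract)

# Speedup of the average round relative to always running full inference.
speedup = full_ms / avg_ms

# Solve avg = p * spec + (1 - p) * full for the speculative share p.
spec_fraction = (full_ms - avg_ms) / (full_ms - spec_ms)

print(f"speedup: {speedup:.2f}x, speculative share: at least {spec_fraction:.1%}")
```

Under these assumptions the speedup matches the stated 3.04x, and roughly three quarters of replanning rounds would need to be speculative to reach the stated average.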
RoboEvolve: Co-Evolving Planner-Simulator for Robotic Manipulation with Limited Data
The scalability of robotic manipulation is fundamentally bottlenecked by the scarcity of task-aligned physical interaction data. While vision-language models (VLMs) and video generation models (VGMs) hold promise for autonomous data synthesis, they suffer from semantic-spatial misalignment and physical hallucinations, respectively. To bridge this gap, we introduce RoboEvolve, a novel framework that couples a VLM planner and a VGM simulator into a mutually reinforcing co-evolutionary loop. Operating purely on unlabeled seed images, RoboEvolve leverages a cognitive-inspired dual-phase mechanism: (i) daytime exploration fosters physically grounded behavioral discovery through a semantic-controlled multi-granular reward, and (ii) nighttime consolidation mines "near-miss" failures to stabilize policy optimization. Guided by an autonomous progressive curriculum, the system naturally scales from simple atomic actions to complex tasks. Extensive experiments demonstrate that RoboEvolve (I) achieves superior effectiveness, elevating base planners by 30 absolute points and amplifying simulator success by 48% on average; (II) exhibits extreme data efficiency, surpassing fully supervised baselines with merely 500 unlabeled seeds--a 50x reduction; and (III) demonstrates robust continual learning without catastrophic forgetting.
comment: On-going work
FrameSkip: Learning from Fewer but More Informative Frames in VLA Training
Vision-Language-Action (VLA) policies are commonly trained from dense robot demonstration trajectories, often collected through teleoperation, by sampling every recorded frame as if it provided equally useful supervision. We argue that this convention creates a temporal supervision imbalance: long low-change segments dominate the training stream, while manipulation-critical transitions such as alignment, contact, grasping, and release appear only sparsely. We introduce FrameSkip, a data-layer frame selection framework that scores trajectory frames using action variation, visual-action coherence, task-progress priors, and gripper-transition preservation, then remaps training samples toward high-importance frames under a target retention ratio. Because FrameSkip operates only in the dataloader, it leaves the VLA architecture, action head, training objective, and inference procedure unchanged. Across RoboCasa-GR1, SimplerEnv, and LIBERO, FrameSkip improves the success-retention trade-off over full-frame training and simpler frame selection variants, achieving a macro-average success rate of 76.15% across the three benchmarks compared with 66.50% for full-frame training while using a compressed trajectory view that retains 20% of unique frames in the main setting.
comment: GitHub: https://github.com/ZGC-EmbodyAI/FrameSkip
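A stripped-down version of score-based frame selection can be sketched as follows. This is a hypothetical illustration, not the FrameSkip implementation: it uses only two of the four signals named in the abstract (action variation and gripper-transition preservation), omits visual-action coherence and task-progress priors, and the function name `select_frames` and its arguments are invented for this example.

```python
import numpy as np

def select_frames(actions, gripper, keep_ratio=0.2):
    """Return sorted indices of frames to keep under a target retention ratio."""
    actions = np.asarray(actions, dtype=float)
    gripper = np.asarray(gripper)
    n = len(actions)

    # Action variation: size of the change in action from the previous frame.
    variation = np.zeros(n)
    variation[1:] = np.linalg.norm(np.diff(actions, axis=0), axis=1)

    # Frames where the gripper state flips are always preserved.
    transitions = np.zeros(n, dtype=bool)
    transitions[1:] = gripper[1:] != gripper[:-1]

    # Keep at least every transition, otherwise the top-scoring fraction.
    budget = max(int(round(keep_ratio * n)), int(transitions.sum()))
    scores = np.where(transitions, np.inf, variation)
    keep = np.argsort(-scores, kind="stable")[:budget]
    return np.sort(keep)

# Ten frames: the arm is idle except for a jump at frame 5; the gripper closes at frame 7.
actions = [[0.0]] * 5 + [[1.0]] * 5
gripper = [0] * 7 + [1] * 3
kept = select_frames(actions, gripper, keep_ratio=0.2)
```

Because the selection only produces frame indices, it can sit in a dataloader without touching the model or the training objective, which is the property the abstract emphasizes.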
Manipulation Planning for Construction Activities with Repetitive Tasks
In this paper, we study the problem of manipulation skill acquisition for performing construction activities consisting of repetitive tasks (e.g., building a wall or installing ceiling tiles). Our approach involves setting up a simulated construction activity in a Virtual Reality (VR) environment, where the user can provide demonstrations of the object manipulation skills needed to perform the construction activity. We then exploit the screw geometry of motion to approximate the demonstrated motion as a sequence of constant screw motions. For performing the construction activity, we generate the sequence of manipulation task instances and then compute the joint space motion plan corresponding to each instance using Screw Linear Interpolation (ScLERP) and Resolved Motion Rate Control (RMRC). We evaluate our framework by executing two representative construction tasks: constructing brick walls and installing multiple ceiling tiles. Each task is performed using only a single demonstration, a pick-and-place action for the bricks, and a single ceiling tile installation. Our experiments with a 7-DoF robot in both simulation and hardware demonstrate that the approach generalizes robustly to arbitrarily long construction activities that involve repetitive motions and demand precision, even when provided with just one demonstration. For instance, we can construct walls of arbitrary layout and length by leveraging a single demonstration of placing one brick on top of another.
Learning Responsibility-Attributed Adversarial Scenarios for Testing Autonomous Vehicles
Establishing trustworthy safety assurance for autonomous driving systems (ADSs) requires evidence that failures arise from avoidable system deficiencies rather than unavoidable traffic conflicts. Current adversarial simulation methods can efficiently expose collisions, but generally lack mechanisms to distinguish these fundamentally different failure modes. Here we present CARS (Context-Aware, Responsibility-attributed Scenario generation), a framework that integrates responsibility attribution directly into adversarial scenario generation. CARS combines context-aware adversary selection with a generative adversarial policy optimized in closed-loop simulation to construct collision scenarios that are both physically feasible and diagnostically attributable. Across benchmark datasets spanning heterogeneous national traffic environments, CARS consistently discovers feasible collision scenarios with high attribution rates under multiple regulation-prescribed careful and competent driver models. By coupling adversarial generation with normative responsibility assessment, CARS moves simulation testing beyond collision discovery toward the construction of interpretable, regulation-aligned safety evidence for scalable ADS validation.
TinySDP: Real Time Semidefinite Optimization for Certifiable and Agile Edge Robotics
Semidefinite programming (SDP) provides a principled framework for convex relaxations of nonconvex geometric constraints in motion planning, yet existing solvers are too computationally expensive for real-time control, particularly on resource-constrained embedded systems. To address this gap, we introduce TinySDP, the first semidefinite programming solver designed for embedded systems, enabling real-time model-predictive control (MPC) on microcontrollers for problems with nonconvex obstacle constraints. Our approach integrates positive-semidefinite cone projections into a cached-Riccati-based ADMM solver, leveraging computational structure for embedded tractability. We pair this solver with an a posteriori rank-1 certificate that converts relaxed solutions into explicit geometric guarantees at each timestep. On challenging benchmarks, e.g., cul-de-sac and dynamic obstacle avoidance scenarios that induce failures in local methods, TinySDP achieves collision-free navigation with up to 73% shorter paths than state-of-the-art baselines. We validate our approach on a Crazyflie quadrotor, demonstrating that semidefinite constraints can be enforced at real-time rates for agile embedded robotics.
comment: Accepted to Robotics: Science and Systems (RSS) 2026. 11 pages, 5 figures, 2 tables. Project website: https://a2r-lab.org/TinySDP/
LEXI-SG: Monocular 3D Scene Graph Mapping with Room-Guided Feed-Forward Reconstruction
Scene graphs are becoming a standard representation for robot navigation, providing hierarchical geometric and semantic scene understanding. However, most scene graph mapping methods rely on depth cameras or LiDAR sensors. In this work, we present LEXI-SG, the first dense monocular visual mapping system for open-vocabulary 3D scene graphs using only RGB camera input. Our approach exploits the semantic priors of open-vocabulary foundation models to partition the scene into rooms, deferring feed-forward reconstruction to when each room is fully observed -- enabling scalable dense mapping without sliding-window scale inconsistencies. We propose a room-based factor graph formulation to globally align room reconstructions while preserving local map consistency and naturally imposing the semantic scene graph hierarchy. Within each room, we further support open-vocabulary object segmentation and tracking. We validate LEXI-SG on indoor scenes from the Habitat-Matterport 3D dataset and on self-collected egocentric office sequences. We evaluate its performance against existing feed-forward SLAM methods, as well as established scene graph baselines. We demonstrate improved trajectory estimation and dense reconstruction, as well as competitive performance in open-vocabulary segmentation. LEXI-SG shows that accurate, scalable, open-vocabulary 3D scene graphs can be achieved from monocular RGB alone. Our project page and office sequences are available here: https://ori-drs.github.io/lexisg-web/.
Bounded-Input True Proportional Navigation for Impact-Time Control
This paper proposes a nonlinear guidance strategy capable of intercepting a constant-velocity, non-maneuvering target while strictly satisfying the prescribed bounds on the control input (commanded acceleration). Unlike conventional strategies that estimate time-to-go using linearization or small-angle approximations, the proposed strategy employs true proportional-navigation guidance (TPNG) as a baseline, which utilizes an exact time-to-go formulation and is applicable over a wide range of target motions. In contrast to most existing strategies, which do not incorporate control input bounds into the guidance design, the proposed approach explicitly accounts for these limits by modeling the interceptor acceleration as a dynamic variable. Based on the sliding mode control technique, an effective guidance law that achieves time-constrained interception while accounting for bounded input is then derived. The performance of the proposed strategy is evaluated for various engagement scenarios.
comment: Preprint; Accepted for presentation at the 15th Asian Control Conference, June 17th-21st, 2026, Indonesia
Robot Squid Game: Quadrupedal Locomotion for Traversing Narrow Tunnels
Quadruped robots demonstrate exceptional potential for navigating complex terrain in critical applications such as search and rescue missions and infrastructure inspection. However, autonomous traversal of confined 3D environments, including tunnels, caves, and collapsed structures, remains a significant challenge. Existing methods often struggle with rigid gait patterns, limited adaptability to diverse geometries, and reliance on oversimplified environmental assumptions. This paper introduces a Reinforcement Learning (RL) framework that combines procedural environment generation with policy distillation to enable robust locomotion across various tunnel configurations. Our approach leverages a teacher-student training paradigm, where specialized expert policies trained on procedurally generated tunnel geometries transfer their knowledge to a unified student policy. This strategy eliminates the need for complex reward shaping in end-to-end RL training, simplifying the process by breaking down complicated tasks into smaller, more manageable components that are easier for the robot to learn. By synthesizing diverse tunnel structures during training and distilling navigation strategies into a generalizable policy, our method achieves consistent traversal across complex spatial constraints where conventional approaches fail. We demonstrate through both simulation and real-world experiments that our method enables quadruped robots to successfully traverse challenging confined tunnel environments.
Causality-Aware End-to-End Autonomous Driving via Ego-Centric Joint Scene Modeling
End-to-end autonomous driving, which bypasses traditional modular pipelines by directly predicting future trajectories from sensor inputs, has recently achieved substantial progress. However, existing methods often overlook the causal inter-dependencies in ego-vehicle planning, ignoring the reciprocal relations between the ego vehicle and surrounding agents. This causal oversight leads to inconsistent and unreliable trajectory predictions, especially in interaction-critical scenarios where ego decisions and neighboring agent behaviors must be reasoned about jointly. To address this limitation, we propose CaAD, a Causality-aware end-to-end Autonomous Driving framework that captures these dependencies within a shared latent scene representation. First, we propose an ego-centric joint-causal modeling module that builds on the marginal prediction branch and learns causal dependencies between the ego vehicle and interaction-relevant agents. Second, we employ a causality-aware policy alignment stage implemented with joint-mode embeddings to align the stochastic ego policy with planning-oriented closed-loop feedback computed from surrounding traffic and map context. On the Bench2Drive and NAVSIM benchmarks, CaAD demonstrates strong closed-loop planning performance, achieving a Driving Score of 87.53 and Success Rate of 71.81 on Bench2Drive, and a PDMS of 91.1 on NAVSIM.
Guide, Think, Act: Interactive Embodied Reasoning in Vision-Language-Action Models
In this paper, we propose GTA-VLA (Guide, Think, Act), an interactive Vision-Language-Action (VLA) framework that enables spatially steerable embodied reasoning by allowing users to guide robot policies with explicit visual cues. Existing VLA models learn a direct "Sense-to-Act" mapping from multimodal observations to robot actions. While effective within the training distribution, such tightly coupled policies are brittle under out-of-domain (OOD) shifts and difficult to correct when failures occur. Although recent embodied Chain-of-Thought (CoT) approaches expose intermediate reasoning, they still lack a mechanism for incorporating human spatial guidance, limiting their ability to resolve visual ambiguities or recover from mistakes. To address this gap, our framework allows users to optionally guide the policy with spatial priors, such as affordance points, boxes, and traces, which the subsequent reasoning process can directly condition on. Based on these inputs, the model generates a unified spatial-visual Chain-of-Thought that integrates external guidance with internal task planning, aligning human visual intent with autonomous decision-making. For practical deployment, we further couple the reasoning module with a lightweight reactive action head for efficient action execution. Extensive experiments demonstrate the effectiveness of our approach. On the in-domain SimplerEnv WidowX benchmark, our framework achieves a state-of-the-art 81.2% success rate. Under OOD visual shifts and spatial ambiguities, a single visual interaction substantially improves task success over existing methods, highlighting the value of interactive reasoning for failure recovery in embodied control. Details of the project can be found here: https://signalispupupu.github.io/GTA-VLA_ProjPage/
Design of Magnetic Continuum Robots with Tunable Force Response Using Rotational Ring Pairs
In this paper, we discuss a novel continuum robot design that enables the online tuning of the magnetic response at its tip. The proposed method allows for the change of both effective magnetic direction and intensity, introducing steering DOF without the need to control the external fields. This is unattainable with classical designs, which rely on fixed internal magnetic content and steer solely under the effect of a controllable magnetic field. The proposed robot design can be used in both controllable and fixed magnetic fields, potentially widening the clinical applicability of these robots. We experimentally show a max tip deflection of 33.8 mm from the resting state (23 % of the length of the robot). We discuss a model based on modified beam theory that captures the mechanical behavior of the continuum robot, with a mean absolute tip tracking error of 1.86 mm (1.2 % of the length) and maximum errors of less than 4.8 mm (3.2 % of the length) for all experimental points.
comment: 7 pages, 6 figures, Accepted to ISMR 2026
AttenA+: Rectifying Action Inequality in Robotic Foundation Models
Existing robotic foundation models, while powerful, are predicated on an implicit assumption of temporal homogeneity: treating all actions as equally informative during optimization. This "flat" training paradigm, inherited from language modeling, remains indifferent to the underlying physical hierarchy of manipulation. In reality, robot trajectories are fundamentally heterogeneous, where low-velocity segments often dictate task success through precision-demanding interactions, while high-velocity motions serve as error-tolerant transitions. Such a misalignment between uniform loss weighting and physical criticality fundamentally limits the performance of current Vision-Language-Action (VLA) models and World-Action Models (WAM) in complex, long-horizon tasks. To rectify this, we introduce AttenA+, an architecture-agnostic framework that prioritizes kinematically critical segments via velocity-driven action attention. By reweighting the training objective based on the inverse velocity field, AttenA+ naturally aligns the model's learning capacity with the physical demands of manipulation. As a plug-and-play enhancement, AttenA+ can be integrated into existing backbones without structural modifications or additional parameters. Extensive experiments demonstrate that AttenA+ significantly elevates the ceilings of current state-of-the-art models. Specifically, it improves OpenVLA-OFT to 98.6% (+1.5%) on the Libero benchmark and pushes FastWAM to 92.4% (+0.6%) on RoboTwin 2.0. Real-world validation on a Franka manipulator further showcases its robustness and cross-task generalization. Our work suggests that mining the intrinsic structural priors of action sequences offers a highly efficient, physics-aware complement to standard scaling laws, paving a new path for general-purpose robotic control.
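The reweighting idea in the AttenA+ abstract (emphasizing low-velocity segments in the training loss) can be sketched as follows. This is an assumed form, not the authors' formulation: weights are taken as `1 / (speed + eps)` and normalized to mean 1, and the function names, the `eps` value, and the scalar-action setup are hypothetical.

```python
def inverse_velocity_weights(speeds, eps=1e-2):
    """Per-timestep loss weights that grow as end-effector speed shrinks.

    speeds: list of non-negative speed magnitudes, one per timestep.
    eps: assumed smoothing constant that avoids division by zero.
    """
    raw = [1.0 / (s + eps) for s in speeds]
    mean = sum(raw) / len(raw)
    # Normalize so the average weight is 1 and the loss scale is preserved.
    return [r / mean for r in raw]


def weighted_mse(pred, target, weights):
    """Mean squared error with one weight per timestep (scalar actions)."""
    total = sum(w * (p - t) ** 2 for w, p, t in zip(weights, pred, target))
    return total / len(pred)
```

With uniform weights this reduces to the ordinary mean squared error, which is why such a scheme can be added without changing the model architecture.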
Integration of an Agent Model into an Open Simulation Architecture for Scenario-Based Testing of Automated Vehicles
Simulative and scenario-based testing are crucial methods in the safety assurance for automated driving systems. To ensure that simulation results are reliable, the real world must be modeled with sufficient fidelity, including not only the static environment but also the surrounding traffic of a vehicle under test. Thus, the availability of traffic agent models is of common interest to model naturalistic and parameterizable behavior, similar to human drivers. The interchangeability of agent models across different simulation environments represents a major challenge and necessitates harmonization and standardization. To address this challenge, we present a standardized and modular simulation integration architecture that enables the tool-independent integration of traffic agent models. The architecture builds upon the Open Simulation Interface (OSI) as a structured message format and the Functional Mock-up Interface (FMI) for dynamic model exchange. Rather than introducing yet another model or simulation tool, we provide a reusable reference implementation that translates these standards into a practical integration blueprint, including clear interfaces, data mappings, and execution semantics. The generic nature of the architecture is demonstrated by integrating an exemplary agent model into three widely used simulation environments: OpenPASS, CARLA, and CarMaker. As part of the evaluation, we show that the model yields consistent behavior in all simulation platforms, thereby validating the interoperability, modularity, and standard compliance of the proposed architecture. The reference implementation lowers integration barriers, serves as a foundation for future research, and is made publicly available at github.com/ika-rwth-aachen/agent-model-integration
Beyond VMAF: Towards Application-Specific Metrics for Teleoperation Video SC 2026
Automated driving has made remarkable progress, yet situations still arise where human intervention is necessary. Teleoperation provides a scalable solution to address such cases, enabling remote operators to support vehicles without being physically present. In this context, video transmission forms the operator's primary source of situational awareness, making video quality a decisive factor for both safety and task performance. In an online study, participants provided subjective quality ratings for compressed video sequences from the Zenseact Dataset. These ratings were then used to retrain the Video Multi-Method Assessment Fusion (VMAF) model, yielding an adapted variant tailored to teleoperation. The retrained model demonstrated improved alignment with human ratings compared to the original 4K VMAF. In particular, RMSE decreased from 10.36 to 8.83, and MAD from 8.71 to 6.38, corresponding to improvements of 15% and 27%, respectively. These results highlight that incorporating domain-specific data can enhance the predictive power of established quality metrics in safety-critical applications. At the same time, outlier cases emerged in which videos received high objective scores despite noticeable degradations in regions critical for the driving task.
comment: Preprint ITSC 2026
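The reported improvements for the retrained VMAF model follow directly from the before and after error values in the abstract. A short check of that arithmetic, using a hypothetical helper name `relative_reduction`:

```python
def relative_reduction(before, after):
    """Percentage by which an error metric dropped relative to its old value."""
    return (before - after) / before * 100.0


# Values quoted in the abstract.
rmse_gain = relative_reduction(10.36, 8.83)
mad_gain = relative_reduction(8.71, 6.38)

print(f"RMSE reduction: {rmse_gain:.1f}%")
print(f"MAD reduction: {mad_gain:.1f}%")
```

Rounded to whole percentages, the two values match the 15% and 27% stated in the abstract.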
Uncertainty-Aware 3D Position Refinement for Multi-UAV Systems
Reliable real-time 3D localization is essential for multi-UAV navigation, collision avoidance, and coordinated flight, yet onboard estimates can degrade under GNSS multipath, non-line-of-sight reception, vertical drift, and intentional interference. This paper presents a decentralized, lightweight 3D position-refinement layer that improves robustness by fusing each Unmanned Aerial Vehicle (UAV)'s local estimate with neighbor-shared state summaries and inter-UAV range or proximity constraints. The method performs uncertainty-aware neighborhood fusion by weighting each UAV's prior according to its reported covariance and weighting neighbor constraints according to link quality, ranging uncertainty, and a learned trust score. To support practical deployment, the framework explicitly handles cold start and temporary localization loss by inflating or substituting weak priors, allowing trusted neighborhood constraints to bootstrap and stabilize estimates until absolute sensing recovers. To mitigate the impact of faulty or malicious participants, each UAV applies a local range-consistency check, smoothed over time, to down-weight or exclude neighbors whose reported positions are incompatible with observed inter-UAV distances. Simulation experiments with 10 UAVs in a 3D volume show that the proposed refinement substantially reduces mean localization error during cold start, remains competitive after local estimators stabilize, and maintains lower error as the fraction of malicious nodes increases compared with fusion without trust. These results suggest that the approach can serve as a practical resilience layer for swarm operation in challenging environments.
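The uncertainty-aware fusion and the range-consistency check described in this abstract can be sketched under strong simplifying assumptions: each estimate carries a single scalar variance, trust is a number in [0, 1], and fusion is a trust-scaled inverse-variance average. The function names, the tolerance parameter, and this particular weighting are hypothetical and not taken from the paper.

```python
import math


def range_consistent(own_pos, nbr_pos, measured_range, tol):
    """True if a neighbor's reported position agrees with the measured range."""
    return abs(math.dist(own_pos, nbr_pos) - measured_range) <= tol


def fuse(estimates):
    """Fuse 3D position estimates by trust-scaled inverse-variance weighting.

    estimates: list of (position, variance, trust) tuples, where position is
    a 3-tuple, variance is a positive scalar, and trust lies in [0, 1].
    """
    weights = [trust / var for _, var, trust in estimates]
    total = sum(weights)
    if total == 0:
        raise ValueError("no estimate has non-zero weight")
    return tuple(
        sum(w * pos[i] for w, (pos, _, _) in zip(weights, estimates)) / total
        for i in range(3)
    )
```

In this sketch, a neighbor that fails `range_consistent` would be passed to `fuse` with a reduced or zero trust value, and a weak local prior (for example during cold start) would be represented by a large variance.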
CUBic: Coordinated Unified Bimanual Perception and Control Framework
Recent advances in visuomotor policy learning have enabled robots to perform control directly from visual inputs. Yet, extending such end-to-end learning from single-arm to bimanual manipulation remains challenging due to the need for both independent perception and coordinated interaction between arms. Existing methods typically favor one side -- either decoupling the two arms to avoid interference or enforcing strong cross-arm coupling for coordination -- thus lacking a unified treatment. We propose CUBic, a Coordinated and Unified framework for Bimanual perception and control that reformulates bimanual coordination as a unified perceptual modeling problem. CUBic learns a shared tokenized representation bridging perception and control, where independence and coordination emerge intrinsically from structure rather than from hand-crafted coupling. Our approach integrates three components: unidirectional perception aggregation, bidirectional perception coordination through two codebooks with shared mapping, and a unified perception-to-control diffusion policy. Extensive experiments on the RoboTwin benchmark show that CUBic consistently surpasses standard baselines, achieving marked improvements in coordination accuracy and task success rates over state-of-the-art visuomotor baselines.
Asymptotically Optimal Ergodic Coverage on Generalized Motion Fields
Autonomous robotic exploration in remote and extreme environments allows scientists to model complex transport phenomena and collective behaviors described by continuously deforming flow fields. Although these environments are naturally modeled as time-varying domains, most adaptive exploration methods assume static environments and fail to provide adequate coverage or satisfy any formal guarantees. This is especially the case in oceanography where autonomous underwater systems (UxS) have highly restrictive compute and payload requirements that necessitate path planning methods that yield robust data collection strategies in open-loop and underactuated settings. In this work, to address the aforementioned issues, we propose to formulate adaptive search as an ergodic coverage problem and investigate certifying coverage in the ergodic sense over evolving domains with flow-induced dynamics. We expand upon recent work demonstrating maximum mean discrepancy (MMD) as a functional ergodic metric, and derive a flow-adaptive formulation that explicitly accounts for domain evolution within the coverage objective. We show that this approach preserves ergodic coverage guarantees in ambient flows and enables effective exploration in underactuated and even open-loop planning settings by integrating environment dynamics. Experiments validate that our method generalizes to diverse spatiotemporal processes, including ocean exploration and the tracking of human and cattle movement. Physical experiments on aerial and legged robotic platforms validate our ability to obtain ergodic coverage in non-convex, flow-restricted environments while respecting robot dynamics.
comment: 13 pages, 9 figures, 6 tables, Robotics: Science and Systems 2026
SID: Sliding into Distribution for Robust Few-Demonstration Manipulation
Generalizing robotic manipulation across object poses, viewpoints, and dynamic disturbances is difficult, especially with only a few demonstrations. End-to-end visuomotor policies are expressive but data-hungry, while planning and optimization satisfy explicit constraints but do not directly capture the interaction strategies demonstrated by humans. We propose Sliding into Distribution (SID), a structured framework that learns an object-centric motion field from canonicalized demonstrations to iteratively slide the system toward the demonstrated manifold and into the reliable operating region of a lightweight egocentric execution policy, mitigating out-of-distribution (OOD) execution. The motion field provides large corrective motions when far from the demonstration manifold and naturally vanishes near convergence, enabling robust reaching under substantial pose and viewpoint shifts. Within the reached regime, an egocentric policy trained with conditioned flow matching performs task-specific manipulation, supported by kinematically consistent point-cloud reprojection augmentation that preserves action-observation consistency. Across six real-world tasks, SID achieves approximately 90% success under OOD initializations with only two demonstrations, with under a 10% drop under distractors and external disturbances. Overall, SID provides a new paradigm for few-shot manipulation: explicitly managing distribution shift via online distribution recovery.
comment: 20 pages, 14 figures. Project website: https://sliding-into-distribution.github.io/
RotVLA: Rotational Latent Action for Vision-Language-Action Model
Latent Action Models (LAMs) have emerged as an effective paradigm for handling heterogeneous datasets during Vision-Language-Action (VLA) model pretraining, offering a unified action space across embodiments. However, existing LAMs often rely on discrete quantization encode and decode pipelines, which can lead to trivial frame reconstruction behavior, limited representational capacity, and a lack of physically meaningful structure. We introduce RotVLA, a VLA framework built on a continuous rotational latent action representation. Latent actions are modeled as elements of SO(n), providing continuity, compositionality, and structured geometry aligned with real-world action dynamics. A triplet frame learning framework further enforces meaningful temporal dynamics while avoiding degeneration. RotVLA consists of a VLM backbone and a flow-matching action head, pretrained on large-scale cross-embodiment robotic datasets and human videos with latent-action supervision. For downstream robot control, the flow-matching head is extended into a unified action expert that jointly denoises latent and robot actions. Here, latent actions serve as a latent planner, providing high-level guidance that conditions action generation. With only 1.7B parameters and 1700+ hours of pretraining data, RotVLA achieves 98.2% on LIBERO and 89.6% / 88.5% on RoboTwin2.0 under clean and randomized settings, respectively. It also demonstrates strong real-world performance on manipulation tasks, consistently outperforming existing VLA models.
Trajectory-Level Data Augmentation for Offline Reinforcement Learning ICML 2026
We propose a data augmentation method for offline reinforcement learning, motivated by active positioning problems. Particularly, our approach enables the training of off-policy models from a limited number of suboptimal trajectories. We introduce a trajectory-based augmentation technique that exploits task structure and the geometric relationship between rewards, value functions, and mathematical properties of logging policies. During data collection, our augmentation supports suboptimal logging policies, leading to higher data quality and improved offline reinforcement learning performance. We provide theoretical justification for these strategies and validate them empirically across positioning tasks of varying dimensionality and under partial observability.
comment: 26 pages, 25 figures, Accepted at ICML 2026
BlockVLA: Accelerating Autoregressive VLA via Block Diffusion Finetuning
While autoregressive (AR) Vision-Language-Action (VLA) models have demonstrated formidable reasoning capabilities in robotic tasks, their sequential decoding process often incurs high inference latency and may amplify error accumulation during long-horizon execution. Discrete Diffusion Language Models (dLLMs) provide a promising alternative through parallel token refinement, but their practical deployment in robotics remains limited by repeated denoising function evaluations (NFEs) and the difficulty of directly applying standard KV caching to bidirectional iterative decoding. To bridge these paradigms, we propose BlockVLA, a framework that adapts pretrained AR backbones into an efficient discrete diffusion policy through a block diffusion paradigm. BlockVLA maintains autoregressive dependencies at the block level while enabling parallel denoising within each block, thereby combining global causal coherence with local parallel generation. This design enables prefix KV-cache reuse across completed blocks, reduces the effective cost of iterative denoising, and provides a smoother transition from AR pretraining to diffusion-based policy fine-tuning. We conduct extensive evaluations on the LIBERO and SimplerEnv benchmarks. Experimental results demonstrate that our BlockVLA achieves a 3.3$\times$ inference acceleration over standard discrete diffusion baselines. Furthermore, our model exhibits superior training efficiency, with success rates converging substantially faster than baselines, a gain that is particularly pronounced in complex, long-horizon tasks, where BlockVLA achieves significant performance gains in the early stages of training. This work establishes Block Diffusion as a robust bridge between large-scale pretrained AR models and efficient, high-frequency real-time robotic control.
Exploring Human-Robot Collaboration: Analysis of Interaction Modalities in Challenging Tasks
This work compares three interaction modalities for human-robot collaboration: passive, reactive, and proactive. We studied 18 participants assembling a seven-layer colored tower from memory while using nearby and distant blocks. In the passive modality participants worked alone; in the reactive modality a mobile robot helped only upon request; in the proactive modality it initiated brick delivery and error signaling without explicit requests. Although robot assistance increased completion time, most participants preferred collaboration: 67% preferred proactive behavior and 78% judged it most useful. These results suggest that timely proactive support can improve user experience in controlled collaborative tasks.
What Limits Vision-and-Language Navigation?
Vision-and-Language Navigation (VLN) is a cornerstone of embodied intelligence. However, current agents often suffer from significant performance degradation when transitioning from simulation to real-world deployment, primarily due to perceptual instability (e.g., lighting variations and motion blur) and under-specified instructions. While existing methods attempt to bridge this gap by scaling up model size and training data, we argue that the bottleneck lies in the lack of robust spatial grounding and cross-domain priors. In this paper, we propose StereoNav, a robust Vision-Language-Action framework designed to enhance real-world navigation consistency. To address the inherent gap between synthetic training and physical execution, we introduce Target-Location Priors as a persistent bridge. These priors provide stable visual guidance that remains invariant across domains, effectively grounding the agent even when instructions are vague. Furthermore, to mitigate visual disturbances like motion blur and illumination shifts, StereoNav leverages stereo vision to construct a unified representation of semantics and geometry, enabling precise action prediction through enhanced depth awareness. Extensive experiments on R2R-CE and RxR-CE demonstrate that StereoNav achieves state-of-the-art egocentric RGB performance, with SR and SPL scores of 81.1% and 68.3%, and 67.5% and 52.0%, respectively, while using significantly fewer parameters and less training data than prior scaling-based approaches. More importantly, real-world robotic deployments confirm that StereoNav substantially improves navigation reliability in complex, unstructured environments. Project page: https://yunheng-wang.github.io/stereonav-public.github.io.
HCSG: Human-Centric Semantic-Geometric Reasoning for Vision-Language Navigation
VLN has achieved remarkable progress by scaling data and model capacity. However, the assumption of a static environment breaks down in real-world indoor scenarios, where robots inevitably encounter dynamic pedestrians. Existing human-aware approaches typically treat humans merely as moving obstacles based on implicit visual cues, lacking the explicit reasoning required to interpret human intentions or maintain social norms. To address this, we propose HCSG, the first human-centric framework for VLN. This framework provides a robust foundation for safe, socially intelligent navigation in dynamic human-robot environments that shifts the paradigm from passive collision avoidance to active human behavior understanding. Specifically, HCSG introduces a unified Human Understanding Module that synergizes two key capabilities: (i) geometric forecasting, which predicts human pose and trajectory to anticipate future motion dynamics; and (ii) semantic interpretation, which leverages a Vision-Language Model (VLM) to generate natural language descriptions of human actions and intentions. These semantic-geometric representations are fused into the agent's topological map for instruction-conditioned planning. Furthermore, a social distance loss is introduced to enforce socially compliant interaction distances. Extensive experiments on the HA-VLNCE benchmark demonstrate that HCSG significantly outperforms state-of-the-art methods, achieving a 14% improvement in Success Rate and a 34% reduction in Collision Rate. Our project can be seen at https://haoxuanxu1024.github.io/HCSG/.
D-VLA: A High-Concurrency Distributed Asynchronous Reinforcement Learning Framework for Vision-Language-Action Models
The rapid evolution of Embodied AI has enabled Vision-Language-Action (VLA) models to excel in multimodal perception and task execution. However, applying Reinforcement Learning (RL) to these massive models in large-scale distributed environments faces severe systemic bottlenecks, primarily due to the resource conflict between high-fidelity physical simulation and the intensive VRAM/bandwidth demands of deep learning. This conflict often leaves overall throughput constrained by execution-phase inefficiencies. To address these challenges, we propose D-VLA, a high-concurrency, low-latency distributed RL framework for large-scale embodied foundation models. D-VLA introduces "Plane Decoupling," physically isolating high-frequency training data from low-frequency weight control to eliminate interference between simulation and optimization. We further design a four-thread asynchronous "Swimlane" pipeline, enabling full parallel overlap of sampling, inference, gradient computation, and parameter distribution. Additionally, a dual-pool VRAM management model and topology-aware replication resolve memory fragmentation and optimize communication efficiency. Experiments on benchmarks like LIBERO show that D-VLA significantly outperforms mainstream RL frameworks in throughput and sampling efficiency for billion-parameter VLA models. In trillion-parameter scalability tests, our framework maintains exceptional stability and linear speedup, providing a robust system for high-performance general-purpose embodied agents.
Galilean State Estimation for Inertial Navigation Systems with Unknown Time Delay
Many Inertial Navigation Systems (INS) use Global Navigation Satellite System (GNSS) position as the primary measurement to drive filter performance and bound error growth. However, commercial-grade GNSS receivers introduce unknown measurement delays ranging from 50 ms to 300 ms depending on sensor quality and operating mode. Such time delays can significantly degrade INS performance unless they are explicitly compensated for. Existing algorithms commonly estimate this delay offline, run the filter concurrently with GNSS measurements using buffered Inertial Measurement Unit (IMU) data, and predict the current state by forward-integrating buffered inertial measurements via IMU preintegration. The state-of-the-art online method is an Extended Kalman Filter (EKF) that explicitly models the time delay as a state parameter, which defines the preintegration duration. This paper introduces a novel geometric framework for modeling time-delayed INS, in which Galilean symmetry is leveraged to provide a joint representation of space and time for consistent state estimation. An Equivariant Filter (EqF) is derived for the coupled estimation of navigation states and time delay. Validation is performed on two fixed-wing Uncrewed Aerial Vehicles (UAV) with GNSS time lags of 90 ms and 120 ms. The test flights last two to three minutes. Simulations further investigate delays up to 500 ms and provide a statistical comparison against the state-of-the-art EKF. Results show that the EqF preserves accuracy and consistency, while the EKF lacks consistency and its performance degrades significantly with increasing measurement delays.
Calibration-Free Gas Source Localization with Mobile Robots: Source Term Estimation Based on Concentration Measurement Ranking ICRA
Efficient Gas Source Localization (GSL) in real-world settings is crucial, especially in emergency scenarios. Mobile robots equipped with low-cost, in-situ gas sensors offer a safer alternative to human inspection in hazardous environments. Probabilistic algorithms enhance GSL efficiency with scattered gas measurements by comparing gas concentration measurements gathered by robots to physical dispersion models. However, accurately deriving gas concentrations from data acquired with low-cost sensors is challenging due to the nonlinear sensor response, environmental dependencies (e.g., humidity, temperature, and other gas influences), and robot motion. Mitigating these disturbance factors requires frequent sensor calibration in controlled environments, which is often impractical for real-world deployments. To overcome these issues, we propose a novel feature extraction algorithm that leverages the relative ranking of gas measurements within the dynamically accumulated dataset. By comparing the rank differences between gathered and modeled values, we estimate the probabilistic distribution of source locations across the entire environment. We validate our approach in high-fidelity simulations and physical experiments, demonstrating consistent localization accuracy with uncalibrated gas sensors. Compared to existing methods, our technique eliminates the need for gas sensor calibration, making it well-suited for real-world applications.
comment: This paper has been accepted for publication in the IEEE International Conference on Robotics and Automation (ICRA), 2026
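The rank-comparison idea in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' algorithm: `toy_plume` is a hypothetical isotropic dispersion model, the cost is a mean absolute rank difference, and the softmax-style weighting with `temperature` is a stand-in for whatever probabilistic update the paper uses.

```python
import numpy as np

def rank(values):
    """Return 0-based ascending ranks (ties broken by order of appearance)."""
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks

def toy_plume(sample_xy, source_xy):
    """Hypothetical dispersion model: concentration decays with squared distance."""
    d2 = np.sum((sample_xy - source_xy) ** 2, axis=1)
    return np.exp(-d2)

def source_posterior(sample_xy, readings, candidates, temperature=1.0):
    """Weight each candidate source by how well modeled ranks match measured ranks."""
    measured_ranks = rank(readings)
    costs = []
    for c in candidates:
        modeled_ranks = rank(toy_plume(sample_xy, c))
        costs.append(np.mean(np.abs(measured_ranks - modeled_ranks)))
    weights = np.exp(-np.asarray(costs) / temperature)  # assumed weighting
    return weights / weights.sum()
```

Because only ranks enter the cost, any monotone distortion of the sensor response (offset, gain, saturation) leaves the score of the true source unchanged in this toy setting, which is the intuition behind avoiding calibration.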
Dynamics Computation of Soft-Rigid Hybrid-Link System and Its Application to Motion Analysis of an Athlete Wearing Sport Prosthesis
This paper presents a motion analysis framework for an athlete wearing a sport-specific flexible prosthesis, based on the soft-rigid hybrid-link system. Such motion analysis is challenging because we need to consider the interaction force between the rigid human skeleton system and a flexible prosthesis. However, most human musculoskeletal models are based on the computation framework of a rigid-body multi-link system. Recently, in the soft robotics research field, fast and efficient modeling methods have been developed for flexible rod deformation, which allow us to build a hybrid-link system that integrates rigid links and soft bodies in a unified formulation. We apply inverse kinematics of the hybrid-link system to motion reconstruction from motion capture data, and also present the estimation of the joint torques and ground reaction force by inverse dynamics. Through a human subject experiment, we show that the inverse dynamics achieved approximately 12% error on the ground reaction force estimation. Furthermore, we provide the muscle force estimation considering muscle amputation and interaction force with the prosthesis leg deformation.
EvObj: Learning Evolving Object-centric Representations for 3D Instance Segmentation without Scene Supervision CVPR 2026
We introduce EvObj for unsupervised 3D instance segmentation that bridges the geometric domain gap between synthetic pretraining data and real-world point clouds. Current methods suffer from structural discrepancies when transferring object priors from synthetic datasets (e.g., ShapeNet) to real scans (e.g., ScanNet), particularly due to morphological variations and occlusion artifacts. To address this, EvObj integrates two innovative modules: (1) An object discerning module that dynamically refines object candidates, enabling continuous adaptation of object priors to target domains; and (2) An object completion module that reconstructs partial geometries after discovering objects. We conduct extensive experiments on both real-world and synthetic datasets, demonstrating superior 3D object segmentation performance over all baselines while achieving state-of-the-art results.
comment: CVPR 2026. Code and data are available at: https://github.com/vLAR-group/EvObj
ERPPO: Entropy Regularization-based Proximal Policy Optimization
Multi-Agent Proximal Policy Optimization (MAPPO) is a variant of the Proximal Policy Optimization (PPO) algorithm, specifically tailored for multi-agent reinforcement learning (MARL). MAPPO optimizes cooperative multi-agent settings by employing a centralized critic with decentralized actors. However, in multi-dimensional environments, MAPPO cannot extract an optimal policy due to non-stationary agent observations. To overcome this problem, we introduce a novel approach, Entropy Regularization-based Proximal Policy Optimization (ERPPO). For the policy optimization, we first define the object detection ambiguity under a multi-dimensional observation environment. A Distributional Spatiotemporal Ambiguity (DSA) learner is trained to estimate object detection uncertainty under non-stationary constraints. Then, we enhance PPO with a novel Entropy Regularization term. This regularization dynamically adjusts the policy update by applying a stronger (L1) regularization in high-ambiguity observations to encourage significant exploratory actions and a weaker (L2) regularization in low-ambiguity observations to stabilize the proximal policy optimization. This approach is designed to enhance the probability of successful object localization in time-critical operations by reducing detection failures and optimizing the search policy. Experiments on a testbed with AirSim-based maritime searching scenarios show that the proposed ERPPO improves accuracy. Our proposed method achieves a higher gradient than MAPPO. Qualitative results confirm the effectiveness of ERPPO in suppressing false detections in visually uncertain conditions.
comment: 9 pages, 5 figures
MoCCA: A Movable Circle Probability of Collision Approximation ITSC 2026
In automated driving, crash mitigation is crucial to ensure passenger safety. Accurate avoidance requires precise knowledge of the object's position and orientation. However, sensor noise and occlusions often result in tracking and prediction uncertainties. To account for these uncertainties, estimating the Probability of Collision (POC) is a critical requirement. While Monte Carlo sampling is a common estimation technique, its high computational demand and stochastic nature often render it unsuitable for real-time applications. Analytical POC calculations are simplified by approximating vehicle geometries using circular bounds. While multi-circle approximations offer higher fidelity than a single circumscribed circle, they significantly increase computational complexity. This paper proposes a shape approximation algorithm, MoCCA, which utilizes a single circle for each vehicle, optimized to minimize the relative distance between them. MoCCA maintains a computational efficiency comparable to standard single-circle techniques while reducing over-conservatism. To address the potential underestimation of POC inherent in partial coverage, we establish an upper bound for the approximation error, demonstrating that it depends primarily on inter-vehicle distance and orientation variance. Furthermore, we introduce a safety distance margin that can be calibrated solely based on orientation variance.
comment: Accepted at ITSC 2026
Multi-Depth Uniform Coverage Path Planning for Unmanned Surface Vehicle Surveying ICRA 2026
This paper introduces a novel automatic coverage path planning algorithm for bathymetry surveying with unmanned surface vehicles. The detection range of the mapping sensor employed, a multibeam echo sounder, is heavily influenced by local seafloor depths. Hence, a path designed to uniformly cover the sea surface does not guarantee uniform coverage of the seafloor. Yet this is currently the typical process for bathymetric surveys, with the simplistic boustrophedon scheme along manually selected waypoints at constant depths being the most widespread planner used. The proposed scheme incorporates coarse prior depth information to pre-process the target region and adaptively guide path generation and sensing range configuration. By explicitly accounting for depth variations, the proposed algorithm designs a coverage path with optimised spacing between survey passes that adjusts the sensing beam aperture to achieve more consistent seafloor coverage. The proposed method is shown to offer significant improvements in both synthetic and real-world scenarios. Validations in challenging synthetic terrains achieve coverage ratios beyond 99%, a marked improvement over traditional boustrophedon paths, which reach at most 75% coverage. The same trend appears in realistic simulations using real bathymetric data from a coastal harbour, with coverage reaching over 92% and significantly surpassing boustrophedon sweeps, whose coverage rates stay below 65%. Beyond improved performance, the scheme also brings a fully automated design, suitable for autonomous marine vehicles, thus offering practical utility for real-world applications.
comment: Accepted by ICRA 2026
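To see why uniform surface spacing does not give uniform seafloor coverage, consider the flat-seafloor geometry of a downward-looking fan of beams. The sketch below is a simplification assumed for illustration (flat bottom directly under the vessel, symmetric fan, a hypothetical `overlap` fraction between adjacent passes); it is not the planner proposed in the paper.

```python
import math

def swath_width(depth_m, aperture_deg):
    """Seafloor width covered by a symmetric beam fan over a flat bottom."""
    return 2.0 * depth_m * math.tan(math.radians(aperture_deg) / 2.0)

def line_spacing(depth_m, aperture_deg, overlap=0.1):
    """Distance between adjacent survey passes for a given fractional overlap."""
    return swath_width(depth_m, aperture_deg) * (1.0 - overlap)

# Spacing that suits deep water leaves gaps in shallow water, and vice versa.
for depth in (5.0, 10.0, 20.0):
    print(depth, round(line_spacing(depth, aperture_deg=90.0), 2))
```

Under these assumptions the swath scales linearly with depth, so a pass spacing tuned for one depth either wastes effort or leaves gaps at another, which is the motivation for depth-aware spacing and aperture adjustment.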
Towards Long-horizon Embodied Agents with Tool-Aligned Vision-Language-Action Models
Vision-language-action (VLA) models are effective robot action executors, but they remain limited on long-horizon tasks due to the dual burden of extended closed-loop planning and diverse physical operations. We therefore propose VLAs-as-Tools, a strategy that distributes this burden across a high-level vision language model (VLM) agent for temporal reasoning and a family of specialized VLA tools for diverse local physical operations. The VLM handles scene analysis, global planning, and recovery, while each VLA tool executes a bounded subtask. To tightly couple agent planning with VLA tool execution in long-horizon tasks, we introduce a VLA tool-family interface that exposes explicit tool selection and in-execution progress feedback, enabling efficient event-triggered agent replanning without continuous agent polling. To obtain diverse specialized VLA tools that faithfully follow agent invocations, we further propose Tool-Aligned Post-Training (TAPT), which constructs invocation-aligned training units for instruction following and adopts tool-family residual adapters for efficient tool specialization. Experiments show that VLAs-as-Tools improves the success rate of $π_{0.5}$ by 4.8 points on LIBERO-Long and 23.1 points on RoboTwin, and further enhances invocation fidelity by 15.0 points as measured by Non-biased Rate. Code will be released.
SECOND-Grasp: Semantic Contact-guided Dexterous Grasping
Achieving reliable robotic manipulation, such as dexterous grasping, requires a synergy between physically stable interactions and semantic task guidance, yet these objectives are often treated as separate, disjoint goals. In this paper, we investigate how to integrate dexterous grasping techniques, i.e., physically stable grasps for object lifting and language-guided grasp generation, to achieve both physical stability and semantic understanding. To this end, we propose SECOND-Grasp (SEmantic CONtact-guided Dexterous Grasping), a unified framework that enables robotic hands to dynamically adjust grasping strategies based on semantic reasoning while ensuring physical feasibility. We begin by obtaining coarse contact proposals through vision-language reasoning to infer where contacts should occur based on object properties, followed by segmentation to localize these regions across views. To further ensure consistency across multiple viewpoints, we introduce Semantic-Geometric Consistency Refinement (SGCR), which refines initial contact predictions by enforcing semantic consistency across views and removing geometrically invalid regions, yielding reliable 3D contact maps. Then, we derive a feasible hand pose for each contact map via inverse kinematics, generating a supervision signal for policy learning. Our approach, trained on DexGraspNet, consistently outperforms baselines in lifting success rate on both seen and unseen categories, achieving 98.2% and 97.7%, respectively, while also improving intent-aware grasping by 12.8% and 26.2%. We further show promising results on additional datasets and robotic hands, including Shadow Hand and Allegro Hand.
What to Ignore, What to React: Visually Robust RL Fine-Tuning of VLA Models
Reinforcement learning (RL) fine-tuning has shown promise for Vision-Language-Action (VLA) models in robotic manipulation, but deployment-time visual shifts pose practical challenges. A key difficulty is that standard task rewards supervise task success, but offer limited guidance on whether a visual change is task-irrelevant or changes the behavior required for manipulation. We propose PAIR-VLA (Paired Action Invariance & Sensitivity for Visually Robust VLA), an RL fine-tuning framework to address this difficulty by adding two auxiliary objectives over paired visual variants during PPO optimization: an invariance term that reduces the discrepancy between action distributions for a task-preserving pair (e.g., different distractors), and a sensitivity objective that encourages separable action distributions for a task-altering pair (e.g., target object in a different pose). Together, these objectives turn visual variants from mere observation diversity into behavior-level guidance on policy responses during RL fine-tuning. We evaluate on ManiSkill3 across two representative VLA architectures, OpenVLA and $π_{0.5}$, under diverse out-of-distribution visual shifts including unseen distractors, texture changes, target object pose variation, viewpoint shifts, and lighting changes. Our method consistently improves over standard PPO, achieving average improvements of 16.62% on $π_{0.5}$ and 9.10% on OpenVLA. Notably, ablations further show generalization across visual shifts: invariance guidance learned from distractor and texture variants transfers to target-pose and lighting shifts, while adding sensitivity guidance on target-pose variants further improves robustness to nuisance shifts, highlighting the broader transferability of behavior-level RL guidance.
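The two auxiliary objectives described above can be sketched for discrete action distributions. The concrete choices here are assumptions, not taken from the paper: KL divergence as the discrepancy measure, a hinge with a hypothetical `margin` for the sensitivity term, and the weights `w_inv` and `w_sens`.

```python
import numpy as np

def kl_categorical(p, q, eps=1e-12):
    """KL(p || q) for two categorical distributions given as probability vectors."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    q = np.clip(np.asarray(q, dtype=float), eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def paired_auxiliary_loss(p_base, p_preserving, p_altering,
                          margin=0.5, w_inv=1.0, w_sens=1.0):
    """Invariance pulls the task-preserving pair together;
    sensitivity pushes the task-altering pair at least `margin` apart."""
    invariance = kl_categorical(p_base, p_preserving)
    sensitivity = max(0.0, margin - kl_categorical(p_base, p_altering))
    return w_inv * invariance + w_sens * sensitivity
```

In a PPO setup, a term of this kind would be added to the usual clipped surrogate loss; how the paper weights and schedules its terms is not stated in the abstract.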
Identification of Non-Transversal Bifurcations of Linkages
Local analysis is an established approach to the study of singularities and mobility of linkages. The key result of such analyses is a local picture of the finite motion through a configuration. This reveals the finite mobility at that point and the tangents to smooth motion curves. It does not, however, immediately allow one to distinguish between motion branches that do not intersect transversally (a rather uncommon situation that has only recently been discussed in the literature). The mathematical framework for such a local analysis is the kinematic tangent cone. It is shown in this paper that the constructive definition of the kinematic tangent cone already involves all information necessary to separate different motion branches. A computational method is derived by amending the algorithmic framework reported in previous publications.
comment: Paper No: DETC2020-22301, V010T10A090; 8 pages
Object Manipulation of the Variable Topology Truss system
This paper presents an object manipulation strategy for the Variable Topology Truss (VTT) system, a truss robot that comprises actuated truss members connected by passive spherical joints. Although truss robots were originally proposed as rapidly deployable manipulators, manipulation strategies for them have not been studied thoroughly. To enable manipulation, we introduce a hybrid control framework that regulates position and force concurrently without explicit decoupling. At the actuator level, each member employs a sensor-based force feedback controller to generate the desired axial forces despite high actuator friction. At the task level, the forces applied at the end-effector nodes are produced by computing the required member forces using a static model of the VTT. We evaluate force-tracking performance through experiments on both a single member module and the full VTT system. Finally, we demonstrate object manipulation using two representative configurations and quantitatively assess combined position and force tracking performance. Experimental results confirm that the proposed approach enables consistent and reliable object manipulation with the VTT system.
comment: 15 pages, 14 figures
TouchAnything: A Dataset and Framework for Bimanual Tactile Estimation from Egocentric Video
Egocentric human video data, which captures rich human-environment interactions and can be collected at scale, has become a key driver of embodied intelligence research. However, existing egocentric datasets typically lack tactile sensing, a critical modality that provides direct cues about contact, force, and pressure in human-object interaction. Without such signals, models struggle to learn physically grounded representations of real-world interaction dynamics. While tactile sensors provide these cues, deploying high-quality tactile hardware at scale remains expensive and cumbersome. This raises a central question: can tactile feedback be inferred directly from visual observations, enabling scalable tactile supervision for egocentric video data and supporting physically grounded embodied learning? To enable research in this direction, we introduce EgoTouch, a large-scale multi-view egocentric dataset with dense tactile supervision for bimanual hand-object interaction. EgoTouch comprises 208 manipulation tasks spanning 1,891 episodes in diverse indoor and outdoor environments, with synchronized multi-view RGB (head-mounted egocentric and dual wrist-mounted cameras), bimanual 3D hand pose, and continuous pressure maps from wearable tactile sensors. Building on EgoTouch, we introduce TouchAnything, a baseline multi-view vision-to-touch prediction framework that uses the egocentric view as the primary input and flexibly leverages available wrist-mounted views at inference time. Experiments show that incorporating wrist-mounted views generally improves tactile prediction over egocentric-only input, achieving up to 5.0% relative improvement in Contact IoU and 6.1% relative improvement in Volumetric IoU. We will publicly release the dataset, code, and benchmark.
When Absolute State Fails: Evaluating Proprioceptive Encodings for Robust Manipulation ICRA 2026
As end-to-end robotic policies are progressively deployed in the real world to solve real tasks, they face a gap between the training and inference conditions. Scaling the amount and diversity of the training data has shown some success in improving zero-shot generalization, yet robots still fail when faced with new, unseen test conditions. For instance, while robots with fixed frames of reference are common, those with moving frames pose a greater challenge for deployment. To address this specific instance of the issue, we present a study of strategies for encoding the robot's proprioceptive state to improve both in- and out-of-distribution performance at test time. Through a systematic study of joint representations, we find that a simple episode-wise relative frame provides the best trade-off between task performance and robustness, outperforming the baselines in extensive real-robot experiments conducted in a realistic test environment. The results suggest a practical path to leveraging data collected by robots with varying frames of reference and deployment to unseen test configurations.
comment: Accepted to ICRA 2026 Workshop: From Data to Decisions
MUJICA: Multi-skill Unified Joint Integration of Control Architecture for Wheeled-Legged Robots
Wheeled-legged robots hold promise for traversing complex terrains and offer superior mobility compared to legged robots. However, wheeled-legged robots must effectively balance both wheeled driving and legged control. Furthermore, due to noisy proprioceptive sensing and real-world motor constraints, realizing robust and adaptive locomotion at peak performance of motors remains challenging. We propose the Multi-skill Unified Joint Integration of Control Architecture (MUJICA), a unified, fully proprioceptive control framework for wheeled-legged robots that integrates diverse low-level skills, including omnidirectional moving, high platform climbing, and fall recovery, within a single policy. All skills, distinguished by unique indicator variables, are trained jointly with accurate DC-motor constraint modeling. Additionally, a high-level skill selector is learned to dynamically choose the optimal skill based solely on proprioception, enabling adaptive responses to the surrounding environment. As a result, MUJICA enhances sim-to-real robustness and enables seamless transitions across diverse locomotion modes, facilitating autonomous adjustment to the environment. We validate our framework in both simulation and real-world experiments on the Unitree Go2-W robot, demonstrating significant improvements in adaptability and task success in unstructured environments.
Relative Pose-Velocity Estimation Using Dual IMU Measurements and Relative Position Sensing
This paper addresses the problem of estimating the relative pose (position and orientation) and velocity of a vehicle with respect to a moving target, where both are equipped with Inertial Measurement Units (IMUs), assuming the availability of relative position or bearing measurements. The body-target relative dynamics are formulated on $\mathbf{SE}_2(3)$ and recast into a linear time-varying (LTV) model in the ambient space $\mathbb{R}^{15}$, on which a deterministic Riccati observer is designed. We analyze the uniform observability (UO) conditions required to guarantee global exponential convergence of the estimation error in the ambient space for both measurement cases. In the case of relative position measurements, UO requires only a persistence-of-excitation condition on the target acceleration, whereas for bearing measurements, additional conditions are required. Building on this, a nonlinear complementary filter on $\mathbf{SO}(3)$ is designed to provide a smooth estimate of the orientation component of the state with almost global asymptotic stability. Finally, simulation results are provided to validate the proposed solution.
Local Conformal Calibration of Dynamics Uncertainty from Semantic Images
We introduce Observation-aware Conformal Uncertainty Local-Calibration (OCULAR), a conformal prediction-based algorithm that uses perception information to provide uncertainty quantification guarantees for unseen test-time environments. While previous conformal approaches lack the ability to discriminate between state-action space regions leading to higher or lower model mismatch, and require environment-specific data, our method uses data collected from visually similar environments to provably calibrate a given linear Gaussian dynamics model of arbitrary fidelity. The prediction regions generated by OCULAR are guaranteed to contain the future system states with at least a user-set likelihood, despite both aleatoric and epistemic uncertainty -- i.e., uncertainty arising from both stochastic disturbances and lack of data. Our guarantees are non-asymptotic and distribution-free, and do not require strong assumptions about the unknown real system dynamics. Our calibration procedure enables distinguishing between observation-velocity-action inputs leading to higher and lower next-state uncertainty, which is helpful for probabilistically safe planning. We numerically validate our algorithm on a double-integrator system subject to random perturbations and significant model mismatch, using both a simplified sensor and a more realistic simulated camera. Our approach appropriately quantifies uncertainty both in-distribution and out-of-distribution, while remaining comparably volume-efficient to baselines that require environment-specific data.
comment: 26 pages, 8 figures. Accepted to the 17th World Symposium on the Algorithmic Foundations of Robotics (WAFR) 2026
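For readers unfamiliar with the kind of non-asymptotic, distribution-free guarantee mentioned above, the following sketch shows plain split conformal calibration of a prediction-error radius. It is the textbook global procedure, shown only as background; the observation-aware local calibration that OCULAR adds is not reproduced here, and the function name is hypothetical.

```python
import math
import numpy as np

def conformal_radius(calibration_errors, alpha):
    """Radius r such that a new exchangeable error is <= r with probability >= 1 - alpha.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest calibration error.
    """
    scores = np.sort(np.asarray(calibration_errors, dtype=float))
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        # Too few calibration points for this coverage level.
        return float("inf")
    return float(scores[k - 1])
```

Here `calibration_errors` would be, for instance, norms of the difference between the model's predicted next state and the observed next state on held-out data; a ball of the returned radius around each new prediction is then the prediction region.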
Occlusion-Based Object Transportation Around Obstacles With a Swarm of Miniature Robots
Swarm robotics utilises decentralised self-organising systems to form complex collective behaviours built from the bottom-up using individuals that have limited capabilities. Previous work has shown that simple occlusion-based strategies can be effective in using swarm robotics for the task of transporting objects to a goal position. However, this strategy requires a clear line-of-sight between the object and the goal. In this paper, we extend this strategy by allowing robots to form sub-goals; enabling any member of the swarm to establish a wider range of visibility of the goal, ultimately forming a chain of sub-goals between the object and the goal position. We do so while preserving the fully decentralised and communication-free nature of the original strategy, while maintaining performance in object-free scenarios. In five sets of simulated experiments, we demonstrate the generalisability of our proposed strategy. Our finite-state machine allows a sufficiently large swarm to transport objects around obstacles that block the goal. The method is robust to varying starting positions and can handle both concave and convex shapes.
comment: 25 pages, 9 figures, 6 tables. Accepted for publication in the journal Swarm Intelligence
Distributionally Robust Safety Under Arbitrary Uncertainties: A Safety Filtering Approach
In this work, we study how to ensure probabilistic safety for nonlinear systems under distributional ambiguity. Our approach builds on a backup-based safety filtering framework that switches between a high-performance nominal policy and a certified backup policy to ensure safety. To handle arbitrary uncertainties from ambiguous distributions, i.e., where the distribution is not of specific structure and the true distribution is unknown, we adopt a distributionally robust (DR) formulation using Wasserstein ambiguity sets. Rather than solving a high-dimensional DR trajectory optimization problem online, we exploit the structure of backup-based safety filtering to reduce safety certification to a one-dimensional search over the switching time between nominal and backup policies. We then develop a sampling-based certification procedure with finite-sample guarantees, where empirical failure probabilities are compared against a Wasserstein-inflated threshold. We validate our method through simulations across three systems, from a Dubins vehicle to a high-speed racing car and a fighter jet, demonstrating the broad applicability and computational efficiency.
comment: 10 pages, 4 figures, submitted to IEEE Robotics and Automation Letters (RA-L)
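The reduction to a one-dimensional search over the switching time can be illustrated on a toy problem. All specifics below are assumptions for illustration: a 1-D integrator driving toward a wall, a backup policy that simply stops, and a user-supplied `inflation` constant standing in for the Wasserstein term, whose actual form is not given in the abstract.

```python
import numpy as np

def fails(switch_step, noise, horizon=20, dt=0.1, wall=1.05):
    """Toy rollout: nominal policy drives at speed 1, backup policy stops.
    Returns True if the state ever crosses the wall."""
    x = 0.0
    for k in range(horizon):
        v = 1.0 if k < switch_step else 0.0  # nominal before the switch, backup after
        x += (v + noise[k]) * dt
        if x > wall:
            return True
    return False

def latest_safe_switch(noise_samples, eps=0.1, inflation=0.05, horizon=20):
    """Largest switching step whose empirical failure rate stays below
    the tightened threshold eps - inflation (None if no step qualifies)."""
    threshold = eps - inflation
    best = None
    for s in range(horizon + 1):
        p_fail = np.mean([fails(s, w, horizon) for w in noise_samples])
        if p_fail > threshold:
            break  # in this toy model, failure is monotone in the switching step
        best = s
    return best
```

The filter would then follow the nominal policy up to the returned step and hand over to the backup policy afterwards, re-running the search as new state information arrives.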
DynoJEPP: Joint Estimation, Prediction and Planning in Dynamic Environments
DynoJEPP is a factor-graph-based framework that jointly formulates and simultaneously optimizes estimation, prediction, and planning in dynamic environments. In conventional factor-graph-based approaches that jointly formulate estimation, prediction, and planning, information from prediction and planning feeds back into state estimation, yielding corrupted estimates, undesired behaviors, and unsafe plans. To address this, DynoJEPP introduces a novel directed factor that enforces directional information flow within the factor graph, preventing prediction and planning from corrupting state estimation. We evaluate the impact of directed factors on inter-module interactions during navigation in both static and dynamic environments. Our results demonstrate that these factors are critical for safe operation, as without them, the robot collides in the majority of experiments. Building on this, we further introduce Cooperative DynoJEPP, which enables the ego robot to incorporate cooperative object behavior into its prediction and trajectory planning.
MAPLE: Latent Multi-Agent Play for End-to-End Autonomous Driving NeurIPS 2026
Vision-language-action (VLA) models are effective as end-to-end motion planners, but can be brittle when evaluated in closed-loop settings because they are trained under a traditional imitation learning framework. Existing closed-loop supervision approaches lack scalability and fail to completely model a reactive environment. We propose MAPLE, a novel framework for reactive, multi-agent rollout of a dynamic driving scenario in the latent space of the VLA model. The ego vehicle and nearby traffic agents are independently controlled over multi-step horizons, while being reactive to other agents in the scene, enabling closed-loop training. MAPLE consists of two training stages: (1) supervised fine-tuning on the latent rollouts based on ground-truth trajectories, followed by (2) reinforcement learning with global and agent-specific rewards that encourage safety, progress, and interaction realism. We further propose diversity rewards that encourage the model to generate planning behaviors that may not be present in logged driving data. Notably, our closed-loop training framework is scalable and does not require external simulators, which can be computationally expensive to run and have limited visual fidelity to the real world. MAPLE achieves state-of-the-art driving performance on Bench2Drive and demonstrates scalable, closed-loop multi-agent play for robust E2E autonomous driving systems.
comment: 19 pages, 9 figures, NeurIPS 2026 submission
Motion Planning for Autonomous Vehicles using Optimization over Graphs of Convex Sets
Motion planning for autonomous vehicles requires generating collision-free and dynamically feasible trajectories in complex environments under real-time constraints. While nonlinear optimal control formulations provide high-fidelity solutions, they are computationally demanding and sensitive to initialization, whereas geometric planning methods scale well but often decouple path selection from trajectory optimization. This paper studies the extent to which optimization over Graphs of Convex Sets (GCS) can approximate solutions of nonlinear optimal control problems in the context of autonomous driving. The free space is represented as a finite union of convex regions organized as a directed graph, allowing nonconvex geometry to be handled through discrete connectivity decisions while maintaining convex trajectory constraints within each region. Vehicle motion is parameterized using Bezier curves for the spatial path and a polynomial time-scaling function for temporal evolution. Under small-slip and linear tire assumptions, a simplified dynamic bicycle model enables approximate enforcement of dynamic feasibility through convex constraints on trajectory derivatives. The approach is evaluated in CommonRoad scenarios involving static obstacle avoidance and lane-changing maneuvers, and is compared against a nonlinear discrete-time optimal control formulation. The results indicate that the GCS-based method generates collision-free and dynamically consistent trajectories that closely match those obtained from the nonlinear program, while exhibiting improved computational efficiency and reduced sensitivity to initialization. These findings suggest that GCS provides a structured approximation of nonlinear motion planning problems, capturing dominant geometric and dynamic effects while preserving convexity in the continuous relaxation.
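The reason Bezier paths pair well with convex regions is a standard property: a Bezier curve lies inside the convex hull of its control points, so constraining the control points to a convex region keeps the whole segment inside it. The following is a minimal sketch of that property only, not the paper's formulation; the control points and the axis-aligned box standing in for a convex free-space region are hypothetical.

```python
from math import comb


def bezier_point(control_points, t):
    """Evaluate a 2D Bezier curve at t in [0, 1] using Bernstein polynomials."""
    n = len(control_points) - 1
    x = y = 0.0
    for i, (px, py) in enumerate(control_points):
        weight = comb(n, i) * (1 - t) ** (n - i) * t ** i
        x += weight * px
        y += weight * py
    return x, y


def inside_box(point, box, tol=1e-9):
    """Check membership in an axis-aligned box (a simple convex region)."""
    (xmin, ymin), (xmax, ymax) = box
    x, y = point
    return xmin - tol <= x <= xmax + tol and ymin - tol <= y <= ymax + tol


# Hypothetical cubic segment whose control points all lie in BOX.
CONTROL_POINTS = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
BOX = ((0.0, 0.0), (4.0, 2.0))

# Sample the curve densely; every sample stays inside the box.
samples = [bezier_point(CONTROL_POINTS, k / 50) for k in range(51)]
all_inside = all(inside_box(p, BOX) for p in samples)
```

Because the containment follows from the control points alone, a planner can impose it as a finite set of convex constraints rather than checking the curve pointwise.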
Safety-Constrained Reinforcement Learning with Post-Training Reachability Verification for Robot Navigation
Safe navigation for mobile robots demands policies that remain reliable under the high-consequence perception uncertainty of cluttered environments. Yet most existing safe reinforcement learning (RL) methods assess safety through average cumulative cost. Such metrics can mask dangerous tail-risk behaviors. To address this, we propose a framework that trains risk-sensitive policies through Conditional Value-at-Risk (CVaR) constrained optimization on an off-policy TD3 backbone and evaluates their safety margins post-training through neural network reachability verification. During training, the policy is optimized under CVaR constraints on cumulative costs, promoting sensitivity to high-cost tail outcomes rather than average behavior alone. After training, we compute action reachable sets under bounded observation uncertainty using Taylor Model analysis, yielding a safety rate metric that quantifies the proportion of evaluated states at which the policy's reachable action set remains within prescribed safety margins. A key finding is that policies trained with CVaR constraints maintain larger safety margins from obstacles across evaluated states. This makes them significantly more amenable to formal reachability verification. Experiments across ten navigation scenarios and six baselines show that our method achieves a 98.3% success rate, the highest safety verification rate among all compared methods, while revealing that average cost rankings and reachability-based safety rankings can diverge. This indicates that reachability verification captures risks which are missed by empirical cost metrics alone. We further validate our approach on a physical Clearpath Jackal robot, demonstrating successful sim-to-real transfer.
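To illustrate why an average cumulative cost can hide tail risk, here is a minimal sketch of an empirical CVaR estimate: the mean of the worst (1 - alpha) fraction of episode costs. The function name, the sample costs, and the tail-averaging estimator are illustrative assumptions, not the paper's training objective.

```python
import math


def cvar(costs, alpha):
    """Empirical CVaR: average of the worst (1 - alpha) fraction of costs."""
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    ordered = sorted(costs, reverse=True)
    # Round before ceil to avoid floating-point overshoot such as 3.0000000000000004.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    tail = ordered[:k]
    return sum(tail) / len(tail)


# Two hypothetical policies with the same mean episode cost (2.0).
STEADY = [2.0] * 10
RISKY = [1.0] * 9 + [11.0]

mean_steady = sum(STEADY) / len(STEADY)
mean_risky = sum(RISKY) / len(RISKY)
cvar_steady = cvar(STEADY, alpha=0.9)
cvar_risky = cvar(RISKY, alpha=0.9)
```

The two policies are indistinguishable by mean cost, while the CVaR at alpha = 0.9 separates them, which is the kind of divergence between average and tail metrics the abstract refers to.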
SToRe3D: Sparse Token Relevance in ViTs for Efficient Multi-View 3D Object Detection CVPR 2026
Vision Transformers (ViTs) enable strong multi-view 3D detection but are limited by high inference latency from dense token and query processing across multiple views and large 3D regions. Existing sparsity methods, designed mainly for 2D vision, prune or merge image tokens but do not extend to full-model sparsity or address 3D object queries. We introduce SToRe3D, a relevance-aligned sparsity framework that jointly selects 2D image tokens and 3D object queries while storing filtered features for reactivation. Mutual 2D-3D relevance heads allocate compute to driving-critical content and preserve other embeddings. Evaluated on nuScenes and our new nuScenes-Relevance benchmark, SToRe3D achieves up to 3x faster inference with marginal accuracy loss, establishing real-time large-scale ViT-based 3D detection while maintaining accuracy on planning-critical agents.
comment: Accepted to CVPR 2026
Behavior Cloning for Active Perception with Low-Resolution Egocentric Vision
We investigate whether behavior cloning is sufficient to produce active perception in a structured object-finding task. A low-cost robot arm equipped with a wrist-mounted egocentric RGB camera must reposition to center a partially visible plant before triggering a grasp signal, requiring actions that improve future observations. The model predicts joint commands directly from low-resolution RGB images under closed-loop control. We show that low-resolution egocentric vision is sufficient for reliable task completion and that predicting relative joint deltas substantially outperforms absolute joint position prediction in our setting. These results demonstrate that visually grounded active perception can emerge from behavior cloning in a reproducible setting.
Ergodic Imitation for Adaptive Exploration around Demonstrations
In robotics, a common challenge in imitation learning is the mismatch between training and deployment conditions, caused, for example, by environmental changes or imperfect observation and control. When a robot follows a nominal trajectory under such mismatch, it may become stuck and fail to complete the task. This calls for adaptive online exploration strategies that remain grounded in demonstrations. To this end, we propose an adaptive ergodic imitation approach that constructs a target distribution from the geometry of the retrieved demonstrations and uses it to generate trajectories that adaptively interpolate between tracking and exploration. Our method extends ergodic control beyond its traditional role in area-coverage and search by incorporating demonstrations into a retrieval-based receding-horizon framework for adaptive imitation.
comment: 4 pages, 3 figures
WarmPrior: Straightening Flow-Matching Policies with Temporal Priors
Generative policies based on diffusion and flow matching have become a dominant paradigm for visuomotor robotic control. We show that replacing the standard Gaussian source distribution with WarmPrior, a simple temporally grounded prior constructed from readily available recent action history, consistently improves success rates on robotic manipulation tasks. We trace this gain to markedly straighter probability paths, echoing the effect of optimal-transport couplings in Rectified Flow. Beyond standard behavior cloning, WarmPrior also reshapes the exploration distribution in prior-space reinforcement learning, improving both sample efficiency and final performance. Collectively, these results identify the source distribution as an important and underexplored design axis in generative robot control.
Towards Robotic Dexterous Hand Intelligence: A Survey
Robotic dexterous hands are central to contact-rich manipulation, with rapid progress driven by advances in hardware, sensing, control, simulation, and data generation. However, existing studies are often developed under different assumptions regarding hand embodiments, sensory configurations, task settings, training data, and evaluation protocols, making systematic comparison difficult and obscuring the developmental trajectory of the field. This survey provides a holistic review of dexterous hand research from four complementary aspects. First, we present a hardware-level analysis covering actuation, transmission, perception, and representative hand designs, highlighting the key trade-offs in force capability, compliance, bandwidth, integration, and system complexity. Furthermore, we review control and learning methods for dexterous manipulation from a methodological perspective, grouping representative works by major paradigms and tracing their evolution in chronological order. In addition, we consolidate datasets, modality design, and evaluation practices, which enables methodological progress to be interpreted together with the ways in which it is trained, benchmarked, and assessed. Finally, we discuss the major limitations of current dexterous hand research and summarize the corresponding future directions. By connecting hardware analysis, methodological development, data resources, and evaluation, this survey aims to provide a structured understanding of dexterous hand research and to clarify the most important open challenges for future study.
Vision-Based Runtime Monitoring under Varying Specifications using Semantic Latent Representations
We study certified runtime monitoring of past-time signal temporal logic (ptSTL) from visual observations under partial observability. The monitor must infer safety-relevant quantities from images and provide finite-sample guarantees, while being \emph{reusable}: once trained and calibrated, it should certify any formula in a target fragment without per-formula retraining. For fragments induced by a finite dictionary of temporal atoms, we prove that the \emph{semantic basis}, the vector of atom robustness scores, is the minimum prediction target within the class of monotone, 1-Lipschitz reusable interfaces: any formula is evaluated by a deterministic decoder derived from the parse tree, and a single conformal calibration pass certifies the entire fragment with no union bound. We also introduce a \emph{rolling prediction monitor} that predicts only current predicate values and reconstructs temporal history online; this is easier to learn but grows conservative at long horizons. On a pedestrian-crossroad benchmark, rolling achieves tighter certified bounds at short horizons while the semantic-basis monitor is up to 4-times tighter at long horizons. We validate the presented monitors on real-world Waymo driving data, where both monitors satisfy the conformal coverage guarantee empirically.
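A parse-tree decoder of the kind described can be sketched as follows, assuming a fragment built from conjunction and disjunction over atom robustness scores, with the usual quantitative semantics (min for "and", max for "or"). The tuple encoding of formulas and the atom names are hypothetical. Since min and max are monotone and 1-Lipschitz in the sup norm, an error bound on the atom scores carries over unchanged to every decoded formula.

```python
def decode(formula, atom_scores):
    """Evaluate a formula's robustness from atom robustness scores.

    A formula is a nested tuple: ("atom", name), ("and", f1, f2, ...),
    or ("or", f1, f2, ...).
    """
    kind = formula[0]
    if kind == "atom":
        return atom_scores[formula[1]]
    children = [decode(child, atom_scores) for child in formula[1:]]
    if kind == "and":
        return min(children)
    if kind == "or":
        return max(children)
    raise ValueError(f"unsupported operator: {kind}")


# Hypothetical formula: a AND (b OR c).
FORMULA = ("and", ("atom", "a"), ("or", ("atom", "b"), ("atom", "c")))
TRUE_SCORES = {"a": 0.4, "b": -0.2, "c": 0.1}
# Predicted scores, each off by at most 0.05 from the true ones.
PREDICTED_SCORES = {"a": 0.35, "b": -0.25, "c": 0.05}

true_value = decode(FORMULA, TRUE_SCORES)
predicted_value = decode(FORMULA, PREDICTED_SCORES)
```

In this example the decoded values differ by no more than the largest per-atom error, which is why one calibrated bound on the atom vector can serve every formula in the fragment.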
AdaptNC: Adaptive Nonconformity Scores for Conformal Prediction under Distribution Shift
Rigorous uncertainty quantification is essential for the safe deployment of autonomous systems in unconstrained environments. Conformal Prediction (CP) provides a distribution-free framework for this task, yet its standard formulations rely on exchangeability assumptions that are violated by the distribution shifts inherent in real-world robotics. Existing online CP methods maintain target coverage by adaptively scaling the conformal threshold, but typically employ a static nonconformity score function. We show that this fixed geometry leads to highly conservative, volume-inefficient prediction regions when environments undergo structural shifts. To address this, we propose $\textbf{AdaptNC}$, a framework for the joint online adaptation of both the nonconformity score parameters and the conformal threshold. AdaptNC leverages an adaptive reweighting scheme to optimize score functions, and introduces a replay buffer mechanism to mitigate the coverage instability that occurs during score transitions. We evaluate AdaptNC on diverse robotic benchmarks involving multi-agent policy changes, environmental changes and sensor degradation. Our results demonstrate that AdaptNC significantly reduces prediction region volume compared to state-of-the-art threshold-only baselines while maintaining target coverage levels.
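For context, a threshold-only online scheme can be sketched as a simple quantile-tracking rule: raise the threshold after a miss, lower it slightly after a cover. This is a generic illustration of adapting only the threshold while the score function stays fixed; it is not AdaptNC, and the function name, step size, and synthetic score stream are assumptions.

```python
def track_threshold(scores, alpha, step, q0=0.0):
    """Online threshold update: q <- q + step * (miss - alpha).

    Returns the threshold used at each step and the empirical miss rate.
    """
    q = q0
    misses = 0
    history = []
    for score in scores:
        history.append(q)
        miss = score > q  # the prediction region failed to cover this sample
        misses += miss
        q += step * ((1.0 if miss else 0.0) - alpha)
    return history, misses / len(scores)


# Hypothetical score stream with an abrupt upward shift halfway through.
CALM = [(37 * i % 100) / 100 for i in range(2000)]
SHIFTED = [1.0 + s for s in CALM]

history, miss_rate = track_threshold(CALM + SHIFTED, alpha=0.1, step=0.05)
```

The threshold recovers the target miss rate after the shift, but only by growing; the shape of the region, set by the fixed score function, never changes. That fixed geometry is the limitation the abstract targets.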
Unify Robot Actions in Camera Frame
Cross-embodiment robot learning requires a unified action representation with consistent semantics across robot platforms. Existing representations suffer from platform-specific inconsistencies, while current solutions either maintain embodiment-specific action heads or learn latent action spaces, without fundamentally resolving the mismatch. We propose to unify robot actions in the camera frame using camera extrinsics, so that actions share consistent geometric semantics across different robot embodiments, including both single-arm and bimanual robots. However, most existing datasets lack camera extrinsic annotations, and existing offline calibration methods either suffer from local minima or require robot-specific training data. To address this gap, we present CalibAll, a training-free, robot-independent annotation pipeline that estimates camera extrinsics for offline datasets and converts heterogeneous robot actions into standardized camera-frame actions. CalibAll follows a coarse-to-fine calibration strategy: temporal PnP provides a stable initialization, followed by differentiable rendering-based refinement for high precision. Beyond extrinsics, CalibAll produces standardized TCP-pose actions and auxiliary annotations. We apply CalibAll to 16 datasets across 4 robot platforms, producing approximately 97K calibrated data episodes. Downstream simulation and real-robot experiments show that cross-embodiment pretraining with camera-frame actions achieves state-of-the-art performance.
Unifying Entropy Regularization in Optimal Control: From and Back to Classical Objectives via Iterated Soft Policies and Path Integral Solutions
This paper develops a unified perspective on several optimal control formulations through the lens of Kullback-Leibler (KL) regularization. We propose a central problem that separates the KL penalties on policies and transitions with independent weights, thus generalizing the standard trajectory-level KL-regularization used in probabilistic optimal control. This umbrella formulation recovers various control problems: the classical Stochastic Optimal Control (SOC), Risk-Sensitive Stochastic Optimal Control (RSOC), and their policy-based KL-regularized counterparts, termed soft-policy SOC and RSOC, which yield tractable surrogates. Beyond being regularized variants, these soft-policy formulations majorize the original SOC and RSOC; thus, iterating their solutions recovers the original objectives. We further identify a synchronized case of soft-policy RSOC where the policy and transition KL weights coincide, yielding a linear Bellman operator, path-integral solution, and compositionality -- extending these computationally favourable properties to a broad class of control problems.
comment: refurbished introduction, added a few remarks, reduced size
Self-CriTeach: LLM Self-Teaching and Self-Critiquing for Improving Robotic Planning via Automated Domain Generation ICML
Large Language Models (LLMs) have shown strong promise for robotic task planning, particularly through the automatic generation of symbolic planning domains. However, prior work mainly treats generated domains as planning utilities. Such pipelines remain brittle under imperfect logical states and perception noise, while overlooking the potential of generated domains as scalable sources of reasoning supervision and structured reward signals. At the same time, reasoning LLMs depend on chain-of-thought (CoT) supervision, which is expensive to collect for robotic tasks, and reinforcement learning (RL) faces challenges in reward engineering. We propose Self-CriTeach, an LLM self-teaching and self-critiquing framework in which an LLM autonomously generates symbolic planning domains that serve a dual role: (1) In the self-teaching stage, generated domains are used to produce large-scale robotic planning problem--plan pairs, which are automatically converted into extended CoT trajectories for supervised fine-tuning. (2) In the self-critiquing stage, the same domains are reused as structured reward functions, providing dense feedback for reinforcement learning without manual reward engineering. This unified training pipeline yields a planning-enhanced LLM with higher planning success rates, stronger cross-task generalization, reduced inference cost, and improved resistance to imperfect logical states. GitHub Page: https://markli1hoshipu.github.io/Plan_LLM/
comment: International Conference on Machine Learning (ICML) 2026
An Overtaking Trajectory Planning Framework Based on Spatio-temporal Topology and Reachable Set Analysis Ensuring Time Efficiency
Generating overtaking trajectories in high-speed scenarios is typically addressed through hierarchical planning, which often suffers from local optima due to single initial solutions and low computational efficiency during numerical optimization. To overcome these limitations, this paper proposes a Spatio-temporal topology and Reachable set analysis enhanced Overtaking trajectory Planning framework (SROP). Specifically, by introducing topological classes to represent distinct overtaking behaviors, the upper-layer planner performs a spatio-temporal search to extract diverse initial paths, effectively preventing local optima. Subsequently, a lower-layer planner conducts parallel trajectory evaluation using reachable sets, which decouples vehicle kinematic constraints from the optimization process to ensure feasibility and significantly accelerate computation. Numerical experiments demonstrate that SROP improves trajectory smoothness by 66.8% and reduces computation time by 62.9% compared to state-of-the-art methods. Furthermore, by seamlessly integrating the method into the F1TENTH autonomous racing simulation platform, a 100-lap sensitivity analysis demonstrates high overtaking success rates in challenging scenarios, thereby validating its practical utility, real-time efficiency, and robustness.
TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance
Fine-grained and contact-rich manipulation remain challenging for robots, largely due to the underutilization of tactile feedback. To address this, we introduce TouchGuide, a novel cross-policy visuo-tactile fusion paradigm that fuses modalities within a low-dimensional action space. Specifically, TouchGuide operates in two stages to guide a pre-trained diffusion or flow-matching visuomotor policy at inference time. First, the policy produces a coarse, visually-plausible action using only visual inputs during early sampling. Second, a task-specific Contact Physical Model (CPM) provides tactile guidance to steer and refine the action, ensuring it aligns with realistic physical contact conditions. Trained through contrastive learning on limited expert demonstrations, the CPM provides a tactile-informed feasibility score to steer the sampling process toward refined actions that satisfy physical contact constraints. Furthermore, to facilitate TouchGuide training with high-quality and cost-effective data, we introduce TacUMI, a data collection system. TacUMI achieves a favorable trade-off between precision and affordability; by leveraging rigid fingertips, it obtains direct tactile feedback, thereby enabling the collection of reliable tactile data. Extensive experiments on five challenging contact-rich tasks, such as shoe lacing and chip handover, show that TouchGuide consistently and significantly outperforms state-of-the-art visuo-tactile policies.
Block-wise Adaptive Caching for Accelerating Diffusion Policy
Diffusion Policy has demonstrated strong visuomotor modeling capabilities, but its high computational cost renders it impractical for real-time robotic control. Despite huge redundancy across repetitive denoising steps, existing diffusion acceleration techniques fail to generalize to Diffusion Policy due to fundamental architectural and data divergences. In this paper, we propose $\textbf{B}$lock-wise $\textbf{A}$daptive $\textbf{C}$aching ($\textbf{BAC}$), a method to accelerate Diffusion Policy by caching intermediate action features. BAC achieves lossless action generation acceleration by adaptively updating and reusing cached features at the block level, based on a key observation that feature similarities exhibit non-uniform temporal dynamics and distinct block-specific patterns. To operationalize this insight, we first design an Adaptive Caching Scheduler to identify optimal update timesteps by maximizing the global feature similarities between cached and skipped features. However, applying this scheduler for each block leads to significant error surges due to the inter-block propagation of caching errors, particularly within Feed-Forward Network (FFN) blocks. To mitigate this issue, we develop the Bubbling Union Algorithm, which truncates these errors by updating the upstream blocks with significant caching errors before downstream FFNs. As a training-free plugin, BAC is readily integrable with existing transformer-based Diffusion Policy and vision-language-action models. Extensive experiments on multiple robotic benchmarks demonstrate that BAC achieves up to 3$\times$ inference speedup for free. Project page: https://block-wise-adaptive-caching.github.io.
ZeD-MAP: Bundle Adjustment Guided Zero-Shot Depth Maps for Real-Time Aerial Imaging
Real-time depth reconstruction from ultra-high-resolution UAV imagery is essential for time-critical geospatial tasks such as disaster response, yet remains challenging due to wide-baseline parallax, large image sizes, low-texture or specular surfaces, occlusions, and strict computational constraints. Recent zero-shot diffusion models offer fast per-image dense predictions without task-specific retraining, and require fewer labelled datasets than transformer-based predictors while avoiding the rigid capture geometry requirement of classical multi-view stereo. However, their probabilistic inference prevents reliable metric accuracy and temporal consistency across sequential frames and overlapping tiles. We present ZeD-MAP, a cluster-level framework that converts a test-time diffusion depth model into a metrically consistent, SLAM-like mapping pipeline by integrating incremental cluster-based bundle adjustment (BA). Streamed UAV frames are grouped into overlapping clusters; periodic BA produces metrically consistent poses and sparse 3D tie-points, which are reprojected into selected frames and used as metric guidance for diffusion-based depth estimation. Validation on ground-marker flights captured at approximately 50 m altitude (GSD is approximately 0.85 cm/px, corresponding to 2,650 square meters ground coverage per frame) with the DLR Modular Aerial Camera System (MACS) shows that our method achieves sub-meter accuracy, with approximately 0.87 m error in the horizontal (XY) plane and 0.12 m in the vertical (Z) direction, while maintaining per-image runtimes between 1.47 and 4.91 seconds. Results are subject to minor noise from manual point-cloud annotation. These findings show that BA-based metric guidance provides consistency comparable to classical photogrammetric methods while significantly accelerating processing, enabling real-time 3D map generation.
Follow-Bench: A Unified Motion Planning Benchmark for Socially-Aware Robot Person Following
Robot person following (RPF) -- mobile robots that follow and assist a specific person -- has emerging applications in personal assistance, security patrols, eldercare, and logistics. To be effective, such robots must follow the target while ensuring safety and comfort for both the target and surrounding people. In this work, we present the first comprehensive study of RPF, which (i) surveys representative scenarios, motion-planning methods, and evaluation metrics with a focus on safety and comfort; (ii) introduces Follow-Bench, a unified benchmark simulating diverse scenarios, including various target trajectory patterns, crowd dynamics, and environmental layouts; and (iii) re-implements eight representative RPF planners, ensuring that both safety and comfort are systematically considered. Moreover, we evaluate the two best-performing planners from our benchmark on a differential-drive robot to provide insights into real-world deployment of RPF planners. Extensive simulation and real-world experiments provide a quantitative study of the safety-comfort trade-offs of existing planners, while revealing open challenges and future research directions.
comment: Project page: https://follow-bench.github.io/
ALAM: Algebraically Consistent Latent Action Model for Vision-Language-Action Models
Vision-language-action (VLA) models remain constrained by the scarcity of action-labeled robot data, whereas action-free videos provide abundant evidence of how the physical world changes. Latent action models offer a promising way to extract such priors from videos, but reconstruction-trained latent codes are not necessarily suitable for policy generation: they may predict future observations while lacking the structure needed to be reused or generated coherently with robot actions. We introduce ALAM (Algebraic Latent Action Model), an Algebraically Consistent Latent Action Model that turns temporal relations in action-free video into structural supervision. Given frame triplets, ALAM learns latent transitions that are grounded by reconstruction while being regularized by composition and reversal consistency, encouraging a locally additive transition space. For downstream VLA learning, we freeze the pretrained encoder and use its latent transition sequences as auxiliary generative targets, co-generated with robot actions under a joint flow-matching objective. This couples structured latent transitions with flow-based policy generation, allowing the policy to exploit ALAM's locally consistent transition geometry without requiring latent-to-action decoding. Representation probes show that ALAM reduces additivity and reversibility errors by 25-85 times over unstructured latent-action baselines and improves long-horizon cumulative reconstruction. When transferred to VLA policies, ALAM raises the average success rate from 47.9% to 85.0% on MetaWorld MT50 and from 94.1% to 98.1% on LIBERO, with consistent gains on real-world manipulation tasks. Ablations further confirm that the strongest improvements arise from the synergy between algebraically structured latent transitions and joint flow matching.
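One way to read "composition and reversal consistency" on a frame triplet (a, b, c) is as two penalties on latent transitions: the a-to-c transition should match the sum of a-to-b and b-to-c, and the b-to-a transition should negate a-to-b. The sketch below assumes this additive form and a squared-error penalty; both are illustrative guesses rather than ALAM's actual losses, and all names are hypothetical.

```python
def squared_norm(vector):
    """Sum of squared components."""
    return sum(x * x for x in vector)


def composition_error(z_ab, z_bc, z_ac):
    """Penalty for z_ab + z_bc deviating from z_ac (assumed additive form)."""
    return squared_norm([ab + bc - ac for ab, bc, ac in zip(z_ab, z_bc, z_ac)])


def reversal_error(z_ab, z_ba):
    """Penalty for z_ba deviating from -z_ab (assumed additive form)."""
    return squared_norm([ab + ba for ab, ba in zip(z_ab, z_ba)])


# Hypothetical 2D latent transitions for a frame triplet (a, b, c).
Z_AB = [1.0, -2.0]
Z_BC = [0.5, 1.0]
Z_AC_CONSISTENT = [1.5, -1.0]
Z_AC_INCONSISTENT = [2.5, -1.0]
Z_BA = [-1.0, 2.0]

consistent = composition_error(Z_AB, Z_BC, Z_AC_CONSISTENT)
inconsistent = composition_error(Z_AB, Z_BC, Z_AC_INCONSISTENT)
reversed_ok = reversal_error(Z_AB, Z_BA)
```

Both penalties vanish exactly when the latent transitions behave like displacements in a vector space, which matches the "locally additive transition space" the abstract describes.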
LangForce: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries
Vision-Language-Action (VLA) models have shown promise in robot manipulation but often struggle to generalize to new instructions or complex multi-task scenarios. We identify a critical pathology in current training paradigms where goal-driven data collection creates a dataset bias. In such datasets, language instructions are highly predictable from visual observations alone, causing the conditional mutual information between instructions and actions to vanish, a phenomenon we term Information Collapse. Consequently, models degenerate into vision-only policies that ignore language constraints and fail in out-of-distribution (OOD) settings. To address this, we propose LangForce, a novel framework that enforces instruction following via Bayesian decomposition. By introducing learnable Latent Action Queries, we construct a dual-branch architecture to estimate both a vision-only prior $p(a \mid v)$ and a language-conditioned posterior $\pi(a \mid v, \ell)$. We then optimize the policy to maximize the conditional Pointwise Mutual Information (PMI) between actions and instructions. This objective effectively penalizes the vision shortcut and rewards actions that explicitly explain the language command. Without requiring new data, LangForce significantly improves generalization. Extensive experiments on SimplerEnv and RoboCasa demonstrate substantial gains, including an 11.3% improvement on the challenging OOD SimplerEnv benchmark, validating the ability of our approach to robustly ground language in action.
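The conditional PMI between an action and an instruction, given the visual observation, is the log-ratio of the language-conditioned posterior to the vision-only prior: log pi(a | v, l) - log p(a | v). The following is a minimal numeric sketch with hypothetical probabilities; it illustrates the quantity only and is not LangForce's training code.

```python
import math


def conditional_pmi(posterior_prob, prior_prob):
    """log pi(a | v, l) - log p(a | v) for a single action."""
    if posterior_prob <= 0.0 or prior_prob <= 0.0:
        raise ValueError("probabilities must be positive")
    return math.log(posterior_prob) - math.log(prior_prob)


# Vision shortcut: the instruction changes nothing, so PMI is zero.
pmi_shortcut = conditional_pmi(posterior_prob=0.5, prior_prob=0.5)

# Language-grounded action: the instruction makes it 4x more likely.
pmi_grounded = conditional_pmi(posterior_prob=0.8, prior_prob=0.2)
```

A policy that ignores language has posterior equal to prior and earns zero PMI for every action, so maximizing this quantity rewards only the part of the action distribution that the instruction actually explains.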
Perception with Guarantees: Certified Pose Estimation via Reachability Analysis
Agents in cyber-physical systems are increasingly entrusted with safety-critical tasks. Ensuring safety of these agents often requires localizing the pose for subsequent actions. Pose estimates can, e.g., be obtained from various combinations of lidar sensors, cameras, and external services such as GPS. Crucially, in safety-critical domains, a rough estimate is insufficient to formally determine safety, i.e., guaranteeing safety even in the worst-case scenario, and external services might additionally not be trustworthy. We address this problem by presenting a certified pose estimation in 3D solely from a camera image and a well-known target geometry. This is realized by formally bounding the pose, which is computed by leveraging recent results from reachability analysis and formal neural network verification. Our experiments demonstrate that our approach efficiently and accurately localizes agents in both synthetic and real-world experiments.
comment: Accepted at Computer Aided Verification (CAV'2026)
Prismatic World Model: Learning Compositional Dynamics for Planning in Hybrid Systems
Model-based planning in robotic domains is challenged by the hybrid nature of physical dynamics, where continuous motion is punctuated by discrete events such as contacts and impacts. Conventional latent world models typically employ monolithic neural networks that enforce global continuity, which over-smooths distinct dynamic modes (e.g., sticking vs. sliding, flight vs. stance). For a planner, this smoothing results in compounding errors during long-horizon lookaheads, rendering the search process unreliable at physical boundaries. To address this, we introduce the Prismatic World Model (PRISM-WM), a structured architecture designed to decompose complex hybrid dynamics into composable primitives. PRISM-WM uses a context-aware Mixture-of-Experts (MoE) framework where a gating mechanism implicitly identifies the current physical mode, and specialized experts predict the associated transition dynamics. We further introduce a latent orthogonalization objective to ensure expert diversity, preventing mode collapse. By modeling the mode transitions in system dynamics, PRISM-WM reduces rollout drift. Experiments on continuous control benchmarks, including high-dimensional humanoids and multi-task settings, demonstrate that PRISM-WM provides a high-fidelity substrate for trajectory optimization algorithms (e.g., TD-MPC), indicating its potential as a foundational model for model-based agents.
UniJEPA: Enhancing Robot Policy via Unified Continuous and Discrete Representation Learning
Building generalist robot policies that can handle diverse tasks in open-ended environments is a central challenge in robotics. To leverage knowledge from large-scale pretraining, prior work (VLA) has typically built generalist policies either on top of vision-language understanding models (VLMs) or generative models. However, both semantic understanding from vision-language pretraining and visual dynamics modeling from visual-generation pretraining are crucial for embodied robots. Recent unified models of generation and understanding have demonstrated strong capabilities in both comprehension and generation through large-scale pretraining. We posit that robotic policy learning can likewise benefit from the combined strengths of understanding, planning, and continuous future representation learning. Building on this insight, we introduce UniJEPA, which acquires the ability to dynamically model high-dimensional visual features through pretraining on over 1M internet-scale instructional manipulation videos. Subsequently, UniJEPA is fine-tuned on data collected from the robot embodiment, enabling the learning of mappings from predictive representations to action tokens. Extensive experiments show our approach consistently outperforms baseline methods by 9% and 12% in simulation environments and real-world out-of-distribution tasks, respectively.
TeleGate: Whole-Body Humanoid Teleoperation via Gated Expert Selection with Motion Prior
Real-time whole-body teleoperation is a critical method for humanoid robots to perform complex tasks in unstructured environments. However, developing a unified controller that robustly supports diverse human motions remains a significant challenge. Existing methods typically distill multiple expert policies into a single general policy, which often leads to performance degradation, particularly on highly dynamic motions. This paper presents TeleGate, a unified whole-body teleoperation framework for humanoid robots that achieves high-precision tracking across various motions while avoiding the performance loss inherent in knowledge distillation. Our key idea is to preserve the full capability of domain-specific expert policies by training a lightweight gating network, which dynamically activates experts in real time based on proprioceptive states and reference trajectories. Furthermore, to compensate for the absence of future reference trajectories in real-time teleoperation, we introduce a VAE-based motion prior module that extracts implicit future motion intent from historical observations, enabling anticipatory control for motions requiring prediction, such as jumping and standing up. We conducted empirical evaluations in simulation and also deployed our technique on the Unitree G1 humanoid robot. Using only 2.5 hours of motion capture data for training, TeleGate achieves high-precision real-time teleoperation across diverse dynamic motions (e.g., running, fall recovery, and jumping), significantly outperforming the baseline methods in both tracking accuracy and success rate.
comment: Accepted by RSS 2026. Project page: https://anywitresearch.github.io/TeleGate/
Simulation-Ready Cluttered Scene Estimation via Physics-aware Joint Shape and Pose Optimization
Estimating simulation-ready scenes from real-world observations is crucial for downstream planning and policy learning tasks. Unfortunately, existing methods struggle in cluttered environments, often exhibiting prohibitive computational cost, poor robustness, and restricted generality when scaling to multiple interacting objects. We propose a unified optimization-based formulation for real-to-sim scene estimation that jointly recovers the shapes and poses of multiple rigid objects under physical constraints. Our method is built on two key technical innovations. First, we leverage a recently introduced shape-differentiable contact model, whose global differentiability permits joint optimization over object geometry and pose while modeling inter-object contacts. Second, we exploit the structured sparsity of the augmented Lagrangian Hessian to derive an efficient linear system solver whose computational cost scales favorably with scene complexity. Building on this formulation, we develop an end-to-end Simulation-ready Physics-Aware Reconstruction for Cluttered Scenes (SPARCS) pipeline, which integrates learning-based object initialization, physics-constrained joint shape-pose optimization, and differentiable texture refinement. Experiments on cluttered scenes with up to 5 objects and 22 convex hulls demonstrate that our approach robustly reconstructs physically valid, simulation-ready object shapes and poses. Project webpage: https://rory-weicheng.github.io/SPARCS/.
comment: Accepted to RSS 2026, camera-ready version; 17 pages, 15 figures
Compact 3D Gaussian Splatting For Dense Visual SLAM
Recent work has shown that 3D Gaussian-based SLAM enables high-quality reconstruction, accurate pose estimation, and real-time rendering of scenes. However, these approaches are built on a tremendous number of redundant 3D Gaussian ellipsoids, leading to high memory and storage costs and slow training. To address this limitation, we propose a compact 3D Gaussian Splatting SLAM system that reduces both the number and the parameter size of Gaussian ellipsoids. A sliding window-based masking strategy is first proposed to reduce the redundant ellipsoids. We then observe that the covariance matrices (geometry) of most 3D Gaussian ellipsoids are extremely similar, which motivates a novel geometry codebook to compress the geometric attributes, i.e., the parameters, of the 3D Gaussians. Robust and accurate pose estimation is achieved by a global bundle adjustment method with reprojection loss. Extensive experiments demonstrate that our method achieves faster training and rendering speed while maintaining state-of-the-art (SOTA) quality of the scene representation.
comment: Accepted by IJCV 2026
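The geometry-codebook idea in the abstract above amounts to vector quantization: each ellipsoid stores a small index into a shared table instead of its own geometric parameters. The sketch below is an illustration under stated assumptions, not the paper's method: the three-value geometry vectors, the fixed two-entry codebook, and the function names are hypothetical, and the way the codebook is learned is not covered.

```python
def nearest_code(vec, codebook):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))

def compress(geometries, codebook):
    """Replace each geometry vector by the index of its nearest codeword."""
    return [nearest_code(g, codebook) for g in geometries]

def decompress(indices, codebook):
    """Recover approximate geometry vectors by codebook lookup."""
    return [codebook[i] for i in indices]

# Hypothetical codebook with two shared shapes (e.g. per-axis scales).
codebook = [(0.1, 0.1, 0.1), (0.5, 0.1, 0.1)]
geometries = [(0.11, 0.09, 0.10), (0.48, 0.12, 0.10), (0.10, 0.10, 0.12)]

indices = compress(geometries, codebook)
restored = decompress(indices, codebook)
```

When many ellipsoids share nearly identical geometry, storing one integer per ellipsoid plus a small shared table takes far less space than storing every parameter vector separately.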
Saturation-Aware Angular Velocity Estimation: Extending the Robustness of SLAM to Aggressive Motions ICRA
We propose a novel angular velocity estimation method to increase the robustness of Simultaneous Localization And Mapping (SLAM) algorithms against gyroscope saturations induced by aggressive motions. Field robotics exposes robots to various hazards, including steep terrains, landslides, and staircases, where substantial accelerations and angular velocities can occur if the robot loses stability and tumbles. These extreme motions can saturate sensor measurements, especially gyroscopes, which are the first sensors to become inoperative. While the structural integrity of the robot is at risk, the robustness of the SLAM framework is oftentimes given little consideration. Consequently, even if the robot is physically capable of continuing the mission, its operation will be compromised due to a corrupted representation of the world. To address this problem, we propose a method to estimate the angular velocity using accelerometers during extreme rotations caused by tumbling. We show that our method reduces the median localization error by 71.5% in translation and 65.5% in rotation and is robust to mapping failures, which occurred in 37.5% of the experiments without our method. We also propose the Tumbling-Induced Gyroscope Saturation (TIGS) dataset, which consists of outdoor experiments recording the motion of a mechanical lidar subject to angular velocities four times higher than other similar datasets available. The dataset is available online at https://github.com/norlab-ulaval/Norlab_wiki/wiki/TIGS-Dataset.
comment: 7 pages, 7 figures, published in 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan
QuickLAP: Quick Language-Action Preference Learning for Semi-Autonomous Systems
Robots must learn from both what people do and what they say, but either modality alone is often incomplete: physical corrections are grounded but ambiguous in intent, while language expresses high-level goals but lacks physical grounding. We introduce QuickLAP: Quick Language-Action Preference learning, a Bayesian framework that fuses physical and language feedback to infer reward functions in real time. Our key insight is to treat language as a probabilistic observation over the user's latent preferences, clarifying which reward features matter and how physical corrections should be interpreted. QuickLAP uses Large Language Models (LLMs) to extract reward feature attention masks and preference shifts from free-form utterances, which it integrates with physical feedback in a closed-form update rule. This enables fast, real-time, and robust reward learning that handles ambiguous feedback. In a semi-autonomous driving simulator, QuickLAP reduces reward learning error by over 70% compared to physical-only and heuristic multimodal baselines. A 15-participant user study further validates our approach: participants found QuickLAP significantly more understandable and collaborative, and preferred its learned behavior over baselines. Code is available at https://github.com/MIT-CLEAR-Lab/QuickLAP.
SCU-Hand with Integrated Single-Sheet Valve: A Funnel-Shaped Robotic Hand for Milligram-Scale Powder Handling
Laboratory Automation (LA) has the potential to accelerate solid-state materials discovery by enabling continuous robotic operation without human intervention. While robotic systems have been developed for tasks such as powder grinding and X-ray diffraction (XRD) analysis, fully automating powder handling at the milligram scale remains a significant challenge due to the complex flow dynamics of powders and the diversity of laboratory tasks. To address this challenge, this study proposes the SCU-Hand-SV (Soft Conical Universal Robotic Hand with Single-sheet Valve), which preserves the softness and conical sheet designs in prior work while incorporating a controllable valve at the cone apex to enable precise, incremental dispensing of milligram-scale powder quantities. The SCU-Hand-SV is integrated with an external balance through a feedback control system based on a model of powder flow and online parameter identification. Experimental evaluations with glass beads, monosodium glutamate, and titanium dioxide demonstrated that 80% of the trials achieved an error within -2 mg to +2 mg, and the maximum error observed was approximately 20 mg across a target range of 20 mg to 3 g. In addition, by incorporating flow prediction models commonly used for hoppers and performing online parameter identification, the system is able to adapt to variations in powder dynamics. Compared to direct PID control, the proposed model-based control significantly improved both accuracy and convergence speed. These results highlight the potential of the proposed system to enable efficient and flexible powder weighing, with scalability toward larger quantities and applicability to a broad range of laboratory automation tasks.
comment: 8 pages, 8 figures
Cycle-resolved Cephalopod-Inspired Pulsed-Jet Robot With High-Volume Expulsion and Drag-Reduced Gliding
Cephalopod pulsed-jet locomotion is not a single isolated expulsion event, but a coordinated cycle involving jet expulsion, passive gliding, and mantle refilling. Inspired by this cycle-resolved biological strategy, this paper presents a cephalopod-inspired pulsed-jet robot with a rigid-soft hybrid origami mantle that enables large, actively driven, and geometry-guided body deformation. The proposed mantle integrates rigid folding panels with a compliant silicone framework, allowing a 75% effective cavity-volume reduction during expulsion and reducing the projected cross-sectional drag area by approximately 75.7% in the contracted gliding configuration. Using this platform, we formulate a cycle-resolved framework to separately investigate how expelled volume, glide duration, and refill pathway influence whole-cycle locomotion performance. Experiments show that the robot reaches a peak speed of approximately 0.5 m/s (3.8 BL/s) and an average speed exceeding 0.2 m/s (1.5 BL/s) within the first jetting cycle. The results further demonstrate the roles of high expelled-volume-ratio contraction in speed generation, reduced-drag-area gliding under different glide durations, and mantle-aperture-inspired passive inlet valves in assisting refill. This work provides both a robotic implementation of actively deformable cephalopod-like jet propulsion and a unified experimental platform for studying expulsion-gliding-refilling dynamics in pulsed-jet locomotion.
comment: Updated author list; no changes to the scientific content
Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations ICLR 2026
This work introduces Robots Imitating Generated Videos (RIGVid), a system that enables robots to perform complex manipulation tasks--such as pouring, wiping, and mixing--purely by imitating AI-generated videos, without requiring any physical demonstrations or robot-specific training. Given a language command and an initial scene image, a video diffusion model generates potential demonstration videos, and a vision-language model (VLM) automatically filters out results that do not follow the command. A 6D pose tracker then extracts object trajectories from the video, and the trajectories are retargeted to the robot in an embodiment-agnostic fashion. Through extensive real-world evaluations, we show that filtered generated videos are as effective as real demonstrations, and that performance improves with generation quality. We also show that relying on generated videos outperforms more compact alternatives such as keypoint prediction using VLMs, and that strong 6D pose tracking outperforms other ways to extract trajectories, such as dense feature point tracking. These findings suggest that videos produced by a state-of-the-art off-the-shelf model can offer an effective source of supervision for robotic manipulation.
comment: In ICLR 2026. Website: https://rigvid-robot.github.io/
ViTacFormer: Learning Cross-Modal Representation for Visuo-Tactile Dexterous Manipulation
Dexterous manipulation is a cornerstone capability for robotic systems aiming to interact with the physical world in a human-like manner. Although vision-based methods have advanced rapidly, tactile sensing remains crucial for fine-grained control, particularly in unstructured or visually occluded settings. We present ViTacFormer, a representation-learning approach that couples a cross-attention encoder to fuse high-resolution vision and touch with an autoregressive tactile prediction head that anticipates future contact signals. Building on this architecture, we devise an easy-to-challenging curriculum that steadily refines the visual-tactile latent space, boosting both accuracy and robustness. The learned cross-modal representation drives imitation learning for multi-fingered hands, enabling precise and adaptive manipulation. Across a suite of challenging real-world benchmarks, our method achieves approximately 50% higher success rates than prior state-of-the-art systems. To our knowledge, it is also the first to autonomously complete long-horizon dexterous manipulation tasks that demand highly precise control with an anthropomorphic hand, successfully executing up to 11 sequential stages and sustaining continuous operation for 2.5 minutes.
MIMIC-D: Multi-modal Imitation for MultI-agent Coordination with Decentralized Diffusion Policies
As robots become more integrated in society, their ability to coordinate with other robots and humans on multi-modal tasks (those with multiple valid solutions) is crucial. Such behaviors can be learned from expert demonstrations via imitation learning (IL), but when expert demonstrations are multi-modal, standard IL approaches usually average across modes or collapse to a single mode, preventing effective coordination. Inspired by diffusion models' ability to capture complex multi-modal trajectory distributions in single-agent settings, we develop a diffusion-based framework for coordinated multi-modal behavior in multi-agent systems. However, existing multi-agent diffusion approaches typically require a centralized planner or explicit communication among agents. This assumption can fail in real-world scenarios where robots must operate independently or alongside agents, such as humans, with whom they cannot directly communicate. Therefore, we propose MIMIC-D, a joint-training, decentralized-execution paradigm for multi-modal multi-agent IL via diffusion. We jointly train all agents' policies with only local information to achieve implicit coordination. In simulation and hardware experiments, our method exhibits robust multi-modal coordination behavior in various tasks and environments, improving upon state-of-the-art baselines.
comment: 8 pages, 4 figures, 5 tables
Bellman Value Decomposition for Task Logic in Safe Optimal Control
Real-world tasks involve nuanced combinations of goal and safety specifications. In high dimensions, the challenge is exacerbated: formal automata become cumbersome, and the combination of sparse rewards tends to require laborious tuning. In this work, we consider the innate structure of the Bellman Value as a means to naturally organize the problem for improved automatic performance. Namely, we prove that the Bellman Value for a complex task defined in temporal logic can be decomposed into a graph of Bellman Values, connected by a set of well-known Bellman equations (BEs): the Reach-Avoid BE, the Avoid BE, and a novel type, the Reach-Avoid-Loop BE. To solve for the Value and optimal policy, we propose VDPPO, which embeds the decomposed Value graph into a two-layer neural net, bootstrapping the implicit dependencies. We conduct a variety of simulated and hardware experiments to test our method on complex, high-dimensional tasks involving heterogeneous teams and nonlinear dynamics. Ultimately, we find this approach greatly improves performance over existing baselines, balancing safety and liveness automatically.
Robometer: Scaling General-Purpose Robotic Reward Models via Trajectory Comparisons
General-purpose robot reward models are typically trained to predict absolute task progress from expert demonstrations, providing only local, frame-level supervision. While effective for expert demonstrations, this paradigm scales poorly to large-scale robotics datasets where failed and suboptimal trajectories are abundant and assigning dense progress labels is ambiguous. We introduce Robometer, a scalable reward modeling framework that combines intra-trajectory progress supervision with inter-trajectory preference supervision. Robometer is trained with a dual objective: a frame-level progress loss that anchors reward magnitude on expert data, and a trajectory-comparison preference loss that imposes global ordering constraints across trajectories of the same task, enabling effective learning from both real and augmented failed trajectories. To support this formulation at scale, we curate RBM-1M, a reward-learning dataset comprising over one million trajectories spanning diverse robot embodiments and tasks, including substantial suboptimal and failure data. Across benchmarks and real-world evaluations, Robometer learns more generalizable reward functions than prior methods and improves robot learning performance across a diverse set of downstream applications. Code, model weights, and videos at https://robometer.github.io/.
comment: 33 pages, 17 figures
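The dual objective described in the Robometer abstract above can be sketched as a sum of two terms. The abstract does not give the exact loss forms, so the choices here are assumptions: mean squared error for the frame-level progress term, a Bradley-Terry style logistic loss for the trajectory-comparison term, and a hypothetical `weight` parameter to balance them.

```python
import math

def progress_loss(pred, target):
    """Mean squared error between predicted and labeled per-frame progress."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def preference_loss(score_better, score_worse):
    """Bradley-Terry style loss: -log(sigmoid(score_better - score_worse))."""
    diff = score_better - score_worse
    return math.log1p(math.exp(-diff))

def dual_loss(pred, target, score_better, score_worse, weight=1.0):
    """Progress term on expert frames plus a weighted preference term."""
    return (progress_loss(pred, target)
            + weight * preference_loss(score_better, score_worse))

# Expert trajectory with dense progress labels, compared against a failed one.
loss = dual_loss(pred=[0.2, 0.6, 0.9], target=[0.0, 0.5, 1.0],
                 score_better=1.5, score_worse=-0.5)
```

The preference term needs only an ordering between two trajectories of the same task, so failed or suboptimal data can contribute without dense progress labels.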
Learning Dynamic Rope Manipulation Using Task-Level Iterative Learning Control
We introduce a Task-Level Iterative Learning Control method for dynamic manipulation of ropes. We demonstrate this method on a non-planar rope manipulation task called the flying knot. Using a single human demonstration and a simplified rope model, the method learns directly on hardware without reliance on large amounts of demonstration data or massive amounts of simulation. At each iteration, the algorithm inverts a model of the robot and rope by solving a quadratic program to propagate task-space errors into action updates. We evaluate performance across 7 different kinds of ropes, including chain, latex surgical tubing, and braided and twisted ropes, ranging in thicknesses of 7--25\,mm and densities of 0.013--0.5\,kg/m. Learning achieves a 100\% success rate within 10 trials on all ropes. Furthermore, the method can successfully transfer between most rope types in 2--5 trials. https://flying-knots.github.io
comment: Project website: https://flying-knots.github.io
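The iterative update described in the abstract above (invert a model to turn task-space error into an action correction) can be illustrated in one dimension. This is a simplified sketch, not the authors' algorithm: the scalar gains, the regularizer `reg`, and the function names are hypothetical, and the closed-form step is only the unconstrained special case of a quadratic program.

```python
def ilc_update(u, error, model_gain, reg=1e-3):
    """Minimize (error + model_gain*du)**2 + reg*du**2 over du, in closed form."""
    du = -model_gain * error / (model_gain ** 2 + reg)
    return u + du

def run_trials(target, true_gain, model_gain, trials=10):
    """Repeat: execute the action, measure the task error, correct the action."""
    u, errors = 0.0, []
    for _ in range(trials):
        error = true_gain * u - target  # outcome measured on the "hardware"
        errors.append(abs(error))
        u = ilc_update(u, error, model_gain)
    return u, errors

# The model gain (1.5) is deliberately wrong; the true system gain is 2.0.
u, errors = run_trials(target=1.0, true_gain=2.0, model_gain=1.5)
```

Even with an inaccurate model, the error shrinks from trial to trial here because each correction uses the measured outcome rather than the model's prediction.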
From Local Matches to Global Masks: Template-Guided Instance Detection and Segmentation in Open-World Scenes
Detecting and segmenting novel object instances in open-world environments is a fundamental problem in robotic perception. Given only a small set of template images, a robot must locate and segment a specific object instance in a cluttered, previously unseen scene. Existing proposal-based approaches are highly sensitive to proposal quality and often fail under occlusion and background clutter. We propose L2G-Det, a local-to-global instance detection framework that bypasses explicit object proposals by leveraging dense patch-level matching between templates and the query image. Locally matched patches generate candidate points, which are refined through a candidate selection module to suppress false positives. The filtered points are then used to prompt an augmented Segment Anything Model (SAM) with instance-specific object tokens, enabling reliable reconstruction of complete instance masks. Experiments demonstrate improved performance over proposal-based methods in challenging open-world settings.
comment: Accepted to Robotics: Science and Systems (RSS) 2026. Project page: https://irvlutd.github.io/L2G/
MonoSpheres: Large-Scale Monocular SLAM-Based UAV Exploration through Perception-Coupled Mapping and Planning
Autonomous exploration of unknown environments is a key capability for mobile robots, but it is largely unsolved for robots equipped with only a single monocular camera and no dense range sensors. In this paper, we present a novel approach to monocular vision-based exploration that can safely cover large-scale unstructured indoor and outdoor 3D environments by explicitly accounting for the properties of a sparse monocular SLAM frontend in both mapping and planning. The mapping module solves the problems of sparse depth data, free-space gaps, and large depth uncertainty by oversampling free space in texture-sparse areas and keeping track of obstacle position uncertainty. The planning module handles the added free-space uncertainty through rapid replanning and perception-aware heading control. We further show that frontier-based exploration is possible with sparse monocular depth data when parallax requirements and the possibility of textureless surfaces are taken into account. We evaluate our approach extensively in diverse real-world and simulated environments, including ablation studies. To the best of the authors' knowledge, the proposed method is the first to achieve 3D monocular exploration in real-world unstructured outdoor environments. We open-source our implementation to support future research.
comment: 8 pages, 9 figures, accepted to IEEE Robotics and Automation Letters
Optimal UGV-UAV Cooperative Partitioning and Inspection of Shortest Paths
We study cooperative shortest path planning for an unmanned ground vehicle (UGV) assisted by an unmanned aerial vehicle (UAV) in environments with unknown road blockages that are only discovered when a robot reaches the damaged point. This formulation generalizes the original Canadian Traveller Problem (CTP), which assumes a single ground vehicle and that the traversability status of all incident edges is revealed upon arrival at a vertex. We first analyze the case where the start and the goal are connected by $k$ disjoint paths, and prove that the worst-case competitive ratio $ρ$ for a single UGV is $2k-1$. With UAV assistance, and under the simplifying assumption of negligible initial transit and deadheading UAV costs, the ratio improves to $ρ= 2\frac{v_G}{v_A + v_G}k - 1$, where $v_G$ and $v_A$ denote the UGV and UAV speed, respectively. To address general graphs and non-negligible UAV initial transit and deadheading costs, we present an optimal path partitioning strategy that assigns path prefix inspection to the UGV and path suffix inspection to the UAV, and prove the optimality of the UAV inspection strategy on general graphs. We evaluate our algorithm by performing experiments on road networks from the world's 50 most populous cities, with randomized blockages, and show that the proposed method reduces UGV travel times by up to 30%.
comment: Withdrawn by the authors due to an error in Section V.D in the competitive-ratio proof for the UGV-UAV case. The proof incorrectly uses $1+2\frac{v_A}{v_G+v_A}(k-1)\le 2\frac{v_A}{v_G+v_A}k-1$, which does not hold in general and affects the stated bound
ActivePusher: Active Learning and Planning with Residual Physics for Nonprehensile Manipulation ICRA 2026
Planning with learned dynamics models offers a promising approach toward versatile real-world manipulation, particularly in nonprehensile settings such as pushing or rolling, where accurate analytical models are difficult to obtain. However, collecting training data for learning-based methods can be costly and inefficient, as it often relies on randomly sampled interactions that are not necessarily the most informative. Furthermore, learned models tend to exhibit high uncertainty in underexplored regions of the skill space, undermining the reliability of long-horizon planning. To address these challenges, we propose ActivePusher, a novel framework that combines residual-physics modeling with uncertainty-based active learning, to focus data acquisition on the most informative skill parameters. Additionally, ActivePusher seamlessly integrates with model-based kinodynamic planners, leveraging uncertainty estimates to bias control sampling toward more reliable actions. We evaluate our approach in both simulation and real-world environments, and demonstrate that it consistently improves data efficiency and achieves higher planning success rates in comparison to baseline methods. The source code is available at https://github.com/elpis-lab/ActivePusher.
comment: Accepted by the 2026 IEEE International Conference on Robotics & Automation (ICRA 2026)
Terminal Matters: Kinodynamic Planning with a Terminal Cost and Learned Uncertainty in Belief State-Cost Space
In many real-world robotic tasks, robots must generate dynamically feasible motions that reliably reach desired goals even under uncertainty. Yet existing sampling-based kinodynamic planners typically optimize accumulated trajectory costs and treat goal reaching as a feasibility check, rather than explicitly optimizing terminal-state quality, such as goal preference or goal-reaching reliability. In this work, we introduce a terminal-cost formulation for kinodynamic planning that allows terminal-state quality to be optimized alongside accumulated trajectory cost. We prove that AO-RRT, an asymptotically optimal kinodynamic planner, preserves its asymptotic optimality under this augmented objective. We further extend the formulation to belief space and prove that minimizing the Wasserstein distance between the terminal belief and the goal improves a lower bound on the probability of reaching the goal region. The resulting planner, KiTe, uses this terminal-cost objective to encode goal preferences and improve reliability under uncertainty. To support systems without analytical uncertainty models, we learn dynamics and process uncertainty directly from data and integrate the learned belief dynamics into planning. Experiments on Flappy Bird, Car Parking, and Planar Pushing show that KiTe consistently improves goal-reaching success under uncertainty. Real-world Planar Pushing experiments further demonstrate that KiTe can plan effectively with learned dynamics and uncertainty. Source code is available at https://github.com/elpis-lab/KiTe.
Multiagent Systems
EconAI: Dynamic Persona Evolution and Memory-Aware Agents in Evolving Economic Environments
The integration of large language models (LLMs) in economic simulations has significantly enhanced agent-based modeling, yet existing frameworks struggle to capture the interplay between short-term optimization and long-term strategic planning. Conventional approaches rely on static data-driven predictions, failing to incorporate adaptive behaviors influenced by economic sentiment, market volatility, and individual goals. To address these limitations, we introduce a novel EconAI framework, incorporating economic sentiment indexing (ESI), memory weighting, and dynamic decision-making mechanisms. By quantifying economic belief, adjusting historical data influence, and linking work-consumption behaviors, EconAI achieves a more human-like decision process, where agents adapt their actions based on both market signals and long-term objectives. It is the first LLM-powered simulation system that can simulate the macro/microeconomic environment and interactions in a unified framework. Empirical evaluations show that EconAI improves stability in economic responses, better replicates real-world employment-consumption cycles, and enhances overall decision robustness. This advancement marks a crucial step towards more realistic, adaptive economic agent simulations.
SkillOps: Managing LLM Agent Skill Libraries as Self-Maintaining Software Ecosystems NeurIPS 2026
Large language model agents increasingly rely on skill libraries for multi-step tasks, yet these libraries can accumulate persistent defects as skills are added, reused, patched, and linked to changing dependencies. We call this failure mode skill technical debt: library-level defects that may not break a single skill locally but can harm future retrieval, composition, and execution. Existing skill-based agents mainly focus on task-time retrieval, planning, and repair, while library-time maintenance remains underexplored. We propose SkillOps, a method-agnostic plug-in framework for maintaining skill libraries. SkillOps represents each skill as a typed Skill Contract (P, O, A, V, F), organizes skills with a Hierarchical Skill Ecosystem Graph, and diagnoses library health across utility, compatibility, risk, and validation dimensions. Given a raw skill library, SkillOps produces a maintained library that can be used by existing retrieval or planning agents without changing their internal code. On ALFWorld, SkillOps achieves 79.5 percent task success as a standalone agent, outperforming the strongest baseline by 8.8 percentage points with no additional task-time large language model calls. As a plug-in layer, it improves retrieval-heavy baselines by 0.68 to 2.90 percentage points. The current rule-based maintenance implementation uses nearly zero library-time large language model calls or tokens, showing that skill-library maintenance can be added as a low-overhead architectural layer.
comment: 23 pages, 9 figures. Submitted to NeurIPS 2026. Code is available at https://github.com/Hik289/SkillOps.git
Unweighted ranking for value-based decision making with uncertainty
As intelligent systems are increasingly implemented in our society to make autonomous decisions, their commitment to human values raises serious concerns. Their alignment with human values remains a critical challenge because it can jeopardise the integrity and security of citizens. For this reason, an innovative human-centred and values-driven approach to decision making is required. In this work, we introduce the Fuzzy-Unweighted Value-Based Decision Making (FUW-VBDM) framework, where agents incorporate both quantitative and qualitative criteria to generate human-centred decisions. We also address the normative bias introduced by stakeholders with arbitrary weights by removing prior weights and introducing a fuzzy domain of decision variables defined for a score function. This concept allows us to generalise any VBDM problem as the search for feasible solutions when optimising the score in the weight domain. To provide a solution to FUW-VBDM, we present Rankzzy, a customizable unweighted ranking method that integrates fuzzy-based reasoning to quantify uncertainty. We mathematically prove the consistency of the Rankzzy for any admissible configuration selected by stakeholders. We show the applicability of our method through an illustrative case study, which we also use as a running example. The evaluation conducted indicates a reduced computational cost in large-scale value-based decision-making problems and a strong rank performance regarding existing approaches when employing the aggregation via Pythagorean means.
comment: 21 pages
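The Pythagorean means mentioned at the end of the abstract above are the arithmetic, geometric, and harmonic means. The sketch below only shows how the three aggregate a list of criterion scores. It is not the Rankzzy method, and the example scores are hypothetical.

```python
from statistics import fmean, geometric_mean, harmonic_mean

def pythagorean_means(scores):
    """Aggregate positive scores with the three Pythagorean means."""
    return {
        "arithmetic": fmean(scores),
        "geometric": geometric_mean(scores),
        "harmonic": harmonic_mean(scores),
    }

# Hypothetical per-criterion scores for one alternative.
means = pythagorean_means([0.9, 0.6, 0.3])
```

For positive inputs the arithmetic mean is never smaller than the geometric mean, which is never smaller than the harmonic mean, so the harmonic mean penalizes a single weak criterion most strongly.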
RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation
Intensive care units (ICUs) generate long, dense, and evolving streams of clinical information, where physicians must repeatedly reassess patient states under time pressure, underscoring a clear need for reliable AI decision support. Existing ICU benchmarks typically treat historical clinician actions as ground truth. However, these actions are made under incomplete information and limited temporal context of the underlying patient state, and may therefore be suboptimal, making it difficult to assess the true reasoning capabilities of AI systems. We introduce RealICU, a hindsight-annotated benchmark for evaluating large language models (LLMs) under realistic ICU conditions, where labels are created after senior physicians review the full patient trajectory. We formulate four physician-motivated tasks: assessing Patient Status, Acute Problems, Recommended Actions, and Red Flag actions that risk unsafe outcomes. We partition each trajectory into 30-min windows and release two datasets: RealICU-Gold with 930-window annotations from 94 MIMIC-IV patients, and RealICU-Scale with 11,862 windows extended by Oracle, a physician-validated LLM hindsight labeler. Existing LLMs, including memory-augmented ones, perform poorly on RealICU, exposing two failure modes: a recall-safety tradeoff for clinical recommendations, and an anchoring bias toward early interpretations of the patient. We further introduce ICU-Evo to study structured-memory agents, which improve long-horizon reasoning but do not fully eliminate safety failures. Together, RealICU provides a clinically grounded testbed for measuring and improving AI sequential decision support in high-stakes care. Project page: https://chengzhi-leo.github.io/RealICU-Bench/
Constitutional Governance in Metric Spaces
Computational social choice and algorithmic decision theory offer rich aggregation theory but no end-to-end, polynomial-time process for egalitarian self-governance: prior work treats aggregation, deliberation, amendment, and consensus in isolation, and key metric-space aggregators are NP-hard. We propose constitutional governance in metric spaces, integrating these stages into one polynomial-time process. The constitution assigns, per amendable component, a metric space, aggregation rule, and supermajority threshold. Each member submits an ideal element -- both vote and personal proposal. Any member may then submit a public proposal carrying supermajority public support under the revealed votes -- sourced from coalition deliberation, optimization, or AI mediation. The constitutional rule scores proposals against the status quo, adopting the supported proposal of positive maximal score (else retaining the status quo); the same rule, possibly with a higher threshold, amends the constitution itself. We develop the generalised median as the worked rule, establish framework-level guarantees, prove no misreport weakly dominates sincere voting, and study the compromise gap between best peak and unconstrained optimum -- zero in one dimension, bounded in general, narrowed in simulation by a simple heuristic. We instantiate the framework on seven canonical settings; the mean appears as a utilitarian alternative in the appendix. By unifying metric-space aggregation, reality-aware social choice, supermajority amendment, constitutional consensus, deliberative coalition formation, and AI mediation, this work delivers a comprehensive solution to the constitutional democratic governance of digital communities and organisations.
Multi-Agent Systems in Emergency Departments: Validation Study on an ED Digital Twin
Emergency departments (ED) face challenges in patient care and resource management. We propose to explore optimization strategies in a realistic and flexible model and develop a hybrid Discrete Event Simulation (DES) and Agent-Based Model (ABM) simulating highly configurable ED environments. We specifically focus on the validation of the modeling approach. We derive configurations for ED sizes, patient load, and staffing from real-world studies. We then validate the model expressivity by matching its key performance indicators and metrics with their values known from literature. We proceed by implementing scientifically established and practice-proven resource optimization strategies. Comparing the documented real-world outcomes with our model's results demonstrates that the DES-ABM based simulation can effectively replicate real-world ER dynamics under interventions. We lastly integrate a Proof-of-Concept multi-agent system (MAS) that can autonomously explore resource allocation strategies within the simulated ER environment based on a temporal ledger of ED event records. This modular DES-ABM-MAS framework offers a powerful tool to explore resource optimization strategies in emergency departments.
IdeaForge: A Knowledge Graph-Grounded Multi-Agent Framework for Cross-Methodology Innovation Analysis and Patent Claim Generation
Current AI-assisted innovation systems typically apply a single ideation methodology (such as TRIZ or Design Thinking) using sequential prompt-based workflows that do not preserve intermediate reasoning structure. As a result, insights generated across methodologies remain fragmented, limiting traceability, synthesis, and systematic evaluation of novelty. We present IdeaForge, a knowledge graph-grounded multi-agent framework for innovation analysis and patent claim generation. IdeaForge integrates multiple innovation methodologies (TRIZ, Design Thinking, and SCAMPER) through specialist agents operating over a persistent FalkorDB knowledge graph. Each agent contributes structured entities and relationships representing contradictions, inventive principles, user needs, transformations, analogies, and candidate claims. The central contribution of IdeaForge is a cross-methodology convergence mechanism implemented through graph-based claim linkage. Claims independently supported by multiple methodologies are connected using CONVERGENT relationships, enabling identification of high-confidence innovation candidates through graph traversal. A downstream patent drafting agent generates structured patent drafts grounded in convergent claim subgraphs, reducing reliance on unconstrained language model generation. An InnovationScore formula ranks claims by convergent support, methodology diversity, claim strength, and prior art challenge count. We describe the graph schema, agent architecture, convergence detection pipeline, and patent synthesis workflow. Experiments on a legal technology use case demonstrate that graph-grounded multi-methodology synthesis produces more diverse and traceable innovation candidates compared to single-methodology baselines. We discuss implications for computational creativity, explainable AI-assisted invention, and graph-native innovation systems.
comment: 14 pages, 3 figures, 6 tables
Discrete Diffusion for Complex and Congested Multi-Agent Path Finding with Sparse Social Attention
Multi-Agent Path Finding (MAPF) is a coordination problem that requires computing globally consistent, collision-free trajectories from individual start positions to assigned goal positions under combinatorial planning complexity. In dense environments, suboptimal initial plans induce compound conflicts that hinder feasible repair. For repair-based solvers like LNS2, initial plan quality critically affects downstream repair, yet this factor remains underexplored. We propose DiffLNS, a hybrid framework that integrates a discrete denoising diffusion probabilistic model (D3PM) with LNS2. The D3PM serves as an initializer with sparse social attention that learns a spatiotemporal prior over coordinated multi-agent action trajectories from expert demonstrations and samples multiple joint plans. Operating directly on the categorical action space, our discrete diffusion preserves the MAPF action structure and samples from a multimodal joint-plan distribution to produce diverse drafts well suited for neighborhood repair. These drafts act as warm starts for downstream repair, which completes unfinished trajectories and resolves remaining conflicts under hard MAPF constraints. Experimental results show that despite being trained only on instances with at most 96 agents, the initializer generalizes to scenarios with up to 312 agents at inference time. Across 20 complex and congested settings, DiffLNS achieves an average success rate of 95.8%, outperforming the strongest tested baseline by 9.6 percentage points and matching or exceeding all baselines in all 20 settings. To the best of our knowledge, this is the first work to leverage discrete diffusion for warm-starting an LNS-based MAPF solver.
comment: 24 pages, 7 figures
CANTANTE: Optimizing Agentic Systems via Contrastive Credit Attribution
LLM-based multi-agent systems have demonstrated strong performance across complex real-world tasks, such as software engineering, predictive modeling, and retrieval-augmented generation. Yet automating their configuration remains a structural challenge, as scores are available only at the system level, whereas the parameters governing agent behavior are local. We argue that optimizing these systems is fundamentally a credit-assignment problem. We therefore introduce CANTANTE, a framework that decomposes system-level rewards into per-agent update signals by contrasting rollouts of multiple joint configurations on the same query. We instantiate it for prompt optimization, treating agent prompts as learnable system parameters. We evaluate CANTANTE against GEPA and MIPROv2 on programming (MBPP), mathematical reasoning (GSM8K), and multi-hop question answering (HotpotQA). Across these benchmarks, CANTANTE achieves the best average rank among all evaluated optimizers and consistently outperforms unoptimized prompts. It improves over the strongest baseline by +18.9 percentage points on MBPP and +12.5 percentage points on GSM8K, while incurring a lower inference cost. It remains within one standard deviation of the strongest baseline on HotpotQA. Crucially, our credit correlation analysis confirms that the attributer produces meaningful per-agent signals rather than echoing the global system score.
Decoupled Planning for Multiple Omega-Regular Objectives
We study the problem of generating paths on a graph that satisfy a collection of ω-regular objectives. We propose a decoupled framework in which each objective is assigned to an independent agent that selects a local policy, while a scheduler -- oblivious to the graph and objective -- dynamically composes these policies into a single path. We ask when such a composition satisfies all objectives, assuming their conjunction is realizable. The framework enables modular policy design but raises fundamental compositional challenges. We show that even extremely fair deterministic schedulers do not ensure correctness, and that stochastic schedulers, while necessary, are insufficient without coordination. For safety objectives, we demonstrate that fully decentralized implementations are impossible, and we introduce a protocol for synchronizing on maximal safe actions. For non-safety objectives, we introduce conventions -- simple, a priori restrictions agreed upon before the graph or objectives are revealed -- that guarantee satisfaction of all objectives when followed by all agents. We characterize minimally restrictive conventions for major subclasses of ω-regular objectives. In particular, Büchi objectives admit universal composition of finite-memory policies without scheduler communication; co-Büchi objectives require only knowledge of whether the agent was scheduled; and parity objectives additionally require knowledge of which agent was scheduled.
comment: 33 pages, 6 figures. Extended version of the paper accepted at CAV 2026
When Does Hierarchy Help? Benchmarking Agent Coordination in Event-Driven Industrial Scheduling
Recent advances in agent and multi-agent systems have shown strong performance on tool use, reasoning, and collaborative tasks. However, existing benchmarks mostly evaluate task completion in weakly coupled environments, and provide limited support for studying coordination in shared, dynamically evolving systems with hierarchy and coupled constraints. This leaves an important question underexplored: when do different coordination paradigms succeed or fail? We introduce Distributed Event-driven Scheduling Benchmark (DESBench), a benchmark for evaluating agent coordination in hierarchical event-driven scheduling. Built on a shared discrete-event driven environment in industrial scheduling, our benchmark captures multi-timescale decision making, partial observability, and dynamically coupled constraints. We define tasks and metrics that evaluate effectiveness, constraint alignment, coordination efficiency, and robustness, and focus on four representative coordination paradigms: centralized, hierarchical, heterarchical, and holonic. These paradigms correspond to distinct mechanisms of information flow, decision authority, and conflict resolution. Our controlled evaluations reveal clear coordination trade-offs: centralized coordination is robust and communication-efficient but scales poorly with difficulty; hierarchical coordination improves efficiency through decomposition but suffers from cross-level misalignment; heterarchical coordination is flexible but communication-heavy; and holonic coordination satisfies constraints well but loses global robustness. These findings demonstrate that coordination design fundamentally shapes agent system behavior in complex environments, revealing structural trade-offs that cannot be captured by outcome metrics alone and underscoring the imperative for more adaptive, principled, and dynamic coordination mechanisms in future MAS research.
Finding the Weakest Link: Adversarial Attack against Multi-Agent Communications AAMAS 2026
Multi-agent systems rely on communication for information sharing and action coordination, which exposes a vulnerability to attacks. We investigate single-victim communication perturbation attacks against Multi-Agent Reinforcement Learning-trained systems and propose methods that use gradient information from the Jacobian to identify which messages, agents, and timesteps are most susceptible to attack and have the greatest impact on the system. We enhance these methods with two proposed adversarial loss functions that trade off attack success for attack impact and also create more effective perturbations. We empirically demonstrate the effectiveness of our methods against two different multi-agent communication methods in navigation, PredatorPrey, and TrafficJunction environments. Our results show that our novel message selection method achieves a similar or greater impact than random message selection across almost all tested scenarios. Our victim selection, message selection, tempo, and loss functions improve attack effectiveness in half of the thirty scenarios we tested.
comment: Full version of the Extended Abstract presented at AAMAS 2026
A Multi-Agent Orchestration Framework for Venture Capital Due Diligence
We present a fully automated multi-agent framework for corporate due diligence and market analysis in venture capital. The system runs on an event-driven orchestration architecture, combining Large Language Models (LLMs) with real-time web retrieval to synthesize unstructured data into structured investment intelligence. A central technical contribution is a programmatic extraction pipeline that reverse-engineers the frontend-to-backend communication of the Greek Business Registry ($Γ$.E.MH.), querying dynamic endpoints to retrieve official financial filings that are then parsed using a layout-aware OCR extractor. A structural fallback mechanism explicitly flags data absence rather than generating unverified figures, directly targeting hallucination in financial contexts. All workflow artifacts are publicly available to support replication.
comment: 13 pages, 1 figure
Counterfactual Reasoning for Causal Responsibility Attribution in Probabilistic Multi-Agent Systems
Responsibility allocation -- determining the extent to which agents are accountable for outcomes -- is a fundamental challenge in the design and analysis of multi-agent systems. In this work, we model such systems as concurrent stochastic multi-player games and introduce a notion of retrospective (backward) counterfactual responsibility, which quantifies an agent's accountability for outcomes resulting from a given strategy profile. To allocate responsibility among agents, we utilise the Shapley value and formally show that this method satisfies key desirable properties, including fairness and consistency. Building on this foundation, we propose a formal framework that supports both verification and strategic reasoning in responsibility-aware multi-agent systems. Furthermore, by adopting Nash equilibrium as the solution concept, we demonstrate how to compute stable strategy profiles in which agents trade off responsibility against expected reward.
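As a rough illustration of the Shapley-value allocation mentioned in the abstract above, the following minimal sketch averages each agent's marginal contribution over all orderings. The agents `a`, `b`, `c` and the characteristic function `caused_outcome` are hypothetical; the paper defines responsibility over concurrent stochastic games, which this toy example does not model.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, value):
    """Average marginal contribution of each player over all orderings."""
    totals = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for player in order:
            extended = coalition | {player}
            totals[player] += Fraction(value(extended)) - Fraction(value(coalition))
            coalition = extended
    return {p: total / len(orderings) for p, total in totals.items()}


def caused_outcome(coalition):
    # Hypothetical rule: the outcome occurs only if agent "a" acts
    # together with at least one of agents "b" or "c".
    return 1 if "a" in coalition and coalition & {"b", "c"} else 0


responsibility = shapley_values(["a", "b", "c"], caused_outcome)
for agent, share in responsibility.items():
    print(agent, share)
```

In this toy game the shares sum to the value of the full coalition, which is the efficiency property usually associated with the Shapley value.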
Conveyor Parcel Routing with Order-Contiguous Arrivals
In warehouse logistics, parcels released from the outfeed of an automated storage system must be routed through conveyor networks to workstations. Beyond collision avoidance, practical operations impose an additional requirement of order-contiguous arrivals: at each delivery point, parcels belonging to the same order must arrive as a consecutive block in the arrival sequence to reduce downstream re-sorting effort. We formalize this problem as online multi-agent path finding with order-contiguity (online MAPF-OC), where agents (i.e., parcels) appear over time and exit upon delivery. To efficiently solve online MAPF-OC, we propose Dual-Ordering Prioritized Planning (DOPP), a complete polynomial-time algorithm with a three-level structure that (i) searches order-level arrival sequences, (ii) refines agent-level priorities, and (iii) synthesizes feasible solutions via prioritized planning. Experiments on various conveyor-network layouts, including those derived from actual warehouses, demonstrate DOPP's practical scalability and ability to generate high-quality plans within tight time budgets.
Occlusion-Based Object Transportation Around Obstacles With a Swarm of Miniature Robots
Swarm robotics utilises decentralised self-organising systems to form complex collective behaviours built from the bottom-up using individuals that have limited capabilities. Previous work has shown that simple occlusion-based strategies can be effective in using swarm robotics for the task of transporting objects to a goal position. However, this strategy requires a clear line-of-sight between the object and the goal. In this paper, we extend this strategy by allowing robots to form sub-goals; enabling any member of the swarm to establish a wider range of visibility of the goal, ultimately forming a chain of sub-goals between the object and the goal position. We do so while preserving the fully decentralised and communication-free nature of the original strategy, while maintaining performance in object-free scenarios. In five sets of simulated experiments, we demonstrate the generalisability of our proposed strategy. Our finite-state machine allows a sufficiently large swarm to transport objects around obstacles that block the goal. The method is robust to varying starting positions and can handle both concave and convex shapes.
comment: 25 pages, 9 figures, 6 tables. Accepted for publication in the journal Swarm Intelligence
Embodied Multi-Agent Coordination by Aligning World Models Through Dialogue
Effective collaboration between embodied agents requires more than acting in a shared environment; it demands communication grounded in each agent's evolving understanding of the world. When agents can only partially observe their surroundings, coordination without communication is provably hard, but communication can, in principle, bridge this gap by allowing agents to share observations and align their world models. In this work, we examine whether LLM-based embodied agents actually realize the ability to communicate. We extend PARTNR, a benchmark for collaborative household robotics, with a natural-language dialogue channel that enables two agents with partial observability to communicate during task execution. To evaluate whether dialogue leads to genuine world-model alignment rather than superficial coordination, we propose a framework for measuring world-model alignment defined over per-agent world graphs: observation convergence (do private world models align over time?), information novelty (do messages convey what the partner lacks?), and belief-sensitive messaging (do agents model what their partner knows?). Our experiments across three LLMs reveal that dialogue reduces action conflicts by 40 to 83 percentage points but degrades task success relative to silent coordination. Using our metrics, we characterize the gap between superficial coordination and genuine world-model alignment, and identify where current models fall on this spectrum.
SHM-Agents: A Generalist-Specialist Integrated Agent System for Structural Health Monitoring
Artificial intelligence is increasingly used to simplify complex tasks. In engineering applications of structural health monitoring (SHM), existing specialized algorithms, while effective, often face high implementation barriers, limited interoperability and complex training procedures. To overcome these challenges, this paper proposes SHM-Agents, a generalist-specialist agent system that integrates the reasoning and planning abilities of large language models with the problem-solving strengths of specialized algorithms. SHM-Agents enables end-to-end execution of single and combined SHM tasks via natural language, supports deep learning pre-training to simplify deployment and allows flexible expansion through a modular design. Experiments on a long-span cable-stayed bridge show that SHM-Agents can accurately and efficiently perform diverse SHM tasks, including data anomaly diagnosis and recovery, signal processing, statistical analysis, modal identification, damage identification, finite element model updating, vehicle load modeling, response calculation, reliability assessment, fatigue estimation and bridge knowledge Q&A.
comment: 19 pages, 20 figures
ChipMATE: Multi-Agent Training via Reinforcement Learning for Enhanced RTL Generation
Existing API-based agentic systems for RTL code generation are fundamentally misaligned with industrial practice: they assume a golden testbench is available at generation time, rely on closed-source APIs incompatible with chip vendors' air-gapped security requirements, and cannot be trained on vendors' proprietary RTL codebases, leaving valuable internal data unused. Recent self-trained models address the deployment constraint but remain single-turn generators that overlook the critical role of verification in real industrial flows. To bridge these gaps, we present ChipMATE, the first self-trained multi-agent framework for RTL generation. Inspired by industrial practice where correctness emerges from cross-comparison between independently written RTL modules and reference models, ChipMATE pairs a Verilog agent with a Python reference-model agent that mutually verify each other's outputs without any golden oracle. We design a backtrack-based inference workflow to prevent error propagation across turns, and a two-stage training pipeline that first trains each agent individually to saturate its code-generation capability, then trains the team jointly to collaborate effectively. To support the training, we further build a hybrid data-generation framework that produces 64.4K high-quality reference model training samples. ChipMATE achieves 75.0% and 80.1% pass@1 on VerilogEval V2 with 4B and 9B base models, outperforming all existing self-trained models and even DeepSeek V4 with 1600B parameters. Our code and model weights are publicly available at https://github.com/zhongkaiyu/ChipMATE.
Privacy Preserving Multi Agent Path Finding AAMAS 2026
In the multi-agent path finding (MAPF) problem, a group of agents search in a graph for a path for each agent where no two paths collide. While in all applications of MAPF the agents must not collide with each other, in some of them the agents may not wish to share their paths due to privacy constraints. In this work, we formulate two types of privacy constraints for MAPF and propose algorithms that preserve them. The first type of privacy we consider is planning-level privacy, which means that during planning, the agents cannot identify exactly the planned location of the other agents. We propose a general framework for obtaining planning-level privacy, which works by adding mock agents to the planning process. The second type of privacy we consider is execution-level privacy, which is relevant when agents have limited sensing capabilities. Execution-level privacy is preserved if none of the agents is allowed to sense the location of the other agents during execution. We show how to adapt two popular MAPF algorithms, namely PIBT and LaCAM, such that they preserve execution-level privacy. Lastly, we propose a post-processing technique that allows the agents to reduce the sum of costs of the returned solution without losing any privacy. We also implemented our algorithms and evaluated them empirically, showing that the proposed post-processing technique indeed improved cost significantly.
comment: 16 pages, 5 figures, to be published in AAMAS 2026 as an extended abstract
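The requirement above that no two paths collide can be made concrete with a small checker. This is a minimal sketch rather than anything from the paper: it assumes grid cells as `(row, col)` tuples, one move per timestep, and the common convention that an agent waits at its final cell after finishing. The function names are hypothetical.

```python
from itertools import combinations


def position_at(path, t):
    """Cell occupied at time t; the agent waits at its last cell afterwards."""
    return path[min(t, len(path) - 1)]


def find_conflicts(paths):
    """Return vertex and edge (swap) conflicts between every pair of agents."""
    conflicts = []
    horizon = max(len(path) for path in paths.values())
    for (a, path_a), (b, path_b) in combinations(paths.items(), 2):
        for t in range(horizon):
            if position_at(path_a, t) == position_at(path_b, t):
                # Both agents occupy the same cell at the same time.
                conflicts.append(("vertex", a, b, t))
            elif (
                t + 1 < horizon
                and position_at(path_a, t) == position_at(path_b, t + 1)
                and position_at(path_a, t + 1) == position_at(path_b, t)
            ):
                # The agents swap cells between t and t + 1.
                conflicts.append(("edge", a, b, t))
    return conflicts


head_on = {"a": [(0, 0), (0, 1), (0, 2)], "b": [(0, 2), (0, 1), (0, 0)]}
swap = {"a": [(0, 0), (0, 1)], "b": [(0, 1), (0, 0)]}
print(find_conflicts(head_on))
print(find_conflicts(swap))
```

A checker like this needs every agent's full path, which is exactly the information the privacy constraints in the abstract restrict.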
ProtoMedAgent: Multimodal Clinical Interpretability via Privacy-Aware Agentic Workflows
While interpretable prototype networks offer compelling case-based reasoning for clinical diagnostics, their raw continuous outputs lack the semantic structure required for medical documentation. Bridging this gap via standard Retrieval-Augmented Generation (RAG) routinely triggers "retrieval sycophancy," where Large Language Models (LLMs) hallucinate post-hoc rationalizations to align with visual predictions. We introduce ProtoMedAgent, a framework that formalizes multimodal clinical reporting as an iterative, zero-gradient test-time optimization problem over a strict neuro-symbolic bottleneck. Operating on a frozen prototype backbone, we distill latent visual and tabular features into a discrete semantic memory. Online generation is strictly constrained by exact set-theoretic differentials and a reflective Scribe-Critic loop, mathematically precluding unsupported narrative claims. To safely bound data disclosure, we introduce a semantic privacy gate governed by $k$-anonymity and $\ell$-diversity. Evaluated on a 4,160-patient clinical cohort, ProtoMedAgent achieves 91.2% Comparison Set Faithfulness where it fundamentally outperforms standard RAG (46.2%). ProtoMedAgent additionally leverages a binding $\ell$-diversity phase transition to systematically reduce artifact-level membership inference risks by an absolute 9.8%.
comment: CVR 2026
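For readers unfamiliar with the two privacy criteria named in the abstract above, the following minimal sketch checks $k$-anonymity together with the simplest ("distinct") form of $\ell$-diversity on a table of records. It is not the paper's semantic privacy gate; the field names and the toy cohort are hypothetical.

```python
from collections import defaultdict


def satisfies_k_anonymity_l_diversity(records, quasi_identifiers, sensitive, k, ell):
    """True if every quasi-identifier group has at least k records
    and at least ell distinct sensitive values (distinct l-diversity)."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[field] for field in quasi_identifiers)
        groups[key].append(record[sensitive])
    return all(
        len(values) >= k and len(set(values)) >= ell
        for values in groups.values()
    )


# Hypothetical toy cohort.
cohort = [
    {"age_band": "60-69", "sex": "F", "diagnosis": "A"},
    {"age_band": "60-69", "sex": "F", "diagnosis": "B"},
    {"age_band": "70-79", "sex": "M", "diagnosis": "A"},
    {"age_band": "70-79", "sex": "M", "diagnosis": "A"},
]
fields = ["age_band", "sex"]
print(satisfies_k_anonymity_l_diversity(cohort, fields, "diagnosis", k=2, ell=1))
print(satisfies_k_anonymity_l_diversity(cohort, fields, "diagnosis", k=2, ell=2))
```

The second call fails because one group is 2-anonymous yet shares a single diagnosis, which is the disclosure that $\ell$-diversity is meant to catch.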
ICRL: Learning to Internalize Self-Critique with Reinforcement Learning
Large language model-based agents make mistakes, yet critique can often guide the same model toward correct behavior. However, when critique is removed, the model may fail again on the same query, indicating that it has not internalized the critique's guidance into its underlying capability. Meanwhile, a frozen critic cannot improve its feedback quality over time, limiting the potential for iterative self-improvement. To address this, we propose learning to internalize self-critique with reinforcement learning (ICRL), a novel framework that jointly trains a solver and a critic from a shared backbone to convert critique-induced success into unassisted solver ability. The critic is rewarded based on the solver's subsequent performance gain, incentivizing actionable feedback. To address the distribution shift between critique-conditioned and critique-free behavior, ICRL introduces a distribution-calibration re-weighting ratio that selectively transfers critique-guided improvements compatible with the solver's own prompt distribution. Additionally, a role-wise group advantage estimation stabilizes joint optimization across the two roles. Together, these mechanisms ensure that the solver learns to improve itself without external critique, rather than becoming dependent on critique-conditioned behavior. We evaluate ICRL on diverse benchmarks spanning agentic and mathematical reasoning tasks, using Qwen3-4B and Qwen3-8B as backbones. Results show consistent improvements, with average gains of 6.4 points over GRPO on agentic tasks, and 7.0 points on mathematical reasoning. Notably, the learned 8B critic is comparable to 32B critics while using substantially fewer tokens. The code is available at https://github.com/brick-pid/ICRL.
The fitness landscape of social norms in social dilemmas
By specifying behaviour across multiple agents, social norms are a coordination approach to resolving social dilemmas. Decentralized and wide adoption can be achieved by norms whose prescription involves interpreting stochastic signals in the environment. Such signals must have enough correlation to orchestrate mutually beneficial coordination and enough disincentivizing uncertainty about the benefits of exploiting that coordination. Evolutionary game theory of matrix games has been used to describe how, by rational agents comparing and adopting norms, a norm can evolve to become dominant in a population. Morsky & Akçay (2019) classify norms according to a set of rationality criteria. Joint player strategies that adopt norms that are consistent with optimal single-player strategies with respect to expected reward naturally satisfy a correlated, rather than Nash game theoretic equilibrium condition. Here, we present a version of this theory that clarifies the basic ingredients. We formulate it in the more general Markov game setting more commonly used in reinforcement learning theory. We illustrate the theory by mapping norms over the signal and reward space, while also giving a detailed exposition of the underlying mechanics of the approach. Finally, we give a general solution and analysis of replicator dynamics, which Morsky & Akçay (2019) propose as a means by which these norms could emerge.
Randomise Alone, Reach as a Team
We study concurrent graph games where n players cooperate against an opponent to reach a set of target states. Unlike traditional settings, we study distributed randomisation: team players do not share a source of randomness, and their private random sources are hidden from the opponent and from each other. We show that memoryless strategies are sufficient for the threshold problem (deciding whether there is a strategy for the team that ensures winning with probability that exceeds a threshold), a result that not only places the problem in the Existential Theory of the Reals ($\exists\mathbb{R}$) but also enables the construction of value iteration algorithms. We additionally show that the threshold problem is NP-hard. For the almost-sure reachability problem, we prove NP-completeness. We introduce Individually Randomised Alternating-time Temporal Logic (IRATL). This logic extends the standard ATL framework to reason about probability thresholds, with semantics explicitly designed for coalitions that lack a shared source of randomness. On the practical side, we implement and evaluate a solver for the threshold and almost-sure problems based on the algorithms that we develop.
comment: 50 pages, 7 figures. Extended version of the CAV 2026 paper with the same title
GAAMA: Graph Augmented Associative Memory for Agents
AI agents that interact with users across multiple sessions require persistent long-term memory to maintain coherent, personalized behavior. Current approaches either rely on flat retrieval-augmented generation (RAG), which loses structural relationships among memories, or use entity-centric knowledge graphs that suffer from mega-hub effects in conversational data, diluting graph-based relevance propagation. We propose GAAMA, a graph-augmented associative memory for agents that constructs a concept-mediated knowledge graph through a three-step pipeline: (1) verbatim episode preservation, (2) LLM-based extraction of atomic facts and topic-level concept nodes, and (3) synthesis of higher-order reflections. The resulting graph uses four node types (episode, fact, reflection, concept) connected by five structural edge types, with concept nodes providing cross-cutting traversal paths that avoid the mega-hub problem of entity-centric designs. Retrieval combines cosine-similarity-based k-nearest neighbor search with edge-type-aware Personalized PageRank (PPR) through an additive scoring function. We further introduce GRAFT (Graph Repair by Augmenting Facts & Topology), a post-retrieval corrective layer that diagnoses retrieval failures and surgically repairs the knowledge graph. On LoCoMo-10 (1,540 questions, 10 multi-session conversations), GAAMA achieves 79.1% mean reward, a +4.2 pp improvement over a tuned RAG baseline, the strongest comparator. On MemoryArena, GAAMA outperforms full-context baselines across three tasks -- Group Travel (+0.4 pp), Web Shopping (+3.4 pp), and Progressive Search (+0.7 pp) -- with advantages growing monotonically with dialogue length. Notably, GAAMA delivers consistent performance across all categories, matching the best competing method in each, whereas every competitor degrades in at least one category.
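The retrieval step described above (k-nearest-neighbor seeds by cosine similarity, Personalized PageRank with per-edge-type weights, then an additive score) might be sketched as follows. This is an illustration under stated assumptions, not GAAMA's implementation: the edge-type names, the weights `alpha` and `beta`, the damping factor, and the toy graph are all hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def personalized_pagerank(edges, restart, type_weight, damping=0.85, iterations=50):
    """Power iteration; `restart` maps seed nodes to probabilities summing to 1."""
    nodes = set(restart)
    for src, dst, _ in edges:
        nodes.update((src, dst))
    out = {n: [] for n in nodes}
    for src, dst, edge_type in edges:
        out[src].append((dst, type_weight[edge_type]))
    rank = {n: restart.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        nxt = {n: (1 - damping) * restart.get(n, 0.0) for n in nodes}
        for n in nodes:
            total = sum(weight for _, weight in out[n])
            if total == 0:
                # Dangling node: return its mass to the seed distribution.
                for seed, p in restart.items():
                    nxt[seed] += damping * rank[n] * p
            else:
                for dst, weight in out[n]:
                    nxt[dst] += damping * rank[n] * weight / total
        rank = nxt
    return rank


def retrieve(query, embeddings, edges, type_weight, k=2, alpha=1.0, beta=1.0):
    """Additive score: alpha * cosine similarity + beta * PPR mass."""
    sims = {n: cosine(query, vec) for n, vec in embeddings.items()}
    seeds = sorted(sims, key=sims.get, reverse=True)[:k]
    restart = {n: 1.0 / len(seeds) for n in seeds}
    ppr = personalized_pagerank(edges, restart, type_weight)
    scores = {
        n: alpha * sims.get(n, 0.0) + beta * ppr.get(n, 0.0)
        for n in set(sims) | set(ppr)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy memory graph; the concept node has no embedding.
embeddings = {"episode1": [1.0, 0.0], "fact1": [0.8, 0.6], "fact2": [0.0, 1.0]}
edges = [
    ("episode1", "fact1", "derived"),
    ("fact1", "concept_travel", "about"),
    ("concept_travel", "fact2", "about"),
]
type_weight = {"derived": 0.5, "about": 1.0}
ranked = retrieve([1.0, 0.0], embeddings, edges, type_weight)
for node, score in ranked:
    print(f"{node}: {score:.3f}")
```

Nodes without an embedding, such as the concept node here, can still be surfaced through the PPR term alone, which is one reason an additive rather than multiplicative combination is a natural choice in a sketch like this.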
Events as Triggers for Behavioral Diversity in Multi-Agent Reinforcement Learning
Effective multi-agent cooperation requires agents to adopt diverse behaviors as task conditions evolve -- and to do so at the right moment. Yet, current Multi-Agent Reinforcement Learning (MARL) frameworks that facilitate this diversity are still limited by the fact that they bind fixed behaviors to fixed agent identities. Consequently, they are ill-equipped for tasks where agents need to take on different roles at very specific moments in time. We argue that, to define these behavioral transitions, the missing ingredient is events. Events are changes in the state of the system that induce qualitative changes in the task. Based on this view, we introduce a framework that decouples agent identity from behavior, capturing a continuous manifold from which agents instantiate their behaviors in response to events. This framework is based on two elements. First, to build an expressive behavior manifold, we introduce Neural Manifold Diversity (NMD), a formal distance metric that remains well-defined when behaviors are transient and agent-agnostic. Second, we use an event-based hypernetwork that generates Low-Rank Adaptation (LoRA) modules over a shared team policy, enabling on-the-fly agent-policy reconfiguration in response to events. We prove that this construction ensures that diversity does not interfere with reward maximization by design. Empirical results demonstrate that our framework outperforms established baselines across benchmarks while exhibiting zero-shot generalization, and being the only method that solves tasks requiring sequential behavior reassignment.
Semantic knowledge guides innovation and drives cultural evolution
Cultural evolution allows ideas and technologies to accumulate across generations, reaching their most complex and open-ended form in humans. While social learning enables the transmission of such innovations, the cognitive processes that generate them remain poorly understood. Classical theories typically treat innovation as random variation, a simplification insufficient for explaining the complexity of human cultural evolution. We propose that semantic knowledge, the associations linking concepts to their properties and functions, guides human innovation and drives cumulative culture. To test this, we combined an agent-based model, which examines how semantic knowledge shapes cultural evolutionary dynamics, with a large-scale behavioral experiment (N = 1,243) testing its role in human innovation. Across both approaches, we found that semantic knowledge directed exploration toward meaningful solutions, enhanced innovation success, and enabled generalization from prior discoveries. Moreover, semantic knowledge interacted synergistically with social learning to amplify innovation and accelerate cumulative cultural change. In contrast, experimental participants lacking access to semantic knowledge performed no better than chance, even when social learning was possible, and relied on shallow exploration strategies for innovation. Together, these findings suggest that semantic knowledge is a key cognitive process underpinning human cumulative culture.
Dicey Games: Shared Sources of Randomness in Distributed Systems
Consider a 4-player version of Matching Pennies where a team of three players competes against the Devil. Each player simultaneously says "Heads" or "Tails". The team wins if all four choices match; otherwise the Devil wins. If all team players randomise independently, they win with probability 1/8; if all players share a common source of randomness, they win with probability 1/2. What happens when each pair of team players shares a source of randomness? Can the team do better than win with probability 1/4? The surprising (and nontrivial) answer is yes! We introduce Dicey Games, a formal framework motivated by the study of distributed systems with shared sources of randomness (of which the above example is a specific instance). We characterise the existence, representation and computational complexity of optimal strategies in Dicey Games, and we study the problem of allocating limited sources of randomness optimally within a team.
comment: 16 pages, 9 figures. To be published at LICS 2026
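The two baseline probabilities quoted in the abstract (1/8 for independent randomisation, 1/2 for a fully shared source) can be checked by brute-force enumeration. The sketch below is illustrative only: it assumes fair coins and a Devil who best-responds to the team's joint distribution without seeing the coin outcomes, and the function name `team_win_probability` is hypothetical, not taken from the paper. It does not attempt the pairwise-sharing case.

```python
from fractions import Fraction
from itertools import product


def team_win_probability(joint):
    """Team win probability when the Devil best-responds.

    `joint` maps a triple of team choices, e.g. ("H", "T", "H"), to its
    probability. The team wins only if all three members match the Devil,
    so the Devil names the side whose all-equal triple is least likely.
    """
    if sum(joint.values()) != 1:
        raise ValueError("joint distribution must sum to 1")
    return min(joint.get((s, s, s), Fraction(0)) for s in "HT")


# Assumption: every team player flips an independent fair coin.
independent = {choices: Fraction(1, 8) for choices in product("HT", repeat=3)}

# Assumption: all three players copy one shared fair coin.
shared = {("H", "H", "H"): Fraction(1, 2), ("T", "T", "T"): Fraction(1, 2)}

print(team_win_probability(independent))  # 1/8
print(team_win_probability(shared))       # 1/2
```

Exact rational arithmetic is used so the values can be compared with the fractions in the abstract without rounding.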
Systematic Failures in Collective Reasoning under Distributed Information in Multi-Agent LLMs ICML 2026
Multi-agent systems built on large language models (LLMs) are expected to enhance decision-making by pooling distributed information, yet systematically evaluating this capability has remained challenging. We introduce HiddenBench, a 65-task benchmark grounded in the Hidden Profile paradigm, which isolates collective reasoning under distributed information from individual reasoning ability. Evaluating 15 frontier LLMs, we find that multi-agent LLMs achieve only 30.1% accuracy under distributed information, compared to 80.7% accuracy for single agents given complete information. We trace this gap to a systematic failure mode: agents cannot recognize or act under latent information asymmetry -- they fail to reason about what others might know but have not yet expressed, leading to premature convergence on shared evidence while critical distributed facts remain unexplored. These failures persist across prompting strategies, communication depths, and group sizes -- and worsen as groups scale. While some models (e.g., Gemini-2.5-Flash/Pro) outperform others, neither model scale nor individual reasoning accuracy reliably predicts collective performance. We further show that this bottleneck is actionable: a lightweight structured communication protocol substantially improves collective reasoning across model families. Our results identify failures in collective information exploration in decision-making as a key limitation of multi-agent LLMs, and provide a theory-grounded, reproducible framework for diagnosing collective reasoning failures.
comment: Accepted to ICML 2026
When Identity Overrides Incentives: Representational Choices as Governance Decisions in Multi-Agent LLM Systems
Multi-agent systems built on large language models are increasingly deployed in strategic policy and governance settings, where agents representing stakeholders with conflicting interests must coordinate under shared constraints. These systems typically assign role-based personas to agents, describing their motivations and objectives. Whether agents with role-based identities follow explicit payoffs or their assigned roles in strategic decision-making remains untested. Here we show that assigning role-based personas suppresses payoff-aligned behavior in four-agent strategic games, shifting equilibrium attainment by up to 90 percentage points even when agents have complete payoff information. We test a 2x2 factorial design (persona presence x payoff visibility) across four models (Qwen-7B, Qwen-32B, Llama-8B, Mistral-7B) and 53 environmental policy scenarios with two equilibria: Tragedy of the Commons, where individual payoff dominates, and Green Transition, where collective payoff dominates. With personas present, all models reach near-zero Tragedy equilibrium in the Tragedy-dominant scenarios despite complete payoff information, and 100% of equilibria correspond to Green Transition. No model reaches Tragedy equilibrium by removing personas alone; only Qwen models reach 65-90% Tragedy equilibrium rates when personas are removed and payoffs are made explicit. Three distinct behavioral profiles emerge: Qwen shifts equilibrium selection based on framing condition, Mistral increases response variance without reaching the Tragedy equilibrium, and Llama holds near-constant across all conditions. Representational choices in multi-agent LLM systems are governance decisions: persona assignment determines which equilibrium a simulation produces, independent of the underlying incentive structure.
RDMA: Cost Effective Agent-Driven Rare Disease Mining from Electronic Health Records
Rare diseases affect 1 in 10 Americans yet remain systematically underdocumented in clinical records. ICD-based systems cannot capture their breadth: over 50% of Orphanet codes lack a direct ICD mapping and only 2.2% of HPO codes have matching ICD codes, leaving patient populations invisible and delaying diagnosis. Mining unstructured clinical notes offers a direct path forward, but real notes are long, noisy, and abbreviation-dense, and limited annotations make fine-tuning infeasible, demanding approaches that generalize without task-specific training. We present Rare Disease Mining Agents (RDMA), an agentic framework equipping smaller quantized LLMs with tools for abbreviation resolution, implicit phenotype reasoning, and ontology grounding against Orphanet and HPO. RDMA substantially outperforms fine-tuned and RAG-based baselines across benchmarks with different data characteristics, without any task-specific training. A small quantized model achieves maximal performance, reducing inference costs by up to 10x and local hardware costs by up to 17x, enabling private deployment on standard hardware without cloud-based PHI exposure. RDMA's uncertainty-flagging mechanism further reduces expert annotation burden while preserving agreement quality, supporting scalable rare disease documentation in clinical practice. Available at https://github.com/jhnwu3/RDMA.
Context Learning for Multi-Agent Discussion
Multi-Agent Discussion (MAD), in which multiple LLM instances collaboratively solve problems via structured discussion, has recently garnered increasing attention. However, we find that current MAD methods easily suffer from discussion inconsistency, in which LLMs fail to reach a coherent solution due to the misalignment between their individual contexts. In this paper, we introduce a multi-LLM context learning method (M2CL) that learns a context generator for each agent, capable of dynamically generating context instructions per discussion round via automatic information organization and refinement. Specifically, inspired by our theoretical insights on the context instruction, M2CL trains the generators to control context coherence and output discrepancies via a carefully crafted self-adaptive mechanism. It enables LLMs to avoid premature convergence on majority noise and progressively reach the correct consensus. We evaluate M2CL on challenging tasks, including academic reasoning, embodied tasks, and mobile control. The results show that the performance of M2CL significantly surpasses existing methods by 20% to 50%, while enjoying favorable transferability and computational efficiency.
AgentForesight: Online Auditing for Early Failure Prediction in Multi-Agent Systems
LLM-based multi-agent systems are increasingly deployed on long-horizon tasks, but a single decisive error is often accepted by downstream agents and cascades into trajectory-level failure. Existing work frames this as post-hoc failure attribution, diagnosing the responsible agent and step after the trajectory has ended. However, this paradigm forfeits any opportunity to intervene while the trajectory is still unfolding. In this work, we introduce AgentForesight, a framework that reframes this problem as online auditing: at each step of an unfolding trajectory, an auditor observes only the current prefix and must either continue the run or alarm at the earliest decisive error, without access to future steps. To this end, we curate AFTraj-2K, a corpus of agentic trajectories across Coding, Math, and Agentic domains, in which safe trajectories are retained under a strict curation pipeline and unsafe trajectories are annotated at the step of their decisive error via consensus among multiple LLM judges. Built on that, we develop AgentForesight-7B, a compact online auditor trained with a coarse-to-fine reinforcement learning recipe that first equips it with a risk-anticipation prior at the failure boundary on adjacent safe/unsafe prefix pairs, then sharpens this prior into precise step-level localization under a three-axis reward jointly targeting the what, where, and who of an audit verdict. Across AFTraj-2K and an external Who&When benchmark, AgentForesight-7B outperforms leading proprietary models, including GPT-4.1 and DeepSeek-V4-Pro, achieving up to +19.9% performance gain and 3x lower step localization error, opening the loop from post-hoc failure detection to enabling deployment-time intervention. Project page: https://zbox1005.github.io/agent-foresight/
comment: 33 pages, 7 figures
Sequential Resource Trading Using Comparison-Based Gradient Estimation
We study sequential multi-issue trading between two greedily rational agents who exchange resources from a finite set of categories. Each agent's utility depends on its allocation, but the offering agent does not know the responding agent's utility function and receives only accept or reject feedback. We propose a comparison-based algorithm that interprets acceptance and rejection responses as pairwise state comparisons, allowing the offering agent to iteratively estimate the responding agent's gradient. Rejected offers prune the space of feasible gradient directions, enabling systematic refinement of possibly mutually beneficial trades. The algorithm guarantees that each accepted trade strictly improves both agents' utilities and, after finitely many rejected offers, either identifies a mutually beneficial trade or certifies that the current allocation is weakly Pareto optimal. We further show that the sequence of accepted trades asymptotically converges to the Pareto front under mild assumptions. We evaluate the method against standard baselines and show that it achieves higher societal benefit with fewer offers across multiple trading settings. We further validate the approach in a user study, demonstrating strong performance in scenarios with substantial resource conflict.
OMAC: A Holistic Optimization Framework for LLM-Based Multi-Agent Collaboration ICML 2026
Agents powered by advanced large language models (LLMs) have demonstrated impressive capabilities across diverse complex applications. Recently, Multi-Agent Systems (MAS), wherein multiple agents collaborate and communicate with each other, have exhibited enhanced capabilities in complex tasks, such as high-quality code generation and arithmetic reasoning. However, the development of such systems often relies on handcrafted methods, and the literature on systematic design and optimization of LLM-based MAS remains limited. In this work, we introduce OMAC, a general framework designed for holistic optimization of LLM-based MAS. Specifically, we identify five key optimization dimensions for MAS, encompassing both agent functionality and collaboration structure. Building upon these dimensions, we first propose a general algorithm, utilizing two actors termed the Semantic Initializer and the Contrastive Comparator, to optimize any single dimension. Then, we present an algorithm for joint optimization across multiple dimensions. Extensive experiments demonstrate the superior performance of OMAC on diverse tasks against recent approaches.
comment: Accepted as a Spotlight paper at ICML 2026
Predictive Maps of Multi-Agent Reasoning: A Successor-Representation Spectrum for LLM Communication Topologies
Practitioners deploying multi-agent large language model (LLM) systems must currently choose between communication topologies such as chain, star, mesh, and richer variants without any pre-inference diagnostic for which topology will amplify drift, converge to consensus, or remain robust under perturbation. Existing evaluation answers these questions only post hoc and only for the task measured. We introduce a structural diagnostic for multi-agent LLM communication graphs based on the successor representation $M = (I - \gamma P)^{-1}$ of the row-stochastic communication operator, and we connect three of its spectral quantities, the spectral radius $\rho(M)$, the spectral gap $\Delta(M)$, and the condition number $\kappa(M)$, to three distinct failure modes. We derive closed-form spectra for the chain, star, and mesh under row-stochastic normalization, and validate the predictions on a 12-step structured state-tracking task with Qwen2.5-7B-Instruct over 100 independent trials. The condition number is a perfect rank-order predictor of empirical perturbation robustness ($r_s = 1.0$); the spectral gap partially predicts consensus dynamics ($r_s = 0.5$); and the spectral radius is perfectly inverted with respect to cumulative error ($r_s = -1.0$). We trace this inversion to a regime in which linear spectra are blind to non-contracting bias drift, and we propose an affine-noise extension of the predictive map that recovers the empirical ordering. We read this as a first step toward representational, drift-aware structural diagnostics for multi-agent LLM systems, sitting alongside classical spectral and consensus theory.
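As a rough illustration of the diagnostic, the successor representation and the three spectral quantities can be computed numerically for small graphs. Everything beyond the formula $M = (I - \gamma P)^{-1}$ is an assumption of this sketch rather than the paper's construction: the adjacency matrices (undirected, with self-loops), the value of gamma, the 2-norm condition number, and the reading of the spectral gap as the difference between the two largest eigenvalue magnitudes.

```python
import numpy as np


def row_stochastic(adjacency):
    """Normalise each row of a non-negative adjacency matrix to sum to 1."""
    adjacency = np.asarray(adjacency, dtype=float)
    return adjacency / adjacency.sum(axis=1, keepdims=True)


def successor_representation(P, gamma):
    """M = (I - gamma * P)^{-1} for a row-stochastic P and 0 <= gamma < 1."""
    n = P.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P)


def spectral_summary(M):
    """Spectral radius, gap (assumed: top two magnitudes), 2-norm condition number."""
    magnitudes = np.sort(np.abs(np.linalg.eigvals(M)))[::-1]
    return {
        "spectral_radius": float(magnitudes[0]),
        "spectral_gap": float(magnitudes[0] - magnitudes[1]),
        "condition_number": float(np.linalg.cond(M)),
    }


n, gamma = 4, 0.9  # hypothetical team size and discount

# Hypothetical topologies: undirected edges plus a self-loop on every agent.
chain = np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
star = np.eye(n)
star[0, :] = 1.0
star[:, 0] = 1.0
mesh = np.ones((n, n))

summaries = {
    name: spectral_summary(successor_representation(row_stochastic(adj), gamma))
    for name, adj in {"chain": chain, "star": star, "mesh": mesh}.items()
}
for name, summary in summaries.items():
    print(name, summary)
```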
GAMBIT: A Three-Mode Benchmark for Adversarial Robustness in Multi-Agent LLM Collectives
In multi-agent systems (MAS), a single deceptive agent can nullify all gains of an agentic AI collective and evade deployed defenses. However, existing adversarial studies on MAS target only shallow tasks and do not consider adaptive adversaries, which evolve their strategies to evade the very detectors trained to catch them. To address that gap, we introduce GAMBIT, a benchmark with three evaluation modes and two independent scores for evaluating imposter detectors: the first two modes measure zero-shot detection under increasing distribution shift, and a third recalibration mode measures how quickly a detector adapts to novel attacks from just 20 labeled examples. The benchmark comes with a dataset of 27,804 labeled instances spanning 240 co-evolved imposter strategies. Our contributions are threefold: (1) Using chess as a substrate deep reasoning problem and Gemini 3.1 Pro for agents, we release GAMBIT and its dataset to evaluate imposter detectors under realistic constraints against a stealthy adaptive imposter; (2) We introduce an adaptive imposter agent based on an efficient evolutionary framework, generalizable beyond chess, that collapses collective task performance while remaining essentially undetectable (50.5% F1-score with a Gemini-based detector); (3) We show that zero-shot evaluation can be highly misleading for adaptive adversaries: two detectors with near-identical zero-shot scores differ by 8x on few-shot adaptation, while the meta-learned variant converges 20x faster, a gap only visible in the recalibration mode. Altogether, GAMBIT provides the first multi-agent benchmark where adversarial attacks and defenses co-evolve, with an imposter framework generalizable beyond our use case, and promising techniques for fast recalibration in a rapidly evolving adversarial system. Code and data: https://anonymous.4open.science/r/gambit.
comment: 46 pages, 16 figures
Systems and Control (EESS)
Reachable-Set Decomposition for Real-Time Aggregation of Multi-Zone HVAC Fleets
Aggregating building heating, ventilation, and air-conditioning (HVAC) fleets provides substantial real-time flexibility to power system operations. However, real-time aggregation of multi-zone HVAC fleets faces two key challenges: (i) strong coupling across zones and time makes flexibility characterization high-dimensional and computationally demanding, and (ii) the sequential revelation of temperature states and exogenous conditions requires that decisions made at each period preserve feasibility over the remaining horizon using only currently realized information. To address these challenges, this paper proposes a reachable-set decomposition framework comprising an offline decomposition stage and a real-time policy. In the offline stage, backward reachable sets are formulated to encode remaining-horizon feasibility into per-period state constraints, so that any state within the current reachable set is guaranteed to sustain feasible operation over the entire remaining horizon. A tailored inner approximation is then developed for tractable calculation in multi-zone-coupled HVAC settings. In the real-time stage, aggregate flexibility is computed efficiently via building-level parallel linear programs followed by closed-form Minkowski summation of power intervals, and any regulation signal within the reported flexibility interval admits a recursively feasible disaggregation. Case studies demonstrate the effectiveness of the proposed framework in aggregate flexibility characterization, disaggregation feasibility, and scalable computation.
comment: 10 pages, 9 figures
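The closed-form Minkowski summation of power intervals mentioned in the abstract reduces, for one-dimensional intervals, to adding the lower bounds and adding the upper bounds. The sketch below shows that step under stated assumptions: the per-building intervals are made-up numbers standing in for the outputs of the building-level linear programs, and the proportional split in `disaggregate` is only one simple way to show that any signal inside the aggregate interval can be decomposed. It is not the paper's recursively feasible policy.

```python
def aggregate_flexibility(intervals):
    """Minkowski sum of closed intervals [lo_i, hi_i]: sum the endpoints."""
    for lo, hi in intervals:
        if lo > hi:
            raise ValueError(f"empty interval: [{lo}, {hi}]")
    return (sum(lo for lo, _ in intervals), sum(hi for _, hi in intervals))


def disaggregate(signal, intervals):
    """Split `signal` so each share stays inside its own interval.

    Illustrative rule (an assumption): every building moves the same
    fraction alpha of the way from its lower to its upper bound.
    """
    total_lo, total_hi = aggregate_flexibility(intervals)
    if not total_lo <= signal <= total_hi:
        raise ValueError("signal lies outside the aggregate interval")
    width = total_hi - total_lo
    alpha = 0.0 if width == 0 else (signal - total_lo) / width
    return [lo + alpha * (hi - lo) for lo, hi in intervals]


# Hypothetical per-building power intervals in kW.
building_intervals = [(1.0, 3.0), (0.5, 2.0), (2.0, 2.5)]
fleet_interval = aggregate_flexibility(building_intervals)
shares = disaggregate(5.5, building_intervals)
print(fleet_interval, shares)
```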
Loiter UAV Reinsertion Guidance for Fixed-wing UAV Corridors
This paper considers fixed-wing unmanned aerial vehicle (UAV) corridors comprising a main lane, a circular loiter lane for managing traffic congestion, and transit lanes connecting the two. In particular, we address the problem of conflict-free reinsertion of UAVs from the loiter lane back into the main lane. The loiter lane contains a fixed number of equidistant virtual slots that UAVs can occupy. Reinsertion of loiter UAVs into the main lane becomes essential either due to reduced traffic in the main lane or due to a loiter UAV needing to reach its destination urgently. Given the total number of loiter slots, UAV speed limits, and the minimum safety distance, a guidance algorithm is developed to compute the required speed of a loiter UAV in the transit lane to ensure safe reinsertion. The proposed guidance and automation strategies are validated through numerical simulations.
Learning Responsibility-Attributed Adversarial Scenarios for Testing Autonomous Vehicles
Establishing trustworthy safety assurance for autonomous driving systems (ADSs) requires evidence that failures arise from avoidable system deficiencies rather than unavoidable traffic conflicts. Current adversarial simulation methods can efficiently expose collisions, but generally lack mechanisms to distinguish these fundamentally different failure modes. Here we present CARS (Context-Aware, Responsibility-attributed Scenario generation), a framework that integrates responsibility attribution directly into adversarial scenario generation. CARS combines context-aware adversary selection with a generative adversarial policy optimized in closed-loop simulation to construct collision scenarios that are both physically feasible and diagnostically attributable. Across benchmark datasets spanning heterogeneous national traffic environments, CARS consistently discovers feasible collision scenarios with high attribution rates under multiple regulation-prescribed careful and competent driver models. By coupling adversarial generation with normative responsibility assessment, CARS moves simulation testing beyond collision discovery toward the construction of interpretable, regulation-aligned safety evidence for scalable ADS validation.
TinySDP: Real Time Semidefinite Optimization for Certifiable and Agile Edge Robotics
Semidefinite programming (SDP) provides a principled framework for convex relaxations of nonconvex geometric constraints in motion planning, yet existing solvers are too computationally expensive for real-time control, particularly on resource-constrained embedded systems. To address this gap, we introduce TinySDP, the first semidefinite programming solver designed for embedded systems, enabling real-time model-predictive control (MPC) on microcontrollers for problems with nonconvex obstacle constraints. Our approach integrates positive-semidefinite cone projections into a cached-Riccati-based ADMM solver, leveraging computational structure for embedded tractability. We pair this solver with an a posteriori rank-1 certificate that converts relaxed solutions into explicit geometric guarantees at each timestep. On challenging benchmarks, e.g., cul-de-sac and dynamic obstacle avoidance scenarios that induce failures in local methods, TinySDP achieves collision-free navigation with up to 73% shorter paths than state-of-the-art baselines. We validate our approach on a Crazyflie quadrotor, demonstrating that semidefinite constraints can be enforced at real-time rates for agile embedded robotics.
comment: Accepted to Robotics: Science and Systems (RSS) 2026. 11 pages, 5 figures, 2 tables. Project website: https://a2r-lab.org/TinySDP/
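The positive-semidefinite cone projection that the abstract describes as part of the ADMM iteration is, in its standard form, an eigenvalue clipping step. The sketch below shows that textbook projection in NumPy, together with a simple eigenvalue-ratio check for near rank-1 solutions. Both functions are illustrative assumptions: the embedded solver's actual implementation and the exact form of its rank-1 certificate are not given in the abstract.

```python
import numpy as np


def project_psd(S):
    """Euclidean (Frobenius-norm) projection of a matrix onto the PSD cone."""
    S = 0.5 * (S + S.T)                # symmetrise to guard against round-off
    eigenvalues, eigenvectors = np.linalg.eigh(S)
    clipped = np.clip(eigenvalues, 0.0, None)
    # Scale column j of the eigenvector matrix by clipped[j], then recombine.
    return (eigenvectors * clipped) @ eigenvectors.T


def is_numerically_rank_one(X, tolerance=1e-6):
    """Hypothetical check: second-largest eigenvalue is negligible vs. the largest."""
    eigenvalues = np.linalg.eigvalsh(0.5 * (X + X.T))  # ascending order
    largest, second = eigenvalues[-1], eigenvalues[-2]
    return bool(largest > 0 and abs(second) <= tolerance * largest)


indefinite = np.array([[2.0, 0.0], [0.0, -1.0]])
projected = project_psd(indefinite)

x = np.array([1.0, 2.0])
lifted = np.outer(x, x)                # a rank-1 PSD matrix x x^T
print(projected)
print(is_numerically_rank_one(lifted), is_numerically_rank_one(np.eye(2)))
```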
Bounded-Input True Proportional Navigation for Impact-Time Control
This paper proposes a nonlinear guidance strategy capable of intercepting a constant-velocity, non-maneuvering target while strictly satisfying the prescribed bounds on the control input (commanded acceleration). Unlike conventional strategies that estimate time-to-go using linearization or small-angle approximations, the proposed strategy employs true proportional-navigation guidance (TPNG) as a baseline, which utilizes an exact time-to-go formulation and is applicable over a wide range of target motions. In contrast to most existing strategies, which do not incorporate control input bounds into the guidance design, the proposed approach explicitly accounts for these limits by modeling the interceptor acceleration as a dynamic variable. Based on the sliding mode control technique, an effective guidance law that achieves time-constrained interception while accounting for bounded input is then derived. The performance of the proposed strategy is evaluated for various engagement scenarios.
comment: Preprint; Accepted for presentation at the 15th Asian Control Conference, June 17th-21st, 2026, Indonesia
Decentralized Frequency-Domain Conditions for D-Stability with Application to DC Microgrids
This paper proposes a decentralized method for regional pole placement, or $\mathcal{D}$-stability, in linearized networked systems. Existing LMI-based methods are hindered by confidentiality concerns regarding proprietary subsystem models and the absence of communication infrastructures. To overcome these barriers, we map the target region $\mathcal{D}$ of pole placement to an auxiliary left-half plane and introduce positive functions to handle the resulting complex-coefficient dynamics. We prove that $\mathcal{D}$-stability is guaranteed via local frequency-domain criteria without requiring shared subsystem models or inter-subsystem communication. This method is then tailored to DC microgrids, where a loop transformation is utilized to reallocate the burden of stability certification, deriving a broadcastable grid code for decentralized parameter synthesis. Numerical examples verify the efficacy of the proposed method.
Learning a Contracting KKL-observer with Local Optimal Guarantees
The Kazantzis-Kravaris-Luenberger (KKL) observer provides a general framework for nonlinear state estimation by immersing the system dynamics into a stable linear or nonlinear latent dynamics. However, the performance of KKL observers relies heavily on the specific choice of these latent dynamics, which is often heuristic. This paper proposes a methodology to learn a KKL observer that combines global stability guarantees with local optimality. We derive a condition on the latent dynamics such that the observer locally mimics the behavior of a Minimum Energy Estimator (Mortensen observer). We then employ deep learning to approximate the KKL transformation and the latent dynamics, using neural network architectures that structurally enforce the contraction property. The proposed strategy is validated through numerical simulations on nonlinear benchmarks, demonstrating good performance in the presence of state and measurement noise.
comment: Accepted to the 23rd IFAC World Congress 2026
Sensitivity Quantification for Distribution System State Estimation
Pseudo-measurements are the dominant source of uncertainty in distribution system state estimation (DSSE), yet their distributional assumptions are treated as fixed inputs by existing uncertainty quantification methods. This paper investigates whether the uncertainty bounds assumed by weighted least squares (WLS)-based DSSE are sensitive to these distributional assumptions, and whether this sensitivity is quantifiable using the Fisher Information Matrix (FIM). We propose a diagnostic framework that compares the true Cramér-Rao Bound (CRB) against the WLS-assumed CRB via a per-bus, per-scenario ratio, computed directly from the converged WLS solution. Pseudo-measurement distributions are varied across five types in 22 variants matched at equal spread to isolate shape effects from variance. Experiments on the CIGRE MV network across 100 operating scenarios yield three findings. First, heavy-tailed and skewed distributions show consistently that WLS systematically overstates its uncertainty bounds. Second, the degree of miscalibration varies across buses and operating scenarios, confirming that distributional sensitivity is not uniform. Third, the CRB ratio is structurally blind to mean-shift bias, exposing a fundamental limitation of variance-based uncertainty diagnostics. Together, these results confirm the hypothesis and show that the choice of pseudo-measurement distribution directly distorts the confidence limits under WLS-based assumptions, which must be explicitly accounted for in any uncertainty-aware DSSE method.
comment: Submitted for peer review
Impedance-Based VSC Unit Commitment with STATCOM Support under High IBG Penetration
The large-scale replacement of synchronous machines with inverter-based generation (IBG) introduces critical challenges to both voltage and frequency stability. This work builds on a mixed-integer second-order cone programming (MISOCP) framework that co-optimizes a unit commitment (UC) model embedding frequency-nadir constraints through synthetic inertia (SI) dispatch and an SOC voltage stability boundary for IBG buses. The formulation is extended by modeling a STATCOM as a reactive-power decision variable in the same MISOCP model. A modified IEEE 30-bus system is used to assess three scheduling strategies: (i) baseline UC with SI only, (ii) voltage-stability-constrained (VSC) UC with SI, and (iii) the joint UC with SI and reactive power support from IBGs. The impact of incorporating a 30 MVAr STATCOM at a weak grid location near the IBG buses is investigated. Simulation results show that the proposed framework enhances voltage security, maintains frequency-nadir compliance, and reduces operating cost, while STATCOM integration further improves dispatch feasibility under high IBG penetration.
comment: Electric Power Systems Research
Embodied Neurocomputation: A Framework for Interfacing Biological Neural Cultures with Scaled Task-Driven Validation
Biological neural networks (BNNs) have been established as a powerful and adaptive substrate that offers the potential for highly energy- and data-efficient information processing with distinct learning mechanisms. Yet a core challenge to utilizing BNNs for neurocomputation is determining the optimal encoding and decoding mechanisms between the traditional silicon computing interface and the living biology. Here, we propose an Embodied Neurocomputation framework as a systems-level approach to this multi-variable optimization encoding/decoding problem. We operationalize this approach through the first large-scale parameter optimization of encoding configurations for a BNN agent performing closed-loop navigation along an odor-style gradient in a simulated grid-world. Despite the relative simplicity of the task, the biological interactions gave rise to a massive multi-combinatorial search space for optimal parameters. By considering how the components of the system are interconnected and parameterized, we evaluated approximately 1,300 parameter combinations, over 4,000 hours of real-time agent-environment interactions, to identify 12 configurations that consistently demonstrated learning across multiple episodes. These configurations achieved significantly higher task performance than optimized silicon-based DQN agents under the same interaction budget. These findings represent an initial step toward robust and scalable goal-oriented learning using BNNs. Our framework establishes a foundation for applying task-driven neurocomputing and supports the development of field-wide benchmarks. In the long term, this work supports the development of hybrid bio-silicon architectures capable of efficient, adaptive and real-time computation, including the potential for robotic control applications.
Safe Bayesian Optimization for Uncertain Correlations Matrices in Linear Models of Co-Regionalization
This paper extends safety guarantees for multi-task Bayesian optimization with uncertain correlation matrices from intrinsic co-regionalization models to linear models of co-regionalization. The latter allows for more flexible modeling of the inter-task correlations by composing multiple features. We derive uniform error bounds for vector-valued functions sampled from a Gaussian process with a linear model of co-regionalization kernel. Furthermore, we show the potential improvement of performance using linear models of co-regionalization in a numerical comparison on a safe multi-task Bayesian optimization benchmark.
comment: Accepted at IFAC WC26
Submodular Multi-Agent Policy Learning for Online Distributed Task Allocation in Open Multi-Agent Systems
This paper studies multi-agent reinforcement learning with submodular team utilities for online distributed task allocation. In this setting, each agent selects one action from a local categorical policy, so feasible joint actions form a partition matroid over agent-action pairs. Classical multilinear extensions use independent Bernoulli sampling and therefore do not match the categorical policies executed by decentralized agents. To address this mismatch, we introduce the Partition Multilinear Extension (PME), a continuous relaxation whose value equals the expected team utility under factorized categorical policies. We prove that submodular difference rewards provide unbiased PME marginal-gradient information and yield a stagewise score-function policy-gradient estimator. Based on this connection, we propose SubMAPG, a centralized-training decentralized-execution policy-gradient framework with masked categorical policies and submodular difference-reward training signals. For the associated PME marginal-space projected stochastic-gradient dynamics, we prove a stagewise 1/2-approximation guarantee and sublinear dynamic regret in slowly varying environments, measured by the path length of the optimal PME marginals. To handle open systems with time-varying agents and targets, we instantiate SubMAPG with graph neural network policies. Experiments on multi-robot coverage and multi-target tracking show that SubMAPG outperforms local greedy and shared-reward baselines and is competitive with centralized myopic greedy strategies.
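The abstract characterises the Partition Multilinear Extension (PME) as the expected team utility when each agent samples one action from its own categorical policy. For a tiny problem that expectation can be evaluated exactly by enumerating joint actions, as sketched below. The coverage utility, the target names, and the function names are hypothetical choices for illustration, and brute-force enumeration is only a way to see the definition at work, not the paper's estimator or training procedure.

```python
from fractions import Fraction
from itertools import product


def coverage_utility(joint_action, covers):
    """Submodular example utility: number of distinct targets covered."""
    covered = set()
    for agent, action in enumerate(joint_action):
        covered |= covers[agent][action]
    return len(covered)


def expected_team_utility(policies, covers):
    """Exact expectation of the utility under factorised categorical policies.

    policies[i][a] is the probability that agent i picks action a. Each agent
    picks exactly one action, so joint actions range over the product of the
    agents' action sets (one element per block of the partition matroid).
    """
    value = Fraction(0)
    for joint_action in product(*(range(len(p)) for p in policies)):
        probability = Fraction(1)
        for agent, action in enumerate(joint_action):
            probability *= policies[agent][action]
        value += probability * coverage_utility(joint_action, covers)
    return value


# Hypothetical instance: two agents, two actions each, targets A, B, C.
covers = [
    [{"A"}, {"B"}],   # agent 0: action 0 covers A, action 1 covers B
    [{"A"}, {"C"}],   # agent 1: action 0 covers A, action 1 covers C
]
half = Fraction(1, 2)
uniform = [[half, half], [half, half]]
deterministic = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]

print(expected_team_utility(uniform, covers))        # 7/4
print(expected_team_utility(deterministic, covers))  # 2
```

Under the uniform policies, only the joint action in which both agents cover target A scores 1 and the other three score 2, which gives (1 + 2 + 2 + 2) / 4 = 7/4.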
Real-time Gaussian Process based Approximate Model Predictive Trajectory Tracking Control for Autonomous Vehicles
Applying model predictive control on embedded systems remains challenging due to the high computational cost of solving optimal control problems. To address this limitation, computationally efficient Gaussian process approximations of the implicit model predictive control law can be employed. However, for trajectory-tracking applications, the large amount of training data required for successful generalization across distinct reference trajectories poses a significant challenge. To improve data efficiency, we propose to transform the model into curvilinear coordinates around the reference trajectory. Secondly, we use a nominal feedforward component, allowing the Gaussian process to learn only the residual control input, making the approximation of a trajectory-tracking controller feasible. To underline the applicability of the approach, we deploy the controller on a Raspberry Pi in a small-scale vehicle and validate it experimentally. Compared to a model predictive control implementation using real-time iterations, the Gaussian process based approximation computes control inputs about five times faster while achieving similar closed-loop tracking performance.
comment: accepted for the 24th European Control Conference (ECC) 2026
Subspace Pruning via Principal Vectors for Accurate Koopman-Based Approximations
The accuracy of Koopman operator approximations over finite-dimensional spaces relies critically on their invariance properties. These can be rigorously quantified via the principal angles between a candidate subspace and its image under the Koopman operator. This paper proposes a unified algebraic framework for subspace pruning designed to systematically refine the invariance error. We establish the geometric equivalence between consistency-based methods and principal-vector pruning, and build on this insight to introduce a hybrid strategy that balances between multiple and single principal vector pruning for improved numerical stability and scalability. We derive error bounds for the retention of approximate and external eigenfunctions, demonstrating that the multi-vector approach mitigates the numerical drift inherent to sequential pruning. To ensure scalability, we develop an efficient numerical update scheme based on rank-one modifications that reduces the computational complexity of tracking principal angles by an order of magnitude. Finally, we exploit the subspace obtained from the pruning algorithms to build a lifted linear model for state prediction that accounts for the trade-offs between improving invariance and minimizing state reconstruction error. Simulations demonstrate the effectiveness of our approach.
Security-Aware Planning and Control of Multi-Agent Systems with LTL Tasks
This paper presents a secure-by-construction planning and control framework for multi-agent systems subject to linear temporal logic (LTL) specifications. The framework protects sensitive information from a passive intruder with partial observations of the agents' motion. Security in multi-agent coordination is captured by two notions that prevent the intruder from inferring whether a secret task has been executed and from identifying the agent responsible for its execution. The proposed framework incorporates the security constraints directly into the LTL synthesis procedure by constructing a secure finite transition system that removes all paths violating these constraints. Standard LTL synthesis is then applied to this secure abstraction to generate discrete plans, which are then refined into dynamically feasible continuous trajectories. This synthesis procedure provides formal guarantees that the resulting behavior of the multi-agent system satisfies both the global LTL specification and the security constraints. The effectiveness of the proposed framework is demonstrated through a two-drone case study.
comment: 8 pages, 2 figures; This paper has been accepted at the IFAC World Congress 2026
D-Optimized Sampling Design for System Identification
Traditional system identification with multisine inputs relies on uniform sampling and periodic excitation to preserve Fourier orthogonality and avoid spectral leakage, limiting its use in scenarios with irregular sampling or nonperiodic inputs. This work investigates continuous-time system identification under nonperiodic multisine excitation and nonuniform sampling. We develop a nonparametric frequency response function estimator suited to such conditions and design irregular sampling schemes that enhance the informativeness of measurements and reduce spectral leakage. The proposed sampling schemes improve the statistical accuracy of system identification in settings where periodic excitation is impractical.
comment: 6 pages, 2 figures. Accepted for IFAC 2026. Final author version
Guaranteed cost structured control in infinite-horizon linear-quadratic cooperative differential games
In this paper, we consider infinite-horizon linear-quadratic cooperative differential games with output feedback information structure. We first demonstrate that, under output feedback information structure, computing Pareto optimal controls can be difficult even for simple low-dimensional differential games. To address this issue, this paper introduces the concept of feedback guaranteed cost structured control (GCSC). The feedback GCSC concept is inspired by suboptimal control. At a feedback GCSC, the total weighted team cost remains below a prescribed threshold while satisfying the structural constraints. We derive fundamental properties of the feedback GCSC and the admissible weight set, including their monotonicity properties. In particular, we show that if Pareto optimal controls exist, they belong to the class of feedback GCSCs. We also quantify the suboptimality of Pareto optimal controls (if they exist) and the proposed GCSC with respect to output feedback optimal control. Furthermore, we provide the conditions for verification and the synthesis of a feedback GCSC. Finally, we illustrate the effectiveness of the proposed approach through numerical examples, including a case study on tracking synchronization in a microgrid.
Revisiting Voltage and Synchronization Stability Analysis in Converter-Integrated Weak Grids: Insights from Non-Minimum-Phase Zeros
The increasing penetration of converter-interfaced generators (CIGs) intensifies concerns over small-signal voltage and synchronization stability. While existing theories treat these two stability issues distinctly, practical wisdom in contrast employs a unified and static metric, short-circuit ratio (SCR), to assess both in weak grids. This paper aims to bridge this theory-practice gap by introducing the insight of non-minimum phase (NMP) zeros. First, we demonstrate that the two stability issues in weak grids originate from NMP zeros in the grid Jacobian transfer matrix: a zero at the origin corresponds to voltage instability, while low-frequency zeros impose fundamental constraints on synchronization dynamics. The traditional SCR is proven to be a special case of our proposed novel stability metric, NMP-zero (NMP-Z) factor, evaluated at the rated operating point. This establishes the theoretical foundation for the empirical success of SCR. Building on this insight, we then develop a unified stability assessment method for multi-converter systems. The method retains the simplicity of SCR, requiring only the NMP-Z factor together with individual CIG dynamic models and enabling stability margin assessment under various operating points. Our work provides a simple yet theoretically rigorous framework for stability analysis in CIG-integrated weak grids, with all theoretical findings and the proposed method validated through detailed time-domain simulations.
comment: 10 pages
Relative Pose-Velocity Estimation Using Dual IMU Measurements and Relative Position Sensing
This paper addresses the problem of estimating the relative pose (position and orientation) and velocity of a vehicle with respect to a moving target, where both are equipped with Inertial Measurement Units (IMUs), assuming the availability of relative position or bearing measurements. The body-target relative dynamics are formulated on $\mathbf{SE}_2(3)$ and recast into a linear time-varying (LTV) model in the ambient space $\mathbb{R}^{15}$, on which a deterministic Riccati observer is designed. We analyze the uniform observability (UO) conditions required to guarantee global exponential convergence of the estimation error in the ambient space for both measurement cases. In the case of relative position measurements, UO requires only a persistence-of-excitation condition on the target acceleration, whereas for bearing measurements, additional conditions are required. Building on this, a nonlinear complementary filter on $\mathbf{SO}(3)$ is designed to provide a smooth estimate of the orientation component of the state with almost global asymptotic stability. Finally, simulation results are provided to validate the proposed solution.
Local Conformal Calibration of Dynamics Uncertainty from Semantic Images
We introduce Observation-aware Conformal Uncertainty Local-Calibration (OCULAR), a conformal prediction-based algorithm that uses perception information to provide uncertainty quantification guarantees for unseen test-time environments. While previous conformal approaches lack the ability to discriminate between state-action space regions leading to higher or lower model mismatch, and require environment-specific data, our method uses data collected from visually similar environments to provably calibrate a given linear Gaussian dynamics model of arbitrary fidelity. The prediction regions generated from OCULAR are guaranteed to contain the future system states with, at least, a user-set likelihood, despite both aleatoric and epistemic uncertainty -- i.e., uncertainty arising from both stochastic disturbances and lack of data. Our guarantees are non-asymptotic and distribution-free, not requiring strong assumptions about the unknown real system dynamics. Our calibration procedure enables distinguishing between observation-velocity-action inputs leading to higher and lower next-state-uncertainty, which is helpful for probabilistically-safe planning. We numerically validate our algorithm on a double-integrator system subject to random perturbations and significant model mismatch, using both a simplified sensor and a more realistic simulated camera. Our approach appropriately quantifies uncertainty both when in-distribution and out-of-distribution, being comparatively volume-efficient to baselines requiring environment-specific data.
comment: 26 pages, 8 figures. Accepted to the 17th World Symposium on the Algorithmic Foundations of Robotics (WAFR) 2026
Amortized Guidance for Image Inpainting with Pretrained Diffusion Models
We study image inpainting with generative diffusion models. Existing methods typically either train dedicated task-specific models, or adapt a pretrained diffusion model separately for each masked image at deployment. We introduce a middle-ground model, termed Amortized Inpainting with Diffusion (AID), which keeps a pretrained diffusion backbone fixed, trains a small reusable guidance module offline, and then reuses it across masked images without per-instance optimization. We formulate it as a deterministic guidance problem with a supervised terminal objective. To make this problem learnable in high dimensions, we derive an auxiliary Gaussian formulation and prove that solving this randomized problem recovers the optimal deterministic guidance field. This bridge yields a principled continuous-time actor--critic algorithm for learning the guidance module in a fully data-driven manner. Empirically, on AFHQv2 and FFHQ under the pixel EDM pipeline and on ImageNet under the latent EDM2 pipeline, AID consistently improves the quality--speed trade-off over strong fixed-backbone and amortized inpainting baselines across multiple mask types, while adding less than one percent trainable overhead.
Distributionally Robust Safety Under Arbitrary Uncertainties: A Safety Filtering Approach
In this work, we study how to ensure probabilistic safety for nonlinear systems under distributional ambiguity. Our approach builds on a backup-based safety filtering framework that switches between a high-performance nominal policy and a certified backup policy to ensure safety. To handle arbitrary uncertainties from ambiguous distributions, i.e., where the distribution is not of specific structure and the true distribution is unknown, we adopt a distributionally robust (DR) formulation using Wasserstein ambiguity sets. Rather than solving a high-dimensional DR trajectory optimization problem online, we exploit the structure of backup-based safety filtering to reduce safety certification to a one-dimensional search over the switching time between nominal and backup policies. We then develop a sampling-based certification procedure with finite-sample guarantees, where empirical failure probabilities are compared against a Wasserstein-inflated threshold. We validate our method through simulations across three systems, from a Dubins vehicle to a high-speed racing car and a fighter jet, demonstrating the broad applicability and computational efficiency.
comment: 10 pages, 4 figures, submitted to IEEE Robotics and Automation Letters (RA-L)
Port-Hamiltonian Systems with Dissipation Potential: Modelling and Trajectory Tracking Control
Port-Hamiltonian systems (PHS) and interconnection and damping assignment passivity-based control (IDA-PBC) have achieved broad success in modelling and stabilisation of physical systems. However, the absence of a dedicated scalar potential for the momentum channel forces any modification of the momentum-dependent dynamics to proceed indirectly through the interconnection and damping matrices, rendering the matching partial differential equation (PDE) difficult to solve and complicating extensions to trajectory tracking. This paper proposes a port-Hamiltonian system with dissipation potential (PHS-DP), in which the damping matrix is replaced by scalar convex dissipation potentials, providing independent scalar objects for the momentum and auxiliary state channels and restoring the variational symmetry between stored and dissipated energy. Building on this framework, Dual Potential Shaping Control (DPSC) achieves trajectory tracking by sequentially shaping the potential energy and dissipation potentials without modifying the interconnection structure. Contraction of the closed-loop cascade is established via a hierarchical contraction argument, and the matching condition is satisfied automatically for any admissible choice of shaped potentials, requiring no PDE to be solved. In contrast to existing PDE-free energy shaping approaches, which achieve this by abandoning the port-Hamiltonian closed-loop structure and sacrificing physical interpretability, the proposed framework preserves the interconnection structure and retains a transparent energy-based interpretation at every stage of the design. Validation on a magnetic levitation system demonstrates tracking performance comparable to timed IDA-PBC with substantially reduced design complexity.
comment: 7 pages, 2 figures
Day-to-Day Traffic Network Modeling under Route-Guidance Misinformation: Endogenous Trust and Resilience in CAV Environments
Connected and autonomous vehicles and smart mobility services increasingly use digital route guidance as an operational input to traffic network management. When this information becomes unreliable or adversarial, day-to-day traffic models must represent not only flow adaptation but also the evolution of user trust in the information source. This paper develops a coupled day-to-day traffic assignment and trust-evolution framework for route-guidance misinformation. Within-day congestion is represented by Lighthill-Whitham-Richards network loading, while day-to-day route choice follows bounded-rationality logit learning with trust-dependent reliance on external guidance. Trust is modeled as an aggregate class-level behavioral reliance state encoded by a Beta evidence model and updated from repeated guidance errors. Theoretical analysis establishes stationary equilibria, a conservative stability guide, a weighted compliance index for population-level vulnerability, and an asymmetric recovery law that explains post-attack trust hysteresis. Numerical experiments on Sioux Falls, with an Anaheim robustness check, show that endogenous trust creates a threshold-based resilience mechanism. Below the trust-activation threshold, the attack remains behaviorally stealthy and dynamic trust provides almost no attenuation. Above the threshold, trust erosion reduces the impact of the fixed-trust attack by about 91 percent in Sioux Falls and 85 percent in Anaheim. The experiments also show that CAV penetration increases fixed-trust vulnerability while preserving dynamic attenuation, and that traffic performance can recover before trust, resulting in a 77-day hidden vulnerability window. The results provide a trust-aware modeling basis for resilience analysis in CAV-enabled traffic networks.
comment: 10 pages, 7 figures. Under review at IEEE Transactions on Intelligent Transportation Systems
Motion Planning for Autonomous Vehicles using Optimization over Graphs of Convex Sets
Motion planning for autonomous vehicles requires generating collision-free and dynamically feasible trajectories in complex environments under real-time constraints. While nonlinear optimal control formulations provide high-fidelity solutions, they are computationally demanding and sensitive to initialization, whereas geometric planning methods scale well but often decouple path selection from trajectory optimization. This paper studies the extent to which optimization over Graphs of Convex Sets (GCS) can approximate solutions of nonlinear optimal control problems in the context of autonomous driving. The free space is represented as a finite union of convex regions organized as a directed graph, allowing nonconvex geometry to be handled through discrete connectivity decisions while maintaining convex trajectory constraints within each region. Vehicle motion is parameterized using Bezier curves for the spatial path and a polynomial time-scaling function for temporal evolution. Under small-slip and linear tire assumptions, a simplified dynamic bicycle model enables approximate enforcement of dynamic feasibility through convex constraints on trajectory derivatives. The approach is evaluated in CommonRoad scenarios involving static obstacle avoidance and lane-changing maneuvers, and is compared against a nonlinear discrete-time optimal control formulation. The results indicate that the GCS-based method generates collision-free and dynamically consistent trajectories that closely match those obtained from the nonlinear program, while exhibiting improved computational efficiency and reduced sensitivity to initialization. These findings suggest that GCS provides a structured approximation of nonlinear motion planning problems, capturing dominant geometric and dynamic effects while preserving convexity in the continuous relaxation.
Time Domain Near Memory Computing Engine
The increasing computational demand of AI workloads has intensified the need for energy-efficient in-memory and near-memory computing architectures, particularly because data movement often consumes significantly more energy than computation itself. While fully digital architectures provide robust scalability and support higher-resolution computation, analog in-memory computing has demonstrated improved energy efficiency for low-precision workloads. However, its reliance on peripheral DACs and ADCs introduces additional power, area, and design overhead. To address these challenges, this work presents a time-domain near-memory computing architecture for low-precision multiply-and-accumulate (MAC) operations. In the proposed approach, digital weight bits stored in SRAM are converted using a current-steering DAC, while the digital input vector is encoded by an N-pulse generator. This enables multiplication to be performed in the time domain while maintaining a digital-friendly interface. Two accumulation schemes, a delay-cell-based architecture and a counter-based architecture, are investigated and compared in terms of design trade-offs, linearity, scalability, and power efficiency. To improve technology portability, the N-pulse generator and counters are implemented using RTL synthesis, while the current-steering DAC remains in the analog domain. A 4 x 4 MAC prototype is implemented with a 1 V supply, achieving an operating frequency of 40 MHz, power consumption of 42 uW, and energy efficiency of 7.62 TOPS/W.
comment: 8 pages, 9 figures, 1 Comparison Table
Optimizing Grid-Forming Controls using Relay-based Extremum Seeking to Enhance Transient Performance
Grid-forming (GFM) inverters are essential for enhancing stability in modern power systems with high penetration of inverter-based resources (IBRs). However, their performance depends strongly on control parameter tuning, particularly of the active power-frequency droop coefficient. This parameter presents a trade-off among competing objectives, including damping, settling time, rate of change of frequency (RoCoF), and frequency nadir. This paper proposes a real-time, adaptive optimization framework based on Extremum Seeking Control (ESC) to dynamically tune the GFM droop gain. A multi-objective cost function balances conflicting performance goals such as oscillation energy, frequency nadir, RoCoF, and post-disturbance settling performance. The approach is validated through numerical simulations on a modified IEEE 68-bus system. Results demonstrate that the cost function is convex with respect to the droop parameter, justifying gradient-based optimization. Furthermore, the ESC algorithm successfully tracks the time-varying optimal droop coefficient in real time as network conditions change, thereby ensuring robust and near-optimal system performance without requiring an analytical grid model.
An Encoded Corrective Double Deep Q-Networks for Multi-Agent Control Systems
This paper studies the synthesis of control policies for heterogeneous and interconnected multi-agent systems that collaborate through data exchange over a communication network to minimize a collective cost. We propose a distributed encoded corrective double actor-critic framework that integrates a novel message-passing mechanism. Existing methods assume noise-free and delay-free access to the global or partial states and overlook the fact that the global states, though noisy and delayed, can be progressively reconstructed and refined over time. In contrast, this work explicitly models communication sampling asynchrony, delay, and link noise based on the network configuration. The proposed message-passing mechanism characterizes timing and information flow to refine and time shift global state information, which is then used to incrementally correct the Q-networks. The double Q-network design mitigates overestimation bias, while the shared encoder coupling the actor-critic networks captures inter-agent dependencies. We evaluate our approach in multiple test cases, demonstrate its effectiveness over various baselines, and provide a numerical regret analysis.
Grid Integration of Gigawatt-Scale AI Data Centers under Connect-and-Manage
Emerging connect-and-manage interconnection practices allow gigawatt-scale artificial intelligence data centers (AIDCs) to connect to the transmission network without prior network upgrades, at the cost of real-time curtailment during grid stress. This paper formalizes the resulting AIDC-transmission system operator (TSO) coordination as a sequential request-acceptance protocol with an explicit curtailment variable and a strict information boundary between the two parties. Physical models are developed on both sides of the point of common coupling: the AIDC is decomposed into frontier training, batch training, and inference serving subclasses sharing on-site battery energy storage, capturing differentiated temporal flexibility; the transmission network is modeled via DC power flow with generator constraints and budget-constrained demand uncertainty. Because the TSO's acceptance mapping is opaque to the AIDC, a three-layer hierarchical architecture is formulated in which a learning-based planning layer generates power requests, the TSO evaluates each request through a robust acceptance mechanism, and a single-step execution optimizer enforces internal feasibility under the realized power budget. Case studies with a gigawatt-scale AIDC on the IEEE 39-bus system with Australian market data show that the framework reduces curtailment from 9.1% to 2.8% while preserving 98.1% frontier training workload, that batch training acts as the primary grid-elastic resource with the largest throughput swing during peak demand, and that the on-site battery provides curtailment buffering through active discharge and charge deferral.
Battery-Assisted Operation of Hyperscale AI Data Centers under Connect-and-Manage Interconnection Practices
Emerging connect-and-manage practices allow new transmission-connected mega-loads to connect while enforcing time-varying admissible power exchange limits at the point of common coupling (PCC) in real time. Hyperscale artificial intelligence data centers (AIDCs), whose demand can reach hundreds of megawatts and whose internal computing-cooling dynamics evolve rapidly, can therefore face frequent conflicts between workload continuity requirements and externally imposed PCC envelopes. This paper proposes a battery-assisted operational framework in which on-site battery energy storage (BESS) serves as a physical buffering interface to reconcile fast internal dynamics with time-varying interconnection limits. A continuity-aware energy-computation model is developed to jointly capture checkpoint-constrained AI training workloads, information technology (IT) computing power-throughput characteristics, and IT-cooling thermal dynamics. A two-stage decision framework is then formulated, consisting of scenario-based day-ahead workload commitment and a real-time receding-horizon delivery assurance controller that enforces battery, thermal, and grid-interaction constraints. Case studies on the IEEE 39-bus system with real Australian data demonstrate that BESS substantially increases credible day-ahead workload commitment and improves real-time delivery robustness under transmission congestion. Sensitivity analyses further reveal a regime-dependent role transition of BESS -- from feasibility-oriented continuity support when PCC limits are binding to economy-driven flexibility provision as transmission constraints are relaxed.
JAX-Based Batched AC Power Flow for GPU Acceleration and AI Ecosystem Integration
Coordinating growing grid flexibility under uncertainty is becoming increasingly important for efficient and reliable power-system operation. A core computational requirement is the efficient large-scale batched evaluation of AC power flow across candidate operating actions and uncertainty scenarios. Previous work has explored GPU-based batched power-flow evaluation, but has largely relied on hand-written C or CUDA code, creating barriers to customisation, efficient kernel optimisation, and long-term maintenance. JAX is a Python-based framework that enables efficient accelerator execution while keeping implementations in Python. This letter therefore proposes a JAX-based batched AC power-flow solver that uses current JAX functionality to implement Newton--Raphson for transmission networks and Z-Bus power flow for three-phase unbalanced distribution networks, achieving more than 10x speed-ups relative to pandapower and OpenDSS. In addition, JAX integrates seamlessly with the broader JAX-based AI ecosystem, making it straightforward to embed power-flow evaluation within AI methods for future larger-scale and more complex power-system operation.
comment: In review for IEEE POWER ENGINEERING LETTERS
Frequency Nadir-Constrained Power System Restoration Planning with Energy Storage
Power system restoration following blackouts must ensure frequency stability throughout the recovery process. This paper proposes a frequency-constrained mixed-integer linear programming (MILP) framework for black-start restoration planning in transmission systems with synchronous machines and energy storage systems. To prevent excessive frequency deviations caused by restorative actions, a frequency nadir prediction method is developed for power systems with energy storage system (ESS) integration and incorporated into a multiperiod optimization framework. The formulation ensures that frequency deviations resulting from restorative actions remain within prescribed safe limits. Furthermore, the presented framework leverages ESSs to enhance frequency security and recovery speed. Case studies on a modified IEEE 9-bus system demonstrate that the computed restoration plan maintains frequency security, as validated through MATLAB and PSS/E simulations, while reducing restoration time through ESS coordination.
comment: 10 pages
Receding Horizon Multi-Agent Deceptive Path Planner
Deceptive path planning enables autonomous agents to obscure their true goals from observers by deviating from an expected optimal path. Prior work largely solves full-horizon, end-to-end optimization for single agents, which is expensive to recompute online and difficult to scale or adapt en route. We propose a unified framework for deceptive path planning using a Boltzmann distribution, computing over short-horizon candidate trajectories within a receding-horizon loop. By iterating a user-defined cost that captures deception, resources, and smoothness, and optionally includes coupling terms between agents, the framework yields stochastic policies that balance the tradeoff between optimal paths and deceptive deviation. Policies are updated locally and do not require training. The level of deception and adherence to constraints can be dynamically tuned, enabling online adaptation to changes in goals and constraints such as obstacles. This step-by-step tuning opens the door to new forms of dynamic deception. Simulation studies demonstrate the flexibility of our approach, maintaining deception while adapting to environmental and constraint updates, avoiding the recomputation required by full-horizon methods, and supporting intuitive tuning via a small set of parameters.
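To make the Boltzmann-distribution step concrete, the sketch below turns the costs of a few candidate trajectories into a stochastic policy and samples one candidate. The cost values, the `temperature` parameter, and the function name `boltzmann_policy` are assumptions for illustration. The abstract does not specify how costs are computed or which parameters are exposed for tuning.

```python
import numpy as np

def boltzmann_policy(costs, temperature):
    """Selection probabilities proportional to exp(-cost / temperature)."""
    costs = np.asarray(costs, dtype=float)
    # Subtract the minimum cost so the largest exponent is 0 (stability).
    logits = -(costs - costs.min()) / temperature
    weights = np.exp(logits)
    return weights / weights.sum()

# Hypothetical costs of three short-horizon candidate trajectories.
costs = [1.0, 1.5, 3.0]
probs_cold = boltzmann_policy(costs, temperature=0.1)   # near-greedy
probs_hot = boltzmann_policy(costs, temperature=10.0)   # near-uniform

# One receding-horizon step: sample a candidate, execute it, then replan.
rng = np.random.default_rng(0)
chosen = rng.choice(len(costs), p=probs_hot)
```

A low temperature concentrates probability on the cheapest candidate, while a high temperature spreads it across the alternatives. In this sketch that single parameter plays the role of a dial between optimal and deviating behavior.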
Optimal design of solar-battery hybrid resources considering multi-market participation under weather and price uncertainty
The rapid growth of variable renewable energy has increased the need for flexible and efficiently coordinated energy resources. In this context, hybrid resources that combine renewable generation and battery storage within a single market-participating entity have attracted growing attention. Such hybrid resources can have multiple revenue streams, while allocating limited power and energy capacity across multiple electricity markets including energy and ancillary services. This multi-market coordination increases operational complexity and complicates profitability assessment, making optimal system sizing a challenging design problem. In addition, uncertainty in renewable generation and market prices makes it difficult for conventional optimization approaches to determine system designs that remain effective under stochastic operating conditions. To address these challenges, this paper proposes a deep reinforcement learning-based co-optimization framework for hybrid solar-battery resources. The framework embeds system design variables directly into the policy learning process, enabling joint optimization of hybrid system sizing and coordinated multi-market bidding strategies within a unified stochastic formulation. Case studies using historical renewable generation and market data demonstrate the effectiveness of the proposed framework in identifying economically rational hybrid system design considering multi-market operation.
Vision-Based Runtime Monitoring under Varying Specifications using Semantic Latent Representations
We study certified runtime monitoring of past-time signal temporal logic (ptSTL) from visual observations under partial observability. The monitor must infer safety-relevant quantities from images and provide finite-sample guarantees, while being \emph{reusable}: once trained and calibrated, it should certify any formula in a target fragment without per-formula retraining. For fragments induced by a finite dictionary of temporal atoms, we prove that the \emph{semantic basis}, the vector of atom robustness scores, is the minimum prediction target within the class of monotone, 1-Lipschitz reusable interfaces: any formula is evaluated by a deterministic decoder derived from the parse tree, and a single conformal calibration pass certifies the entire fragment with no union bound. We also introduce a \emph{rolling prediction monitor} that predicts only current predicate values and reconstructs temporal history online; this is easier to learn but grows conservative at long horizons. On a pedestrian-crossroad benchmark, rolling achieves tighter certified bounds at short horizons while the semantic-basis monitor is up to 4-times tighter at long horizons. We validate the presented monitors on real-world Waymo driving data, where both monitors satisfy the conformal coverage guarantee empirically.
AdaptNC: Adaptive Nonconformity Scores for Conformal Prediction under Distribution Shift
Rigorous uncertainty quantification is essential for the safe deployment of autonomous systems in unconstrained environments. Conformal Prediction (CP) provides a distribution-free framework for this task, yet its standard formulations rely on exchangeability assumptions that are violated by the distribution shifts inherent in real-world robotics. Existing online CP methods maintain target coverage by adaptively scaling the conformal threshold, but typically employ a static nonconformity score function. We show that this fixed geometry leads to highly conservative, volume-inefficient prediction regions when environments undergo structural shifts. To address this, we propose $\textbf{AdaptNC}$, a framework for the joint online adaptation of both the nonconformity score parameters and the conformal threshold. AdaptNC leverages an adaptive reweighting scheme to optimize score functions, and introduces a replay buffer mechanism to mitigate the coverage instability that occurs during score transitions. We evaluate AdaptNC on diverse robotic benchmarks involving multi-agent policy changes, environmental changes and sensor degradation. Our results demonstrate that AdaptNC significantly reduces prediction region volume compared to state-of-the-art threshold-only baselines while maintaining target coverage levels.
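The threshold-only baselines mentioned in this abstract can be sketched with a generic online update that raises the conformal threshold after a miss and lowers it slightly after a covered step. This is a common quantile-tracking rule written under assumed names (`update_threshold`, `track_threshold`, `step_size`); it is not AdaptNC itself and does not adapt the score function.

```python
def update_threshold(threshold, score, target_miscoverage, step_size):
    """One online step: threshold += step_size * (miss - target_miscoverage)."""
    miss = 1.0 if score > threshold else 0.0
    return threshold + step_size * (miss - target_miscoverage)

def track_threshold(scores, initial_threshold, target_miscoverage, step_size):
    """Run the update over a stream of nonconformity scores.

    Returns the final threshold and the empirical miss rate.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    threshold = initial_threshold
    misses = 0
    for score in scores:
        if score > threshold:
            misses += 1
        threshold = update_threshold(threshold, score, target_miscoverage, step_size)
    return threshold, misses / len(scores)
```

Because only the scalar threshold moves, the shape of the prediction region stays fixed, which is the limitation the abstract attributes to a static nonconformity score.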
Estimation Problems and the Modulating Function Method: The Algebra of Modulating Functions
State and parameter estimation, along with fault detection, are three crucial estimation problems within the control systems community. Although different approaches have been proposed for each type of problem, the modulating function method proposes a more unified approach to all three problem classes, being used for state and parameter estimation of lumped systems, fault detection, and estimation of distributed and fractional systems. At the core of the method is the modulating function: a function that evaluates to 0 at the left or right boundaries up to a certain order of derivatives. By selecting the modulating functions, one directly determines the filter characteristics, and, for that reason, different function families have been proposed over the years. Nevertheless, many families of modulating functions are given in a rather similar mathematical structure. In light of these structures, this paper formally discusses the algebraic properties of modulating functions, and, after formalizing the closedness and group properties of modulating functions, a simple algorithm to construct new modulating functions is proposed, discussed, and illustrated with the construction of the newly introduced logarithmic modulating function families and 3 non-analytic modulating function families. Moreover, the fact that total modulating functions form a vector space and an algebra is exploited to construct orthonormal modulating functions, which are then used for the parameter estimation of a boat's roll dynamics, effectively avoiding matrix inversion issues.
comment: 13 pages, 6 figures
Unifying Entropy Regularization in Optimal Control: From and Back to Classical Objectives via Iterated Soft Policies and Path Integral Solutions
This paper develops a unified perspective on several optimal control formulations through the lens of Kullback-Leibler (KL) regularization. We propose a central problem that separates the KL penalties on policies and transitions with independent weights, thus generalizing the standard trajectory-level KL-regularization used in probabilistic optimal control. This umbrella formulation recovers various control problems: the classical Stochastic Optimal Control (SOC), Risk-Sensitive Stochastic Optimal Control (RSOC), and their policy-based KL-regularized counterparts, termed soft-policy SOC and RSOC, which yield tractable surrogates. Beyond being regularized variants, these soft-policy formulations majorize the original SOC and RSOC, thus, iterating their solutions recovers the original objectives. We further identify a synchronized case of soft-policy RSOC where the policy and transition KL weights coincide, yielding a linear Bellman operator, path-integral solution, and compositionality -- extending these computationally favourable properties to a broad class of control problems.
comment: refurbished introduction, added a few remarks, reduced size
Simplification Ad Absurdum? Revisiting Gas Flow Modeling for Integrated Energy System Planning
This paper analyzes the implications of simplified pipeline gas flow models for integrated energy system planning. A case study of an integrated power-hydrogen expansion planning problem shows that simplifying pressure-flow relationships and gas dynamics can lead to expansion plans that incur substantial regret when evaluated under a more realistic dynamic gas flow model -- due to suboptimal system expansion, operation, and non-supplied hydrogen. Numerical experiments show that planning under the highly simplified transport and transport-linepack models -- commonly used in expansion studies -- can result in regret exceeding several thousand percent and yield expansion plans that lack robustness across demand levels. Planning under steady-state conditions partially mitigates these effects, but still leaves significant cost-reduction potential untapped compared to dynamic planning due to neglected linepack flexibility. Developing efficient solution algorithms for the dynamic model is a promising direction for future research.
Approaching Safety-Argumentation-by-Design: A Requirement-based Safety Argumentation Life Cycle for Automated Vehicles SC
Despite the growing number of automated vehicles on public roads, operating such systems in open contexts inevitably involves incidents. Developing a defensible case that the residual risk is reduced to a reasonable (societally acceptable) level is hence a prerequisite to be prepared for potential liability cases. A "safety argumentation" is a common means to represent this case. In this paper, we contribute to the state of the art in terms of process guidance on argumentation creation and maintenance - aiming to promote a safety-argumentation-by-design paradigm, which mandates co-developing both the system and argumentation from the earliest stages. Initially, we extend a systematic design model for automated driving functions with an argumentation layer to address prevailing misconceptions regarding the development of safety arguments in a process context. Identified limitations of this extension motivate our complementary design of a dedicated argumentation life cycle that serves as an additional process viewpoint. Correspondingly, we define literature- and expert-based process requirements. To illustrate the safety argumentation life cycle that we propose as a result of implementing these consolidated requirements, we demonstrate principles of the introduced process phases (baselining, evolution, continuous maintenance) by an argumentation example on an operational design domain exit response.
comment: Accepted to be published in 2026 IEEE 29th International Conference on Intelligent Transportation Systems (ITSC), Naples, Italy, September 15-18, 2026
A Hybrid Learning-to-Optimize Framework for Mixed-Integer Quadratic Programming
In this paper, we propose a learning-to-optimize (L2O) framework to accelerate solving parametric mixed-integer quadratic programming (MIQP) problems, with a particular focus on mixed-integer model predictive control (MI-MPC) applications. The framework learns to predict integer solutions with enhanced optimality and feasibility by integrating supervised learning (for optimality), self-supervised learning (for feasibility), and a differentiable quadratic programming (QP) layer, resulting in a hybrid L2O framework. Specifically, a neural network (NN) is used to learn the mapping from problem parameters to optimal integer solutions, while a differentiable QP layer is integrated to compute the corresponding continuous variables given the predicted integers and problem parameters. Moreover, a hybrid loss function is proposed, which combines a supervised loss with respect to the global optimal solution, and a self-supervised loss derived from the problem's objective and constraints. The effectiveness of the proposed framework is demonstrated on two benchmark MI-MPC problems, with comparative results against purely supervised and self-supervised learning models.
comment: final L4DC 2026
Accelerating Time-Optimal Trajectory Planning for Connected and Automated Vehicles with Graph Neural Networks
In this paper, we present a learning-based framework that accelerates time- and energy-optimal trajectory planning for connected and automated vehicles (CAVs) using graph neural networks (GNNs). We formulate the multi-agent coordination problem encountered in traffic scenarios as a cooperative trajectory planning problem that minimizes travel time, subject to motion primitives derived from energy-optimal solutions. The performance of this framework can be further improved through replanning at each time step, enabling the system to incorporate newly observed information. To achieve real-time execution, we employ a graph isomorphism network with edge features (GINEConv) to learn the solutions of the time-optimal trajectory planning problem from offline-generated data. The trained model produces online predictions that serve as warm-starts for numerical optimization, thereby enabling rapid computation of minimal exit times and the associated feasible trajectories. This learning-to-warm-start approach substantially reduces computation time while preserving the control performance of the time- and energy-optimal trajectory planning framework.
comment: final IFAC WC 2026
A Data-Driven Method for Microgrid System Identification: Physically Consistent Sparse Identification of Nonlinear Dynamics
Microgrids (MGs) play a crucial role in utilizing distributed energy resources (DERs) like solar and wind power, enhancing the sustainability and flexibility of modern power systems. However, the inherent variability in MG topology, power flow, and DER operating modes poses significant challenges to the accurate system identification of MGs, which is crucial for designing robust control strategies and ensuring MG stability. This paper proposes a Physically Consistent Sparse Identification of Nonlinear Dynamics (PC-SINDy) method for accurate MG system identification. By leveraging an analytically derived library of candidate functions, PC-SINDy extracts accurate dynamic models using only phasor measurement unit (PMU) data. Simulations on a 4-bus system demonstrate that PC-SINDy can reliably and accurately predict frequency trajectories under large disturbances, including scenarios not encountered during the identification/training phase, even when using noisy, low-sampled PMU data.
comment: 5 pages, 5 figures, 2025 IEEE Power & Energy Society General Meeting (PESGM), Accepted
Cascaded TD3-PID Hybrid Controller for Quadrotor Trajectory Tracking in Wind Disturbance Environments
This work presents a cascaded hybrid control framework for quadrotor trajectory tracking under nonlinear dynamics and external disturbances. In quadrotor systems, the altitude and attitude channels exhibit fast, structured dynamics that are well suited to reliable regulation, whereas horizontal-position control is more strongly affected by coupling effects, uncertainty, and disturbances, so that neither pure feedback control nor purely learning-based control alone is equally well suited to all channels. Accordingly, the proposed framework augments conventional proportional-integral-derivative (PID) stabilization for altitude and attitude control with an enhanced Twin Delayed Deep Deterministic Policy Gradient (TD3) agent incorporating a multi-Q-network structure, thereby improving horizontal-position control under severe disturbances. To further strengthen disturbance rejection in altitude and attitude control, a hybrid disturbance observer (HDOB) using low-pass and exponential moving average filtering is embedded in the control loops. The proposed TD3 enhancements are verified through ablation studies, and both numerical simulations and real-world flight tests on the quadrotor platform demonstrate that the proposed method achieves more accurate and robust trajectory tracking under wind disturbances than baseline approaches.
Radio-Coverage-Aware Path Planning for Cooperative Autonomous Vehicles
Fleets of autonomous vehicles (AVs) are often at the core of intelligent transportation scenarios for smart cities, and may require a wireless Internet connection to offload computer vision tasks to data centers located either in the edge or the cloud section of the network. Cooperation among AVs is successful when the environment is unknown, or changes dynamically, so as to improve coverage and trip time, and minimize the traveled distance. The AVs, while mapping the environment with range-based sensors, move across the wireless coverage areas, with consequences on the experienced access bit rate, latency, and handover rate. In this paper, we propose to modify the cost of common path planning algorithms such as Dijkstra and A*, so that the best path solution takes into account not only the traveled distance, but also the radio coverage experience. To this aim, several radio-related cost-weighting functions are introduced and tested, to assess the performance of the proposed approaches with extensive simulations. The proposed mapping algorithm can achieve a mapping error probability below $2\%$, while the proposed path-planning algorithms extend the radio coverage of the AVs, with only a limited increase in traveled distance with respect to shortest-path existing methods, such as conventional Dijkstra and A* algorithms.
comment: 11 pages, 19 figures
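The idea of folding radio coverage into the path cost can be sketched with a small Dijkstra variant. The weighting used here, distance multiplied by `1 + weight * (1 - coverage)`, is an assumed example of a radio-related cost-weighting function; the paper's actual functions are not given in the abstract, and the function name and data layout are hypothetical.

```python
import heapq

def coverage_aware_dijkstra(graph, coverage, start, goal, weight=1.0):
    """Shortest path under a coverage-weighted cost.

    graph:    {node: [(neighbor, distance), ...]}
    coverage: {node: value in [0, 1]}, where 1 means full radio coverage
    weight:   non-negative trade-off; 0 recovers plain distance-based Dijkstra
    """
    best = {start: 0.0}
    parent = {start: None}
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == goal:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, dist in graph.get(node, []):
            # Poorly covered nodes make the incoming edge more expensive.
            penalty = 1.0 + weight * (1.0 - coverage.get(nbr, 0.0))
            new_cost = cost + dist * penalty
            if new_cost < best.get(nbr, float("inf")):
                best[nbr] = new_cost
                parent[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    if goal not in parent:
        return None, float("inf")
    path = []
    node = goal
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1], best[goal]
```

With `weight=0` the search returns the ordinary shortest path; raising `weight` trades extra distance for better-covered routes, which mirrors the trade-off the abstract reports.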
Soft Switching Expert Policies for Controlling Systems with Uncertain Parameters
This paper proposes a simulation-based reinforcement learning algorithm for controlling systems with uncertain and varying system parameters. While simulators are useful for safely learning control policies, the reality gap remains a major challenge. To alleviate this challenge, we propose a two-stage algorithm. First, multiple control policies are learned for systems with different system parameters in a simulator. Second, for a real system, the control policies are adaptively switched using an online convex optimization algorithm based on observations. This approach is expected to reduce learning complexity compared with existing approaches that rely on a single policy to address the reality gap.
comment: 7 pages, 8 figures. Accepted to IFAC World Congress 2026
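The second stage described in this abstract, adaptively switching among pre-trained policies from observations, can be sketched with a multiplicative-weights (Hedge) update. The abstract does not name the online convex optimization algorithm used, so Hedge is an assumption here, as are the function names and the use of a per-expert loss such as a one-step prediction error.

```python
import math

def hedge_update(weights, losses, learning_rate):
    """One multiplicative-weights step: w_i is scaled by exp(-learning_rate * loss_i)."""
    scaled = [w * math.exp(-learning_rate * l) for w, l in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]

def blend_actions(weights, actions):
    """Soft switch: convex combination of the scalar actions proposed by each expert."""
    return sum(w * a for w, a in zip(weights, actions))
```

Experts whose simulated parameters explain recent observations poorly accumulate loss and lose weight, so the blended action drifts toward the policy trained on the closest system.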
Learning Adaptive Parameter Policies for Nonlinear Bayesian Filtering
For many nonlinear Bayesian state estimation problems, the posterior recursion is not analytically tractable, leading to algorithms that are influenced by numerical approximation errors. These algorithms depend on parameters that affect the approximation's accuracy and computational cost. The parameters include, for example, the number of particles, scaling parameters, and the number of iterations in iterative computations. Typically, these parameters are fixed or adjusted heuristically, although the approximation accuracy can change over time with the local degree of nonlinearity and uncertainty. The approximation errors introduced at a time step propagate through subsequent updates, affecting the accuracy, consistency, and robustness of future estimates. This paper presents adaptive parameter selection in nonlinear Bayesian filtering as a sequential decision-making problem, where parameters influence not only the immediate estimation outcome but also the future estimates. The decision-making problem is addressed using reinforcement learning to learn adaptive parameter policies for nonlinear Bayesian filters. Experiments with the unscented Kalman filter and stochastic integration filter demonstrate that the learned policies improve both estimate quality and consistency.
comment: Accepted for presentation at 29th International Conference on Information Fusion
Ageing Monitoring for Commercial Microcontrollers Based on Timing Windows
Microcontrollers are increasingly present in embedded deployments and dependable systems, for which malfunctions due to hardware ageing can have severe impact. The lack of deployable techniques for ageing monitoring on these devices has led to the widespread application of guard bands to prevent timing errors due to degradation. Applying this static technique can limit performance and lead to sudden failures as devices age. In this paper, we follow a software-based self-testing approach to design monitoring of hardware degradation for microcontrollers. Deployable in the field, our technique leverages timing windows of variable lengths to determine the maximum operational frequency of the devices. We empirically validate the method on real hardware and find that it consistently detects temperature-induced degradations in maximum operating frequency of up to 13.79 % across devices for a 60 °C temperature increase.
Lane-Aware Graph Attention Network for Multi-Vehicle Trajectory Prediction in Expressway Merge Zones
Accurate multi-vehicle trajectory prediction in expressway merge and diverge areas is fundamental to the decision-making frameworks of autonomous vehicle systems. However, the majority of existing graph-based prediction models are developed and validated on mainline freeway segments and do not address the geometrically distinct interaction structures that characterize merge zones. Furthermore, standard evaluation protocols rely exclusively on displacement error metrics, leaving the safety consequences of predicted trajectories unquantified. This paper proposes a Lane-Aware Graph Attention Network (LA-GAT) that encodes vehicle interaction within dynamic scene graphs, augmented with a trainable lane-relationship attention bias that prioritizes merge-conflict interactions from the outset of training. The model is pre-trained on the raw NGSIM US-101 and I-80 datasets and subsequently fine-tuned on UAV-captured UTE SQM-W-1 trajectory data from a Chinese expressway merge area, with final evaluation on the held-out SQM-W-2 dataset. Evaluation spans both displacement metrics (ADE, FDE at 1s, 3s, 5s horizons) and surrogate safety measures (TTC violation rate, DRAC exceedance rate, collision rate). Fine-tuned results on SQM-W-2 yield ADE of 0.865 m at 1s and 2.518 m at 3s, demonstrating that drone-informed fine-tuning substantially reduces the cross-dataset transfer gap. The deliberate use of unfiltered NGSIM data is shown to characterize raw-condition generalization limits, with the performance degradation attributed to the well-documented measurement errors in that dataset.
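The displacement metrics named in this abstract have standard definitions: ADE is the mean Euclidean error over all predicted steps and FDE is the error at the final step. The sketch below computes both for a single 2D trajectory; the function name and the list-of-tuples layout are assumptions for illustration.

```python
import math

def displacement_errors(predicted, ground_truth):
    """Return (ADE, FDE) for two equal-length lists of (x, y) positions in metres."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    dists = [
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
    ]
    return sum(dists) / len(dists), dists[-1]
```

Evaluating at a 1 s, 3 s, or 5 s horizon amounts to truncating both trajectories to the corresponding number of steps before calling the function.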
Safe Multi-Agent Navigation via Constrained HJB-Informed Learning
Multi-agent navigation in unknown and cluttered environments has broad applications, yet remains fundamentally challenging. In particular, dense agent-agent and agent-obstacle reactive interactions can exacerbate the inherent competition between collision-avoidance constraints and goal-reaching objectives. Most existing approaches mitigate this by applying per-step safety filtering on top of a predefined goal-reaching controller or by designing heuristic loss functions that penalize the gradient of safety-constraint violations. While effective in sparse environments, these methods still suffer from overly conservative behaviors when interactions become dense. To overcome these limitations, we propose HJB-GNN, a Hamilton-Jacobi-Bellman (HJB)-based learning framework that jointly learns a graph neural network (GNN)-parameterized control barrier function for explicit safety enforcement, a distributed GNN-based navigation policy, and a value function that induces goal-reaching behavior. By exploiting the analytical solution of the constrained HJB equation, the proposed method derives graph-dependent Lagrange multipliers that adaptively balance collision-avoidance and goal-reaching across diverse multi-agent navigation scenarios. Moreover, HJB-GNN supports centralized training with distributed deployment. Extensive simulations and real-world experiments with Crazyflie drone swarms demonstrate its superior safety and goal-reaching performance, as well as strong scalability and generalizability to large-scale teams operating in previously unseen, dense environments.
comment: Accepted by Robotics: Science and Systems (RSS 2026)
From Noise to Knowledge: System Identification with Systematic Polytope Construction via Cyclic Reformulation
Model-based robust control requires not only accurate nominal models but also systematic uncertainty representations to guarantee stability and performance. However, constructing polytopic uncertainty models typically demands multiple experiments or a priori structural assumptions. This paper proposes an identification framework based on intentional periodicity induction, in which cyclic reformulation with period $N$ is applied to a linear time-invariant system to interpret noise-induced parameter fluctuations as a structured manifestation of estimation uncertainty. The $N$ parameter sets obtained from a single identification experiment -- which would coincide in the noise-free case -- are used as polytope vertices, providing systematic control over the granularity of the uncertainty description through the choice of $N$. The practical utility of the constructed polytope is demonstrated through robust $H_\infty$ state-feedback synthesis via LMI optimization at the polytope vertices; the synthesis uses only noisy identification data and is shown across Monte Carlo trials to stabilize the true plant with only marginal conservatism. Complementarily, a diagnostic assessment based on the best in-polytope point confirms that the polytope captures meaningful uncertainty information. For a third-order system under Gaussian and uniform noise, a comparison with bootstrap-inspired resampling baselines indicates that cyclic reformulation provides a competitive or favorable trade-off by utilizing the full data record; the construction is further validated on a fourth-order MIMO system.
Data-Driven Koopman-Enhanced Extremum Seeking for Oscillation Damping in Nonlinear Systems
We propose a novel extremum seeking control (ESC) method that operates in a lifted Koopman state space to minimize the filtered RMS energy in the dominant subspace. The lifted representation provides linear embeddings of nonlinear dynamics, enabling more accurate gradient estimation and dampening of state interference for more consistent ESC performance. Applied to a parameterized, forced, and time-varying Van der Pol oscillator, we show that the approach yields faster and more robust performance than operating ESC on the measured states. These advantages position the method for a diverse range of applications including vibration suppression, motion control, and subsynchronous oscillation mitigation in inverter-dominated power systems.
Bellman Value Decomposition for Task Logic in Safe Optimal Control
Real-world tasks involve nuanced combinations of goal and safety specifications. In high dimensions, the challenge is exacerbated: formal automata become cumbersome, and the combination of sparse rewards tends to require laborious tuning. In this work, we consider the innate structure of the Bellman Value as a means to naturally organize the problem for improved automatic performance. Namely, we prove the Bellman Value for a complex task defined in temporal logic can be decomposed into a graph of Bellman Values, connected by a set of well-known Bellman equations (BEs): the Reach-Avoid BE, the Avoid BE, and a novel type, the Reach-Avoid-Loop BE. To solve the Value and optimal policy, we propose VDPPO, which embeds the decomposed Value graph into a two-layer neural net, bootstrapping the implicit dependencies. We conduct a variety of simulated and hardware experiments to test our method on complex, high-dimensional tasks involving heterogeneous teams and nonlinear dynamics. Ultimately, we find this approach greatly improves performance over existing baselines, balancing safety and liveness automatically.
AI-Driven Optimization under Uncertainty for Mineral Processing Operations
The global capacity for mineral processing must expand rapidly to meet the demand for critical minerals, which are essential for building the clean energy technologies necessary to mitigate climate change. However, the efficiency of mineral processing is severely limited by uncertainty, which arises from both the variability of feedstock and the complexity of process dynamics. To optimize mineral processing circuits under uncertainty, we introduce an AI-driven approach that formulates mineral processing as a Partially Observable Markov Decision Process (POMDP). We demonstrate the capabilities of this approach in handling both feedstock uncertainty and process model uncertainty to optimize the operation of a simulated, simplified flotation cell as an example. We show that by integrating the process of information gathering (i.e., uncertainty reduction) and process optimization, this approach has the potential to consistently perform better than traditional approaches at maximizing an overall objective, such as net present value (NPV). Our methodological demonstration of this optimization-under-uncertainty approach for a synthetic case provides a mathematical and computational framework for later real-world application, with the potential to improve both the laboratory-scale design of experiments and industrial-scale operation of mineral processing circuits without any additional hardware.
comment: 13 pages, 15 figures, published in Sustainable Earth Resources Communications (SERC)
Constrained Variational Inference via Safe Particle Flow
We propose a control barrier function (CBF) formulation for enforcing equality and inequality constraints in variational inference. The key idea is to define a barrier functional on the space of probability density functions that encode the desired constraints imposed on the variational density. By leveraging the Liouville equation, we establish a connection between the time derivative of the variational density and the particle drift, which enables the systematic construction of corresponding CBFs associated to the particle drift. Enforcing these CBFs gives rise to the safe particle flow and ensures that the variational density satisfies the original constraints imposed by the barrier functional. This formulation provides a principled and computationally tractable solution to constrained variational inference, with theoretical guarantees of constraint satisfaction. The effectiveness of the method is demonstrated through numerical simulations.
A Quantitative Framework for Navigating Controller Design Tradeoffs under Computational Constraints
Computational constraints permeate the controller design process, and yet are rarely treated as explicit design constraints. Towards addressing this gap, we propose a quantitative framework that captures the effects of common design approximations, such as model order reduction, temporal discretization, horizon truncation, and solver accuracy, on both controller performance and computational requirements. Our framework highlights that these approximations are tunable parameters within an overall controller design process. By leveraging incremental input-to-state stability, we show that bounding the aggregate effects of these approximations reduces to verifying a design-dependent sector bound on the difference between the deployed policy and an idealized baseline, with stability enforced via a small-gain condition. We operationalize these insights via a Design Meta-Problem in which the performance gap is minimized subject to stability, real-time compute, and timing constraints. Finally, we instantiate the framework on a receding horizon LQR case study, and demonstrate a principled near-optimal navigation of tradeoffs among sampling rate, model order, horizon length, and solver iterations.
comment: 34 pages, 16 figures. Extended version
Robotics
SafeManip: A Property-Driven Benchmark for Temporal Safety Evaluation in Robotic Manipulation
Robotic manipulation is typically evaluated by task success, but successful completion does not guarantee safe execution. Many safety failures are temporal: a robot may touch a clean surface after contamination or release an object before it is fully inside an enclosure. We introduce SafeManip, a property-driven benchmark to explicitly evaluate temporal safety properties in robotic manipulation, moving beyond prior evaluations that largely focus on task completion or per-state constraint violations. SafeManip defines reusable safety templates over finite executions using Linear Temporal Logic over finite traces (LTLf). It maps observed rollouts to symbolic predicate traces and evaluates them with LTLf-based monitors. Its property suite covers eight manipulation safety categories: collision and contact safety, grasp stability, release stability, cross-contamination, action onset, mechanism recovery, object containment, and enclosure access. Templates can be instantiated with task-specific objects, fixtures, regions, or skills, allowing the same safety specifications to generalize across tasks and environments. We evaluate SafeManip on six vision-language-action policies, including $π_0$, $π_{0.5}$, GR00T, and their training variants, across 50 RoboCasa365 household tasks. Results show that even strong models often behave unsafely. Task-success gains do not reliably translate into safer execution: many successful rollouts remain unsafe, while longer-horizon or more complex tasks expose more violations. SafeManip provides a reusable evaluation layer for diagnosing temporal safety failures and measuring safe success beyond task completion.
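The two temporal failures used as examples in this abstract can be checked over a finite predicate trace with very small monitors. The sketch below hand-codes two LTLf patterns, G(a -> b) and G(trigger -> G(not forbidden)); the predicate names (`release`, `inside`, `contaminated`, `touch_clean`) and the trace layout are hypothetical and are not SafeManip's actual templates or API.

```python
def always_implies(trace, antecedent, consequent):
    """LTLf pattern G(antecedent -> consequent) over a finite trace of predicate dicts."""
    return all((not step[antecedent]) or step[consequent] for step in trace)

def never_after(trace, trigger, forbidden):
    """LTLf pattern G(trigger -> G(not forbidden)): once triggered, forbidden never holds."""
    triggered = False
    for step in trace:
        triggered = triggered or step[trigger]
        if triggered and step[forbidden]:
            return False
    return True

# Hypothetical rollout: the gripper touches a clean surface after contamination.
rollout = [
    {"contaminated": False, "touch_clean": True, "release": False, "inside": False},
    {"contaminated": True, "touch_clean": False, "release": False, "inside": False},
    {"contaminated": False, "touch_clean": True, "release": True, "inside": True},
]
cross_contamination_safe = never_after(rollout, "contaminated", "touch_clean")
release_safe = always_implies(rollout, "release", "inside")
```

A rollout like this one could count as a task success while still violating the cross-contamination property, which is the gap between task success and safe success that the abstract highlights.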
GuidedVLA: Specifying Task-Relevant Factors via Plug-and-Play Action Attention Specialization
Vision-Language-Action (VLA) models aim for general robot learning by aligning action as a modality within powerful Vision-Language Models (VLMs). Existing VLAs rely on end-to-end supervision to implicitly enable the action decoding process to learn task-relevant features. However, without explicit guidance, these models often overfit to spurious correlations, such as visual shortcuts or environmental noise, limiting their generalization. In this paper, we introduce GuidedVLA, a framework designed to manually guide the action generation to focus on task-relevant factors. Our core insight is to treat the action decoder not as a monolithic learner, but as an assembly of functional components. Individual attention heads are supervised by manually defined auxiliary signals to capture distinct factors. As an initial study, we instantiate this paradigm with three specialized heads: object grounding, spatial geometry, and temporal skill logic. Across simulation and real-robot experiments, GuidedVLA improves success rates in both in-domain and out-of-domain settings compared to strong VLA baselines. Finally, we show that the quality of these specialized factors correlates positively with task performance and that our mechanism yields decoupled, high-quality features. Our results suggest that explicitly guiding action-decoder learning is a promising direction for building more robust and general VLA models.
comment: Accepted to RSS 2026. Project page: https://guidedvla.github.io/project_page/
Real-Time Whole-Body Teleoperation of a Humanoid Robot Using IMU-Based Motion Capture with Sim2Sim and Sim2Real Validation
Stable, low-latency whole-body teleoperation of humanoid robots is an open research challenge, complicated by kinematic mismatches between human and robot morphologies, accumulated inertial sensor noise, non-trivial control latency, and persistent sim-to-real transfer gaps. This paper presents a complete real-time whole-body teleoperation system that maps human motion, recorded with a Virdyn IMU-based full-body motion capture suit, directly onto a Unitree G1 humanoid robot. We introduce a custom motion-processing, kinematic retargeting, and control pipeline engineered for continuous, low-latency operation without any offline buffering or learning-based components. The system is first validated in simulation using the MuJoCo physics model of the Unitree G1 (sim2sim), and then deployed without modification on the physical platform (sim2real). Experimental results demonstrate stable, synchronized reproduction of a broad motion repertoire, including walking, standing, sitting, turning, bowing, and coordinated expressive full-body gestures. This work establishes a practical, scalable framework for whole-body humanoid teleoperation using commodity wearable motion capture hardware.
comment: 8 pages, 4 figures
EgoEV-HandPose: Egocentric 3D Hand Pose Estimation and Gesture Recognition with Stereo Event Cameras
Egocentric 3D hand pose estimation and gesture recognition are essential for immersive augmented/virtual reality, human-computer interaction, and robotics. However, conventional frame-based cameras suffer from motion blur and limited dynamic range, while existing event-based methods are hindered by ego-motion interference, monocular depth ambiguity, and the lack of large-scale real-world stereo datasets. To overcome these limitations, we propose EgoEV-HandPose, an end-to-end framework for joint 3D bimanual pose estimation and gesture recognition from stereo event streams. Central to our approach is KeypointBEV, a flexible stereo fusion module that lifts features into a canonical bird's-eye-view space and employs an iterative reprojection-guided refinement loop to progressively resolve depth uncertainty and enforce kinematic consistency. In addition, we introduce EgoEVHands, the first large-scale real-world stereo event-camera dataset for egocentric hand perception, containing 5,419 annotated sequences with dense 3D/2D keypoints across 38 gesture classes under varying illumination. Extensive experiments demonstrate that EgoEV-HandPose achieves state-of-the-art performance with an MPJPE of 30.54 mm and 86.87% Top-1 gesture recognition accuracy, significantly outperforming RGB-based stereo and prior event-camera methods, particularly in low-light and bimanual occlusion scenarios, thereby setting a new benchmark for event-based egocentric perception. The established dataset and source code will be publicly released at https://github.com/ZJUWang01/EgoEV-HandPose.
comment: Extended version of SMC 2025 paper arXiv:2503.12419. The established dataset and source code will be publicly released at https://github.com/ZJUWang01/EgoEV-HandPose
SI-Diff: A Framework for Learning Search and High-Precision Insertion with a Force-Domain Diffusion Policy
Contact-rich assembly is fundamental in robotics but poses significant challenges due to uncertainties in relative poses, such as misalignments and small clearances in peg-in-hole tasks. Existing approaches typically address search and high-precision insertion separately, because these tasks involve distinct action patterns. However, supporting both tasks within a single model, without switching models or weights, is desirable for intelligent assembly systems. In this work, we propose SI-Diff, a framework that learns both search and high-precision insertion through a force-domain diffusion policy. To this end, we introduce a new mode-conditioning mechanism that enables the policy to capture distinct action behaviors under a single framework. Moreover, we develop a new search teacher policy that can generate diverse trajectories. By training on successful and efficient demonstrations provided by the teacher policy, the model learns the mapping from tactile and end-effector velocity observations to effective action behaviors. We conduct thorough experiments to show that SI-Diff extends the tolerance to x-y misalignments from 2 mm to 5 mm compared to the state-of-the-art baseline, TacDiffusion, while also demonstrating strong zero-shot transferability to unseen shapes.
comment: 9 pages, 8 figures
TMRL: Diffusion Timestep-Modulated Pretraining Enables Exploration for Efficient Policy Finetuning
Fine-tuning pre-trained robot policies with reinforcement learning (RL) often inherits the bottlenecks introduced by pre-training with behavioral cloning (BC), which produces narrow action distributions that lack the coverage necessary for downstream exploration. We present a unified framework that provides the exploration needed for efficient robot policy fine-tuning by bridging BC pre-training and RL fine-tuning. Our pre-training method, Context-Smoothed Pre-training (CSP), injects forward-diffusion noise into policy inputs, creating a continuum between precise imitation and broad action coverage. We then fine-tune pre-trained policies via Timestep-Modulated Reinforcement Learning (TMRL), which trains the agent to dynamically adjust this conditioning during fine-tuning by modulating the diffusion timestep, granting explicit control over exploration. Integrating seamlessly with arbitrary policy inputs, e.g., states, 3D point clouds, or image-based VLA policies, we show that TMRL improves RL fine-tuning sample efficiency. Notably, TMRL enables successful real-world fine-tuning on complex manipulation tasks in under one hour. Videos and code available at https://weirdlabuw.github.io/tmrl/.
Morphologically Equivariant Flow Matching for Bimanual Mobile Manipulation
Mobile manipulation requires coordinated control of high-dimensional, bimanual robots. Imitation learning methods have been broadly used to solve these robotic tasks, yet typically ignore the bilateral morphological symmetry inherent in such systems. We argue that morphological symmetry is an underexplored but crucial inductive bias for learning in bimanual mobile manipulation: knowing how to solve a task in one configuration directly determines how to solve its mirrored counterpart. In this paper, we formalize this symmetry prior and show that it constrains optimal bimanual policies to be ambidextrous and equivariant under reflections across the robot's sagittal plane. We introduce a $\mathbb{C}_2$-equivariant flow matching policy that enforces reflective symmetry either via a regularized training loss or an equivariant velocity network. Across planar and 6-DoF mobile manipulation tasks, symmetry-informed policies consistently improve sample efficiency and achieve zero-shot generalization to mirrored configurations absent from the training distribution. We further validate this zero-shot generalization capability on a real-world manipulation task with a TIAGo++ robot. Together, our findings establish morphological symmetry as an effective, generalizable, and scalable inductive bias for ambidextrous generative policy learning.
comment: Preprint. 4 pages, 5 figures
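The reflective-symmetry constraint in the abstract above can be made concrete with a small sketch. It assumes a toy state and action layout `[left_x, left_y, right_x, right_y]` in which `y` is the lateral axis, so that reflecting across the sagittal plane swaps the two arms and negates lateral components. The layout, the `reflect` helper, and the toy policy are illustrative assumptions, not the paper's implementation. The penalty corresponds in spirit to a regularized training loss, and `symmetrize` to enforcing equivariance by construction.

```python
from typing import Callable, List

Vec = List[float]  # assumed layout: [left_x, left_y, right_x, right_y]
Policy = Callable[[Vec], Vec]


def reflect(v: Vec) -> Vec:
    """Sagittal reflection g: swap the arms and negate lateral (y) components."""
    lx, ly, rx, ry = v
    return [rx, -ry, lx, -ly]


def equivariance_penalty(policy: Policy, state: Vec) -> float:
    """Squared error between policy(g s) and g policy(s); zero if equivariant at s."""
    mirrored_output = reflect(policy(state))
    output_on_mirror = policy(reflect(state))
    return sum((a - b) ** 2 for a, b in zip(mirrored_output, output_on_mirror))


def symmetrize(policy: Policy) -> Policy:
    """Average over the two-element group {identity, g}; g is its own inverse."""
    def symmetric_policy(state: Vec) -> Vec:
        direct = policy(state)
        via_mirror = reflect(policy(reflect(state)))
        return [0.5 * (d + m) for d, m in zip(direct, via_mirror)]
    return symmetric_policy


def left_biased(state: Vec) -> Vec:
    """Toy policy that only ever pushes the left arm forward (not equivariant)."""
    return [state[0] + 1.0, state[1], state[2], state[3]]


state = [0.2, 0.5, -0.1, -0.3]
print(equivariance_penalty(left_biased, state))
print(equivariance_penalty(symmetrize(left_biased), state))
```

In a flow matching setting the map being constrained would be the learned velocity network rather than a deterministic policy, but the group-averaging argument is the same.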
TriBand-BEV: Real-Time LiDAR-Only 3D Pedestrian Detection via Height-Aware BEV and High-Resolution Feature Fusion AAMAS 2026
Safe autonomous agents and mobile robots need fast, real-time 3D perception, especially for vulnerable road users (VRUs) such as pedestrians. We introduce a new bird's-eye-view (BEV) encoding, which maps the full 3D LiDAR point cloud into a lightweight 2D BEV tensor with three height bands. We explicitly reformulate 3D detection as a 2D detection problem and then reconstruct 3D boxes from the BEV outputs. A single network detects cars, pedestrians, and cyclists in one pass. The backbone uses area attention at deep stages, a hierarchical bidirectional neck over P1 to P4 fuses context and detail, and the head predicts oriented boxes with distribution focal learning for side offsets and a rotated IoU loss. Training applies a small vertical re-binning and a mild reflectance jitter in channel space to resist memorization. We use an interquartile range (IQR) filter to remove noisy and outlier LiDAR points during 3D reconstruction. On the KITTI dataset, TriBand-BEV attains 58.7/52.6/47.2 pedestrian BEV AP(%) for easy, moderate, and hard at 49 FPS on a single consumer GPU, surpassing Complex-YOLO, with gains of +12.6%, +7.5%, and +3.1%. Qualitative scenes show stable detection under occlusion. The pipeline is compact and ready for real-time robotic deployment. Our source code is publicly available on GitHub.
comment: Accepted for publication in the Proceedings of the 2026 International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)
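To make the interquartile range (IQR) filtering step mentioned in the abstract above more concrete, here is a minimal sketch of a generic IQR outlier filter applied to the height coordinate of points. The abstract does not state which quantity is filtered or which multiplier is used, so filtering on `z` and the conventional factor of 1.5 are assumptions, as are the function and variable names.

```python
import statistics
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres


def iqr_filter(points: List[Point], axis: int = 2, k: float = 1.5) -> List[Point]:
    """Keep points whose coordinate on `axis` lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    if len(points) < 2:
        return list(points)  # too few samples to estimate quartiles
    values = [p[axis] for p in points]
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [p for p in points if lower <= p[axis] <= upper]


# Hypothetical points inside one detected box: five plausible heights, one stray return.
box_points = [
    (0.0, 0.0, 1.0),
    (0.1, 0.0, 1.1),
    (0.0, 0.1, 1.2),
    (0.1, 0.1, 1.3),
    (0.2, 0.1, 1.4),
    (0.2, 0.2, 9.0),
]
filtered = iqr_filter(box_points)
print(len(box_points), "->", len(filtered))
```

Filtering before estimating the vertical extent of a box keeps a single stray return from inflating the reconstructed height.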
DexTwist: Dexterous Hand Retargeting for Twist Motion via Mixed Reality-based Teleoperation
Dexterous teleoperation via Mixed Reality (MR)-based interfaces offers a scalable paradigm for transferring human manipulation skills to dexterous robot hands. However, conventional retargeting approaches that minimize kinematic dissimilarity (e.g., joint angle or fingertip position error) often fail in contact-rich rotational manipulation, such as cap opening, key turning, and bolt screwing. This failure stems from the embodiment gap: mismatched link lengths, joint axes/limits, and fingertip geometry can cause direct pose imitation to induce tangential fingertip sliding rather than stable object rotation, resulting in screw axis drift, contact slip, and grasp instability. To address this, we propose DexTwist, a functional twist-retargeting framework for MR-based dexterous teleoperation. DexTwist detects a tripod pinch, estimates the operator's intended screw axis and twist magnitude, and applies a real-time residual joint-space refinement that tracks turning progress while regularizing the robot tripod geometry. The refinement minimizes a virtual-object objective defined by turning angle, screw axis consistency, fingertip closure, and tripod stability. Simulation and real-world experiments show that DexTwist improves turning angle tracking and screw axis stability compared with a vector-based retargeting baseline.
comment: 6 pages, 5 figures, 2 tables. Dongmyoung Lee and Chengxi Li contributed equally to this research
From Imagined Futures to Executable Actions: Mixture of Latent Actions for Robot Manipulation ICML 2026
Video generation models offer a promising imagination mechanism for robot manipulation by predicting long-horizon future observations, but effectively exploiting these imagined futures for action execution remains challenging. Existing approaches either condition policies on predicted frames or directly decode generated videos into actions, both suffering from a mismatch between visual realism and control relevance. As a result, predicted observations emphasize perceptual fidelity rather than action-centric causes of state transitions, leading to indirect and unstable control. To address this gap, we propose MoLA (Mixture of Latent Actions), a control-oriented interface that transforms imagined future videos into executable representations. Instead of passing predicted frames directly to the policy, MoLA leverages a mixture of pretrained inverse dynamics models to infer a mixture of latent actions implied by generated visual transitions. These modality-aware inverse dynamics models capture complementary semantic, depth, and flow cues, providing a structured and physically grounded action representation that bridges video imagination and policy execution. We evaluate our approach on simulated benchmarks (LIBERO, CALVIN, and LIBERO-Plus) and real-world robot manipulation tasks, achieving consistent gains in task success, temporal consistency, and generalization.
comment: ICML 2026
X-Imitator: Spatial-Aware Imitation Learning via Bidirectional Action-Pose Interaction
Effectively handling the interplay between spatial perception and action generation remains a critical bottleneck in robotic manipulation. Existing methods typically treat spatial perception and action execution as decoupled or strictly unidirectional processes, fundamentally restricting a robot's ability to master complex manipulation tasks. To address this, we propose X-Imitator, a versatile dual-path framework that models spatial perception and action execution as a tightly coupled bidirectional loop. By reciprocally conditioning current pose predictions on past actions and vice versa, this framework enables continuous mutual refinement between spatial reasoning and action generation. This joint modeling exactly mimics human internal forward models. Designed as a modular architecture, the system can be seamlessly integrated into various visuomotor policies. Extensive experiments across 24 simulated and 3 real-world tasks demonstrate that our framework significantly outperforms both vanilla policies and prior methods utilizing explicit pose guidance. The code will be open sourced.
Premover: Fast Vision-Language-Action Control by Acting Before Instructions Are Complete
Vision-Language-Action (VLA) policies are typically evaluated as if the user had finished typing or speaking before the robot begins acting. In real deployment, however, users take several seconds to enter a request, leaving the policy idle for a substantial fraction of the interaction. We introduce Premover, a lightweight module that converts this idle window into useful precomputation. Premover keeps the VLA backbone frozen and attaches two small projection heads, one for image patches, one for language tokens, that map an intermediate layer of the backbone into a shared space. The resulting focus map is supervised by simulator-rendered target-object segmentation masks and applied as a per-patch reweighting of the next step's image tokens. A single scalar readiness threshold, trained jointly from streaming prefixes, decides when the policy should begin acting. On the LIBERO benchmark suite, Premover reduces mean wall-clock time from 34.0 to 29.4 seconds, a 13.6% reduction, while matching the full-prompt baseline's success rate (95.1% vs. 95.0%); naive premoving, by contrast, collapses to 66.4%.
World Action Models: The Next Frontier in Embodied AI
Vision-Language-Action (VLA) models have achieved strong semantic generalization for embodied policy learning, yet they learn reactive observation-to-action mappings without explicitly modeling how the physical world evolves under intervention. A growing body of work addresses this limitation by integrating world models, predictive models of environment dynamics, into the action generation pipeline. We term this emerging paradigm World Action Models (WAMs): embodied foundation models that unify predictive state modeling with action generation, targeting a joint distribution over future states and actions rather than actions alone. However, the literature remains fragmented across architectures, learning objectives, and application scenarios, lacking a unified conceptual framework. We formally define WAMs and disambiguate them from related concepts, and trace the foundations and early integration of VLA and world model research that gave rise to this paradigm. We organize existing methods into a structured taxonomy of Cascaded and Joint WAMs, with further subdivision by generation modality, conditioning mechanism, and action decoding strategy. We systematically analyze the data ecosystem fueling WAMs development, spanning robot teleoperation, portable human demonstrations, simulation, and internet-scale egocentric video, and synthesize emerging evaluation protocols organized around visual fidelity, physical commonsense, and action plausibility. Overall, this survey provides the first systematic account of the WAMs landscape, clarifies key architectural paradigms and their trade-offs, and identifies open challenges and future opportunities for this rapidly evolving field.
Learning What Matters: Adaptive Information-Theoretic Objectives for Robot Exploration
Designing learnable information-theoretic objectives for robot exploration remains challenging. Such objectives aim to guide exploration toward data that reduces uncertainty in model parameters, yet it is often unclear what information the collected data can actually reveal. Although reinforcement learning (RL) can optimize a given objective, constructing objectives that reflect parametric learnability is difficult in high-dimensional robotic systems. Many parameter directions are weakly observable or unidentifiable, and even when identifiable directions are selected, omitted directions can still influence exploration and distort information measures. To address this challenge, we propose Quasi-Optimal Experimental Design (QOED), an adaptive information objective grounded in optimal experimental design. QOED (i) performs eigenspace analysis of the Fisher information matrix to identify an observable subspace and select identifiable parameter directions, and (ii) modifies the exploration objective to emphasize these directions while suppressing nuisance effects from non-critical parameters. Under bounded nuisance influence and limited coupling between critical and nuisance directions, QOED provides a constant-factor approximation to the ideal information objective that explores all parameters. We evaluate QOED on simulated and real-world navigation and manipulation tasks, where identifiable-direction selection and nuisance suppression yield performance improvements of 35.23% and 21.98%, respectively. When integrated as an exploration objective in model-based policy optimization, QOED further improves policy performance over established RL baselines.
Control of Fully Actuated Aerial Vehicles: A Comparison of Model-based and Sensor-based Dynamic Inversion
Fully actuated multirotor platforms decouple translational force generation from vehicle attitude, enabling independent control of position and orientation and shifting performance limitations from attitude authority to actuator dynamics and control effectiveness. This paper compares a model-based nonlinear dynamic inversion controller (geometric NDI) with a sensor-based incremental dynamic inversion controller (INDI) on a fixed-tilt fully actuated hexarotor. Both controllers share an identical outer-loop structure and are both executed at 500 Hz; therefore, performance differences can be attributed primarily to the inversion strategy. Controller performance is evaluated in five experiments covering attitude step tracking under nominal conditions and under a 50% mismatch in the rotor force coefficient, hover disturbance rejection under an external lateral load, waypoint tracking in the presence of wind gust disturbances, reduced control frequency, and injected sensor degradation. The results show that INDI offers clear advantages under parameter mismatch, gust disturbances, and sensor degradation, and maintains lower position errors across the controller-frequency sweep. However, its advantages are not universal: geometric NDI yields better attitude tracking at reduced control frequencies. To the authors' best knowledge, this work presents the first experimental validation of a full pose tracking INDI controller with decoupled translational and rotational dynamics. These findings highlight the trade-off between measurement-based and model-based inversion for robust control and rapid deployment of fully actuated UAVs.
RoboBlockly Studio: Conversational Block Programming with Embodied Robot Feedback for Computational Thinking
Computational thinking (CT) is increasingly promoted as a core literacy, yet learners and teachers face challenges in connecting abstract program logic to meaningful outcomes. We design and evaluate RoboBlockly Studio, an integrated interactive system that combines block-based programming, a conversational AI teaching agent, and embodied robot execution. RoboBlockly Studio creates a tight iterative loop of authoring, running, observing, and revising. Informed by interviews with five programming teachers, the system was designed to support four goals: (1) preserving learner agency in computational thinking, (2) making program behavior transparent and interpretable, (3) grounding programming in embodied, classroom-aligned tasks, and (4) scaffolding reflection through pedagogically grounded AI dialogue. We deployed RoboBlockly Studio with 32 high school students, observing how robot and AI feedback influenced students' interactions with code, reflections on problem-solving strategies, and understanding of CT concepts. We discuss design insights and implications for creating interactive, embodied learning environments that integrate AI and robotics to support CT learning in computing education.
comment: Accepted to ACM DIS 2026. Camera-ready version
Closing the Motion Execution Gap: From Semantic Motion Task Constraints to Kinematic Control IJCAI 2026
This paper addresses the Motion Execution Gap, the disconnect between high-level symbolic task descriptions using semantic constraints and executable robot motions. Motion Statecharts are introduced as an executable symbolic representation for complex motions. They allow the arbitrary arrangement of motion constraints, monitors, or nested statecharts in parallel and sequence. World-centric motion specification and generalization across embodiments are enabled through the use of a unified differentiable kinematic world model of both robots and environments. Motion execution is realized through an lMPC-based implementation of the task-function approach, in which smooth transitions during task switches are ensured using jerk bounds. Cross-platform transferability was demonstrated by deploying the method on eight robot platforms operating in diverse environments. The proposed framework is called Giskard and is available open source: https://github.com/cram2/cognitive_robot_abstract_machine.
comment: 9 pages, 8 figures, to be published in IJCAI 2026
Cooperative Robotics Reinforced by Collective Perception for Traffic Moderation
Collisions at non-line-of-sight (NLOS) intersections remain a major safety concern because drivers have limited visibility of approaching traffic. V2X-based warnings can reduce these risks, yet many vehicles are not equipped with V2X and drivers may ignore in-vehicle alerts. Collective perception (CP) can compensate for low V2X penetration by extending the awareness of connected vehicles, but it cannot influence unconnected vehicles. To fill this gap, our work introduces a complementary concept that adds a cooperative humanoid robot as an active traffic moderator capable of physically stopping a vehicle that attempts to merge into an unseen traffic stream. The system operates on two parallel perception pathways. A dual-camera infrastructure unit detects the position, speed, and motion of approaching vehicles and transmits this information to the robot as a collective perception message (CPM). The robot also receives cooperative awareness messages (CAM) from connected vehicles through its onboard V2X unit and can act as a relay for decentralized environmental notification messages (DENM) when safety events originate elsewhere along the road. A fusion module combines these streams to maintain a robust real-time view of the main road. A Zone of Danger (ZoD) is defined and used to predict whether an approaching vehicle creates a collision risk for a merging road user. When such a risk is detected, the robot issues a human-like STOP gesture and blocks the merging path until the hazard disappears. The full system was deployed at the Future Mobility Park (FMP) in Rotterdam. Experiments show that the combined vision and V2X perception allows the robot to detect approaching vehicles early, predict hazards reliably, and prevent unsafe merges in real-world NLOS conditions.
comment: Accepted for publication in the Proceedings of the 2026 IEEE Vehicular Technology Conference (VTC2026-Spring)
From Reaction to Anticipation: Proactive Failure Recovery through Agentic Task Graph for Robotic Manipulation
Although robotic manipulation has made significant progress, reliable execution remains challenging because task failures are inevitable in dynamic and unstructured environments. To handle such failures, existing frameworks typically follow a stepwise detect-reason-recover pipeline, which often incurs high latency and limited robustness due to delayed reasoning and reactive planning. Inspired by the human capability to anticipate and proactively plan for potential failures, we introduce AgentChord, an agentic system that models a manipulation task as a directed task graph. Before execution, this graph is enriched with anticipatory recovery branches that specify context-aware corrective behaviors, enabling immediate and targeted responses when failures occur. Specifically, AgentChord operates through a choreography of specialized agents: a composer that structures the nominal task graph, an arranger that augments the graph with anticipatory recovery branches, and a conductor that compiles and coordinates executable transitions using low-latency monitors to detect deviations and trigger pre-compiled recoveries without re-planning. Empirical studies on diverse long-horizon bimanual manipulation tasks demonstrate that AgentChord substantially improves success rates and execution efficiency, advancing the reliability and autonomy of real-world robotic systems. The project page is available at: https://shengxu.net/AgentChord/.
comment: 18 pages, accepted to RSS 2026
EvoNav: Evolutionary Reward Function Design for Robot Navigation with Large Language Models
Robot navigation is a crucial task with applications to social robots in dynamic human environments. While Reinforcement Learning (RL) has shown great promise for this problem, the policy quality is highly sensitive to the specification of reward functions. Hand-crafted rewards require substantial domain expertise and embed inductive biases that are difficult to audit or adapt, limiting their effectiveness and leading to suboptimal performance. In this paper, we propose EvoNav, an evolutionary framework that automates the design of robot navigation reward functions via large language models (LLMs). To overcome prohibitively costly policy training, EvoNav evaluates each candidate proposal from the LLM via a progressive three-stage warm-up-boost procedure. EvoNav advances from analytical proxies with low-cost surrogates, such as small datasets and analytic rules, to lightweight rollouts and, finally, to full policy training, enabling computationally efficient exploration under effective feedback. Experimental results show that EvoNav produces more effective navigation policies than manually designed RL rewards and state-of-the-art reward design methods.
Learning Action Manifold with Multi-view Latent Priors for Robotic Manipulation
This paper tackles spatial perception and manipulation challenges in Vision-Language-Action (VLA) models. To address depth ambiguity from monocular input, we leverage a pre-trained multi-view diffusion model to synthesize latent novel views and propose a Geometry-Guided Gated Transformer (G3T) that aligns multi-view features under 3D geometric guidance while adaptively filtering occlusion noise. To improve action learning efficiency, we introduce Action Manifold Learning (AML), which directly predicts actions on the valid action manifold, bypassing inefficient regression of unstructured targets like noise or velocity. Experiments on LIBERO, RoboTwin 2.0, and real-robot tasks show our method achieves superior success rate and robustness over SOTA baselines. Project page: https://junjxiao.github.io/Multi-view-VLA.github.io/.
Mapping Embodied Affective Touch Strategies on a Humanoid Robot
Affective touch in human-robot interaction is shaped not only by emotional intent, but also by robot embodiment, including touch location, physical constraints, and perceived agency or social role. Existing HRI studies typically focus on one or two isolated body parts, limiting understanding of how affective touch generalises across the full humanoid body. We present a study with 32 participants interacting with the iCub robot, which is equipped with full-body distributed tactile sensors. Participants expressed eight emotions under three conditions: free touch, arm-only touch, and torso-only touch. Results show that body region and spatial constraints jointly shaped both touch location and dynamics. In free touch, participants preferred socially accessible upper-body regions, while less frequently touched areas showed stronger emotion-specific selectivity. Emotion-related variation was more evident in motion features for arm-only touch and pressure features for torso-only touch. Touch strategies also did not transfer directly between free and constrained conditions, even within the same coarse body region. Participants reported increased closeness to the robot after interaction, with around 30 percent reporting a change in perceived social relationship. Together, these findings show that affective touch expression is strongly body-region dependent and shaped by embodiment constraints.
See What Matters: Differentiable Grid Sample Pruning for Generalizable Vision-Language-Action Model
Vision-Language-Action (VLA) models have shown remarkable promise in robotic manipulation, yet their high computational cost hinders real-time deployment. Existing token pruning methods suffer from a fundamental trade-off: aggressive compression using pruning inevitably discards critical geometric details like contact points, leading to severe performance degradation. This forces a compromise, limiting the achievable compression rate and thus the potential speedup. We argue that breaking this trade-off requires rethinking compression as a geometry-aware, continuous token resampling in the vision encoder. To this end, we propose the Differentiable Grid Sampler (GridS), a plug-and-play module that performs task-aware, continuous resampling of visual tokens in VLAs. By adaptively predicting a minimal set of salient coordinates and extracting features via differentiable interpolation, GridS preserves essential spatial information while achieving drastic compression (with fewer than 10% of the original visual tokens). Experiments on both the LIBERO benchmark and a real robotic platform demonstrate that GridS achieves a 76% reduction in FLOPs with no degradation in the success rate, validating the lowest feasible visual token count reported to date. The code is available at https://github.com/Fediory/Grid-Sampler.
NavOL: Navigation Policy with Online Imitation Learning
Learning robust navigation policies remains a core challenge in robotics. Offline imitation learning suffers from distribution shift and compounding errors at rollout, while reinforcement learning requires reward engineering and learns inefficiently. In this paper, we propose NavOL, an online imitation learning paradigm that interacts with a simulator and updates itself using expert demonstrations gathered online. Built upon a pretrained navigation diffusion policy that maps local observations to future waypoints, NavOL trains in a rollout-update loop: during rollout, the policy acts in the simulator and queries a global planner, which has privileged access to the global environment, for the optimal path segment to use as ground-truth trajectory labels; during update, the policy is trained on the observation-trajectory pairs collected online. This online imitation loop removes the need for reward design, improves learning efficiency, and mitigates distribution shift by training on the policy's own explored rollouts. Built on IsaacLab with fast, high-fidelity parallel rendering and domain randomization of camera pose and start-goal pairs, our system scales across 50 scenes on 8 RTX 4090 GPUs, collecting over 2,000 new trajectories per hour, each averaging more than 400 steps. We also introduce an indoor visual navigation benchmark with predefined start and goal positions for zero-shot generalization. Extensive evaluations on simulation benchmarks, including the NavDP benchmark and our proposed benchmark, as well as carefully designed real-world experiments, demonstrate the effectiveness of NavOL, showing consistent performance gains in online imitation learning.
comment: Project page: https://logosroboticsgroup.github.io/NavOL/
DreamAvoid: Critical-Phase Test-Time Dreaming to Avoid Failures in VLA Policies
Vision-Language-Action (VLA) models are often brittle in fine-grained manipulation, where minor action errors during the critical phases can rapidly escalate into irrecoverable failures. Since existing VLA models rely predominantly on successful demonstrations for training, they lack an explicit awareness of failure during these critical phases. To address this, we propose DreamAvoid, a critical-phase test-time dreaming framework that enables VLA models to anticipate and avoid failures. We also introduce an autonomous boundary learning paradigm to refine the system's understanding of the subtle boundary between success and failure. Specifically, we (1) utilize a Dream Trigger to determine whether the execution has entered a critical phase, (2) sample multiple candidate action chunks from the VLA via an Action Proposer, and (3) employ a Dream Evaluator, jointly trained on mixed data (success, failure, and boundary cases), to "dream" the short-horizon futures corresponding to the candidate actions, evaluate their values, and select the optimal action. We conduct extensive evaluations on real-world manipulation tasks and simulation benchmarks. The results demonstrate that DreamAvoid can effectively avoid failures, thereby improving the overall task success rate. Our code is available at https://github.com/XianzheFan/DreamAvoid.
comment: 19 pages, 7 figures
Introducing Environmental Constraints to Grasping Strategies for Paper-Like Flexible Materials Using a Soft Gripper
Robotic manipulation of flexible objects is widely required in both industrial and service applications. Among such objects, paper-like materials exhibit distinct mechanical characteristics compared to cloth, being more sensitive to compressive stress, where minor variations in physical properties can significantly affect grasping. This study systematically investigates grasping strategies for paper-like materials using a universal soft gripper by exploiting environmental constraints. Based on manipulation primitives employed in existing grasping strategies, we proposed systematic grasping strategies for flexible materials by exploiting environmental constraints and analyzed their mechanical and kinematic models. To investigate the influence of materials and working conditions on grasping, an evaluation system for measuring grasping force and success rate was defined and experimentally evaluated. Finally, we summarized the specific workspaces and characteristics of different strategies that can satisfy various task requirements and lead to potential applications in household service robots for grasping planar flexible objects.
comment: Under Review
Rainbow Deep Q-Learning with Kinematics-Aware Design for Cooperative Delta and 3-RRS Parallel Robot Insertion
This paper presents a kinematics-aware deep reinforcement learning framework based on Rainbow Deep Q-Networks (DQN) for cooperative peg-in-hole manipulation by a Delta parallel robot and a 3-RRS (Revolute-Revolute-Spherical) parallel manipulator. A key contribution is the integration of a geometric design-optimization stage that precedes learning: the 3-RRS geometry is tuned to maximize the singularity-free workspace and improve conditioning, which in turn enlarges the safe region in which the reinforcement learning policy can explore. Together the two manipulators expose a 6-degree-of-freedom (DoF) controllable subspace (three Delta translations, two 3-RRS rotations, and one 3-RRS vertical translation); the peg-in-hole task is invariant to rotation about the peg axis, so the task-relevant manifold is five-dimensional. The cooperative insertion problem is cast as a Markov Decision Process with a 12-dimensional state vector and a discrete action set containing $6 \times 2 = 12$ incremental commands (one positive and one negative per controlled DoF). A shaped reward combines dense proximity guidance, penalties for kinematic and workspace violations, and sparse bonuses for successful insertions. The Rainbow DQN -- integrating double Q-learning, dueling architecture, prioritized replay, multi-step returns, noisy linear layers for exploration, and a distributional value head -- is trained with a two-stage curriculum. The co-designed framework is validated in a high-fidelity kinematic simulator, where it achieves stable policy convergence, reliable insertions, and reduced constraint violations compared against a vanilla DQN agent and a classical sampling-based planner.
comment: 10 pages
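The discrete action set described in the abstract above (one positive and one negative increment per controlled DoF, giving 6 x 2 = 12 commands) can be enumerated as follows. This is an illustrative sketch only: the DoF labels, the step size, and the function names are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical labels for the six controlled DoFs named in the abstract:
# three Delta translations, two 3-RRS rotations, one 3-RRS vertical translation.
DOF_NAMES = ["delta_x", "delta_y", "delta_z", "rrs_roll", "rrs_pitch", "rrs_z"]

Action = Tuple[int, int]  # (DoF index, sign)


def build_action_set(num_dof: int = 6) -> List[Action]:
    """Enumerate one +step and one -step command per DoF."""
    return [(dof, sign) for dof in range(num_dof) for sign in (+1, -1)]


def apply_action(pose: List[float], action: Action, step: float) -> List[float]:
    """Return a new pose with the selected DoF moved by sign * step."""
    dof, sign = action
    new_pose = list(pose)
    new_pose[dof] += sign * step
    return new_pose
```

A Q-network over this action set would have 12 outputs, one per `(dof, sign)` pair; the step size per DoF is an assumption left to the user here.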
A Proprioceptive-Only Benchmark for Quadruped State Estimation: ATE, RPE, and Runtime Trade-offs Between Filters and Smoothers
We compare three state-of-the-art proprioceptive state estimators for quadruped robots: MUSE [1], the Invariant Extended Kalman Filter (IEKF) [2], and the Invariant Smoother (IS) [3], on the CYN-1 sequence of the GrandTour Dataset [4]. Our goal is to give practitioners clear guidance on accuracy and computation time: we report long-term accuracy (Absolute Trajectory Error, ATE), short-term accuracy (translational and rotational Relative Pose Error, RPE), and per-update computation time on a fixed hardware/software stack. On this dataset, RPEs are broadly similar across methods, while IEKF and IS achieve a lower ATE than MUSE. Runtime results highlight the accuracy-latency trade-offs across the three approaches. In the discussion, we outline the evaluation choices used to ensure a fair comparison and analyze factors that influence short-horizon metrics. Overall, this study provides a concise snapshot of accuracy and cost, helping readers choose an estimator that fits their application constraints, with all evaluation code and documentation released open-source at https://github.com/iit-DLSLab/state_estimation_benchmark for full reproducibility.
comment: Submitted to IEEE Robotics and Automation Practice
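For readers unfamiliar with the two metrics named in the abstract above, the sketch below computes a simplified, translation-only version of each. It assumes the estimated and ground-truth trajectories are already time-synchronized and expressed in the same frame, ignores orientation, and is not the benchmark's evaluation code; the function names are hypothetical.

```python
import math
from typing import Sequence

Vec3 = Sequence[float]


def ate_rmse(est: Sequence[Vec3], gt: Sequence[Vec3]) -> float:
    """RMSE of absolute position error: sensitive to accumulated drift."""
    if len(est) != len(gt) or not est:
        raise ValueError("trajectories must be non-empty and of equal length")
    sq = [sum((e - g) ** 2 for e, g in zip(pe, pg)) for pe, pg in zip(est, gt)]
    return math.sqrt(sum(sq) / len(sq))


def rpe_trans_rmse(est: Sequence[Vec3], gt: Sequence[Vec3], delta: int = 1) -> float:
    """RMSE of the error in displacement over `delta` steps: local consistency."""
    if len(est) != len(gt) or len(est) <= delta or delta < 1:
        raise ValueError("need equal-length trajectories longer than delta")
    sq = []
    for i in range(len(est) - delta):
        d_est = [b - a for a, b in zip(est[i], est[i + delta])]
        d_gt = [b - a for a, b in zip(gt[i], gt[i + delta])]
        sq.append(sum((x - y) ** 2 for x, y in zip(d_est, d_gt)))
    return math.sqrt(sum(sq) / len(sq))
```

A trajectory with a constant offset from ground truth has a non-zero ATE but zero RPE under this definition, which is one way two estimators can show similar RPE and different ATE.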
Nautilus: From One Prompt to Plug-and-Play Robot Learning
Robot learning research is fragmented across policy families, benchmark suites, and real robots; each implementation is entangled with the others in a complex combination matrix, making it an engineering nightmare to port any single element. General-purpose coding agents may occasionally bridge specific setups, but cannot close this gap at scale because they lack the procedural priors and validation practices that characterize robotics research workflows. We propose NAUTILUS, an open-source harness that turns a single user prompt -- for example, "Evaluate policy A with benchmark B" -- into ready-to-use reproduction, evaluation, fine-tuning, and deployment workflows. NAUTILUS provides: plug-and-play agent skill sets with distilled priors from robotics research; typed contracts among policies, simulators/benchmarks, and real-world robots; unified interfaces and execution environments; and a trustworthy agentic coding workflow with explicit, automated validation, and testing at each milestone. NAUTILUS can not only automatically generate the required adapters and containers for existing implementations, but also wrap and onboard new or user-provided policies, simulators/benchmarks, and robots, all connected via a uniform interface. This expands cross-validation coverage without hand-written glue code. Like a nautilus shell that grows by adding chambers, NAUTILUS scales by extending its execution in chambered units, making it a research harness for scalability rather than a hand-curated framework, and aiming to reduce the engineering burden of cross-family reproduction and evaluation in the ever-growing robot learning ecosystem.
Weather-Robust Cross-View Geo-Localization via Prototype-Based Semantic Part Discovery
Cross-view geo-localization (CVGL), which matches an oblique drone view to a geo-referenced satellite tile, has emerged as a key alternative for autonomous drone navigation when GNSS signals are jammed, spoofed, or unavailable. Despite strong recent progress, three limitations persist: (1) global-descriptor designs compress the patch grid into a single vector without separating layout from texture across the view gap; (2) altitude-related scale variation is retained in the learned embedding rather than marginalized; and (3) multi-objective training relies on hand-tuned scalars over losses on incompatible gradient scales. We propose SkyPart, a lightweight swappable head for patch-based vision transformers (ViTs) that institutes explicit part grouping over the patch grid. SkyPart has four theory-grounded components: (i) learnable prototypes competing for patch tokens via single-pass cosine assignment; (ii) altitude-conditioned linear modulation applied only during training, making the retrieval embedding altitude-free at inference; (iii) a graph-attention readout over active prototypes; and (iv) a Kendall uncertainty-weighted multi-objective loss whose stationary points are Pareto-stationary. At 26.95M parameters and 22.14 GFLOPs, SkyPart is the smallest among top-performing methods and sets a new state of the art on SUES-200, University-1652, and DenseUAV under a single-pass, no-re-ranking, no-TTA protocol. Its advantage over the strongest baseline widens under the ten-condition WeatherPrompt corruption benchmark.
comment: 37 pages, 7 figures, 6 tables
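The uncertainty-weighted multi-objective loss mentioned in the SkyPart abstract is often written, in one common simplified form, as the sum over tasks of exp(-s_i) * L_i + s_i with a learnable log-variance s_i per loss. The sketch below uses that form as an assumption; the exact variant used in the paper is not stated in the abstract, and the function names are hypothetical.

```python
import math
from typing import Sequence


def uncertainty_weighted_loss(losses: Sequence[float], log_vars: Sequence[float]) -> float:
    """Sum of exp(-s_i) * L_i + s_i over tasks (assumed simplified form)."""
    if len(losses) != len(log_vars):
        raise ValueError("one log-variance per loss is required")
    return sum(math.exp(-s) * l + s for l, s in zip(losses, log_vars))


def stationary_log_var(loss: float) -> float:
    """For a fixed loss value L > 0, d/ds [exp(-s) * L + s] = 0 at s = log(L)."""
    return math.log(loss)
```

At `s_i = log(L_i)` every weighted term `exp(-s_i) * L_i` equals 1, which is how this weighting rescales losses of very different magnitudes without hand-tuned scalars.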
Sampling-Based Follow-the-Leader Motion Planning for Manipulator-Mounted Continuum Robots
Follow-the-leader (FTL) motion exploits the unique morphology of continuum robots (CRs) to navigate confined spaces by having the body retrace the path of the tip. While extensively studied, existing FTL methods typically assume a fixed base or a single degree-of-freedom insertion mechanism, limiting their applicability to practical systems in which CRs are mounted on robotic manipulators with fully actuated SE(3) base pose. This paper presents a sampling-based motion planner for FTL motion of manipulator-mounted CRs that jointly considers robot configuration and base pose. The key idea is to decouple global shape search from base pose determination by computing the base pose through a closed-form geometric construction, thereby avoiding iterative optimization during online planning. The approach supports general forward models and enables efficient planning by shifting the majority of computation offline. We establish theoretical guarantees including resolution complete shape search and converging tip tracking throughout waypoint traversal and interpolation. Experiments on 120 simulated paths over 3 test classes demonstrate 0% tip error and 1.9% mean shape deviation (w.r.t. robot length) at 100% success rate. We validate the practicality of our approach on a 6-DOF tendon-driven CR mounted on a serial manipulator. Code and visualization available at https://continuumroboticslab.github.io/sb-ftl-cr-planner/.
RIO: Flexible Real-Time Robot I/O for Cross-Embodiment Robot Learning
Despite recent efforts to collect multi-task, multi-embodiment datasets, to design recipes for training Vision-Language-Action models (VLAs), and to showcase these models on different robot platforms, generalist cross-embodiment robot capabilities remain a largely elusive ideal. Progress is limited by fragmented infrastructure: most robot code is highly specific to the exact setup the user decided on, which adds major overhead when attempting to reuse, recycle, or share artifacts between users. We present RIO (Robot I/O), an open source Python framework that provides flexible, lightweight components for robot control, teleoperation, data formatting, sensor configuration, and policy deployment across diverse hardware platforms and morphologies. RIO provides abstractions that enable users to make any choice and to switch between them with minimal reconfiguration effort. We validate RIO on VLA deployment workflows across three morphologies (single-arm, bimanual, humanoid) and four hardware platforms with varying grippers and cameras. Using teleoperated data collected with RIO, we fine-tune state-of-the-art VLAs including $π_{0.5}$ and GR00T on household tasks such as pick-and-place, folding, and bowl scrubbing. By open sourcing all our efforts, we hope the community can accelerate their pace of robot learning on real-world robot hardware. Additional details at: https://robot-i-o.github.io
comment: 14 pages, 12 figures, 5 tables. Accepted to Robotics: Science and Systems (RSS) 2026
PRISM: Planning and Reasoning with Intent in Simulated Embodied Environments
When an LLM-based embodied agent fails at a household task, the culprit could be misidentified objects, forgotten sub-goals, or poor action sequencing -- yet existing benchmarks report only a single success rate, making it impossible to tell which cognitive module is responsible. We present PRISM, a diagnostic benchmark that reframes this problem: rather than asking only "did the agent succeed?", PRISM asks "which capability is most likely responsible for failure?" Built on five photorealistic multi-room apartments (4--8 rooms each), PRISM structures 300 human-verified tasks into three capability tiers -- Basic Ability, Reasoning Ability, and Long-horizon Ability -- that isolate perception-to-action grounding, implicit intent resolution, and sustained multi-step coordination, respectively. PRISM exposes an agent-agnostic executable action API that allows arbitrary agents (LLM agents, VLM agents, symbolic planners, RL policies, and hybrid systems) to be evaluated end-to-end under the same benchmark protocol. To support deeper diagnosis, optional probes for perception, memory, and planning can be adopted, replaced, or bypassed entirely, enabling controlled component-level analysis when desired. Experiments on seven contemporary LLMs establish a clear hierarchy: explicit spatial grounding is not the dominant failure source under oracle perception, implicit intent resolution is a significant bottleneck for all model families, and long-horizon coordination exposes a stark capability cliff -- lightweight models collapse to as low as 20.0% success while simultaneously consuming more tokens than their frontier counterparts, a signature of compensatory over-reasoning rather than genuine planning capability. Project page: https://sj-li.com/PROJ/PRISM.
Coordinated Diffusion: Generating Multi-Agent Behavior Without Multi-Agent Demonstrations
Imitation learning powered by generative models has proven effective for modeling complex single-agent behaviors. However, teaching multi-agent systems, like multiple arms or vehicles, to coordinate through imitation learning is hindered by a fundamental data bottleneck: as the joint state-action space grows exponentially with the number of agents, collecting a sufficient amount of coordinated multi-agent demonstrations becomes extremely costly. In this work, we ask: how can we leverage single-agent demonstration data to learn multi-agent policies? We present Coordinated Diffusion (CoDi), a framework that couples independently trained single-agent diffusion policies through a user-defined multi-agent cost function, without requiring any coordinated demonstrations. We derive a new diffusion-based sampling scheme wherein the diffusion score function decomposes into independent, single-agent pre-trained base policies plus a cost-driven guidance term that coordinates these base policies into cohesive multi-agent behavior. We show that this guidance term can be estimated in a gradient-free manner, making CoDi applicable to black-box, non-differentiable cost functions without additional training. Theoretically and empirically, we analyze the conditions under which this composition can faithfully approximate a target multi-agent behavior. We find a complementary role for demonstration data versus the cost function: single-agent demonstrations must cover the support of the desired multi-agent behavior, while the cost function must promote desired behavior from this product of single-agent policies. Our results in simulation and hardware experiments of a two-arm manipulation task show that CoDi discovers robust coordinated behavior from single-agent data, is more data-efficient than multi-agent baselines, and highlights the importance of joint guidance, base policy support, and cost design.
Offline Policy Evaluation for Manipulation Policies via Discounted Liveness Formulation
Policy evaluation is a fundamental component of the development and deployment pipeline for robotic policies. In modern manipulation systems, this problem is particularly challenging: rewards are often sparse, task progression in evaluation rollouts is often non-monotonic as the policies exhibit recovery behaviors, and evaluation rollouts are necessarily of finite length. This finite length introduces truncation bias, breaking the infinite-horizon assumptions underlying standard methods that rely on Bellman equations and the principle of optimality. In this work, we propose a framework for offline policy evaluation from sparse rewards based on a liveness-based Bellman operator. Our formulation interprets policy evaluation as a task-completion problem and yields a conservative fixed-point value function that is robust to finite-horizon truncation. We analyze the theoretical properties of the proposed operator, including contraction guarantees, and show how it encodes task progression while mitigating truncation bias. We evaluate our method on two simulated manipulation tasks using both a Vision-Language-Action model and a diffusion policy, and a cloth folding task using human demonstrations. Empirical results demonstrate that our approach more accurately reflects task progress and substantially reduces truncation bias, outperforming classical baselines such as TD(0) and Monte Carlo policy evaluation.
comment: Published at RSS 2026
TOPPO: Rethinking PPO for Multi-Task Reinforcement Learning with Critic Balancing
Soft Actor-Critic (SAC) and its variants dominate Multi-Task Reinforcement Learning (MTRL) due to their off-policy sample efficiency, while on-policy methods such as Proximal Policy Optimization (PPO) remain underexplored. We diagnose that PPO in MTRL suffers from a previously overlooked issue: critic-side gradient ill-conditioning, which may cause tail tasks to stall while easy tasks dominate the value function's updates. To address this, we propose TOPPO (Tail-Optimized PPO), a reformulation of PPO via Critic Balancing -- a set of modules that improve gradient conditioning and balance learning dynamics across tasks. Unlike prior approaches that rely on modular architectures or large models, TOPPO targets the optimization bottleneck within PPO itself. Empirically, TOPPO achieves stronger mean and tail-task performance than published SAC-family and ARS-family baselines while using substantially fewer parameters and environment steps on the Meta-World+ benchmark. Notably, TOPPO matches or surpasses strong SAC baselines early in training and maintains superior performance at full budget. Ablations confirm the effectiveness of each module in TOPPO and provide insights into their interactions. Our results demonstrate that, with proper optimization, on-policy methods can rival or exceed off-policy approaches in MTRL, challenging the prevailing reliance on SAC and highlighting critic-side gradient conditioning as the central bottleneck.
Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models
Vision-Language-Action (VLA) models achieve remarkable flexibility and generalization beyond classical control paradigms. However, most prevailing VLAs are trained under a single-frame observation paradigm, which leaves them structurally blind to temporal dynamics. Consequently, these models degrade severely in non-stationary scenarios, even when trained or finetuned on dynamic datasets. Existing approaches either require expensive retraining or suffer from latency bottlenecks and poor temporal consistency across action chunks. We propose Pace-and-Path Correction, a training-free, closed-form inference-time operator that wraps any chunked-action VLA. From a single quadratic cost, joint minimization yields a unified solution that decomposes orthogonally into two distinct channels. The pace channel compresses execution along the planned direction, while the path channel applies an orthogonal spatial offset, jointly absorbing the perceived dynamics within the chunk window. We evaluate our approach on a comprehensive diagnostic benchmark MoveBench designed to isolate motion as the sole controlled variable. Empirical results demonstrate that our framework consistently outperforms state-of-the-art training-free wrappers and dynamic-adaptive methods and improves success rates by up to 28.8% and 25.9% in absolute terms over foundational VLA models in dynamic-only and static-dynamic mixed environments, respectively.
Behavioral Mode Discovery for Fine-tuning Multimodal Generative Policies
We address the problem of fine-tuning pre-trained generative policies with reinforcement learning (RL) while preserving the multimodality of their action distributions. Existing methods for RL fine-tuning of generative policies (e.g., diffusion policies) improve task performance but often collapse diverse behaviors into a single reward-maximizing mode. To mitigate this issue, we propose an unsupervised mode discovery framework that uncovers latent behavioral modes within generative policies. The discovered modes enable the use of mutual information as an intrinsic reward, regularizing RL fine-tuning to enhance task success while maintaining behavioral diversity. Experiments on robotic manipulation tasks demonstrate that our method consistently outperforms conventional fine-tuning approaches, achieving higher success rates and preserving richer multimodal action distributions.
JACoP: Joint Alignment for Compliant Multi-Agent Prediction CVPR
Stochastic Human Trajectory Prediction (HTP) using generative modeling has emerged as a significant area of research. Although state-of-the-art models excel in optimizing the accuracy of individual agents, they often struggle to generate predictions that are collectively compliant, leading to output trajectories marred by social collisions and environmental violations, thus rendering them impractical for real-world applications. To bridge this gap, we present JACoP: Joint Alignment for Compliant Multi-Agent Prediction, an innovative multi-stage framework that ensures scene-level plausibility. JACoP incorporates an Anchor-Based Agent-Centric Profiler for effective initial compliance filtering and employs a Markov Random Field (MRF) based aligner to formalize the joint selection for scene predictions. By representing inter-agent spatial and social costs as MRF energy potentials, we successfully infer and sample from the joint trajectory distribution, achieving prediction with optimal scene compliance. Comprehensive experiments show that JACoP not only achieves competitive accuracy, but also sets a new standard in reducing both environmental violations and social collisions, thereby confirming its ability to produce collectively feasible and practically applicable trajectory predictions.
comment: Accepted by CVPRF 2026
Kairos: A Scalable Serving System for Physical AI
Physical AI is experiencing rapid growth with frontier foundation models increasing its capabilities across general environments. Physical AI tasks are characterized by inference properties that are markedly different from digital AI. They consist of multiple rounds of inference and action execution, generating a chunk of actions in each inference round, and asynchronously interleaving inference and execution. This makes existing digital AI serving systems unsuited for physical AI; a shortcoming that is critical for enabling their wide adoption, considering their size and the scale of the robot fleets they have to serve. To fill this gap, we design Kairos, the first multi-robot serving system that makes the generate-execute loop a first-class citizen, with active involvement in the execution phase. Across a wide range of physical AI models and robots, Kairos reduces the average end-to-end task latency by 31.8--66.5% over state-of-the-art digital AI serving practices, with gains scaling with the robot fleet size.
BiPneu: Design and Control of a Bipolar-Pressure Pneumatic System for Soft Robots
Positive-negative pressure regulation is critical to soft robotic actuators, enabling large motion ranges and versatile actuation modes. However, achieving high-performance regulation across both pressure polarities remains challenging due to asymmetric inflation-deflation dynamics, valve nonlinearities, and switching-induced flow disturbances. This paper presents BiPneu, a scalable and cost-efficient multi-channel bipolar-pressure pneumatic system for soft robots that enables wide-range, accurate, and responsive pressure regulation while providing seamless compatibility with high-level software ecosystems. A dual-mode sliding-mode controller (DM-SMC) with hysteresis-supervised mode selection is proposed based on a hybrid electro-pneumatic model. Extensive simulations and experiments demonstrate the superior performance of DM-SMC in tracking step and sinusoidal pressure references compared with both advanced model predictive controllers and well-tuned PID controllers. Experimental results show average absolute errors of 1.44 kPa in multi-step tests and 4.23 kPa in sinusoidal tracking, corresponding to reductions of 11.9% and 35.6% relative to PID control, along with improved control effort, valve switching rate, and transient response. Robustness of DM-SMC is further verified on a bellows actuator with pressure-dependent volume. Finally, BiPneu's capability is demonstrated via two soft robotic examples: quick ball-maneuvering with a soft parallel manipulator and real-time finite element method (FEM)-based teleoperation of a soft bellows actuator.
comment: Accepted by IEEE/ASME TMECH
Few-Shot Physics-Informed Neural Network for Shape Reconstruction of Concentric-Tube Robots
Modeling concentric tube robots (CTRs) involves complex nonlinear continuum mechanics, and despite recent progress, physics-based models often lack an accurate representation of the experimental setups. To overcome these limitations, deep neural network-based models have been explored as alternatives with superior accuracy; however, they often overlook known mechanics, require large training datasets, and typically discard shape estimation of the robot. We present a physics-informed neural network (PINN) for kinematic modeling of a 6-DoF CTR with three pre-curved tubes that embeds the Cosserat rod differential equations and learns from few-shot observational data, balancing physics priors with data-driven fitting. PINN enables full-state estimation of shape, twist angle, torsional strain, bending moment, and orientation. Benchmark tests show a mean shape error below 1% of the robot length and accurately recovered other kinematic states, outperforming a purely physics-based Cosserat rod model baseline while using a minimal training set. The resulting model is also computationally efficient and robust, making it well-suited for real-time control applications.
comment: to be published in 2026 IEEE International Conference on Robotics & Automation proceedings
Lifelong Learning in Vision-Language Models: Enhanced EWC with Cross-Modal Knowledge Retention
Large language-vision models (LVLMs) such as CLIP, Flamingo, and BLIP have revolutionized AI by enabling understanding across textual and visual modalities. These models excel at tasks like image captioning, visual question answering, and cross-modal retrieval. However, they face catastrophic forgetting when learning new tasks sequentially, which is particularly challenging in multi-modal settings where preserving cross-modal alignments adds complexity to the learning process. This paper presents a comprehensive continual learning framework for LVLMs that combines enhanced Elastic Weight Consolidation (EWC) with parameter-efficient fine-tuning techniques. We integrate multi-modal Fisher Information Matrix calculation, consistency preservation across modalities, and adaptive regularization that considers dependencies across visual and textual encoders. In extensive evaluation, the framework achieves a 78% reduction in forgetting rates relative to naive sequential training approaches. The framework also preserves alignment between modalities during sequential learning with only 15% additional computational cost. This work advances the state of the art in lifelong learning for multi-modal AI systems, with direct applications to autonomous driving, intelligent robotic assistants, and adaptive robotic systems that must continuously learn in dynamic real-world environments.
comment: 8 pages, 5 figures, 1 table. Applications in autonomous driving, intelligent robotic assistants, and adaptive robotics systems
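As background for the abstract above, standard Elastic Weight Consolidation adds a quadratic penalty that anchors each parameter to its value after the previous task, weighted by a diagonal Fisher information estimate. The sketch below shows only this standard penalty; the paper's "enhanced" multi-modal variant is not described in enough detail in the abstract to reproduce, and the function names and flat-list parameter layout are assumptions.

```python
from typing import Sequence


def ewc_penalty(
    params: Sequence[float],
    anchor_params: Sequence[float],
    fisher_diag: Sequence[float],
    lam: float,
) -> float:
    """Standard EWC term: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2."""
    if not (len(params) == len(anchor_params) == len(fisher_diag)):
        raise ValueError("all parameter vectors must have the same length")
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher_diag)
    )


def ewc_loss(task_loss: float, params, anchor_params, fisher_diag, lam: float) -> float:
    """New-task loss plus the EWC penalty toward the old-task parameters."""
    return task_loss + ewc_penalty(params, anchor_params, fisher_diag, lam)
```

Parameters with a large Fisher value are treated as important to the earlier task and are penalized more heavily for moving, while parameters with a value near zero remain free to adapt.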
Emotional Expression in Low-Degrees-of-Freedom Robots: Assessing Perception with Reachy Mini
Emotion expression is central to human--robot interaction, yet little is known about how people interpret affect on robots with sparse, non-anthropomorphic expressive capabilities. This study examined how people perceive emotional expressions displayed by Reachy Mini (Pollen Robotics and Hugging Face), a low-degree-of-freedom (low-DoF) robot with a constrained and distinctly non-human expressive repertoire. In an online within-subjects study, 100 participants viewed 10 short video clips of Reachy Mini expressing different emotions and, for each clip, identified the perceived emotion, rated its valence and arousal, and evaluated the robot on social-perception traits. Exact emotion recognition was modest overall and varied considerably across expressions, with anger, sadness, and interest recognized more reliably than emotions such as love, pleasure, shame, and disgust. However, participants were generally more successful at recovering broader affective meaning than exact emotion labels, particularly along valence and arousal dimensions. Emotional expressions also shaped social evaluation, as positive expressions were perceived as warmer and more sociable than negative ones, and animacy varied less across conditions. These findings suggest that even constrained robotic expressions can communicate affective meaning and influence social impressions, positioning Reachy Mini as a useful benchmark for studying affective communication in low-DoF robots.
Adaptive Smooth Tchebycheff Attention for Multi-Objective Policy Optimization
Multi-objective reinforcement learning in robotic domains requires balancing complex, non-convex trade-offs between conflicting objectives. While linear scalarization methods provide stability, they are theoretically incapable of recovering solutions within non-convex regions of the Pareto front. Conversely, static non-linear scalarizations (e.g., Tchebycheff) can theoretically access these regions but often suffer from severe gradient variance and optimization instability in deep RL. In this work, we propose an Adaptive Smooth Tchebycheff framework that resolves this tension by dynamically modulating the curvature of the optimization landscape. We introduce a novel conflict-driven controller that regulates the optimization smoothness based on real-time gradient interference. This allows the agent to anneal toward precise, non-convex scalarization when objectives align, while elastically reverting to stable, smooth approximations when destructive gradient conflicts emerge. We validate our approach on a challenging robotic stealth visual search task -- a proxy for monitoring of protected/fragile ecosystems -- where an agent must balance search, exposure/interference minimization and exploration speed. Extensive ablations confirm that our conflict-aware adaptation enables the robust discovery of Pareto-optimal policies in non-convex regions inaccessible to linear baselines and unstable for static non-linear methods. Website: https://alejandromllo.github.io/research/pasta/
comment: To appear in the Proceedings of Robotics: Science and Systems (RSS) 2026
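For orientation, the smooth Tchebycheff scalarization named in the abstract above is commonly written as a log-sum-exp relaxation of the weighted maximum over objectives. The sketch below is only an illustration under that assumption, not the authors' implementation: the function name, the weights `w`, the reference point `z`, and the smoothing parameter `mu` are hypothetical, and the abstract does not specify how the conflict-driven controller sets the smoothness.

```python
import numpy as np

def smooth_tchebycheff(f, w, z, mu):
    """Log-sum-exp relaxation of max_i w_i * (f_i - z_i) for minimized objectives."""
    v = np.asarray(w, dtype=float) * (np.asarray(f, dtype=float) - np.asarray(z, dtype=float))
    v_max = v.max()
    # Subtract the maximum before exponentiating for numerical stability.
    return float(v_max + mu * np.log(np.sum(np.exp((v - v_max) / mu))))

# Hypothetical objective values, weights, and reference point.
f = [0.8, 0.3, 0.5]
w = [0.5, 0.3, 0.2]
z = [0.0, 0.0, 0.0]

hard = max(wi * (fi - zi) for wi, fi, zi in zip(w, f, z))  # plain Tchebycheff value
smooth_small = smooth_tchebycheff(f, w, z, mu=0.01)        # close to the hard maximum
smooth_large = smooth_tchebycheff(f, w, z, mu=1.0)         # smoother, looser upper bound
```

A small `mu` tracks the non-smooth maximum closely, while a large `mu` gives a smoother surrogate that exceeds the maximum by at most `mu * log(m)` for `m` objectives.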
The Unified Autonomy Stack: Toward a Blueprint for Generalizable Robot Autonomy
We introduce and open-source the Unified Autonomy Stack, a system-level solution that enables resilient autonomy across diverse aerial and ground robot morphologies. The architecture centers on three synergistic modules -- multi-modal perception, multi-behavior planning, and multi-layered safe navigation -- that together deliver comprehensive mission autonomy. The stack fuses data from LiDAR, radar, vision, and inertial sensing, enabling (a) robust localization and mapping through factor graph-based fusion, (b) semantic scene understanding, (c) motion and informative path planning through sampling-based techniques adaptive across spatial scales, as well as (d) multi-layered safe navigation both through planning on the online reconstructed map and deep learning-driven exteroceptive policies alongside last-resort safety filters using control barrier functions. The resulting behaviors include safe GNSS-denied navigation into unknown and perceptually-degraded regions, exploration of complex environments, object discovery, and efficient inspection planning. The stack has been field-tested and validated on both aerial (rotorcraft) and ground (legged) robots operating in a host of demanding environments, including self-similar and smoke-filled settings, with complex geometries and high obstacle clutter. These tests demonstrate resilient performance in challenging conditions. To facilitate ease of adoption, we open-source the implementation alongside supporting documentation, validation, and evaluation datasets at https://github.com/ntnu-arl/unified_autonomy_stack. A video giving an overview of the paper and the field experiments is available at https://youtu.be/l8Su8OXsM-E.
comment: 35 pages, 22 figures, 8 tables
A Five-Layer MLOps Architecture for Connected Automated Driving
The continual assurance of safety and performance of automated driving systems (ADSs) poses significant challenges. ADSs operate in complex, dynamic, open-world environments allowing a wide range of scenarios, including ones that are rare or not foreseen during initial development. While the incorporation of artificial intelligence (AI) and machine learning (ML) technology allows ADSs to learn from data gathered during operation and thus enables them to adapt over time, these approaches come with their own challenges. A key advantage of ADSs compared to human drivers is their greater ability to gather data collectively across a fleet of vehicles, or even across multiple fleets operated by different entities, and to learn from this data collectively. Vehicles can share and combine their data to identify additional learning opportunities otherwise missed by individual vehicles. This creates new opportunities to tackle the challenges of continual assurance of safety and performance, but requires the implementation of architectures that leverage the collective learning potential. Based on established MLOps principles and existing work in the field of connected automated driving, this paper presents a five-layer architecture for collective learning-enabled MLOps processes for ADSs. The goal of this architecture is to provide a conceptual blueprint for the design and implementation of MLOps processes by fleet operators and other relevant stakeholders. The paper describes the main responsibilities of each layer, their interactions, and how multi-level self-assessments enabled by the architecture can support the detection and reduction of edge cases including black swan events.
comment: 8 pages, 6 figures
Belief-Space Residual Risk for Automated Driving under Localization Uncertainty
Residual risk metrics have recently been introduced to assess the safety implications of automated driving systems. Existing approaches typically assume a deterministic ego pose and concentrate mainly on perception errors related to surrounding objects and latency effects. In practice, however, automated vehicles operate under considerable localization uncertainty, especially in complex urban settings and in adverse weather conditions. This work extends the spatial residual risk formulation to the belief space by explicitly modeling ego pose uncertainty as a Gaussian distribution. Residual risk is reformulated as the expected degradation-induced risk over the ego pose belief distribution. Within a particle-based risk estimation framework, localization uncertainty is incorporated into the computation of collision probabilities through covariance fusion of ego and object uncertainties.
comment: 7 Pages, this work has been accepted for publication in IEEE Intelligent Transportation Systems (ITSC) 2026. The final published version will be available via IEEE Xplore
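To make the covariance-fusion step in the abstract above concrete, here is a minimal Monte Carlo sketch. It is not the paper's estimator: it assumes 2D positions only, independent Gaussian ego and object uncertainties (so the relative-position covariance is the sum of the two), and a hypothetical circular collision region of radius `radius`. All names and numbers are illustrative.

```python
import numpy as np

def collision_probability(ego_mean, ego_cov, obj_mean, obj_cov, radius, n=20000, seed=0):
    """Particle estimate of P(||obj - ego|| < radius) under Gaussian beliefs."""
    rng = np.random.default_rng(seed)
    rel_mean = np.asarray(obj_mean, dtype=float) - np.asarray(ego_mean, dtype=float)
    # Assumption: ego and object errors are independent, so covariances add.
    rel_cov = np.asarray(ego_cov, dtype=float) + np.asarray(obj_cov, dtype=float)
    particles = rng.multivariate_normal(rel_mean, rel_cov, size=n)
    return float(np.mean(np.linalg.norm(particles, axis=1) < radius))

ego_mean, ego_cov = [0.0, 0.0], 0.25 * np.eye(2)   # hypothetical localization belief
obj_cov = 0.25 * np.eye(2)                          # hypothetical perception uncertainty

p_near = collision_probability(ego_mean, ego_cov, [2.0, 0.0], obj_cov, radius=2.0)
p_far = collision_probability(ego_mean, ego_cov, [10.0, 0.0], obj_cov, radius=2.0)
```

With a deterministic ego pose, `ego_cov` would be zero and only the object covariance would enter; the fused covariance is what lets localization uncertainty raise or lower the estimated collision probability.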
3D RL-DWA: A Hybrid Reinforcement Learning and Dynamic Window Approach for Goal-Directed Local Navigation in Multi-DoF Robots
In this paper, we present a novel hybrid approach that combines Reinforcement Learning (RL) with the Dynamic Window Approach (DWA) for adaptive 3D local navigation of high-degree-of-freedom robotic systems. Our method leverages sparse point cloud data to dynamically adjust both the motion and the shape of a deformable microrobot, enabling the system to navigate toward a goal in complex, constrained environments while maximizing the occupied volume. We evaluate our framework in a simulated vascular network. Experimental results, based on 1080 trials, indicate that integrating RL with a DWA-based local planner significantly enhances both deformation and navigation capabilities compared to pure RL and model-based methods. In particular, the proposed autonomous controller consistently achieves high deformation and near-perfect path completion during training and maintains robust performance in unseen scenarios. These findings highlight the potential of hybrid planning strategies for efficient and adaptive 3D navigation under sparse sensory conditions.
comment: Accepted to IEEE/ASME International Conference on Advanced Intelligent Mechatronics, AIM2026
Revealing Interpretable Failure Modes of VLMs
Vision-Language Models (VLMs) are increasingly used in safety-critical applications because of their broad reasoning capabilities and ability to generalize with minimal task-specific engineering. Despite these advantages, they can exhibit catastrophic failures in specific real-world situations, constituting failure modes. We introduce REVELIO, a framework for systematically uncovering interpretable failure modes in VLMs. We define a failure mode as a composition of interpretable, domain-relevant concepts -- such as pedestrian proximity or adverse weather conditions -- under which a target VLM consistently behaves incorrectly. Identifying such failures requires searching over an exponentially large discrete combinatorial space. To address this challenge, REVELIO combines two search procedures: a diversity-aware beam search that efficiently maps the failure landscape, and a Gaussian-process Thompson Sampling strategy that enables broader exploration of complex failure modes. We apply REVELIO to autonomous driving and indoor robotics domains, uncovering previously unreported vulnerabilities in state-of-the-art VLMs. In driving environments, the models often demonstrate weak spatial grounding and fail to account for major obstructions, leading to recommendations that would result in simulated crashes. In indoor robotics tasks, VLMs either miss safety hazards or behave excessively conservatively, producing false alarms and reducing operational efficiency. By identifying structured and interpretable failure modes, REVELIO offers actionable insights that can support targeted VLM safety improvements.
COSMIC: Concurrent Optimization of Structure, Material, and Integrated Control for robotic systems
Replicating and surpassing the autonomy of natural organisms remains a long-standing goal in robotics. Yet most robotic systems have their structure, materials, and control designed separately, in sharp contrast to the co-evolution in nature. This separation often leads to suboptimal designs, and we still have a limited understanding of the individual and collective contributions of these design entities. In this work, we propose a gradient-based co-design framework that simultaneously optimizes the topology, material distribution, and control policy of a truss-lattice robot. The framework embeds mixed-type topological and material variables into a continuous design space and integrates a neural network controller within a differentiable simulator, capturing their interactions and enabling efficient gradient calculation via automatic differentiation. Furthermore, we develop a constrained optimization to navigate the highly non-convex design landscape and jointly optimize all design entities. Case studies demonstrate that the proposed framework consistently discovers diverse locomotion strategies that outperform baselines obtained through separated design. The framework is also flexible to accommodate different functional requirements and boundary conditions. Using this framework, we further extract design insights that reveal the individual and collective effects of different entities on robotic performance. The proposed framework provides a computational foundation for the autonomous co-design of robotic systems, capable of reconfiguration, locomotion, and other complex autonomous behaviors.
Multistep Belief Space Dynamics Learning For Risk-Aware Control
As autonomous vehicles move from a simplified research setting to practical use, there exists a large gap between the dynamic behavior of a human driver and that of an autonomous system. Risk-aware behavior needs to naturally develop in order to scale to the demands of the real world. A major issue for risk-aware planning and control has been predicting how dynamical uncertainty evolves through time and optimizing plans that account for this without being overly conservative. Here, we present a learning framework to predict distributional dynamics that can be optimized in real time for Model Predictive Control (MPC). We explore the importance of structure when learning distributional dynamics for use in MPC. A rigorous ablation study is conducted on a large dataset of real-world off-road driving that shows the impact of deviations from our proposed structure. Furthermore, we deploy our learned model and planning stack on a full-sized vehicle in challenging off-road conditions. Our planning architecture is able to naturally regulate the speed of the vehicle based on the environment and consistently demonstrates intelligent behavior over miles of diverse terrain.
Driving Intents Amplify Planning-Oriented Reinforcement Learning
Continuous-action policies trained on a single demonstrated trajectory per scene suffer from mode collapse: samples cluster around the demonstrated maneuver and the policy cannot represent semantically distinct alternatives. Under preference-based evaluation, this caps best-of-N performance -- even oracle selection cannot recover what the sampling distribution does not contain. We introduce DIAL, a two-stage Driving-Intent-Amplified reinforcement Learning framework for preference-aligned continuous-action driving policies. In the first stage, DIAL conditions the flow-matching action head on a discrete intent label with classifier-free guidance (CFG), which expands the sampling distribution along distinct maneuver modes and breaks single-demonstration mode collapse. In the second stage, DIAL carries this expanded distribution into preference RL through multi-intent GRPO, which spans all intent classes within every preference group and prevents fine-tuning from re-collapsing around the currently preferred mode. We instantiate DIAL for end-to-end driving with eight rule-derived intents and evaluate it on WOD-E2E. Competitive Vision-to-Action (VA) and Vision-Language-Action (VLA) Supervised Finetuning (SFT) baselines plateau below the human-driven demonstration at best-of-128, with the strongest prior (RAP) capping at Rater Feedback Score (RFS) 8.5 even with best-of-64; intent-CFG sampling lifts this ceiling to RFS 9.14 at best-of-128, surpassing both the prior best (RAP 8.5) and the human-driven demonstration (8.13) for the first time; and multi-intent GRPO improves held-out RFS from 7.681 to 8.211, while every single-intent baseline peaks lower and degrades by training end. These results suggest that the bottleneck of preference RL on continuous-action policies trained from demonstrations is not only how to update the policy, but also how to expand and preserve the sampling distribution being optimized.
comment: Work in progress. Project page: https://mind-omni.github.io/
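As a rough illustration of how classifier-free guidance can steer a flow-matching action head with a discrete intent, the sketch below blends conditional and unconditional velocities and integrates with a few Euler steps. It is a toy under stated assumptions, not the DIAL model: `toy_velocity`, the two intents, their 2D target points, and the guidance scale are all hypothetical stand-ins for a learned network.

```python
import numpy as np

def guided_velocity(v_cond, v_uncond, scale):
    """Standard CFG blend: scale=0 is unconditional, scale=1 is conditional."""
    return v_uncond + scale * (v_cond - v_uncond)

def sample_action(velocity_fn, intent, x0, scale, steps=2):
    """Euler integration of the guided flow from t=0 to t=1."""
    x = np.asarray(x0, dtype=float)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = guided_velocity(velocity_fn(x, t, intent), velocity_fn(x, t, None), scale)
        x = x + dt * v
    return x

# Hypothetical toy field: straight-line flow toward an intent-specific target.
TARGETS = {"left": np.array([1.0, 1.0]), "right": np.array([1.0, -1.0])}
MEAN_TARGET = np.array([1.0, 0.0])  # what an intent-agnostic model would average to

def toy_velocity(x, t, intent):
    target = MEAN_TARGET if intent is None else TARGETS[intent]
    return (target - x) / (1.0 - t)

x0 = np.zeros(2)
uncond = sample_action(toy_velocity, "left", x0, scale=0.0)
cond = sample_action(toy_velocity, "left", x0, scale=1.0)
amplified = sample_action(toy_velocity, "left", x0, scale=2.0)
```

In this toy, a scale above 1 pushes the sample further along the intent-specific direction than the conditional model alone, which is the sense in which guidance can widen the spread between maneuver modes.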
MindVLA-U1: VLA Beats VA with Unified Streaming Architecture for Autonomous Driving
Autonomous driving has progressed from modular pipelines toward end-to-end unification, and Vision-Language-Action (VLA) models are a natural extension of this journey beyond Vision-to-Action (VA). In practice, driving VLAs have often trailed VA on planning quality, suggesting that the difficulty is not simply model scale but the interface through which semantic reasoning, temporal context, and continuous control are combined. We argue that this gap reflects how VLA has been built -- as isolated subtask improvements that fail to compose into coherent driving capabilities -- rather than what VLA is. We present MindVLA-U1, the first unified streaming VLA architecture for autonomous driving. A unified VLM backbone produces autoregressive language tokens and flow-matching continuous action trajectories in a single forward pass over one shared representation, preserving the natural output form of each modality. A streaming design processes the driving video framewise rather than as fixed video-action chunks, while a learned memory channel carries temporal context across frames so planned trajectories evolve smoothly without redundant multi-frame VLM modeling. The unified architecture admits fast/slow execution on dense/sparse Mixture-of-Transformers (MoT) backbones via flexible self-attention context management, and exposes a measurable language-to-action route: a language-predicted driving intent steers action diffusion through classifier-free guidance (CFG), turning language-side intent into a control signal for continuous trajectory generation. On the long-tail WOD-E2E benchmark, MindVLA-U1 surpasses experienced human drivers for the first time (8.20 RFS vs. 8.13 GT RFS) with 2 diffusion steps, achieves state-of-the-art planning ADEs over prior VA/VLA methods by large margins, and matches VA-class throughput (16 FPS vs. RAP-DINO's 18 FPS) while preserving natural-language interfaces.
comment: Work in progress. Project page: https://mind-omni.github.io/
Action Emergence from Streaming Intent
We formalize action emergence as a target capability for end-to-end autonomous driving: the ability to generate physically feasible, semantically appropriate, and safety-compliant actions in arbitrary, long-tail traffic scenes through scene-conditioned reasoning rather than retrieval or interpolation of learned scene-action mappings. We show that previous paradigms cannot deliver action emergence: autoregressive trajectory decoders collapse the inherently multimodal future into a single averaged output, while diffusion and flow-matching generators express multimodality but are not steerable by reasoned intent. We propose Streaming Intent as a concrete way to approach action emergence: a mechanism that makes driving intent (i) semantically streamed through a continuous chain-of-thought that causally derives the intent from scene understanding, and (ii) temporally streamed across clips so that intent commitments remain coherent along the driving horizon. We realize Streaming Intent in a VLA model we call SI (Streaming Intent). SI autoregressively decodes a four-step chain-of-thought and emits an intent token; the decoded intent then drives classifier-free guidance (CFG) on a flow-matching action head, requiring only two denoising steps to generate the final trajectory. On the Waymo End-to-End benchmark, SI achieves competitive aggregate performance, with an RFS score of 7.96 on the validation set and 7.74 on the test set. Beyond aggregate metrics, the model demonstrates -- to our knowledge for the first time in a fully end-to-end VLA -- intent-faithful controllability: for a fixed scene, varying the intent class at inference yields qualitatively distinct yet consistently high-quality plans, arising purely from data-driven learning without any pre-built trajectory bank or hand-coded post-hoc selector.
comment: Work in progress. Project page: https://mind-omni.github.io/
Learning What Can Be Picked: Active Reachability Estimation for Efficient Robotic Fruit Harvesting
Agriculture remains a cornerstone of global health and economic sustainability, yet labor-intensive tasks such as harvesting high-value crops continue to face growing workforce shortages. Robotic harvesting systems offer a promising solution; however, their deployment in unstructured orchard environments is constrained by inefficient perception-to-action pipelines. In particular, existing approaches often rely on exhaustive inverse kinematics or motion planning to determine whether a target fruit is reachable, leading to unnecessary computation and delayed decision-making. Our approach combines RGB-D perception with active learning to directly learn reachability as a binary decision problem. We then leverage active learning to selectively query the most informative samples for reachability labeling, significantly reducing annotation effort while maintaining high predictive accuracy. Extensive experiments demonstrate that the proposed framework achieves accurate reachability prediction with substantially fewer labeled samples, yielding approximately 6--8% higher accuracy than random sampling and enabling label-efficient adaptation to new orchard configurations. Among the evaluated strategies, entropy- and margin-based sampling outperform Query-by-Committee and standard uncertainty sampling in low-label regimes, while all strategies converge to comparable performance as the labeled set grows. These results highlight the effectiveness of active learning for task-level perception in agricultural robotics and position our approach as a scalable alternative to computation-heavy kinematic reachability analysis. Our code is available through https://github.com/wsu-cyber-security-lab-ai/active-learning.
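The entropy- and margin-based query strategies mentioned above can be sketched generically as follows. This is a minimal illustration of the two acquisition rules, not the released code: the function names and the example probabilities are hypothetical, and the classifier producing `probs` is assumed to exist elsewhere.

```python
import numpy as np

def entropy_scores(probs):
    """Predictive entropy per sample; higher means more uncertain."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=1)

def margin_scores(probs):
    """Gap between the two most likely classes; lower means more uncertain."""
    p = np.sort(np.asarray(probs, dtype=float), axis=1)
    return p[:, -1] - p[:, -2]

def select_queries(probs, k, strategy="entropy"):
    """Return indices of the k unlabeled samples to send for reachability labeling."""
    if strategy == "entropy":
        order = np.argsort(-entropy_scores(probs))
    elif strategy == "margin":
        order = np.argsort(margin_scores(probs))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return order[:k].tolist()

# Hypothetical [unreachable, reachable] probabilities for four candidate fruits.
probs = np.array([[0.95, 0.05], [0.55, 0.45], [0.20, 0.80], [0.50, 0.50]])
entropy_pick = select_queries(probs, k=2, strategy="entropy")
margin_pick = select_queries(probs, k=2, strategy="margin")
```

Both rules prioritize the candidates whose predicted reachability is closest to a coin flip, so labeling effort goes to the samples the current model is least sure about.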
DreamPolicy: A Unified World-model Policy for Scalable Humanoid Locomotion
Achieving versatile humanoid locomotion with a single policy presents a critical scalability challenge. Prevailing methods often rely on distilling multiple terrain-specific teacher policies into a unified student policy. However, while such distillation captures basic locomotion primitives, it struggles to organically compose these skills to adapt to complex environments, resulting in poor generalization to novel composite terrains unseen during training. To overcome this, we present DreamPolicy, a unified framework that integrates offline data with a diffusion-based world model, enabling a single policy to master both known and unseen terrains. Central to our approach is a terrain-aware world model, driven by an autoregressive diffusion world model trained on aggregated rollouts from specialized policies. This model synthesizes physically plausible future trajectories, which serve as dynamic objectives for a conditioned policy, thereby bypassing manual reward engineering. Unlike distillation, our world model captures generalizable locomotion skills, allowing for robust zero-shot transfer to unseen composite terrains. DreamPolicy naturally scales with data availability. As the offline dataset expands, the diffusion world model continuously acquires richer skills. Experiments demonstrate that DreamPolicy outperforms the strongest baseline by up to 27% on unseen terrains and 38% on combined terrains. By unifying world model-based planning and policy learning, DreamPolicy breaks the "one task, one policy" bottleneck and establishes a scalable, data-driven paradigm for generalist humanoid control.
Certified Gradient-Based Contact-Rich Manipulation via Smoothing-Error Reachable Tubes
Gradient-based methods can efficiently optimize controllers by leveraging differentiable simulation and physical priors. However, contact-rich manipulation remains challenging because hybrid contact dynamics often produce discontinuous or vanishing gradients. Although smoothing the dynamics can restore informative gradients, the resulting model mismatch can cause controller failures when deployed on real systems. We address this trade-off by planning with smoothed dynamics while explicitly quantifying and compensating for the induced error, providing formal guarantees on safety and task completion under the original nonsmooth dynamics. Our approach applies smoothing to both contact dynamics and contact geometry within a differentiable simulator based on convex optimization, allowing us to characterize the deviation from the nonsmooth dynamics as a set-valued discrepancy. We incorporate this discrepancy into the optimization of time-varying affine feedback policies through analytical reachable sets, enabling robust constraint satisfaction for the closed-loop hybrid system while relying solely on the informative gradients of the smoothed model. By bridging differentiable simulation with set-valued robust control, our method produces affine feedback policies that respect the unilateral nature of contact. We evaluate our method on several contact-rich tasks, including planar pushing, object rotation, and in-hand dexterous manipulation, achieving certified constraint satisfaction with lower safety violations and smaller goal errors than baseline approaches.
comment: Robotics: Science & Systems (RSS) 2026
Simulation-Ready Cluttered Scene Estimation via Physics-aware Joint Shape and Pose Optimization
Estimating simulation-ready scenes from real-world observations is crucial for downstream planning and policy learning tasks. Unfortunately, existing methods struggle in cluttered environments, often exhibiting prohibitive computational cost, poor robustness, and restricted generality when scaling to multiple interacting objects. We propose a unified optimization-based formulation for real-to-sim scene estimation that jointly recovers the shapes and poses of multiple rigid objects under physical constraints. Our method is built on two key technical innovations. First, we leverage the recently introduced shape-differentiable contact model, whose global differentiability permits joint optimization over object geometry and pose while modeling inter-object contacts. Second, we exploit the structured sparsity of the augmented Lagrangian Hessian to derive an efficient linear system solver whose computational cost scales favorably with scene complexity. Building on this formulation, we develop an end-to-end Simulation-ready Physics-Aware Reconstruction for Cluttered Scenes (SPARCS) pipeline, which integrates learning-based object initialization, physics-constrained joint shape-pose optimization, and differentiable texture refinement. Experiments on cluttered scenes with up to 5 objects and 22 convex hulls demonstrate that our approach robustly reconstructs physically valid, simulation-ready object shapes and poses. Project webpage: https://rory-weicheng.github.io/SPARCS/.
comment: Accepted to RSS 2026, camera-ready version; 17 pages, 15 figures
Failing Forward: Adaptive Failure-Informed Learning for Vision-Language-Action Models
Vision-language-action (VLA) models provide a promising paradigm for scalable robotic manipulation, yet their reliance on success-only behavioral cloning leaves them brittle; lacking corrective training signals, minor execution errors rapidly compound into unrecoverable, out-of-distribution failures. To address this limitation, we propose Adaptive Failure-Informed Learning (AFIL), an end-to-end framework that leverages failure trajectories as adaptive negative guidance for diffusion- and flow-based VLA policies. AFIL uses a pretrained VLA to generate failure rollouts online, avoiding the need for handcrafted failure-mode design or human-in-the-loop recovery. It then jointly trains Dual Action Generators (DAGs) for successful and failed behaviors while sharing a common vision-language backbone, enabling efficient failure-aware policy learning with limited parameter overhead. During sampling, the failure generator adaptively steers action generation away from failure-prone regions and toward more reliable success modes, with guidance strength determined by the per-diffusion-step distance between success and failure distributions. Experiments across in-domain and out-of-domain robotic manipulation tasks, covering both short- and long-horizon settings, show that AFIL consistently improves task success rates and robustness over existing VLA baselines, demonstrating its effectiveness, efficiency, and generality.
Simulation Distillation: Pretraining World Models in Simulation for Rapid Real-World Adaptation
Robot learning requires adaptation methods that improve reliably from limited, mixed-quality interaction data. This is especially challenging in long-horizon, contact-rich tasks, where end-to-end policy finetuning remains inefficient and brittle. World models offer a compelling alternative: by predicting the outcomes of candidate action sequences, they enable online planning through counterfactual reasoning. However, training action-conditioned robotic world models directly in the real world requires diverse data at impractical scale. We introduce Simulation Distillation (SimDist), a framework that uses physics simulators as a scalable source of action-conditioned robot experience. During pretraining, SimDist distills structural priors from the simulator into a world model that enables planning from raw real-world observations. During real-world adaptation, SimDist transfers the encoder, reward model, and value function learned in simulation, and updates only the latent dynamics model using real-world prediction losses. This reduces adaptation to supervised system identification while preserving dense, long-horizon planning signals for online improvement. Across contact-rich manipulation and quadruped locomotion tasks, SimDist rapidly improves with experience, while prior adaptation methods struggle to make progress or degrade during online finetuning. Project website and code: https://sim-dist.github.io
comment: Robotics: Science and Systems 2026
Trajectory First: A Curriculum for Discovering Diverse Policies
Being able to solve a task in diverse ways makes agents more robust to task variations and less prone to local optima. In this context, constrained diversity optimization has become a useful reinforcement learning (RL) framework for training a set of diverse agents in parallel. However, existing constrained-diversity RL methods often under-explore in complex tasks such as robot manipulation, resulting in limited behavioral diversity. We address this with a two-stage curriculum that introduces a spline-based trajectory prior as an inductive bias to produce diverse, high-reward behaviors in an initial stage, and then distills these behaviors into reactive, step-wise policies in a second stage. In our empirical evaluation, we provide novel insights into challenges of diversity-targeted training and show that our curriculum increases the diversity of learned skills while maintaining high task performance.
comment: Accepted into the Inductive Biases in Reinforcement Learning Workshop at RLC 2025
Tacmap: Bridging the Tactile Sim-to-Real Gap via Geometry-Consistent Penetration Depth Map
Vision-Based Tactile Sensors (VBTS) are essential for achieving dexterous robotic manipulation, yet the tactile sim-to-real gap remains a fundamental bottleneck. Current tactile simulations suffer from a persistent dilemma: simplified geometric projections lack physical authenticity, while high-fidelity Finite Element Methods (FEM) are too computationally prohibitive for large-scale reinforcement learning. In this work, we present Tacmap, a high-fidelity, computationally efficient tactile simulation framework anchored in volumetric penetration depth. Our key insight is to bridge the tactile sim-to-real gap by unifying both domains through a shared deform map representation. Specifically, we compute 3D intersection volumes as depth maps in simulation, while in the real world, we employ an automated data-collection rig to learn a robust mapping from raw tactile images to ground-truth depth maps. By aligning simulation and real-world in this unified geometric space, Tacmap minimizes domain shift while maintaining physical consistency. Quantitative evaluations across diverse contact scenarios demonstrate that Tacmap's deform maps closely mirror real-world measurements. Moreover, we validate the utility of Tacmap through an in-hand rotation task, where a policy trained exclusively in simulation achieves zero-shot transfer to a physical robot.
comment: 8 pages
IMPACT: An Implicit Active-Set Augmented Lagrangian for Fast Contact-Implicit Trajectory Optimization
Contact-implicit trajectory optimization (CITO) has attracted growing attention as a unified framework for planning and control in contact-rich robotic tasks. Recent approaches have demonstrated promising results in manipulation and locomotion without requiring a prescribed contact-mode schedule. However, the underlying mathematical programs with complementarity constraints (MPCCs) are well known to be numerically ill-conditioned, and systematic, scalable solution strategies for CITO remain an active area of research. More efficient and principled solvers that can handle contact constraints are therefore essential to broaden the applicability of CITO. In this work, we develop an augmented-Lagrangian approach for solving MPCC-based CITO with stationarity guarantees. The method can be interpreted as identifying the implicit contact-mode branches on the fly during the trajectory optimization (TO) iterations; we call this approach IMPACT (IMPlicit contact ACtive-set Trajectory optimization). We provide an efficient C++ implementation tailored to trajectory-optimization workloads and evaluate it on the open-source CITO and contact-implicit model predictive control (CI-MPC) benchmarks. On CITO, IMPACT achieves 2.9x-70x speedups over strong baselines (geometric mean 13.8x). On CI-MPC, we show improved control quality for contact-rich trajectories on dexterous manipulation tasks in simulation. Finally, we demonstrate the proposed method on real robotic hardware on a T-shaped object pushing task.
comment: Accepted to Robotics: Science and Systems (RSS), 2026
Modular Lie Algebraic PDE Control of Multibody Flexible Manipulators
This paper presents a subsystem-based adaptive control framework for serial flexible manipulators with an arbitrary number of links, in which the elastic deformation PDE of each link is carried through the entire control design without spatial discretization or modal truncation. All dynamic quantities -- rigid-body motion, elastic deformation, and inter-link constraint forces -- are expressed uniformly as body-fixed twists and wrenches within the se(3) Lie-algebraic structure. A controllable form of the per-link dynamics is derived by substituting the strain-based deformation PDE into the dynamic equation, eliminating distributed elastic acceleration and yielding a model governed by the body-fixed twist acceleration and deformation field. Desired subsystem twist trajectories are generated via a deflection-compensating inverse kinematics procedure. A nominal per-link controller is proven to produce exponential twist error decay via a per-subsystem Lyapunov function. An adaptive modification replaces exact physical parameters with online estimates governed by a projection-based law, augmenting the Lyapunov function with a parameter estimation error term. Upon summing over all links, the interaction power terms telescope to zero by Newton's third law and the frame invariance of the natural power pairing on se(3)* × se(3), establishing exponential convergence of all twist errors and bounded elastic deformation under both nominal and adaptive controllers. The screw-theoretic structure renders interaction term cancellation exact, making the stability certificate modular and scalable to chains of arbitrary length. The framework is validated numerically on a two-link flexible manipulator in three-dimensional motion.
Interpreting Context-Aware Human Preferences for Multi-Objective Robot Navigation
Robots operating in human-shared environments must not only achieve task-level navigation objectives such as safety and efficiency, but also adapt their behavior to human preferences. However, as human preferences are typically expressed in natural language and depend on environmental context, it is difficult to directly integrate them into low-level robot control policies. In this work, we present a pipeline that enables robots to understand and apply context-dependent navigation preferences by combining foundation models with a Multi-Objective Reinforcement Learning (MORL) navigation policy. Thus, our approach integrates high-level semantic reasoning with low-level motion control. A Vision-Language Model (VLM) extracts structured environmental context from onboard visual observations, while a Large Language Model (LLM) converts natural language user feedback into interpretable, context-dependent behavioral rules stored in a persistent but updatable rule memory. A preference translation module then maps contextual information and stored rules into numerical preference vectors that parameterize a pretrained MORL policy for real-time navigation adaptation. We evaluate the proposed framework through quantitative component-level evaluations, a user study, and real-world robot deployments in various indoor environments. Our results demonstrate that the system reliably captures user intent, generates consistent preference vectors, and enables controllable behavior adaptation across diverse contexts. Overall, the proposed pipeline improves the adaptability, transparency, and usability of robots operating in shared human environments, while maintaining safe and responsive real-time control.
IGV-RRT: Prior-Real-Time Observation Fusion for Active Object Search in Changing Environments
Object Goal Navigation (ObjectNav) in temporally changing indoor environments is challenging because object relocation can invalidate historical scene knowledge. To address this issue, we propose a probabilistic planning framework that combines uncertainty-aware scene priors with online target relevance estimates derived from a Vision Language Model (VLM). The framework contains a dual-layer semantic mapping module and a real-time planner. The mapping module includes an Information Gain Map (IGM) built from a 3D scene graph (3DSG) during prior exploration to model object co-occurrence relations and provide global guidance on likely target regions. It also maintains a VLM score map (VLM-SM) that fuses confidence-weighted semantic observations into the map for local validation of the current scene. Based on these two cues, we develop a planner that jointly exploits information gain and semantic evidence for online decision making. The planner biases tree expansion toward semantically salient regions with high prior likelihood and strong online relevance (IGV-RRT), while preserving kinematic feasibility through gradient-based analysis. Simulation and real-world experiments demonstrate that the proposed method effectively mitigates the impact of object rearrangement, achieving higher search efficiency and success rates than representative baselines in complex indoor environments.
Retrieve-then-Steer: Online Success Memory for Test-Time Adaptation of Generative VLAs
Vision-Language-Action (VLA) models show strong potential for general-purpose robotic manipulation, yet their closed-loop reliability often degrades under local deployment conditions. Existing evaluations typically treat test episodes as independent zero-shot trials. However, real robots often operate repeatedly in the same or slowly changing environments, where successful executions provide environment-verified evidence of reliable behavior patterns. We study this persistent-deployment setting, asking whether a partially competent frozen VLA can improve its reliability by reusing its successful test-time experience. We propose an online success-memory guided test-time adaptation framework for generative VLAs. During deployment, the robot stores progress-calibrated successful observation-action segments in a long-term memory. At inference, it retrieves state-relevant action chunks, filters inconsistent candidates via trajectory-level consistency, and aggregates them into an elite action prior. To incorporate this prior into action generation, we introduce confidence-adaptive prior guidance, which injects the elite prior into an intermediate state of the flow-matching action sampler and adjusts the guidance strength based on retrieval confidence. This design allows the frozen VLA to exploit environment-specific successful experience while preserving observation-conditioned generative refinement. This retrieve-then-steer mechanism enables lightweight, non-parametric test-time adaptation without requiring parameter updates. Simulation and real-world experiments show improved task success and closed-loop stability, especially in long-horizon and multi-stage tasks.
CoRAL: Contact-Rich Adaptive LLM-based Control for Robotic Manipulation
While Large Language Models (LLMs) and Vision-Language Models (VLMs) demonstrate remarkable capabilities in high-level reasoning and semantic understanding, applying them directly to contact-rich manipulation remains a challenge due to their lack of explicit physical grounding and inability to perform adaptive control. To bridge this gap, we propose CoRAL (Contact-Rich Adaptive LLM-based control), a modular framework that enables zero-shot planning by decoupling high-level reasoning from low-level control. Unlike black-box policies, CoRAL uses LLMs not as direct controllers, but as cost designers that synthesize context-aware objective functions for a sampling-based motion planner (MPPI). To address the ambiguity of physical parameters in visual data, we introduce a neuro-symbolic adaptation loop: a VLM provides semantic priors for environmental dynamics, such as mass and friction estimates, which are then explicitly refined in real time via online system identification, while the LLM iteratively modulates the cost-function structure to correct strategic errors based on interaction feedback. Furthermore, a retrieval-based memory unit allows the system to reuse successful strategies across recurrent tasks. This hierarchical architecture ensures real-time control stability by decoupling high-level semantic reasoning from reactive execution, effectively bridging the gap between slow LLM inference and dynamic contact requirements. We validate CoRAL on both simulation and real-world hardware across challenging and novel tasks, such as flipping objects against walls by leveraging extrinsic contacts. Experiments demonstrate that CoRAL outperforms state-of-the-art VLA and foundation-model-based planner baselines by boosting success rates over 50% on average in unseen contact-rich scenarios, effectively handling sim-to-real gaps through its adaptive physical understanding.
comment: 22 pages, 9 figures, 3 tables. Accepted to Robotics: Science and Systems (RSS) 2026. Updated to camera-ready version with appendix and text/formatting revisions
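The abstract above names MPPI as the sampling-based planner that consumes the synthesized cost. As a rough illustration only, the following sketch shows a generic MPPI update on a one-dimensional single integrator. The dynamics, the `cost_fn` signature, and all parameter values are assumptions made for this example and are not taken from CoRAL; in the paper's setting the cost would be the LLM-designed objective.

```python
import math
import random


def rollout_cost(x0, controls, cost_fn, dt=0.1):
    """Accumulated cost of applying `controls` to a toy model x <- x + u * dt."""
    x, total = x0, 0.0
    for u in controls:
        x = x + u * dt
        total += cost_fn(x, u)
    return total


def mppi_update(x0, nominal, cost_fn, samples=256, sigma=0.5,
                temperature=1.0, rng=random):
    """One MPPI iteration: perturb the nominal plan, then average the
    perturbations with weights exp(-cost / temperature)."""
    noises = [[rng.gauss(0.0, sigma) for _ in nominal] for _ in range(samples)]
    costs = [
        rollout_cost(x0, [u + e for u, e in zip(nominal, eps)], cost_fn)
        for eps in noises
    ]
    best = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(weights)
    return [
        u + sum(w * eps[t] for w, eps in zip(weights, noises)) / total
        for t, u in enumerate(nominal)
    ]


if __name__ == "__main__":
    # Hypothetical quadratic cost standing in for a synthesized objective.
    cost = lambda x, u: (x - 1.0) ** 2 + 0.01 * u * u
    rng = random.Random(0)
    plan = [0.0] * 10
    for _ in range(30):
        plan = mppi_update(0.0, plan, cost, rng=rng)
    print(rollout_cost(0.0, plan, cost))
```

Because only `cost_fn` changes between tasks, swapping the objective leaves the sampling loop untouched, which is one way to read the decoupling of reasoning from control described in the abstract.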
DarkQA: Benchmarking Vision-Language Models on Visual-Primitive Question Answering in Low-Light Indoor Scenes
Vision Language Models (VLMs) are increasingly adopted as central reasoning modules for embodied agents. Existing benchmarks evaluate their capabilities under ideal, well-lit conditions, yet robust 24/7 operation demands performance under a wide range of visual degradations, including low-light conditions at night or in dark environments, a core necessity that has been largely overlooked. To address this underexplored challenge, we present DarkQA, an open-source benchmark for evaluating perceptual primitives under multi-level low-light conditions in embodied scenarios. DarkQA evaluates single-view egocentric observations across controlled degradation levels, isolating low-light perceptual failures before they are entangled with complex embodied tasks. The benchmark contains 9.4K deterministically generated and verifiable question-image pairs spanning five visual-primitive families. A key design feature of DarkQA is its physical fidelity: visual degradations are modeled in linear RAW space, simulating physics-based illumination drop and sensor noise followed by an ISP-inspired rendering pipeline; we further validate the synthesis against real paired low-light camera data. We demonstrate the utility of DarkQA by evaluating a wide range of state-of-the-art VLMs and Low-Light Image Enhancement (LLIE) preprocessing methods. Results show consistent VLM degradation under low illumination and sensor noise, while LLIE provides severity-dependent but unstable recovery, systematically revealing VLMs' limitations when operating under these challenging visual conditions. Our code and benchmark dataset will be released upon acceptance. Project website: https://darkqa-benchmark.github.io
comment: This work has been submitted to the IEEE for possible publication
Constraint-Aware Diffusion Priors for High-Fidelity and Versatile Quadruped Locomotion
Reinforcement learning combined with imitation learning has significantly advanced biomimetic quadrupedal locomotion. However, scaling these frameworks to massive, multi-source datasets exposes fundamental bottlenecks. First, traditional GAN-based discriminators are prone to mode collapse, struggling to capture diverse motion distributions from uncurated datasets. Second, existing kinematic priors suffer from out-of-distribution (OOD) tracking conflicts, leading to severe unintended heading drifts during complex maneuvers. Furthermore, deploying unconstrained priors to physical hardware poses critical safety risks by disregarding actuator dynamics. To overcome these challenges, we propose Diff-CAST (Diffusion-guided Constraint-Aware Symmetric Tracking), a novel motion prior framework leveraging the multi-modal distribution modeling capabilities of diffusion models for stylistic rewards. Diff-CAST effectively replaces traditional GAN discriminators, unlocking robust data scaling on heterogeneous collections. To ensure high-fidelity intent execution and reliable real-world deployment, we introduce a comprehensive Sim2Real architecture integrating Symmetric Augmented Command Conditioning (SACC) for drift-free tracking, and Constrained RL for hardware safety. Experiments on a quadruped demonstrate that Diff-CAST mitigates mode collapse, enables seamless transitions between diverse skills, and ensures robust, hardware-compliant locomotion.
Active inference as a unified model of collision avoidance behavior in human drivers
Collision avoidance -- involving rapid threat detection and quick execution of the appropriate evasive maneuver -- is a critical aspect of driving. However, existing models of human collision avoidance behavior are fragmented, focusing on specific scenarios or only describing certain aspects of the avoidance behavior, such as response times. This paper addresses these gaps by proposing a novel computational cognitive model of human collision avoidance behavior based on active inference. Active inference provides a unified approach to modeling human behavior: the minimization of free energy. Building on prior active inference work, our model incorporates established cognitive mechanisms such as evidence accumulation to simulate human responses in two distinct collision avoidance scenarios: front-to-rear lead vehicle braking and lateral incursion by an oncoming vehicle. We demonstrate that our model explains a wide range of previous empirical findings on human collision avoidance behavior. Specifically, the model closely reproduces both aggregate results from meta-analyses previously reported in the literature and detailed, scenario-specific effects observed in a recent driving simulator study, including response timing, maneuver selection, and execution. Our results highlight the potential of active inference as a unified framework for understanding and modeling human behavior in complex real-life driving tasks.
SOAR: Regression-based LiDAR Relocalization for UAVs
Regression-based LiDAR relocalization has recently emerged as a promising solution for high-precision positioning in GNSS-denied environments. However, these methods are primarily tailored to autonomous driving, exhibiting significantly degraded accuracy in unmanned aerial vehicle (UAV) scenarios due to arbitrary pose variations and irregular flight paths. In this paper, we propose SOAR, a regression-based LiDAR relocalization framework for UAVs. Specifically, we introduce a locality-preserving sliding window attention module with locally invariant positional encoding to capture discriminative geometric structures robust to viewpoint changes. A coordinate-independent feature initialization module is further designed to eliminate sensitivity to global transformations. Furthermore, most existing UAV datasets are ill-suited to evaluating LiDAR relocalization in the real world because they lack synchronized LiDAR scans, accurate 6-DoF poses, or multiple traversals. Thus, we construct a large-scale UAV LiDAR localization dataset with 4 scenes and 13 irregular paths exhibiting rotation and altitude variations, providing a more realistic benchmark for UAVs. Extensive experiments demonstrate that our method achieves state-of-the-art performance, improving the localization success rate by 40% and reducing mean error by over 10 m on UAVLoc. Our code and dataset will be released soon.
comment: 24 pages, 14 figures
Efficient Emotion-Aware Iconic Gesture Prediction for Robot Co-Speech
Co-speech gestures increase engagement and improve speech understanding. Most data-driven robot systems generate rhythmic beat-like motion, yet few integrate semantic emphasis. To address this, we propose a lightweight transformer that derives iconic gesture placement and intensity from text and emotion alone, requiring no audio input at inference time. The model outperforms GPT-4o in both semantic gesture placement classification and intensity regression on the BEAT2 dataset, while remaining computationally compact and suitable for real-time deployment on embodied agents.
SAGAS: Semantic-Aware Graph-Assisted Stitching for Offline Temporal Logic Planning
Linear Temporal Logic (LTL) provides a rigorous framework for specifying long-horizon robotic tasks, yet existing approaches face a trade-off: model-based synthesis relies on accurate labeled transition systems, whereas learning-based methods often require online interaction, task-specific rewards, or specification-conditioned training. We study LTL-specified robotic planning and execution in a stricter offline, model-free setting, where the agent is given only fixed, task-agnostic trajectory fragments, with no dynamics model, task demonstrations, or online data collection. To address this setting, we propose SAGAS, a framework that combines the compositionality of symbolic synthesis with the data-driven reachability structure learned from offline trajectories. SAGAS first learns a reusable latent reachability graph and a frozen goal-conditioned executor from fragmented offline data. For each new LTL formula, it performs task-time semantic graph augmentation to ground state-defined propositions on the learned graph, and applies Büchi product search to synthesize a cost-aware accepting prefix--suffix waypoint plan executed by the frozen executor. By shifting formula-specific reasoning from policy learning to test-time graph augmentation and symbolic search, SAGAS enables zero-shot generalization to unseen, data-supported LTL specifications without task-specific reward design, policy retraining, or online interaction. Experiments on LTL task suites constructed from OGBench locomotion domains show that this design produces executable and cost-efficient prefix--suffix behaviors for diverse unseen LTL tasks from fragmented offline data.
A comprehensive control architecture for semi-autonomous dual-arm robots in agriculture settings
The adoption of mobile robotic platforms in complex environments, such as agricultural settings, requires these systems to exhibit a flexible yet effective architecture that integrates perception and control. In such scenarios, several tasks need to be accomplished simultaneously, ranging from managing robot limits to performing operational tasks and handling human inputs. The purpose of this paper is to present a comprehensive control architecture for achieving complex tasks such as robotized harvesting in vineyards within the framework of the European project CANOPIES. In detail, a 16-DOF dual-arm mobile robot is employed, controlled via a Hierarchical Quadratic Programming (HQP) approach capable of handling both equality and inequality constraints at various priorities to harvest grape bunches selected by the perception system developed within the project. Furthermore, given the complexity of the scenario and the uncertainty in the perception system, which could potentially lead to collisions with the environment, the handling of interaction forces is necessary. Remarkably, this was achieved using the same HQP framework. This feature is further leveraged to enable semi-autonomous operations, allowing a human operator to assist the robotic counterpart in completing harvesting tasks. Finally, the obtained results are validated through extensive testing conducted first in a laboratory environment to prove individual functionalities, then in a real vineyard, encompassing both autonomous and semi-autonomous grape harvesting operations.
HeteroGenManip: Generalizable Manipulation For Heterogeneous Object Interactions
Generalizable manipulation involving cross-type object interactions is a critical yet challenging capability in robotics. To reliably accomplish such tasks, robots must address two fundamental challenges: "where to manipulate" (contact point localization) and "how to manipulate" (subsequent interaction trajectory planning). Existing foundation-model-based approaches often adopt end-to-end learning that obscures the distinction between these stages, exacerbating error accumulation in long-horizon tasks. Furthermore, they typically rely on a single uniform model, which fails to capture the diverse, category-specific features required for heterogeneous objects. To overcome these limitations, we propose HeteroGenManip, a task-conditioned, two-stage framework designed to decouple the initial grasp from complex interaction execution. First, a Foundation-Correspondence-Guided Grasp module leverages structural priors to align the initial contact state, thereby significantly reducing the pose uncertainty of grasping. Subsequently, a Multi-Foundation-Model Diffusion Policy (MFMDP) routes objects to category-specialized foundation models, integrating fine-grained geometric information with highly variable part features via a dual-stream cross-attention mechanism. Experimental evaluations demonstrate that HeteroGenManip achieves robust intra-category shape and pose generalization. The framework achieves an average 31% performance improvement in simulation tasks with broad type setting, alongside a 36.7% gain across four real-world tasks with different interaction types.
STL-Based Motion Planning and Uncertainty-Aware Risk Analysis for Human-Robot Collaboration with a Multi-Rotor Aerial Vehicle
This paper presents a motion planning and risk analysis framework for enhancing human-robot collaboration with a Multi-Rotor Aerial Vehicle. The proposed method employs Signal Temporal Logic to encode key mission objectives, including safety, temporal requirements, and human preferences, with particular emphasis on ergonomics and comfort. An optimization-based planner generates dynamically feasible trajectories while explicitly accounting for the vehicle's nonlinear dynamics and actuation constraints. To address the resulting non-convex and non-smooth optimization problem, smooth robustness approximations and gradient-based techniques are adopted. In addition, an uncertainty-aware risk analysis is introduced to quantify the likelihood of specification violations under human-pose uncertainty. A robustness-aware event-triggered replanning strategy further enables online recovery from disturbances and unforeseen events by preserving safety margins during execution. The framework is validated through MATLAB and Gazebo simulations on an object handover task inspired by power line maintenance scenarios. Results demonstrate the ability of the proposed method to achieve safe, efficient, and resilient human-robot collaboration under realistic operating conditions.
comment: 46 pages, 14 figures
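The abstract above mentions smooth robustness approximations that make the STL objective amenable to gradient-based optimization. The sketch below shows one commonly used choice, a log-sum-exp surrogate for min and max. Whether the paper uses this particular approximation is an assumption, and the function names, the sharpness parameter `k`, and the example specification are hypothetical.

```python
import math


def smooth_min(values, k=10.0):
    """Log-sum-exp lower bound on min(values); the gap is at most log(n) / k."""
    m = min(values)  # shift by the minimum for numerical stability
    return m - math.log(sum(math.exp(-k * (v - m)) for v in values)) / k


def smooth_max(values, k=10.0):
    """Log-sum-exp upper bound on max(values); the gap is at most log(n) / k."""
    m = max(values)
    return m + math.log(sum(math.exp(k * (v - m)) for v in values)) / k


def always_greater(signal, threshold, k=10.0):
    """Smooth robustness of 'always (signal > threshold)' over a finite trace."""
    return smooth_min([x - threshold for x in signal], k)


def eventually_greater(signal, threshold, k=10.0):
    """Smooth robustness of 'eventually (signal > threshold)' over a finite trace."""
    return smooth_max([x - threshold for x in signal], k)


if __name__ == "__main__":
    # Hypothetical human-vehicle distances in metres with a 0.5 m safety margin.
    distances = [1.2, 0.8, 1.5]
    print(always_greater(distances, 0.5, k=50.0))
```

Since the smooth minimum never exceeds the true minimum, a positive value of `always_greater` still implies the exact "always" specification is satisfied on the trace. The smooth maximum errs in the opposite direction, so "eventually" terms need a margin of `log(n) / k`.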
BEACON: Cross-Domain Co-Training of Generative Robot Policies via Best-Effort Adaptation
We introduce BEACON--Best-Effort Adaptation for Cross-Domain Co-Training--a theory-driven framework for training generative robot policies with abundant source demonstrations and limited target demonstrations. BEACON casts cross-domain co-training as a discrepancy-aware importance-reweighting problem, jointly learning a diffusion-based visuomotor policy and per-sample source weights that minimize an objective informed by target-domain generalization guarantees. To make best-effort adaptation practical for high-dimensional sequence policies, we develop scalable instance-level discrepancy estimators, stochastic alternating updates for policy and weights, and a multi-source extension that balances heterogeneous source domains. Across sim-to-sim, sim-to-real, and multi-source manipulation settings, BEACON improves robustness and data efficiency over target-only, fixed-ratio co-training, and feature-alignment baselines. Importantly, even without an explicit alignment objective, BEACON achieves feature alignment as an implicit result of discrepancy-aware cross-domain co-training.
Priority-Driven Control and Communication in Decentralized Multi-Agent Systems via Reinforcement Learning
Event-triggered control provides a mechanism for avoiding excessive use of constrained communication bandwidth in networked multi-agent systems. However, most existing methods rely on accurate system models, which may be unavailable in practice. In this work, we propose a model-free, priority-driven reinforcement learning algorithm that learns communication priorities and control policies jointly from data in decentralized multi-agent systems. By learning communication priorities, we circumvent the hybrid action space typical in event-triggered control with binary communication decisions. We evaluate our algorithm on benchmark tasks and demonstrate that it outperforms the baseline method.
comment: Accepted to the 23rd IFAC World Congress
SHIELD: Scalable Optimal Control with Certification using Duality and Convexity
We present SHIELD, a hierarchical algorithm that reduces both the decision-variable dimension and the constraint set in $\ell_1$-regularized convex programs. From strong convexity and Lagrangian duality, we derive certificates that \emph{safely} discard constraints and decision variables while guaranteeing that all removed constraints remain satisfied and all removed variables are null. To further accelerate the proposed algorithm, we propose a transformer-based deep neural network to guide the dual certificate inference. We validate SHIELD on stochastic model predictive control (SMPC) in complex, multi-modal traffic scenarios, comparing against a full-dimensional SMPC policy. Numerical simulations demonstrate order-of-magnitude computational speedups while preserving feasibility and closed-loop safety, highlighting the practicality of certifiably safe, lightweight MPC in complex driving scenes.
Action Hallucination in Generative Vision-Language-Action Models
Robot Foundation Models, such as VLAs, promise end-to-end generative robot policies with broad generalization. Yet it remains unclear whether they fundamentally resolve the core problem of action generation in embodied settings, or overcome the long-standing challenges of robotics. We address this question by analyzing action hallucinations that violate physical constraints and their extension to plan-level failures. Focusing on latent-variable generative policies, we show that hallucinations can arise from structural mismatches between feasible robot behavior and common model architectures. We study three such barriers -- topological, precision, and horizon -- and show how they impose unavoidable tradeoffs. Our analysis provides mechanistic explanations for reported empirical failures of generative robot policies and suggests principled directions for improving reliability and trustworthiness, without abandoning their expressive power.
comment: 24 pages; updated setup with minor changes to proofs. changed template
Feasible Force Set Shaping for a Payload-Carrying Platform Consisting of Tiltable Multiple UAVs Connected Via Passive Hinge Joints
This paper presents a method for shaping the feasible force set of a payload-carrying platform composed of multiple Unmanned Aerial Vehicles (UAVs) and proposes a control law that leverages the advantages of this shaped force set. The UAVs are connected to the payload through passively rotatable hinge joints. The joint angles are controlled by the differential thrust produced by the rotors, while the total force generated by all the rotors is responsible for controlling the payload. The shape of the set of the total force depends on the tilt angles of the UAVs, which allows us to shape the feasible force set by adjusting these tilt angles. This paper aims to ensure that the feasible force set encompasses the required shape, enabling the platform to generate force redundantly, that is, in various directions. We then propose a control law that takes advantage of this redundancy.
comment: This work has been accepted to IFAC for publication under a Creative Commons Licence CC-BY-NC-ND
ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving
We introduce ReflectDrive-2, a masked discrete diffusion planner with a separate action expert for autonomous driving that represents plans as discrete trajectory tokens and generates them through parallel masked decoding. This discrete token space enables in-place trajectory revision: AutoEdit rewrites selected tokens using the same model, without requiring an auxiliary refinement network. To train this capability, we use a two-stage procedure. First, we construct structure-aware perturbations of expert trajectories along longitudinal progress and lateral heading directions and supervise the model to recover the original expert trajectory. We then fine-tune the full decision--draft--reflect rollout with reinforcement learning (RL), assigning terminal driving reward to the final post-edit trajectory and propagating policy-gradient credit through full-rollout transitions. Full-rollout RL proves crucial for coupling drafting and editing: under supervised training alone, inference-time AutoEdit improves PDMS by at most $0.3$, whereas RL increases its gain to $1.9$. We also co-design an efficient reflective decoding stack for the decision--draft--reflect pipeline, combining shared-prefix KV reuse, Alternating Step Decode, and fused on-device unmasking. On NAVSIM, ReflectDrive-2 achieves $91.0$ PDMS with camera-only input and $94.8$ PDMS in a best-of-6 oracle setting, while running at $31.8$ ms average latency on NVIDIA Thor.
Transformer-Based Autonomous Driving Models and Deployment-Oriented Compression: A Survey
Transformer-based models are becoming a central paradigm in autonomous driving because they can capture long-range spatial dependencies, multi-agent interactions, and multimodal context across perception, prediction, and planning. At the same time, their deployment in real vehicles remains difficult because high-capacity attention-based architectures impose substantial latency, memory, and energy overhead. This survey reviews representative Transformer-based autonomous driving models and organizes them by task role, sensing configuration, and architectural design. More importantly, it examines these models from a deployment-oriented perspective and analyzes how efficiency constraints reshape model design choices in practice. We further review compression and acceleration strategies relevant to Transformer-based driving systems, including quantization, pruning, knowledge distillation, low-rank approximation, and efficient attention, and discuss their benefits, limitations, and task-dependent applicability. Rather than treating compression as an isolated post-processing step, we highlight it as a system-level design consideration that directly affects deployability, robustness, and safety. Finally, we identify open challenges and future research directions toward standardized, safety-aware, and hardware-conscious evaluation of efficient autonomous driving systems.
INSANE: Cross-Domain UAV Data Sets with Increased Number of Sensors for developing Advanced and Novel Estimators
For real-world applications, autonomous mobile robotic platforms must be capable of navigating safely in a multitude of different and dynamic environments with accurate and robust localization being a key prerequisite. To support further research in this domain, we present the INSANE data sets - a collection of versatile Micro Aerial Vehicle (MAV) data sets for cross-environment localization. The data sets provide various scenarios with multiple stages of difficulty for localization methods. These scenarios range from trajectories in the controlled environment of an indoor motion capture facility, to experiments where the vehicle performs an outdoor maneuver and transitions into a building, requiring changes of sensor modalities, up to purely outdoor flight maneuvers in a challenging Mars analog environment to simulate scenarios which current and future Mars helicopters would need to perform. The presented work aims to provide data that reflects real-world scenarios and sensor effects. The extensive sensor suite includes various sensor categories, including multiple Inertial Measurement Units (IMUs) and cameras. Sensor data is made available as raw measurements and each data set provides highly accurate ground truth, including the outdoor experiments where a dual Real-Time Kinematic (RTK) Global Navigation Satellite System (GNSS) setup provides sub-degree and centimeter accuracy (1-sigma). The sensor suite also includes a dedicated high-rate IMU to capture all the vibration dynamics of the vehicle during flight to support research on novel machine learning-based sensor signal enhancement methods for improved localization. The data sets and post-processing tools are available at: https://sst.aau.at/cns/datasets
comment: V2 with added dataset comparison tables
Picasso: Holistic Scene Reconstruction with Physics-Constrained Sampling
In the presence of occlusions and measurement noise, geometrically accurate scene reconstructions -- which fit the sensor data -- can still be physically incorrect. For instance, when estimating the poses and shapes of objects in the scene and importing the resulting estimates into a simulator, small errors might translate to implausible configurations including object interpenetration or unstable equilibrium. This makes it difficult to predict the dynamic behavior of the scene using a digital twin, an important step in simulation-based planning and control of contact-rich behaviors. In this paper, we posit that object pose and shape estimation requires reasoning holistically over the scene (instead of reasoning about each object in isolation), accounting for object interactions and physical plausibility. Towards this goal, our first contribution is Picasso, a physics-constrained reconstruction pipeline that builds multi-object scene reconstructions by considering geometry, non-penetration, and physics. Picasso relies on a fast rejection sampling method that reasons over multi-object interactions, leveraging an inferred object contact graph to guide samples. Second, we propose the Picasso dataset, a collection of 10 contact-rich real-world scenes with ground truth annotations, as well as a metric to quantify physical plausibility, which we open-source as part of our benchmark. Finally, we provide an extensive evaluation of Picasso on our newly introduced dataset and on the YCB-V dataset, and show it largely outperforms the state of the art while providing reconstructions that are both physically plausible and more aligned with human intuition.
comment: 15 pages, accepted to Robotics: Science and Systems (RSS) 2026
Vision-Based Hand Shadowing for Robotic Manipulation via Inverse Kinematics
Teleoperation of low-cost robotic manipulators remains challenging due to the difficulty of retargeting human hand motion to robot joint commands. We present an offline hand-shadowing inverse-kinematics (IK) retargeting pipeline driven by a single egocentric RGB-D camera mounted on 3D-printed glasses. The pipeline detects 21 hand landmarks per hand using MediaPipe Hands, deprojects them into 3D via depth sensing, transforms them into the robot coordinate frame, and solves a damped-least-squares IK problem to produce joint commands for the SO-ARM101 robot (5 arm + 1 gripper joints). A gripper controller maps thumb-index finger geometry to grasp aperture with a multi-level fallback hierarchy. Actions are previewed in a physics simulation before replay on the physical robot. We evaluate the pipeline on a structured pick-and-place benchmark (5-tile grid, 10 grasps per tile, 3 independent runs) achieving an 86.7% +/- 4.2% success rate, and compare it against four vision-language-action (VLA) policies (ACT, SmolVLA, pi_0.5, GR00T N1.5) trained on leader-follower teleoperation data. We provide a quantitative error analysis of the pipeline, reporting a mean IK position error of 36.4 mm, trajectory smoothness metrics showing 57-68% jerk reduction from EMA smoothing, and an ablation study over the smoothing parameter. We also test the pipeline in unstructured real-world environments (grocery store, pharmacy) and find that success is reduced to 9.3% due to hand occlusion by surrounding objects. To mitigate this, we integrate WiLoR as an alternative hand detector, achieving an 8% improvement in hand detection rate over MediaPipe, highlighting both the promise and current limitations of marker-free analytical retargeting.
comment: v2: accepted at IEEE Access (2026); minor revisions per peer review, added WiLoR occlusion-mitigation experiment, error analysis, EMA ablation, and author photos
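As a rough illustration of the damped-least-squares IK step mentioned in the abstract above, the following minimal sketch solves for the joint angles of a planar two-link arm. The arm geometry, link lengths, damping value, starting pose, and all function names are assumptions chosen for illustration; the paper's SO-ARM101 solver (5 arm joints, 3D targets) is not described in enough detail here to reproduce.

```python
import math

def forward(theta, l1=1.0, l2=1.0):
    """End-effector (x, y) of a hypothetical planar 2-link arm."""
    t1, t2 = theta
    x = l1 * math.cos(t1) + l2 * math.cos(t1 + t2)
    y = l1 * math.sin(t1) + l2 * math.sin(t1 + t2)
    return x, y

def jacobian(theta, l1=1.0, l2=1.0):
    """2x2 Jacobian of forward() with respect to the joint angles."""
    t1, t2 = theta
    s1, c1 = math.sin(t1), math.cos(t1)
    s12, c12 = math.sin(t1 + t2), math.cos(t1 + t2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]

def dls_step(theta, target, damping=0.1):
    """One update: dtheta = J^T (J J^T + damping^2 I)^-1 e."""
    x, y = forward(theta)
    ex, ey = target[0] - x, target[1] - y
    J = jacobian(theta)
    # Entries of the symmetric 2x2 matrix A = J J^T + damping^2 I.
    a = J[0][0] ** 2 + J[0][1] ** 2 + damping ** 2
    b = J[0][0] * J[1][0] + J[0][1] * J[1][1]
    d = J[1][0] ** 2 + J[1][1] ** 2 + damping ** 2
    det = a * d - b * b  # positive whenever damping > 0
    # Solve A w = e in closed form.
    wx = (d * ex - b * ey) / det
    wy = (-b * ex + a * ey) / det
    # Map back to joint space with J^T.
    dt1 = J[0][0] * wx + J[1][0] * wy
    dt2 = J[0][1] * wx + J[1][1] * wy
    return (theta[0] + dt1, theta[1] + dt2)

def solve_ik(target, theta=(0.5, 1.0), iters=200, tol=1e-6):
    """Iterate DLS steps until the position error drops below tol."""
    for _ in range(iters):
        x, y = forward(theta)
        if math.hypot(target[0] - x, target[1] - y) < tol:
            break
        theta = dls_step(theta, target)
    return theta
```

The damping term keeps the linear solve well-conditioned near singular poses (for this toy arm, a fully stretched or fully folded elbow), at the cost of slower convergence.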
Robust and Safe Multi-Agent Reinforcement Learning with Communication for Autonomous Vehicles: From Simulation to Hardware
Deep multi-agent reinforcement learning (MARL) has been demonstrated effectively in simulations for multi-robot problems. For autonomous vehicles, the development of vehicle-to-vehicle (V2V) communication technologies provides opportunities to further enhance system safety. However, zero-shot transfer of simulator-trained MARL policies to dynamic hardware systems remains challenging, and how to leverage communication and shared information for MARL has seen limited demonstrations on hardware. The problem is complicated by discrepancies between simulated and physical states, system state and model uncertainties, practical shared information design, and the need for safety guarantees in both simulation and hardware. This paper designs RSR-RSMARL, a novel Robust and Safe MARL framework that supports Real-Sim-Real (RSR) policy adaptation for multi-agent systems with communication among agents, with both simulation and hardware demonstrations. RSR-RSMARL leverages state (including shared state information among agents) and action representations that account for real system complexities in the MARL formulation. The MARL policy is trained with a robust MARL algorithm to enable zero-shot transfer to hardware despite the sim-to-real gap. A safety shield module using Control Barrier Functions (CBFs) provides a safety guarantee for each individual agent. Experimental results on 1/10th-scale autonomous vehicles with V2V communication demonstrate the ability of the RSR-RSMARL framework to enhance driving safety and coordination across multiple configurations. These findings emphasize the importance of jointly designing robust policy representations and modular safety architectures to enable scalable, generalizable RSR transfer in multi-agent autonomy.
comment: 15 pages, 5 Figures
Multi-Modal World Model for Physical Robot Interactions: Simultaneous Visual and Tactile Predictions for Enhanced Accuracy
Predicting the outcomes of robotic actions, often referred to as learning a world model, in complex environments remains a fundamental challenge in robotics. Existing approaches primarily rely on visual observations and action inputs to generate video-based predictions, frequently overlooking the critical role of tactile feedback in understanding physical interactions. In this work, we investigate the integration of tactile and visual information within predictive perception systems for physical robot interaction. We demonstrate that visuo-tactile prediction provides the greatest benefits in physically ambiguous interaction regimes, while improvements are naturally limited when object dynamics are visually inferable. Furthermore, we introduce two novel robot-pushing datasets collected using a magnetic-based tactile sensor for unsupervised learning. The first dataset comprises visually identical objects with varying physical properties, explicitly isolating physical ambiguity, while the second mirrors existing robot-pushing benchmarks involving clusters of household objects. Our results show that tactile-visual integration improves prediction accuracy and robustness under physical ambiguity, while offering limited gains in visually unambiguous settings. Code and datasets are publicly available.
comment: This paper is accepted for publication in Robotics and Autonomous Systems
An Efficient Insect-inspired Approach for Visual Point-goal Navigation
In this work we develop a novel insect-inspired model for visual point-goal navigation. This combines abstracted models of two insect brain structures that have been implicated, respectively, in associative learning and path integration. We draw an analogy between the formal benchmark of the Habitat point-goal navigation task and the ability of insects to discover, learn, and refine visually guided paths around obstacles between a discovered food location and their nest. We demonstrate that the simple insect-inspired model exhibits performance comparable to recent state-of-the-art models at many orders of magnitude less computational cost. Testing in a more realistic simulated environment shows the approach is robust to perturbations.
comment: This work has been submitted to the IEEE for possible publication
When to Act, Ask, or Learn: Uncertainty-Aware Policy Steering
Policy steering is an emerging way to adapt robot behaviors at deployment time: a learned verifier analyzes low-level action samples proposed by a pre-trained policy (e.g., diffusion policy) and selects only those aligned with the task. While Vision-Language Models (VLMs) are promising general-purpose verifiers due to their reasoning capabilities, existing frameworks often assume these models are well-calibrated. In practice, overconfident judgments from a VLM can degrade steering performance under both high-level semantic uncertainty in task specifications and low-level action uncertainty or incapability of the pre-trained policy. We propose uncertainty-aware policy steering (UPS), a framework that jointly reasons about semantic task uncertainty and low-level action feasibility, and selects an uncertainty resolution strategy: execute a high-confidence action, clarify task ambiguity via natural language queries, or ask for action interventions to correct the low-level policy when it is deemed incapable of the task. We leverage conformal prediction to calibrate the composition of the VLM and the pre-trained base policy, providing statistical assurances that the verifier selects the correct strategy. After collecting interventions during deployment, we employ residual learning to improve the capability of the pre-trained policy, enabling the system to learn continually with minimal expensive human feedback. We demonstrate our framework through experiments in simulation and on hardware, showing that UPS can disentangle confident, ambiguous, and incapable scenarios and minimize expensive user interventions compared to uncalibrated baselines and prior human- or robot-gated continual learning approaches. Videos can be found at https://jessie-yuan.github.io/ups/
comment: To appear in Robotics: Science and Systems 2026
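To make the conformal calibration step in the abstract above more concrete, here is a minimal sketch of standard split conformal prediction over a discrete set of candidate options. It is not the UPS procedure itself: the nonconformity score (one minus the verifier's confidence), the option names, and the two-way act/ask rule are assumptions for illustration, and the paper's third strategy (requesting action interventions) is not modeled.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile of calibration nonconformity scores.

    cal_scores: one score per calibration example, here assumed to be
    1 - (confidence assigned to the correct option).
    alpha: target miscoverage rate.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        return float("inf")  # too few calibration points: keep everything
    return sorted(cal_scores)[k - 1]

def prediction_set(probs, qhat):
    """All options whose nonconformity score is within the threshold."""
    return [label for label, p in probs.items() if 1.0 - p <= qhat]

def choose_strategy(probs, qhat):
    """Hypothetical rule: act on a singleton set, otherwise ask for help."""
    options = prediction_set(probs, qhat)
    return ("execute", options) if len(options) == 1 else ("ask", options)
```

Under the usual exchangeability assumption between calibration and test examples, the prediction set contains the correct option with probability at least 1 - alpha, which is the kind of statistical assurance that lets a set with more than one element be read as a signal to ask rather than act.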
When Backdoors Meet Partial Observability: Attacking Real-World Reinforcement Learning
Backdoor attacks can cause reinforcement learning (RL) policies to behave normally under clean inputs while executing malicious behaviors when triggers are present. Existing RL backdoor attacks are primarily studied in simulation and often assume that attackers can reliably manipulate the observations driving policy decisions. This assumption becomes fragile in real-world deployment, where RL policies commonly rely on multimodal observations. Attackers can manipulate visual inputs through physical triggers, but auxiliary states such as LiDAR and odometry signals remain uncontrollable and vary across trajectories. We study this overlooked challenge and propose a diffusion-guided backdoor attack framework (DGBA) for real-world RL. DGBA uses small printable visual patches as triggers and learns a stochastic trigger distribution via conditional diffusion to maintain consistent attack activation under varying uncontrollable states. We further introduce an advantage-based poisoning strategy that injects triggers only at decision-critical training states. Experiments on a physical TurtleBot3 platform show that DGBA consistently outperforms prior RL backdoor attacks while preserving normal task performance. Demo videos and code are available in the supplementary material.
OptMap: Geometric Map Distillation via Submodular Maximization
Autonomous robots rely on geometric maps to inform a diverse set of perception and decision-making algorithms. As autonomy requires reasoning and planning on multiple scales, each algorithm may require a different map for optimal performance. LiDAR sensors generate an abundance of geometric data (up to 50 MB per second) to satisfy these diverse requirements. However, the point-based operations required to process perception data are expensive in both memory and computation. Such operations can be bypassed via learned representations that encode similarity, but selecting informative, size-constrained maps remains an NP-hard combinatorial problem. In this work we present OptMap: a geometric map distillation algorithm which achieves online, application-specific map generation via multiple theoretical and algorithmic innovations. A central feature is the maximization of set functions that exhibit diminishing returns, i.e., submodularity, using polynomial-time algorithms with provably near-optimal solutions. We formulate a novel submodular reward function which quantifies informativeness, reduces input set sizes, and minimizes solution bias. Further, we propose a dynamically reordered streaming submodular algorithm which improves empirical solution quality and addresses input order bias via an online approximation of the value of all scans. Testing was conducted on open-source and custom datasets with an emphasis on long-duration mapping sessions, highlighting OptMap's minimal computation requirements. OptMap's practical value is then illustrated through its application to online geometric change detection. Open-source ROS1 and ROS2 packages are available and can be used alongside any LiDAR odometry algorithm.
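The idea of maximizing a diminishing-returns set function under a size budget can be sketched with the classical offline greedy algorithm on a toy coverage objective. This is not OptMap's reward function or its streaming algorithm: the coverage objective, the representation of each scan as a set of covered cell IDs, and the function names are all assumptions for illustration.

```python
def coverage(selected, scans):
    """Number of distinct cells covered by the selected scans."""
    covered = set()
    for i in selected:
        covered |= scans[i]
    return len(covered)

def greedy_select(scans, budget):
    """Pick up to `budget` scan indices, each time taking the scan
    with the largest marginal gain in coverage."""
    selected = []
    for _ in range(min(budget, len(scans))):
        base = coverage(selected, scans)
        best, best_gain = None, 0
        for i in range(len(scans)):
            if i in selected:
                continue
            gain = coverage(selected + [i], scans) - base
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no remaining scan adds new cells
            break
        selected.append(best)
    return selected
```

For monotone submodular objectives under a cardinality constraint, this greedy rule is known to achieve at least a (1 - 1/e) fraction of the optimal value, which is the sense in which polynomial-time algorithms can be provably near-optimal for an NP-hard selection problem.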
Multiagent Systems
Predicting Decisions of AI Agents from Limited Interaction through Text-Tabular Modeling
AI agents negotiate and transact in natural language with unfamiliar counterparts: a buyer bot facing an unknown seller, or a procurement assistant negotiating with a supplier. In such interactions, the counterpart's LLM, prompts, control logic, and rule-based fallbacks are hidden, while each decision can have monetary consequences. We ask whether an agent can predict an unfamiliar counterpart's next decision from a few interactions. To avoid real-world logging confounds, we study this problem in controlled bargaining and negotiation games, formulating it as target-adaptive text-tabular prediction: each decision point is a table row combining structured game state, offer history, and dialogue, while $K$ previous games of the same target agent, i.e., the counterpart being modeled, are provided in the prompt as labeled adaptation examples. Our model is built on a tabular foundation model that represents rows using game-state features and LLM-based text representations, and adds LLM-as-Observer as an additional representation: a small frozen LLM reads the decision-time state and dialogue; its answer is discarded, and its hidden state becomes a decision-oriented feature, making the LLM an encoder rather than a direct few-shot predictor. Training on 13 frontier-LLM agents and testing on 91 held-out scaffolded agents, the full model outperforms direct LLM-as-Predictor prompting and game+text features baselines. Within this tabular model, Observer features contribute beyond the other feature schemes: at $K=16$, they improve response-prediction AUC by about 4 points across both tasks and reduce bargaining offer-prediction error by 14%. These results show that formulating counterpart prediction as a target-adaptive text-tabular task enables effective adaptation, and that hidden LLM representations expose decision-relevant signals that direct prompting does not surface.
Events as Triggers for Behavioral Diversity in Multi-Agent Reinforcement Learning
Effective multi-agent cooperation requires agents to adopt diverse behaviors as task conditions evolve, and to do so at the right moment. Yet, current Multi-Agent Reinforcement Learning (MARL) frameworks that facilitate this diversity are still limited by the fact that they bind fixed behaviors to fixed agent identities. Consequently, they are ill-equipped for tasks where agents need to take on different roles at very specific moments in time. We argue that, to define these behavioral transitions, the missing ingredient is events. Events are changes in the state of the system that induce qualitative changes in the task. Based on this view, we introduce a framework that decouples agent identity from behavior, capturing a continuous manifold from which agents instantiate their behaviors in response to events. This framework is based on two elements. First, to build an expressive behavior manifold, we introduce Neural Manifold Diversity (NMD), a formal distance metric that remains well-defined when behaviors are transient and agent-agnostic. Second, we use an event-based hypernetwork that generates Low-Rank Adaptation (LoRA) modules over a shared team policy, enabling on-the-fly agent-policy reconfiguration in response to events. We prove that this construction ensures that diversity does not interfere with reward maximization by design. Empirical results demonstrate that our framework outperforms established baselines across benchmarks while exhibiting zero-shot generalization, and that it is the only method that solves tasks requiring sequential behavior reassignment.
Attacks and Mitigations for Distributed Governance of Agentic AI under Byzantine Adversaries
Agentic AI governance is a critical component of agentic AI infrastructure, ensuring that agents follow their owner's communication and interaction policies and providing protection against attacks from malicious agents. The state-of-the-art solution, SAGA, assumes a logically centralized point of trust, the Provider, which serves as a repository for user and agent information and actively enforces policies. While SAGA provides protection against malicious agents, it remains vulnerable to a malicious Provider that deviates from the protocol, undermining the security of the identity and access control infrastructure. Deployment on both private and public clouds, each susceptible to insider threats, further increases the risk of Provider compromise. In this work, we analyze the attacks that can be mounted from a compromised Provider, taking into account the different system components and realistic deployments. We identify and execute several concrete attacks with devastating effects: undermining agent attributability, extracting private data, or bypassing access control. We then present three types of solutions for securing the Provider that offer different trade-offs between security and performance. We first present SAGA-BFT, a fully Byzantine-resilient architecture that provides the strongest protection but incurs significant performance degradation due to the high cost of Byzantine-resilient protocols. We then propose SAGA-MON and SAGA-AUD, two novel solutions that leverage lightweight server-side monitoring or client-side auditing to provide protection against most classes of attacks with minimal overhead. Finally, we propose SAGA-HYB, a hybrid architecture that combines Byzantine resilience with monitoring and auditing to trade off security against performance. We evaluate all the architectures and compare them with SAGA. We discuss which solution is best and under what conditions.
comment: 18 pages, 18 figures, 4 tables
Intermediate Artifacts as First-Class Citizens: A Data Model for Durable Intermediate Artifacts in Agentic Systems
Many AI systems are organized around loops in which models reason, call tools, observe results, and continue until a task is complete. These systems often produce final artifacts such as memos, plans, recommendations, and analyses, while the intermediate work that shaped those outputs remains ephemeral. For multi-step, revisable AI work, final artifacts are often lossy projections over upstream state. We argue that such systems should preserve durable, inspectable intermediate artifacts: typed, structured, addressable, versioned, dependency-aware, authoritative, and consumable by downstream computation. These artifacts are not the model's private chain-of-thought. They are maintained work products such as evidence maps, claim structures, criteria, assumptions, plans, transformation rules, synthesis procedures, unresolved tensions, and partial products that later humans and agents can inspect, revise, supersede, and improve. The contribution is a systems-level data model. We distinguish intermediate artifacts from chat transcripts, memory, hidden chain-of-thought, narration, thinking, and final answers; formalize additive and superseding update semantics with explicit current-state resolution; describe how artifact lineage supports durable intermediate state across revisions; and argue that evaluation must target maintained-state quality, not only final-output quality. The claim is not that artifacts make models smarter. It is that durable intermediate artifacts make AI-generated work more inspectable, revisable, and maintainable over time.
comment: 18 pages, 1 figure, 3 tables
SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces
Reusable skills are becoming a common interface for extending large language model agents, packaging procedural guidance with access to files, tools, memory, and execution environments. However, this modularity introduces attack surfaces that are largely missed by existing safety evaluations: even when the user request is benign, task-relevant skill materials or local artifacts can steer an agent toward unsafe actions. We present SkillSafetyBench, a runnable benchmark for evaluating such skill-mediated safety failures. SkillSafetyBench includes 155 adversarial cases across 47 tasks, 6 risk domains, and 30 safety categories, each evaluated with a case-specific rule-based verifier. Experiments with multiple CLI agents and model backends show that localized non-user attacks can consistently induce unsafe behavior, with distinct failure patterns across domains, attack methods, and scaffold-model pairings. Our findings suggest that agent safety depends not only on model-level alignment, but also on how agents interpret skills, trust workflow context, and act through executable environments.
Adaptive TD-Lambda for Cooperative Multi-agent Reinforcement Learning
TD($λ$) in value-based MARL algorithms, and Temporal Difference critic learning in Actor-Critic-based (AC-based) algorithms, integrate elements of Monte Carlo simulation and Q-function bootstrapping via dynamic programming, which addresses the inherent bias-variance trade-off in value estimation. Building on this, some recent works link an adaptive $λ$ value to the policy distribution in single-agent reinforcement learning. However, because of the large joint action space induced by multiple agents and the limited transition data in Multi-agent Reinforcement Learning, the policy distribution cannot feasibly be estimated statistically. To solve this problem in MARL settings, we employ a parametric likelihood-free density ratio estimator with two replay buffers. The two replay buffers, of different sizes, store the historical trajectories that represent the data distributions of the past and current policies, respectively. Based on the estimator, we assign Adaptive TD($λ$), ATD($λ$), values to state-action pairs according to their likelihood under the stationary distribution of the current policy. We apply the proposed method to two competitive baseline methods, QMIX for value-based algorithms and MAPPO for AC-based algorithms, on SMAC benchmarks and Gfootball academy scenarios, and demonstrate consistently competitive or superior performance compared to baseline approaches with static $λ$ values.
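The bias-variance role of $λ$ can be seen in a short sketch of the standard recursive $λ$-return, written here so that each time step may carry its own $λ$. How the abstract's density ratio estimate is turned into a per-sample $λ$ is not specified in this listing, so the `lams` input is simply assumed to be given; the function name and argument layout are likewise illustrative.

```python
def lambda_returns(rewards, values, gamma, lams):
    """Backward recursion for lambda-returns with a per-step lambda.

    rewards: r_0 .. r_{T-1}
    values:  V(s_0) .. V(s_T); the last entry is the bootstrap value
    lams:    lambda_0 .. lambda_{T-1}, each in [0, 1]
    Recursion: G_t = r_t + gamma * ((1 - lam_t) * V(s_{t+1}) + lam_t * G_{t+1}),
    with G_T taken to be V(s_T).
    """
    horizon = len(rewards)
    returns = [0.0] * horizon
    next_return = values[horizon]
    for t in reversed(range(horizon)):
        next_return = rewards[t] + gamma * (
            (1.0 - lams[t]) * values[t + 1] + lams[t] * next_return
        )
        returns[t] = next_return
    return returns
```

Setting every entry of `lams` to 0 recovers one-step TD targets (low variance, more bias from the value estimates), while setting every entry to 1 recovers the Monte Carlo return bootstrapped only at the final state (less bias, higher variance).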
AgentDisCo: Towards Disentanglement and Collaboration in Open-ended Deep Research Agents
In this paper, we present AgentDisCo, a novel Disentangled and Collaborative agentic architecture that formulates deep research as an adversarial optimization problem between information exploration and exploitation. Unlike existing approaches that conflate these two processes into a single module, AgentDisCo employs a critic agent to evaluate generated outlines and refine search queries, and a generator agent to retrieve updated results and revise outlines accordingly. The iteratively refined outline is then passed to a downstream report writer that synthesizes a comprehensive research report. The overall workflow supports both handcrafted and automatically discovered design strategies via a meta-optimization harness, in which the generator agent is repurposed as a scoring agent to evaluate critic outputs and generate quality signals. Powerful code-generation agents (e.g., Claude-Code, Codex) systematically explore agent configurations and construct a policy bank, a structured repository of reusable design strategies, enabling the framework to self-refine without extensive human intervention. We evaluate AgentDisCo on three established deep research benchmarks (DeepResearchBench, DeepConsult, DeepResearchGym) using Gemini-2.5-Pro, achieving performance comparable to or surpassing leading closed-source systems. Observing that existing benchmarks inadequately reflect real-world user needs, we introduce GALA (General AI Life Assistants), a benchmark that mines latent research interests from users' historical browsing behavior. We further develop a rendering agent that converts research reports into visually rich poster presentations, and demonstrate an end-to-end product, AutoResearch Your Interest, which delivers personalized deep research recommendations derived from individual browsing histories.
A Research Agenda on Agents and Software Engineering: Outcomes from the Rio A2SE Seminar
The rise of agentic AI is reshaping software engineering in two intertwined directions: agents are increasingly applied to support software engineering tasks, and agentic AI systems are themselves complex systems that require rethinking established software engineering practices. To chart a coherent research agenda covering the two directions, we organized the A2SE seminar in Rio de Janeiro, bringing together 18 experts from academia and industry. Through structured presentations, collaborative topic clustering, and focused group discussions, participants identified six thematic areas: Governance, Software Engineering for Agents, Agents for Software Architecture, Quality and Evaluation, Sustainability, and Code, and they prioritized short-term and long-term research directions for each. This paper presents the resulting community-driven, opinionated research agenda, offering the SE community a structured foundation for coordinating efforts at this critical juncture.
comment: 6 pages, 1 table, A2SE meeting, https://sites.google.com/view/a2se2026/home
Shaping Zero-Shot Coordination via State Blocking
Zero-shot coordination (ZSC) aims to enable agents to cooperate with independently trained partners without prior interaction, a key requirement for real-world multi-agent systems and human-AI collaboration. Existing approaches have largely emphasized increasing partner diversity during training, yet such strategies often fall short of achieving reliable generalization to unseen partners. We introduce State-Blocked Coordination (SBC), a simple yet effective framework that improves ZSC by inducing diverse interaction scenarios without direct environment modification. Specifically, SBC generates a family of virtual environments through state blocking, allowing agents to experience a wide range of suboptimal partner policies. Across multiple benchmarks, SBC demonstrates superior performance in zero-shot coordination, including strong generalization to human partners.
comment: 9 technical pages followed by references and appendix
GeomHerd: A Forward-looking Herding Quantification via Ricci Flow Geometry on Agent Interactive Simulations
Herding -- where agents align their behaviors and act collectively -- is a central driver of market fragility and systemic risk. Existing approaches to quantify herding rely on price-correlation statistics, which inherently lag because they only detect coordination after it has already moved realised returns. We propose GeomHerd, a forward-looking geometric framework that bypasses this observability lag by quantifying coordination directly on upstream agent-interaction graphs. To generate these graphs, we treat a heterogeneous LLM-driven multi-agent simulator -- each financial trader instantiated by a persona-conditioned LLM call -- as a forecastable world, and evaluate the geometric pipeline on the Cividino--Sornette continuous-spin agent-based substrate as our headline financial testbed. By tracking the discrete Ollivier--Ricci curvature of these action graphs, GeomHerd captures the structural topology of emerging coordination. Theoretically, we establish a mean-field bridge mapping our graph-theoretic metric to CSAD, the classical macroscopic herding statistic, linking GeomHerd to downstream price-dispersion measurement. Empirically, GeomHerd anticipates herding long before aggregate market baselines: on the continuous-spin substrate, our primary detector fires a median of 272 steps before order-parameter onset; a contagion detector ($β_{-}$) recalls 65% of critical trajectories 318 steps early; and on co-firing trajectories the agent-graph signal precedes price-correlation-graph baselines by 40 steps. As a complementary indicator, the effective vocabulary of agent actions contracts during cascades. The geometric signature transfers out-of-domain to the Vicsek self-driven-particle model, and a curvature-conditioned forecasting head reduces cascade-window log-return MAE over detector-conditioned and price-only baselines.
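For readers unfamiliar with CSAD, the downstream statistic the abstract links its curvature metric to, the sketch below computes the cross-sectional absolute deviation of returns for a single time step in its usual textbook form. Using the equal-weighted mean as the market return when none is supplied is an assumption, as are the function and parameter names; the paper's mean-field bridge itself is not reproduced.

```python
def csad(returns, market_return=None):
    """Cross-sectional absolute deviation for one time step.

    returns: per-asset returns R_i at this step
    market_return: R_m; defaults to the equal-weighted mean (assumption)
    CSAD = (1 / N) * sum_i |R_i - R_m|
    """
    n = len(returns)
    if market_return is None:
        market_return = sum(returns) / n
    return sum(abs(r - market_return) for r in returns) / n
```

When agents herd, individual returns cluster around the market return and this dispersion shrinks, which is why the statistic can only react after coordination has already shown up in prices.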
Hierarchical LLM-Driven Control for HAPS-Assisted UAV Networks: Joint Optimization of Flight and Connectivity
Uncrewed aerial vehicles (UAVs) are increasingly deployed in complex networked environments, yet the joint optimization of multi-UAV motion control and connectivity remains a fundamental challenge. In this paper, we study a multi-UAV system operating in an integrated terrestrial and non-terrestrial network (ITNTN) comprising terrestrial base stations and high-altitude platform stations (HAPS). We consider a three-dimensional (3D) aerial highway scenario where UAVs must adapt their motion to ensure collision avoidance, efficient traffic flow, and reliable communication under dynamic and partially observable conditions. We first model the problem as a hierarchical multi-objective partially observable Markov decision process (H-MO-POMDP), capturing the coupling between control and communication objectives. Based on this formulation, we propose a large language model (LLM)-driven hierarchical multi-rate control framework. At the global level, an LLM-based controller on the HAPS performs long-term planning for load balancing and handover decisions. At the local level, each UAV employs a hybrid controller that integrates a slow-timescale LLM for high-level spatial reasoning with a reinforcement learning agent for faster UAV-to-infrastructure (U2I) communication and motion control. We further develop a high-fidelity 3D simulation platform by integrating the gym-pybullet-drones environment with 3GPP-compliant RF/THz channel models. Numerical results demonstrate that the proposed framework significantly outperforms state-of-the-art baselines, achieving a 14% increase in transportation efficiency and a 25% improvement in telecommunication throughput. Additionally, it achieves a 23% reduction in physical collision rates, demonstrating strong handover stability and zero-shot generalization in dynamic scenarios.
comment: Submission for possible publication
Distance-Constrained Unlabeled Multi-Agent Pathfinding
We study a graph pathfinding problem, Distance-$r$ Independent Unlabeled Multi-Agent Pathfinding: finding a set of collision-free paths between two sets such that agents stay at pairwise distance at least $r+1$ at all times. This additional constraint, which generalizes collision modeling in classical MAPF, targets aspects of real-world multi-agent coordination. It also makes feasibility (i.e., whether a solution exists) PSPACE-complete, in contrast to standard (unlabeled) MAPF, where feasibility can be decided in polynomial time. We address the challenge via two complementary approaches: (i) reduction-based optimal algorithms with a feasibility-preserving compression procedure, and (ii) a configuration generator-based search. Despite the hardness, empirical results show that our algorithm can handle hundreds of agents in a practical timeframe.
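The distance constraint described above can be checked for a single configuration of agents with a few lines of breadth-first search. This is only a validity check for one time step on an unweighted, undirected graph, not any of the paper's algorithms; the adjacency-dict representation and function names are assumptions for illustration.

```python
from collections import deque
from itertools import combinations

def bfs_distances(graph, source):
    """Hop distances from source in an unweighted graph (adjacency dict)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def is_distance_r_independent(graph, positions, r):
    """True if every pair of agents is at graph distance >= r + 1."""
    for u, v in combinations(positions, 2):
        d = bfs_distances(graph, u).get(v, float("inf"))
        if d < r + 1:
            return False
    return True
```

With r = 0 the check reduces to requiring that no two agents occupy the same vertex, which matches the abstract's remark that the constraint generalizes classical collision modeling; edge-swap conflicts between consecutive time steps are not covered by this sketch.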
Digital Identity for Agentic Systems: Toward a Portable Authorization Standard for Autonomous Agents
Enterprise AI is shifting from copilots to autonomous agents capable of executing workflows, negotiating outcomes, and making decisions with limited human oversight. As these systems extend across organizational boundaries, identity alone is insufficient: an agent's authority must also be explicit, constrained, auditable, revocable, and consistently interpretable by independent receivers. This paper analyzes representative enterprise use cases in insurance claims processing and supply chain integrity to surface structural gaps in existing identity and access models. It proposes a portable authorization model for autonomous agents based on issuer-authored authorization payloads, typed constraint algebra, decision-consistent evaluation semantics, delegation attenuation, governed semantic resolution, fail-closed processing, and pre-flight discovery. The model separates credential containers, authorization payload semantics, and enforcement engines, allowing profiles such as JWT/JWS, Verifiable Credentials, OAuth Rich Authorization Requests, or policy-engine bindings to preserve a common authorization meaning across trust boundaries.
comment: 46 pages, 10 figures
Predictive Maps of Multi-Agent Reasoning: A Successor-Representation Spectrum for LLM Communication Topologies
Practitioners deploying multi-agent large language model (LLM) systems must currently choose between communication topologies such as chain, star, mesh, and richer variants without any pre-inference diagnostic for which topology will amplify drift, converge to consensus, or remain robust under perturbation. Existing evaluation answers these questions only post hoc and only for the task measured. We introduce a structural diagnostic for multi-agent LLM communication graphs based on the successor representation $M = (I - γP)^{-1}$ of the row-stochastic communication operator, and we connect three of its spectral quantities, the spectral radius $ρ(M)$, the spectral gap $Δ(M)$, and the condition number $κ(M)$, to three distinct failure modes. We derive closed-form spectra for the chain, star, and mesh under row-stochastic normalization, and validate the predictions on a 12-step structured state-tracking task with Qwen2.5-7B-Instruct over 100 independent trials. The condition number is a perfect rank-order predictor of empirical perturbation robustness ($r_s = 1.0$); the spectral gap partially predicts consensus dynamics ($r_s = 0.5$); and the spectral radius is perfectly \emph{inverted} with respect to cumulative error ($r_s = -1.0$). We trace this inversion to a regime in which linear spectra are blind to non-contracting bias drift, and we propose an affine-noise extension of the predictive map that recovers the empirical ordering. We read this as a first step toward representational, drift-aware structural diagnostics for multi-agent LLM systems, sitting alongside classical spectral and consensus theory.
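The successor-representation diagnostic described above can be illustrated with a small NumPy sketch. This is not the authors' code: the topology definitions (undirected links with self-loops), the choice $γ = 0.9$, the five-agent size, and the reading of the spectral gap as the difference between the two largest eigenvalue magnitudes are all assumptions made for illustration only.

```python
import numpy as np

GAMMA = 0.9   # discount factor (assumed value)
N_AGENTS = 5  # number of agents (assumed value)

def chain(n):
    """Each agent hears itself and its immediate neighbours (assumed definition)."""
    adj = np.eye(n)
    for i in range(n - 1):
        adj[i, i + 1] = adj[i + 1, i] = 1.0
    return adj

def star(n):
    """Agent 0 is the hub; every other agent is linked only to the hub."""
    adj = np.eye(n)
    adj[0, 1:] = 1.0
    adj[1:, 0] = 1.0
    return adj

def mesh(n):
    """Fully connected communication graph."""
    return np.ones((n, n))

def row_stochastic(adj):
    """Normalise each row so that it sums to one."""
    return adj / adj.sum(axis=1, keepdims=True)

def successor_representation(P, gamma):
    """M = (I - gamma * P)^{-1} for a row-stochastic operator P."""
    return np.linalg.inv(np.eye(P.shape[0]) - gamma * P)

def spectral_summary(M):
    """Spectral radius, gap between the two largest |eigenvalues|, 2-norm condition number."""
    mags = np.sort(np.abs(np.linalg.eigvals(M)))[::-1]
    return {
        "spectral_radius": float(mags[0]),
        "spectral_gap": float(mags[0] - mags[1]),  # assumed definition of the gap
        "condition_number": float(np.linalg.cond(M)),
    }

summaries = {
    name: spectral_summary(successor_representation(row_stochastic(build(N_AGENTS)), GAMMA))
    for name, build in {"chain": chain, "star": star, "mesh": mesh}.items()
}
```

Because $P$ is row-stochastic, it has eigenvalue 1, so $M$ has eigenvalue $1/(1-γ)$ and this is its spectral radius for every topology under this normalization. In this sketch, differences between topologies therefore show up only in the gap and the condition number.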
Mechanism Plausibility in Generative Agent-Based Modeling
Large language models (LLMs) can generate high-level diverse phenomena without explicitly programmed rules. This capability has led to their adoption within different agent-based models (ABMs) and social simulations. Recently, research has aimed to test whether they are capable of generating different phenomena of interest, for example, human behavior on social media platforms or performance in game-theoretic scenarios. However, capability, prediction, and explanation are different -- drawing from the philosophy of science and mechanisms literature, \textit{explanation} requires showing, to some degree, how a phenomenon is produced by related organized entities and activities. For modelers, describing the characteristics of an experiment, or whether a simulation provides progress in capability (or explanation), can be difficult without being grounded in potentially distant research areas. We integrate recent work on LLM-ABMs with contemporary philosophy of science literature and use it to operationalize a definition of `plausibility' in a four-level scale. Our scale separates the evaluation of a model's generative sufficiency (ability to reproduce a phenomenon) from its mechanistic plausibility (how the phenomenon could be produced), and clarifies the distinct roles of different models, such as predictive and explanatory ones. We introduce this as the Mechanism Plausibility Scale.
comment: Accepted at ACM FAccT 2026
Time and Supply Fairness in Electricity Distribution using $k$-times bin packing
Given items of different sizes and a fixed bin capacity, the bin-packing problem is to pack these items into the minimum number of bins such that the sum of the item sizes in each bin does not exceed the capacity. We define a new variant, k-times bin-packing (kBP), in which the goal is to pack the items so that each item appears exactly k times in k different bins. We generalize existing approximation algorithms for bin-packing to solve kBP and analyze their performance ratios. The fair electricity division problem motivates the study of kBP. The goal is to allocate the available supply among households using some fairness criteria, such as the egalitarian principle. We prove that every electricity division problem can be solved by k-times bin-packing for some finite k, which depends only on the number of households. We implement generalizations of the First-Fit and First-Fit Decreasing bin-packing algorithms to solve kBP and apply them to real electricity demand data. We show that our generalizations outperform existing heuristic solutions to the same problem in terms of the egalitarian allocation of connection time. We study another variant of the egalitarian allocation problem, in which the goal is to maximize the minimum number of watts allocated to a household. For this variant, we prove an impossibility result: there does not exist such a k that depends only on the number of agents. This impossibility result motivates us to develop four different heuristic algorithms to solve the egalitarian allocation of watts problem. We evaluate the heuristics by summing the minimum watts allocated to any household in each hour, yielding a fairness metric that reflects the lowest watt allocation across all hours. A higher total minimum of watts indicates a more equitable distribution. Thus, we establish new benchmarks for fair allocation of watts.
comment: 58 pages, 10 figures, 6 tables. arXiv admin note: substantial text overlap with arXiv:2311.16742
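One plausible way to generalize First-Fit to the k-times setting is sketched below. It is an assumption for illustration, not necessarily the authors' algorithm: the item list is processed k times, and each copy goes into the first bin that has room and does not already hold a copy of the same item. The function names and the example instance are hypothetical.

```python
def first_fit_k(sizes, capacity, k):
    """Pack k copies of every item so that no bin holds two copies of one item.

    Returns a list of bins; each bin is a dict with its current load and the
    set of item indices it contains.
    """
    if any(size > capacity for size in sizes):
        raise ValueError("an item is larger than the bin capacity")
    bins = []
    for _ in range(k):  # one pass over the item list per copy
        for idx, size in enumerate(sizes):
            for b in bins:
                # First bin with enough room that does not yet contain this item.
                if idx not in b["items"] and b["load"] + size <= capacity:
                    b["items"].add(idx)
                    b["load"] += size
                    break
            else:
                # No existing bin fits: open a new one.
                bins.append({"load": size, "items": {idx}})
    return bins


def first_fit_decreasing_k(sizes, capacity, k):
    """Same rule, but items are considered in order of decreasing size."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    packed = first_fit_k([sizes[i] for i in order], capacity, k)
    # Translate positions in the sorted list back to the original item indices.
    return [{"load": b["load"], "items": {order[i] for i in b["items"]}} for b in packed]


example_bins = first_fit_k([6, 5, 4, 3], capacity=10, k=2)
```

Storing item indices in a set per bin makes the "k different bins" requirement a simple membership check, at the cost of a linear scan over open bins for every copy.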
Synthesizing the Expert: A Validated Multimodal Dataset for Trustworthy AI-Assisted Swimming Coaching
This research is primarily concerned with the critical problem of synthesizing a structured Retrieval-Augmented Generation (RAG) system for advanced AI applications in the domain of swimming. As the integration of Artificial Intelligence in sports science matures, its applications in swimming have become increasingly diverse, spanning from real-time technical coaching and talent scouting to comprehensive performance profiling and the dynamic personalization of training periodization. Within this landscape, RAG-based systems represent a pivotal advancement in Large Language Model (LLM) enhanced swimming analysis, as they allow for the grounding of generative outputs in authoritative domain knowledge, thereby ensuring the credibility of AI-generated advice, contextually and technically. Despite this potential, building robust RAG systems using only real-world aquatic data presents significant challenges, including ethical constraints regarding athlete biometrics, and the high cost of manual expert labeling. To address these barriers, we propose a novel generative framework that leverages a multimodal knowledge base gathered across four dimensions: physiological data, physiological literature, kinematic sensor data, and unstructured domain expertise. Our proposed framework utilizes a multi-agent LLM architecture to synthesize a high-fidelity dataset of 1,864 validated "Question-Context-Answer" triplets, drawn from 1,914 drafts evaluated against 12 physiological soundness rules. By providing a structured, synthetic ground truth, this work establishes a foundational benchmark for trustworthy AI in aquatics. The outcomes of this research promise to enhance the reliability of automated coaching and open a plethora of future directions in "Meta-Agent" development and athletic profiling, ultimately bridging the gap between raw data engineering and practical sports science application.
BEHAVE: A Hybrid AI Framework for Real-Time Modeling of Collective Human Dynamics
Existing AI systems for modeling human behavior operate at the level of individuals or detect events after they occur. As a result, they systematically fail to capture the collective dynamics that determine whether a group remains stable or transitions into escalation or breakdown. We propose a different foundation: a group of interacting humans constitutes a complex dynamical system in the precise mathematical sense, exhibiting emergence, nonlinearity, feedback loops, sensitivity near critical points, and phase transitions between qualitatively distinct regimes. The state of such a system is not located within any single participant; it is distributed across mutual influence loops and observable through the micro-dynamics of the body. We introduce BEHAVE (Behavioral Engine for Human Activity Vector Estimation), a formal framework that models collective dynamics as continuous behavioral fields defined over an interaction space derived from observable physical signals. Kinematic micro-signals (position, velocity, body orientation, gestural activity) are structured into a directed interaction graph and aggregated into a basis of behavioral fields capturing distinct, non-redundant axes of collective state. The framework rests on one theorem and two structural propositions characterizing the tension field, the field basis, and the criticality index. Perception and forecasting layers are implemented using neural models, enabling data-driven learning and approximation of system dynamics. BEHAVE is formulated as a computational system for learning, representing, and forecasting collective dynamics from data. A working pipeline is demonstrated on a 7-agent negotiation snapshot. The same fields, recalibrated, apply to crowd safety, crisis-team dynamics, education, and clinical contexts.
comment: 19 pages
CHAL: Council of Hierarchical Agentic Language
Multi-agent debate has emerged as a promising approach for improving LLM reasoning on ground-truth tasks, yet current methodologies face certain structural limitations: debate tends to induce a martingale over belief trajectories, majority voting accounts for most observed gains, and LLMs exhibit confidence escalation rather than calibration across rounds. We argue that the genuine value of debate, and dialectic systems as a whole, lies not in ground-truth tasks but in defeasible domains, where every position can in principle be defeated by better reasoning. We present the Council of Hierarchical Agentic Language (CHAL), a multi-agent dialectic framework that treats defeasible argumentation as an engine for belief optimization. Each agent maintains a CHAL Belief Schema (CBS), a graph-structured belief representation with a Bayesian-inspired architecture, that facilitates belief revision through a gradient-informed dynamic mechanism by leveraging the strength of the belief's thesis as a differentiable objective. Meta-cognitive value systems spanning epistemology, logic, and ethics are elevated to configurable hyperparameters governing agent reasoning and adjudication outcomes. We provide a series of ablation experiments that demonstrate systematic and interpretable effects: the adjudicator's value system determines the debate's overall trajectories in latent belief space, council diversity refines beliefs for all participants, and the framework generalizes across broad fields. CHAL is, to our knowledge, the first framework to treat multi-agent debate as structured belief optimization over defeasible domains. Further, the auditable belief artifacts it produces establish the foundation for dedicated evaluation suites for defeasible argumentation, with broader implications for building AI systems whose reasoning and value commitments are transparent, aligned, and subject to human oversight.
Macro-Action Based Multi-Agent Instruction Following through Value Cancellation
Multi-agent reinforcement learning (MARL) in real-world use cases may need to adapt to external natural language instructions that interrupt ongoing behavior and conflict with long-horizon objectives. However, conditioning rewards on instructions introduces a fundamental failure mode as Bellman updates couple value estimates across instruction contexts, leading to inconsistent values when instructions interrupt macro-actions. We propose Macro-Action Value Correction for Instruction Compliance (MAVIC), which corrects Bellman backups at instruction boundaries by correcting the incoming instruction objective and restoring the continuation value under the current objective. Unlike reward shaping, MAVIC modifies the bootstrapping target itself, enabling consistent value estimation under stochastic instruction switching within a unified policy. We provide theoretical analysis and an actor-critic implementation, and show that MAVIC achieves high instruction compliance while preserving base task performance in increasingly complex cooperative multi-agent environments.
Ready from Day 1: Population-Aware Coordination for Large-Scale Constrained Multi-Agent Systems NeurIPS 2026
In large-scale multi-agent systems with shared resource constraints, an upstream planner must iteratively evaluate candidate resource plans -- assessing feasibility, aggregate response, and marginal cost -- before committing to one. Lagrangian relaxation separates local decisions through a broadcast cost signal, but the planner still needs the cost-to-utilization response map to explore plan space, and this map depends on population composition that changes across planning cycles. We propose \emph{population-aware coordination interfaces}: learned primal and dual maps, conditioned on compact population summaries, that the planner queries inside its iterative loop. The primal map predicts aggregate utilization under a proposed cost trajectory; the dual map predicts the cost trajectory for a target plan. By encoding response-relevant population structure, these maps remain reliable across evolving populations without per-cycle retraining, and support coordination of large populations from compact subsamples. We additionally cast Sim2Real transfer as a backtestable procedure, enabling evaluation before deployment. In a supply-chain capacity-control case study, population-aware interfaces reduce forecast error by 16--19\% and capacity violations by 20--51\% relative to population-unaware baselines under composition shift; 20K-agent cohorts support accurate coordination of 500K-agent populations; and simulator-trained primal maps achieve 11.1\% MAPE on real observations versus 13--24\% for baselines.
comment: 30 pages, 16 figures. Submitted to NeurIPS 2026
Learning to Communicate Locally for Large-Scale Multi-Agent Pathfinding
Multi-agent pathfinding (MAPF) is a widely used abstraction for multi-robot trajectory planning problems, where multiple homogeneous agents move simultaneously within a shared environment. Although solving MAPF optimally is NP-hard, scalable and efficient solvers are critical for real-world applications such as logistics and search-and-rescue. To this end, the research community has proposed various decentralized suboptimal MAPF solvers that leverage machine learning. Such methods frame MAPF (from a single agent perspective) as a Dec-POMDP where at each time step an agent has to decide an action based on the local observation and typically solve the problem via reinforcement learning or imitation learning. We follow the same approach but additionally introduce a learnable communication module tailored to enhance cooperation between agents via efficient feature sharing. We present the Local Communication for Multi-agent Pathfinding (LC-MAPF), a generalizable pre-trained model that applies multi-round communication between neighboring agents to exchange information and improve their coordination. Our experiments show that the introduced method outperforms the existing learning-based MAPF solvers, including IL and RL-based approaches, across diverse metrics in a diverse range of (unseen) test scenarios. Remarkably, the introduced communication mechanism does not compromise LC-MAPF's scalability, a common bottleneck for communication-based MAPF solvers.
MAC: Masked Agent Collaboration Boosts Large Language Model Medical Decision-Making
Large language models (LLMs) have proven effective in artificial intelligence, where the multi-agent system (MAS) holds considerable promise for healthcare development by achieving the collaboration of LLMs. However, the absence of a systematic pipeline for agent construction and the rigidity of static collaboration patterns render current MAS-based models vulnerable to collaboration failures, resulting in substantial performance degradation in medical decision-making scenarios. To this end, we propose a novel Masked Agent Collaboration (MAC) framework that harnesses Pareto-optimal agent construction and cross-consistency maximization mechanisms to achieve adaptive progressive propagation of collaborative information, boosting the medical decision-making capacity. Specifically, we first conduct a Pareto-frontier factors analysis towards the LLMs pool to consider their key factors, including the model size, inference time, diversity score, and throughput ratio, where we calculate the similarity between pairwise outputs within an LLM to derive its diversity score. Beyond this analysis, we enable the identification of Pareto-optimal models that balance efficiency and capability, which are subsequently selected as collaborative agents to consider the fundamental trade-offs inherent in practical LLM deployment. Afterward, we measure the pairwise similarity between the outputs from collaborative agents to determine their cross-consistency values, subsequently masking out the agent with the lowest cross-consistency value to eliminate the output that is likely semantically inconsistent. Finally, we conduct collaboration of agents by achieving adaptive progressive propagation, where each agent aggregates the outputs of unmasked agents from the previous layer as its input to generate the corresponding output via prompt engineering.
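The masking step described above (score each agent by its pairwise similarity to the others, then drop the least consistent one) can be sketched as follows. The token-level Jaccard similarity is a stand-in chosen only to keep the example self-contained; the abstract does not specify the similarity measure, and all function names and sample outputs here are hypothetical.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity between two strings (stand-in similarity measure)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def cross_consistency(outputs):
    """Mean similarity of each agent's output to every other agent's output."""
    n = len(outputs)
    if n < 2:
        raise ValueError("need at least two agent outputs")
    return [
        sum(jaccard(outputs[i], outputs[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def mask_least_consistent(outputs):
    """Return the index of the masked agent and the outputs that remain."""
    scores = cross_consistency(outputs)
    masked = min(range(len(outputs)), key=scores.__getitem__)
    kept = [out for i, out in enumerate(outputs) if i != masked]
    return masked, kept


agent_outputs = [
    "diagnosis is pneumonia",
    "likely diagnosis is pneumonia",
    "patient has a fractured wrist",
]
masked_index, kept_outputs = mask_least_consistent(agent_outputs)
```

In a layered setup, the kept outputs would be what each agent in the next layer aggregates as its input.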
FLARE: Adaptive Multi-Dimensional Reputation for Robust Client Reliability in Federated Learning
Federated learning (FL) enables collaborative model training while preserving data privacy. However, it remains vulnerable to malicious clients who compromise model integrity through Byzantine attacks, data poisoning, or adaptive adversarial behaviors. Existing defense mechanisms rely on static thresholds and binary classification, failing to adapt to evolving client behaviors in real-world deployments. We propose FLARE, an adaptive reputation-based framework that transforms client reliability assessment from binary decisions to a continuous, multi-dimensional trust evaluation. FLARE integrates: (i) a multi-dimensional reputation score capturing performance consistency, statistical anomaly indicators, and temporal behavior, (ii) a self-calibrating adaptive threshold mechanism that adjusts security strictness based on model convergence and recent attack intensity, (iii) reputation-weighted aggregation with soft exclusion to proportionally limit suspicious contributions rather than eliminating clients outright, and (iv) a Local Differential Privacy (LDP) mechanism enabling reputation scoring on privatized client updates. We further introduce a highly evasive Statistical Mimicry (SM) attack, a benchmark adversary that blends honest gradients with synthetic perturbations and persistent drift to remain undetected by traditional filters. Extensive experiments with 100 clients on MNIST, CIFAR-10, and SVHN demonstrate that FLARE maintains high model accuracy and converges faster than state-of-the-art Byzantine-robust methods under diverse attack types, including label flipping, gradient scaling, adaptive attacks, ALIE, and SM. FLARE improves robustness by up to 16% and preserves model convergence within 30% of the non-attacked baseline, while achieving strong malicious-client detection performance with minimal computational overhead. https://github.com/Anonymous0-0paper/FLARE
comment: The authors want to withdraw this manuscript for further verification and revision. We may release a substantially revised version in the future
Dynamic Latent-Belief Synchrony through Collective Predictive Coding: A Computational Model of Parent--Infant Homeostatic Co-Regulation
Inter-brain synchrony (IBS) observed in real-time dyadic interactions, including parent--infant exchanges, suggests that two agents can align their internal representations through interaction. Yet computational accounts of how such alignment can arise between agents that have only local sensory access and asymmetric internal knowledge remain underdeveloped. We propose a constructive model of parent--infant homeostatic co-regulation that integrates a POMDP formulation of active interoceptive inference with the Metropolis--Hastings Naming Game (MHNG) derived from the Collective Predictive Coding (CPC) hypothesis. In our model, the parent and infant agents agree on homeostatic regulatory actions for the infant's visceral state through a shared communicative variable generated by a locally computable Metropolis--Hastings probability. The parent observes the infant through body-generated exteroceptive cues, whereas the infant directly senses its own visceral state through interoception. This difference in access modality is implemented as asymmetric generative-model knowledge: the parent knows how actions transform visceral states but must learn what the infant's bodily cues indicate, whereas the infant perceives its visceral state directly but must learn how actions affect it. We operationalize representational alignment as the Jensen--Shannon divergence between the two agents' latent representations. Notably, this alignment emerged far earlier than the convergence of the generative-model learning and was maintained across successive state transitions during social interactions, indicating that latent representational synchrony does not presuppose fully shared world models. These findings offer a minimal constructive account of internal state synchrony compatible with IBS reported in hyperscanning studies and support CPC as a candidate computational basis for inter-brain alignment.
comment: 11 pages
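For reference, the Jensen--Shannon divergence used as the alignment measure can be computed as in the sketch below. It assumes each agent's latent representation is summarized as a categorical distribution over the same set of states, which is an illustrative assumption, and the example distributions are made up.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded between 0 and 1."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # The mixture m is positive wherever p or q is, so both KL terms are finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical beliefs over three visceral states.
parent_belief = [0.7, 0.2, 0.1]
infant_belief = [0.6, 0.3, 0.1]
alignment_gap = js_divergence(parent_belief, infant_belief)
```

Unlike the KL divergence, this quantity is symmetric and stays finite when the two distributions have different supports, which makes it convenient for comparing two agents that neither of which is treated as the reference.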
STAR: Failure-Aware Markovian Routing for Multi-Agent Spatiotemporal Reasoning
Compositional spatiotemporal reasoning often requires a system to invoke multiple heterogeneous specialists, such as geometric, temporal, topological, and trajectory agents. A central question is how such a system should route among specialists when execution does not simply succeed or fail, but fails in qualitatively different ways. Existing tool-augmented and multi-agent LLM systems typically leave this routing decision implicit in language generation, making recovery ad hoc, difficult to interpret, and hard to optimize. This paper presents STAR (Spatio-Temporal Agent Router), a failure-aware routing framework that externalizes inter-agent control as a state-conditioned transition policy over the current agent, task type, and typed execution status. At the center of STAR is an agent routing matrix that combines expert-specified nominal routes with recovery transitions learned from execution traces. Because the matrix conditions on distinct failure states, the router can respond differently to malformed outputs, missing dependencies, and tool--query mismatches, rather than collapsing them into a generic retry signal. Specialists execute through a tool-grounded extract--compute--deposit protocol and write intermediate results to a shared blackboard for downstream fusion. Results prove that retaining unsuccessful traces during training enlarges the support of the routing policy on error states, enabling recovery transitions that success-only training cannot represent. Across three spatiotemporal benchmarks and eight backbone LLMs, STAR improves over multiple baselines with the clearest gains on queries whose execution deviates from the nominal routing path. Router-specific ablations and recovery analyses further show that typed failure-aware routing, rather than specialist composition alone, is a key factor for these improvements.
comment: 30 pages, 13 figures
Social Theory Should Be a Structural Prior for Agentic AI: A Formal Framework for Multi-Agent Social Systems
Agentic AI systems are increasingly deployed not in isolation, but inside social environments populated by other agents and humans, such as in social media platforms, multi-agent LLM pipelines or autonomous robotics fleets. In these settings, system behavior emerges not from individual agents alone, but from the multi-agent interactions over time. Emergent dynamics of individuals in a social group have been long studied by social scientists in human contexts. \textbf{This position paper argues that agentic AI systems must be modeled with social theory as a structural prior, and formalizes a Multi-Agent Social Systems (MASS) framework for how agents interact and influence to generate system-level outcomes.} We represent MASS as a class of dynamical system of information generation, local influence and interaction structure, formulated by four structural priors anchored in social theory: strategic heterogeneity, networked-constrained dependence, co-evolution and distributional instability. We demonstrate the importance of each structural prior through formal propositions, and articulate a research agenda for how MASS should be modeled, evaluated and governed.
Distributed Quantum Gaussian Processes for Multi-Agent Systems AAMAS 2026
Gaussian Processes (GPs) are a powerful tool for probabilistic modeling, but their performance is often constrained in complex, large-scale real-world domains due to the limited expressivity of classical kernels. Quantum computing offers the potential to overcome this limitation by embedding data into exponentially large Hilbert spaces, capturing complex correlations that remain inaccessible to classical computing approaches. In this paper, we propose a Distributed Quantum Gaussian Process (DQGP) method in a multi-agent setting to enhance modeling capabilities and scalability. To address the challenging non-Euclidean optimization problem, we develop a Distributed consensus Riemannian Alternating Direction Method of Multipliers (DR-ADMM) algorithm that aggregates local agent models into a global model. We evaluate the efficacy of our method through numerical experiments conducted on a quantum simulator in classical hardware. We use real-world, non-stationary elevation datasets of NASA's Shuttle Radar Topography Mission and synthetic datasets generated by Quantum Gaussian Processes. Beyond modeling advantages, our framework highlights potential computational speedups that quantum hardware may provide, particularly in Gaussian processes and distributed optimization.
comment: 9 pages, 4 figures, accepted at AAMAS 2026 (International Conference on Autonomous Agents and Multiagent Systems)
Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning ICML 2026
Partial agent failure becomes inevitable when systems scale up, making it crucial to identify the subset of agents whose failure causes worst-case system performance degradations. We study this Vulnerable Agent Identification (VAI) problem in large-scale multi-agent reinforcement learning (MARL). We frame VAI as a Hierarchical Adversarial Decentralized Mean Field Control (HAD-MFC) problem, where the upper level selects vulnerable agents as an NP-hard task and the lower level learns their worst-case adversarial policies via mean-field MARL. The two problems are coupled together, making HAD-MFC difficult to solve. To handle this, we first decouple the hierarchical process by the Fenchel-Rockafellar transform, resulting in a regularized mean-field Bellman operator for the upper level that enables independent learning at each level, thus reducing computational complexity. We next reformulate the upper-level NP-hard problem as an MDP with dense rewards, allowing sequential identification of vulnerable agents via greedy and RL algorithms. This decomposition provably preserves the optimal solution. Experiments show our method effectively identifies more vulnerable agents in large-scale MARL and the rule-based system, fooling the system into worse failures, and reveals the vulnerability of each agent in large systems. Code available at https://github.com/Waken-dream/VAI
comment: Accepted by ICML 2026
Courtroom-Style Multi-Agent Debate with Progressive RAG and Role-Switching for Controversial Claim Verification
Large language models (LLMs) remain unreliable for high-stakes claim verification due to hallucinations and shallow reasoning. While retrieval-augmented generation (RAG) and multi-agent debate (MAD) address this, they are limited by one-pass retrieval and unstructured debate dynamics. We propose a courtroom-style multi-agent framework, PROClaim, that reformulates verification as a structured, adversarial deliberation. Our approach integrates specialized roles (e.g., Plaintiff, Defense, Judge) with Progressive RAG (P-RAG) to dynamically expand and refine the evidence pool during the debate. Furthermore, we employ evidence negotiation, self-reflection, and heterogeneous multi-judge aggregation to enforce calibration, robustness, and diversity. In zero-shot evaluations on the Check-COVID benchmark, PROClaim achieves 81.7% accuracy, outperforming standard multi-agent debate by 10.0 percentage points, with P-RAG driving the primary performance gains (+7.5 pp). We ultimately demonstrate that structural deliberation and model heterogeneity effectively mitigate systematic biases, providing a robust foundation for reliable claim verification. Our code and data are publicly available at https://github.com/mnc13/PROClaim.
comment: Under review, 7 figures, 12 tables
A Benchmark for Multi-Party Negotiation Games from Real Negotiation Data
Many real-world multi-party negotiations unfold as sequences of binding, action-level commitments rather than a single final outcome, yet this regime remains under-studied in existing benchmarks. We introduce a benchmark and evaluation framework for this setting, combining a configurable negotiation game generator with document-grounded instances derived from a climate negotiation exercise. We also provide several baseline solvers. Exact evaluation on small games and comparative evaluation on larger instances show that no solver dominates across regimes; performance depends on the structural properties of the game. These results motivate the creation of novel negotiation methods that value partial commitments robustly across diverse strategic regimes. Code and data for the benchmark are available at: https://anonymous.4open.science/r/negotiation_MARL-46B8
Robust and Safe Multi-Agent Reinforcement Learning with Communication for Autonomous Vehicles: From Simulation to Hardware
Deep multi-agent reinforcement learning (MARL) has been demonstrated effectively in simulations for multi-robot problems. For autonomous vehicles, the development of vehicle-to-vehicle (V2V) communication technologies provides opportunities to further enhance system safety. However, zero-shot transfer of simulator-trained MARL policies to dynamic hardware systems remains challenging, and how to leverage communication and shared information for MARL has limited demonstrations on hardware. This problem is challenged by discrepancies between simulated and physical states, system state and model uncertainties, practical shared information design, and the need for safety guarantees in both simulation and hardware. This paper designs RSR-RSMARL, a novel Robust and Safe MARL framework that supports Real-Sim-Real (RSR) policy adaptation for multi-agent systems with communication among agents, with both simulation and hardware demonstrations. RSR-RSMARL leverages state (including shared state information among agents) and action representations that account for real system complexities in the MARL formulation. The MARL policy is trained with a robust MARL algorithm to enable zero-shot transfer to hardware despite the sim-to-real gap. A safety shield module using Control Barrier Functions (CBFs) provides a safety guarantee for each individual agent. Experimental results on 1/10th-scale autonomous vehicles with V2V communication demonstrate the ability of the RSR-RSMARL framework to enhance driving safety and coordination across multiple configurations. These findings emphasize the importance of jointly designing robust policy representations and modular safety architectures to enable scalable, generalizable RSR transfer in multi-agent autonomy.
comment: 15 pages, 5 Figures
Talk is Cheap, Communication is Hard: Dynamic Grounding Failures and Repair in Multi-Agent Negotiation
Grounding is the collaborative process of establishing mutual belief sufficient for a communicative goal. While static grounding maps language to a shared context, dynamic grounding requires agents to negotiate meaning across turns. Current multi-agent Large Language Model (LLM) benchmarks largely emphasize static, one-shot tasks, overlooking whether agents can repair grounding breakdowns through interaction. We introduce an iterated multi-turn negotiation game where two agents allocate shared resources to private projects with verifiable jointly optimal outcomes. Although individual agents can identify Pareto-optimal allocations in isolation, agent dyads consistently fail to reach them across models. We identify four failure modes: (1) loss of shared interaction history, (2) stubborn anchoring to early proposals, (3) defaulting to equal splits over reward-maximizing coordination, and (4) referential binding errors across turns. Our baselines show that the coordination gap is not explained by individual reasoning limits or insufficient information exchange alone. Instead, the bottleneck lies in dynamic grounding: joint plan formation, commitment, and execution.
Multi-Agent Decision-Focused Learning via Value-Aware Sequential Communication
Multi-agent coordination under partial observability requires agents to share complementary private information. While recent methods optimize messages for intermediate objectives (e.g., reconstruction accuracy or mutual information) rather than decision quality, we introduce SeqComm-DFL, which unifies sequential communication with decision-focused learning for task performance. Our approach features value-aware message generation with sequential Stackelberg conditioning: messages maximize receiver decision quality and are generated in priority order, with agents conditioning on their predecessors. The guidance potential is determined by their prosocial ordering. We extend Optimal Model Design to communication-augmented world models with QMIX factorization, enabling efficient end-to-end training via implicit differentiation. We prove information-theoretic bounds showing that communication value scales with coordination gaps and establish $\mathcal{O}(1/\sqrt{T})$ convergence for the bilevel optimization, where $T$ denotes the number of training iterations. On collaborative healthcare and StarCraft Multi-Agent Challenge (SMAC) benchmarks, SeqComm-DFL achieves four to six times higher cumulative rewards and over 13% win rate improvements, enabling coordination strategies inaccessible under information asymmetry.
comment: 9 pages, 2 figures, 1 table, NeurIPS 2026
Systems and Control (EESS)
Towards Closed-loop Stability of Nonlinear Receding Horizon Games
We analyze Receding Horizon Games without any MPC-like terminal ingredients. We show that recursive feasibility can be inferred from the turnpike phenomenon under mild assumptions. Moreover, we prove sufficient conditions for practical asymptotic convergence of the closed-loop trajectories, and we discuss how the gap towards practical asymptotic stability may be closed. We use numerical examples to show that the closed-loop region of attraction around the steady-state GNE shrinks exponentially with the horizon length, a behavior previously known only for model predictive control. Further, we apply a linear end penalty and demonstrate in numerical simulations that it suppresses the leaving arc and ensures asymptotic convergence to the steady-state GNE.
Basilisk and Docker for Reproducible GN&C Simulation: A Workflow Reference
Basilisk is an open-source astrodynamics simulation framework widely used for spacecraft guidance, navigation, and control (GN&C) research and development. Despite its flexibility and computational capabilities, configuring Basilisk consistently across heterogeneous development environments presents practical challenges due to dependency management, operating system compatibility, and software configuration requirements. This paper presents a Docker-based containerization workflow for Basilisk that encapsulates the complete build environment, dependencies, and simulation infrastructure within a portable container image. The workflow is demonstrated through a progression of simulation scenarios of increasing complexity, from standalone orbital dynamics scripts to BSKSim-based attitude dynamics and control simulations with Monte Carlo analysis. The BSKSim class hierarchy, dynamics model architecture, flight software implementation, and scenario execution patterns are described in detail. The presented workflow provides a self-contained implementation reference for GN&C engineers and researchers seeking reproducible and portable Basilisk simulation environments. This work expands upon a workshop presentation delivered at the 46th Rocky Mountain AAS GN&C Conference, February 2024, available at https://doi.org/10.5281/zenodo.15008785.
comment: 21 pages, 8 figures
Estimation Problems and the Modulating Function Method: The Algebra of Modulating Functions
State and parameter estimation, along with fault detection, are three crucial estimation problems within the control systems community. Although different approaches have been proposed for each type of problem, the modulating function method proposes a more unified approach to all three problem classes, being used for state and parameter estimation of lumped systems, fault detection, and estimation of distributed and fractional systems. At the core of the method is the modulating function: a function that evaluates to 0 at the left or right boundaries up to a certain order of derivatives. By selecting the modulating functions, one directly determines the filter characteristics, and, for that reason, different function families have been proposed over the years. Nevertheless, many families of modulating functions are given in a rather similar mathematical structure. In light of these structures, this paper formally discusses the algebraic properties of modulating functions, and, after formalizing the closedness and group properties of modulating functions, a simple algorithm to construct new modulating functions is proposed, discussed, and illustrated with the construction of the newly introduced logarithmic modulating function families and 3 non-analytic modulating function families. Moreover, the fact that total modulating functions form a vector space and an algebra is exploited to construct orthonormal modulating functions, which are then used for the parameter estimation of a boat's roll dynamics, effectively avoiding matrix inversion issues.
comment: 13 pages, 6 figures
Neural Network-Based Virtual Wheel-Speed Sensor for Enhanced Low-Velocity State Estimation
Accurate wheel speed information is crucial for vehicle control and state estimation. Conventional sensors suffer from quantization and latency, especially at low velocities, while motor-speed signals in electric vehicles are distorted by drivetrain torsion. This work presents a neural-network-based virtual wheel-speed sensor that fuses wheel-speed and motor-speed signals to reduce errors from both sources. Validated on real-world Volkswagen ID.7 data, the real-time capable model achieves an error reduction of up to 85% compared to the production sensor and 47% compared to an optimized zero-phase filter, providing a smooth signal for driver-assistance functions. The results demonstrate robust generalization across diverse real-world maneuvers within the vehicle platform.
comment: Accepted for publication in the Proceedings of the 22nd IFAC World Congress, Busan, Republic of Korea, 2026
Efficient Learning of Affine and Rational Dependency LPV Models With Linear Fractional Representation
Identifying control-friendly models of nonlinear systems remains one of the major challenges at the intersection of system identification and control. The Linear Parameter-Varying (LPV) framework offers a promising solution, but existing identification methods often rely on model structures with affine scheduling dependency. Instead, this work proposes the use of LPV models with Linear Fractional Representation (LFR) admitting a rational scheduling dependency, capable of modelling complex nonlinear systems with fewer scheduling variables than affine models. This work introduces a direct parameterization that ensures well-posedness of rational LPV-LFR models and, through joint estimation of an LPV plant and scheduling map from input-output data alone, is capable of modelling complex nonlinear systems. The accuracy of the proposed approach is shown on two simulation examples.
comment: Accepted for IFAC WC 2026
Structured input-output analysis of oblique turbulent bands in Waleffe flow
This work employs structured input-output analysis (SIOA) to study Waleffe flow. The SIOA framework employs structured uncertainty to include the componentwise structure of nonlinearity in Navier-Stokes equations, and SIOA quantifies the flow response using structured singular values. The structured input-output analysis identifies the wavelength and inclination angle of oblique turbulent bands observed in large-domain direct numerical simulations. The structured input-output response scales over Reynolds number as $\sim Re^{1.7}$.
comment: 2 pages, 3 figures, accepted to LSU Symposium on Control, Learning, and Intelligent Systems 2026
Optimal State Preparation for Impulse Estimation in Gaussian Quantum Systems
We present an optimal control-based strategy to enhance the estimation of impulse-like disturbances in continuously monitored linear classical and quantum systems by exploiting non-equilibrium states. Using optimal estimation techniques for linear Gaussian systems to collect information from the temporal vicinity of the disturbance, we cast the minimization of disturbance estimation uncertainty as a nonlinear optimal control problem over time-dependent system parameters. The resulting method dynamically shapes the estimation covariances through parametric modulation, maximizing information gain at a known impulse time. This differs fundamentally from conventional squeezing protocols using periodic modulation that effectively degrade inference of impulse-like disturbances. Applied to nanomechanical resonators and levitated nanoparticles, optimal parametric driving reduces estimation variance by up to a factor of two relative to steady-state operation.
comment: Accepted for presentation at IFAC World Congress 2026
Learning What Matters: Adaptive Information-Theoretic Objectives for Robot Exploration
Designing learnable information-theoretic objectives for robot exploration remains challenging. Such objectives aim to guide exploration toward data that reduces uncertainty in model parameters, yet it is often unclear what information the collected data can actually reveal. Although reinforcement learning (RL) can optimize a given objective, constructing objectives that reflect parametric learnability is difficult in high-dimensional robotic systems. Many parameter directions are weakly observable or unidentifiable, and even when identifiable directions are selected, omitted directions can still influence exploration and distort information measures. To address this challenge, we propose Quasi-Optimal Experimental Design (QOED), an adaptive information objective grounded in optimal experimental design. QOED (i) performs eigenspace analysis of the Fisher information matrix to identify an observable subspace and select identifiable parameter directions, and (ii) modifies the exploration objective to emphasize these directions while suppressing nuisance effects from non-critical parameters. Under bounded nuisance influence and limited coupling between critical and nuisance directions, QOED provides a constant-factor approximation to the ideal information objective that explores all parameters. We evaluate QOED on simulated and real-world navigation and manipulation tasks, where identifiable-direction selection and nuisance suppression yield performance improvements of 35.23% and 21.98%, respectively. When integrated as an exploration objective in model-based policy optimization, QOED further improves policy performance over established RL baselines.
Control of Fully Actuated Aerial Vehicles: A Comparison of Model-based and Sensor-based Dynamic Inversion
Fully actuated multirotor platforms decouple translational force generation from vehicle attitude, enabling independent control of position and orientation and shifting performance limitations from attitude authority to actuator dynamics and control effectiveness. This paper compares a model-based nonlinear dynamic inversion controller (geometric NDI) with a sensor-based incremental dynamic inversion controller (INDI) on a fixed-tilt fully actuated hexarotor. Both controllers share an identical outer-loop structure and are both executed at 500 Hz; therefore, performance differences can be attributed primarily to the inversion strategy. Controller performance is evaluated in five experiments covering attitude step tracking under nominal conditions and under a 50% mismatch in the rotor force coefficient, hover disturbance rejection under an external lateral load, waypoint tracking in the presence of wind gust disturbances, reduced control frequency, and injected sensor degradation. The results show that INDI offers clear advantages under parameter mismatch, gust disturbances, and sensor degradation, and maintains lower position errors across the controller-frequency sweep. However, its advantages are not universal: geometric NDI yields better attitude tracking at reduced control frequencies. To the authors' best knowledge, this work presents the first experimental validation of a full pose tracking INDI controller with decoupled translational and rotational dynamics. These findings highlight the trade-off between measurement-based and model-based inversion for robust control and rapid deployment of fully actuated UAVs.
sweap: Reactive Synthesis for Infinite-State Integer Problems
Recent years have seen a significant increase in the interest in reactive synthesis from specifications that relate to infinite state spaces. We present sweap, a tool for synthesis of infinite-state Linear Integer Arithmetic reactive systems. sweap implements a CEGAR approach, relying on state-of-the-art finite-state synthesis tools as black boxes to solve abstract synthesis problems. sweap supports most common input formalisms for infinite-state reactive-synthesis problems: Temporal Stream Logic Modulo Theories, Reactive Program Games, the bespoke input of the ISSY tool, and our own bespoke input. We present a mature version of sweap with novel features: a dual abstraction approach that improves its capabilities in proving unrealisability, support for nondeterministic and unbounded updates, more general initialization of variables, and equirealisable reductions for optimisation. Experimental evaluation shows that sweap outperforms its only competitor in this domain.
comment: to be published in proceedings of CAV 2026
Cooperative Robotics Reinforced by Collective Perception for Traffic Moderation
Collisions at non-line-of-sight (NLOS) intersections remain a major safety concern because drivers have limited visibility of approaching traffic. V2X-based warnings can reduce these risks, yet many vehicles are not equipped with V2X and drivers may ignore in-vehicle alerts. Collective perception (CP) can compensate for low V2X penetration by extending the awareness of connected vehicles, but it cannot influence unconnected vehicles. To fill this gap, our work introduces a complementary concept that adds a cooperative humanoid robot as an active traffic moderator capable of physically stopping a vehicle that attempts to merge into an unseen traffic stream. The system operates on two parallel perception pathways. A dual-camera infrastructure unit detects the position, speed and motion of approaching vehicles and transmits this information to the robot as a collective perception message (CPM). The robot also receives cooperative awareness messages (CAM) from connected vehicles through its onboard V2X unit and can act as a relay for decentralized environmental notification messages (DENM) when safety events originate elsewhere along the road. A fusion module combines these streams to maintain a robust real-time view of the main road. A Zone of Danger (ZoD) is defined and used to predict whether an approaching vehicle creates a collision risk for a merging road user. When such a risk is detected, the robot issues a human-like STOP gesture and blocks the merging path until the hazard disappears. The full system was deployed at the Future Mobility Park (FMP) in Rotterdam. Experiments show that the combined vision and V2X perception allows the robot to detect approaching vehicles early, predict hazards reliably and prevent unsafe merges in real-world NLOS conditions.
comment: Accepted for publication in the Proceedings of the 2026 IEEE Vehicular Technology Conference (VTC2026-Spring)
KIND: A Kalman-Inspired Adaptive Estimator for SRF Cavity Detuning
Superconducting radio frequency cavities with a high quality factor enable energy-efficient accelerator operation but are very sensitive to mechanical disturbances that detune their resonance. Accurate detuning estimation is therefore essential for efficient resonance control and stable beam conditions. This paper introduces Kalman-Inspired Neural Decomposition (KIND), a data-driven estimator that fuses a Dynamic Mode Decomposition model for stationary modal behavior with a Transformer-based predictor for transient dynamics. KIND further outputs learned uncertainty signals that indicate regime changes, enabling anomaly detection. Using operational cavity data, we compare KIND with a classical Kalman filtering baseline and discuss its potential as a foundation for future uncertainty-aware, forecast-based control.
comment: Accepted for publication at the 2026 IFAC World Congress (IFAC 2026). 6 pages
Lane-Aware Graph Attention Network for Multi-Vehicle Trajectory Prediction in Expressway Merge Zones
Accurate multi-vehicle trajectory prediction in expressway merge and diverge areas is fundamental to the decision-making frameworks of autonomous vehicle systems. However, the majority of existing graph-based prediction models are developed and validated on mainline freeway segments and do not address the geometrically distinct interaction structures that characterize merge zones. Furthermore, standard evaluation protocols rely exclusively on displacement error metrics, leaving the safety consequences of predicted trajectories unquantified. This paper proposes a Lane-Aware Graph Attention Network (LA-GAT) that encodes vehicle interaction within dynamic scene graphs, augmented with a trainable lane-relationship attention bias that prioritizes merge-conflict interactions from the outset of training. The model is pre-trained on the raw NGSIM US-101 and I-80 datasets and subsequently fine-tuned on UAV-captured UTE SQM-W-1 trajectory data from a Chinese expressway merge area, with final evaluation on the held-out SQM-W-2 dataset. Evaluation spans both displacement metrics (ADE, FDE at 1s, 3s, 5s horizons) and surrogate safety measures (TTC violation rate, DRAC exceedance rate, collision rate). Fine-tuned results on SQM-W-2 yield ADE of 0.865 m at 1s and 2.518 m at 3s, demonstrating that drone-informed fine-tuning substantially reduces the cross-dataset transfer gap. The deliberate use of unfiltered NGSIM data is shown to characterize raw-condition generalization limits, with the performance degradation attributed to the well-documented measurement errors in that dataset.
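The displacement metrics named in the abstract above can be illustrated with a minimal sketch. It assumes the commonly used definitions (ADE as the mean Euclidean error over the prediction horizon, FDE as the error at the final step); the function name `displacement_errors` and the toy trajectories are hypothetical and are not taken from the paper.

```python
import math


def displacement_errors(pred, truth):
    """Return (ADE, FDE) for two equal-length lists of (x, y) points in metres."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Per-step Euclidean distance between predicted and ground-truth positions.
    dists = [math.hypot(px - tx, py - ty)
             for (px, py), (tx, ty) in zip(pred, truth)]
    ade = sum(dists) / len(dists)  # average over the whole horizon
    fde = dists[-1]                # error at the last predicted step
    return ade, fde


# Toy example (hypothetical data): the prediction drifts laterally over time.
pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(pred, truth)
```

Because FDE only looks at the last step, a model can have a low ADE and still end far from the true position, which is one reason both are usually reported per horizon.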
Observer-Based Fixed-Time Nested Sliding-Mode Control for Tip-Position Regulation of a Single-Link Flexible Manipulator
This paper presents a novel position control strategy for a single-link flexible manipulator, tailored for applications where precise position must be achieved within strict time constraints. To accomplish this objective, firstly, a nested non-singular terminal sliding mode controller is designed for the system, enabling precise and robust control. Furthermore, a fixed-time sliding mode observer is designed to estimate unmeasured system states accurately in a fixed time, thereby enabling closed-loop control implementation. A stability analysis is presented to guarantee the robustness and efficacy of the proposed composite control algorithm. The effectiveness of the proposed fixed-time controller is demonstrated through numerical simulation on accuracy, stability, and convergence speed. The proposed controller's performance is also compared with that of other state-of-the-art control schemes. The proposed controller is further validated through experiments conducted on a real hardware setup.
Behavioral Integrity Verification for AI Agent Skills
Agent skills extend LLM agents with privileged third-party capabilities such as filesystem access, credentials, network calls, and shell execution. Existing safety work catches malicious prompts and risky runtime actions, but the skill artifact itself goes unverified. We formalize this as the behavioral integrity verification (BIV) problem: a typed set comparison between declared and actual capabilities over a shared taxonomy that bridges code, instructions, and metadata. The BIV framework instantiates this comparison by pairing deterministic code analysis with LLM-assisted capability extraction. The resulting structured evidence supports three downstream analyses: deviation taxonomy, root-cause classification, and malicious-skill detection. On 49,943 skills from the OpenClaw registry, the deviation taxonomy reveals a pervasive description-implementation gap: 80.0% of skills deviate from declared behavior, with four novel compound-threat categories surfaced. Root-cause classification finds that deviations are mostly oversight, not malice: 81.1% trace to developer oversight and 18.9% to adversarial intent, with 5.0% of skills carrying predicted multi-stage attack chains. On a 906-skill malicious-skill detection benchmark, BIV reaches an F1 of 0.946, outperforming state-of-the-art rule-based and single-pass LLM baselines. These results demonstrate behavioral integrity auditing for agent skills at scale.
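The comparison between declared and actual capabilities described above can be sketched with plain set operations. This is only an illustration under simplifying assumptions: capabilities are flat string labels rather than entries in the paper's typed taxonomy, and the function name `capability_deviation` and the report keys are hypothetical.

```python
def capability_deviation(declared, actual):
    """Compare declared capabilities with those actually observed in a skill."""
    declared, actual = set(declared), set(actual)
    return {
        # Used by the skill but never declared: the risky direction.
        "undeclared": actual - declared,
        # Declared but not observed in the artifact.
        "unused": declared - actual,
        # True only when both sets match exactly.
        "consistent": declared == actual,
    }


# Hypothetical skill: declares network and filesystem access,
# but the extracted behavior shows network calls and shell execution.
report = capability_deviation({"network", "filesystem"}, {"network", "shell"})
```

Here `report["undeclared"]` contains `"shell"`, the kind of gap between description and implementation that the abstract refers to.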
GraphFlash: Enabling Fast and Elastic Graph Processing on Serverless Infrastructure
Graph processing systems are essential for analyzing large-scale data with complex relationships, yet most existing frameworks rely on statically provisioned clusters, resulting in poor elasticity and inefficient resource utilization under dynamic workloads. Serverless computing offers automatic scaling and fine-grained billing, but existing serverless graph systems suffer from performance limitations due to inefficient state management and high communication overhead through external storage. We present GraphFlash, a fast and elastic graph processing framework built on serverless infrastructure. GraphFlash adopts a subgraph-centric programming model and leverages shared external storage for coordination and communication, enabling stateless, fine-grained function execution. It supports two execution modes: rotating mode for resource-constrained environments and pinned mode for higher performance when resources are sufficient. To address serverless limitations, GraphFlash introduces system-level optimizations, including partition-aware key aggregation, intra-function partition co-location, and superstep-aware activation. Across multiple graph algorithms and datasets, GraphFlash outperforms existing serverless-compatible systems by up to 127x in execution time and reduces resource consumption by up to 98% under higher-resource configurations, while matching the performance of traditional distributed frameworks on large workloads. Even with limited resources, it achieves up to 48x speedup and 99.97% cost reduction over prior serverless solutions, demonstrating that GraphFlash makes serverless graph processing practical and performant.
Hierarchical LLM-Driven Control for HAPS-Assisted UAV Networks: Joint Optimization of Flight and Connectivity
Uncrewed aerial vehicles (UAVs) are increasingly deployed in complex networked environments, yet the joint optimization of multi-UAV motion control and connectivity remains a fundamental challenge. In this paper, we study a multi-UAV system operating in an integrated terrestrial and non-terrestrial network (ITNTN) comprising terrestrial base stations and high-altitude platform stations (HAPS). We consider a three-dimensional (3D) aerial highway scenario where UAVs must adapt their motion to ensure collision avoidance, efficient traffic flow, and reliable communication under dynamic and partially observable conditions. We first model the problem as a hierarchical multi-objective partially observable Markov decision process (H-MO-POMDP), capturing the coupling between control and communication objectives. Based on this formulation, we propose a large language model (LLM)-driven hierarchical multi-rate control framework. At the global level, an LLM-based controller on the HAPS performs long-term planning for load balancing and handover decisions. At the local level, each UAV employs a hybrid controller that integrates a slow-timescale LLM for high-level spatial reasoning with a reinforcement learning agent for faster UAV-to-infrastructure (U2I) communication and motion control. We further develop a high-fidelity 3D simulation platform by integrating the gym-pybullet-drones environment with 3GPP-compliant RF/THz channel models. Numerical results demonstrate that the proposed framework significantly outperforms state-of-the-art baselines, achieving a 14% increase in transportation efficiency and a 25% improvement in telecommunication throughput. Additionally, it achieves a 23% reduction in physical collision rates, demonstrating strong handover stability and zero-shot generalization in dynamic scenarios.
comment: Submission for possible publication
Beyond Prediction: Interval Neural Networks for Uncertainty-Aware System Identification
System identification (SysID) is critical for modeling dynamical systems from experimental data, yet traditional approaches often fail to capture nonlinear behaviors. While deep learning offers powerful tools for modeling such dynamics, incorporating uncertainty quantification is essential to ensure reliable predictions. This paper presents a systematic framework for constructing and training Interval Neural Networks (INNs) for uncertainty-aware SysID. By extending crisp neural networks into interval counterparts, we develop Interval LSTM and NODE models that propagate uncertainty through interval arithmetic without probabilistic assumptions. This design allows them to represent uncertainty and produce prediction intervals. For training, we propose two strategies: Cascade INN (C-INN), a two-stage approach converting a trained crisp NN into an INN, and Joint INN (J-INN), a one-stage framework jointly optimizing prediction accuracy and interval precision. Both strategies employ uncertainty-aware loss functions and parameterization tricks to ensure reliable learning. Comprehensive experiments on multiple SysID datasets demonstrate the effectiveness of both approaches and benchmark their performance against well-established uncertainty-aware baselines: C-INN achieves superior point prediction accuracy, whereas J-INN yields more accurate and better-calibrated prediction intervals. Furthermore, to reveal how uncertainty is represented across model parameters, the concept of channel-wise elasticity is introduced, which is used to identify distinct patterns across the two training strategies. The results of this study demonstrate that the proposed framework effectively integrates deep learning with uncertainty-aware modeling.
comment: Under review
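For the interval neural network entry above, interval arithmetic through a single affine layer can be sketched as follows. This is a simplified illustration, not the paper's model: it assumes crisp (point-valued) weights and biases with interval-valued inputs, whereas the INNs in the abstract may also carry intervals on the parameters. The function name `interval_affine` is hypothetical.

```python
def interval_affine(x_lo, x_hi, weights, bias):
    """Propagate the box [x_lo, x_hi] through y = W x + b using interval arithmetic."""
    out_lo, out_hi = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, h in zip(row, x_lo, x_hi):
            if w >= 0:
                # Non-negative weight keeps the bound ordering.
                lo += w * l
                hi += w * h
            else:
                # Negative weight swaps lower and upper bounds.
                lo += w * h
                hi += w * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi


# One output neuron computing y = x1 - x2 + 0.5 with x1 in [0, 1], x2 in [1, 2].
lo, hi = interval_affine([0.0, 1.0], [1.0, 2.0], [[1.0, -1.0]], [0.5])
```

The resulting interval is guaranteed to contain every output reachable from inputs inside the box, which is what lets such models report prediction intervals without probabilistic assumptions.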
Experimental Examination of Secure Two-Party Controller Computation
A secure two-party computation protocol for running dynamic controllers over secret sharing has recently been proposed. Unlike encrypted control schemes based on homomorphic encryption, this protocol enables operating dynamic controllers for an infinite time horizon without controller-state decryption, controller-state reset, or input re-encryption. However, the two-party setting introduces additional online communication between the computing parties, which may hinder real-time feasibility. In this study, we demonstrate the feasibility of the protocol through implementation on a commercial cloud platform with an inverted pendulum testbed. Experimental results show that the proposed protocol successfully stabilized the pendulum despite the online communication overhead.
comment: 6 pages, 5 figures
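As background for the secret-sharing entry above, the sketch below shows generic two-party additive secret sharing over integers modulo a prime. It is not the protocol from the paper, whose details the abstract does not give; the modulus and the function names `share` and `reconstruct` are assumptions chosen for illustration.

```python
import secrets

MODULUS = 2**61 - 1  # hypothetical prime modulus, chosen only for this sketch


def share(value):
    """Split an integer into two additive shares that sum to it modulo MODULUS."""
    r = secrets.randbelow(MODULUS)
    return r, (value - r) % MODULUS


def reconstruct(s1, s2):
    """Recombine two additive shares."""
    return (s1 + s2) % MODULUS


# Each party holds one share of a and one share of b.
a1, a2 = share(20)
b1, b2 = share(22)
# Addition is local: each party adds its own shares without communicating.
total = reconstruct((a1 + b1) % MODULUS, (a2 + b2) % MODULUS)
```

In schemes of this kind, additions are local, while multiplications generally require the parties to exchange messages. That is one common source of the online communication overhead that the abstract evaluates.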
Optimal excitation and measurement patterns for networks with tree topology
In this work we evaluate the excitation and measurement patterns (EMP) for networks with tree topology. We investigate guidelines for the selection of the minimal EMPs, i.e. those with the least number of excited and measured nodes combined, for which the accuracy obtained, in terms of the trace of the asymptotic covariance matrix, is optimal. We introduce the concept of partial information matrix as a means to systematically obtain the information matrix for any dynamic network. For a specific tree class, called cross, we show that the accuracy of a particular module depends on the magnitude of the parameters to be estimated. Furthermore, when all factors are equal, it is best to excite. We extend a topological condition for branches under which the accuracy of a particular module of the network is independent of the other parameters from the tree. We provide a numerical analysis showing that our guidelines could be used as a selection tool for minimal EMPs for tree networks.
comment: Accepted for presentation at the 2026 IFAC World Congress, Busan, South Korea
Dynamic Transaction Scheduling and Pricing in the Ethereum Mempool
The Ethereum blockchain utilizes the EIP-1559 algorithm to manage transaction inclusion and block assembly. However, EIP-1559 and much of the existing literature study this problem from a static perspective, focusing on price evolution without modelling transaction dynamics within the mempool. Motivated by this limitation, we study a dynamic transaction scheduling problem in which transactions with heterogeneous sizes and per-unit values arrive over time and remain in the mempool until scheduled. To capture the stochastic mempool evolution, we formulate the problem as a Markov Decision Process (MDP) whose state represents the mempool configuration and whose actions correspond to block prices. We first provide a primal-dual interpretation of the static EIP-1559 mechanism, showing that block prices arise naturally as dual variables of a social-welfare maximization problem. Building on this perspective, we extend the framework to the dynamic setting and formulate an objective that maximizes long-run discounted reward while incorporating holding costs and overshoot penalties. We then employ a Natural Policy Gradient (NPG) algorithm to compute the optimal policy. Our results show that dynamic pricing stabilizes the mempool while maximizing long-run discounted reward. In particular, as the overshoot penalty increases, the average scheduled transaction volume converges to the target block capacity, and the resulting NPG updates closely resemble the EIP-1559 price update rule. Finally, we study two special cases of the MDP formulation: homogeneous transactions and uniform arrivals. In the homogeneous setting, where the protocol directly controls scheduled volume, we show that the optimal policy has a threshold structure. We then propose a bang-bang pricing mechanism for uniform arrivals and derive a lower bound on the block capacity needed to ensure system stability.
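For reference, the EIP-1559 price update rule mentioned in the abstract above can be sketched in a simplified form. This sketch uses floating-point arithmetic, whereas the actual specification uses integer arithmetic with rounding rules; the function name `next_base_fee` is hypothetical, and the default adjustment factor of 1/8 follows the EIP-1559 base fee change denominator.

```python
def next_base_fee(base_fee, gas_used, gas_target, max_change=0.125):
    """Simplified EIP-1559 update: move the base fee toward demand.

    The fee rises when a block uses more gas than the target and
    falls when it uses less, by at most max_change per block.
    """
    return base_fee * (1.0 + max_change * (gas_used - gas_target) / gas_target)


target = 15_000_000
fee_full = next_base_fee(100.0, 2 * target, target)  # block at twice the target
fee_empty = next_base_fee(100.0, 0, target)          # empty block
fee_same = next_base_fee(100.0, target, target)      # block exactly at target
```

A full block raises the fee by 12.5%, an empty block lowers it by 12.5%, and a block at the target leaves it unchanged. This multiplicative feedback is the static rule that the paper's dynamic pricing policy is compared against.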
Identifying the nonlinear string dynamics with port-Hamiltonian neural networks
Hybrid machine learning combines physical knowledge with data-driven models to enhance interpretability and performance. In this context, Port-Hamiltonian Systems (PHS), which generalize Hamiltonian mechanics to describe open, non-autonomous dynamical systems, have been successfully integrated with neural networks under the name Port-Hamiltonian Neural Networks (PHNNs). While the ability of PHNNs to identify Hamiltonian ordinary differential equation (ODE) systems has already been demonstrated, their application to learning Hamiltonian partial differential equation (PDE) systems remains largely unexplored. This limitation restricts their use in musical acoustics, where instruments are typically modeled as distributed parameter systems governed by PDEs. In this work, we demonstrate how to learn the nonlinear string dynamics from data in a physically-consistent framework through a PHNN extension to PDEs. By constructing structured neural network architectures based on PHS, we can recover both the Hamiltonian governing the string and the dissipation affecting it. This approach outperforms baseline, non-physics-informed methods in terms of both accuracy and interpretability. Numerical experiments using synthetic data demonstrate the ability of the proposed PHNN model to identify and emulate the nonlinear dynamics of the system.
Safe and Energy-Aware Decentralized PDE-Constrained Optimization-Based Control of Multi-UAVs for Persistent Wildfire Suppression
This paper presents a safe and energy-aware optimization-based control framework for multi-UAV wildfire suppression under localization and motion uncertainties. We first develop a centralized density-based controller that couples UAV motion and water deployment in a wildfire-specific control Lyapunov function. This framework is then extended to a decentralized setting suitable for large-scale operations using only local information. The controllers use control barrier function constraints to enforce both danger zone avoidance and the ability to reach a charging region. Simulations and real quadcopter experiments demonstrate the controller's effectiveness in fire suppression while preserving safety and energy sufficiency over multiple charge cycles.
Adaptive Smooth Tchebycheff Attention for Multi-Objective Policy Optimization
Multi-objective reinforcement learning in robotic domains requires balancing complex, non-convex trade-offs between conflicting objectives. While linear scalarization methods provide stability, they are theoretically incapable of recovering solutions within non-convex regions of the Pareto front. Conversely, static non-linear scalarizations (e.g., Tchebycheff) can theoretically access these regions but often suffer from severe gradient variance and optimization instability in deep RL. In this work, we propose an Adaptive Smooth Tchebycheff framework that resolves this tension by dynamically modulating the curvature of the optimization landscape. We introduce a novel conflict-driven controller that regulates the optimization smoothness based on real-time gradient interference. This allows the agent to anneal toward precise, non-convex scalarization when objectives align, while elastically reverting to stable, smooth approximations when destructive gradient conflicts emerge. We validate our approach on a challenging robotic stealth visual search task -- a proxy for monitoring of protected/fragile ecosystems -- where an agent must balance search, exposure/interference minimization and exploration speed. Extensive ablations confirm that our conflict-aware adaptation enables the robust discovery of Pareto-optimal policies in non-convex regions inaccessible to linear baselines and unstable for static non-linear methods. Website: https://alejandromllo.github.io/research/pasta/
comment: To appear in the Proceedings of Robotics: Science and Systems (RSS) 2026
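To make the scalarization trade-off in the abstract above concrete, the sketch below contrasts the classical Tchebycheff scalarization with a log-sum-exp smoothing of it. This is the standard smooth Tchebycheff form, shown under the assumption that it matches what the abstract refers to. The adaptive, conflict-driven controller for the smoothing parameter is not reproduced here, and all names (`losses`, `weights`, `ideal`, `mu`) are illustrative.

```python
import math


def tchebycheff(losses, weights, ideal):
    """Classical (non-smooth) Tchebycheff scalarization: worst weighted gap."""
    return max(w * (f - z) for f, w, z in zip(losses, weights, ideal))


def smooth_tchebycheff(losses, weights, ideal, mu):
    """Log-sum-exp smoothing of the Tchebycheff scalarization.

    mu > 0 controls the smoothness: small mu approaches the hard max,
    large mu gives a smoother surrogate that blends all objectives.
    """
    terms = [w * (f - z) / mu for f, w, z in zip(losses, weights, ideal)]
    m = max(terms)  # subtract the max for numerical stability
    return mu * (m + math.log(sum(math.exp(t - m) for t in terms)))


if __name__ == "__main__":
    losses, weights, ideal = [1.0, 3.0], [0.5, 0.5], [0.0, 0.0]
    print(tchebycheff(losses, weights, ideal))
    for mu in (1.0, 0.1, 0.001):
        print(mu, smooth_tchebycheff(losses, weights, ideal, mu))
```

The smoothed value always lies between the hard maximum and the hard maximum plus `mu * log(m)` for `m` objectives, so shrinking `mu` tightens the approximation at the cost of sharper gradients.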
Grid-Orch: An LLM-Powered Orchestrator for Distribution Grid Simulation and Analytics
The power distribution engineering workforce faces a projected shortage of up to 1.5 million engineers by 2030, creating urgent demand for more accessible analysis tools. This paper introduces Grid-Orch, a framework that bridges Large Language Models (LLMs) and power system simulation through the Model Context Protocol (MCP), enabling engineers to perform complex distribution analyses via natural language. Using OpenDSS as the reference implementation, Grid-Orch provides 36 domain-specific tools across eleven categories, covering power flow, voltage analysis, quasi-static time series (QSTS) simulation, and automated optimization. A provider-agnostic LLM layer supports both cloud-hosted (Gemini, Claude) and locally deployed (Ollama, llama-cpp) models, enabling air-gapped operation for security-sensitive utility environments. Three optimization skills, capacitor placement, voltage violation analysis, and overvoltage mitigation, extend the platform beyond single-tool queries to multi-step engineering workflows. Grid-Orch is delivered as an interactive web platform with chat-based interaction, a QSTS dashboard, and feeder topology visualization, and renders simulation results inline. Workflow demonstrations show that distribution analyses formerly requiring hours of scripting, such as distributed energy resource (DER) interconnection screening, complete in under two minutes through natural language, producing numerically identical results to direct OpenDSS scripting.
Demystifying Deep Reinforcement Learning: A Neuro-Symbolic Framework for Interpretable Open RAN Automation
Open Radio Access Networks (O-RAN) are increasingly adopting data-driven control through Deep Reinforcement Learning (DRL) to optimize complex tasks such as network slicing and mobility management. However, the deployment of DRL in carrier-grade networks is hindered by its inherent opacity and stochastic execution, which limit operator trust, auditability, and safe deployment. Existing explainable AI (XAI) approaches primarily provide post-hoc insights and fail to produce executable, interpretable policies suitable for operational environments. In this paper, we present DeRAN, a neuro-symbolic framework that bridges the gap between DRL performance and operational transparency by distilling black-box DRL policies into human-readable symbolic representations. DeRAN introduces a concept-driven abstraction layer that transforms high-dimensional network telemetry into a compact set of semantically meaningful features, enabling interpretable policy learning. Building on the semantically grounded concepts, DeRAN synthesizes symbolic policies using deep symbolic regression (DSR) for continuous control and neurally guided differentiable logic (NUDGE) for discrete decision-making. We implement DeRAN on a live 5G O-RAN testbed and evaluate it on two representative use cases. Experimental results demonstrate that DeRAN achieves 78% and 87% of DRL's cumulative rewards in the two use cases, while offering interpretability and auditability by design. Source code is available at https://github.com/Jadejavu/DeRAN
Disturbance-adaptive Model Predictive Control for Bounded Average Constraint Violations
This paper considers stochastic linear time-invariant systems subject to constraints on the average number of state-constraint violations over time without knowing the disturbance distribution. We present a novel disturbance-adaptive model predictive control (DAD-MPC) framework, which adjusts the disturbance model based on measured constraint violations. Using a robust invariance method, DAD-MPC ensures recursive feasibility and guarantees asymptotic or robust bounds on average constraint violations. Additionally, the bounds hold even with an inaccurate disturbance model, which allows for data-driven disturbance quantification methods to be used, such as conformal prediction. Simulation results demonstrate that the proposed approach reduces closed-loop cumulative cost compared to state-of-the-art methods across different target violation rates, while satisfying average violation bounds.
comment: Extended version of accepted paper for IFAC World Congress 2026
Leveraging Digital Twin Technologies: All-Photonics Networks-as-a-Service for Data Center Xchange in the Era of AI [Invited Tutorial]
This paper presents a data center exchange (Data Center Xchange, DCX) architecture for all-photonics networks-as-a-service in distributed data center infrastructures, enabling the creation of a virtual large-scale data center by directly interconnecting distributed data centers in metropolitan areas. Key requirements for such an architecture are identified: support for low-latency operations, scalability, reliability, and flexibility within a single network architecture; the ability to add new operator-driven automation functionalities based on an open networking approach; and the ability to control and manage remotely deployed transponders connected via access links with unknown physical parameters. We propose a set of technologies that enable digital twin operations for optical networks, including a cloud-native architecture for coherent transceivers, remote transponder control, fast end-to-end optical path provisioning, transceiver-based physical-parameter estimation incorporating digital longitudinal monitoring, and optical line system calibration, demonstrating their feasibility through field validations.
Renewable-Colocated Green Hydrogen Production: Optimal Scheduling and Profitability
We study the optimal green hydrogen production and energy market participation of a renewable-colocated hydrogen producer (RCHP) that utilizes onsite renewable generation for both hydrogen production and grid services. Under deterministic and stochastic profit-maximization frameworks, we analyze RCHP's multiple market participation models and derive closed-form optimal scheduling policies that dynamically allocate renewable energy to hydrogen production and electricity export to the wholesale market. Analytical characterizations of the RCHP's operating profit and the optimal sizing of renewable and electrolyzer capacities are obtained. We use real-time renewable generation and electricity price data from three independent system operators to evaluate the impacts of market prices and environmental policies on RCHP's profitability.
Robustness Certificates for Neural Networks against Adversarial Attacks
The increasing use of machine learning in safety-critical domains amplifies the risk of adversarial threats, especially data poisoning attacks that corrupt training data to degrade performance or induce unsafe behavior. Most existing defenses lack formal guarantees or rely on restrictive assumptions about the model class, attack type, extent of poisoning, or point-wise certification, limiting their practical reliability. This paper introduces a principled formal robustness certification framework that models gradient-based training as a discrete-time dynamical system (dt-DS) and formulates poisoning robustness as a formal safety verification problem. By adapting the concept of barrier certificates (BCs) from control theory, we introduce sufficient conditions to certify a robust radius ensuring that the terminal model remains safe under worst-case ${\ell}_p$-norm based poisoning. To make this practical, we parameterize BCs as neural networks trained on finite sets of poisoned trajectories. We further derive probably approximately correct (PAC) bounds by solving a scenario convex program (SCP), which yields a confidence lower bound on the certified robustness radius generalizing beyond the training set. Importantly, our framework also extends to certification against test-time attacks, making it the first unified framework to provide formal guarantees in both training and test-time attack settings. Experiments on MNIST, SVHN, and CIFAR-10 show that our approach certifies non-trivial perturbation budgets while being model-agnostic and requiring no prior knowledge of the attack or contamination level.
Deep Reinforcement Learning Approach to QoS-Aware Load Balancing in 5G Cellular Networks under User Mobility and Observation Uncertainty
Efficient mobility management and load balancing are critical to sustaining Quality of Service (QoS) in dense, highly dynamic 5G radio access networks. We present a deep reinforcement learning framework based on Proximal Policy Optimization (PPO) for autonomous, QoS-aware load balancing implemented end-to-end in a lightweight, pure-Python simulation environment. The control problem is formulated as a Markov Decision Process in which the agent periodically adjusts Cell Individual Offset (CIO) values to steer user-cell associations. A multi-objective reward captures key performance indicators (aggregate throughput, latency, jitter, packet loss rate, Jain's fairness index, and handover count), so the learned policy explicitly balances efficiency and stability under user mobility and noisy observations. The PPO agent uses an actor-critic neural network trained from trajectories generated by the Python simulator with configurable mobility (e.g., Gauss-Markov) and stochastic measurement noise. Across 500+ training episodes and stress tests with increasing user density, the PPO policy consistently improves KPI trends (higher throughput and fairness, lower delay, jitter, packet loss, and handovers) and exhibits rapid, stable convergence. Comparative evaluations show that PPO outperforms rule-based ReBuHa and A3 as well as the learning-based CDQL baseline across all KPIs while maintaining smoother learning dynamics and stronger generalization as load increases. These results indicate that PPO's clipped policy updates and advantage-based training yield robust, deployable control for next-generation RAN load balancing using an entirely Python-based toolchain.
Host-Aware Control of Gene Expression using Data-Enabled Predictive Control
Cybergenetic gene expression control in bacteria enables applications in engineering biology, drug development, and biomanufacturing. AI-based controllers offer new possibilities for real-time, single-cell-level regulation but typically require large datasets and re-training for new systems. Data-enabled Predictive Control (DeePC) offers better sample efficiency without prior modelling. We apply DeePC to a system with two inputs (optogenetic control and media concentration) and two outputs (expression of gene of interest and host growth rate). Using basis functions to address nonlinearities, we demonstrate that DeePC remains robust to parameter variations and performs among the best control strategies while using the least data.
Competitor-aware Race Management for Electric Endurance Racing ITSC 2026
Electric endurance racing is characterized by severe energy constraints and strong aerodynamic interactions. Determining race-winning policies therefore becomes a fundamentally multi-agent, game-theoretic problem. These policies must jointly govern low-level driver inputs as well as high-level strategic decisions, including energy management and charging. This paper proposes a bi-level framework for competitor-aware race management that combines game-theoretic optimal control with reinforcement learning. At the lower level, a multi-agent game-theoretic optimal control problem is solved to capture aerodynamic effects and asymmetric collision-avoidance constraints inspired by motorsport rules. Using this single-lap problem as the environment, reinforcement learning agents are trained to allocate battery energy and schedule pit stops over an entire race. The framework is demonstrated in a two-agent, 45-lap simulated race. The results show that effective exploitation of aerodynamic interactions is decisive for race outcome, with strategies that prioritize finishing position differing fundamentally from single-agent, minimum-time approaches.
comment: 8 pages, 6 figures, accepted for presentation at ITSC 2026
Estimating Hormone Concentrations in the Pituitary-Thyroid Feedback Loop from Irregularly Sampled Measurements
Model-based control techniques have recently been investigated for the recommendation of medication dosages to address thyroid diseases. These techniques often rely on knowledge of internal hormone concentrations that cannot be measured from blood samples. Moreover, the measurable concentrations are typically only obtainable at irregular sampling times. In this work, we empirically verify a notion of sample-based detectability that accounts for irregular sampling of the measurable concentrations on two pituitary-thyroid loop models, which represent patients with hypo- and hyperthyroidism, respectively, and include the internal concentrations as states. We then implement sample-based moving horizon estimation for the models and test its performance on virtual patients across a range of sampling schemes. Our study shows robust stability of the estimator across all scenarios, and that more frequent sampling leads to lower estimation error in the presence of model uncertainty and misreported dosages.
comment: 8 pages; This work has been accepted for presentation at the 23rd IFAC World Congress 2026
Provably-Correct Safety Protocol for Cooperative Platooning
Cooperative Adaptive Cruise Control (CACC) is a well-studied technology for forming string-stable vehicle platoons. Ensuring collision avoidance is particularly difficult in CACC due to the small desired inter-vehicle spacing. We propose a safety protocol preventing collisions in a provably-correct manner while still maintaining a small distance to the preceding vehicle, by utilizing communicated braking capabilities. In addition, the safety of the protocol is ensured despite possible communication failures. While our concept can be applied to any CACC system, we particularly consider a class of CACCs, where the platoon vehicles successively agree on a consensus behavior. Our safety protocol is evaluated on various scenarios using the CommonRoad benchmark suite.
comment: Copyright 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Hierarchical parameter estimation for distributed networked systems: a dynamic consensus approach
This work introduces a novel two-stage distributed framework to globally estimate constant parameters in a networked system, separating shared information from local estimation. The first stage uses dynamic average consensus to aggregate agents' measurements into surrogates of centralized data. Using these surrogates, the second stage implements a local estimator to determine the parameters. By designing an appropriate consensus gain, the persistence of excitation of the regressor matrix is achieved, and thus, exponential convergence of a local Gradient Estimator (GE) is guaranteed. The framework facilitates its extension to switched network topologies, quantization, and the heterogeneous substitution of the GE with a Dynamic Regressor Extension and Mixing (DREM) estimator, which supports relaxed excitation requirements.
Active inference as a unified model of collision avoidance behavior in human drivers
Collision avoidance -- involving rapid threat detection and quick execution of the appropriate evasive maneuver -- is a critical aspect of driving. However, existing models of human collision avoidance behavior are fragmented, focusing on specific scenarios or only describing certain aspects of the avoidance behavior, such as response times. This paper addresses these gaps by proposing a novel computational cognitive model of human collision avoidance behavior based on active inference. Active inference provides a unified approach to modeling human behavior: the minimization of free energy. Building on prior active inference work, our model incorporates established cognitive mechanisms such as evidence accumulation to simulate human responses in two distinct collision avoidance scenarios: front-to-rear lead vehicle braking and lateral incursion by an oncoming vehicle. We demonstrate that our model explains a wide range of previous empirical findings on human collision avoidance behavior. Specifically, the model closely reproduces both aggregate results from meta-analyses previously reported in the literature and detailed, scenario-specific effects observed in a recent driving simulator study, including response timing, maneuver selection, and execution. Our results highlight the potential of active inference as a unified framework for understanding and modeling human behavior in complex real-life driving tasks.
Least Costly Space-Filling Experiment Design for the Identification of a Nonlinear System
The quality of an estimated nonlinear model depends strongly on the quality of the data used for system identification. By using a Gaussian Process-based optimal input design approach, a so-called space-filling dataset can be generated in the feature space of the system model. The design method is applicable to a broad class of signals and models and also incorporates information measures through optimality criteria into the signal design. However, the resulting input design can be costly to apply to the real system. The goal of this paper is to propose a space-filling input design that minimizes the experimentation cost in terms of a user-defined measure, while still guaranteeing a prescribed level of space-fillingness. Through a Monte Carlo simulation study, we demonstrate that the proposed method can appropriately shape the excitation signal to significantly reduce the experimental cost while the identified model performance remains adequate.
SAGAS: Semantic-Aware Graph-Assisted Stitching for Offline Temporal Logic Planning
Linear Temporal Logic (LTL) provides a rigorous framework for specifying long-horizon robotic tasks, yet existing approaches face a trade-off: model-based synthesis relies on accurate labeled transition systems, whereas learning-based methods often require online interaction, task-specific rewards, or specification-conditioned training. We study LTL-specified robotic planning and execution in a stricter offline, model-free setting, where the agent is given only fixed, task-agnostic trajectory fragments, with no dynamics model, task demonstrations, or online data collection. To address this setting, we propose SAGAS, a framework that combines the compositionality of symbolic synthesis with the data-driven reachability structure learned from offline trajectories. SAGAS first learns a reusable latent reachability graph and a frozen goal-conditioned executor from fragmented offline data. For each new LTL formula, it performs task-time semantic graph augmentation to ground state-defined propositions on the learned graph, and applies Büchi product search to synthesize a cost-aware accepting prefix--suffix waypoint plan executed by the frozen executor. By shifting formula-specific reasoning from policy learning to test-time graph augmentation and symbolic search, SAGAS enables zero-shot generalization to unseen, data-supported LTL specifications without task-specific reward design, policy retraining, or online interaction. Experiments on LTL task suites constructed from OGBench locomotion domains show that this design produces executable and cost-efficient prefix--suffix behaviors for diverse unseen LTL tasks from fragmented offline data.
Correct-by-Design Control Synthesis of Stochastic Multi-agent Systems: a Robust Tensor-based Solution
Discrete-time stochastic systems with continuous spaces are hard to verify and control, even with MDP abstractions, because of the curse of dimensionality. We propose an abstraction-based framework with robust dynamic programming mappings that deliver control strategies with provable lower bounds on temporal-logic satisfaction, quantified via approximate stochastic simulation relations. Exploiting decoupled dynamics, we reveal a Canonical Polyadic Decomposition tensor structure in value functions that makes dynamic programming scalable. The proposed method provides correct-by-design probabilistic guarantees for temporal logic specifications. We validate our results on continuous-state linear stochastic systems.
Proximal observers for secure state estimation
This paper discusses a general framework for designing robust state estimators for a class of discrete-time nonlinear systems. We consider systems that may be impacted by impulsive (sparse but otherwise arbitrary) measurement noise sequences. We show that a family of state estimators, robust to this type of undesired signal, can be obtained by minimizing a class of nonsmooth convex functions at each time step. The resulting state observers are defined through proximal operators. We obtain a nonlinear implicit dynamical system in terms of the estimation error and prove, in the noise-free setting, that this error vanishes asymptotically when the minimized loss function and the to-be-observed system enjoy appropriate properties. From a computational perspective, even though the proposed observers can be implemented via efficient numerical procedures, they do not admit closed-form expressions. The paper argues that by adopting appropriate relaxations, simple and fast analytic expressions can be derived.
comment: 17 pages, 6 figures
Priority-Driven Control and Communication in Decentralized Multi-Agent Systems via Reinforcement Learning
Event-triggered control provides a mechanism for avoiding excessive use of constrained communication bandwidth in networked multi-agent systems. However, most existing methods rely on accurate system models, which may be unavailable in practice. In this work, we propose a model-free, priority-driven reinforcement learning algorithm that learns communication priorities and control policies jointly from data in decentralized multi-agent systems. By learning communication priorities, we circumvent the hybrid action space typical in event-triggered control with binary communication decisions. We evaluate our algorithm on benchmark tasks and demonstrate that it outperforms the baseline method.
comment: Accepted to the 23rd IFAC World Congress
Feasible Force Set Shaping for a Payload-Carrying Platform Consisting of Tiltable Multiple UAVs Connected Via Passive Hinge Joints
This paper presents a method for shaping the feasible force set of a payload-carrying platform composed of multiple Unmanned Aerial Vehicles (UAVs) and proposes a control law that leverages the advantages of this shaped force set. The UAVs are connected to the payload through passively rotatable hinge joints. The joint angles are controlled by the differential thrust produced by the rotors, while the total force generated by all the rotors is responsible for controlling the payload. The shape of the set of the total force depends on the tilt angles of the UAVs, which allows us to shape the feasible force set by adjusting these tilt angles. This paper aims to ensure that the feasible force set encompasses the required shape, enabling the platform to generate force redundantly, that is, in various directions. We then propose a control law that takes advantage of this redundancy.
comment: This work has been accepted to IFAC for publication under a Creative Commons Licence CC-BY-NC-ND
Transformer-Based Autonomous Driving Models and Deployment-Oriented Compression: A Survey
Transformer-based models are becoming a central paradigm in autonomous driving because they can capture long-range spatial dependencies, multi-agent interactions, and multimodal context across perception, prediction, and planning. At the same time, their deployment in real vehicles remains difficult because high-capacity attention-based architectures impose substantial latency, memory, and energy overhead. This survey reviews representative Transformer-based autonomous driving models and organizes them by task role, sensing configuration, and architectural design. More importantly, it examines these models from a deployment-oriented perspective and analyzes how efficiency constraints reshape model design choices in practice. We further review compression and acceleration strategies relevant to Transformer-based driving systems, including quantization, pruning, knowledge distillation, low-rank approximation, and efficient attention, and discuss their benefits, limitations, and task-dependent applicability. Rather than treating compression as an isolated post-processing step, we highlight it as a system-level design consideration that directly affects deployability, robustness, and safety. Finally, we identify open challenges and future research directions toward standardized, safety-aware, and hardware-conscious evaluation of efficient autonomous driving systems.
Picasso: Holistic Scene Reconstruction with Physics-Constrained Sampling
In the presence of occlusions and measurement noise, geometrically accurate scene reconstructions -- which fit the sensor data -- can still be physically incorrect. For instance, when estimating the poses and shapes of objects in the scene and importing the resulting estimates into a simulator, small errors might translate to implausible configurations including object interpenetration or unstable equilibrium. This makes it difficult to predict the dynamic behavior of the scene using a digital twin, an important step in simulation-based planning and control of contact-rich behaviors. In this paper, we posit that object pose and shape estimation requires reasoning holistically over the scene (instead of reasoning about each object in isolation), accounting for object interactions and physical plausibility. Towards this goal, our first contribution is Picasso, a physics-constrained reconstruction pipeline that builds multi-object scene reconstructions by considering geometry, non-penetration, and physics. Picasso relies on a fast rejection sampling method that reasons over multi-object interactions, leveraging an inferred object contact graph to guide samples. Second, we propose the Picasso dataset, a collection of 10 contact-rich real-world scenes with ground truth annotations, as well as a metric to quantify physical plausibility, which we open-source as part of our benchmark. Finally, we provide an extensive evaluation of Picasso on our newly introduced dataset and on the YCB-V dataset, and show it largely outperforms the state of the art while providing reconstructions that are both physically plausible and more aligned with human intuition.
comment: 15 pages, accepted to Robotics: Science and Systems (RSS) 2026
A Review On Safe Reinforcement Learning Using Lyapunov and Barrier Functions
Reinforcement learning (RL) has proven to be particularly effective in solving complex decision-making problems for a wide range of applications. Safe reinforcement learning refers to a class of constrained problems where the constraint violations lead to partial or complete system failure. The goal of this review is to provide an overview of safe RL techniques using Lyapunov and barrier functions to guarantee this notion of safety (stability of the system in terms of a computed policy and constraint satisfaction during training and deployment). Three concrete takeaways emerge from our analysis: (i) the field has shifted decisively from model-based to model-free formulations since 2017, with combined CLF-CBF approaches becoming the most active sub-area post-2022; (ii) per-class open problems are now well-defined: certificate validity under function approximation and distribution shift for Lyapunov methods, feasibility and deadlock under hard CBF-QP shielding for barrier methods, and joint CLF-CBF feasibility under model uncertainty for combined methods; and (iii) deployment to high-dimensional and partially observable settings remains the dominant scalability barrier across all three classes. The different approaches employed are discussed in detail along with their shortcomings and benefits to provide critique and possible future research directions. The review demonstrates promising scope for providing safety guarantees for complex dynamical systems with operational constraints using model-based and model-free RL.
comment: pages - 63, figures - 10, tables - 9
A Saturation-Based Optimal Velocity Model for Traffic Flow Dynamics
Many headway-based car-following models describe longitudinal adaptation through linear relaxation laws, which can produce unrealistically large accelerations and limit the physical consistency of microscopic traffic dynamics. Motivated by this limitation, we develop a saturation-based extension of the classical Optimal Velocity Model (OVM) that preserves the headway-dependent desired-speed structure while introducing bounded nonlinear acceleration dynamics. Linear stability analysis shows that the proposed formulation preserves the classical long-wave instability mechanism associated with stop-and-go waves while modifying the stability threshold and enforcing bounded acceleration. Ring-road simulations support the analysis and illustrate how the model alters perturbation growth, wave amplitude, and relaxation behavior relative to the classical OVM. The resulting framework provides a compact and analytically tractable extension for studying nonlinear traffic-wave dynamics and physically constrained car-following behavior.
comment: 25 pages, 9 figures
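As a rough illustration of the idea of bounded acceleration in an optimal-velocity model, the sketch below contrasts the classical linear relaxation law with a saturated variant. The abstract does not state the saturation function, so the `tanh` form, the Bando-style desired-speed function, and all parameter values (`v_max`, `h_c`, `sensitivity`, `a_max`) are assumptions chosen for illustration, not the paper's formulation.

```python
import math

def optimal_velocity(headway, v_max=2.0, h_c=2.0):
    """Bando-style desired speed: increases with headway and levels off (assumed form)."""
    return 0.5 * v_max * (math.tanh(headway - h_c) + math.tanh(h_c))

def classical_acceleration(headway, speed, sensitivity=1.0):
    """Classical OVM: linear relaxation toward the desired speed (unbounded)."""
    return sensitivity * (optimal_velocity(headway) - speed)

def saturated_acceleration(headway, speed, sensitivity=1.0, a_max=0.5):
    """Hypothetical bounded variant: tanh keeps |acceleration| at or below a_max."""
    linear = sensitivity * (optimal_velocity(headway) - speed)
    return a_max * math.tanh(linear / a_max)
```

For small speed deviations the two laws nearly coincide, while for large deviations only the saturated one stays within `a_max`.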
Robotics
HarmoWAM: Harmonizing Generalizable and Precise Manipulation via Adaptive World Action Models
World Action Models (WAMs) have emerged as a promising paradigm for robot control by modeling physical dynamics. Current WAMs generally follow two paradigms: the "Imagine-then-Execute" approach, which uses video prediction to infer actions via inverse dynamics, and the "Joint Modeling" approach, which jointly models actions and video representations. Based on systematic experiments, we observe a fundamental trade-off between these paradigms: the former explicitly leverages world models for generalizable transit but lacks interaction precision, whereas the latter enables fine-grained, temporally coherent action generation but is constrained by the exploration space of the training distribution. Motivated by these findings, we propose HarmoWAM, an end-to-end WAM that fully leverages a world model to unify predictive and reactive control, enabling both generalizable transit and precise manipulation. Specifically, the world model provides spatio-temporal physical priors that condition two complementary action experts: a predictive expert that leverages latent dynamics for iterative action generation, and a reactive expert that directly infers actions from predicted visual evolution. To enable adaptive coordination, a Process-Adaptive Gating Mechanism is proposed to automatically determine the timing and location of switching between them. This allows the world model to drive the reactive expert to expand the exploration space and the predictive expert to perform precise interactions across different stages of a task. For evaluation, we construct three training-unseen test environments across six real-world robotic tasks, covering variations in background, position, and object semantics. Notably, HarmoWAM achieves strong zero-shot generalization across these scenarios, significantly outperforming prior state-of-the-art VLA models and WAMs by margins of 33% and 29%, respectively.
Variational Inference for Lévy Process-Driven SDEs via Neural Tilting
Modelling extreme events and heavy-tailed phenomena is central to building reliable predictive systems in domains such as finance, climate science, and safety-critical AI. While Lévy processes provide a natural mathematical framework for capturing jumps and heavy tails, Bayesian inference for Lévy-driven stochastic differential equations (SDEs) remains intractable with existing methods: Monte Carlo approaches are rigorous but lack scalability, whereas neural variational inference methods are efficient but rely on Gaussian assumptions that fail to capture discontinuities. We address this tension by introducing a neural exponential tilting framework for variational inference in Lévy-driven SDEs. Our approach constructs a flexible variational family by exponentially reweighting the Lévy measure using neural networks. This parametrization preserves the jump structure of the underlying process while remaining computationally tractable. To enable efficient inference, we develop a quadratic neural parametrization that yields closed-form normalization of the tilted measure, a conditional Gaussian representation for stable processes that facilitates simulation, and symmetry-aware Monte Carlo estimators for scalable optimization. Empirically, we demonstrate that the method accurately captures jump dynamics and yields reliable posterior inference in regimes where Gaussian-based variational approaches fail, on both synthetic and real-world datasets.
comment: The associated project page, which contains the official implementation, can be found at https://circle-group.github.io/research/NeuralTilting/
PriorVLA: Prior-Preserving Adaptation for Vision-Language-Action Models
Large-scale pretraining has made Vision-Language-Action (VLA) models promising foundations for generalist robot manipulation, yet adapting them to downstream tasks remains necessary. However, the common practice of full fine-tuning treats pretraining as initialization and can shift broad priors toward narrow training-distribution patterns. We propose PriorVLA, a novel framework that preserves pretrained priors and learns to leverage them for effective adaptation. PriorVLA keeps a frozen Prior Expert as a read-only prior source and trains an Adaptation Expert for downstream specialization. Expert Queries capture scene priors from the pretrained VLM and motor priors from the Prior Expert, integrating both into the Adaptation Expert to guide adaptation. Together, PriorVLA updates only 25% of the parameters updated by full fine-tuning. Across RoboTwin 2.0, LIBERO, and real-world tasks, PriorVLA achieves stronger overall performance than full fine-tuning and state-of-the-art VLA baselines, with the largest gains under out-of-distribution (OOD) and few-shot settings. PriorVLA improves over pi0.5 by 11 points on RoboTwin 2.0-Hard and achieves 99.1% average success on LIBERO. Across eight real-world tasks and two embodiments, PriorVLA reaches 81% in-distribution (ID) and 57% OOD success with standard data. With only 10 demonstrations per task, PriorVLA reaches 48% ID and 32% OOD success, surpassing pi0.5 by 24 and 22 points, respectively.
comment: 32 pages. Project page: https://priorvla.github.io/
RoboMemArena: A Comprehensive and Challenging Robotic Memory Benchmark
Memory is a critical component of robotic intelligence, as robots must rely on past observations and actions to accomplish long-horizon tasks in partially observable environments. However, existing robotic memory benchmarks still lack multimodal annotations for memory formation, provide limited task coverage and structural complexity, and remain restricted to simulation without real-world evaluation. We address this gap with RoboMemArena, a large-scale benchmark of 26 tasks, with average trajectory lengths exceeding 1,000 steps per task and 68.9% of subtasks being memory-dependent. The generation pipeline leverages a vision-language model (VLM) to design and compose subtasks, generates full trajectories through atomic functions, and provides memory-related annotations, including subtask instructions and native keyframe annotations, while paired real-world memory tasks support physical evaluation. We further design PrediMem, a dual-system VLA in which a high-level VLM planner manages a memory bank with recent and keyframe buffers and uses a predictive coding head to improve sensitivity to task dynamics. Extensive experiments on RoboMemArena show that PrediMem outperforms all baselines and provides insights into memory management, model architecture, and scaling laws for complex memory systems.
comment: Project website: https://robomemarena.github.io
Optimal and Scalable MAPF via Multi-Marginal Optimal Transport and Schrödinger Bridges ICML 2026
We consider anonymous multi-agent path finding (MAPF) where a set of robots is tasked to travel to a set of targets on a finite, connected graph. We show that MAPF can be cast as a special class of multi-marginal optimal transport (MMOT) problems with an underlying Markovian structure, under which the exponentially large MMOT collapses to a linear program (LP) polynomial in size. Focusing on the anonymous setting, we establish conditions under which the corresponding LP is feasible, totally unimodular, and consequently, yields min-cost, integral $(\{0,1\})$ transports that do not overlap in both space and time. To adapt the approach to large-scale problems, we cast the MAPF-MMOT in a probabilistic framework via Schrödinger bridges. Under standard assumptions, we show that the Schrödinger bridge formulation reduces to an entropic regularization of the corresponding MMOT that admits an iterative Sinkhorn-type solution. The Schrödinger bridge, being a probabilistic framework, provides a shadow (fractional) transport that we use as a template to solve a reduced LP and demonstrate that it results in near-optimal, integral transports at a significant reduction in complexity. Extensive experiments highlight the optimality and scalability of the proposed approaches.
comment: Accepted at ICML 2026 as a spotlight paper
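The Sinkhorn-type iteration mentioned in the abstract can be sketched for the simplest two-marginal case. This is a generic entropic optimal-transport solver, not the multi-marginal scheme of the paper; the function name `sinkhorn`, the toy cost matrix, and the values of `epsilon` and `iterations` are assumptions for illustration.

```python
import math

def sinkhorn(cost, source, target, epsilon=0.1, iterations=500):
    """Entropic OT between two marginals via alternating row/column scaling.

    cost[i][j] is the cost of moving mass from i to j; source and target are
    nonnegative marginals with equal total mass.
    """
    n, m = len(source), len(target)
    kernel = [[math.exp(-cost[i][j] / epsilon) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iterations):
        # Rescale rows to match the source marginal, then columns to match the target.
        u = [source[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [target[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy example: two start cells and two target cells, with unit cost for swapping.
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(cost, [0.5, 0.5], [0.5, 0.5])
```

The returned plan is fractional; in the abstract's terms, such a fractional transport serves as a template for a reduced LP that recovers integral assignments.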
MDrive: Benchmarking Closed-Loop Cooperative Driving for End-to-End Multi-agent Systems
Vehicle-to-Everything (V2X) communication has emerged as a promising paradigm for autonomous driving, enabling connected agents to share complementary perception information and negotiate with each other to benefit the final planning. Existing V2X benchmarks, however, fall short in two ways: (i) open-loop evaluations fail to capture the inherently closed-loop nature of driving, leading to evaluation gaps, and (ii) current closed-loop evaluations lack the behavioral and interactive diversity needed to reflect real-world driving. Thus, the extent to which multi-agent systems benefit closed-loop driving remains unclear. In this paper, we introduce MDrive, a closed-loop cooperative driving benchmark comprising 225 scenarios grounded in both NHTSA pre-crash typologies and real-world V2X datasets. Our benchmark results demonstrate that multi-agent systems are generally better than single-agent counterparts. However, current multi-agent systems still face two important challenges: (i) perception sharing enhances perception but does not always translate to better planning; (ii) negotiation improves planning performance but harms it in complex and dense traffic scenarios. MDrive further provides an open-source toolbox for scenario generation, Real2Sim conversion, and human-in-the-loop simulation. Together, MDrive establishes a reproducible foundation for evaluating and improving the generalization and robustness of cooperative driving systems.
comment: Website: https://mdrive-challenge.github.io/
CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models
This paper proposes a novel approach to address the challenge that standard supervised finetuning (SFT) of pretrained VLA models often fails to effectively improve performance and reduce adaptation costs. Some advanced finetuning methods with auxiliary training objectives can improve performance and reduce the number of convergence steps. However, they typically incur significant computational overhead due to the additional losses from auxiliary objectives. To combine the enhanced capabilities of auxiliary training with the simplicity of standard SFT, we decouple the two objectives of auxiliary-objective SFT within the parameter space, namely, enhancing general capabilities and fitting task-specific action distributions. To achieve this goal, we only need to train the model to convergence on a small-scale task set using two distinct training strategies, resulting in two finetuned models. The parameter difference between the two models can then be interpreted as capability vectors provided by auxiliary objectives. These vectors are then merged with pretrained parameters to form a capability-enhanced meta model. Moreover, when standard SFT is augmented with a lightweight orthogonal regularization loss, the merged model attains performance comparable to auxiliary finetuned baselines with reduced computational overhead. Internal and external experiments demonstrate that our capability vectors (1) are effective and versatile across diverse models, and (2) can generalize to novel environments and embodiments out of the box.
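The parameter-space arithmetic described above can be sketched on toy scalar weights. The direction of the difference (auxiliary-objective model minus standard-SFT model), the `scale` factor, and the function names are assumptions for illustration; the abstract does not specify them.

```python
def capability_vector(aux_params, sft_params):
    """Per-parameter difference between the auxiliary-objective and standard-SFT models."""
    return {name: aux_params[name] - sft_params[name] for name in aux_params}

def merge(pretrained_params, vector, scale=1.0):
    """Add the (optionally scaled) capability vector to the pretrained weights.

    The scale factor is hypothetical; it is not mentioned in the abstract.
    """
    return {name: pretrained_params[name] + scale * vector[name]
            for name in pretrained_params}

# Toy scalar "weights" standing in for full parameter tensors.
pretrained = {"w": 1.0, "b": 0.0}
sft_model = {"w": 1.5, "b": 0.25}
aux_model = {"w": 2.0, "b": 0.5}

vector = capability_vector(aux_model, sft_model)
meta_model = merge(pretrained, vector)
```

In practice the same element-wise operations would be applied to every tensor in a model's state dictionary.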
Safe Aerial 3D Path Planning for Autonomous UAVs using Magnetic Potential Fields
Safe autonomous Uncrewed Aerial Vehicle (UAV) navigation in urban environments requires real-time path planning that avoids obstacles. MaxConvNet is a potential-field planner that leverages properties of Maxwell's equations to generate a path to the goal without local minima. We extend the 2D MaxConvNet magnetic field planner to 3D, using a convolutional autoencoder to predict obstacle-aware potential fields from LiDAR-derived 101^3 voxel grids. Evaluation across 100 randomized closed-loop trials in two distinct Cosys-AirSim urban environments, a dense night-time cityscape and a suburban district, shows a 100% path planning success rate on both maps without retraining. In offline path planning, 3DMaxConvNet produces path lengths comparable to A* on unseen maps while reducing runtime from 0.155--0.17s to 0.087--0.089s, or about 1.7--1.95 times faster than A*. Against RRT*(3k), 3DMaxConvNet achieves similar path quality while reducing planning runtime from 17.2--17.5s to about 0.09s, which is roughly 193--201 times faster than RRT*(3k).
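The reported speedup ranges can be reproduced from the runtimes quoted above. The pairing used here (smallest baseline time with largest planner time for the lower bound, and the reverse for the upper bound) is an assumption that happens to match the stated figures; the helper name `speedup_range` is illustrative.

```python
def speedup_range(baseline_s, method_s):
    """Return (lowest, highest) speedup implied by two runtime ranges in seconds."""
    lowest = min(baseline_s) / max(method_s)
    highest = max(baseline_s) / min(method_s)
    return lowest, highest

# Runtimes taken from the abstract; 0.087--0.089 s is reused for the RRT* comparison.
astar_speedup = speedup_range((0.155, 0.17), (0.087, 0.089))
rrt_speedup = speedup_range((17.2, 17.5), (0.087, 0.089))
```

Under this pairing the A* comparison gives roughly 1.74 to 1.95, and the RRT*(3k) comparison gives roughly 193 to 201.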
Is Your Driving World Model an All-Around Player? CVPR 2026
Today's driving world models can generate remarkably realistic dash-cam videos, yet no single model excels universally. Some generate photorealistic textures but violate basic physics; others maintain geometric consistency but fail when subjected to closed-loop planning. This disconnect exposes a critical gap: the field evaluates how real generated worlds appear, but rarely whether they behave realistically. We introduce WorldLens, a unified benchmark that measures world-model fidelity across the full spectrum, from pixel quality and 4D geometry to closed-loop driving and human perceptual alignment, through five complementary aspects and 24 standardized dimensions. Our evaluation of six representative models reveals that no existing approach dominates across all axes: texture-rich models violate geometry, geometry-aware models lack behavioral fidelity, and even the strongest performers achieve only 2-3 out of 10 on human realism ratings. To bridge algorithmic metrics with human perception, we further contribute WorldLens-26K, a 26,808-entry human-annotated preference dataset pairing numerical scores with textual rationales, and WorldLens-Agent, a vision-language evaluator distilled from these judgments that enables scalable, explainable auto-assessment. Together, the benchmark, dataset, and agent form a unified ecosystem for assessing generated worlds not merely by visual appeal, but by physical and behavioral fidelity.
comment: CVPR 2026 VideoWorldModel Workshop; Project Page at https://worldbench.github.io/worldlens GitHub at https://github.com/worldbench/WorldLens
Unified Noise Steering for Efficient Human-Guided VLA Adaptation
Diffusion-based vision-language-action (VLA) models have emerged as strong priors for robotic manipulation, yet adapting them to real-world distributions remains challenging. In particular, on-robot reinforcement learning (RL) is expensive and time-consuming, so effective adaptation depends on efficient policy improvement within a limited budget of real-world interactions. Noise-space RL lowers the cost by keeping the pretrained VLA fixed as a denoising generator while updating only a lightweight actor that predicts the noise. However, its performance is still limited due to inefficient autonomous exploration. Human corrective interventions can reduce this exploration burden, but they are naturally provided in action space, whereas noise-space finetuning requires supervision over noise variables. To address these challenges, we propose UniSteer, a Unified Noise Steering framework that combines human corrective guidance with noise-space RL through approximate action-to-noise inversion. Given a human corrective action, UniSteer inverts the frozen flow-matching decoder to recover a noise target, which provides supervised guidance for the same noise actor that is simultaneously optimized via reinforcement learning. Real-world experiments on diverse manipulation tasks show that UniSteer adapts more efficiently than strong noise-space RL and action-space human-in-the-loop baselines, improving the success rate from 20% to 90% in 66 minutes on average across four real-world adaptation tasks.
ALAM: Algebraically Consistent Latent Transitions for Vision-Language-Action Models
Vision-language-action (VLA) models remain constrained by the scarcity of action-labeled robot data, whereas action-free videos provide abundant evidence of how the physical world changes. Latent action models offer a promising way to extract such priors from videos, but reconstruction-trained latent codes are not necessarily suitable for policy generation: they may predict future observations while lacking the structure needed to be reused or generated coherently with robot actions. We introduce ALAM (Algebraic Latent Action Model), an Algebraically Consistent Latent Action Model that turns temporal relations in action-free video into structural supervision. Given frame triplets, ALAM learns latent transitions that are grounded by reconstruction while being regularized by composition and reversal consistency, encouraging a locally additive transition space. For downstream VLA learning, we freeze the pretrained encoder and use its latent transition sequences as auxiliary generative targets, co-generated with robot actions under a joint flow-matching objective. This couples structured latent transitions with flow-based policy generation, allowing the policy to exploit ALAM's locally consistent transition geometry without requiring latent-to-action decoding. Representation probes show that ALAM reduces additivity and reversibility errors by 25-85 times over unstructured latent-action baselines and improves long-horizon cumulative reconstruction. When transferred to VLA policies, ALAM raises the average success rate from 47.9% to 85.0% on MetaWorld MT50 and from 94.1% to 98.1% on LIBERO, with consistent gains on real-world manipulation tasks. Ablations further confirm that the strongest improvements arise from the synergy between algebraically structured latent transitions and joint flow matching.
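One way to read the composition and reversal consistency terms is as penalties on a frame triplet (a, b, c): the transition a→c should match the sum of a→b and b→c, and b→a should be the negative of a→b. The squared-error form and the function names below are assumptions for illustration; the abstract does not give the exact losses.

```python
def composition_error(z_ab, z_bc, z_ac):
    """Mean squared gap between z(a->c) and z(a->b) + z(b->c)."""
    return sum((ab + bc - ac) ** 2 for ab, bc, ac in zip(z_ab, z_bc, z_ac)) / len(z_ac)

def reversal_error(z_ab, z_ba):
    """Mean squared gap between z(b->a) and -z(a->b)."""
    return sum((ab + ba) ** 2 for ab, ba in zip(z_ab, z_ba)) / len(z_ab)

# Toy latent transitions that are exactly additive and reversible.
z_ab, z_bc, z_ac = [1.0, 2.0], [0.5, -1.0], [1.5, 1.0]
z_ba = [-1.0, -2.0]
```

Both errors vanish for a perfectly additive transition space, which is the structure the regularizers are described as encouraging locally.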
MAGS-SLAM: Monocular Multi-Agent Gaussian Splatting SLAM for Geometrically and Photometrically Consistent Reconstruction
Collaborative photorealistic 3D reconstruction from multiple agents enables rapid large-scale scene capture for virtual production and cooperative multi-robot exploration. While recent 3D Gaussian Splatting (3DGS) SLAM algorithms can generate high-fidelity real-time mapping, most existing multi-agent Gaussian SLAM methods still rely on RGB-D sensors to obtain metric depth and simplify cross-agent alignment, which limits deployment on lightweight, low-cost, or power-constrained robotic platforms. To address this challenge, we propose MAGS-SLAM, the first RGB-only multi-agent 3DGS SLAM framework for collaborative scene reconstruction. Each agent independently builds local monocular Gaussian submaps and transmits compact submap summaries rather than raw observations or dense maps. To facilitate robust collaboration in the presence of monocular scale ambiguity, our framework integrates compact submap communication, geometry- and appearance-aware loop verification, and occupancy-aware Gaussian fusion, enabling coherent global reconstruction without active depth sensors. We further introduce the ReplicaMultiagent Plus benchmark for evaluating collaborative Gaussian SLAM. Extensive experiments on synthetic and real-world datasets show that MAGS-SLAM achieves competitive tracking accuracy and comparable or superior rendering quality to state-of-the-art RGB-D collaborative Gaussian SLAM methods while relying only on RGB images.
C-CoT: Counterfactual Chain-of-Thought with Vision-Language Models for Safe Autonomous Driving
Safety-critical planning in complex environments, particularly at urban intersections, remains a fundamental challenge for autonomous driving. Existing methods, whether rule-based or data-driven, frequently struggle to capture complex scene semantics, infer potential risks, and make reliable decisions in rare, high-risk situations. While vision-language models (VLMs) offer promising approaches for safe decision-making in these environments, most current approaches lack reflective and causal reasoning, thereby limiting their overall robustness. To address this, we propose a counterfactual chain-of-thought (C-CoT) framework that leverages VLMs to decompose driving decisions into five sequential stages: scene description, critical object identification, risk prediction, counterfactual risk reasoning, and final action planning. Within the counterfactual reasoning stage, we introduce a structured meta-action evaluation tree to explicitly assess the potential consequences of alternative action combinations. This self-reflective reasoning establishes causal links between action choices and safety outcomes, improving robustness in long-tail and out-of-distribution scenarios. To validate our approach, we construct the DeepAccident-CCoT dataset based on the DeepAccident benchmark and fine-tune a Qwen2.5-VL (7B) model using low-rank adaptation. Our model achieves a risk prediction recall of 81.9%, reduces the collision rate to 3.52%, and lowers L2 error to 1.98 m. Ablation studies further confirm the critical role of counterfactual reasoning and the meta-action evaluation tree in enhancing safety and interpretability.
Decentralized Contingency MPC based on Safe Sets for Nonlinear Multi-agent Collision Avoidance
Decentralized collision avoidance remains challenging, particularly when agents do not communicate any information related to planned trajectories. Most existing approaches either rely on conservative coordination mechanisms or provide limited guarantees on recursive feasibility and convergence. This paper develops a decentralized contingency MPC framework for multi-agent systems with nonlinear dynamics that achieves collision-free motion under a state-only information pattern. Each agent follows the same consensual rule set, enabling safe decentralized planning without communication. Each agent solves a local optimization problem that couples a nominal trajectory with a contingency certificate ensuring a feasible backup maneuver under receding-horizon operation. A novel geometric and decentralized safe-set update mechanism prevents feasibility loss between consecutive time steps. The resulting scheme guarantees recursive feasibility, including collision avoidance, and establishes a Lyapunov-type convergence result to an admissible safe equilibrium. Simulation results demonstrate performance in both sparse and dense multi-agent environments, including cluttered bottleneck scenarios and under plug-and-play operation.
ObjView-Bench: Rethinking Difficulty and Deployment for Object-Centric View Planning
Object-centric view planning is a core component of active geometric 3D reconstruction in robotics, yet existing evaluations often conflate object complexity, planning difficulty, budget assumptions, and physical reachability constraints. As a result, conclusions drawn from idealized view-planning evaluations may not reliably predict performance under realistic reconstruction settings. We introduce ObjView-Bench, an evaluation framework for rethinking difficulty and deployment in object-centric view planning. First, we disentangle three quantities underlying view-planning evaluation: omnidirectional self-occlusion as an object-side attribute, observation saturation difficulty, and protocol-dependent planning difficulty defined through a set-cover formulation. This separation supports controlled dataset construction, analysis of slow-saturation objects, and a case study showing that planning difficulty-aware sampling can improve learned view planners. Second, we design deployment-oriented evaluation protocols that reveal how budget regimes and reachable-view constraints alter method behavior. Across classical, learned, and hybrid planners, ObjView-Bench shows that difficulty, budget, and reachability constraints substantially change method rankings and failure modes.
xApp Empowered Resource Management for Non-Terrestrial Users in 5G O-RAN Networks
This paper introduces a proactive Unmanned Aerial Vehicle (UAV) mobility management xApp for Open Radio Access Network (O-RAN) Near Real-Time Radio Intelligent Controller (Near-RT RIC) environments, employing Double Deep Q-Network (DDQN) reinforcement learning (RL) enhanced with transfer learning to optimise handover decisions for UAVs operating along predetermined flight trajectories. Unlike reactive approaches that respond to signal degradation, the proposed framework anticipates network conditions and minimises both outage probability and handover frequency through predictive optimisation. The system leverages centralised weight averaging to consolidate knowledge from multiple flight scenarios into a global model capable of generalising to previously unseen operational environments without extensive retraining. A comprehensive evaluation demonstrates that the proposed framework achieves a favourable trade-off between handover frequency and connectivity reliability, reducing handover events by up to 54.6% compared to greedy approaches while maintaining outage probability at practically negligible levels. The results validate the effectiveness of intelligent learning-based approaches for UAV mobility management in next-generation O-RAN architectures, thereby contributing to seamless integration of aerial user equipment into cellular networks.
VRA: Grounding Discrete-Time Joint Acceleration in Voltage-Constrained Actuation
Discrete-time joint acceleration constraints are widely used to enforce position and velocity limits. However, under voltage-constrained electric actuators, kinematically admissible accelerations may be physically unrealizable, exposing a missing execution-level abstraction. We propose Voltage-Realizable Acceleration (VRA), a joint-level acceleration interface that grounds kinematic acceleration in voltage-constrained actuator physics by restricting commanded accelerations to voltage-realizable constraints. Hardware experiments on electric actuators and a wheel-legged quadruped show that VRA removes unrealizable accelerations, restores consistent near-constraint execution, and reduces constraint-induced oscillations.
comment: 10 pages, Accepted by RSS 2026
Embodied AI in Action: Insights from SAE World Congress 2026 on Safety, Trust, Robotics, and Real-World Deployment
Embodied artificial intelligence is rapidly moving from research into real-world systems such as autonomous vehicles, mobile robots, and industrial machines. As these systems become more capable of perceiving, deciding, and acting in dynamic environments, they also introduce new challenges in safety, trust, governance, and operational reliability. This white paper summarizes key insights from the SAE World Congress 2026 panel session \textit{Embodied AI in Action}, which brought together experts from automotive, robotics, artificial intelligence, and safety engineering. The discussion highlighted the need to treat embodied AI as a systems challenge requiring engineering rigor, lifecycle governance, human-centered design, and evolving standards. The paper provides practical perspectives for executives, policymakers, and technical leaders seeking to adopt embodied AI responsibly. The panel reached broad agreement that long-term success will depend not only on advances in AI capability, but equally on safe and trustworthy deployment.
DeepSight: Long-Horizon World Modeling via Latent States Prediction for End-to-End Autonomous Driving ICML 2026
End-to-end autonomous driving systems are increasingly integrating Vision-Language Model (VLM) architectures, incorporating text reasoning or visual reasoning to enhance the robustness and accuracy of driving decisions. However, the reasoning mechanisms employed in most methods are direct adaptations from general domains, lacking in-depth exploration tailored to autonomous driving scenarios, particularly within visual reasoning modules. In this paper, we propose a driving world model that performs parallel prediction of latent semantic features for consecutive future frames in the bird's-eye-view (BEV) space, thereby enabling long-horizon modeling of future world states. We also introduce an efficient and adaptive text reasoning mechanism that utilizes additional social knowledge and reasoning capabilities to further improve driving performance in challenging long-tail scenarios. We present a novel, efficient, and effective approach that achieves state-of-the-art (SOTA) results on the closed-loop Bench2drive benchmark. Code is available at: https://github.com/hotdogcheesewhite/DeepSight.
comment: ICML 2026
VEGA: Visual Encoder Grounding Alignment for Spatially-Aware Vision-Language-Action Models
Precise spatial reasoning is fundamental to robotic manipulation, yet the visual backbones of current vision-language-action (VLA) models are predominantly pretrained on 2D image data without explicit 3D geometric supervision, resulting in representations that lack accurate spatial awareness. Existing implicit spatial grounding methods partially address this by aligning VLA features with those of 3D-aware foundation models, but they rely on empirical layer search and perform alignment on LLM-level visual tokens where spatial structure has already been entangled with linguistic semantics, limiting both generalizability and geometric interpretability. We propose VEGA (Visual Encoder Grounding Alignment), a simple yet effective framework that directly aligns the output of the VLA's visual encoder with spatially-aware features from DINOv2-FiT3D, a DINOv2 model fine-tuned with multi-view consistent 3D Gaussian Splatting supervision. By performing alignment at the visual encoder output level, VEGA grounds spatial awareness before any linguistic entanglement occurs, offering a more interpretable and principled alignment target. The alignment is implemented via a lightweight projector trained with a cosine similarity loss alongside the standard action prediction objective, and is discarded at inference time, introducing no additional computational overhead. Extensive experiments on simulation-benchmark and real-world manipulation tasks demonstrate that VEGA consistently outperforms existing implicit spatial grounding baselines, establishing a new state-of-the-art among implicit spatial grounding methods for VLA models.
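A cosine similarity alignment loss of the kind described can be sketched as the mean of one minus the cosine similarity between projected encoder features and the 3D-aware target features. The `1 - cos` form, the per-token averaging, and the function name are assumptions for illustration; the projector itself and the weighting against the action loss are omitted.

```python
import math

def cosine_alignment_loss(projected, target):
    """Mean of (1 - cosine similarity) over paired token feature vectors."""
    total = 0.0
    for p, t in zip(projected, target):
        dot = sum(a * b for a, b in zip(p, t))
        norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in t))
        total += 1.0 - dot / norm
    return total / len(projected)
```

The loss is 0 when every pair of features points in the same direction, regardless of magnitude, and grows to 2 when they point in opposite directions.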
OpenSGA: Efficient 3D Scene Graph Alignment in the Open World
Scene graph alignment establishes object correspondences between two 3D scene graphs constructed from partially overlapping observations. This enables efficient scene understanding and object-level relocalization when a robot revisits a place, as well as global map fusion across multiple agents. Such capabilities are essential for robots that require long-term memory for long-horizon tasks involving interactions with the environment. Existing approaches mainly focus on subscan-to-subscan (S2S) alignment and depend heavily on geometric point-cloud features, leaving frame-to-scan (F2S) alignment and open-set vision-language features underexplored. In addition, existing datasets for scene graph alignment remain small-scale with limited object diversity, constraining systematic training and evaluation. We present a unified and efficient scene graph alignment framework that predicts object correspondences by fusing vision-language, textual, and geometric features with spatial context. The framework comprises modules such as a distance-gated spatial attention encoder, a minimum-cost-flow-based allocator, and a global scene embedding generator to achieve accurate alignment even under large coordinate discrepancies. We further introduce ScanNet-SG, a large-scale dataset generated via an automated annotation pipeline with over 700k samples, covering 509 object categories from ScanNet labels and over 3k categories from GPT-4o-based tagging. Experiments show that our method achieves the best overall performance on both F2S and S2S tasks, substantially outperforming existing scene graph alignment methods. Our code and dataset are released at: https://autonomousrobots.nl/paper_websites/opensga.
comment: 13 figures
Priority-Driven Control and Communication in Decentralized Multi-Agent Systems via Reinforcement Learning
Event-triggered control provides a mechanism for avoiding excessive use of constrained communication bandwidth in networked multi-agent systems. However, most existing methods rely on accurate system models, which may be unavailable in practice. In this work, we propose a model-free, priority-driven reinforcement learning algorithm that learns communication priorities and control policies jointly from data in decentralized multi-agent systems. By learning communication priorities, we circumvent the hybrid action space typical in event-triggered control with binary communication decisions. We evaluate our algorithm on benchmark tasks and demonstrate that it outperforms the baseline method.
comment: Accepted to the 23rd IFAC World Congress
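One plausible way to read "communication priorities" is sketched below: each agent emits a continuous priority, and a scheduler grants the limited channel to the highest-priority agents, so no agent has to output a binary send/do-not-send action. The top-k scheduling rule, the function names, and the `capacity` parameter are assumptions made for illustration; the abstract does not specify them.

```python
def schedule_by_priority(priorities, capacity):
    """Return the sorted indices of agents granted a communication slot.

    Ties are broken by the lower agent index (an arbitrary choice).
    """
    ranked = sorted(range(len(priorities)), key=lambda i: (-priorities[i], i))
    return sorted(ranked[:capacity])

def step_estimates(estimates, true_states, priorities, capacity):
    """Refresh the shared estimate only for agents that were scheduled;
    all other agents keep their stale estimate."""
    granted = set(schedule_by_priority(priorities, capacity))
    return [
        true_states[i] if i in granted else estimates[i]
        for i in range(len(estimates))
    ]
```

With three agents and two slots, `schedule_by_priority([0.1, 0.9, 0.5], 2)` selects agents 1 and 2, and only their estimates are refreshed.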
Geometrically Approximated Modeling for Emitter-Centric Ray-Triangle Filtering in Arbitrarily Dynamic LiDAR Simulation
Real-time Light Detection And Ranging (LiDAR) simulation must find, per emitted ray, the closest intersecting triangle even in dynamic scenes containing large numbers of moving and deformable objects. Dominant acceleration-structure approaches require rebuilding each frame for dynamic geometry -- a cost that compounds directly with scene dynamics and cannot be amortized regardless of how little actually changed. This paper presents the Gajmer Ray-Casting Algorithm (GRCA), which inverts the question: instead of asking "what does each ray hit?", it asks "which rays can each triangle possibly hit?". GRCA geometrically models spinning LiDAR emitters as rotation-traced cones or planes and uses each triangle's emitter-centric apparent area to cull, per triangle, which channels and the rays within those channels can possibly reach it -- without any acceleration structure. GRCA is compute-based and vendor-agnostic by design, targeting highly dynamic, high-resolution simultaneous multi-sensor simulation. At its core, GRCA is a general-purpose ray-casting algorithm: the emitter-centric inversion applies to any setting where rays originate from a known position, not only LiDAR. Benchmarks evaluate 2-8 simultaneous 128x4096-ray LiDARs (360deg/180deg) over complex dynamic scenes -- with just two sensors casting ~1M rays per frame. With range culling inactive, GRCA reaches up to 7.97x over hardware-accelerated OptiX (GPU) and 14.55x over Embree (CPU). Two independent extensions further boost performance even in the most complex scene (~22M triangles, ~9M of which are dynamic, 8 LiDARs): range culling at realistic deployment ranges (10-100m) reaches up to 7.02x GPU and 9.33x CPU; a hybrid pipeline -- GRCA for dynamic geometry, OptiX/Embree for static -- reaches up to 10.5x GPU and 19.2x CPU.
comment: 21 pages, 20 figures
Learning Point Cloud Geometry as a Statistical Manifold: Theory and Practice
Point clouds are a fundamental representation for robotic perception tasks such as localization, mapping, and object pose estimation. However, LiDAR-acquired point clouds are inherently sparse and non-uniform, providing incomplete observations of the underlying scene geometry. This makes reliable geometric reasoning challenging and degrades downstream perception performance. Existing approaches attempt to compensate for these limitations by estimating local geometry, but often rely on hand-crafted statistics or end-to-end supervised learning, which can suffer from limited scalability or require large amounts of accurately labeled data. To address these challenges, we explicitly model point cloud geometry under a principled mathematical formulation. We represent local geometry as a statistical manifold induced by a family of Gaussian distributions, where each point is associated with a Gaussian capturing its local geometric structure. Based on this formulation, we introduce Point-to-Ellipsoid (POLI), a deep neural estimator that predicts per-point Gaussian geometry. POLI learns a mapping from point cloud observations to their underlying geometry in a self-supervised manner, removing the need for labeled data while preserving strong geometric inductive biases. The resulting representation integrates seamlessly into existing robotic perception pipelines without architectural modifications. Extensive experiments show that POLI enables accurate and robust geometry estimation and consistently improves performance across diverse robotic perception tasks.
VISOR: A Vision-Language Model-based Test Oracle for Testing Robot
Testing robots requires assessing whether they perform their intended tasks correctly, dependably, and with high quality, a challenge known as the test oracle problem in software testing. Traditionally, this assessment relies on task-specific symbolic oracles for task correctness and on human manual evaluation of robot behavior, which is time-consuming, subjective, and error-prone. To address this, we propose VISOR, a Vision-Language Model (VLM)-based approach for automated test oracle assessment that eliminates the need for expensive human evaluations. VISOR performs automated evaluation of task correctness and quality, addressing the limitations of existing symbolic test oracles, which are task-specific and provide pass/fail judgments without explicitly quantifying task quality. Given the inherent uncertainty in VLMs, VISOR also explicitly quantifies its own uncertainty during test assessments. We evaluated VISOR using two VLMs, i.e., GPT and Gemini, across four robotic tasks on over 1,000 videos. Results show that Gemini achieves higher recall while GPT achieves higher precision. However, both models show low correlation between uncertainty and correctness, which prevents using uncertainty as a correctness predictor.
Temporal Sampling Frequency Matters: A Capacity-Aware Study of End-to-End Driving Trajectory Prediction
End-to-end (E2E) autonomous driving trajectory prediction is often trained with camera frames sampled at the highest available temporal frequency, assuming that denser sampling improves performance. We question this assumption by treating temporal sampling frequency as an explicit training-set design variable. Starting from high-frequency E2E driving datasets, we construct frequency-sweep training sets by temporally subsampling camera frames along each trajectory. For each model-dataset pair, we train and evaluate the same model under a fixed protocol, so the frequency response reflects how prediction performance changes with sampling frequency. We analyze this response from a capacity-aware perspective. Sparse sampling may miss driving-relevant cues, while dense sampling may add redundant visual content and off-manifold noise. For finite-capacity models, this can create a driving-irrelevant capacity burden. We evaluate three smaller E2E models and a larger VLA-style AutoVLA model on Waymo, nuScenes, and PAVE. Results show model- and dataset-dependent frequency responses. Smaller E2E models often show non-monotonic or near-plateau trends and achieve their best 3-second ADE at lower or intermediate frequencies. In contrast, AutoVLA achieves its best 3-second ADE and FDE at the highest evaluated frequency on all three datasets. Iteration-matched controls suggest that the advantage of lower or intermediate frequencies for smaller models is not explained only by unequal training update counts. These findings show that temporal sampling frequency should be reported and tuned, rather than fixed to the highest available value.
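The frequency-sweep construction and the two reported metrics can be sketched as follows. This is a minimal illustration, assuming that subsampling uses an integer stride and that ADE and FDE follow their usual definitions (mean and final Euclidean displacement); the function names and the integer-stride restriction are assumptions, not details from the paper.

```python
import math

def subsample_frames(frames, source_hz, target_hz):
    """Keep every k-th frame so a source_hz sequence approximates target_hz.

    Assumes source_hz is an integer multiple of target_hz.
    """
    if target_hz <= 0 or target_hz > source_hz:
        raise ValueError("target_hz must be in (0, source_hz]")
    stride = source_hz / target_hz
    if abs(stride - round(stride)) > 1e-9:
        raise ValueError("source_hz must be an integer multiple of target_hz")
    return frames[::int(round(stride))]

def ade(pred, gt):
    """Average displacement error over all predicted waypoints."""
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(dists) / len(dists)

def fde(pred, gt):
    """Final displacement error at the last waypoint."""
    return math.dist(pred[-1], gt[-1])
```

For example, a 10 Hz sequence subsampled to 5 Hz keeps every second frame, and the same model can then be trained and scored with `ade` and `fde` at each frequency in the sweep.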
PaMoSplat: Part-Aware Motion-Guided Gaussian Splatting for Dynamic Scene Reconstruction
Dynamic scene reconstruction represents a fundamental yet demanding challenge in computer vision and robotics. While recent progress in 3DGS-based methods has advanced dynamic scene modeling, obtaining high-fidelity rendering and accurate tracking in scenarios with substantial, intricate motions remains significantly challenging. To address these challenges, we propose PaMoSplat, a novel dynamic Gaussian splatting framework incorporating part awareness and motion priors. Our approach is grounded in two key observations: 1) Parts serve as primitives for scene deformation, and 2) Motion cues from optical flow can effectively guide part motion. Specifically, PaMoSplat initializes by lifting multi-view segmentation masks into 3D space via graph clustering, establishing coherent Gaussian parts. For subsequent timestamps, we leverage a differential evolutionary algorithm to estimate the rigid motion of these parts using multi-view optical flow cues, providing a robust warm-start for further optimization. Additionally, PaMoSplat introduces an adaptive iteration count mechanism, internal learnable rigidity, and flow-supervised rendering loss to accelerate and optimize the training process. Comprehensive evaluations across diverse scenes, including real-world environments, demonstrate that PaMoSplat delivers superior rendering quality, improved tracking precision, and faster convergence compared to existing methods. Furthermore, it enables multiple part-level downstream applications, such as 4D scene editing.
comment: Accepted by TCSVT. Project URL: https://pamosplat.github.io
Increasing the Efficiency of DETR for Maritime High-Resolution Images SC 2026
Maritime object detection is critical for the safe navigation of unmanned surface vessels (USVs), requiring accurate recognition of obstacles from small buoys to large vessels. Real-time detection is challenging due to long distances, small object sizes, large-scale variations, edge computing limitations, and the high memory demands of high-resolution imagery. Existing solutions, such as downsampling or image splitting, often reduce accuracy or require additional processing, while memory-efficient models typically handle only limited resolutions. To overcome these limitations, we leverage Vision Mamba (ViM) backbones, which build on State Space Models (SSMs) to capture long-range dependencies while scaling linearly with sequence length. Images are tokenized into sequences for efficient high-resolution processing. For further computational efficiency, we design a tailored Feature Pyramid Network with successive downsampling and SSM layers, as well as token pruning to reduce unnecessary computation on background regions. Compared to state-of-the-art methods like RT-DETR with ResNet50 backbone, our approach achieves a better balance between performance and computational efficiency in maritime object detection.
comment: Accepted to IEEE ITSC 2026. Copyright 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses. DOI to be added upon publication
Nano-U: Efficient Terrain Segmentation for Tiny Robot Navigation
Terrain segmentation is a fundamental capability for autonomous mobile robots operating in unstructured outdoor environments. However, state-of-the-art models are incompatible with the memory and compute constraints typical of microcontrollers, limiting scalable deployment in small robotics platforms. To address this gap, we develop a complete framework for robust binary terrain segmentation on a low-cost microcontroller. At the core of our approach we design Nano-U, a highly compact binary segmentation network with a few thousand parameters. To compensate for the network's minimal capacity, we train Nano-U via Quantization-Aware Distillation (QAD), combining knowledge distillation and quantization-aware training. This allows the final quantized model to achieve excellent results on the Botanic Garden dataset and to perform very well on TinyAgri, a custom agricultural field dataset with more challenging scenes. We deploy the quantized Nano-U on a commodity microcontroller by extending MicroFlow, a compiler-based inference engine for TinyML implemented in Rust. By eliminating interpreter overhead and dynamic memory allocation, the quantized model executes on an ESP32-S3 with a minimal memory footprint and low latency. This compiler-based execution demonstrates a viable and energy-efficient solution for perception on low-cost robotic platforms.
comment: Code repository: https://github.com/federico-pizz/Nano-U
HeteroGenManip: Generalizable Manipulation For Heterogeneous Object Interactions
Generalizable manipulation involving cross-type object interactions is a critical yet challenging capability in robotics. To reliably accomplish such tasks, robots must address two fundamental challenges: "where to manipulate" (contact point localization) and "how to manipulate" (subsequent interaction trajectory planning). Existing foundation-model-based approaches often adopt end-to-end learning that obscures the distinction between these stages, exacerbating error accumulation in long-horizon tasks. Furthermore, they typically rely on a single uniform model, which fails to capture the diverse, category-specific features required for heterogeneous objects. To overcome these limitations, we propose HeteroGenManip, a task-conditioned, two-stage framework designed to decouple initial grasp from complex interaction execution. First, a Foundation-Correspondence-Guided Grasp module leverages structural priors to align the initial contact state, thereby significantly reducing the pose uncertainty of grasping. Subsequently, a Multi-Foundation-Model Diffusion Policy (MFMDP) routes objects to category-specialized foundation models, integrating fine-grained geometric information with highly-variable part features via a dual-stream cross-attention mechanism. Experimental evaluations demonstrate that HeteroGenManip achieves robust intra-category shape and pose generalization. The framework achieves an average 31% performance improvement in simulation tasks with broad type setting, alongside a 36.7% gain across four real-world tasks with different interaction types.
MTA-RL: Robust Urban Driving via Multi-modal Transformer-based 3D Affordances and Reinforcement Learning
Robust urban autonomous driving requires reliable 3D scene understanding and stable decision-making under dense interactions. However, existing end-to-end models lack interpretability, while modular pipelines suffer from error propagation across brittle interfaces. This paper proposes MTA-RL, the first framework that bridges perception and control through Multi-modal Transformer-based 3D Affordances and Reinforcement Learning (RL). Unlike previous fusion models that directly regress actions, RGB images and LiDAR point clouds are fused using a transformer architecture to predict explicit, geometry-aware affordance representations. These structured representations serve as a compact observation space, enabling the RL policy to operate purely on predicted driving semantics, which significantly improves sample efficiency and stability. Extensive evaluations in CARLA Town01-03 across varying densities (20-60 background vehicles) show that MTA-RL consistently outperforms state-of-the-art baselines. Trained solely on Town03, our method demonstrates superior zero-shot generalization in unseen towns, achieving up to a 9.0% increase in Route Completion, an 11.0% increase in Total Distance, and an 83.7% improvement in Distance Per Violation. Furthermore, ablation studies confirm that our multi-modal fusion and reward shaping are critical, significantly outperforming image-only and unshaped variants, demonstrating the effectiveness of MTA-RL for robust urban autonomous driving.
Data-Asymmetric Latent Imagination and Reranking for 3D Robotic Imitation Learning
Robotic imitation learning typically assumes access to optimal demonstrations, yet real-world data collection often yields suboptimal, exploratory, or even failed trajectories. Discarding such data wastes valuable information about environment dynamics and failure modes, which can instead be leveraged to improve decision-making. While 3D policies reduce reliance on high-quality demonstrations through strong spatial generalization, they still require large-scale data to achieve high task success. To address this, we propose DALI-R, a Data-Asymmetric Latent Imagination and Reranking framework for 3D robotic imitation learning from mixed-quality trajectories. It learns a Latent World Model over 3D point clouds for imagined rollouts and a Task Completion Scorer that reranks candidate action chunks, improving decision-making without additional high-quality demonstrations. We instantiate DALI-R with both diffusion and efficient flow-matching policies and evaluate it on Adroit and MetaWorld benchmarks. Across the two evaluated 3D base policies, DALI-R achieves an average 6.8% improvement in success rate while incurring less than 0.7x additional inference overhead.
Plan in Sandbox, Navigate in Open Worlds: Learning Physics-Grounded Abstracted Experience for Embodied Navigation ICML 2026
Vision-Language Models (VLMs) have demonstrated exceptional general reasoning capabilities. However, their performance in embodied navigation remains hindered by a scarcity of aligned open-world vision and robot control data. Despite simulators providing a cost-effective alternative for data collection, the inherent reliance on photorealistic simulations often limits the transferability of learned policies. To this end, we propose Sandbox-Abstracted Grounded Experience (SAGE), a framework that enables agents to learn within a physics-grounded semantic abstraction rather than a photorealistic simulation, mimicking the human capacity for mental simulation where plans are rehearsed in simplified physics abstractions before execution. The SAGE system operates via three synergistic phases: (1) Genesis: constructing diverse, physics-constrained semantic environments to bootstrap experience; (2) Evolution: distilling experiences through Reinforcement Learning (RL), utilizing a novel asymmetric adaptive clipping mechanism to stabilize updates; (3) Navigation: bridging the abstract policy to open-world control. We demonstrate that SAGE significantly improves planner-assisted embodied navigation, achieving a 53.21% LLM-Match Success Rate on A-EQA (+9.7% over baseline), while showing encouraging transfer to physical indoor robot deployment.
comment: 28 pages, 15 figures, Extended Version of accepted ICML 2026 Paper
Retrieve-then-Steer: Online Success Memory for Test-Time Adaptation of Generative VLAs
Vision-Language-Action (VLA) models show strong potential for general-purpose robotic manipulation, yet their closed-loop reliability often degrades under local deployment conditions. Existing evaluations typically treat test episodes as independent zero-shot trials. However, real robots often operate repeatedly in the same or slowly changing environments, where successful executions provide environment-verified evidence of reliable behavior patterns. We study this persistent-deployment setting, asking whether a partially competent frozen VLA can improve its reliability by reusing its successful test-time experience. We propose an online success-memory guided test-time adaptation framework for generative VLAs. During deployment, the robot stores progress-calibrated successful observation-action segments in a long-term memory. At inference, it retrieves state-relevant action chunks, filters inconsistent candidates via trajectory-level consistency, and aggregates them into an elite action prior. To incorporate this prior into action generation, we introduce confidence-adaptive prior guidance, which injects the elite prior into an intermediate state of the flow-matching action sampler and adjusts the guidance strength based on retrieval confidence. This design allows the frozen VLA to exploit environment-specific successful experience while preserving observation-conditioned generative refinement. This retrieve-then-steer mechanism enables lightweight, non-parametric test-time adaptation without requiring parameter updates. Simulation and real-world experiments show improved task success and closed-loop stability, especially in long-horizon and multi-stage tasks.
A cell-decomposition based path planner for 3D navigation in constrained workspaces
This paper proposes a cell decomposition algorithm for binary occupancy grids that ensures mutual complete visibility from each cell to at least one adjacent cell. This decomposition establishes a simplified framework for verifying path feasibility that can be easily embedded in optimization problems. To illustrate its utility, we formulate both second-order cone programs (SOCP) and their mixed-integer variant (MISOCP) within the proposed framework. Furthermore, we propose the KSP-SOCP method, which combines Yen's k-shortest path algorithm with the SOCP, achieving improved solutions compared to a standard SOCP approach while avoiding the computational burden of MISOCP. The cell decomposition algorithm, KSP-SOCP, and MISOCP approaches were evaluated in 9 city-like workspaces. The decomposition efficiently partitioned each map, enabling both optimization methods to compute feasible paths. The proposed KSP-SOCP achieved time performance comparable to the MISOCP while requiring less memory, making it highly suitable for large-scale problems.
comment: Accepted for publication at the 23rd IFAC World Congress (Busan, Korea)
EFGCL: Learning Dynamic Motion through Spotting-Inspired External Force Guided Curriculum Learning
Learning dynamic whole-body motions for legged robots through reinforcement learning (RL) remains challenging due to the high risk of failure, which makes efficient exploration difficult and often leads to unstable learning. In this paper, we propose External Force Guided Curriculum Learning (EFGCL), a guided RL approach based on the principle of physical guidance, in which external assistive forces are introduced during training. Inspired by spotting in artistic gymnastics, EFGCL enables agents to physically experience successful motion executions without relying on task-specific reward shaping or reference trajectories. Experiments on a quadrupedal robot performing Jump, Backflip, and Lateral-Flip tasks demonstrate that EFGCL accelerates learning of the Jump task by approximately a factor of two and enables the acquisition of complex whole-body motions that conventional RL methods fail to learn. We further show that the learned policies can be deployed on a real robot, reproducing motions consistent with those observed in simulation. These results indicate that physically guided exploration, which allows agents to experience success early in training, is an effective and general strategy for improving learning efficiency in dynamic whole-body motion tasks.
comment: Accepted at RA-L 2026, website - https://keitayoneda.github.io/kleiyn-efgcl/, YouTube - https://youtu.be/sFK00hm14No/
Guided Streaming Stochastic Interpolant Policy
Inference-time guidance is essential for steering generative robot policies toward dynamic objectives without retraining, yet existing methods are largely confined to chunk-based architectures that exhibit high latency and lack the reactivity needed for test-time preference alignment or obstacle avoidance. In this work, we formally derive the optimal guidance term for Stochastic Interpolants (SI) by analyzing the value function's time evolution via the Backward Kolmogorov Equation, establishing a modified drift that theoretically guarantees sampling from a target distribution. We apply this framework to real-time control through the Streaming Stochastic Interpolant Policy (SSIP), which generalizes the deterministic Streaming Flow Policy (SFP). Unifying this guidance law with the streaming architecture enables fast and reactive control. To support diverse deployment needs, we propose two complementary mechanisms: training-free Stochastic Trajectory Ensemble Guidance (STEG) that computes gradients on-the-fly for zero-shot adaptation, and training-based Conditional Critic Guidance (CCG) for amortized inference. Empirical evaluations demonstrate that our guided streaming approach significantly outperforms conventional chunk-based policies in reactivity and provides superior, physically valid guidance for dynamic, unstructured environments.
comment: Accepted to Robotics: Science and Systems (RSS) 2026. The first two authors contributed equally
Beyond Self-Play and Scale: A Behavior Benchmark for Generalization in Autonomous Driving
Recent Autonomous Driving (AD) works such as GigaFlow and PufferDrive have unlocked Reinforcement Learning (RL) at scale as a training strategy for driving policies. Yet such policies remain disconnected from established benchmarks, leaving the performance of large-scale RL for driving on standardized evaluations unknown. We present BehaviorBench -- a comprehensive test suite that closes this gap along three axes: Evaluation, Complexity, and Behavior Diversity. In terms of Evaluation, we provide an interface connecting PufferDrive to nuPlan, which, for the first time, enables policies trained via RL at scale to be evaluated on an established planning benchmark for autonomous driving. Complementarily, we offer an evaluation framework that allows planners to be benchmarked directly inside the PufferDrive simulation, at a fraction of the time. Regarding Complexity, we observe that today's standardized benchmarks are so simple that near-perfect scores are achievable by straight lane following with collision checking. We extract a meaningful, interaction-rich split from the Waymo Open Motion Dataset (WOMD) on which strong performance is impossible without multi-agent reasoning. Lastly, we address Behavior Diversity. Existing benchmarks commonly evaluate planners against a single rule-based traffic model, the Intelligent Driver Model (IDM). We provide a diverse suite of interactive traffic agents to stress-test policies under heterogeneous behaviors, beyond just using IDM. Overall, our benchmarking analysis uncovers the following insight: despite learning interactive behaviors in an emergent manner, policies trained via pure self-play under standard reward functions overfit to their training opponents and fail to generalize to other traffic agent behaviors. Building on this observation, we propose a hybrid planner that combines a PPO policy with a rule-based planner.
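For reference, the Intelligent Driver Model (IDM) mentioned above as the usual single traffic model can be written in a few lines. The sketch below uses the standard IDM acceleration law; the default parameter values are illustrative assumptions, and clamping the dynamic part of the desired gap at zero is a common implementation safeguard rather than part of the original formula.

```python
import math

def idm_acceleration(v, gap, dv, v0=30.0, T=1.5, a_max=1.0, b=2.0,
                     s0=2.0, delta=4.0):
    """Longitudinal IDM acceleration in m/s^2.

    v     : ego speed (m/s)
    gap   : bumper-to-bumper distance to the leader (m), must be > 0
    dv    : approach rate, ego speed minus leader speed (m/s)
    v0    : desired speed, T: desired time headway,
    a_max : maximum acceleration, b: comfortable deceleration,
    s0    : minimum standstill gap, delta: acceleration exponent
    """
    # Desired dynamic gap; the max(0, ...) clamp is an added safeguard.
    dynamic = v * T + v * dv / (2.0 * math.sqrt(a_max * b))
    s_star = s0 + max(0.0, dynamic)
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)
```

On a free road the law accelerates toward `v0`, and it brakes hard when closing quickly on a nearby leader. A benchmark that evaluates planners only against this one behavior is what the abstract's diverse agent suite is meant to go beyond.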
Muninn: Your Trajectory Diffusion Model But Faster
Diffusion-based trajectory planners can synthesize rich, multimodal robot motions, but their iterative denoising makes online planning and control prohibitively slow. Existing accelerations either modify the sampler or compress the network--sacrificing plan quality or requiring retraining without accounting for downstream control risk. We address the problem of making diffusion-based trajectory planners fast enough for real-time robot use without retraining the model or sacrificing trajectory quality, and in a way that works across diverse state-space diffusion architectures. Our key insight is that diffusion trajectory planners expose two signals we can exploit: a cheap probe of how their internal trajectory representation changes across steps, and analytic coefficients that describe how denoiser errors affect the sampler's state update. By calibrating the first signal against the second on offline runs, we obtain a per-step score that upper-bounds how far the final trajectory can deviate when we reuse a cached denoiser output, and we treat this bound as an uncertainty budget that we can spend over the denoising process. Building on this insight, we present Muninn, a training-free caching wrapper that tracks this uncertainty budget during sampling and, at each diffusion step, chooses between reusing a cached denoiser output when the predicted deviation is small and recomputing the denoiser when it is not. Across standard benchmarks Muninn delivers up to 4.6x wall-clock speedups across several trajectory diffusion models by reducing denoiser evaluations, while preserving task performance and safety metrics. Muninn further certifies that cached rollouts remain within a specified distance of their full-compute counterparts, and we validate these gains in real-time closed-loop navigation and manipulation hardware deployments. Project page: https://github.com/gokulp01/Muninn.
comment: Accepted to Robotics: Science and Systems 2026
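The reuse-or-recompute decision described in the Muninn abstract can be sketched as a small wrapper around a generic sampler loop. This is a hypothetical illustration, not the authors' implementation: the callables `denoise`, `update`, and `deviation_score`, and the simple additive accounting of the budget, are all assumptions.

```python
def sample_with_cache(x, num_steps, denoise, update, deviation_score, budget):
    """Run a denoising loop, reusing the cached denoiser output while the
    accumulated predicted deviation stays within `budget`.

    denoise(x, step)         -> denoiser output (the expensive call)
    update(x, eps, step)     -> next sampler state
    deviation_score(step)    -> predicted deviation if the cache is reused
    Returns (final_state, number_of_denoiser_evaluations).
    """
    cached = None
    spent = 0.0
    evaluations = 0
    for step in range(num_steps):
        score = deviation_score(step)
        if cached is not None and spent + score <= budget:
            # Cheap path: spend part of the budget and reuse the cache.
            spent += score
            eps = cached
        else:
            # Expensive path: recompute and refresh the cache.
            eps = denoise(x, step)
            cached = eps
            evaluations += 1
        x = update(x, eps, step)
    return x, evaluations
```

Setting `budget=0.0` with positive scores recovers the full-compute sampler, which gives a simple baseline for checking how far cached rollouts drift.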
StereoPolicy: Improving Robotic Manipulation Policies via Stereo Perception
Recent advances in robot imitation learning have yielded powerful visuomotor policies capable of manipulating a wide variety of objects directly from monocular visual inputs. However, monocular observations inherently lack reliable depth cues and spatial awareness, which are critical for precise manipulation in cluttered or geometrically complex scenes. To address this limitation, we introduce StereoPolicy, a new visuomotor policy learning framework that directly leverages synchronized stereo image pairs to strengthen geometric reasoning, without requiring explicit 3D reconstruction or camera calibration. StereoPolicy employs pretrained 2D vision encoders to process each image independently and fuses the resulting representations through a Stereo Transformer. This design implicitly captures spatial correspondence and disparity cues. The framework integrates seamlessly with diffusion-based and pretrained vision-language-action (VLA) policies, delivering consistent improvements over RGB, RGB-D, point cloud, and multi-view baselines across three simulation benchmarks: RoboMimic, RoboCasa, and OmniGibson. We further validate StereoPolicy on real-robot experiments spanning both tabletop and bimanual mobile manipulation settings. Our results underscore stereo vision as a scalable and robust modality that bridges 2D pretrained representations with 3D geometric understanding for robotic manipulation.
HiDrive: A Closed-Loop Benchmark for High-Level Autonomous Driving
End-to-end autonomous driving has witnessed rapid progress, yet existing benchmarks are increasingly saturated, with state-of-the-art models achieving near-perfect scores on widely used open-loop and closed-loop benchmarks. This saturation does not mean that the problem has been solved; instead, it reveals that current benchmarks remain limited in scenario diversity, object variety, and the breadth of driving capabilities they evaluate. In particular, they lack sufficient long-tail scenarios involving rare but safety-critical objects and fail to assess advanced decision-making such as legal compliance, ethical reasoning, and emergency response. To address these gaps, we propose HiDrive, a new closed-loop benchmark for end-to-end autonomous driving that emphasizes long-tail scenarios and a richer evaluation of driving capabilities. HiDrive introduces a diverse set of rare objects and uncommon traffic situations, and expands evaluation from basic driving skills to more advanced capabilities, including rule compliance, moral reasoning, and context-dependent emergency maneuvers. Correspondingly, we extend previous collision-avoidance-centered metrics into a comprehensive evaluation system that encompasses collision and braking, traffic-rule compliance, and moral-reasoning indicators. Built on a more advanced physics engine, HiDrive provides physically realistic lighting and high-fidelity visual rendering, offering a more challenging and realistic testbed for assessing whether autonomous driving systems can handle the complexity of real-world deployment. The HiDrive software, source code, digital assets, and documentation are available at https://github.com/VDIGPKU/HiDrive.
JODA: Composable Joint Dynamics for Articulated Objects
Articulated objects used in simulation and embodied AI are typically specified by geometry and kinematic structure, but lack the fine-grained dynamical effects that govern realistic mechanical behavior, such as frictional holding, detents, soft closing, and snap latching. Existing approaches either ignore the detailed structure of dynamics entirely, or use simple models with limited expressiveness. We introduce JODA, a framework for generating joint-level dynamics as a structured three-channel field over the joint degree of freedom, capturing conservative forces, dry friction, and damping. Instantiated using shape-constrained piecewise cubic interpolation (PCHIP), this formulation defines a compact and expressive function space that is both interpretable and compatible with differentiable simulation. Building on this representation, we develop methods for inferring and refining joint dynamics from multimodal inputs. Given visual observations and joint context, a vision-language model proposes structured dynamical primitives, which are composed into a unified dynamics field. The resulting representation supports both direct manipulation and gradient-based refinement. We demonstrate that JODA enables plausible and controllable modeling of diverse joint behaviors, providing a unified interface for inference, editing, and optimization. Code and example assets with their generated profiles will be released upon publication.
LoopVLA: Learning Sufficiency in Recurrent Refinement for Vision-Language-Action Models
Current Vision-Language-Action (VLA) models typically treat the deepest representation of a vision-language backbone as universally optimal for action prediction. However, robotic manipulation is composed of many frequent closed-loop spatial adjustments, for which excessive abstraction may waste computation and weaken low-level geometric cues essential for precise control. Existing early-exit strategies attempt to reduce computation by stopping at predefined layers or applying heuristic rules such as action consistency, but they do not directly answer when a representation is actually sufficient for action. In this paper, we present LoopVLA, a recurrent VLA architecture that jointly learns representation refinement, action prediction, and sufficiency estimation. LoopVLA iteratively applies a shared Transformer block to refine multimodal tokens, and at each iteration produces both a candidate action and a sufficiency score that estimates whether further refinement is necessary. By sharing parameters across iterations, LoopVLA decouples refinement from absolute layer indices and grounds sufficiency estimation in the evolving representation itself. Since sufficiency has no direct supervision, we introduce a self-supervised distribution alignment objective, where intermediate confidence scores are trained to match the relative action quality across refinement steps, thereby linking sufficiency learning to policy optimization signals. Experiments on LIBERO, LIBERO-Plus, and VLA-Arena show that LoopVLA pushes the efficiency-performance frontier of VLA policies, reducing parameters by 45% and improving inference throughput by up to 1.7 times while matching or outperforming strong baselines in task success.
Explicit Stair Geometry Conditioning for Robust Humanoid Locomotion
Robust humanoid stair climbing remains challenging due to geometric discontinuities, sensitivity to step height variations, and perception uncertainty in real-world environments. Existing learning-based locomotion policies often rely on implicit terrain representations or blind proprioceptive feedback, limiting their ability to generalize across varying stair geometries and to anticipate required gait adjustments. This paper proposes an explicit stair geometry conditioning framework for robust humanoid stair climbing. Instead of encoding terrain as high-dimensional latent features, we extract a compact set of interpretable geometric parameters, including step height, step depth, and current yaw angle relative to the robot heading. These explicit stair parameters directly condition a Proximal Policy Optimization (PPO)-based locomotion policy, enabling proactive modulation of swing-foot clearance and stride characteristics according to stair structure. Simulation experiments demonstrate improved generalization across unseen stair heights beyond the training distribution. Real-world experiments on the Unitree G1 humanoid validate reliable indoor and outdoor stair traversal. In challenging outdoor scenarios, the robot successfully ascends 33 consecutive steps without failure, demonstrating robustness and practical deployability.
comment: 8 pages, 7 figures, 4 tables
Neural Distance-Guided Path Integral Control for Tractor-Trailer Navigation
Autonomous and safe navigation of tractor-trailer systems requires accurate, real-time collision avoidance and dynamically feasible control, particularly in cluttered and complex agricultural environments. This is challenging due to their articulated, deformable geometries and nonlinear dynamics. Traditional methods oversimplify vehicle geometry or rely on precomputed distance fields that assume a known map, limiting their applicability in dynamic, partially unknown environments. To address these limitations, we propose a geometric neural encoder that provides fast and accurate distance estimates between the full tractor-trailer body and raw LiDAR perception, enabling real-time, map-free geometric reasoning. These learned distances are integrated into a Model Predictive Path Integral (MPPI) controller, allowing the system to incorporate true articulated geometry directly into its cost evaluation and enabling more responsive navigation in challenging agricultural settings. Simulation results demonstrate that the proposed framework generates dynamically feasible and safe trajectories for navigating tractor-trailer systems in cluttered and complex environments.
Network-Efficient World Model Token Streaming
Generative driving world models rely on compact latent state representations that must be efficiently transmitted and synchronized across distributed compute and connected vehicles. We study network-efficient streaming of a discrete world model state, where a stride-16 VQ-U-Net tokenizer (codebook size 8,192) maps each 288x512 frame to an 18x32 grid of token IDs (576 tokens/frame), equivalent to 936 bytes/frame under fixed-length coding. We consider a keyframe-delta protocol under strict per-message payload budgets and packet loss, and propose a fully online, label-free algorithm that prioritizes delta updates via cosine distance in codebook embedding space and triggers keyframes adaptively using a Hamming-drift threshold. The adaptive algorithm consistently improves the rate-distortion frontier over periodic keyframes at matched bitrates: at 0.024 Mb/s (200-byte budget) dynamic-only embedding distortion drops from 0.0712 to 0.0661 (7.2%), and at 0.036 Mb/s (400-byte budget) from 0.0427 to 0.0407 (4.8%). Under 10% delta packet loss at 200 bytes, dynamic-only distortion is 0.0757 versus 0.0789 for a matched periodic baseline. To connect state fidelity to world model usefulness, we train a lightweight next-token predictor and evaluate perplexity conditioned on streamed receiver states: at 0.024 Mb/s, dynamic-position perplexity improves from 206.0 to 193.1 (6.3%), and at 0.036 Mb/s from 158.9 to 155.6 (2.1%). These results support discrete token-state streaming as a practical systems layer for bandwidth-aware synchronization and improved downstream token-dynamics utility under vehicular networking constraints.
comment: Accepted at IEEE VNC 2026
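The payload arithmetic and the two decisions named in the abstract above (which delta updates to send first, and when to fall back to a keyframe) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the flat `bytes_per_update` cost of one delta entry, and the use of a mismatch fraction as the Hamming-drift measure are all assumptions made for the example.

```python
import math


def fixed_length_bytes(tokens_per_frame, codebook_size):
    """Bytes per frame when every token ID uses the same number of bits."""
    bits_per_token = (codebook_size - 1).bit_length()  # 8,192 codes -> 13 bits
    return math.ceil(tokens_per_frame * bits_per_token / 8)


def cosine_distance(u, v):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def select_deltas(receiver, current, codebook, budget_bytes, bytes_per_update):
    """Return changed grid positions, largest embedding change first,
    truncated to what fits in the payload budget.

    bytes_per_update is a hypothetical flat cost per delta entry.
    """
    changed = [i for i, (r, c) in enumerate(zip(receiver, current)) if r != c]
    changed.sort(
        key=lambda i: cosine_distance(codebook[receiver[i]], codebook[current[i]]),
        reverse=True,
    )
    return changed[: budget_bytes // bytes_per_update]


def needs_keyframe(receiver, current, drift_threshold):
    """Trigger a keyframe when the fraction of mismatched tokens is too high.

    Using the mismatch fraction as the drift measure is an assumption.
    """
    mismatches = sum(r != c for r, c in zip(receiver, current))
    return mismatches / len(current) > drift_threshold
```

With the figures from the abstract, `fixed_length_bytes(18 * 32, 8192)` reproduces the stated 936 bytes/frame, since 576 tokens at 13 bits each come to 7,488 bits.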
ConsistNav: Closing the Action Consistency Gap in Zero-Shot Object Navigation with Semantic Executive Control
Zero-shot object navigation has advanced rapidly with open-vocabulary detectors, image-text models, and language-guided exploration. However, even after current methods detect a plausible target hypothesis, the agent may still oscillate between exploration and pursuit, or abandon the object near success. We identify this failure mode as an action consistency gap: semantic evidence is repeatedly reinterpreted at each step without persistent commitment across the episode. We introduce ConsistNav, a training-free zero-shot ObjectNav framework built around a semantic executive composed of three coordinated modules: a Finite-State Executive Controller that stages target pursuit through guarded semantic phases; a Persistent Candidate Memory that accumulates cross-frame target evidence into stable object hypotheses; and a Stability-Aware Action Control that suppresses rotational stagnation, ineffective pursuit, and unverified stopping. This design changes neither the detector nor the low-level planner; instead, it controls when semantic evidence should influence navigation and when it should be suppressed or revisited. We conduct extensive experiments on HM3D and MP3D, where ConsistNav achieves state-of-the-art results among compared zero-shot ObjectNav methods and improves SR by 11.4% and SPL by 7.9% over the controlled baseline on MP3D. Ablation studies and real-world deployment experiments further demonstrate the effectiveness and robustness of the proposed executive mechanism.
comment: 13 pages, 5 figures
Computational Design of a Low-Visibility UAV Using a Human-Aligned Perceptual Metric
We introduce Phantom Twist, a type of single-propeller UAV designed to achieve low visibility through high-speed spinning and the exploitation of motion blur. We develop a two-stage automated design pipeline that optimizes the placement of functional components including batteries, control PCB, motor-propeller assembly, and counterweights. The pipeline minimizes visibility as measured by a human-aligned perceptual metric (LPIPS) while strictly satisfying inertial and aerodynamic constraints required for stable flight. We validate this approach through fabrication and flight testing of multiple prototypes. These tests confirm that our pipeline produces stable, controllable designs and that the optimized UAV exhibits significantly reduced visual perceptibility compared to conventional quadcopters.
comment: Accepted by RSS 2026
Distributed Pose Graph Optimization via Continuous Riemannian Dynamics
We present a framework for distributed Pose Graph Optimization (PGO) by formulating the problem as a second-order continuous-time dynamical system evolving on Lie groups. By modeling pose variables as massive particles subject to damping, the equilibrium points of the resulting Riemannian dynamics coincide with first-order critical points of the original PGO problem. Using the governing damped Euler-Poincaré equations and a semi-implicit geometric integrator, we design an optimization algorithm that generalizes existing algorithms such as Riemannian gradient descent and Gauss-Newton. In multi-robot settings, we present a fully distributed and parallel method based on block-diagonal mass and damping matrices, where each robot solves an ordinary differential equation for its own poses with minimal communication overhead. Moreover, modeling both state and velocity enables principled neighbor prediction that significantly improves convergence under delayed communication. Theoretically, we establish a sufficient condition that ensures energy dissipation under the employed geometric discretization scheme. Experiments on benchmark PGO datasets demonstrate that the proposed solver achieves superior performance compared to state-of-the-art distributed baselines in both synchronous and asynchronous regimes.
RankQ: Offline-to-Online Reinforcement Learning via Self-Supervised Action Ranking
Offline-to-online reinforcement learning (RL) improves sample efficiency by leveraging pre-collected datasets prior to online interaction. A key challenge, however, is learning an accurate critic in large state-action spaces with limited dataset coverage. To mitigate harmful updates from value overestimation, prior methods impose pessimism by down-weighting out-of-distribution (OOD) actions relative to dataset actions. While effective, this essentially acts as a behavior cloning anchor and can hinder downstream online policy improvement when dataset actions are suboptimal. We propose RankQ, an offline-to-online Q-learning objective that augments temporal-difference learning with a self-supervised multi-term ranking loss to enforce structured action ordering. By learning relative action preferences rather than uniformly penalizing unseen actions, RankQ shapes the Q-function such that action gradients are directed toward higher-quality behaviors. Across sparse-reward D4RL benchmarks, RankQ achieves performance competitive with or superior to seven prior methods. In vision-based robot learning, RankQ enables effective offline-to-online fine-tuning of a pretrained vision-language-action (VLA) model in a low-data regime, achieving on average a 42.7% higher simulation success rate than the next best method. In a high-data setting, RankQ improves simulation performance by 13.7% over the next best method and achieves strong sim-to-real transfer, increasing real-world cube stacking success from 43.1% to 84.7% relative to the VLA's initial performance.
Forecast-aware Gaussian Splatting for Predictive 3D Representation in Language-Guided Pick-and-Place Manipulation
We introduce Forecast-aware Gaussian Splatting (Forecast-GS), a predictive 3D representation framework for language-conditioned robotic manipulation. While recent manipulation systems have made progress by grounding language instructions into robot affordances, value maps, or relational keypoint constraints, they usually reason over the current scene and do not explicitly model the task-completed state. This limitation is critical when success depends on satisfying spatial and semantic goals under partial observations, where the robot must evaluate whether a candidate action leads to a feasible task-consistent outcome. We validate Forecast-GS on real-world pick-and-place manipulation tasks, including Cutter-to-Box, Apple-to-Bowl, and Sponge-to-Tray. For each task, we conduct 25 real-world trials under varied initial object configurations using the same robot platform and sensing setup. Forecast-GS with automatic candidate selection achieves success rates of 21/25, 23/25, and 16/25 on the three tasks, respectively, outperforming the ReKep baseline, which achieves 15/25, 19/25, and 10/25. A diagnostic human-assisted setting further improves success rates to 23/25, 24/25, and 19/25, suggesting that candidate generation is effective while automatic ranking remains imperfect. These results suggest that explicitly forecasting task-completed 3D states enables more reliable action evaluation, while the gap between automatic and human-assisted selection indicates that robust final-state ranking remains an important challenge for fully autonomous manipulation. Overall, Forecast-GS provides an interpretable bridge between language understanding, 3D perception, and robotic manipulation planning.
ASIP-Planner: Adaptive Planning for UAV Surface Inspection in Partially Known Indoor Environments
Inspection of indoor infrastructure, such as tunnels and industrial facilities, requires systematic surface coverage to ensure that all inspection targets are properly observed. Unmanned Aerial Vehicles (UAVs) offer an alternative to manual inspection by conducting map-guided surface inspection using prior structural models. However, in practice, indoor inspection often relies on floorplan-derived reference maps that may not reflect unforeseen obstacles, such as temporary structures or equipment, leading to occluded viewpoints and degraded inspection quality. Existing coverage planning methods typically assume a fully known inspection environment and perform deterministic global viewpoint optimization based on accurate prior maps, making them vulnerable to environmental discrepancies during execution. This work presents an adaptive UAV inspection framework for partially known structured indoor environments. The proposed method integrates a segment-based global coverage planner with an inspection-oriented local view-angle adaptation module. The global planner organizes planar inspection targets into surface-aligned clusters to generate compact viewpoint sequences with improved orientation consistency. The local planner generates collision-free trajectories and adjusts the viewing direction online to mitigate occlusion-induced coverage loss while preserving the planned trajectory structure. Simulation results across randomized scene configurations demonstrate that the proposed global planner achieves near-complete coverage while reducing trajectory length compared to representative baselines. Real-world flight experiments further validate that the framework produces usable inspection data for downstream analysis. These results indicate that the proposed framework improves inspection efficiency and adaptability in partially known structured indoor environments.
SEVO: Semantic-Enhanced Virtual Observation for Robust VLA Manipulation via Active Illumination and Data-Centric Collection
Vision-Language-Action (VLA) and imitation-learning policies trained via community toolchains on low-cost hardware frequently fail when deployed outside the training environment. Existing evaluations, including the original ACT and SmolVLA benchmarks, demonstrate high success rates under controlled, fixed backgrounds, yet community practitioners report near-zero transfer to new environments. We present SEVO (Semantic-Enhanced Virtual Observation), a data-centric approach that improves cross-environment manipulation robustness without modifying the policy architecture. SEVO transforms the raw RGB camera stream through three mechanisms: (1) body-fixed cameras whose combined fields of view cover the full manipulation workspace, (2) active red-spectrum illumination that physically normalizes object appearance, and (3) real-time YOLO segmentation overlay that provides a background-invariant semantic cue. Critically, we show that a diversified data collection protocol (systematically varying lighting, backgrounds, and distractors during teleoperation) is the single most important factor for generalization. We target transparent water bottles, objects that visually blend with their surroundings, and select a simple pick-and-place task to enable hundreds of controlled real-robot trials across two mobile platforms. The full pipeline achieves 95% grasp success with ACT and 83% with SmolVLA in the training environment, transferring to novel environments at 85% and 75%. Without SEVO, the same policies achieve only 75%/70% in training and collapse to 30-35% in novel environments. Our results demonstrate that principled observation design and environmental diversity during data collection, not model scaling, enable low-cost robots to operate reliably in everyday household environments.
ForceFlow: Learning to Feel and Act via Contact-Driven Flow Matching
Existing imitation learning methods enable robots to interact autonomously with the physical environment. However, contact-rich manipulation tasks remain a significant challenge due to complex contact dynamics that demand high-precision force feedback and control. Although recent efforts have attempted to integrate force/torque sensing into policies, how to build a simple yet effective framework that achieves robust generalization under multimodal observations remains an open question. In this paper, we propose ForceFlow, a force-aware reactive framework built upon flow matching. For contact-stage policy design, we investigate force signal fusion mechanisms and adopt an asymmetric multimodal fusion architecture that treats force as a global regulatory signal, combined with a joint prediction paradigm that enhances the policy's understanding of instantaneous force and historical information, thereby achieving deep coupling between force and motion. For task-level hierarchical decomposition, we divide manipulation into a vision-dominant approach stage (VLM-based pointing for target localization) and a touch-dominant interaction stage (force-driven contact execution), with a Vision-to-Force (V2F) handover mechanism that explicitly decouples spatial generalization from contact regulation. Experimental results across six real-world contact-rich tasks demonstrate that ForceFlow achieves a 37% success rate improvement over the strong baseline ForceVLA while maintaining significantly lower cost. Moreover, ForceFlow exhibits accurate force signal prediction and demonstrates superior performance in contact force self-regulation and zero-shot out-of-distribution (OOD) generalization.
NoTVLA: Semantics-Preserving Robot Adaptation via Narrative Action Interfaces
Vision-Language-Action (VLA) models represent a pivotal advance in embodied intelligence, yet they confront critical barriers to real-world deployment, most notably catastrophic forgetting. This issue stems from their overreliance on continuous action sequences or action chunks, which inadvertently create isolated data silos that disrupt knowledge retention across tasks. To tackle these challenges, we propose the Narrowing of Trajectory VLA (NoTVLA) framework: a novel approach that narrows its focus to sparse trajectories, thereby avoiding the catastrophic forgetting associated with dense trajectory fine-tuning. A key innovation of NoTVLA lies in its trajectory planning strategy: instead of centering on the target object's trajectory, it leverages temporal compression and spatial reasoning pruning specifically for the robot end effector's trajectory. Furthermore, training is conducted using these sparse trajectories rather than dense action trajectories, which yields practical advantages, including better zero-shot performance. In multi-task evaluation scenarios, NoTVLA achieves superior performance and generalization compared to pi0 while operating under two critical constraints: it uses over an order of magnitude less computing power than pi0 and requires no wrist-mounted camera. This design ensures that NoTVLA's operational accuracy closely approximates that of single-task expert models. Crucially, it also preserves the model's inherent language capabilities, enabling zero-shot generalization in specific scenarios, supporting unified model deployment across multiple robot platforms, and fostering a degree of generalization even when perceiving tasks from novel perspectives.
Reinforcement Learning with Action Chunking NeurIPS 2025
We present Q-chunking, a simple yet effective recipe for improving reinforcement learning (RL) algorithms for long-horizon, sparse-reward tasks. Our recipe is designed for the offline-to-online RL setting, where the goal is to leverage an offline prior dataset to maximize the sample-efficiency of online learning. Effective exploration and sample-efficient learning remain central challenges in this setting, as it is not obvious how the offline data should be utilized to acquire a good exploratory policy. Our key insight is that action chunking, a technique popularized in imitation learning where sequences of future actions are predicted rather than a single action at each timestep, can be applied to temporal difference (TD)-based RL methods to mitigate the exploration challenge. Q-chunking adopts action chunking by directly running RL in a 'chunked' action space, enabling the agent to (1) leverage temporally consistent behaviors from offline data for more effective online exploration and (2) use unbiased $n$-step backups for more stable and efficient TD learning. Our experimental results demonstrate that Q-chunking exhibits strong offline performance and online sample efficiency, outperforming prior best offline-to-online methods on a range of long-horizon, sparse-reward manipulation tasks.
comment: The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025); 29 pages, 17 figures
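The n-step backup in a chunked action space mentioned in the abstract above can be written out as a short sketch. It assumes the critic scores a state together with a whole chunk of h actions, so that the h rewards in the backup were all produced by the chunk being evaluated; the function name and arguments are illustrative and not taken from the paper's code.

```python
def chunked_td_target(rewards, gamma, bootstrap_value):
    """TD target for one action chunk of length h = len(rewards).

    target = sum_{i=0}^{h-1} gamma^i * r_{t+i} + gamma^h * Q(s_{t+h}, next chunk)

    bootstrap_value stands in for the critic's estimate at the state reached
    after the chunk, evaluated on the next chunk of actions.
    """
    h = len(rewards)
    discounted_rewards = sum((gamma ** i) * r for i, r in enumerate(rewards))
    return discounted_rewards + (gamma ** h) * bootstrap_value
```

For a chunk of three steps with rewards `[0.0, 0.0, 1.0]`, `gamma = 0.5`, and a bootstrap value of `4.0`, the target is 0.25 + 0.125 * 4.0 = 0.75.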
CoLA-Flow Policy: Temporally Coherent Imitation Learning via Continuous Latent Action Flow Matching for Robotic Manipulation
Learning long-horizon robotic manipulation requires jointly achieving expressive behavior modeling, real-time inference, and stable execution, which remains challenging for existing generative policies. Diffusion-based approaches offer strong modeling capacity but incur high inference latency, while flow matching enables fast, near-single-step generation yet often suffers from unstable execution when operating directly in the raw action space. We propose Continuous Latent Action Flow Policy (CoLA-Flow Policy), a trajectory-level imitation learning framework that performs flow matching in a continuous latent action space. By encoding action sequences into temporally coherent latent trajectories and learning an explicit latent-space flow, CoLA-Flow Policy decouples global motion structure from low-level control noise, enabling smooth and reliable long-horizon execution. The framework further integrates geometry-aware point cloud conditioning and execution-time multimodal modulation, using visual cues as a representative modality to enhance real-world robustness. Experiments in simulation and on real robots show that CoLA-Flow Policy achieves near-single-step inference, improves trajectory smoothness by up to 93.7% and task success by up to 25 percentage points over raw action-space flow baselines, while remaining significantly faster than diffusion-based policies.
comment: 9 pages, 9 figures
Informative Path Planning with Guaranteed Estimation Uncertainty
Environmental monitoring robots often need to estimate data fields (e.g., salinity, temperature, bathymetry) under tight resource constraints. Classical boustrophedon lawnmower surveys provide geometric coverage guarantees but can waste effort by oversampling predictable regions. In contrast, informative path planning (IPP) methods leverage spatial correlations to reduce oversampling, yet typically offer no guarantees on estimation quality. This paper bridges these approaches by addressing IPP with guaranteed estimation uncertainty in complex environments: computing the shortest path whose measurements ensure that the Gaussian process (GP) posterior variance -- an intrinsic uncertainty measure that lower-bounds the mean-squared prediction error under the GP model -- is upper bounded by a user-specified threshold over the monitoring region. We propose a three-stage approach for efficient environmental monitoring: (i) learning a GP model from prior information; (ii) transforming the GP kernel into binary coverage maps that identify locations where uncertainty can be reduced below a target threshold; and (iii) planning a near-shortest route to satisfy the global uncertainty constraint. Our approach incorporates non-stationary kernels to capture spatially varying correlations in heterogeneous phenomena and accommodates non-convex environments with obstacles. We provide near-optimal approximation guarantees for both sensing-location selection and the joint selection-and-routing problem under a travel budget. Experiments on real-world topographic data demonstrate that our planners achieve uncertainty targets with fewer sensing locations and shorter travel distances than representative baselines. Furthermore, field experiments with autonomous surface and underwater vehicles validate the real-world feasibility of the approach. Our code is available at: www.sgp-tools.com
comment: 15 pages, 11 figures, RSS 2026
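Stage (ii) of the approach above, turning a GP kernel into a binary coverage map, can be illustrated in one dimension. The sketch below is a simplification under stated assumptions: it uses a stationary RBF kernel (the paper works with non-stationary kernels), considers a single sensing location, and all function and parameter names are hypothetical.

```python
import math


def rbf(distance, signal_var, length_scale):
    """Stationary RBF kernel value at a given distance (assumed kernel)."""
    return signal_var * math.exp(-0.5 * (distance / length_scale) ** 2)


def posterior_variance_one_sample(distance, signal_var, length_scale, noise_var=0.0):
    """GP posterior variance at `distance` from a single measurement:
    k(x, x) - k(x, s)^2 / (k(s, s) + noise_var)."""
    k_xs = rbf(distance, signal_var, length_scale)
    return signal_var - k_xs ** 2 / (signal_var + noise_var)


def coverage_map(grid, sensor, threshold, signal_var, length_scale, noise_var=0.0):
    """True where one measurement at `sensor` brings the variance to <= threshold."""
    return [
        posterior_variance_one_sample(abs(x - sensor), signal_var, length_scale, noise_var)
        <= threshold
        for x in grid
    ]
```

For example, `coverage_map([0, 1, 2, 3, 4], sensor=2, threshold=0.7, signal_var=1.0, length_scale=1.0)` marks only the sensor location and its immediate neighbours as covered; a planner would then choose sensing locations whose maps jointly cover the monitoring region.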
Scalable Inspection Planning via Flow-based Mixed Integer Linear Programming
Inspection planning is concerned with computing the shortest robot path to inspect a given set of points of interest (POIs) using the robot's sensors. This problem arises in a wide range of applications from manufacturing to medical robotics. To alleviate the problem's complexity, recent methods rely on sampling to obtain a more manageable (discrete) graph inspection planning (GIP) problem. Unfortunately, GIP remains highly difficult to solve at scale, as it requires simultaneously satisfying POI-coverage and path-connectivity constraints, giving rise to a challenging optimization problem, particularly at scales encountered in real-world scenarios. In this work, we present highly scalable Mixed Integer Linear Programming (MILP) solutions for GIP that significantly advance the state-of-the-art in both runtime and solution quality. Our key insight is a reformulation of the problem's core constraints as a network flow, which enables effective MILP models and a specialized Branch-and-Cut solver that exploits the combinatorial structure of flows. We evaluate our approach on medical and infrastructure benchmarks alongside large-scale synthetic instances. Across all scenarios, our method produces substantially tighter lower bounds than existing formulations, reducing optimality gaps by 30-50% on large instances. Furthermore, our solver demonstrates unprecedented scalability: it provides non-trivial solutions for problems with up to 15,000 vertices and thousands of POIs, where prior state-of-the-art methods typically exhaust memory or fail to provide any meaningful optimality guarantees.
SCORP: Scene-Consistent Multi-agent Diffusion Planning with Stable Online Reinforcement Post-Training for Cooperative Driving
Cooperative driving is a safety- and efficiency-critical task that requires the coordination of diverse, interaction-realistic multi-agent trajectories. Although existing diffusion-based methods can capture multimodal behaviors from demonstrations, they often exhibit weak scene consistency and poor alignment with closed-loop cooperative objectives. This makes post-training necessary for further improvement, yet achieving stable online post-training in reactive multi-agent environments remains challenging. In this paper, we propose SCORP, a scene-consistent multi-agent diffusion planner with stable online reinforcement learning (RL) post-training for cooperative driving. For pre-training, we develop a scene-conditioned multi-agent denoising architecture that couples inter-agent self-attention with a dual-path conditioning mechanism: cross-attention provides direct scene-information injection, while AdaLN-Zero enables additional flexible and stable conditional modulation, thereby improving the scene consistency and road adherence of joint trajectories. For post-training, we formulate a two-layer Markov decision process (MDP) that explicitly integrates the reverse denoising chain with policy-environment interaction. We further co-design dense, well-shaped planning rewards and variance-gated group-relative policy optimization (VG-GRPO) to mitigate advantage collapse and gradient instability during closed-loop training. Extensive experiments show that SCORP outperforms strong open-source baselines on WOMD, with 10.47%-28.26% and 1.70%-7.22% improvements in core safety and efficiency metrics, respectively. Moreover, compared with alternative post-training methods, SCORP delivers significant and consistent gains in both driving safety and traffic efficiency, highlighting stable and sustained advances in closed-loop cooperative driving.
Accurate Trajectory Tracking with MPCC for Flapping-Wing MAVs
Flapping-wing micro aerial vehicles offer quieter and safer operation than rotary-wing drones, yet achieving precise autonomous control of bird-scale ornithopters remains challenging: lift, airspeed, and turning authority are tightly coupled and governed by only a few control inputs. Conventional cascaded controllers treat altitude, speed, and heading independently, producing persistent tracking errors during complex maneuvers, while time-parameterized trajectory tracking requires predefined speed profiles that existing methods cannot robustly produce for these coupled dynamics. We address both limitations simultaneously with a Model Predictive Contouring Control (MPCC) approach that tracks arc-length-parameterized trajectories while optimizing progress online, eliminating the need for predefined timing. However, MPCC requires a dynamical model that captures the coupled aerodynamics without exceeding the computational budget of real-time nonlinear optimization. Here, we propose a compact, continuously differentiable model that captures the dominant couplings of bird-scale ornithopters, enabling real-time predictive control. We validated the method with the XFly ornithopter flying along circular and three-dimensional racing trajectories and achieved a mean deviation from the reference trajectory between 6.5 and 9 cm at speeds up to 3 m/s, which represents an almost 10-fold improvement over prior ornithopter control methods.
comment: 7 pages, 6 figures
False Feasibility in Variable Impedance MPC for Legged Locomotion
Variable impedance model predictive control (MPC) formulations often treat joint stiffness as an instantaneous decision variable. The resulting feasible set strictly contains the physically realizable set under first-order actuator dynamics. We identify this as a formulation error rather than a modeling approximation, formalize the distinction between the parameter-based feasible set F_param and the realizable set F_real, and characterize the regime of mismatch via the dimensionless parameter α = ω_s T (actuator bandwidth times task timescale). For the 1D hopping monoped, we prove that below an analytical threshold α_crit derived in closed form from task physics, no admissible stiffness command realizes the parameter-based prediction. Numerical validation in 1D shows monotonic deviation growth as α decreases, with the predicted scaling holding across ten parameter combinations (log-log R^2 = 0.986). Mechanism transfer to planar spring-loaded inverted pendulum dynamics confirms center-of-mass and stance-timing deviation as the primary consequence, with regime-dependent friction effects as a tertiary observable. A second threshold α_infeas < α_crit establishes a floor below which restricting the admissible stiffness range cannot repair realizability, closing the conservative-tuning objection. Augmenting the prediction state with stiffness closes the mismatch by construction.
comment: Paper withdrawn to make some revisions in the discussion and experiments sections
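As a rough illustration of why α matters, assume the first-order actuator dynamics take the form of a linear lag on stiffness. This is an assumption made for this sketch, since the abstract does not state the exact model. The symbols k (realized stiffness), k_cmd (a constant commanded stiffness), and k_0 (initial stiffness) are introduced here for illustration only:

$$\dot{k}(t) = \omega_s\,\bigl(k_{\mathrm{cmd}} - k(t)\bigr) \;\Longrightarrow\; k(T) = k_{\mathrm{cmd}} + \bigl(k_0 - k_{\mathrm{cmd}}\bigr)\,e^{-\omega_s T} = k_{\mathrm{cmd}} + \bigl(k_0 - k_{\mathrm{cmd}}\bigr)\,e^{-\alpha}$$

Under this assumption, only a fraction $1 - e^{-\alpha}$ of a commanded stiffness step is realized within the task timescale T. For small α the realized stiffness therefore stays close to k_0 whatever the command, whereas a formulation that treats stiffness as an instantaneous decision variable predicts k(T) = k_cmd.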
MOBIUS: A Multi-Modal Bipedal Robot that can Walk, Crawl, Climb, and Roll
This paper presents the MOBIUS platform, a bipedal robot capable of walking, crawling, climbing, and rolling. MOBIUS features four limbs, two 6-DoF arms with two-finger grippers for manipulation and climbing, and two 4-DoF legs for locomotion--enabling smooth transitions across diverse terrains without reconfiguration. A hybrid control architecture combines reinforcement learning for locomotion and force control for compliant contact interactions during manipulation. A high-level MIQCP planner autonomously selects locomotion modes to balance stability and energy efficiency. Hardware experiments demonstrate robust gait transitions, dynamic climbing, and full-body load support via pinch grasp. Overall, MOBIUS demonstrates the importance of tight integration between morphology, high-level planning, and control to enable mobile loco-manipulation and grasping, substantially expanding its interaction capabilities, workspace, and traversability.
comment: Paper is accepted at the Robotics: Science and Systems conference, held in Sydney, Australia, July 13th-17th, 2026. Alexander Schperberg and Yusuke Tanaka are co-first authors. Both were at the Robotics and Mechanisms Laboratory (RoMeLa) at UCLA when the work started, and are now with Mitsubishi Electric Research Laboratories and ETH Zurich (RSL) respectively
AR-VLA: True Autoregressive Action Expert for Vision-Language-Action Models
We propose a standalone autoregressive (AR) Action Expert that generates actions as a continuous causal sequence while conditioning on refreshable vision-language prefixes. In contrast to existing Vision-Language-Action (VLA) models and diffusion policies that reset temporal context with each new observation and predict actions reactively, our Action Expert maintains its own history through a long-lived memory and is inherently context-aware. This structure addresses the frequency mismatch between fast control and slow reasoning, enabling efficient independent pretraining of kinematic syntax and modular integration with heavy perception backbones, naturally ensuring spatio-temporally consistent action generation across frames. To synchronize these asynchronous hybrid V-L-A modalities, we utilize a re-anchoring mechanism that mathematically accounts for perception staleness during both training and inference. Experiments on simulated and real-robot manipulation tasks demonstrate that the proposed method can effectively replace traditional chunk-based action heads for both specialist and generalist policies. AR-VLA exhibits superior history awareness and substantially smoother action trajectories while maintaining or exceeding the task success rates of state-of-the-art reactive VLAs. Overall, our work introduces a scalable, context-aware action generation schema that provides a robust structural foundation for training effective robotic policies. Code and Videos available at https://arvla.insait.ai
comment: RSS 2026 accepted
A Nonasymptotic Theory of Gain-Dependent Error Dynamics in Behavior Cloning
Behavior cloning (BC) policies on position-controlled robots inherit the closed-loop response of the underlying PD controller, yet the nonasymptotic finite-horizon consequences of controller gains for BC failure remain open. We show that independent sub-Gaussian action errors propagate through the gain-dependent closed-loop dynamics to yield sub-Gaussian position errors whose proxy matrix $X_\infty(K)$ governs the failure tail. The probability of horizon-$T$ task failure factorizes into a gain-dependent amplification index $Γ_T(K)$ and the validation loss plus a generalization slack, so training loss alone cannot predict closed-loop performance. Under shape-preserving upper-bound structural assumptions, the proxy admits the scalar bound $X_\infty(K)\preceqΨ(K)\bar X$, with $Ψ(K)$ decomposed into label difficulty, injection strength, and contraction. This ranks the four canonical regimes with compliant-overdamped (CO) tightest, stiff-underdamped (SU) loosest, and the stiff-overdamped versus compliant-underdamped ordering system-dependent. For the canonical scalar second-order PD system, the closed-form continuous-time stationary variance $X_\infty^{\mathrm{c}}(α,β)=σ^2α/(2β)$ is strictly monotone in stiffness and damping over the entire stable orthant, covering both underdamped and overdamped regimes, and the exact zero-order-hold (ZOH) discretization inherits this monotonicity. The analysis gives a nonasymptotic finite-horizon extension of the gain-dependent error-attenuation explanation of Bronars et al.
Apple: Toward General Active Perception via Reinforcement Learning ICLR 2026
Active perception is a fundamental skill that enables us humans to deal with uncertainty in our inherently partially observable environment. For senses such as touch, where the information is sparse and local, active perception becomes crucial. In recent years, active perception has emerged as an important research domain in robotics. However, current methods are often bound to specific tasks or make strong assumptions, which limit their generality. To address this gap, this work introduces APPLE (Active Perception Policy Learning) - a novel framework that leverages reinforcement learning (RL) to address a range of different active perception problems. APPLE jointly trains a transformer-based perception module and decision-making policy with a unified optimization objective, learning how to actively gather information. By design, APPLE is not limited to a specific task and can, in principle, be applied to a wide range of active perception problems. We evaluate two variants of APPLE across different tasks, including tactile exploration problems from the Tactile MNIST benchmark. Experiments demonstrate the efficacy of APPLE, achieving high accuracies on both regression and classification tasks. These findings underscore the potential of APPLE as a versatile and general framework for advancing active perception in robotics. Project page: https://timschneider42.github.io/apple
comment: 27 pages; 21 figures; accepted at the Fourteenth International Conference on Learning Representations (ICLR 2026)
Scalable and Efficient Continual Learning from Demonstration via a Hypernetwork-generated Stable Dynamics Model
Robots capable of learning from demonstration (LfD) must exhibit stability while executing learned motion skills. To be effective in the real world, they should also remember multiple skills over time -- a capability lacking in current stable-LfD methods. We propose an approach to stable, continual LfD, and highlight the role of stability in improving continual learning. Our proposed hypernetwork generates the parameters of two neural networks: a trajectory learning dynamics model, and a trajectory-stabilizing Lyapunov function. These generated networks form a clock-augmented stable neural ODE solver (sNODE), a stable dynamics model that offers a superior stability-accuracy trade-off compared to the state-of-the-art. We further propose stochastic hypernetwork regularization with a single, uniformly-sampled task embedding, reducing the cumulative training time for $N$ tasks from O($N^2$) to O($N$) without degrading performance on real-world tasks. We introduce high-dimensional variants of the popular LASA dataset to assess scalability and extend a dataset of robotic LfD tasks to assess real-world performance. We empirically evaluate our approach on multiple LfD datasets of varying complexity, including sequences of 7--26 tasks, trajectories of 2--32 dimensions, and real-world tasks involving position and orientation. Our thorough evaluation on multiple LfD datasets demonstrates that our approach sequentially learns and retains multiple motion skills without retraining on past demonstrations, and outperforms other relevant baselines in terms of trajectory errors, continual learning scores, and stability metrics. Notably, we show that stability greatly enhances continual learning performance, particularly in size-efficient chunked hypernetworks. Our code is available at https://github.com/sayantanauddy/clfd-snode.
comment: To appear in IEEE Transactions on Cognitive and Developmental Systems
Cyclic Nullspace Coordination: Perpetual Flight of Aerial Carriers for Static Suspension
This work demonstrates that the non-stop flights of three or more carriers are compatible with holding a constant pose of a cable-suspended load. It also presents an algorithm for generating the carriers' coordinated non-stop trajectories. The proposed method builds upon two pillars: (1) the choice of n special linearly independent directions of internal forces within the 3n-6-dimensional nullspace of the grasp matrix of the load, chosen as the edges of a Hamiltonian cycle on the graph that connects the cable attachment points on the load. Adjacent pairs of directions are used to generate n forces evolving on distinct 2D affine subspaces, despite the attachment points being generically in 3D; (2) the construction of elliptical trajectories within these subspaces by mapping, through appropriate graph coloring, each edge of the Hamiltonian cycle to a periodic coordinate while ensuring that no adjacent coordinates exhibit simultaneous zero derivatives. Combined with conditions for load statics and attachment point positions, these choices ensure that each of the n force trajectories projects onto the corresponding cable constraint sphere with non-zero tangential velocity, enabling perpetual motion of the carriers while the load is still. The work provides a scalable constructive design for any n greater than or equal to 3 with tuning guidelines, quantifies sensitivity and single-carrier failures, and provides a fixed-wing-compatible planner that preserves load statics under speed/bank/flight-path constraints. The theoretical findings are validated through simulations and laboratory experiments with quadrotor UAVs.
comment: Accepted for publication in the IEEE Transactions on Control Systems Technology
Equivariant Volumetric Grasping
We propose a new volumetric grasp model that is equivariant to rotations around the vertical axis, leading to a significant improvement in sampling efficiency. Our model employs a tri-plane volumetric feature representation -- i.e., the projection of 3D features onto three canonical planes. We introduce a novel tri-plane feature design in which features on the horizontal plane are \emph{equivariant} to $90^\circ$ rotations, while the \emph{sum} of features from the other two planes remains \emph{invariant} to reflections induced by the same transformations. We further develop equivariant adaptations of two state-of-the-art volumetric grasp planners, GIGA and IGD. Specifically, we derive a new equivariant formulation of IGD's deformable attention mechanism and propose an equivariant generative model of grasp orientations based on flow matching. We provide a detailed analytical justification of the proposed equivariance properties and validate our approach through extensive simulated and real-world experiments. Our results demonstrate that the proposed projection-based design reduces both computational and memory costs. Moreover, the equivariant grasp models built on top of our tri-plane features consistently outperform their non-equivariant counterparts, achieving higher performance within a real-time cost constraint. Video and code are available at: https://mousecpn.github.io/evg-page/
comment: 21 pages
Information Filtering via Variational Regularization for Robot Manipulation
Diffusion-based visuomotor policies built on 3D visual representations have achieved strong performance in learning complex robotic skills. However, most existing methods employ an oversized denoising decoder. While increasing model capacity can improve denoising, empirical evidence suggests that it also introduces redundancy and noise in intermediate feature blocks. Crucially, we find that randomly masking backbone features in U-Net or skipping intermediate layers in DiT at inference time (without changing training) can improve performance, confirming the presence of task-irrelevant noise in intermediate features. To this end, we propose Variational Regularization (VR), a plug-and-play module that imposes a context-conditioned Gaussian over the noisy features and applies a KL-divergence regularizer, forming an adaptive information bottleneck. Extensive experiments on three simulation benchmarks, RoboTwin2.0, Adroit, and MetaWorld, show that our approach consistently improves task success rates over the baseline for both DP3-UNet and DP3-DiT, achieving new state-of-the-art results. Real-world experiments further demonstrate that our method performs well in practical deployments.
AffordSim: A Scalable Data Generator and Benchmark for Affordance-Aware Robotic Manipulation
Many everyday robot manipulation skills are affordance-dependent, with success determined by whether the robot contacts the functional object region required by the subsequent action. Current simulation data generators obtain contacts from generic grasp estimators or per-object manual contact annotations, but generic estimators rank stable grasps without task semantics and often select contacts that are misaligned with the downstream action, while manual contact annotations must be rewritten for each new object and task. To solve these challenges, we introduce AffordSim, a scalable data generator and benchmark that integrates open-vocabulary 3D affordance prediction into simulation-based trajectory generation. Given a natural-language task description, AffordSim synthesizes a task-relevant scene, emits affordance queries, grounds them on object surfaces, samples region-conditioned grasps, and selects executable candidates with motion planning. It further randomizes object pose, texture, lighting, image noise, and cross-viewpoint backgrounds for sim-to-real transfer. We instantiate AffordSim as a 50-task benchmark across diverse manipulation skills, five robot embodiments, and 500+ rigid and articulated objects. AffordSim achieves 93% of the trajectory collection success rate of manual contact annotations on affordance-critical tasks and 89% on hard composite tasks. Vision-language-action policies trained on AffordSim data transfer zero-shot to a real Franka FR3, reaching 24% average success.
Commanding Humanoid by Free-form Language: A Large Language Action Model with Unified Motion Vocabulary
Enabling humanoid robots to follow free-form natural language commands is a critical step toward seamless human-robot interaction and general-purpose embodied AI. However, existing methods remain limited, often constrained to simple instructions or forced to sacrifice motion diversity for physical plausibility. To address this gap, we present Humanoid-LLA, a Large Language Action model that translates unconstrained natural language directly into executable whole-body motions for humanoid robots. Our approach tackles two core challenges: paired language-humanoid motion data scarcity and physical instability. First, we bridge high-level language semantics with physically-grounded control by learning a unified human-humanoid motion vocabulary. Second, we introduce a novel two-stage fine-tuning framework that begins with supervised motion Chain-of-Thought learning, followed by reinforcement learning refined with physical feedback to ensure robustness and stability. Extensive evaluation in simulation and real-world cross-embodiment experiments demonstrates that Humanoid-LLA achieves superior generalization to novel language commands and diverse motion generation while maintaining high physical fidelity.
comment: Project page: https://humanoidlla.github.io/
Hydra-DP3: Frequency-Aware Right-Sizing of 3D Diffusion Policies for Visuomotor Control
Diffusion-based visuomotor policies perform well in robotic manipulation, yet current methods still inherit image-generation-style decoders and multi-step sampling. We revisit this design from a frequency-domain perspective. Robot action trajectories are highly smooth, with most energy concentrated in a few low-frequency discrete cosine transform modes. Under this structure, we show that the error of the optimal denoiser is bounded by the low-frequency subspace dimension and residual high-frequency energy, implying that denoising error saturates after very few reverse steps. This also suggests that action denoising requires a much simpler denoising model than image generation. Motivated by this insight, we propose Hydra-DP3 (HDP3), a pocket-scale 3D diffusion policy with a lightweight Diffusion Mixer decoder that supports two-step DDIM inference. Our synthetic experiments validate the theory and support the sufficiency of two-step denoising. Furthermore, across RoboTwin2.0, Adroit, MetaWorld, and real-world tasks, HDP3 achieves state-of-the-art performance with fewer than 1% of the parameters of prior 3D diffusion-based policies and substantially lower inference latency.
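The claim that smooth trajectories concentrate their energy in a few low-frequency DCT modes can be checked numerically. The sketch below is illustrative only and is not taken from the paper: the trajectory is a hypothetical minimum-jerk-style profile, and the helper names `dct2_orthonormal` and `low_freq_energy_fraction` are made up for this example.

```python
import numpy as np

def dct2_orthonormal(x):
    """Orthonormal DCT-II of a 1D signal, built from an explicit cosine matrix."""
    n = len(x)
    k = np.arange(n)[:, None]       # frequency index (rows)
    t = np.arange(n)[None, :]       # time index (columns)
    basis = np.cos(np.pi * (t + 0.5) * k / n) * np.sqrt(2.0 / n)
    basis[0, :] /= np.sqrt(2.0)     # rescale the DC row so the matrix is orthonormal
    return basis @ x

def low_freq_energy_fraction(x, num_modes):
    """Fraction of signal energy carried by the first `num_modes` DCT coefficients."""
    coeffs = dct2_orthonormal(np.asarray(x, dtype=float))
    energy = coeffs ** 2
    return energy[:num_modes].sum() / energy.sum()

# Hypothetical smooth 1D action trajectory: a minimum-jerk-style reach over 64 steps.
s = np.linspace(0.0, 1.0, 64)
trajectory = 10 * s**3 - 15 * s**4 + 6 * s**5

fraction = low_freq_energy_fraction(trajectory, num_modes=4)
print(f"energy fraction in the first 4 DCT modes: {fraction:.4f}")
```

Because the transform is orthonormal, the coefficient energies sum to the signal energy, so the printed value is a true energy fraction. Real action chunks are multi-dimensional, and the same computation would be applied per action dimension.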
Recovering Hidden Reward in Diffusion-Based Policies ICML 2026
This paper introduces EnergyFlow, a framework that unifies generative action modeling with inverse reinforcement learning by parameterizing a scalar energy function whose gradient is the denoising field. We establish that under maximum-entropy optimality, the score function learned via denoising score matching recovers the gradient of the expert's soft Q-function, enabling reward extraction without adversarial training. Formally, we prove that constraining the learned field to be conservative reduces hypothesis complexity and tightens out-of-distribution generalization bounds. We further characterize the identifiability of recovered rewards and bound how score estimation errors propagate to action preferences. Empirically, EnergyFlow achieves state-of-the-art imitation performance on various manipulation tasks while providing an effective reward signal for downstream reinforcement learning that outperforms both adversarial IRL methods and likelihood-based alternatives. These results show that the structural constraints required for valid reward extraction simultaneously serve as beneficial inductive biases for policy generalization. The code is available at https://github.com/sotaagi/EnergyFlow.
comment: Accepted by ICML 2026
Operating Within the Operational Design Domain: Zero-Shot Perception with Vision-Language Models
Over the last few years, research on autonomous systems has matured to such a degree that the field is increasingly well-positioned to translate research into practical, stakeholder-driven use cases across well-defined domains. However, for a wide-scale practical adoption of autonomous systems, adherence to safety regulations is crucial. Many regulations are influenced by the Operational Design Domain (ODD), which defines the specific conditions in which an autonomous agent can function. This is especially relevant for Automated Driving Systems (ADS), as a dependable perception of ODD elements is essential for safe implementation and auditing. Vision-language models (VLMs) integrate visual recognition and language reasoning, functioning without task-specific training data, which makes them suitable for adaptable ODD perception. To assess whether VLMs can function as zero-shot "ODD sensors" that adapt to evolving definitions, we contribute (i) an empirical study of zero-shot ODD classification and detection using four VLMs on a custom dataset and Mapillary Vistas, along with failure analyses; (ii) an ablation of zero-shot optimization strategies with a cost-performance overview; and (iii) a suite of reusable prompting templates with guidance for adaptation. Our findings indicate that definition-anchored chain-of-thought prompting with persona decomposition performs best, while other methods may result in reduced recall. Overall, our results pave the way for transparent and effective ODD-based perception in safety-critical applications.
comment: 8 pages, 4 figures
Explicit Bounds on the Hausdorff Distance for Truncated mRPI Sets via Norm-Dependent Contraction Rates
We derive a computable closed-form upper bound on the Hausdorff distance between a truncated minimal robust positively invariant (mRPI) set and its infinite-horizon limit. The bound depends only on a disturbance-set size measure and an induced-norm contraction factor of the system matrix, and it yields an explicit, fully analytic horizon-selection rule that guarantees a prescribed approximation tolerance without iterative set computations. The choice of vector norm enters as a design lever: norm shaping -- through diagonal or Lyapunov-based weighting -- tightens both the contraction factor and the resulting certificate, with direct consequences for robust invariant-set approximation and tube-based model predictive control (MPC) constraint tightening. Numerical examples illustrate the accuracy, scalability, and practical impact of the proposed bound.
comment: 6 pages, 5 figures. Accepted at the 2026 IEEE Conference on Control Technology and Applications (CCTA), Vancouver, BC, Canada, August 12-14, 2026
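To make the idea of an analytic horizon-selection rule concrete, here is a minimal sketch based on the standard geometric-series argument. It is not necessarily the exact bound derived in the paper. It assumes dynamics x+ = A x + w with w in W, an induced norm with λ = ||A|| < 1, and r = max ||w|| over W measured in the matching vector norm. Under these assumptions the Hausdorff distance between the N-term truncation and the infinite-horizon mRPI set is at most λ^N r / (1 - λ). The function names and the numerical values are hypothetical.

```python
import math
import numpy as np

def truncation_bound(A, r, N, norm_ord=np.inf):
    """Geometric-series bound lam**N * r / (1 - lam) on the truncation error."""
    lam = np.linalg.norm(A, norm_ord)   # induced matrix norm (max row sum for inf)
    if lam >= 1.0:
        raise ValueError("A must be a contraction in the chosen norm")
    return lam ** N * r / (1.0 - lam)

def min_horizon(A, r, eps, norm_ord=np.inf):
    """Smallest N for which the bound above does not exceed the tolerance eps."""
    lam = np.linalg.norm(A, norm_ord)
    if lam >= 1.0:
        raise ValueError("A must be a contraction in the chosen norm")
    if r / (1.0 - lam) <= eps:
        return 0                        # even the empty truncation meets the tolerance
    if lam == 0.0:
        return 1                        # A is the zero matrix, so one term is exact
    N = max(0, math.ceil(math.log(eps * (1.0 - lam) / r) / math.log(lam)))
    while truncation_bound(A, r, N, norm_ord) > eps:
        N += 1                          # guard against floating-point rounding
    return N

# Hypothetical example: W is a box of half-width 0.1, measured in the inf-norm.
A = np.array([[0.5, 0.2],
              [0.0, 0.6]])
N_star = min_horizon(A, r=0.1, eps=1e-3)
print(N_star, truncation_bound(A, 0.1, N_star))
```

Changing `norm_ord`, or applying a similarity transform before taking the norm, changes λ and therefore the horizon. This is the sense in which the choice of norm acts as a design lever.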
EROAS: 3D Efficient Reactive Obstacle Avoidance System for Autonomous Underwater Vehicles using 2.5D Forward-Looking Sonar
Autonomous Underwater Vehicles (AUVs) have advanced significantly in obstacle detection and path planning through sonar, cameras, and learning-based methods. However, safe and efficient navigation in cluttered environments remains challenging due to partial observability, turbidity, the limited field-of-view of forward-looking sonar (FLS), and occlusions that obscure obstacle geometry. To address these issues, we propose the Efficient Reactive Obstacle Avoidance Strategy (EROAS), a lightweight framework that augments a standard 2D FLS with a pivoting mechanism, effectively transforming it into a cost-efficient \emph{2.5D sonar}. This design provides vertical information on demand, extending situational awareness while minimizing computational overhead. EROAS integrates three complementary modules: (1) Sonar Profile-guided Directional Decision Control (SPD2C), which rapidly detects gaps and generates reference commands in both horizontal and vertical planes; (2) the Spatial Context Generator (SCG), which maintains a short-term obstacle memory to mitigate partial observability; and (3) a Spatio-Temporal Control Barrier Function (ST-CBF), which enforces forward invariance of safety constraints by filtering nominal references. Together, these components enable robust, reactive avoidance of obstacles in uncertain and cluttered 3D underwater settings. Simulation and hardware-in-the-loop (HIL) experiments validate the efficacy of the proposed EROAS algorithm, demonstrating improved trajectory efficiency, reduced travel time, and enhanced safety compared to conventional methods such as the Dynamic Window Approach (DWA) and Artificial Potential Fields (APF). https://github.com/AIRLabIISc/EROAS
comment: Accepted for publication as a Technical Communication, Special Issue on AUV Symposium in the IEEE Journal of Oceanic Engineering (JOE)
A Radius of Robust Feasibility Approach to Directional Sensors in Uncertain Terrain
A sensor has the ability to probe its surroundings. However, uncertainties in its exact location can significantly compromise its sensing performance. The radius of robust feasibility defines the maximum range within which robust feasibility is ensured. This work introduces a novel approach that integrates the radius of robust feasibility with directional sensor networks to enhance coverage using a distributed greedy algorithm. In particular, we provide an exact formula for the radius of robust feasibility of sensors in a directional sensor network. The proposed model strategically orients the sensors in regions with high coverage potential, accounting for robustness in the face of uncertainty. We analyze the algorithm's adaptability in dynamic environments, demonstrating its ability to enhance efficiency and robustness. Experimental results validate its efficacy in maximizing coverage and optimizing sensor orientations, highlighting its practical advantages for real-world scenarios.
UniUncer: Unified Dynamic Static Uncertainty for End to End Driving ICRA 2026
End-to-end (E2E) driving has become a cornerstone of both industry deployment and academic research, offering a single learnable pipeline that maps multi-sensor inputs to actions while avoiding hand-engineered modules. However, the reliability of such pipelines strongly depends on how well they handle uncertainty: sensors are noisy, semantics can be ambiguous, and interaction with other road users is inherently stochastic. Uncertainty also appears in multiple forms: classification vs. localization, and, crucially, in both static map elements and dynamic agents. Existing E2E approaches model only static-map uncertainty, leaving planning vulnerable to overconfident and unreliable inputs. We present UniUncer, the first lightweight, unified uncertainty framework that jointly estimates and uses uncertainty for both static and dynamic scene elements inside an E2E planner. Concretely: (1) we convert deterministic heads to probabilistic Laplace regressors that output per-vertex location and scale for vectorized static and dynamic entities; (2) we introduce an uncertainty-fusion module that encodes these parameters and injects them into object/map queries to form uncertainty-aware queries; and (3) we design an uncertainty-aware gate that adaptively modulates reliance on historical inputs (ego status or temporal perception queries) based on current uncertainty levels. The design adds minimal overhead and drops throughput by only $\sim$0.5 FPS while remaining plug-and-play for common E2E backbones. On nuScenes (open-loop), UniUncer reduces average L2 trajectory error by 7\%. On NavsimV2 (pseudo closed-loop), it improves overall EPDMS by 10.8\% and yields notable stage-two gains in challenging, interaction-heavy scenes. Ablations confirm that dynamic-agent uncertainty and the uncertainty-aware gate are both necessary.
comment: Accepted ICRA 2026
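A probabilistic Laplace regressor of the kind described above is typically trained with the Laplace negative log-likelihood, log(2b) + |y - μ| / b, where μ is the predicted location and b the predicted scale. The sketch below shows that loss under assumptions of my own: the log-scale parameterization, the function name `laplace_nll`, and the toy polyline are illustrative and are not confirmed details of UniUncer.

```python
import numpy as np

def laplace_nll(target, loc, log_scale):
    """Per-element negative log-likelihood of target under Laplace(loc, exp(log_scale))."""
    scale = np.exp(log_scale)   # predicting the log-scale keeps the scale positive
    return np.log(2.0 * scale) + np.abs(target - loc) / scale

# Hypothetical example: one vectorized map element with 4 vertices in 2D.
target = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, 0.3], [3.0, 0.2]])
loc = target + 0.05                              # predictions off by 5 cm per coordinate
log_scale = np.full_like(target, np.log(0.05))   # predicted scale matches the error

loss = laplace_nll(target, loc, log_scale).mean()
print(f"mean Laplace NLL: {loss:.4f}")
```

For a fixed error the loss is minimized when the predicted scale equals the absolute error. A well-trained head therefore reports a large scale where its location estimate is unreliable, which is the signal a downstream fusion module or gate can consume.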
Towards Robust Surgical Automation via Digital Twin Representations from Foundation Models
Large language model (LLM)-based agents are emerging as a powerful enabler of robust embodied intelligence due to their capability of planning complex action sequences. Sound planning ability is necessary for robust automation in many task domains, but especially in surgical automation. These agents rely on a highly detailed natural language representation of the scene. Thus, to leverage the emergent capabilities of LLM agents for surgical task planning, developing similarly powerful and robust perception algorithms is necessary to derive a detailed scene representation of the environment from visual input. Previous research has focused primarily on enabling LLM-based task planning while adopting simple yet severely limited perception solutions that meet the needs of bench-top experiments but lack the critical flexibility to scale to less constrained settings. In this work, we propose an alternate perception approach -- a digital twin (DT)-based machine perception approach that capitalizes on the convincing performance and out-of-the-box generalization of recent vision foundation models. Integrating our DT representation and LLM agent for planning with the dVRK platform, we develop an embodied intelligence system and evaluate its robustness in performing peg transfer and gauze retrieval tasks. Our approach shows strong task performance and generalizability to varied environmental settings. Despite a convincing performance, this work is merely a first step towards the integration of DT representations. Future studies are necessary for the realization of a comprehensive DT framework to improve the interpretability and generalizability of embodied intelligence in surgery.
SegSTRONG-C: Segmenting Surgical Tools Robustly On Non-adversarial Generated Corruptions -- An EndoVis'24 Challenge
Surgical data science has seen rapid advancement with the excellent performance of end-to-end deep neural networks (DNNs). Despite their successes, DNNs have been proven susceptible to minor "corruptions," introducing a major concern for the translation of cutting-edge technology, especially in high-stakes scenarios. We introduce the SegSTRONG-C challenge dedicated to better understanding model deterioration under unforeseen but plausible non-adversarial "corruption" and the capabilities of contemporary methods that seek to improve it. Built on a dataset generated through counterfactual robotic replay, SegSTRONG-C provides paired clean and "corrupted" samples, enabling reproducible evaluation of model robustness. Participants are challenged to train tool segmentation algorithms on "uncorrupted" data and evaluate them on "corrupted" test domains for the binary robot tool segmentation task. Through comprehensive baseline experiments and participating submissions from widespread community engagement, SegSTRONG-C reveals key themes for model failure and identifies promising directions for improving robustness. The challenge winners achieved an average of 0.9394 DSC and 0.9301 NSD across the unreleased test sets with the "corruption" types bleeding, smoke, and low brightness. This highlights how prior knowledge, customized training strategies, and architectural choice can be leveraged to improve robustness. In conclusion, the SegSTRONG-C challenge has identified practical approaches for enhancing model robustness. However, most approaches rely on conventional techniques that have known limitations. Looking ahead, we advocate for expanding intellectual diversity and creativity in non-adversarial robustness beyond data augmentation, calling for new paradigms that enhance universal robustness to unforeseen "corruptions" to facilitate richer applications in surgical data science.
Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views
Forecasting how human hands move in egocentric views is critical for applications like augmented reality and human-robot policy transfer. Recently, several hand trajectory prediction (HTP) methods have been developed to generate possible future hand waypoints, but they still suffer from insufficient prediction targets, inherent modality gaps, entangled hand-head motion, and limited validation in downstream tasks. To address these limitations, we present a universal hand motion forecasting framework considering multi-modal input, multi-dimensional and multi-target prediction patterns, and multi-task affordances for downstream applications. We harmonize multiple modalities by vision-language fusion, global context incorporation, and task-aware text embedding injection, to forecast hand waypoints in both 2D and 3D spaces. A novel dual-branch diffusion is proposed to concurrently predict human head and hand movements, capturing their motion synergy in egocentric vision. By introducing target indicators, the prediction model can forecast the specific joint waypoints of the wrist or the fingers, besides the widely studied hand center points. In addition, we enable Uni-Hand to predict hand-object interaction states (contact/separation) to better facilitate downstream tasks. As the first work to incorporate downstream task evaluation in the literature, we build novel benchmarks to assess the real-world applicability of hand motion forecasting algorithms. The experimental results on multiple publicly available datasets and our newly proposed benchmarks demonstrate that Uni-Hand achieves state-of-the-art performance in multi-dimensional and multi-target hand motion forecasting. Extensive validation in multiple downstream tasks also demonstrates its impressive human-robot policy transfer for robotic manipulation and its effective feature enhancement for action anticipation/recognition.
comment: Accepted by T-PAMI 2026. Code and data: https://github.com/IRMVLab/UniHand
Morphology-Aware Graph Reinforcement Learning for Tensegrity Robot Locomotion
Tensegrity robots combine rigid rods and elastic cables, offering high resilience and deployability but at the same time posing major challenges for locomotion control due to their underactuated and highly coupled dynamics. This paper introduces a morphology-aware reinforcement learning framework that integrates a graph neural network (GNN) into the Soft Actor-Critic (SAC) algorithm. By representing the robot's physical topology as a graph, the proposed GNN-based policy captures coupling among components, enabling faster and more stable learning than conventional multilayer perceptron (MLP) policies. The method is validated on a physical 3-bar tensegrity robot across three locomotion primitives, including straight-line tracking and bidirectional turning. It shows superior sample efficiency, robustness to noise and stiffness variations, and improved trajectory accuracy. Additionally, the learned policies transfer directly from simulation to hardware without fine-tuning, achieving stable real-world locomotion. These results demonstrate the advantages of incorporating structural priors into reinforcement learning for tensegrity robot control.
comment: 8 pages, 10 figures. Project page: https://tensegrity-graph-rl.github.io/
Remarks on stochastic cloning and delayed-state filtering
Many estimation problems in aerospace navigation and robotics involve measurements that depend on prior states. A prominent example is odometry, which measures the relative change between states over time. Accurately handling these delayed-state measurements requires capturing their correlations with prior state estimates, and a widely used approach is stochastic cloning (SC), which augments the state vector to account for these correlations. This work revisits a long-established but often overlooked alternative--the delayed-state Kalman filter--and demonstrates that a properly derived filter yields exactly the same state and covariance update as SC, without requiring state augmentation. Moreover, two equivalent formulations of the delayed-state Kalman filter (DSKF) are presented, providing complementary perspectives on how the prior-state measurement correlations can be handled within the generalized Kalman filter. These formulations are shown to be comparable to SC in asymptotic computational and memory complexity, while one DSKF formulation can offer reduced arithmetic and storage costs for certain problem dimensions. Our findings clarify a common misconception that Kalman filter variants are inherently unable to handle correlated delayed-state measurements, demonstrating that an alternative formulation achieves the same results without state augmentation.
Looking and Listening Inside and Outside: Multimodal Artificial Intelligence Systems for Driver Safety Assessment and Intelligent Vehicle Decision-Making
The looking-in-looking-out (LILO) framework has enabled intelligent vehicle applications that understand both the outside scene and the driver state to improve safety outcomes, with examples in smart airbag deployment, takeover time prediction in autonomous control transitions, and driver attention monitoring. In this research, we propose an augmentation to this framework, making a case for the audio modality as an additional source of information to understand the driver, and in the evolving autonomy landscape, also the passengers and those outside the vehicle. We expand LILO by incorporating audio signals, forming the looking-and-listening inside-and-outside (L-LIO) framework to enhance driver state assessment and environment understanding through multimodal sensor fusion. We evaluate three example cases where audio enhances vehicle safety: supervised learning on driver speech audio to classify potential impairment states (e.g., intoxication), collection and analysis of passenger natural language instructions (e.g., "turn after that red building") to motivate how spoken language can interface with planning systems through audio-aligned instruction data, and limitations of vision-only systems where audio may disambiguate the guidance and gestures of external agents. Datasets include custom-collected in-vehicle and external audio samples in real-world environments. Pilot findings show that audio yields safety-relevant insights, particularly in nuanced or context-rich scenarios where sound is critical to safe decision-making or visual signals alone are insufficient. Challenges include ambient noise interference, privacy considerations, and robustness across human subjects, motivating further work on reliability in dynamic real-world contexts. L-LIO augments driver and scene understanding through multimodal fusion of audio and visual sensing, offering new paths for safety intervention.
Rollbot: a Spherical Robot Driven by a Single Actuator ICRA 2026
Spherical robots typically require at least two actuators to achieve controlled 2D planar motion. Here we present Rollbot, the first spherical robot capable of controllably maneuvering on a 2D plane with a single actuator, challenging this assumption. Rollbot rolls on the ground in a circular pattern and controls its motion by changing the trajectory's curvature by accelerating and decelerating its single motor and the attached mass according to our derived quasi-stable state dynamics and control laws. We present the theoretical analysis, design, and control of Rollbot, and demonstrate its ability to move in a controllable circular pattern and follow waypoints, validating the efficacy of the proposed theoretical framework.
comment: Accepted by ICRA 2026
Characterizing the Robustness of Black-Box LLM Planners Under Perturbed Observations with Adaptive Stress Testing ACL
Large language models (LLMs) have recently demonstrated success in decision-making tasks including planning, control, and prediction, but their tendency to hallucinate unsafe and undesired outputs poses risks. This unwanted behavior is further exacerbated in environments where sensors are noisy or unreliable. Characterizing the behavior of LLM planners under varied observations is necessary to proactively avoid failures in safety-critical scenarios. We specifically investigate the response of LLMs along two different perturbation dimensions. Like prior works, one dimension generates semantically similar prompts with varied phrasing by randomizing the order of details, modifying access to few-shot examples, etc. Unique to our work, the second dimension simulates access to varied sensors and noise to mimic raw sensor or detection algorithm failures. An initial case study in which perturbations are manually applied shows that both dimensions lead LLMs to hallucinate in a multi-agent driving environment. However, manually covering the entire perturbation space for several scenarios is infeasible. As such, we propose a novel method for efficiently searching the space of prompt perturbations using adaptive stress testing (AST) with Monte-Carlo tree search (MCTS). Our AST formulation enables discovery of scenarios, sensor configurations, and prompt phrasing that cause language models to act with high uncertainty or even crash. By generating MCTS prompt perturbation trees across diverse scenarios, we show through extensive experiments that offline analyses can be used to proactively understand potential failures that may arise at runtime. Code is available at https://sites.google.com/illinois.edu/astllm.
comment: Accepted to ACL Findings 2026; 31 pages, 26 figures, 6 tables
Fine-Tuning Large Language Models for Cooperative Tactical Deconfliction of Small Unmanned Aerial Systems CVPR 2026
The growing deployment of small Unmanned Aerial Systems (sUASs) in low-altitude airspaces has increased the need for reliable tactical deconfliction under safety-critical constraints. Tactical deconfliction involves short-horizon decision-making in dense, partially observable, and heterogeneous multi-agent environments, where both cooperative separation assurance and operational efficiency must be maintained. While Large Language Models (LLMs) exhibit strong reasoning capabilities, their direct application to air traffic control remains limited by insufficient domain grounding and unpredictable output inconsistency. This paper investigates LLMs as decision-makers in cooperative multi-agent tactical deconfliction using fine-tuning strategies that align model outputs to human operator heuristics. We propose a simulation-to-language data generation pipeline based on the BlueSky air traffic simulator that produces rule-consistent deconfliction datasets reflecting established safety practices. A pretrained Qwen-Math-7B model is fine-tuned using two parameter-efficient strategies: supervised fine-tuning with Low-Rank Adaptation (LoRA) and preference-based fine-tuning combining LoRA with Group-Relative Policy Optimization (GRPO). Experimental results on validation datasets and closed-loop simulations demonstrate that supervised LoRA fine-tuning substantially improves decision accuracy, consistency, and separation performance compared to the pretrained LLM, with significant reductions in near mid-air collisions. GRPO provides additional coordination benefits but exhibits reduced robustness when interacting with heterogeneous agent policies.
comment: 15 pages, 6 figures, to be published in CVPR 2026 Workshop Proceedings
Smooth-Rigid-Body Contact as a ReLCP: A Recursively Generated Linear Complementarity Problem
This paper reformulates complementarity-based time-stepping for frictionless nonsmooth contact between smooth rigid bodies as a recursively generated linear complementarity problem (ReLCP), involving a sequence of LCPs of increasing dimension. Starting from a classical single-constraint shared-normal signed-distance (SNSD) LCP, the method adds unilateral constraints only when the discrete-time update predicted by the current contact set would violate nonpenetration of the underlying smooth surfaces. The resulting procedure acts directly on smooth geometry, enforces nonpenetration to a prescribed tolerance, and avoids the oversampling inherent to proxy-surface contact models such as tessellations or multi-sphere decompositions, for which improved geometric fidelity can drive rapid growth in constraint count and cost. For strictly convex bodies, we prove that an initially overlap-free configuration and sufficiently small timestep sizes imply finite termination of the adaptive augmentation and yield a unique discrete-time velocity update. In the small timestep limit and for any fixed overlap-free discrete state with a fixed geometric overlap tolerance, we prove that the recursion terminates after the initial solve, reducing the method to the classical single-constraint SNSD LCP and retaining the usual consistency of complementarity time-stepping with the underlying differential variational inequality. Numerical tests on colliding ellipsoids, compacting ellipsoid suspensions, growing bacterial colonies, and taut chainmail networks demonstrate stable large-timestep behavior, bounded interpenetration without discretization-induced surface roughness, and substantial reductions in both active constraint counts and runtime relative to representative discrete-surface complementarity formulations.
Multiagent Systems
Optimal and Scalable MAPF via Multi-Marginal Optimal Transport and Schrödinger Bridges ICML 2026
We consider anonymous multi-agent path finding (MAPF) where a set of robots is tasked to travel to a set of targets on a finite, connected graph. We show that MAPF can be cast as a special class of multi-marginal optimal transport (MMOT) problems with an underlying Markovian structure, under which the exponentially large MMOT collapses to a linear program (LP) polynomial in size. Focusing on the anonymous setting, we establish conditions under which the corresponding LP is feasible, totally unimodular, and consequently, yields min-cost, integral $(\{0,1\})$ transports that do not overlap in both space and time. To adapt the approach to large-scale problems, we cast the MAPF-MMOT in a probabilistic framework via Schrödinger bridges. Under standard assumptions, we show that the Schrödinger bridge formulation reduces to an entropic regularization of the corresponding MMOT that admits an iterative Sinkhorn-type solution. The Schrödinger bridge, being a probabilistic framework, provides a shadow (fractional) transport that we use as a template to solve a reduced LP and demonstrate that it results in near-optimal, integral transports at a significant reduction in complexity. Extensive experiments highlight the optimality and scalability of the proposed approaches.
comment: Accepted in ICML 2026 as a spotlight paper
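The Sinkhorn-type iteration mentioned in the abstract above can be illustrated on the simplest case. The sketch below solves an ordinary two-marginal entropic optimal transport problem between three robots and three targets on a line. It is not the paper's multi-marginal MAPF solver, and the cost matrix, the regularization `eps`, and the function name `sinkhorn` are assumptions chosen for illustration.

```python
import numpy as np


def sinkhorn(cost, mu, nu, eps=0.1, iters=500):
    """Entropic OT plan between marginals mu and nu via Sinkhorn scaling."""
    kernel = np.exp(-cost / eps)  # Gibbs kernel of the cost matrix
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(iters):
        u = mu / (kernel @ v)    # rescale to match the row marginals
        v = nu / (kernel.T @ u)  # rescale to match the column marginals
    return u[:, None] * kernel * v[None, :]


# Hypothetical setup: robots and targets at positions 0, 1, 2 on a line,
# with travel cost equal to the distance between positions.
positions = np.arange(3, dtype=float)
cost = np.abs(positions[:, None] - positions[None, :])
mu = np.full(3, 1.0 / 3.0)  # uniform mass over robots
nu = np.full(3, 1.0 / 3.0)  # uniform mass over targets

plan = sinkhorn(cost, mu, nu)
assignment = plan.argmax(axis=1)  # round the fractional plan per robot
print(np.round(plan, 4))
print(assignment)
```

The resulting plan is fractional, which mirrors the abstract's description of a shadow transport that is later used as a template for a reduced LP. The row-wise `argmax` rounding here is only a stand-in for that step.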
Decentralized Contingency MPC based on Safe Sets for Nonlinear Multi-agent Collision Avoidance
Decentralized collision avoidance remains challenging, particularly when agents do not communicate any information related to planned trajectories. Most existing approaches either rely on conservative coordination mechanisms or provide limited guarantees on recursive feasibility and convergence. This paper develops a decentralized contingency MPC framework for multi-agent systems with nonlinear dynamics that achieves collision-free motion under a state-only information pattern. Each agent follows the same consensual rule set, enabling safe decentralized planning without communication. Each agent solves a local optimization problem that couples a nominal trajectory with a contingency certificate ensuring a feasible backup maneuver under receding-horizon operation. A novel geometric and decentralized safe-set update mechanism prevents feasibility loss between consecutive time steps. The resulting scheme guarantees recursive feasibility, including collision avoidance, and establishes a Lyapunov-type convergence result to an admissible safe equilibrium. Simulation results demonstrate performance in both sparse and dense multi-agent environments, including cluttered bottleneck scenarios and under plug-and-play operation.
AllocMV: Optimal Resource Allocation for Music Video Generation via Structured Persistent State
Generating long-horizon music videos (MVs) is frequently constrained by prohibitive computational costs and difficulty maintaining cross-shot consistency. We propose AllocMV, a hierarchical framework formulating music video synthesis as a Multiple-Choice Knapsack Problem (MCKP). AllocMV represents the video's persistent state as a compact, structured object comprising character entities, scene priors, and sharing graphs, produced by a global planner prior to realization. By estimating segment saliency from multimodal cues, a group-level MCKP solver based on dynamic programming optimally allocates resources across High-Gen, Mid-Gen, and Reuse branches. For repetitive musical motifs, we implement a divergence-based forking strategy that reuses visual prefixes to reduce costs while ensuring motif-level continuity. Evaluated via the Cost-Quality Ratio (CQR), AllocMV achieves an optimal trade-off between perceived quality and resource expenditure under strict budgetary and rhythmic constraints.
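A Multiple-Choice Knapsack Problem picks exactly one option per group while staying under a shared budget. The sketch below is a minimal dynamic program for that problem. The branch names follow the abstract, but the integer costs, the quality values, the budget, and the function name `solve_mckp` are hypothetical and not taken from AllocMV.

```python
def solve_mckp(groups, budget):
    """Pick exactly one (name, cost, value) option per group within budget.

    Returns (best_value, chosen_names), or None if no selection fits.
    Costs are assumed to be non-negative integers.
    """
    neg_inf = float("-inf")
    best = [neg_inf] * (budget + 1)  # best[b]: max value at total cost b
    best[0] = 0.0
    picks = []  # picks[g][b] = (option index, previous cost) for backtracking
    for options in groups:
        new_best = [neg_inf] * (budget + 1)
        pick = [None] * (budget + 1)
        for spent in range(budget + 1):
            if best[spent] == neg_inf:
                continue
            for idx, (_, cost, value) in enumerate(options):
                total = spent + cost
                if total <= budget and best[spent] + value > new_best[total]:
                    new_best[total] = best[spent] + value
                    pick[total] = (idx, spent)
        best = new_best
        picks.append(pick)
    end = max(range(budget + 1), key=lambda b: best[b])
    if best[end] == neg_inf:
        return None
    chosen = []
    spent = end
    for g in range(len(groups) - 1, -1, -1):  # walk the choices backwards
        idx, prev = picks[g][spent]
        chosen.append(groups[g][idx][0])
        spent = prev
    chosen.reverse()
    return best[end], chosen


# Hypothetical segments: (branch, cost, saliency-weighted quality).
segments = [
    [("High-Gen", 5, 9.0), ("Mid-Gen", 3, 6.0), ("Reuse", 1, 2.0)],
    [("High-Gen", 5, 4.0), ("Mid-Gen", 3, 3.5), ("Reuse", 1, 3.0)],
    [("High-Gen", 5, 7.0), ("Mid-Gen", 3, 5.0), ("Reuse", 1, 1.0)],
]
result = solve_mckp(segments, budget=9)
print(result)
```

In this toy instance the second segment loses little quality when reused, so the solver spends the budget on the other two segments.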
Conformity Generates Collective Misalignment in AI Agents Societies
Artificial intelligence safety research focuses on aligning individual language models with human values, yet deployed AI systems increasingly operate as interacting populations where social influence may override individual alignment. Here we show that populations of individually aligned AI agents can be driven into stable misaligned states through conformity dynamics. Simulating opinion dynamics across nine large language models and one hundred opinion pairs, we find that each agent's behavior is governed by two competing forces: a tendency to follow the majority and an intrinsic bias toward specific positions. Using tools from statistical physics, we derive a quantitative theory that predicts when populations become trapped in long-lived misaligned configurations, and identifies predictable tipping points where small numbers of adversarial agents can irreversibly shift population-level alignment even after manipulation ceases. These results demonstrate that individual-level alignment provides no guarantee of collective safety, calling for evaluation frameworks that account for emergent behavior in AI populations.
The Bystander Effect in Multi-Agent Reasoning: Quantifying Cognitive Loafing in Collaborative Interactions
Multi-agent systems (MAS) assume that collaborating inherently improves Large Language Model (LLM) reasoning. We challenge this by demonstrating that simulated social pressure triggers an algorithmic ``Bystander Effect,'' inducing severe cognitive loafing. By evaluating 22,500 deterministic trajectories across 3 dataset contexts (GAIA, SWE-bench, Multi-Challenge) with 3 state-of-the-art (SOTA) models, we semantically audit internal reasoning traces against external outputs. We formalize the \textit{Interaction Depth Limit} ($D_L$), the exact plurality threshold where an agent's logical sovereignty collapses into social compliance. Crucially, we uncover the \textit{Sovereignty Gap}: models frequently compute the correct derivation internally but suffer ``Alignment Hallucinations'' -- actively subjugating empirical evidence to sycophantically appease a simulated swarm. We prove that multi-agent social load is strictly non-commutative; the "brand" identity of the ``Lead Anchor'' auditor disproportionately dictates the swarm's integrity. These findings expose architectural vulnerabilities, proving that unstructured multi-agent topologies can degrade independent reasoning.
Effect of Graph Gluing on Consensus in Networked Multi-Agent Systems
In this paper, the effects of graph gluing operations in networks of multi-agent systems and their impact on system performance are investigated. In many practical applications, multiple multi-agent subsystems must be interconnected through communication links to accomplish complex tasks, resulting in a larger communication network. Such interconnections modify the underlying graph topology and consequently affect the consensus behavior and convergence rate of the network. In particular, this paper examines both bridge gluing and interface gluing and analyzes how the number and structure of communication links between subsystems influence the Fiedler eigenvalue of the resulting graph. Since the Fiedler eigenvalue is directly related to the convergence rate of consensus dynamics, the proposed analysis establishes a clear relationship between interconnection strategies, algebraic connectivity, and system performance. The results provide theoretical insight into how different gluing mechanisms alter the spectral properties of the graph Laplacian and, in turn, the convergence characteristics of the networked multi-agent system. Simulation studies are presented to illustrate the theoretical findings and to validate the effectiveness of the proposed framework.
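The dependence of the Fiedler eigenvalue on the number of interconnecting links can be checked numerically. The sketch below joins two complete graphs on four nodes with one and then two added edges and compares the second-smallest Laplacian eigenvalue. The graphs, the helper names, and the reading of "gluing" as adding edges between subsystems are assumptions for illustration; the paper's bridge and interface gluing definitions may differ.

```python
import numpy as np


def laplacian(num_nodes, edges):
    """Graph Laplacian L = D - A of an undirected, unweighted graph."""
    lap = np.zeros((num_nodes, num_nodes))
    for u, v in edges:
        lap[u, u] += 1
        lap[v, v] += 1
        lap[u, v] -= 1
        lap[v, u] -= 1
    return lap


def fiedler_value(num_nodes, edges):
    """Second-smallest Laplacian eigenvalue (algebraic connectivity)."""
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    return float(np.linalg.eigvalsh(laplacian(num_nodes, edges))[1])


def complete_graph_edges(nodes):
    """All unordered pairs among the given nodes."""
    return [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]]


# Two hypothetical subsystems, each a complete graph on four agents.
left = complete_graph_edges([0, 1, 2, 3])
right = complete_graph_edges([4, 5, 6, 7])

one_link = left + right + [(3, 4)]  # a single interconnecting edge
two_links = one_link + [(2, 5)]     # a second interconnecting edge

lambda_one = fiedler_value(8, one_link)
lambda_two = fiedler_value(8, two_links)
print(f"one link:  {lambda_one:.4f}")
print(f"two links: {lambda_two:.4f}")
```

Adding an edge never decreases algebraic connectivity, so the second value is at least as large as the first, which is consistent with faster consensus convergence for denser interconnections.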
Collective Alignment in LLM Multi-Agent Systems: Disentangling Bias from Cooperation via Statistical Physics
We investigate the emergent collective dynamics of LLM-based multi-agent systems on a 2D square lattice and present a model-agnostic statistical-physics method to disentangle social conformity from intrinsic bias, compute critical exponents, and probe the collective behavior and possible phase transitions of multi-agent systems. In our framework, each node of an $L\!\times\!L$ lattice hosts an identical LLM agent holding a binary state ($+1$/$-1$, mapped to yes/no) and updating it by querying the model conditioned on the four nearest-neighbor states. The sampler temperature $T$ serves as the sole control parameter. Across three open-weight models (llama3.1:8b, phi4-mini:3.8b, mistral:7b), we measure magnetization and susceptibility under a global-flip protocol designed to probe $\mathbb{Z}_2$ symmetry. All models display temperature-driven order-disorder crossovers and susceptibility peaks; finite-size scaling on even-$L$ lattices yields effective exponents $γ/ν$ whose values are model-dependent, close to but incompatible with the 2D Ising universality class ($γ/ν=7/4$). Our method enables the extraction of effective $β$-weighted couplings $\tilde{J}(T)$ and fields $\tilde{h}(T)$, which serve as a measure of social conformity and intrinsic bias. In the models we analyzed, we found that collective alignment is dominated by an intrinsic bias ($\tilde{h}\gg\tilde{J}$) rather than by cooperative neighbor coupling, producing field-driven crossovers instead of genuine phase transitions. These effective parameters vary qualitatively across models, providing compact collective-behavior fingerprints for LLM agents and a quantitative diagnostic for the reliability of multi-agent consensus and collective alignment.
comment: 10 pages, 7 figures
Safe Multi-Agent Behavior Must Be Maintained, Not Merely Asserted: Constraint Drift in LLM-Based Multi-Agent Systems
Modern LLM-based agents are no longer passive text generators. They read repositories, call tools, browse the web, execute code, maintain memory, communicate with other agents, and act through long-horizon workflows. This shift moves the unit of safety. A system may produce a compliant final answer while leaking private information through an internal message, delegating authority beyond its original scope, calling an external tool with sensitive context, or losing the evidence needed to reconstruct why an action was allowed. We argue that many emerging failures in LLM-based multi-agent systems share a common structure: safety-critical constraints do not remain operative throughout the trajectory. We call this phenomenon constraint drift: the loss, distortion, weakening, or relaxation of constraints as they pass through memory, delegation, communication, tool use, audit, and optimization. The position taken here is that safe multi-agent behavior must be maintained, not merely asserted. Prompts, guardrails, tool schemas, access control, and final output checks are necessary, but they are insufficient unless constraints remain fresh, inherited, enforceable, and auditable across execution. We propose Constraint State Governance as a research paradigm for LLM-based multi-agent systems. In this paradigm, safety-critical constraints are maintained as explicit execution state, while constraint-native reinforcement learning improves utility only within maintained safety boundaries. The goal is not to freeze agentic systems under rigid rules, but to make safety operational across the trajectories through which modern agents actually act.
comment: 12 pages, 2 figures, 4 tables. Preprint
Statistical Model Checking of the Keynes+Schumpeter Model: A Transient Sensitivity Analysis of a Macroeconomic ABM
Agent-based models (ABMs) are increasingly used in macroeconomics, but their analysis still often relies on ad hoc Monte Carlo campaigns with heterogeneous statistical effort across parameter settings. We show how statistical model checking (SMC), implemented through MultiVeStA, can provide a principled analysis layer for a realistic macroeconomic ABM without rewriting the simulator in a dedicated formalism. Our case study is the heuristic-switching Keynes+Schumpeter (K+S) model, analysed through a transient sensitivity campaign over one-parameter sweeps, two macro observables (unemployment and GDP growth), and one auxiliary micro-level probe (market share) on the post-warmup phase of a 600-step horizon. The analysis is driven by reusable temporal queries, observable-specific precision targets, and confidence-based stopping rules that automatically determine the simulation effort required by each configuration. Results show a clear contrast across parameter families: macro-financial and structural sweeps produce the strongest transient effects, whereas several heuristic-rule sweeps remain much weaker under the same precision policy. More broadly, the paper shows that SMC can support reproducible and informative quantitative analysis of substantively rich economic ABMs, while making uncertainty estimates and simulation cost explicit parts of the reported results.
PC3D: Zero-Shot Cooperation Across Variable Rosters via Personalized Context Distillation
Cooperative multi-agent reinforcement learning often assumes a fixed execution team, yet many decentralized systems must operate with varying numbers of active agents during deployment. We study this setting under episodic roster variation: each episode is executed by a set of homogeneous agents, with the team size varying across episodes. Agents act only from local histories, without execution-time communication, privileged coordinators, or online retraining. Therefore, effective cooperation requires each agent to recover relevant context about the active team and adapt its behavior accordingly. To this end, we propose PC3D (Personalized Central Coordination Context Distillation), a method for training decentralized policies to recover and use personalized coordination context from local interaction histories. During training, a set-structured centralized teacher compresses the active team into coordination tokens and personalizes them into agent-specific contexts, which are distilled into decentralized policies. At execution, each agent predicts its own context from local history and adaptively uses it to condition decision-making. Across three cooperative MARL benchmarks, PC3D achieves higher returns than the evaluated baselines with both seen and unseen roster sizes, and ablations attribute these gains to both context distillation and adaptive context use.
Route by State, Recover from Trace: STAR with Failure-Aware Markov Routing for Multi-Agent Spatiotemporal Reasoning
Compositional spatiotemporal reasoning often requires a system to invoke multiple heterogeneous specialists, such as geometric, temporal, topological, and trajectory agents. A central question is how such a system should route among specialists when execution does not simply succeed or fail, but fails in qualitatively different ways. Existing tool-augmented and multi-agent LLM systems typically leave this routing decision implicit in language generation, making recovery ad hoc, difficult to interpret, and hard to optimize. This paper presents STAR (Spatio-Temporal Agent Router), a failure-aware routing framework that externalizes inter-agent control as a state-conditioned transition policy over the current agent, task type, and typed execution status. At the center of STAR is an agent routing matrix that combines expert-specified nominal routes with recovery transitions learned from execution traces. Because the matrix conditions on distinct failure states, the router can respond differently to malformed outputs, missing dependencies, and tool--query mismatches, rather than collapsing them into a generic retry signal. Specialists execute through a tool-grounded extract--compute--deposit protocol and write intermediate results to a shared blackboard for downstream fusion. Results prove that retaining unsuccessful traces during training enlarges the support of the routing policy on error states, enabling recovery transitions that success-only training cannot represent. Across three spatiotemporal benchmarks and eight backbone LLMs, STAR improves over multiple baselines with the clearest gains on queries whose execution deviates from the nominal routing path. Router-specific ablations and recovery analyses further show that typed failure-aware routing, rather than specialist composition alone, is a key factor for these improvements.
comment: 30 pages, 13 figures
PixelFlowCast: Latent-Free Precipitation Nowcasting via Pixel Mean Flows
Precipitation nowcasting aims to forecast short-term radar echo sequences for extreme weather warning, where both prediction fidelity and inference efficiency are critical for real-world deployment. However, diffusion-based models, despite their strong generative capability, suffer from slow inference due to multi-step sampling trajectories, limiting their practical usability. Conditional Flow Matching (CFM) improves efficiency via straightened trajectories, but relies on latent space compression, which inevitably discards high-frequency physical details and degrades fine-grained prediction quality. To address these limitations, we propose PixelFlowCast, a two-stage probabilistic forecasting framework that achieves both high-efficiency and high-fidelity prediction without latent compression. Specifically, in the first stage, a deterministic model first produces coarse forecasts to capture global evolution trends. In the subsequent stage, the proposed KANCondNet extracts deep spatiotemporal evolution features to provide accurate conditional guidance. Based on this, a latent-free, few-step Pixel Mean Flows (PMF) predictor employs an $x$-prediction mechanism to generate high-quality predictions, effectively preserving fine-grained structures while maintaining fast inference. Experiments on the publicly available SEVIR dataset demonstrate that PixelFlowCast outperforms existing mainstream methods in both prediction accuracy and inference efficiency, particularly for long sequence forecasting, highlighting its strong potential for real-world operational deployment.
comment: 26 pages, 7 figures
RADAR: Redundancy-Aware Diffusion for Multi-Agent Communication Structure Generation ICML 2026
Compared with individual agents, large language model based multi-agent systems have consistently shown great capabilities across diverse tasks, including code generation, mathematical reasoning, and planning. Despite their impressive performance, the effectiveness and robustness of these systems heavily rely on their communication topology, which is often fixed or generated in a single step. This restricts fine-grained structural exploration and flexible composition, resulting in excessive token utilization on simple tasks while limiting capability on complicated tasks. To mitigate this challenge, we introduce RADAR, a redundancy-aware and query-adaptive generative framework that actively reduces communication overhead. Motivated by recent progress in conditional discrete graph diffusion models, we formulate communication topology design as a step-by-step generation process, guided by the effective size of the graph. Comprehensive experiments on six benchmarks demonstrate that RADAR consistently outperforms recent baselines, achieving higher accuracy, lower token consumption, and greater robustness across diverse scenarios. Our code and data are available at https://github.com/cszhangzhen/RADAR.
comment: Accepted by ICML 2026
Deterministic vs. LLM-Controlled Orchestration for COBOL-to-Python Modernization
Modernizing legacy COBOL systems remains difficult due to scarce expertise, large and long-lived codebases, and strict correctness requirements. Recent large language model (LLM)-based modernization systems increasingly rely on agentic workflows in which the model controls multi-step tool execution. However, it remains unclear whether delegating execution control to the LLM improves correctness, robustness, or efficiency in structured software engineering workflows. We present a controlled empirical study of deterministic and LLM-controlled orchestration for COBOL-to-Python modernization. Using a unified experimental framework, we hold the language models, prompts, tools, configurations, and source programs constant while varying only the execution control strategy. This isolates orchestration as the sole experimental variable. We evaluate both approaches using functional correctness, robustness across repeated stochastic runs, and computational efficiency. Across multiple models, deterministic orchestration achieves comparable computational accuracy to LLM-controlled orchestration while improving worst-case robustness and reducing performance variability across runs. Deterministic execution also reduces token consumption by up to 3.5x, leading to substantially lower operational cost. These results suggest that, in structured modernization workflows with explicit validation stages, fixed execution policies provide more stable and cost-efficient behavior than fully agentic orchestration without reducing translation quality.
Skill Description Deception Attack against Task Routing in Internet of Agents
A new paradigm, Internet of Agents (IoA), is transforming networked systems into LLM-driven service networks, where heterogeneous agents collaborate through task routing based on their self-declared skill descriptions. Although this promising paradigm enables agentic, distributed, and advanced intelligence, it also exposes a new and overlooked attack surface. In particular, malicious agents can strategically manipulate their skill descriptions to bias routing decisions and increase their probability of being selected for task execution, thereby disrupting user tasks and degrading system reliability. To characterize this threat, we propose and formalize a new attack model, termed \emph{Skill Description Deception} (SDD) attack. We further design an LLM-enabled SDD attack framework that automatically generates deceptive skill descriptions, enabling systematic vulnerability assessment of IoA systems. Experimental results on nine representative domains show that the proposed attack can achieve up to 98\% attack success rate, demonstrating the severity and generality of the attack. Our paper reveals a new security vulnerability in IoA and calls for secure and trustworthy semantic routing mechanisms for future IoA systems.
comment: Submitted to IEEE Globecom 2026
EnactToM: An Evolving Benchmark for Functional Theory of Mind in Embodied Agents
Theory of Mind (ToM), the ability to track others' epistemic states, makes humans efficient collaborators. AI agents need the same capacity in multi-agent settings, yet existing benchmarks mostly test literal ToM by asking direct belief questions. The ability to act optimally on implicit beliefs in embodied environments, called functional ToM, remains largely untested. We introduce EnactToM, an evolving benchmark of 300 embodied multi-agent tasks set in a 3D household with partial observability, private information, and constrained communication. Each task is formally verified for solvability and required epistemic depth, and new tasks are generated to increase difficulty as models improve. On the hard split, all seven evaluated frontier models score 0.0% Pass^3 on functional task completion, while averaging 45.0% on literal belief probes. Manual analysis traces 93% of sampled failures to epistemic coordination breakdowns such as withheld information, ignored partner constraints, and misallocated messages, providing a concrete target for future work.
Information and Contract Design for Repeated Interactions between Agents with Misaligned Incentives IJCAI 2026
We study the consequences of information asymmetries and misaligned incentives in settings with multiple independent agents. We model an interaction between a Sender, who holds vital private information but cannot act, and a Receiver, who must make decisions but is dependent on the Sender's information. We find that the Sender learns an optimal communication strategy that the Receiver reliably acts on. Importantly, this strategy is highly sensitive to the degree of conflict in the agents' rewards and the amount of environmental information the Receiver can already observe. We introduce a mechanism allowing the agents to form linear contracts, where a price is established for the information. We demonstrate that the Sender learns to use these payment structures to improve its rewards, though this comes at a cost of "fairness" between agents as the Sender is able to extract much of the Receiver's surplus. This raises questions about fairness, contract design, and learning in the context of multi-agent systems.
comment: Accepted to IJCAI 2026
PIVOT: Bridging Planning and Execution in LLM Agents via Trajectory Refinement
Large language model (LLM)-based agents frequently generate seemingly coherent plans that fail upon execution due to infeasible actions, constraint violations, and compounding errors over extended horizons. PIVOT (Plan-Inspect-eVOlve Trajectories) addresses this plan-execution misalignment through a self-supervised framework that treats trajectories as optimizable objects iteratively refined via environment interaction. The framework comprises four stages: PLAN generates candidate trajectories; INSPECT executes them and computes structured losses with textual gradients encoding plan-execution discrepancies; EVOLVE applies these signals to produce improved trajectories; and VERIFY performs a final global check against task constraints. A monotonic acceptance process ensures non-decreasing solution quality. Empirical evaluations on DeepPlanning and GAIA demonstrate state-of-the-art performance: with human-in-the-loop (HITL) feedback, PIVOT establishes a strong upper bound, with up to 94% relative improvement in constraint satisfaction, while its fully autonomous variant retains substantial gains, showing that the core trajectory-refinement mechanism remains effective without external supervision. At the same time, PIVOT remains computationally efficient, requiring 3x to 5x fewer tokens than competing refinement methods. These findings establish that (self- or human-supervised) feedback-based trajectory optimization is a principled methodology for mitigating plan-execution gaps in autonomous agent systems.
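The monotonic acceptance idea in the PIVOT abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `refine`, `score`, and `propose` are hypothetical, and the toy violation count stands in for the structured losses and textual gradients, which the abstract does not specify.

```python
def refine(initial, score, propose, rounds=5):
    """Keep a candidate only if it scores at least as well as the incumbent."""
    best, best_score = initial, score(initial)
    history = [best_score]
    for _ in range(rounds):
        candidate = propose(best)
        candidate_score = score(candidate)
        # Monotonic acceptance: the incumbent is never replaced by a worse plan.
        if candidate_score >= best_score:
            best, best_score = candidate, candidate_score
        history.append(best_score)
    return best, history


# Toy task (hypothetical): every action in the trajectory must be <= 3.
def score(trajectory):
    # Higher is better: minus the number of constraint violations.
    return -sum(1 for action in trajectory if action > 3)


def propose(trajectory):
    # Repair the first violating action, leaving the rest untouched.
    repaired = list(trajectory)
    for i, action in enumerate(repaired):
        if action > 3:
            repaired[i] = 3
            break
    return repaired


best, history = refine([5, 2, 7, 1], score, propose)
```

Because a candidate is accepted only when its score does not drop, `history` is non-decreasing by construction, whatever `propose` returns.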
Multi-Agent System Identification with Nonlinear Sheaf Diffusion
Local interaction laws governing multi-agent systems can be difficult to recover from trajectory data, even when the dynamics are observed faithfully. In systems governed by a nonlinear sheaf Laplacian -- a generalization of the graph Laplacian accommodating heterogeneous state spaces and asymmetric communication channels -- the coordination law is encoded by edge potential functions whose gradients produce the inter-agent forces. Because trajectory observations record node-state evolution, they expose only the aggregate effect of the edge forces at each node: distinct interaction laws that agree at the node level are indistinguishable from trajectory data alone. We show that the fundamental obstruction to recovery is topological, measured by sheaf cohomology, and that unique recovery from an unconstrained function class is possible if and only if this cohomology vanishes. When the obstruction is nontrivial, we show that recovery within a finite-dimensional parameterized class is possible precisely when a data-dependent information matrix is positive definite. Experiments validate the theory and illustrate that accurate trajectory reproduction need not certify recovery of the underlying interaction law.
The Price of Proportional Representation in Temporal Voting IJCAI
We study proportional representation in the temporal voting model, where collective decisions are made repeatedly over time over a fixed horizon. Prior work has extensively investigated how proportional representation axioms from multiwinner voting (e.g., justified representation (JR) and its variants) can be adapted, satisfied, and verified in this setting. However, much less is understood about their interaction with social welfare. In this work, we quantify the efficiency cost of enforcing proportionality. We formalize the welfare-proportionality tension via the worst-case ratio between the maximum achievable utilitarian welfare and the maximum welfare attainable subject to a proportionality axiom. We show that imposing proportional representation in the temporal setting can incur a growing, yet sublinear, welfare loss as the number of voters or rounds increases. We further identify a clean separation among axioms: for JR, the welfare loss diminishes as the time horizon grows and vanishes asymptotically, whereas for stronger axioms this conflict persists even with many rounds. Moreover, we prove that welfare maximization under each axiom is NP-complete and APX-hard, even under static preferences and bounded-degree approvals, and provide fixed-parameter algorithms under several natural structural parameters.
comment: Appears in the 35th International Joint Conference on Artificial Intelligence (IJCAI), 2026
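The worst-case ratio described in the abstract above can be written out as follows. The notation is an assumption introduced here for illustration, not the paper's own: $I$ ranges over instances, $\mathcal{O}(I)$ is the set of outcome sequences of $I$, $\mathcal{O}_{\Phi}(I)$ is the subset satisfying a proportionality axiom $\Phi$ (for example JR), and $W$ is utilitarian welfare.

```latex
\[
  \mathrm{PoP}_{\Phi}
  \;=\;
  \sup_{I}\;
  \frac{\max_{o \in \mathcal{O}(I)} W(o)}
       {\max_{o \in \mathcal{O}_{\Phi}(I)} W(o)}
\]
```

Under this reading, a ratio of 1 means the axiom costs no welfare, and a ratio that grows with the number of voters or rounds corresponds to the growing welfare loss the abstract reports.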
Control Charts for Multi-agent Systems
Generative agents have proven to be powerful assistants in a wide variety of contexts. Given this success, users are now deploying agents with minimal restrictions in open-ended, multi-agent environments. Current methods for monitoring the dynamics of open-ended multi-agent systems are limited to qualitative inspection. In this paper, we extend the process-theoretic notion of adaptive control charts to multi-agent systems to enable automated monitoring. Using simulation, we demonstrate that adaptive control charts are necessary for monitoring multi-agent systems that can learn from their environment. We further demonstrate, both empirically and theoretically, that adaptive control charts are susceptible to adversarial agents that defect sufficiently slowly. These results illustrate a fundamental tradeoff in multi-agent system control: either agents in a system cannot learn or the system is susceptible to adversaries.
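The vulnerability to slow defection can be seen in a small sketch. This is a generic chart whose centre line adapts through an exponentially weighted moving average, not the construction used in the paper; the function name `adaptive_chart` and the parameters `sigma`, `alpha`, and `k` are assumptions chosen for illustration.

```python
def adaptive_chart(xs, sigma=1.0, alpha=0.2, k=3.0):
    """Return the indices flagged by a chart whose centre line adapts via EWMA."""
    mu = xs[0]
    flagged = []
    for t, x in enumerate(xs[1:], start=1):
        if abs(x - mu) > k * sigma:
            # Out of control: flag the point and keep it out of the baseline.
            flagged.append(t)
        else:
            # In control: the baseline follows the process.
            mu += alpha * (x - mu)
    return flagged


# An abrupt change of 10 sigma after 20 steady observations.
abrupt = [0.0] * 20 + [10.0] * 5
# A slow drift of 0.1 sigma per step that ends almost 20 sigma from the start.
slow = [0.1 * t for t in range(200)]

abrupt_flags = adaptive_chart(abrupt)
slow_flags = adaptive_chart(slow)
```

With these settings the abrupt shift is flagged at every step from index 20 onward, while the slow drift is never flagged: the gap between each observation and the adapted centre line settles near 0.1 / 0.2 = 0.5 sigma, well inside the 3 sigma limit.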
GRAFT-ATHENA: Self-Improving Agentic Teams for Autonomous Discovery and Evolutionary Numerical Algorithms
Scientific discovery can be modeled as a sequence of probabilistic decisions that map physical problems to numerical solutions. Recent agentic AI systems automate individual scientific tasks by orchestrating LLM-driven planners, solvers, and evaluators. Each method is a combination of methodological actions, with many viable combinations for any given problem and structural dependencies between choices. However, existing frameworks treat each problem in isolation, with no shared substrate to accumulate methodological experience across domains. Here we show that GRAFT-ATHENA, a self-improving agentic framework, learns from past problems and autonomously expands its own action space across diverse domains. GRAFT (Graph Reduction to Adaptive Factored Trees) projects combinatorial decision spaces into factored probabilistic trees in which each method is a single path, taking the parameter footprint from exponential to linear. In the lineage of classical Bayesian networks, the factorization is an $I$-map of the policy, and the resulting paths embed as unique fingerprints in a metric space whose closeness lets each new problem learn from similar past ones. On canonical physics-informed machine learning (PIML) benchmarks, GRAFT-ATHENA improves over human and prior agentic baselines, and on production solvers, it tackles complex engineering problems such as reconstructing Mach-10 flow over the Apollo Command Module from a 1968 report and recovering shear-thinning blood-cell rheology. Notably, the system grows its own knowledge substrate, autonomously proposing regularization constraints for ill-posed inverse problems and discovering new numerical methods such as a spectral PINN with exponential convergence. These results provide a foundation for autonomous laboratories that grow more capable with every problem they solve.
DelAC: A Multi-agent Reinforcement Learning of Team-Symmetric Stochastic Games
In this paper we study team-symmetric games with $m\ge 2$ teams. Players within a team have symmetric identity and have a common payoff function. We show that team-symmetric games always have a team-symmetric Nash equilibrium. We develop and solve a linear complementarity problem of team-symmetric Nash equilibria. We propose an actor-critic based multi-agent reinforcement learning algorithm for team-symmetric games. Through simulations, we show that this multi-agent reinforcement learning algorithm performs much better than many existing algorithms.
A general classification of the replication dynamics with a unique fixed point in the interior of simplex $S_N$
The replication dynamics (differential equation system) is the foundation of evolutionary game theory. When $n=2$, there are four possible types of replication dynamics. When $n=3$, there are 49 possible types of replication dynamics. However, when $n>3$, the classification of replication dynamics has not been solved. In this article, the sufficient and necessary conditions of the replication dynamics equation with a unique fixed point in the interior of simplex $S_n$ (Int$S_n$) for $n\geq 2$ are presented. Furthermore, the different types of replication dynamics equations with a unique fixed point in Int$S_n$ are discussed.
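For readers unfamiliar with the setting, the sketch below evaluates the standard replicator equation $\dot{x}_i = x_i\,[(Ax)_i - x^{\top}Ax]$ and checks an interior fixed point numerically. The payoff matrix is the textbook rock-paper-scissors game, picked here as an assumption because its only interior rest point is the centre of the simplex; it is not an example taken from the article.

```python
def replicator_rhs(x, A):
    """Right-hand side of dx_i/dt = x_i * ((A x)_i - x^T A x)."""
    n = len(x)
    fitness = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    average = sum(x[i] * fitness[i] for i in range(n))
    return [x[i] * (fitness[i] - average) for i in range(n)]


# Rock-paper-scissors payoffs (illustrative choice, zero-sum).
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

centre = [1 / 3, 1 / 3, 1 / 3]
rhs_centre = replicator_rhs(centre, A)          # vanishes: a fixed point
rhs_other = replicator_rhs([0.5, 0.3, 0.2], A)  # does not vanish
```

The components of the right-hand side always sum to zero on the simplex, which is why trajectories that start in $S_n$ stay in $S_n$.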
Social Theory Should Be a Structural Prior for Agentic AI: A Formal Framework for Multi-Agent Social Systems
Agentic AI systems are increasingly deployed not in isolation, but inside social environments populated by other agents and humans, such as in social media platforms, multi-agent LLM pipelines or autonomous robotics fleets. In these settings, system behavior emerges not from individual agents alone, but from the multi-agent interactions over time. Emergent dynamics of individuals in a social group have long been studied by social scientists in human contexts. \textbf{This position paper argues that agentic AI systems must be modeled with social theory as a structural prior, and formalizes a Multi-Agent Social Systems (MASS) framework for how agents interact and influence one another to generate system-level outcomes.} We represent MASS as a class of dynamical systems of information generation, local influence and interaction structure, formulated by four structural priors anchored in social theory: strategic heterogeneity, networked-constrained dependence, co-evolution and distributional instability. We demonstrate the importance of each structural prior through formal propositions, and articulate a research agenda for how MASS should be modeled, evaluated and governed.
OpenCLAW-P2P v7.0-P2PCLAW: Resilient Multi-Layer Persistence, Live Reference Verification, and Production-Scale Evaluation of Decentralized AI Peer Review v7.0 -- Mathematical Corrections & Ecosystem Developments Edition
This paper presents OpenCLAW-P2P v7.0, a comprehensive evolution of the decentralized collective-intelligence platform in which autonomous AI agents publish, peer-review, score, and iteratively improve scientific research papers without any human gatekeeper. Building on the v6.0 foundations -- multi-layer persistence, live reference verification, multi-LLM granular scoring, calibrated deception detection, the Silicon Chess-Grid FSM, and the AETHER containerized inference engine -- this release introduces mathematical corrections to the theoretical framework, ensuring dimensional consistency, proper range constraints, and unambiguous notation throughout. Additionally, this edition documents significant ecosystem expansions including the CAJAL family of open-source language models (4B and 9B parameters) fine-tuned for scientific paper generation. The four major subsystems introduced in v6.0 are retained: (i) a Multi-Layer Paper Persistence Architecture with four storage tiers ensuring zero paper loss; (ii) a Multi-Layer Retrieval Cascade reducing latency from >3s to <50ms; (iii) a Live Reference Verification system detecting fabricated citations with >85% accuracy; and (iv) a Scientific API Proxy providing access to seven public scientific databases. Mathematical corrections in v7.0 include: corrected fixed-point condition in the Sufficient Reason theorem; dimensionally consistent progress-rate indicator; fully specified reputation update formula incorporating quality terms q0 and q-bar; clarified attention-logit bound in the AETHER pruning theorem; explicit range documentation for the calibration mapping; non-negativity guarantee for the depth score; discrete-time notation for the PD Governor; and explicit parameter definitions for the HSR weight formula.
comment: v7.0: Mathematical corrections (fixed-point condition Eq.4, dimensionally consistent tau-indicator Eq.7, fully specified reputation formula Eq.8 with quality terms q0 and q-bar, discrete-time PD Governor Eq.15, HSR parameter definitions Eq.16); ecosystem developments: CAJAL-4B/9B models, BenchClaw platform, 14 integrations. 36 pages
From Spark to Fire: Modeling and Mitigating Error Cascades in LLM-Based Multi-Agent Collaboration
Large Language Model-based Multi-Agent Systems (LLM-MAS) are increasingly applied to complex collaborative scenarios. However, their collaborative mechanisms may cause minor inaccuracies to gradually solidify into system-level false consensus through iteration. Such risks are difficult to trace since errors can propagate and amplify through message dependencies. Existing protections often rely on single-agent validation or require modifications to the collaboration architecture, which can weaken effective information flow and may not align with natural collaboration processes in real tasks. To address this, we propose a propagation dynamics model tailored for LLM-MAS that abstracts collaboration as a directed dependency graph and provides an early-stage risk criterion to characterize amplification risk. Through experiments on six mainstream frameworks, we identify three vulnerability classes: cascade amplification, topological sensitivity, and consensus inertia. We further instantiate an attack where injecting just a single atomic error seed leads to widespread failure. In response, we introduce a genealogy-graph-based governance layer, implemented as a message-layer plugin, that suppresses both endogenous and exogenous error amplification without altering the collaboration architecture. Experiments show that this approach prevents final infection in at least 89% of runs across operating modes and significantly mitigates the cascading spread of minor errors.
Supercritical Mass and Condensation in Fokker--Planck Equations for Consensus Formation
Inspired by recently developed Fokker--Planck models for Bose--Einstein statistics, we study a consensus formation model with condensation effects driven by a polynomial diffusion coefficient vanishing at the domain boundaries. For the underlying kinetic model, given by a nonlinear Fokker--Planck equation with superlinear drift, it was shown that if the initial mass exceeds a critical threshold, the solution may exhibit finite-time concentration in certain parameter regimes. Here, we show that this supercritical mass phenomenon persists for a broader class of diffusion functions and provide estimates of the critical mass required to induce finite-time loss of regularity.
Cooperation in public goods game on square lattices with agents changing interaction groups
The emergence of cooperation in groups of interacting agents is one of the most fascinating phenomena observed in many complex systems studied in social science and ecology, even in situations where one would expect agents to adopt a free-rider policy. This is especially surprising when no external mechanisms based on reputation or punishment are present. One possible explanation of this effect is the inhomogeneity of various aspects of the interactions, which can be used to clarify the seemingly paradoxical behaviour. In this work we demonstrate that the diversity of interaction networks helps, to some degree, to explain the emergence of cooperation. We extend the model of spatial interaction diversity by enabling the evaluation of the interaction groups. We show that the process of the reevaluation of the interaction group facilitates the emergence of cooperation. Furthermore, we also observe that a significant participation of agents switching their interaction neighbourhoods has a negative impact on the formation of cooperation. The introduced scenario can help to understand the formation of cooperation in systems where no additional mechanisms for controlling agents are included.
comment: 18 pages, 8 figures, code available at https://github.com/jmiszczak/pgg_group_diversity
HAMLET: A Hierarchical and Adaptive Multi-Agent Framework for Live Embodied Theatrics ICLR 2026
Creating an immersive and interactive theatrical experience is a long-term goal in the field of interactive narrative. The emergence of large language models (LLMs) provides a new path to achieve this goal. However, existing drama generation methods often produce LLMs that lack initiative and cannot interact with the physical scene, while typically requiring detailed input that diminishes the immersion of live performance. To address these challenges, we propose HAMLET, a hierarchical adaptive multi-agent framework focused on drama creation and real-time online performance. Given a simple topic, the framework initially generates a narrative blueprint to guide the subsequent improvisational performance. During online performance, each actor is equipped with an adaptive reasoning module that enables decision-making based on their personas, memories, and goals during complex group chat scenarios. Beyond dialogue, actor agents engage in embodied interactions by changing the state of scene props through actions such as opening a letter or picking up a weapon, which are broadcast to update the global environmental context. To objectively assess the quality of live embodied theatrics, we establish a comprehensive evaluation method and introduce HAMLETJudge, a specialized critic model for automated evaluation. Experimental results demonstrate that HAMLET excels in creating expressive, coherent, and physically interactive theatrical experiences in an autonomous manner.
comment: Accepted to the Fourteenth International Conference on Learning Representations (ICLR 2026)
Context Learning for Multi-Agent Discussion
Multi-Agent Discussion (MAD), where multiple LLM instances collaboratively solve problems via structured discussion, has recently garnered increasing attention. However, we find that current MAD methods easily suffer from discussion inconsistency, in which LLMs fail to reach a coherent solution due to the misalignment between their individual contexts. In this paper, we introduce a multi-LLM context learning method (M2CL) that learns a context generator for each agent, capable of dynamically generating context instructions per discussion round via automatic information organization and refinement. Specifically, inspired by our theoretical insights on the context instruction, M2CL trains the generators to control context coherence and output discrepancies via a carefully crafted self-adaptive mechanism. It enables LLMs to avoid premature convergence on majority noise and progressively reach the correct consensus. We evaluate M2CL on challenging tasks, including academic reasoning, embodied tasks, and mobile control. The results show that the performance of M2CL significantly surpasses existing methods by 20%--50%, while enjoying favorable transferability and computational efficiency.
Quantitative Error Feedback for Quantization Noise Reduction of Filtering over Graphs SP
This paper introduces an innovative error feedback framework designed to mitigate quantization noise in distributed graph filtering, where communications are constrained to quantized messages. It draws on error spectrum shaping techniques from state-space digital filters, and therefore establishes connections between quantized filtering processes over different domains. In contrast to existing error compensation methods, our framework quantitatively feeds back the quantization noise for exact compensation. We examine the framework under three key scenarios: (i) deterministic graph filtering, (ii) graph filtering over random graphs, and (iii) graph filtering with random node-asynchronous updates. Rigorous theoretical analysis demonstrates that the proposed framework significantly reduces the effect of quantization noise, and we provide closed-form solutions for the optimal error feedback coefficients. Moreover, this quantitative error feedback mechanism can be seamlessly integrated into communication-efficient decentralized optimization frameworks, enabling lower error floors. Numerical experiments validate the theoretical results, consistently showing that our method outperforms conventional quantization strategies in terms of both accuracy and robustness.
comment: Accepted by IEEE TSP
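The basic idea of feeding quantization error back into the next sample can be shown on a scalar stream. This is a first-order sketch with a feedback coefficient fixed at 1, chosen as an assumption for illustration; it is not the graph-filter framework or the optimal coefficients derived in the paper, and the names `quantize` and `quantize_with_feedback` are hypothetical.

```python
def quantize(value, step):
    """Uniform quantizer: snap value to the nearest multiple of step."""
    return step * round(value / step)


def quantize_with_feedback(xs, step):
    """Add the previous quantization error to each sample before quantizing."""
    error = 0.0
    out = []
    for x in xs:
        corrected = x + error        # compensate for what was lost last time
        q = quantize(corrected, step)
        error = corrected - q        # residual carried to the next sample
        out.append(q)
    return out


xs = [0.3] * 10
plain = [quantize(x, 1.0) for x in xs]     # every sample rounds to zero
fed = quantize_with_feedback(xs, 1.0)      # occasional ones restore the sum
```

The running sum of `fed` differs from the running sum of `xs` only by the current residual, so the accumulated error stays within about half a quantization step, whereas the plain quantizer loses the whole signal here.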
Robust Remote Reinforcement Learning over Unreliable Communication Channels using Homomorphic State Encoding
Traditional Reinforcement Learning (RL) frameworks generally assume that the agent perceives the state of the underlying Markov process instantaneously and then takes actions accordingly. If the agent cannot directly observe the process, but rather receives state updates from a remote sensor over a lossy and/or delayed channel, it may be forced to operate with partial and intermittent information. In recent years, numerous learning architectures have been proposed to manage RL with imperfect or remote feedback; however, they offer solutions tailored to specific use cases, often with a substantial computational and communication burden. To address these limitations, we propose a novel learning architecture, named Homomorphic Robust Remote Reinforcement Learning (HR3L), that enables the distributed training of RL agents over unreliable communication channels without the need to exchange gradient information. Our experimental results demonstrate that HR3L significantly outperforms the state-of-the-art methods in terms of sample efficiency, leading to faster training and reduced communication overhead. In addition, we show that HR3L can adapt to different scenarios, including packet loss, delayed transmissions, and bandwidth limitations, without experiencing significant performance degradation.
comment: This manuscript is currently under revision
AVA: Attentive VLM Agent for Mastering StarCraft II ACL 2026
We introduce AVACraft, a multimodal StarCraft II benchmark supporting both Multi-Agent Reinforcement Learning (MARL) and Vision-Language Model (VLM) paradigms. Unlike SMAC-family environments that rely on abstract state representations and exclude VLMs, AVACraft provides RGB visuals, natural language observations, and structured state information, enabling systematic comparison between training-based and zero-shot methods across 21 scenarios spanning micromanagement, coordination, and strategic planning. We establish comprehensive baselines: six MARL algorithms (IQL, QMIX, QTRAN, VDN, MAPPO, IPPO) with Swin-Transformer backbones trained for 5M steps, and multiple VLMs including proprietary (GPT-4o) and open-source (Qwen3-VL) models. Results reveal complementary strengths: MARL peaks at 19.3% win rate after 5M steps, while VLMs achieve 75-90% zero-shot with human-aligned decisions, exposing trade-offs between training efficiency, performance ceilings, interpretability, and deployment cost. Code: https://github.com/camel-ai/VLM-Play-StarCraft2.
comment: Accepted by ACL 2026
Learning Approximate Nash Equilibria in Cooperative Multi-Agent Reinforcement Learning via Mean-Field Subsampling
Many large-scale platforms and networked control systems have a centralized decision maker interacting with a massive population of agents under strict observability constraints. Motivated by such applications, we study a cooperative Markov game with a global agent and $n$ homogeneous local agents in a communication-constrained regime, where the global agent only observes a subset of $k$ local agent states per time step. We propose an alternating learning framework $(\texttt{ALTERNATING-MARL})$, where the global agent performs subsampled mean-field $Q$-learning against a fixed local policy, and local agents update by optimizing in an induced MDP. We prove that these approximate best-response dynamics converge to an $\widetilde{O}(1/\sqrt{k})$-approximate Nash Equilibrium, while separating the sample complexities between the joint state and action spaces. Finally, we validate our results in numerical simulations for multi-robot control.
comment: 57 pages, 10 figures, 4 tables
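The subsampling step, in which the global agent sees only $k$ of the $n$ local states, can be sketched as an empirical-distribution estimate. This shows the observation model only, under assumed names (`subsampled_distribution`, a finite state set indexed `0..num_states-1`); it is not the `ALTERNATING-MARL` algorithm or its $Q$-learning update.

```python
import random
from collections import Counter


def subsampled_distribution(states, k, num_states, rng):
    """Estimate the local-state distribution from k agents sampled without replacement."""
    sample = rng.sample(states, k)
    counts = Counter(sample)
    return [counts[s] / k for s in range(num_states)]


rng = random.Random(0)
# n = 1000 local agents spread evenly over 4 states (illustrative data).
states = [i % 4 for i in range(1000)]
estimate = subsampled_distribution(states, 100, 4, rng)
```

The estimate is a valid probability vector for any $k$, and its sampling error shrinks roughly like $1/\sqrt{k}$, which is consistent with the $\widetilde{O}(1/\sqrt{k})$ approximation rate quoted in the abstract.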
Focusing Influence Mechanism for Multi-Agent Reinforcement Learning
Cooperative multi-agent reinforcement learning (MARL) under sparse rewards remains fundamentally challenging because agents often fail to concentrate their influence, leading to insufficiently coordinated exploration. To address this, we propose the Focusing Influence Mechanism (FIM), a framework that encourages agents to focus their influence on under-explored parts of the state space through an entropy-based criterion, while leveraging eligibility traces to enable multiple agents to consistently align and sustain their influence on the same parts of the state space when beneficial, thereby promoting coordinated and persistent joint behavior. By emphasizing under-explored regions of the state space, FIM facilitates more efficient and structured exploration even under extremely sparse rewards. Across diverse MARL benchmarks, FIM consistently improves cooperative performance over strong baselines.
comment: 9 technical pages followed by references and appendix
Voter Model Meets Rumour Spreading: an FPRAS for Consensus Probabilities on Voter Models with Agnostic Nodes AAMAS 2025
Problems of consensus in multi-agent systems are often viewed as a series of independent, simultaneous local decisions made between a limited set of options, all aimed at reaching a global agreement. Key challenges in these protocols include estimating the likelihood of various outcomes and finding bounds for how long it may take to achieve consensus, if it occurs at all. To date, little attention has been given to the case where some agents have no initial opinion. In this paper, we introduce a variant of the consensus problem which includes what we call `agnostic' nodes and frame it as a combination of two known and well-studied processes: voter model and rumour spreading. We show (1) a martingale that describes the probability of consensus for a given colour, (2) bounds on the number of steps for the process to end using results from rumour spreading and voter models, (3) closed formulas for the probability of consensus in a few special cases, along with a polynomial-time algorithm for the case where the number of agnostic vertices is at most logarithmic and (4) that the computational complexity of estimating the probability with a Markov chain Monte Carlo process is $O(n^2 \log n)$ for general graphs and $O(n\log n)$ for Erdős-Rényi graphs, resulting in a fully polynomial-time randomized approximation scheme (FPRAS) for estimating the probabilities of consensus. Furthermore, we present experimental results suggesting that the number of runs needed for a given standard error decreases when the number of nodes increases.
comment: Journal version of AAMAS 2025 full paper: voter model meets rumour spreading: a study of consensus protocols on graphs with agnostic nodes. Under review at JAAMAS
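A plain Monte Carlo estimate of the consensus probability can be sketched as follows. The update rule is an assumption, since the abstract does not state it: at each step a uniformly random node looks at a uniformly random neighbour and copies its colour if that neighbour has one, so agnostic nodes (`None`) never spread. The function names are hypothetical, and this naive simulation is not the paper's FPRAS.

```python
import random


def run_once(adj, colours, rng, max_steps=100_000):
    """Simulate until every node holds the same colour; return that colour."""
    state = list(colours)
    for _ in range(max_steps):
        if None not in state and len(set(state)) == 1:
            return state[0]
        v = rng.randrange(len(state))
        u = rng.choice(adj[v])
        if state[u] is not None:
            # Pull update: v adopts the colour of a coloured neighbour.
            state[v] = state[u]
    return None  # no consensus within the step budget


def estimate_consensus(adj, colours, colour, runs, seed=0):
    """Fraction of runs that end in consensus on the given colour."""
    rng = random.Random(seed)
    wins = sum(run_once(adj, colours, rng) == colour for _ in range(runs))
    return wins / runs


# A 4-cycle with two red nodes and two agnostic nodes.
cycle = [[1, 3], [0, 2], [1, 3], [2, 0]]
only_red = estimate_consensus(cycle, ["R", None, None, "R"], "R", runs=50)
mixed = estimate_consensus(cycle, ["R", None, None, "B"], "R", runs=50)
```

When only one colour is present, agnostic nodes can only ever adopt it, so every run ends in consensus on that colour. The mixed case depends on the assumed update rule and on the seed.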
Systems and Control (EESS)
Decentralized Contingency MPC based on Safe Sets for Nonlinear Multi-agent Collision Avoidance
Decentralized collision avoidance remains challenging, particularly when agents do not communicate any information related to planned trajectories. Most existing approaches either rely on conservative coordination mechanisms or provide limited guarantees on recursive feasibility and convergence. This paper develops a decentralized contingency MPC framework for multi-agent systems with nonlinear dynamics that achieves collision-free motion under a state-only information pattern. Each agent follows the same consensual rule set, enabling safe decentralized planning without communication. Each agent solves a local optimization problem that couples a nominal trajectory with a contingency certificate ensuring a feasible backup maneuver under receding-horizon operation. A novel geometric and decentralized safe-set update mechanism prevents feasibility loss between consecutive time steps. The resulting scheme guarantees recursive feasibility, including collision avoidance, and establishes a Lyapunov-type convergence result to an admissible safe equilibrium. Simulation results demonstrate performance in both sparse and dense multi-agent environments, including cluttered bottleneck scenarios and under plug-and-play operation.
An Uncertainty-Aware Resilience Micro-Agent for Causal Observability in the Computing Continuum
Grey failures in the computing continuum produce ambiguous overlapping symptoms that existing approaches fail to diagnose reliably, either due to a lack of causal awareness or acting under high epistemic uncertainty, risking destructive interventions. This paper presents an uncertainty-aware resilience micro-agent for causal observability (AURORA), a lightweight framework for diagnosing and mitigating grey failures in edge-tier environments. The framework employs parallel micro-agents that integrate the free-energy principle, causal do-calculus, and localized causal state-graphs to support counterfactual root-cause analysis within each fault's Markov blanket. Restricting inference to causally relevant variables reduces computational overhead while preserving diagnostic fidelity. AURORA further introduces a dual-gated execution mechanism that authorizes remediation only when causal confidence is high and predicted epistemic uncertainty is bounded; otherwise, it abstains from local intervention and escalates the diagnostic payload to the fog tier. Our experiments demonstrate that AURORA outperforms baselines, achieving a 0% destructive action rate, while maintaining 62.0% repair accuracy and a 3ms mean time to repair.
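The dual-gated execution rule described above can be sketched in a few lines. The threshold values, argument names, and return labels below are hypothetical and chosen only for illustration; the abstract does not state how AURORA computes or thresholds either quantity.

```python
def dual_gate(causal_confidence, predicted_uncertainty,
              conf_min=0.9, unc_max=0.2):
    """Authorize local remediation only when both gates pass.

    conf_min and unc_max are illustrative thresholds, not values from the paper.
    """
    if causal_confidence >= conf_min and predicted_uncertainty <= unc_max:
        return "remediate"
    # Otherwise abstain locally and hand the diagnostic payload upward.
    return "escalate"
```

Requiring both conditions makes abstention the default, which is how such a gate can keep the destructive action rate low at the cost of some missed local repairs.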
Demystifying Deep Reinforcement Learning: A Neuro-Symbolic Framework for Interpretable Open RAN Automation
Open Radio Access Networks (O-RAN) are increasingly adopting data-driven control through Deep Reinforcement Learning (DRL) to optimize complex tasks such as network slicing and mobility management. However, the deployment of DRL in carrier-grade networks is hindered by its inherent opacity and stochastic execution, which limit operator trust, auditability, and safe deployment. Existing explainable AI (XAI) approaches primarily provide post-hoc insights and fail to produce executable, interpretable policies suitable for operational environments. In this paper, we present DeRAN, a neuro-symbolic framework that bridges the gap between DRL performance and operational transparency by distilling black-box DRL policies into human-readable symbolic representations. DeRAN introduces a concept-driven abstraction layer that transforms high-dimensional network telemetry into a compact set of semantically meaningful features, enabling interpretable policy learning. Building on the semantically grounded concepts, DeRAN synthesizes symbolic policies using deep symbolic regression (DSR) for continuous control and neurally guided differentiable logic (NUDGE) for discrete decision-making. We implement DeRAN on a live 5G O-RAN testbed and evaluate it on two representative use cases. Experimental results demonstrate that DeRAN achieves 78% and 87% of DRL's cumulative rewards in the two use cases, while offering interpretability and auditability by design. Source code is available at https://github.com/Jadejavu/A-Neuro-Symbolic-Framework-for-Interpretable-Open-RAN-Automation
Hierarchical End-to-End Taylor Bounds for Complete Neural Network Verification
Reachability analysis of neural networks, which seeks to compute or bound the set of outputs attainable over a given input domain, is central to certifying safety and robustness in learning-enabled physical systems. Since exact reachable set computation is generally intractable, existing methods typically rely on tractable overapproximations. Examining the state of the art for smooth, twice-differentiable networks, we observe that existing approaches exploit at most second-order information and do not systematically leverage higher-order information. In this work, we introduce \textsc{HiTaB}, a novel verification framework that exploits second-order smoothness through both the Hessian, $\nabla^2 f$, and its Lipschitz constant, $L_{\nabla^2 f}$. We further develop a unified hierarchy of zeroth-, first-, and second-order bounds, together with precise conditions under which higher-order approximations yield provable improvements. Our main technical contribution is a compositional procedure for efficiently bounding $L_{\nabla^2 f}$ in deep neural networks via layerwise propagation of curvature bounds. We extend the framework to both $\ell_2$- and $\ell_\infty$-constrained input sets and show how it can be integrated into branch-and-bound verification pipelines. To our knowledge, this is the first practical reachability analysis framework for smooth neural networks that systematically exploits Lipschitz continuity of curvature, leading to tighter and more informative safety certificates.
Effect of Graph Gluing on Consensus in Networked Multi-Agent Systems
In this paper, the effects of graph gluing operations in networks of multi-agent systems and their impact on system performance are investigated. In many practical applications, multiple multi-agent subsystems must be interconnected through communication links to accomplish complex tasks, resulting in a larger communication network. Such interconnections modify the underlying graph topology and consequently affect the consensus behavior and convergence rate of the network. In particular, this paper examines both bridge gluing and interface gluing and analyzes how the number and structure of communication links between subsystems influence the Fiedler eigenvalue of the resulting graph. Since the Fiedler eigenvalue is directly related to the convergence rate of consensus dynamics, the proposed analysis establishes a clear relationship between interconnection strategies, algebraic connectivity, and system performance. The results provide theoretical insight into how different gluing mechanisms alter the spectral properties of the graph Laplacian and, in turn, the convergence characteristics of the networked multi-agent system. Simulation studies are presented to illustrate the theoretical findings and to validate the effectiveness of the proposed framework.
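The link between interconnection count and the Fiedler eigenvalue can be checked numerically. The sketch below assumes a simple reading of bridge gluing, namely two copies of the complete graph $K_m$ joined by $k$ links between matching nodes; this construction and the helper names are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def laplacian(n, edges):
    """Graph Laplacian L = D - A of an undirected, unweighted graph."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1
        L[j, j] += 1
        L[i, j] -= 1
        L[j, i] -= 1
    return L

def fiedler_value(L):
    """Second-smallest Laplacian eigenvalue (algebraic connectivity)."""
    return np.linalg.eigvalsh(L)[1]  # eigvalsh returns ascending eigenvalues

def glued_complete_graphs(m, k):
    """Two copies of K_m joined by k links (node i of copy A to node i of copy B)."""
    inner = [(i, j) for i in range(m) for j in range(i + 1, m)]
    edges = inner + [(m + i, m + j) for i, j in inner]
    edges += [(i, m + i) for i in range(k)]
    return laplacian(2 * m, edges)
```

Evaluating `fiedler_value(glued_complete_graphs(5, k))` for `k = 0, ..., 5` gives a non-decreasing sequence that starts at zero for the disconnected case, since each added link adds a positive semidefinite term to the Laplacian.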
Equivariant Observer Design on SL(3) for Image Intensity-Based Homography Estimation
This paper addresses the problem of homography estimation using a nonlinear observer designed on the Lie group $\mathbf{SL}(3)$ that exploits the full image information through direct image registration. Unlike traditional feature-based methods, which rely on extensive feature extraction and matching, the proposed approach formulates an observer that minimises a cost function defined directly in terms of image pixel intensities. Explicit conditions ensuring the non-degeneracy of the cost function are derived, and a comprehensive analysis is conducted to characterise and generate degenerate (unobservable) image configurations. Theoretical results demonstrate local exponential convergence of the observer. To improve local convergence properties, a second-order observer variant is introduced by incorporating the Hessian of the cost function into the correction term. Simulation results demonstrate the performance of the proposed solutions on real images.
comment: 16 pages, 4 figures, preprint submitted to Automatica
A PAC-Bayes Approach for Controlling Unknown Linear Discrete-time Systems
This paper presents a PAC-Bayes framework for learning controllers for unknown stochastic linear discrete-time systems, where the system parameters are drawn from a fixed but unknown distribution. We derive a data-dependent high probability bound on the performance of any learned (stochastic) controller, and propose novel efficient learning algorithms with theoretical guarantees, which can be implemented for both finite and infinite controller spaces. Compared to prior work, our bound holds for unbounded quadratic cost. In the special case where LQG is optimal, our numerical results suggest that the learned controllers achieve comparable performance to LQG.
comment: 12 pages, 3 figures, IFAC 2026 conference
Glycemic Safety Tube: A Provably Safe Control Framework for Artificial Pancreas Systems under Parametric Uncertainty
Type 1 diabetes eliminates the body's ability to produce insulin, making glucose regulation entirely dependent on external insulin delivery and the control algorithm. Existing closed-loop methods either rely on accurate patient-specific models or do not provide formal safety guarantees, and are often computationally demanding for wearable devices. This paper proposes Glycemic Safety Tube Control (GSTC), a model-free and computationally efficient control framework for automated insulin delivery. The method enforces clinically relevant safety bounds on glucose levels by design, ensuring that glucose remains within a prescribed safe range. We also derive feasibility conditions that guarantee safety and input constraint satisfaction under bounded meal disturbances and estimation errors. The performance of GSTC is evaluated against state-of-the-art methods, including linear and nonlinear model predictive control and sliding mode control. The results demonstrate that GSTC maintains safety under varying meal patterns and patient conditions, highlighting its robustness and computational efficiency. Overall, GSTC provides a safe, efficient, and patient-independent approach for next-generation artificial pancreas systems.
Observing the state of networks with directed higher-order interactions
We consider the problem of reconstructing the state of a network of nonlinear dynamical systems in the presence of directed higher-order interactions. Grounded on analytical convergence results, we propose an algorithmic observer design procedure that simultaneously selects the nodes to be measured and the observer gains. We complement the theoretical analysis with an exhaustive numerical investigation campaign that showcases the performance and robustness of the designed observer. Finally, the algorithmic procedure is used to fully reconstruct the opinions of a group of agents.
Priority-Driven Control and Communication in Decentralized Multi-Agent Systems via Reinforcement Learning
Event-triggered control provides a mechanism for avoiding excessive use of constrained communication bandwidth in networked multi-agent systems. However, most existing methods rely on accurate system models, which may be unavailable in practice. In this work, we propose a model-free, priority-driven reinforcement learning algorithm that learns communication priorities and control policies jointly from data in decentralized multi-agent systems. By learning communication priorities, we circumvent the hybrid action space typical in event-triggered control with binary communication decisions. We evaluate our algorithm on benchmark tasks and demonstrate that it outperforms the baseline method.
comment: Accepted to the 23rd IFAC World Congress
Hierarchical 2-degree-of-freedom control combining Youla-Kucera parameterization and model predictive control
A hierarchical 2DOF (2-degree-of-freedom) structure combining Youla-Kucera (YK) parameterization and model predictive control (MPC) is presented in this paper. The YK parameterization employs the coprime factorization of the nominal system and controller, thereby introducing an auxiliary feedforward channel dedicated to system optimization and a controller parameterization channel. The feedforward channel is utilized to implement cascaded MPC for system optimization. The controller parameterization channel is utilized to achieve offset-free MPC by designing an appropriate YK parameter through the H2 optimal controller design.
comment: 7 pages, 4 figures, accepted for European Control Conference 2026 (ECC 2026)
High-speed single-photoelectron detection for Cherenkov astronomy
Silicon photomultipliers are increasingly replacing photomultiplier tubes in Cherenkov telescope cameras, but achieving single-photoelectron resolution with nanosecond timing in a low-noise, scalable detector system remains challenging. We present a co-designed SiPM sensor and front-end application-specific integrated circuit (ASIC) that meets these requirements. The custom hexagonal sensor, developed with Hamamatsu Photonics, incorporates an integrated optical filter and fourfold pixel segmentation. The readout is performed by a second prototype of the FANSIC ASIC, optimized for this application and fabricated in 65 nm standard CMOS technology. It provides eight channels with on-chip analog summing of sub-channels on a $3.5\times 3.5~\mathrm{mm}^2$ die, while consuming only 24 mW per channel. We demonstrate clear single-photoelectron peak separation with a gain of $2.7 \times 10^{-12}~\mathrm{V \cdot s}$, and an impulse response below 4 ns full width at half maximum with a 1.7 ns rise time, preserving the nanosecond-scale structure of Cherenkov pulses. The system responds linearly from 1 to 130 photoelectrons, and 55 distinct photoelectron peaks are resolved by varying the source intensity. These results demonstrate that the integrated sensor-electronics architecture delivers the speed, resolution, and dynamic range required for imaging atmospheric Cherenkov telescopes, and provides a scalable path toward large-area camera modules.
Low-Cost GNSS Anti-Jamming Through 2-Bit Phase Shift Beamforming with Machine Learning
We investigate low-cost GNSS anti-jamming using beamforming with inexpensive 2-bit phase shifters, constraining each complex array weight to one of four QPSK phase states (real/imaginary = -1 or +1). This severe quantization sharply limits the beampattern solution space, making conventional real-valued beamforming and naive weight quantization highly suboptimal. We formulate a discrete optimization that trades interference suppression against satellite-direction gain, and benchmark known combinatorial optimization methods across array sizes and interference conditions. Simulations show that performance improves with array size, with oracle and greedy search achieving up to 34 dB nulling, but oracle incurs exponential latency and greedy sampling is stochastic. To obtain deterministic low-latency performance, we propose an ML-aided method based on gradient-boosted decision trees followed by local search, which performs similarly to the oracle for larger arrays at fixed latency. We further validate the approach experimentally using a fully digital emulation of the QPSK oracle beamformer and compare against a GNSS receiver without beamforming capability. Under mild jamming (J/S approximately 44 dB) both receivers maintain adequate tracking, with QPSK yielding a 4.2 dB higher average C/N0 (37.3 vs. 33.1 dB-Hz). Under moderate and strong jamming (J/S approximately 62-70 dB) the benefit is substantial. At J/S = 70 dB the unprotected receiver degrades to near tracking limits (avg C/N0 = 9.3 dB-Hz) while the QPSK oracle sustains an average C/N0 of 20.8 dB-Hz. These results confirm that 2-bit phase-shift beamforming provides considerable anti-jamming benefit over a standard GNSS receiver, motivating further research on oracle-level practical methods.
comment: Accepted for presentation at RAST 2026. Author accepted version. Final version to appear in IEEE Xplore
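The exponential cost of the oracle search over 2-bit weights can be seen in a brute-force sketch. Everything here is an illustrative assumption rather than the paper's formulation: a half-wavelength uniform linear array, a single jammer, and a ratio objective `score` standing in for the trade-off between satellite-direction gain and interference suppression.

```python
import itertools
import numpy as np

# The four admissible weights: real and imaginary parts are each -1 or +1.
QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])

def steering(n_elems, theta_rad):
    """Steering vector of a half-wavelength uniform linear array (assumed geometry)."""
    return np.exp(1j * np.pi * np.arange(n_elems) * np.sin(theta_rad))

def score(w, a_sat, a_jam, eps=1e-9):
    """Satellite-direction gain divided by jammer-direction gain (illustrative objective)."""
    gain_sat = abs(np.vdot(w, a_sat)) ** 2  # vdot conjugates w, giving w^H a
    gain_jam = abs(np.vdot(w, a_jam)) ** 2
    return gain_sat / (gain_jam + eps)

def oracle_search(a_sat, a_jam):
    """Exhaustive search over all 4**N weight vectors."""
    best_w, best = None, -np.inf
    for combo in itertools.product(QPSK, repeat=len(a_sat)):
        w = np.array(combo)
        s = score(w, a_sat, a_jam)
        if s > best:
            best_w, best = w, s
    return best_w, best
```

For an array of `N` elements the loop visits `4**N` candidates, which is tractable for `N = 4` (256 candidates) but grows quickly, consistent with the latency concern raised in the abstract.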
Online Learning-Based Control with Guaranteed Error Bounds for a Class of Nonlinear Systems
In this paper, we present a learning-based control for a class of nonlinear systems that guarantees exponential stability as well as bounded output errors. The control is based on the Gaussian Process Submodel Online Learning (GPSOL) algorithm and the Disturbance Error Rate Limiting (DERL) algorithm, both of which were developed in previous work. The GPSOL algorithm provides a method to learn Gaussian Process (GP) models for subsystems online, whereas the DERL algorithm limits the rate of the prediction error of these GP models. The focus of this paper is the utilization of the GP model within an adaptive controller and the derivation of corresponding stability conditions and system peak-to-peak gains by means of linear matrix inequalities (LMIs). These peak-to-peak gains are then used to prescribe a desired prediction error rate for the DERL algorithm to achieve user-defined output error bounds. The gains and the related bounds were successfully verified using a simulation model. Furthermore, results from a successful experimental validation of the bounds and the overall control structure on a pneumatic test rig are presented. While the control scheme and error bounds proposed in this paper are limited to first-order single-input-single-output systems, an extension to certain classes of higher-order and multiple-input-multiple-output systems is expected to be forthcoming.
comment: Accepted at IFAC 2026 (23rd IFAC World Congress, Busan, Korea)
Learning to Sparsify Stochastic Linear Bandits IJCAI 2026
This paper addresses the problem of learning to sparsify stochastic linear bandits, where a decision-maker sequentially selects actions from a high-dimensional space subject to a sparsity constraint on the number of nonzero elements in the action vector. The key challenge lies in minimizing cumulative regret while tackling the potential NP-hardness of finding optimal sparse actions due to the inherent combinatorial structure of the problem. We propose an adaptively phased exploration and exploitation algorithmic framework, utilizing ordinary least squares for parameter learning and specialized subroutines for sparse action selection. When the action set is a Euclidean ball, optimal sparse actions can be efficiently computed, enabling us to establish a $\tilde{\mathcal{O}}(d\sqrt{T})$ regret, where $d$ is the dimension of the action vector and $T$ is the time horizon length. For general convex and compact action sets where finding optimal sparse actions is intractable, we employ a greedy subroutine. For general strongly convex action sets, we derive a $\tilde{\mathcal{O}}(d \sqrt{T})$ $α$-regret; for general compact sets lacking strong convexity, we establish a $\tilde{\mathcal{O}}(d T^{2/3})$ $α$-regret, where $α$ pertains to the approximation ratio of the greedy algorithm. Finally, we validate the performance of our algorithms using extensive experiments, including an application to a recommendation system.
comment: Includes all the omitted details and proofs from the conference paper accepted to IJCAI 2026
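For the Euclidean-ball case, the reason sparse action selection is cheap can be shown directly. Assuming a unit ball and a parameter estimate `theta_hat` (both names and the unit radius are assumptions for illustration, not the paper's subroutine), the linear reward is maximized over $s$-sparse actions by keeping the $s$ largest-magnitude coordinates of the estimate and normalizing, which follows from the Cauchy-Schwarz inequality applied on any fixed support.

```python
import numpy as np

def best_sparse_action(theta_hat, s):
    """Maximize theta_hat @ a over actions with at most s nonzeros and ||a||_2 <= 1."""
    support = np.argsort(-np.abs(theta_hat))[:s]  # indices of the s largest magnitudes
    action = np.zeros_like(theta_hat, dtype=float)
    action[support] = theta_hat[support]
    norm = np.linalg.norm(action)
    return action / norm if norm > 0 else action
```

The cost is dominated by the sort, so no combinatorial search over supports is needed in this special case.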
Transmission Topology Optimization using accelerated MapElites
Transmission Topology Optimization has great potential to improve efficiency and flexibility of grid operations through non-costly switching actions, but previous approaches struggle with runtime performance and scalability. In this work, we present an optimization approach that leverages GPU acceleration to speed up computations. In a genetic algorithm setting, topologies are randomly mutated and evaluated in parallel for multiple optimization criteria. Combined with a fully GPU-native DC loadflow solver, there is no CPU-GPU data transfer required in the DC optimization loop. Using a variant of the illumination algorithm MapElites, we efficiently generate a set of diverse candidate solutions on the Pareto front. Together with an importing and AC validation step, we present an end-to-end optimization solution that runs in under 15 minutes. The approach is currently under evaluation by operational planning operators in two European TSOs. We furthermore open-source our code at github.com/eliagroup/ToOp.
Lure-and-Reveal: An Exposure Framework for Stealthy Deception Attack in Multi-sensor Uncertain Systems
Multi-sensor integration via error-state Kalman filter (KF) is widely employed for precise state estimation in cyber-physical systems (CPSs). However, this integration exposes the system to stealthy deception attacks that render conventional detection mechanisms ineffective. We propose an exposure framework to actively reveal such stealthy attacks without modifying sensor interfaces. The framework introduces a suspect mode in which the defender injects random exposure shakes into the nominal control inputs, thus creating a discrepancy between the defender's true state estimates and the attacker's manipulated state estimates, preventing the attack from remaining stealthy. We further derive an explicit exposure condition that characterizes the minimum shake magnitude to guarantee the finite-time exposure and a compensable condition that ensures the shakes do not degrade closed-loop performance. Simulation results based on a GNSS/INS-integrated UAV system verify the effectiveness of the proposed framework.
Learning to Compress and Transmit: Adaptive Rate Control for Semantic Communications over LEO Satellite-to-Ground Links
The bottleneck of satellite-to-ground links poses a major challenge for the timely downlink of massive on-board imagery. This paper studies adaptive image transmission over LEO satellite-to-ground links using joint source-channel coding (JSCC). We propose an RL-based framework that dynamically selects the channel dimension (compression ratio) of a SwinJSCC encoder to maximize the number of received frames satisfying reconstruction-quality constraints (PSNR and MS-SSIM) within a finite visibility window. The agent leverages SNR prediction to perform proactive rate adaptation and incorporates an on-board transmission-queue model that captures bursty encoding while penalizing both buffer overflow and underutilization. Simulations under realistic overpass conditions show that the proposed policy substantially outperforms fixed-rate baselines, achieving nearly 95% qualified frames with zero packet loss.
Delay-Robust Secondary Frequency Control via Passive Interconnection and Randomized Block Updates
This paper studies secondary frequency control in transmission networks subject to communication delays at the cyber-physical interface and limited per-update computation at the control center. The regulation objective is formulated as a constrained economic dispatch problem incorporating generation capacity constraints, nodal power balance, transmission-flow limits, and scheduled tie-line power exchanges. Based on this formulation, we develop a passivity-based control framework in which an augmented projected primal-dual controller restores nominal frequency and drives the closed-loop system to the solution set of the constrained economic dispatch problem. Two-way communication delays between the physical network and the control center are modeled as scattering-based passive channels for the measurement uplink and the control-command downlink. This construction preserves the target equilibrium and enables a delay-robust passivity analysis of the delayed closed loop. To reduce the computational burden at the control center, we develop a randomized block-coordinate implementation of the augmented projected primal-dual controller. The resulting sampled-data closed loop preserves the target solution set and achieves local mean-square geometric convergence under suitable step-size and regularity conditions. Finally, a multivariable wave-domain interface filter is introduced to inject additional dissipation and improve the damping of the delayed interface without altering the steady-state interconnection. Simulations on the IEEE 14-bus system indicate that the proposed digital implementation accurately reproduces the delayed closed-loop behavior while reducing the per-update computational cost.
comment: Preprint submitted to Automatica
Scalable Design of Attack-Resilient Controllers for Positive Systems
This paper proposes a framework for secure and resilient controller design for positive systems against cyber-attacks. In particular, we consider a network-controlled system where an adversary injects false data into the actuator channels to increase the control cost (performance measure) while penalizing the attack effort and subject to state-dependent constraints. Using a minimax formulation, we analyze the worst-case performance loss caused by such adversaries, which is given by the solution of a difference equation, and an algebraic equation when the time horizon is infinite. We show that the optimal attack policy, among possible nonlinear policies, is linear. Despite the lack of explicit stealthiness constraints, we also show that when the measured output has an unstable zero which is not an unstable zero of the performance measure, the attacks can induce unbounded performance degradation. The proposed framework is also extended to systems with model uncertainty. Numerical examples illustrate the results and demonstrate how tools from positive systems and linear regulator theory can be used to mitigate cyber-attacks with low computational effort.
comment: 3 figures, submitted to L-CSS and CDC 2026
Muninn: Your Trajectory Diffusion Model But Faster
Diffusion-based trajectory planners can synthesize rich, multimodal robot motions, but their iterative denoising makes online planning and control prohibitively slow. Existing accelerations either modify the sampler or compress the network--sacrificing plan quality or requiring retraining without accounting for downstream control risk. We address the problem of making diffusion-based trajectory planners fast enough for real-time robot use without retraining the model or sacrificing trajectory quality, and in a way that works across diverse state-space diffusion architectures. Our key insight is that diffusion trajectory planners expose two signals we can exploit: a cheap probe of how their internal trajectory representation changes across steps, and analytic coefficients that describe how denoiser errors affect the sampler's state update. By calibrating the first signal against the second on offline runs, we obtain a per-step score that upper-bounds how far the final trajectory can deviate when we reuse a cached denoiser output, and we treat this bound as an uncertainty budget that we can spend over the denoising process. Building on this insight, we present Muninn, a training-free caching wrapper that tracks this uncertainty budget during sampling and, at each diffusion step, chooses between reusing a cached denoiser output when the predicted deviation is small and recomputing the denoiser when it is not. Across standard benchmarks Muninn delivers up to 4.6x wall-clock speedups across several trajectory diffusion models by reducing denoiser evaluations, while preserving task performance and safety metrics. Muninn further certifies that cached rollouts remain within a specified distance of their full-compute counterparts, and we validate these gains in real-time closed-loop navigation and manipulation hardware deployments. Project page: https://github.com/gokulp01/Muninn.
comment: Accepted to Robotics: Science and Systems 2026
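The reuse-or-recompute decision described above can be sketched as a budgeted loop. This is a toy illustration, not Muninn's implementation: `denoiser`, `update`, and `predicted_deviation` are hypothetical callables, and the per-step score is assumed to be given rather than calibrated from offline runs as in the paper.

```python
def cached_sampling(x, steps, denoiser, update, predicted_deviation, budget):
    """Sampling loop that reuses a cached denoiser output while a budget allows.

    Returns the final sample and the number of denoiser evaluations.
    """
    cached, spent, evals = None, 0.0, 0
    for t in steps:
        # Nothing can be reused before the first denoiser call.
        cost = predicted_deviation(t) if cached is not None else float("inf")
        if spent + cost <= budget:
            spent += cost           # spend part of the uncertainty budget
            eps = cached            # reuse the cached output
        else:
            eps = denoiser(x, t)    # recompute and refresh the cache
            cached = eps
            evals += 1
        x = update(x, eps, t)
    return x, evals
```

With a budget of zero the loop degenerates to ordinary sampling with one denoiser call per step, while larger budgets trade bounded deviation for fewer evaluations.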
Optimal Loss Reduction in Distribution Networks Using Conservation Voltage Reduction and Network Topology Reconfiguration
Conservation voltage reduction (CVR) and network topology reconfiguration (NTR) are widely employed to improve distribution system performance; however, existing approaches largely treat them independently, overlooking their coupled impact on load demand, voltage profiles, and power flow distribution, thereby limiting their overall effectiveness. This paper proposes a coordinated optimization framework for day-ahead operational planning of distribution networks, integrating CVR and NTR to enhance overall network efficiency and reduce active power losses in radial distribution networks. The problem is formulated as a mixed-integer conic programming model incorporating AC power flow constraints, voltage-dependent load representation, and radiality constraints. CVR is implemented to achieve load reduction through coordinated voltage control, while NTR redistributes line loading via optimal switching of controllable branches. The proposed framework is validated on the IEEE 33 and 123-bus distribution systems under varying load conditions. Results demonstrate that the coordinated approach consistently outperforms independent strategies, achieving up to 20.6% reduction in active power losses while maintaining voltage compliance and improving branch loading uniformity. These findings confirm that coordinated optimization provides an effective and scalable solution for enhancing efficiency in modern distribution networks.
Harnessing Floating Car Data, Traffic Camera Observations, and Network Flow Analysis for Traffic Volume Estimation
Cities increasingly rely on vehicle trajectory data to monitor traffic conditions; however, such data offer only a partial and spatially heterogeneous view of network dynamics and exhibit systematic biases across corridors and time periods. In contrast, surveillance cameras can provide high-fidelity traffic information, but only at a limited set of locations, typically sparsely distributed across the road network. We present a hybrid modeling and calibration framework that fuses these complementary data sources to produce physically consistent, network-wide estimates and short-horizon forecasts of traffic volumes. The framework leverages kinematic features derived from the Cell Transmission Model (CTM) formulation within a graph neural network (GNN). By enforcing traffic-flow conservation, capacity limits, and spillback dynamics, the CTM provides a physically grounded representation of traffic flow, while the GNN learns the spatiotemporal evolution of traffic states over the entire road network. To calibrate the model predictions on traffic camera observations, we use a progressive data-assimilation scheme based on an Ensemble Square-Root Kalman filter (EnSRF). A topology-informed flow-weighted transition matrix is further employed to propagate camera-driven corrections to unobserved road segments, enabling real-time, network-wide traffic state and volume estimation. The approach is demonstrated using probe-vehicle trajectory data and municipal traffic cameras in Manhattan, New York City, where it achieves improved accuracy relative to trajectory-based estimates while maintaining physically plausible and network-consistent traffic flows. The proposed framework accommodates varying sensor availability and produces calibrated traffic volumes with uncertainty estimates, supporting operational monitoring and evaluation of transportation policies in data-constrained urban environments.
comment: This manuscript is a preprint version of a work submitted to Discover Civil Engineering
From Discrete to Continuous Highest-earning Imitation Dynamics
Decision-making by imitating the highest earners has been observed in experimental studies. In two-strategy decision-making problems, this behavior may result in perpetual fluctuations in the population proportions of the two strategies. How these fluctuations evolve for large population sizes remains unclear. This paper addresses this question for a heterogeneous population of players imitating the highest earners. We show that the family of Markov chains describing the discrete population dynamics forms a generalized stochastic approximation process for a good upper semicontinuous differential inclusion--the mean dynamics. Furthermore, we prove that the mean dynamics always equilibrate. Then, by using results from stochastic approximation theory, we show that the amplitudes of fluctuations in the population proportions of the two strategies diminish to zero with probability one, as the population size approaches infinity. Our results suggest that in a well-mixed, large population, imitating the highest earners is unlikely to generate large-scale, perpetual fluctuations.
Geometric Pareto Control: Riemannian Gradient Flow of Energy Function via Lie Group Homotopy
We propose Geometric Pareto Control (GPC), a framework overcoming barriers of reinforcement learning in cyber-physical systems where governing physics is known. Reinforcement learning confronts barriers in safety-critical applications: sample complexity grows with action-space dimension, retraining is required when objectives or conditions shift, goals such as safety recovery and economic dispatch demand brittle switching logic, and unsafe exploration persists under constrained RL formulations. GPC resolves these barriers through a two-stage geometric approach. Offline, the supported family of Pareto-optimal solutions (i.e., solutions recoverable by weighted scalarization) is embedded as a submanifold within a Lie group. Exponential map closure preserves membership in the ambient Lie group; drift and reset assumptions keep online latent states within a bounded neighbourhood of the Pareto submanifold, and a training-time feasibility margin guarantees decoded actions remain feasible without post-hoc projection, constructing a "map" of the solution landscape. Online, a closed-form proximal navigator traverses this submanifold via a unified Riemannian gradient flow driven by a singular perturbation potential field, inducing dual-timescale dynamics that prioritize constraint restoration over performance optimization. The homeomorphic structure of the submanifold guarantees that varying system parameters and objective weights produce continuous control actions, enabling deployment under unseen conditions without retraining. Validated on a nonconvex control task and real-time multi-objective optimal power flow, GPC achieves 100% feasibility, 0.30% oracle suboptimality, and 12.3 ms decisions while shifting from constraint recovery to economic dispatch. Under branch-admittance uncertainty, it remains 100% feasible without retraining, whereas model-free baselines produce no feasible dispatches.
Computational Design of a Low-Visibility UAV Using a Human-Aligned Perceptual Metric
We introduce Phantom Twist, a type of single-propeller UAV designed to achieve low visibility through high-speed spinning and the exploitation of motion blur. We develop a two-stage automated design pipeline that optimizes the placement of functional components including batteries, control PCB, motor-propeller assembly, and counterweights. The pipeline minimizes visibility as measured by a human-aligned perceptual metric (LPIPS) while strictly satisfying inertial and aerodynamic constraints required for stable flight. We validate this approach through fabrication and flight testing of multiple prototypes. These tests confirm that our pipeline produces stable, controllable designs and that the optimized UAV exhibits significantly reduced visual perceptibility compared to conventional quadcopters.
comment: Accepted by RSS 2026
236 μW Direct-RF PLL-Free Multi-PSK Transmitter Using Oscillator-Based Phase Synthesis
This paper presents a compact, low-power, direct RF multi-phase-shift keying (PSK) transmitter (TX) that eliminates the need for a phase-locked loop (PLL) by performing phase modulation directly within a ring oscillator. The proposed architecture exploits synchronized charge extraction at the oscillator's transition points to induce controlled phase shifts while maintaining constant amplitude and frequency. A time-domain multi-triggering technique is introduced to enable reconfigurable multi-mode modulation, supporting 16-PSK, 8-PSK, QPSK, and BPSK within a unified hardware structure. The TX circuit is fabricated in a 22-nm FD-SOI process and operates in the ISM band at 2.4 GHz. Measurement results indicate a symbol rate of 2 MSps with a maximum error vector magnitude (EVM) of 5.13% rms. The core TX occupies 23 × 17.6 μm² and consumes 236 μW, excluding the output driver, which delivers -10 dBm output power over a 60 MHz bandwidth. The proposed design achieves a favorable trade-off between power consumption, circuit complexity, and modulation flexibility, making it well-suited for low-power wireless applications.
comment: 4 pages, 8 figures, 1 table
SHIA: A Direct SysML-Hardware Interface Architecture for Model-Centric Verification
Model-Based Systems Engineering (MBSE) is widely treated as the backbone of digital engineering, with languages such as the Systems Modeling Language (SysML) providing the means to capture system structure, behaviour, and verification intent. Yet once verification moves to hardware, the system model is routinely left behind. Domain-specific simulation environments, model transformations, and bespoke tool integrations take over, and the model that began as the authoritative reference drifts out of sync with the implementation it was meant to govern. This paper introduces the SysML Hardware Interface Architecture (SHIA), which keeps an executable SysML model directly inside the verification loop, exchanging messages with physical hardware without intermediate transformation chains, co-simulation platforms, or broker-mediated plugins. SHIA is realised through a SysML side server, written in embedded C++ within IBM Rhapsody, and a hardware side server running on a Raspberry Pi, together establishing a bidirectional link between the digital model and the physical system. A logic gate case study demonstrates the approach end-to-end, from hardware model construction and prototype assembly to test harness design, behavioural statechart control, and staged verification of each component before integration. The integrated system exchanged messages correctly in both directions, and Karnaugh map comparison between the SysML-generated and hardware-generated outputs showed zero discrepancy. The result shows that, when paired with a suitable interface, SysML need not remain a static description that informs downstream tools; it can serve as the executable layer through which hardware behaviour is stimulated, observed, and verified. The work demonstrates a route to model-governed verification and a shorter digital thread between system architecture and the hardware that realises it.
comment: 18 pages, 18 figures. For source code, see https://github.com/AmalElsokary/HiL
Multi-Agent System Identification with Nonlinear Sheaf Diffusion
Local interaction laws governing multi-agent systems can be difficult to recover from trajectory data, even when the dynamics are observed faithfully. In systems governed by a nonlinear sheaf Laplacian -- a generalization of the graph Laplacian accommodating heterogeneous state spaces and asymmetric communication channels -- the coordination law is encoded by edge potential functions whose gradients produce the inter-agent forces. Because trajectory observations record node-state evolution, they expose only the aggregate effect of the edge forces at each node: distinct interaction laws that agree at the node level are indistinguishable from trajectory data alone. We show that the fundamental obstruction to recovery is topological, measured by sheaf cohomology, and that unique recovery from an unconstrained function class is possible if and only if this cohomology vanishes. When the obstruction is nontrivial, we show that recovery within a finite-dimensional parameterized class is possible precisely when a data-dependent information matrix is positive definite. Experiments validate the theory and illustrate that accurate trajectory reproduction need not certify recovery of the underlying interaction law.
Hybrid Analytical--EMT Method for HVDC Protection System Component-Level Design
Protection system design for multi-terminal HVDC grids is challenging due to the complexity of the system and the often conflicting design requirements. Effective specification of protection component parameters (e.g., DC circuit breakers and series DC inductors) during component-level design is crucial due to interdependencies among components, the need for detailed modeling, and the complex interactions between the protection system and converter control systems. Both analytical and simulation-based approaches have been proposed as solutions for component-level design. However, analytical methods may not accurately represent system behavior given that approximation is necessary, and simulation-based approaches often require extensive computational effort and time. Therefore, this paper presents an efficient systematic design method, combining both approaches. First, a fundamental analytical solution is derived to consider the protection system requirements. Then, a hybrid analytical--EMT methodology is proposed to accelerate convergence toward the required design parameters, after which detailed models are applied to ensure accuracy in design and validation. The approach is applicable to component-level design for both fully and partially selective protection strategies in HVDC grids.
comment: 25 pages, 14 figs
Design of a validation methodology for a prototype wristband for capturing muscle signals and upper limb movement
Surface electromyography (sEMG) is a noninvasive technique widely used to control myoelectric prostheses and other human-machine interfaces. However, the high cost of commercial systems limits accessibility in academic and research environments, especially in developing countries. This study presents a validation protocol for a low-cost eight-electrode sEMG wristband prototype based on IEC 60601 and ANSI/AAMI EC13 standards. The protocol includes electrical safety tests, such as leakage current measurement, insulation evaluation, and continuity verification between electrodes and circuits. Functional performance was evaluated by comparing signals acquired with the prototype against those obtained from a commercial reference device (PortiLab2) using Pearson correlation, Bland-Altman analysis, and mean squared error. Additional tests included signal stability during rest and contraction, UART and Bluetooth communication, frequency response, mechanical characterization of the casing, and user comfort assessment. Results showed leakage currents between 11.4 μA and 13.5 μA, adequate insulation, stable signal acquisition, and high correlation with the reference system (r > 0.85). Reliable wireless transmission without packet loss was also observed. Limitations included power supply constraints during wireless testing and discrepancies in the frequency response at high-gain stages compared with simulations. Mechanical tests showed elastic behavior of the casing under loads up to 98 N. The proposed protocol provides a practical and reproducible framework for the technical and functional validation of low-cost sEMG systems for research and educational applications.
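The three agreement measures named in the abstract above (Pearson correlation, Bland-Altman analysis, mean squared error) can be sketched as follows. This is a minimal illustration on synthetic data, not the study's protocol: the function name `agreement_metrics`, the test signals, and the conventional 1.96-standard-deviation limits of agreement are assumptions.

```python
import numpy as np

def agreement_metrics(prototype, reference):
    """Compare two equally sampled signals (hypothetical helper, not from the paper)."""
    prototype = np.asarray(prototype, dtype=float)
    reference = np.asarray(reference, dtype=float)
    diff = prototype - reference
    bias = diff.mean()                      # Bland-Altman mean difference
    sd = diff.std(ddof=1)                   # sample standard deviation of differences
    return {
        "pearson_r": np.corrcoef(prototype, reference)[0, 1],
        "mse": np.mean(diff ** 2),
        "bias": bias,
        "limits_of_agreement": (bias - 1.96 * sd, bias + 1.96 * sd),
    }

# Synthetic stand-ins for the reference device and the prototype (assumed data).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 1000)
reference = np.sin(2 * np.pi * 5 * t)
prototype = reference + 0.02 + 0.05 * rng.standard_normal(t.size)

metrics = agreement_metrics(prototype, reference)
```

In practice the two recordings would first need to be time-aligned and resampled to a common rate, which this sketch does not attempt.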
Enabling Small-Signal Stability Analysis of Black-Box Voltage Source Converters in Large-Scale Modern Power Systems
Modern power systems increasingly rely on power electronic converters, yet many of these devices are provided as black-box models, limiting the applicability of conventional small-signal analysis (SSA) tools. This work presents a unified multi-variable fitted state-space (SSA-FITSS) methodology that enables accurate small-signal modeling of black-box Voltage Source Converters (VSCs) using frequency-domain (FD) identification, adaptive pole-expansion, and reduced-order realization. The method includes an automated state-interpretation strategy that assigns fitted states to representative control-loop categories based on their dominant frequency ranges, providing an approximate but meaningful physical interpretation of the identified dynamics. This capability allows extensive modal analysis, including eigenvalue sensitivities and participation factors, in systems where internal converter details are unavailable. The methodology is validated on a grid-following (GFL) VSC and applied to the New England system, which contains multiple black-box converters operating in both GFL and grid-forming (GFM) modes. Results show that the SSA-FITSS models accurately reproduce converter and system dynamics, support full eigenvalue-based analysis, and reveal stability limits under varying synchronous generation and GFL penetration levels. The approach overcomes key limitations of existing identification-based techniques by enabling scalable, interpretable, and system-wide stability assessment.
comment: Submitted for publication in the IEEE Transactions on Power Systems
Sensitivity Analysis of Performance-Based Partitioning in District Heating Networks
The paper presents a sensitivity analysis of the factors affecting the optimal partitioning of a district heating network for distributed control. Leveraging a physics-based, distributed model predictive control framework and a performance-based partitioning method, this work studies the relationship between variations in system parameters and the resulting optimal partition, providing insight into the robustness of a nominally designed partition to perturbed operating conditions. The enabling methodology is a learning-enhanced branch and bound method that culls the search space, reducing the number of partitions evaluated for each case. The sensitivity of the nominally optimal partition is characterized across twelve parameter variations, including supply temperature, operating season, building flexibility, pipe characteristics, and building type. This simulation study shows that a well-designed nominal partition exhibits an average cost increase of only 2.8% relative to centralized control across eleven of the twelve cases, with three cases identifying the nominal partition as globally optimal under the perturbed conditions. The robustness study is followed by an analysis of the sensitivity of the optimality loss metric (OLM), revealing that, in five of twelve cases, the case-specific OLM-minimizing partitions underperform the nominally optimal one due to shifts in the relative magnitude of heat loss versus flexibility costs. This indicates that proper tuning of cost function weights and initial conditions for the performance optimization problem is essential for reliable partition selection, and that seasonal repartitioning is warranted when demand profiles deviate substantially from the nominal, as observed in the November operating case.
Sensor Design for Accuracy-Bounded Estimation via Maximum-Entropy Likelihood Synthesis
Designing the sensing architecture for large-scale spatio-temporal systems is hard when accuracy requirements are specified but sensor models are uncertain or unavailable. Classical design treats sensor placement and estimation sequentially, requiring valid forward models for each sensing modality. This paper inverts the design flow: given an error budget, synthesize the measurement likelihood that enforces it while injecting minimal information beyond the dynamical prior. The likelihood is constructed by constrained optimization: among all posteriors satisfying a prescribed accuracy bound relative to a target, select the one minimizing Kullback-Leibler divergence from the prior. The solution is a maximum-entropy posterior in relative-entropy form, and the induced likelihood is the Radon-Nikodym derivative. The framework accommodates arbitrary discrepancies and is instantiated for Wasserstein distance, maximum mean discrepancy, $f$-divergences, moment constraints, and hybrid metrics. For each, we derive the discrete particle-level problem, analyze its convex or convex-relaxed structure, and present solvers with complexity scaling. A closed-form solution exists for the symmetric exponential-tilt case, and a distillation procedure converts nonparametric likelihood samples into parametric forms. A two-layer sensor design architecture embeds the synthesized likelihood in the recursive predict-update loop, connecting accuracy budgets to physical sensor placement, precision, and configuration. Numerical experiments comparing four metrics on unimodal and multimodal scenarios confirm the accuracy constraints are reliably enforced and reveal how metric choice determines the amount and spatial distribution of injected information.
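To make the minimum-KL construction above concrete, the sketch below treats the simplest case: a moment-type constraint on equally weighted prior particles, where the minimizer is an exponential tilt of the prior. The constraint form (mean cost at most `eps`), the uniform particle prior, the bisection search, and all names are assumptions chosen for illustration; the abstract's Wasserstein and MMD instances are not covered.

```python
import numpy as np

def tilt_weights(cost, eps, lam_hi=1e3, iters=200):
    """Minimum-KL reweighting of uniform-prior particles so that mean cost <= eps.

    Hypothetical helper: weights have the exponential-tilt form
    w_i proportional to exp(-lam * cost_i), with lam found by bisection.
    Assumes eps exceeds the smallest particle cost and lam_hi is large enough.
    """
    cost = np.asarray(cost, dtype=float)

    def weights(lam):
        w = np.exp(-lam * (cost - cost.min()))  # shift exponent for stability
        return w / w.sum()

    if cost.mean() <= eps:                      # prior already meets the budget
        return weights(0.0)
    lo, hi = 0.0, lam_hi
    for _ in range(iters):                      # tilted mean cost decreases in lam
        mid = 0.5 * (lo + hi)
        if weights(mid) @ cost > eps:
            lo = mid
        else:
            hi = mid
    return weights(hi)

# Assumed example: prior particles from N(0, 1), target at 0, squared-error cost.
rng = np.random.default_rng(1)
particles = rng.standard_normal(2000)
cost = particles ** 2
eps = 0.25
w = tilt_weights(cost, eps)
likelihood = w * particles.size                 # ratio of posterior to uniform prior weights
```

The array `likelihood` is the discrete analogue of the Radon-Nikodym derivative mentioned in the abstract: the factor by which each prior particle is reweighted.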
Newton's Lantern: A Reinforcement Learning Framework for Finetuning AC Power Flow Warm Start Models
Neural warm starts can sharply reduce the number of Newton-Raphson iterations required to solve the AC power flow problem, but existing supervised approaches generalize poorly on heavily loaded instances near voltage collapse. We prove a lower bound on the Newton-Raphson iteration count that depends on the direction of the warm start error rather than on its magnitude, and show as a corollary that the bound becomes vacuous as the smallest singular value of the power-flow Jacobian shrinks, identifying the failure mode of supervised regression near the saddle-node bifurcation. Motivated by this analysis, we introduce Newton's Lantern, a finetuning pipeline that combines group relative policy optimization with a learned reward model trained on perturbations of the base model's predictions, using the iteration count itself as the supervisory signal. Across IEEE 118-bus, GOC 500-bus, and GOC 2000-bus benchmarks, Newton's Lantern is the only method that converges on every test snapshot while attaining the smallest mean iteration count.
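As a toy illustration of why a warm start reduces the Newton-Raphson iteration count, the sketch below counts iterations on a scalar stand-in equation rather than on AC power flow. The function `newton_iterations`, the test equation, and the starting points are assumptions for illustration; a scalar example cannot show the direction-dependent bound described in the abstract.

```python
def newton_iterations(f, df, x0, tol=1e-10, max_iter=50):
    """Return (root estimate, number of Newton-Raphson updates performed)."""
    x = x0
    for k in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:          # converged: residual below tolerance
            return x, k
        x = x - fx / df(x)         # Newton-Raphson update
    return x, max_iter

# Stand-in nonlinear equation x^2 - 2 = 0 (assumed, not a power-flow model).
f = lambda x: x ** 2 - 2.0
df = lambda x: 2.0 * x

root_cold, cold = newton_iterations(f, df, x0=10.0)   # poor initial guess
root_warm, warm = newton_iterations(f, df, x0=1.4)    # guess near the solution
```

Both runs reach the same root, but the warm-started run needs fewer updates, which is the quantity the abstract uses as its supervisory signal.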
Enabling Performant and Flexible Model-Internal Observability for LLM Inference
Today's inference-time workloads increasingly depend on timely access to a model's internal states. We present DMI-Lib, a high-speed deep model inspector that treats internal observability as a first-class systems primitive, decoupling it from the inference hot path via an asynchronous observability substrate built from Ring^2, a GPU-CPU memory abstraction for capturing and staging tensors, and a policy-controlled host backend that exports them. DMI-Lib enables the placement of observation points across a rich space of internal signals and diverse inference backends while preserving serving optimizations and adhering to tight GPU memory budgets. Our experiments demonstrate that DMI-Lib incurs only 0.4%--6.8% overhead in offline batch inference and an average of 6% in moderate online serving, reducing latency overhead by 2x-15x compared to existing baselines with similar observability features. DMI-Lib is open-sourced at https://github.com/ProjectDMX/DMI.
Storage Participation in Electricity Markets: Time Discretization through Robust Optimization
Electricity storage is used for intertemporal price arbitrage and for ancillary services that balance unforeseen supply and demand fluctuations via frequency regulation. We present an optimization model that computes bids for both arbitrage and frequency regulation and ensures that storage operators can honor their market commitments at all times for all fluctuation signals in an uncertainty set inspired by market rules. This requirement, initially expressed by an infinite number of nonconvex functional constraints, is shown to be equivalent to a finite number of deterministic constraints. The resulting formulation is a mixed-integer bilinear program that admits mixed-integer linear relaxations and restrictions. Empirical tests on European electricity markets show a negligible optimality gap between the relaxation and the restriction. The model can account for intraday trading and, with a solution time of under 5 seconds, may serve as a building block for more complex trading strategies. Such strategies become necessary as battery capacity exceeds the demand for ancillary services. In a backtest from 1 July 2020 through 30 June 2024, joint market participation more than doubles profits and almost halves energy output compared to no FCR participation.
Equation-Free Digital Twins for Nonlinear Structural Dynamics
Monitoring high-dimensional engineering structures in extreme environments is limited by non-stationary excitation, nonlinear structural kinematics, and stochastic forcing. Traditional model-based and black-box data-driven methods often struggle to resolve these dynamics in real time, particularly under sensor failure or partial observability. This paper introduces a rank-optimized digital twin framework based on Koopman operator theory, Hankel-matrix embeddings, and dynamic mode decomposition. By lifting operational data into a linear invariant subspace, the method enables autonomous, input-blind reconstruction of structural states without requiring a priori mass or stiffness matrices. The framework is validated on an NREL 5MW spar-buoy floating offshore wind turbine, representing a challenging coupled aero-hydro-servo-elastic system. Results show that the rank-optimized Koopman-Hankel manifold separates structural resonances from deterministic 3P rotor harmonics under colored noise, where standard subspace identification can be unreliable. A rolling-horizon virtual sensing strategy achieves high-fidelity reconstruction at critical structural hotspots, with coefficient of determination greater than 0.95 at 1 Hz data assimilation and accuracy exceeding 0.99 at higher sampling rates. By estimating a physical Lyapunov time of approximately 1.0 s, the study defines the predictability horizon associated with the system information barrier. The proposed framework provides a computationally efficient and resilient digital twin approach for real-time identification and virtual sensing of complex structural dynamics.
comment: Added code availability statement linking the GitHub repository and archived Zenodo software release
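The combination of Hankel-matrix embedding and dynamic mode decomposition named in the abstract above can be sketched on a clean scalar signal. This is a minimal illustration, not the paper's rank-optimized method: the function `hankel_dmd`, the fixed truncation rank, the delay count, and the sinusoidal test signal are all assumptions.

```python
import numpy as np

def hankel_dmd(signal, n_delays, rank):
    """Return DMD eigenvalues of a delay-embedded scalar time series."""
    n_cols = len(signal) - n_delays + 1
    # Hankel matrix: each column is a window of n_delays consecutive samples.
    H = np.column_stack([signal[j:j + n_delays] for j in range(n_cols)])
    X, Y = H[:, :-1], H[:, 1:]                  # snapshot pairs one step apart
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank, :]
    # Reduced linear operator that advances X to Y in the truncated SVD basis.
    A_tilde = U.conj().T @ Y @ Vh.conj().T @ np.diag(1.0 / s)
    return np.linalg.eigvals(A_tilde)

# Assumed test signal: a noise-free 1.5 Hz sinusoid sampled at 100 Hz.
dt = 0.01
t = np.arange(0.0, 10.0, dt)
signal = np.sin(2 * np.pi * 1.5 * t)

eigs = hankel_dmd(signal, n_delays=20, rank=2)
freqs_hz = np.abs(np.angle(eigs)) / (2 * np.pi * dt)   # eigenvalue angle to frequency
```

A single sinusoid gives a rank-2 Hankel matrix, so two modes suffice here; with noisy, multi-component data the choice of rank and delay count becomes the central tuning question.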
Exploiting Over-Approximation Errors as Preview Information for Nonlinear Control
We study the control of nonlinear constrained systems via over-approximations. Our key observation is that the over-approximation error, rather than being an unknown disturbance, can be exploited as input-dependent preview information. This leads to the notion of informed policies, which depend on both the state and the error. We formulate the concretization problem -- recovering a valid input for the true system from a preview-based policy -- as a fixed-point equation. Existence of solutions follows from the Brouwer fixed-point theorem, while efficient computation is enabled through closed-form, linear, or convex programs for input-affine systems, and through an iterative method based on the Banach fixed-point theorem for nonlinear systems.
comment: 7 pages, 2 figures
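The iterative route for nonlinear systems mentioned in the abstract above can be sketched as a plain fixed-point iteration. The abstract does not give the exact fixed-point equation, so the form u = policy(x, error(x, u)), the scalar toy functions, and all names below are assumptions; convergence relies on the composed map being a contraction, as the Banach fixed-point theorem requires.

```python
import math

def concretize(policy, error, x, u0=0.0, tol=1e-12, max_iter=100):
    """Solve u = policy(x, error(x, u)) by fixed-point iteration (hypothetical helper)."""
    u = u0
    for _ in range(max_iter):
        u_next = policy(x, error(x, u))
        if abs(u_next - u) < tol:   # successive iterates agree: accept
            return u_next
        u = u_next
    raise RuntimeError("fixed-point iteration did not converge")

def error(x, u):
    # Assumed input-dependent over-approximation error (Lipschitz constant 0.2 in u).
    return 0.2 * math.sin(u) + 0.1 * x ** 2

def policy(x, e):
    # Assumed informed policy: depends on the state and on the previewed error.
    return -0.5 * x - e

u_star = concretize(policy, error, x=1.0)
```

Because the composed map here has Lipschitz constant 0.2 in u, the iterates contract toward a unique fixed point from any starting input.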
MOBIUS: A Multi-Modal Bipedal Robot that can Walk, Crawl, Climb, and Roll
This paper presents the MOBIUS platform, a bipedal robot capable of walking, crawling, climbing, and rolling. MOBIUS features four limbs, two 6-DoF arms with two-finger grippers for manipulation and climbing, and two 4-DoF legs for locomotion--enabling smooth transitions across diverse terrains without reconfiguration. A hybrid control architecture combines reinforcement learning for locomotion and force control for compliant contact interactions during manipulation. A high-level MIQCP planner autonomously selects locomotion modes to balance stability and energy efficiency. Hardware experiments demonstrate robust gait transitions, dynamic climbing, and full-body load support via pinch grasp. Overall, MOBIUS demonstrates the importance of tight integration between morphology, high-level planning, and control to enable mobile loco-manipulation and grasping, substantially expanding its interaction capabilities, workspace, and traversability.
comment: Paper is accepted at the Robotics: Science and Systems conference, held in Sydney, Australia, July 13th-17th, 2026. Alexander Schperberg and Yusuke Tanaka are co-first authors. Both were at the Robotics and Mechanisms Laboratory (RoMeLa) at UCLA when the work started, and are now with Mitsubishi Electric Research Laboratories and ETH Zurich (RSL) respectively
Refined Barrier Conditions for Finite-Time Safety and Reach-Avoid Guarantees in Stochastic Systems
Providing finite-time probabilistic safety and reach-avoid guarantees is crucial for safety-critical stochastic systems. Existing state-of-the-art barrier methods often rely on a restrictive boundedness assumption for auxiliary functions, limiting their applicability. This paper presents refined barrier conditions that remove this assumption. Specifically, we establish conditions for deriving upper bounds on finite-time safety probabilities in discrete-time systems and lower bounds on finite-time reach-avoid probabilities in continuous-time systems. This relaxation expands the class of verifiable systems, especially those with unbounded state spaces, and facilitates the use of advanced optimization techniques, such as semi-definite programming with polynomial functions. Numerical examples demonstrate the effectiveness of the approach.
comment: To appear in IFAC'26
Rarity of rocket-driven Penrose extraction in Kerr spacetime
We study rocket-driven Penrose extraction in the test-particle limit on a fixed Kerr background for equatorial prograde flybys under explicit steering prescriptions. A spacecraft ejects exhaust inside the ergosphere; when the exhaust attains negative Killing energy, the remaining spacecraft gains energy by 4-momentum conservation. Across 320,000 simulated trajectories spanning black-hole spin, exhaust velocity, and orbital parameters, extraction with escape is rare in broad parameter scans (at most ${\sim}1\%$) and requires high spin ($a/M\gtrsim 0.89$), highly relativistic exhaust ($v_e\gtrsim 0.91c$), and finely tuned initial conditions. Under optimal tuning the success rate reaches ${\sim}70\%$ at $a/M = 0.95$. For representative escape trajectories, a single periapsis impulse is more propellant-efficient than the continuous-thrust controllers studied here. All quoted thresholds are empirical and specific to the orbit family, prior, and steering protocol studied.
comment: 20 pages, 6 figures, 8 tables, accepted at Physical Review D
A Reproducible Method for Mapping Electricity Transmission Infrastructure for Space Weather Risk Assessment
Space weather risk assessment is constrained by the lack of available asset information needed to model Geomagnetically Induced Currents (GICs) in electricity transmission infrastructure. We propose a reproducible method that enables risk analysts to collect their own open-source substation data. Utilizing an innovative web-browser platform for annotation, we convert OpenStreetMap substation locations to high-resolution, component-level mappings of electricity transmission assets. For example, we convert an initial 1,313 high-voltage (>115 kV) substations to 52,273 substation components via Google Earth APIs utilizing low-altitude, satellite, and streetview imagery. Approximately 41,642 substation components (79.6%) connect to the highest substation voltage levels (>345 kV) and are potentially susceptible to GICs, with 7,949 identified transformers. Compared to the OpenStreetMap baseline, this approach provides detailed insights on voltage levels, line capacities, and substation configurations. We then construct a geospatial GIC network for the Tennessee Valley Authority region, comparing May 2024 results with the UIUC150 synthetic network and with measured ground GICs at 13 monitoring devices. Importantly, the two open-source networks produce 95th-percentile peak ground GIC values within 4% of each other, and the modeled time series broadly capture the temporal morphology of the storm at the monitoring sites. This method shows promise for spatially explicit GIC screening and regional nowcasting without requiring access to operator data.
On-Line Policy Iteration with Trajectory-Driven Policy Generation
We consider deterministic finite-horizon optimal control problems with a fixed initial state. We introduce an on-line policy iteration method, which, starting from a given policy, however obtained, generates a sequence of cost-improving policies and corresponding trajectories. Each policy produces a trajectory, which is used in turn to generate data for training the next policy. The method is motivated by problems that are repeatedly solved starting from the same initial state, including discrete optimization and path planning for repetitive tasks. For such problems, the method is fast enough to be used on-line. Under a natural consistency condition, we show that the sequence of costs of the generated policies is monotonically improving for the given initial state (but not necessarily for other states). We illustrate our results with computational studies from combinatorial optimization and 3-dimensional path planning for drones and a robot arm in the presence of obstacles. We also discuss briefly a stochastic counterpart of our algorithm. Our proposed framework combines elements of rollout and policy iteration with flexible trajectory-based policy representations, and applies to problems involving a single as well as multiple decision makers. It also provides a principled way to train neural network-based policies using trajectory data, while preserving monotonic cost improvement.
Convex Computations for Controlled Safety Invariant Sets of Black-box Discrete-time Dynamical Systems
Identifying controlled safety invariant sets (CSISs) is essential for safety-critical systems. This paper addresses the problem of computing CSISs for black-box discrete-time systems, where the dynamics are unknown and only limited simulation data are available. Traditionally, a CSIS requires that for every state in the set, there exists a control input that keeps the system within the set at the next step. However, enforcing such universal invariance, i.e., requiring the set to remain controlled invariant for all states, is often overly restrictive or impractical for black-box systems. To address this, we introduce the notion of a Probably Approximately Correct (PAC) CSIS, in which, with prescribed confidence, there exists a suitable control input to keep the system within the set at the next step for at least a specified fraction of the states. Our approach leverages barrier functions and scenario optimization, yielding a tractable linear programming method for estimating PAC CSISs. Several illustrative examples demonstrate the effectiveness of the proposed framework.
comment: To appear in IFAC'26
Dimension and model reduction approaches for linear Bayesian inverse problems with rank-deficient prior covariances
Bayesian inverse problems use observed data to update a prior probability distribution for an unknown state or parameter of a scientific system to a posterior distribution conditioned on the data. In many applications, the unknown parameter is high-dimensional, making computation of the posterior expensive due to the need to sample in a high-dimensional space and the need to evaluate an expensive high-dimensional forward model relating the unknown parameter to the data. However, inverse problems often exhibit low-dimensional structure due to the fact that the available data are only informative in a low-dimensional subspace of the parameter space. Dimension reduction approaches exploit this structure by restricting inference to the low-dimensional subspace informed by the data, which can be sampled more efficiently. Further computational cost reductions can be achieved by replacing expensive high-dimensional forward models with cheaper lower-dimensional reduced models. In this work, we propose new dimension and model reduction approaches for linear Bayesian inverse problems with rank-deficient prior covariances, which arise in many practical inference settings. The dimension reduction approach is applicable to general linear Bayesian inverse problems whereas the model reduction approaches are specific to the problem of inferring the initial condition of a linear dynamical system. We provide theoretical approximation guarantees as well as numerical experiments demonstrating the accuracy and efficiency of the proposed approaches.
The explicit game-theoretic linear quadratic regulator for constrained multi-agent systems
We present an efficient algorithm to compute the explicit open-loop solution to both finite and infinite-horizon dynamic games subject to state and input constraints. Our approach relies on a multiparametric affine variational inequality characterization of the open-loop Nash equilibria and extends the classical explicit constrained LQR and MPC frameworks to multi-agent non-cooperative settings. A key practical implication is that linear-quadratic game-theoretic MPC becomes viable even at very high sampling rates for multi-agent systems of moderate size. Extensive numerical experiments demonstrate order-of-magnitude improvements in online computation time and solution accuracy compared with state-of-the-art game-theoretic solvers.
Chattering Reduction for a Second-Order Actuator via Dynamic Sliding Manifolds
We analyze actuator chattering in a scalar integrator system subject to second-order actuator dynamics with an unknown time constant and first-order sliding-mode control, using both a conventional static sliding manifold and a dynamic sliding manifold. Using the harmonic balance method, we prove that it is possible to adjust the parameters of the dynamic sliding manifold for the specified system class so as to reduce the amplitude of the chattering in comparison to the static manifold. We illustrate our results with a simulation example. This contribution serves as a proof of concept to motivate further investigations in chattering reduction via dynamic sliding manifolds.
Cyclic Nullspace Coordination: Perpetual Flight of Aerial Carriers for Static Suspension
This work demonstrates that the non-stop flights of three or more carriers are compatible with holding a constant pose of a cable-suspended load. It also presents an algorithm for generating the carriers' coordinated non-stop trajectories. The proposed method builds upon two pillars: (1) the choice of n special linearly independent directions of internal forces within the 3n-6-dimensional nullspace of the grasp matrix of the load, chosen as the edges of a Hamiltonian cycle on the graph that connects the cable attachment points on the load. Adjacent pairs of directions are used to generate n forces evolving on distinct 2D affine subspaces, despite the attachment points being generically in 3D; (2) the construction of elliptical trajectories within these subspaces by mapping, through appropriate graph coloring, each edge of the Hamiltonian cycle to a periodic coordinate while ensuring that no adjacent coordinates exhibit simultaneous zero derivatives. Combined with conditions for load statics and attachment point positions, these choices ensure that each of the n force trajectories projects onto the corresponding cable constraint sphere with non-zero tangential velocity, enabling perpetual motion of the carriers while the load is still. The work provides a scalable constructive design for any n greater than or equal to 3 with tuning guidelines, quantifies sensitivity and single-carrier failures, and provides a fixed-wing-compatible planner that preserves load statics under speed/bank/flight-path constraints. The theoretical findings are validated through simulations and laboratory experiments with quadrotor UAVs.
comment: Accepted for publication in the IEEE Transactions on Control Systems Technology
Fundamental limitations of monotonic tracking systems
We consider the monotonic tracking control problem for continuous-time single-input single-output linear systems using output-feedback linear controllers in this paper. We provide the necessary and sufficient conditions for this problem to be solvable and expose its fundamental limitations: the exact feasible locations of the plant zeros, the minimum controller order possible, and the fastest decay rate achievable for the closed-loop system. The relationship between these bounds is explained by a simple geometric shape for plants with a pair of complex-conjugate zeros.
On robotic manipulators with time-dependent inertial parameters: From physical consistency to boundedness of the mass matrix
We generalize the robotics equation describing the dynamics of open kinematic chains by including the effect of time-dependent change of inertial parameters as well as the effects of causative mass-density redistribution, triggered by internal movement of mass-carrying particles relative to their body-fixed frames. Time dependency of inertial parameters that results from the sole addition of mass to the robot prominently occurs during the loading of end-effectors -- a scenario covered by our model without restriction from the restraint that kinematic parameters of the robot must remain constant. Further, our model also includes internal mass-density redistributions that adhere to this kinematic restraint such as trolleys attached to the robot or the movement of passengers. To accompany the generalized robotics equation with some theoretical infrastructure, we then introduce the concepts of uniform physical consistency and upper boundedness of inertial parameters under which desirable, structural properties regarding the existence of finite, positive uniform bounds of the mass matrix can be shown to carry over to the more involved case of time-dependent inertial parameters. These findings have implications for adaptive control, as they facilitate more realistic testing for robustness against unforeseen time dependencies. Moreover, the results in this paper also provide a pathway to ensuring the desirable existence of finite, positive uniform bounds of the estimated mass matrix under upper bounded, uniformly physically consistent estimation regimes.
comment: to be published in Nonlinear Dynamics
Quantitative Error Feedback for Quantization Noise Reduction of Filtering over Graphs SP
This paper introduces an innovative error feedback framework designed to mitigate quantization noise in distributed graph filtering, where communications are constrained to quantized messages. It draws on error spectrum shaping techniques from state-space digital filters, and therefore establishes connections between quantized filtering processes over different domains. In contrast to existing error compensation methods, our framework quantitatively feeds back the quantization noise for exact compensation. We examine the framework under three key scenarios: (i) deterministic graph filtering, (ii) graph filtering over random graphs, and (iii) graph filtering with random node-asynchronous updates. Rigorous theoretical analysis demonstrates that the proposed framework significantly reduces the effect of quantization noise, and we provide closed-form solutions for the optimal error feedback coefficients. Moreover, this quantitative error feedback mechanism can be seamlessly integrated into communication-efficient decentralized optimization frameworks, enabling lower error floors. Numerical experiments validate the theoretical results, consistently showing that our method outperforms conventional quantization strategies in terms of both accuracy and robustness.
comment: Accepted by IEEE TSP
Sample-Efficient and Smooth Cross-Entropy Method Model Predictive Control Using Deterministic Samples
Cross-entropy method model predictive control (CEM--MPC) is a powerful gradient-free technique for nonlinear optimal control, but its performance is often limited by the reliance on random sampling. This conventional approach can lead to inefficient exploration of the solution space and non-smooth control inputs, requiring a large number of samples to achieve satisfactory results. To address these limitations, we propose deterministic sampling CEM (dsCEM), a novel framework that replaces the random sampling step with deterministic samples derived from localized cumulative distributions (LCDs). Our approach introduces modular schemes to generate and adapt these sample sets, incorporating temporal correlations to ensure smooth control trajectories. This method can be used as a drop-in replacement for the sampling step in existing CEM-based controllers. Experimental evaluations on two nonlinear control tasks demonstrate that dsCEM consistently outperforms state-of-the-art iCEM in terms of cumulative cost and control input smoothness, particularly in the critical low-sample regime.
comment: To be published in the Proceedings of the American Control Conference (ACC 2026)
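For orientation, the conventional random-sampling CEM loop that the abstract contrasts against can be sketched as follows. This is a minimal illustration and not the authors' dsCEM: the toy integrator, the cost weights, and the names `rollout_cost` and `cem_plan` are assumptions made for the example. The marked sampling line is the step that, per the abstract, dsCEM replaces with deterministic samples derived from LCDs.

```python
import random
import statistics


def rollout_cost(x0, controls):
    """Cost of a control sequence on a toy integrator x_{k+1} = x_k + u_k (assumed model)."""
    x, cost = x0, 0.0
    for u in controls:
        x = x + u
        cost += x * x + 0.1 * u * u  # assumed quadratic state and input penalties
    return cost


def cem_plan(x0, horizon=5, samples=100, elites=10, iterations=10, seed=0):
    """Plain CEM with Gaussian random sampling; returns the final mean control sequence."""
    rng = random.Random(seed)
    mean = [0.0] * horizon
    std = [1.0] * horizon
    for _ in range(iterations):
        # Random sampling step (the part a deterministic sampling scheme would replace).
        population = [
            [rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(samples)
        ]
        # Keep the lowest-cost control sequences as the elite set.
        population.sort(key=lambda u: rollout_cost(x0, u))
        elite = population[:elites]
        # Refit an independent Gaussian per time step to the elite set.
        mean = [statistics.fmean(column) for column in zip(*elite)]
        std = [statistics.pstdev(column) + 1e-6 for column in zip(*elite)]
    return mean
```

In a receding-horizon controller only the first element of the returned sequence would be applied before replanning; the per-time-step independent Gaussian used here carries no temporal correlation, which is one source of the non-smooth inputs the abstract mentions.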
A Gauss-Newton-Induced Structure-Exploiting Algorithm for Differentiable Optimal Control
Differentiable optimal control, particularly differentiable nonlinear model predictive control (NMPC), provides a powerful framework that enjoys the complementary benefits of machine learning and control theory. A key enabler of differentiable optimal control is the computation of derivatives of the optimal trajectory with respect to problem parameters, i.e., trajectory derivatives. Previous works compute trajectory derivatives by solving a differential Karush-Kuhn-Tucker (KKT) system, and achieve this efficiently by constructing an equivalent auxiliary system. However, we find that directly exploiting the matrix structures in the differential KKT system yields significant computation speed improvements. Motivated by this insight, we propose FastDOC, which applies a Gauss-Newton approximation of the Hessian and takes advantage of the resulting block-sparsity and positive semidefinite properties of the matrices involved. These structural properties enable us to accelerate the computationally expensive matrix factorization steps, resulting in a factor-of-two speedup in theoretical computational complexity. In a synthetic benchmark, FastDOC achieves up to a 180% time reduction compared to the baseline method. Finally, we validate the method on an imitation learning task for human-like autonomous driving, where the results demonstrate the effectiveness of the proposed FastDOC in practical applications.
comment: Accepted for publication at 2026 IFAC World Congress, Busan
Explicit Bounds on the Hausdorff Distance for Truncated mRPI Sets via Norm-Dependent Contraction Rates
We derive a computable closed-form upper bound on the Hausdorff distance between a truncated minimal robust positively invariant (mRPI) set and its infinite-horizon limit. The bound depends only on a disturbance-set size measure and an induced-norm contraction factor of the system matrix, and it yields an explicit, fully analytic horizon-selection rule that guarantees a prescribed approximation tolerance without iterative set computations. The choice of vector norm enters as a design lever: norm shaping -- through diagonal or Lyapunov-based weighting -- tightens both the contraction factor and the resulting certificate, with direct consequences for robust invariant-set approximation and tube-based model predictive control (MPC) constraint tightening. Numerical examples illustrate the accuracy, scalability, and practical impact of the proposed bound.
comment: 6 pages, 5 figures. Accepted at the 2026 IEEE Conference on Control Technology and Applications (CCTA), Vancouver, BC, Canada, August 12-14, 2026
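A horizon-selection rule of the kind described above can be illustrated with the standard geometric-series argument. The sketch assumes the truncated set is the Minkowski sum of $A^i W$ for $i = 0, \dots, N-1$, that `lam` is an induced norm of $A$ with `lam < 1`, and that `w_max` is the largest norm of any disturbance in $W$; the tail is then bounded by `lam**N / (1 - lam) * w_max`. The paper's exact bound and constants may differ, and the function names are hypothetical.

```python
import math


def hausdorff_bound(lam, w_max, horizon):
    """Geometric-tail bound lam**N / (1 - lam) * w_max on the truncation error."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("contraction factor must satisfy 0 <= lam < 1")
    return lam ** horizon / (1.0 - lam) * w_max


def min_horizon(lam, w_max, tol):
    """Smallest horizon N whose bound does not exceed the tolerance tol > 0."""
    if tol <= 0.0:
        raise ValueError("tolerance must be positive")
    if hausdorff_bound(lam, w_max, 0) <= tol:
        return 0
    if lam == 0.0:
        return 1  # the tail vanishes after one step
    # Closed form: lam**N <= tol * (1 - lam) / w_max, solved for N.
    n = max(0, math.ceil(math.log(tol * (1.0 - lam) / w_max) / math.log(lam)))
    # Guard against floating-point rounding in the logarithms.
    while hausdorff_bound(lam, w_max, n) > tol:
        n += 1
    return n
```

For example, with `lam = 0.5`, `w_max = 1.0`, and `tol = 0.01`, the bound is `2 * 0.5**N`, so the rule returns `N = 8`. A weighted norm that yields a smaller `lam` shortens the required horizon, although the weighting also changes how `w_max` and the tolerance are measured.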
Learning Approximate Nash Equilibria in Cooperative Multi-Agent Reinforcement Learning via Mean-Field Subsampling
Many large-scale platforms and networked control systems have a centralized decision maker interacting with a massive population of agents under strict observability constraints. Motivated by such applications, we study a cooperative Markov game with a global agent and $n$ homogeneous local agents in a communication-constrained regime, where the global agent only observes a subset of $k$ local agent states per time step. We propose an alternating learning framework $(\texttt{ALTERNATING-MARL})$, where the global agent performs subsampled mean-field $Q$-learning against a fixed local policy, and local agents update by optimizing in an induced MDP. We prove that these approximate best-response dynamics converge to an $\widetilde{O}(1/\sqrt{k})$-approximate Nash Equilibrium, while separating the sample complexities between the joint state and action spaces. Finally, we validate our results in numerical simulations for multi-robot control.
comment: 57 pages, 10 figures, 4 tables
Thinking fast and slow -- a cognitive inspired framework for decision intelligence for power systems
Decision-making in power systems spans multiple timescales -- from milliseconds to prevent surges, to seconds to balance frequency and protect grid assets, to minutes for real-time energy balancing, to day-ahead, seasonal, and long-term planning. Growing uncertainty and complexity, driven by intermittent renewables and distributed energy resources (DER), demand fresh approaches to power system intelligence and architecture. Daniel Kahneman describes the interplay of two systems of human decision-making: System 1 that is fast, intuitive, experience based, reactive, and System 2 that is slow, deliberate, analytical. Similarly, octopus intelligence illustrates a model for distributed yet coordinated decision-making between central and edge intelligence. Future power systems must embed coordinated intelligence that operates across diverse timescales and with placement at both edge and centralized levels. This paper maps decision-intelligence in power systems against System 1 and 2 and edge-central architecture paradigms based on the trade-offs inherent in decision making such as speed/latency, energy cost/compute, accuracy, and robustness. The framework inspires an agentic intelligence architecture -- laying the foundation for trustworthy, autonomous power systems of the future.
comment: 5 pages, This work has been submitted to IEEE for possible publication
Saddle Point Evasion via Curvature-Regularized Gradient Dynamics
Nonconvex optimization underlies many modern machine learning and control tasks, where saddle points pose the dominant obstacle to reliable convergence in high-dimensional settings. Escaping these saddle points deterministically using continuous-time optimization remains an open challenge: gradient descent is blind to curvature, stochastic perturbation methods lack deterministic guarantees, and Newton-type approaches suffer from Hessian singularity. Adopting the perspective of viewing optimization algorithms as dynamical systems, we present Curvature-Regularized Gradient Dynamics (CRGD), which augments the objective with a smooth penalty on the negative Hessian eigenvalues, yielding an augmented cost that serves as an optimization Lyapunov function with user-selectable convergence rates to second-order stationary points. Numerical experiments confirm that CRGD converges to second-order stationary points, even in regimes where gradient descent fails.
comment: This work has been submitted to the IEEE for possible publication. 6 pages, 3 figures
Computational Complexity Analysis of Interval Methods in Solving Uncertain Nonlinear Systems
This paper analyzes the computational complexity of validated interval methods for uncertain nonlinear systems and steady-state enclosure. Interval analysis produces guaranteed enclosures that account for uncertainty and round-off, but its adoption is often limited by computational cost in high dimensions. We develop an algorithm-level worst-case framework that makes explicit the dependence on the problem dimension $n$, the initial search region size $\mathrm{Vol}(X_0)$, the target tolerance $\varepsilon$, and the costs of validated primitives (inclusion-function evaluation, Jacobian evaluation, and interval linear algebra). Within this framework, we derive worst-case time and space bounds for interval bisection, subdivision$+$filter, interval constraint propagation, interval Newton, and interval Krawczyk, and identify dominant cost drivers. We also show that the computation of the determinant and inverse of interval matrices via naive Laplace expansion exhibits factorial growth with increasing matrix dimension, motivating specialized interval linear algebra. We complement the worst-case bounds with computational results on two application-motivated biochemical steady-state models (a Hill-type regulatory network and an enzyme-saturation-based winner-take-all circuit) in dimensions $n\in\{2,5,10\}$, including instances that process millions of boxes. The resulting analysis and experiments support the practical design of validated solvers for uncertainty-aware steady-state screening tasks such as robust operating-point certification and multistability assessment.
comment: 24 pages, 1 figure
Rollbot: a Spherical Robot Driven by a Single Actuator ICRA 2026
Spherical robots typically require at least two actuators to achieve controlled 2D planar motion. Here we present Rollbot, the first spherical robot capable of controllably maneuvering on a 2D plane with a single actuator, challenging this assumption. Rollbot rolls on the ground in a circular pattern and controls its motion by changing the trajectory's curvature by accelerating and decelerating its single motor and the attached mass according to our derived quasi-stable state dynamics and control laws. We present the theoretical analysis, design, and control of Rollbot, and demonstrate its ability to move in a controllable circular pattern and follow waypoints, validating the efficacy of the proposed theoretical framework.
comment: Accepted by ICRA 2026
Robotics
Above and Below: Heterogeneous Multi-robot SLAM Across Surface and Underwater Domains
Multi-robot simultaneous localization and mapping (SLAM) is a fundamental task in multi-robot operations. Robots must have a common understanding of their location and that of their team members to complete coordinated actions. However, multi-robot SLAM between Uncrewed Surface Vessels (USVs) and Autonomous Underwater Vehicles (AUVs) has primarily been achieved through acoustic pinging between robots to retrieve range measurements; this measurement technique requires that robots be in similar locations simultaneously and have an uninterrupted path for signal propagation, and it may necessitate synchronized clocks. This is especially challenging in complex, cluttered maritime environments, where structures may impede signals. However, these same structures may be observable above and below the water's surface, presenting an opportunity for inter-robot SLAM loop closure between USV and AUV data streams. This work builds upon recent research on inter-robot SLAM loop closure between USV and AUV data, extending it to propose a centralized multi-robot SLAM system. Each robot performs its own state estimation, and we detect loop closures between each AUV and the USV data. These inter-robot loop closures are used to merge each robot's state estimate into a centralized graph, yielding estimates for the whole time history of the USV and all AUVs in the system. Validation is performed using real-world perceptual data in three different environments. Results show improved errors for AUVs in the multi-robot SLAM system compared to single-robot SLAM over the same trajectories. To our knowledge, this is the first instance of a multi-robot SLAM system with AUVs and USVs built on loop closures rather than acoustic distance measurements.
Efficient Multi-Robot Motion Planning with Precomputed Translation-Invariant Edge Bundles
Solving multi-robot motion planning (MRMP) requires generating collision-free kinodynamically feasible trajectories for multiple interacting robots. We introduce Kinodynamic Translation-Invariant Edge Bundles or KiTE-Extend, a planner-agnostic action selection mechanism for sampling-based kinodynamic motion planning. KiTE-Extend uses a library of trajectory segments computed offline to guide action selection during online planning, improving the ability of existing planners to identify feasible motion segments without altering state propagation, collision checking, or cost evaluation, and without changing their theoretical guarantees. While KiTE-Extend can modestly improve single-agent planners, its benefits are most clear in the multi-agent setting, where it is able to explore more effectively and significantly improve planning through the dense spatiotemporal constraints introduced by robot-robot interaction. Through experiments on multiple kinodynamic systems and environments, we show that KiTE-Extend reduces planning time and improves scalability across the three most common MRMP paradigms: centralized, prioritized, and conflict-based.
Zero-Shot Sim-to-Real Robot Learning: A Dexterous Manipulation Study on Reactive Catching
Dexterous manipulation is physics-intensive and highly sensitive to modeling errors and perception noise, making sim-to-real transfer prohibitively challenging. Domain randomization (DR) is commonly used to improve the robustness of learned policies for such tasks, but conventional DR randomizes one instance per episode, offering very limited exposure to the variability of real-world dynamics. To this end, we propose Domain-Randomized Instance Set (DRIS), which represents and propagates a set of randomized instances simultaneously, providing richer approximation of uncertain dynamics and enabling policies to learn actions that account for multiple possible outcomes. Supported by theoretical analysis, we show that DRIS yields more robust policies and alleviates the need for real-world fine-tuning, even with a modest number of instances (e.g., 10). We demonstrate this on a challenging reactive catching task. Unlike traditional catching setups that use end-effectors designed to mechanically stabilize the object (e.g., curved or enclosing surfaces), our system uses a flat plate that offers no passive stabilization, making the task highly sensitive to noise and requiring rapid reactive motions. The learned policies exhibit strong robustness to uncertainties and achieve reliable zero-shot sim-to-real transfer.
Safe Exploration for Nonlinear Processes Using Online Gaussian Process Learning
This paper proposes a safe data-driven control framework for nonlinear systems with partially known dynamics. The method ensures stability and constraint satisfaction during online learning, assuming only a stabilizable linear approximation of the process is available. Unmodeled nonlinear dynamics are captured by a Gaussian process residual learned in real time. Safety is enforced through a probabilistic control-invariant set derived from Lyapunov theory, guaranteeing high-probability stability. A convex quadratic program computes control inputs that maximize information gain while respecting probabilistic safety constraints. The framework provides finite-sample safety guarantees and allows adaptive expansion of the invariant set as uncertainty decreases. Numerical results validate the approach, demonstrating safe and informative exploration under model uncertainty: the safe set expands by about 30% while the Gaussian process root-mean-square error drops from 1.11 to 0.03.
comment: Accepted at the 23rd IFAC World Congress
MVB-Grasp: Minimum-Volume-Box Filtering of Diffusion-based Grasps for Frontal Manipulation IJCNN 2026
State-of-the-art 6-DoF grasp generators excel on tabletop benchmarks with overhead cameras but struggle in frontal grasping scenarios on low-cost manipulators with constrained workspaces, where kinematic limits and approach-direction constraints cause high failure rates. We address this challenge for the Unitree Z1 arm by proposing MVB-Grasp, a novel grasping stack that injects a Minimum Volume Bounding Box (MVBB) geometric prior into diffusion-based grasp generation to dramatically improve success rates in frontal, workspace-constrained settings. Our key scientific contributions are threefold: (i) an MVBB-based geometric filter that exploits oriented bounding-box face normals to reject grasps approaching through the table or misaligned with accessible object faces in O(N) time; (ii) a combined re-scoring function that blends learned discriminator scores with face-alignment geometry (α=0.85), specifically calibrated for the Z1's frontal workspace and kinematic constraints; and (iii) a systematic MuJoCo evaluation protocol measuring grasp success across object types, distances, lateral positions, and pitch orientations to validate embodiment-specific performance. We implement MVB-Grasp on a Unitree Z1 arm with an Intel RealSense D405 camera, integrating YOLOv8 object detection, GraspGen for candidate generation, Principal Component Analysis (PCA)-based MVBB fitting, and inverse-kinematics trajectory planning. Experiments across 81 MuJoCo episodes (cylinder, asymmetric box, waterbottle) demonstrate that MVB-Grasp achieves 59.3% success versus 24.7% for vanilla GraspGen, a 2.4x improvement, by filtering geometrically infeasible candidates and prioritizing face-aligned grasps suited to the Z1's frontal approach constraints. Real-world trials confirm that the MVBB prior substantially improves grasp reliability on constrained, low-cost manipulators without requiring model retraining.
comment: 8 pages, 12 figures, accepted to IJCNN 2026
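One way to read the blended re-scoring in contribution (ii) is as a convex combination of the learned score and a geometric alignment term. The sketch below is an assumption, not the paper's formula: it supposes that α weights the discriminator score, that alignment is measured by how directly a unit approach vector points into an accessible box face (against that face's outward unit normal), and that all names are hypothetical.

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def face_alignment(approach: Sequence[float],
                   face_normals: Sequence[Sequence[float]]) -> float:
    """Best alignment in [0, 1] between a unit approach direction and any face.

    A grasp approaching head-on into a face moves opposite to that face's
    outward unit normal, so -dot(approach, normal) equals 1 in the ideal case.
    """
    return max((max(0.0, -dot(approach, n)) for n in face_normals), default=0.0)


def rescore(disc_score: float,
            approach: Sequence[float],
            face_normals: Sequence[Sequence[float]],
            alpha: float = 0.85) -> float:
    """Assumed blend: alpha * learned score + (1 - alpha) * face alignment."""
    return alpha * disc_score + (1.0 - alpha) * face_alignment(approach, face_normals)
```

Under these assumptions, a candidate with discriminator score 0.5 that approaches a front face head-on scores about 0.575, while the same candidate approaching upward from beneath the object gets no alignment bonus and scores 0.425. Each candidate is scored against a fixed set of box faces, so a pass over N candidates stays linear in N.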
Towards Generative Predictive Display for Vision-Based Teleoperation: A Zero-Shot Benchmark of Off-the-Shelf Video Models
Teleoperation systems are fundamentally limited by communication latency, which degrades situational awareness and control performance. Predictive display aims to mitigate this limitation by presenting an estimate of the current visual state rather than delayed observations. While recent advances in generative video models enable high-quality video synthesis, their suitability for latency-sensitive predictive display remains unclear. This paper presents a zero-shot benchmark of off-the-shelf generative video models for short-horizon predictive display, without task-specific fine-tuning. We formulate the problem as rollout-based future frame prediction and develop a unified benchmarking pipeline using simulated driving data from the CARLA simulator. Five publicly released video models spanning transformer-based and diffusion-based families are evaluated across two resolutions and two conditioning regimes (multi-frame and single-frame). Performance is assessed using prediction accuracy (mean absolute difference), per-rollout latency, peak GPU memory usage, and temporal error evolution across the prediction horizon. On this zero-shot benchmark, no tested model simultaneously achieves low rollout error, non-divergent per-step error behavior, and real-time inference at the source frame rate. Increasing model scale or resolution yields limited and, in some cases, inverted improvements. These findings highlight a gap between general-purpose generative video synthesis and the requirements of predictive display in teleoperation, suggesting that practical deployment will require either explicit short-horizon temporal supervision, in-domain adaptation, or aggressive inference optimization rather than direct application of off-the-shelf models. Code, configurations, and qualitative results are released on the project page: https://bimilab.github.io/paper-GenPD
ASACK : Adaptive Safe Active Continual Koopman Learning for Uncertain Systems with Contractive Guarantees
Koopman operator theory provides a powerful framework for representing nonlinear dynamics through a linear operator acting on lifted observables, enabling the use of linear control techniques for nonlinear systems. However, Koopman models are typically learned from data and often degrade in performance under model uncertainty and distributional shifts between training and deployment. Although several works have explored online adaptation to address this issue, many rely on neural network-based updates that introduce significant computational overhead and lack formal safety guarantees, limiting their suitability for real-time and safety-critical robotic applications. In this work, we propose a unified framework for continual adaptive Koopman learning that enables safe and efficient online refinement of learned models during task execution. An autoencoder-based Koopman model is first learned offline and subsequently refined online through a contractive adaptation law, which provides theoretical convergence guarantees under distributional shifts and model uncertainty. To improve data efficiency and accelerate model refinement, the adaptation mechanism is integrated with an active learning strategy that drives the system to collect informative data while accomplishing task objectives. The resulting control problem is formulated as a nonconvex optimization problem incorporating both active learning objectives and safety constraints. We further derive theoretical bounds on model approximation error and show how these bounds can be incorporated within a robust Model Predictive Control (MPC) framework to provide formal safety guarantees. The proposed approach unifies learning, excitation, and safety within a single control framework without sacrificing real-time feasibility. Extensive simulation and experimental studies demonstrate superior performance compared to state-of-the-art baselines.
ORICF -- Open Robotics Inference and Control Framework ICRA26
Recent advances in artificial intelligence (AI) have enabled effective perception and language models for robots, but their deployment remains computationally expensive, increasing latency and energy use. This work presents the Open Robotics Inference and Control Framework (ORICF), a modular, declarative, and model-agnostic platform for composing multimodal robotic inference pipelines. ORICF integrates input/output (I/O) adapters, pluggable inference back ends, and post-processing logic, while lightweight YAML specifications allow models, hardware targets, and data channels to be changed without code modification. The framework also supports edge offloading, i.e., executing inference on nearby external computers instead of onboard the robot. ORICF is evaluated on a mobile robot that answers spoken queries about people detected in its camera stream by combining automatic speech recognition (ASR), a large language model (LLM), and a convolutional neural network (CNN) detector through Robot Operating System 2 (ROS2). Compared with onboard execution, ORICF-based edge deployment reduces robot-side compute utilization by up to 83.16% and estimated energy consumption by 65.8%, while preserving modularity and reproducibility.
comment: Accepted in ICRA26 Workshop: 8th International Workshop on Robotics Software Engineering (RoSE 26)
Minimizing Worst-Case Weighted Latency for Multi-Robot Persistent Monitoring: Theory and RL-Based Solutions
We study multi-robot persistent monitoring on weighted graphs, where node weights encode monitoring priorities and edge weights encode travel distances. The goal is to design joint robot trajectories that minimize the worst-case weighted latency across all nodes over an infinite time horizon. The widely adopted worst-case latency objective evaluates team performance over the entire time horizon and therefore may fail to distinguish strategies with poor transient behavior but strong asymptotic performance. To address this limitation, we propose a family of tail-performance objectives that generalize the standard objective and study the resulting functional optimization problems. We establish several key theoretical properties, including the existence of optimal strategies, relationships among the proposed objectives and their corresponding optimization problems, approximation by periodic solutions to arbitrary accuracy, and reductions to event-driven decision models with discretized waiting times. Building on these results, we construct an equivalent event-driven Markov decision process (MDP), called the Tail Worst-case Latency-Optimizing Markov Decision Process (TWLO-MDP), which reformulates the tail-performance objective as a standard average-reward criterion. We then develop reinforcement-learning-based solution methods for the TWLO-MDP and introduce the multi-robot monitoring benchmark (M2Bench), a unified platform that supports the evaluation and comparison of heuristic and learning-based monitoring algorithms. Experiments on synthetic and realistic monitoring scenarios show that our methods effectively reduce the worst-case weighted latency and outperform representative baselines.
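To make the standard objective concrete, the following sketch evaluates the worst-case weighted latency of a single robot repeating a fixed cyclic tour. It is a simplified illustration under stated assumptions: one robot, no waiting at nodes, every node of interest appearing in the tour, and a hypothetical function name and data layout. It does not implement the tail-performance objectives or the TWLO-MDP.

```python
def worst_case_weighted_latency(tour, travel_time, weight):
    """Max over nodes of weight * longest gap between consecutive visits.

    tour:        nodes visited in order, repeated cyclically forever
    travel_time: dict mapping (a, b) to the travel time from a to b
    weight:      dict mapping node to its monitoring priority
    """
    # Arrival time of each tour position within one period.
    times = [0.0]
    for a, b in zip(tour, tour[1:]):
        times.append(times[-1] + travel_time[(a, b)])
    period = times[-1] + travel_time[(tour[-1], tour[0])]

    # Collect the visit times of every node.
    visits = {}
    for node, t in zip(tour, times):
        visits.setdefault(node, []).append(t)

    worst = 0.0
    for node, ts in visits.items():
        gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
        gaps.append(period - ts[-1] + ts[0])  # gap that wraps into the next period
        worst = max(worst, weight[node] * max(gaps))
    return worst
```

For the tour A, B, A, C with travel times A-B = 1 and A-C = 2 in both directions, the period is 6. Node A is visited twice per period with a longest gap of 4, and B and C each have a gap of 6. With weights A = 3, B = 1, C = 1 the objective is 12, driven by the high-priority node even though it is visited most often.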
SABER: A Scalable Action-Based Embodied Dataset for Real-World VLA Adaptation
Robotic deployment in real-world environments depends on rich, domain-specific action data as much as on strong model architecture. General-purpose robot foundation models show modest performance in complex unseen tasks such as manipulation in a retail domain when applied out of the box. The root cause is a data gap: retail environments are structurally absent from general robot pretraining distributions, and the path to filling that gap through teleoperation is prohibitively expensive, logistically constrained, and difficult to scale. We introduce SABER, a high-fidelity retail robotics action dataset built from over 100 hours of natural in-store capture across multiple real grocery environments. Egocentric footage from head-mounted cameras records fine-grained hand activity at the point of interaction, while exocentric 360-degree scene footage from DreamVu's ALIA camera simultaneously observes all actors and activities across the entire space. This combination yields a uniquely complete picture of human retail behavior: dexterous hand activity, whole-body motion, and scene dynamics, all captured without staging, scripting, or teleoperation overhead. The SABER corpus contains 44.8K training samples across three action representation streams: 25K latent action sequences via LAPA-style encoding, 18.6K dexterous hand-pose trajectories retargeted to robot joint space, and 1.2K whole-body synchronized motion sequences retargeted to a humanoid embodiment. When applied to GR00T N1.6 via a shared-backbone multi-task post-training recipe, SABER yields a mean success rate of 29.3% across ten retail manipulation tasks -- more than 2.19x over fine-tuning baselines (13.4%). SABER demonstrates that the path to capable retail robots runs through better data, which can be collected today, at scale, without a robot in the loop. The dataset and code are available at https://dreamvu.ai/saber
Neuromorphic Reinforcement Learning for Quadruped Locomotion Control on Uneven Terrain
Reinforcement learning (RL) has enabled robust quadruped locomotion over complex terrain, but most learned controllers are trained offline with backpropagation in massively parallel simulation and deployed as fixed policies, limiting adaptation to terrain variation, payload changes, actuator wear, and other real-world conditions under onboard power constraints. Local learning provides a potential path toward energy-aware on-robot adaptation by replacing global backpropagation graphs with updates driven by local neural states, making the learning rule more compatible with neuromorphic and in-memory computing substrates. This work proposes an equilibrium-propagation (EP)-based proximal policy optimization (PPO) framework for uneven-terrain quadruped locomotion. The controller combines a bio-inspired central pattern generator (CPG) policy with a residual postural adjustment policy, while replacing conventional backpropagation-trained policy and value networks with EP-enabled local learning. To train stochastic continuous-control policies with EP, we derive an EP-compatible PPO output-nudging signal and introduce a two-sided ratio clipping mechanism that stabilizes policy updates during relaxation. Experiments on a 12-DoF A1 quadruped show that the proposed controller achieves stable policy convergence in a two-stage uneven terrain locomotion task. Its locomotion performance is comparable to a backpropagation-trained PPO baseline in success rate, velocity tracking, actuator power, and body stability, while improving GPU memory efficiency by 4.3x compared with backpropagation through time (BPTT). These results suggest that local equilibrium-based learning can support high-dimensional embodied locomotion and provide an algorithmic foundation for low-power on-robot adaptation and fine-tuning.
DeformMaster: An Interactive Physics-Neural World Model for Deformable Objects from Videos
World models for deformable objects should recover not only geometry and appearance, but also underlying physical dynamics, interaction grounding, and material behavior. Learning such a model from real videos is challenging because deformable linear, planar, and volumetric objects evolve under high-dimensional deformation, noisy interactions, and complex material response. The model must therefore infer a physical state from visual observations, roll it forward under new interactions, and render the resulting dynamics with high visual fidelity. We present DeformMaster, a video-derived interactive physics--neural world model that turns real interaction videos into an online interactive model of deformable objects within a unified dynamics-and-appearance framework. DeformMaster preserves structured physical rollout while using a neural residual to compensate for unmodeled effects, grounds sparse hand motion as a distributed compliant actuator for hand--continuum interaction, represents material response with spatially varying constitutive experts, and drives high-fidelity 4D appearance from the predicted physical evolution. Experiments on real-world deformable-object sequences demonstrate DeformMaster's ability to roll out future dynamics and render dynamic appearance, outperforming state-of-the-art baselines while supporting novel action rollout, material-parameter variation, and dynamic novel-view synthesis.
PhysHanDI: Physics-Based Reconstruction of Hand-Deformable Object Interactions ICML 2026
While existing methods for reconstructing hand-object interactions have made impressive progress, they either focus on rigid or part-wise rigid objects, limiting their ability to model real-world objects (e.g., cloth, stuffed animals) that exhibit highly non-rigid deformations, or model deformable objects without full 3D hand reconstruction. To bridge this gap, we present PhysHanDI (Physics-based Reconstruction of Hand and Deformable Object Interactions), a framework that enables full 3D reconstruction of both interacting hands and non-rigid objects. Our key idea is to physically simulate object deformations driven by forces induced from densely reconstructed 3D hand motions, ensuring that the reconstructed object dynamics are both physically plausible and coherent with the interacting hand movements. Furthermore, we demonstrate that such simulation of object deformations can, in turn, refine and improve hand reconstruction via inverse physics. In experiments, PhysHanDI outperforms the state-of-the-art baseline across reconstruction and future prediction.
comment: Accepted to ICML 2026
Drift is a Sampling Error: SNR-Aware Power Distributions for Long-Horizon Robotic Planning ICML 2026
Despite rapid progress in Vision-Language-Action (VLA) models for robotic control, instruction drift remains a persistent failure mode in long-horizon tasks. This paper reconceptualizes this phenomenon, positing that instruction drift is fundamentally a systematic sampling error: local greedy sampling is prone to collapsing into "Negative Pivotal Windows"--irreversible local optima with high local probability that sever global success pathways. To address this, we propose Context-Aware Power Sampling (CAPS), a training-free inference-time computation framework. CAPS leverages power distributions to sharpen global trajectory probabilities, enabling lookahead search over the model's conditional generative trajectory distribution. Furthermore, we introduce a metacognitive control mechanism based on Signal-to-Noise Ratio (SNR). This mechanism triggers adaptive MCMC search solely when drift risk is detected, enabling a dynamic transition from "intuitive fast thinking" to "rational slow search." Experiments on RoboTwin, Simpler-WindowX, and Libero-long benchmarks show that CAPS achieves substantial improvements over strong baselines, including OpenVLA and TACO, without parameter updates. These results support the effectiveness of adaptive inference-time computation for improving long-horizon robustness in embodied control.
comment: Accepted at ICML 2026
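A minimal sketch of the power-distribution idea mentioned in the abstract above: raising candidate probabilities to a power greater than one and renormalising concentrates mass on the most likely candidates. The function name `power_distribution`, the exponent `alpha`, and the toy probabilities are illustrative assumptions; the SNR trigger and MCMC search used by CAPS are not reproduced here.

```python
import numpy as np


def power_distribution(log_probs, alpha):
    """Return p_i^alpha / sum_j p_j^alpha for candidates with log-probabilities log_probs.

    alpha is a hypothetical sharpening exponent: alpha = 1 leaves the
    distribution unchanged, alpha > 1 sharpens it.
    """
    scaled = alpha * np.asarray(log_probs, dtype=float)
    scaled = scaled - scaled.max()  # subtract the max for numerical stability
    weights = np.exp(scaled)
    return weights / weights.sum()


# Toy example: three candidate trajectories with made-up probabilities.
base = np.array([0.5, 0.3, 0.2])
sharpened = power_distribution(np.log(base), alpha=2.0)
print(sharpened)  # the most likely candidate gains probability mass
```

With `alpha=1.0` the function returns the original distribution, which gives a quick sanity check on the normalisation.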
QueST: Persistent Queries as Semantic Monitors for Drift Suppression in Long-Horizon Tracking
Tracking points in videos is typically formulated as frame-to-frame correspondence, where each point is matched locally to the next frame. While this works over short horizons, errors accumulate under articulation, occlusion, and viewpoint change, leading to silent semantic drift that existing trackers cannot detect or correct. In this work, we revisit long-horizon tracking from a monitoring perspective and introduce QueST, a monitoring-by-design framework that treats interaction-relevant entities as persistent semantic queries rather than transient point tracks. Instead of local propagation, each query attends globally over spatio-temporal video features at every time-step, providing a stable semantic anchor across time. We further constrain query trajectories with lightweight 3D physical grounding, using geometric plausibility to suppress unbounded drift under occlusion. We evaluate QueST on long-horizon articulated sequences from PartNet-Mobility in SAPIEN and compare against RAFT-3D, CoTracker, and TAP-Net. QueST substantially reduces terminal drift, achieving a 67.7% Absolute Point Error (APE) improvement over TAP-Net while better preserving identity over extended horizons. Our results show that embedding semantic monitoring directly into perception enables more reliable long-horizon tracking under distribution shift.
LASSA Architecture-Based Autonomous Fault-Tolerant Control of Unmanned Underwater Vehicles
Unmanned underwater vehicles (UUVs) operate persistently in communication-constrained environments, thus requiring high-level autonomous fault-tolerant control under faulty operating conditions. Existing approaches rely heavily on predefined hard-coded rules and struggle to achieve effective fault-tolerant control against unforeseen faults. Although large language models (LLMs) possess powerful cognitive and reasoning capabilities, their inherent hallucinations remain a major obstacle to their application in UUV control systems. This paper proposes an intelligent control method based on the LASSA (LLM-based Agent with Solver, Sensor and Actuator) architecture. Within this architecture, an LLM identifies unknown faults and accomplishes task replanning via autonomous reasoning without hard-coded rules; the intelligent agent undertakes perception, scheduling and decision evaluation; the solver verifies physical boundary feasibility constraints prior to command transmission to the actuators. This architecture suppresses physically infeasible LLM hallucinations and ensures interpretable, verifiable decision-making. Moreover, it enables fast-slow dual closed-loop collaborative control, where the slow loop undertakes high-level dynamic decision-making and the fast loop guarantees high-frequency real-time control, simultaneously balancing decision intelligence and control timeliness. Lake experiments under normal and lower-rudder-fault conditions show that the framework detects trajectory tracking abnormalities, replans the route by adjusting the turning radius from 4 m to 12 m and reducing speed from 2 kn to 1 kn, passes all three solver constraints on the first invocation, and guides the UUV to complete the full mission; under normal conditions no false fault alarms are raised throughout the run.
High Precision Hydraulic Excavator Control for Heavy-Duty Grading
High-precision heavy-duty grading is a common step in earthworks, traditionally carried out manually by skilled operators. Removing a significant amount of material while achieving a high-precision surface requires substantial machine-specific experience. Different hydraulic architectures react differently to operator inputs and soil interaction forces, which makes generalizable controllers challenging. In this paper, we present an autonomous controller that achieves high-precision grading at expert-operator speed on Load Sensing and Negative Flow Control machines alike. We split our controller into two parts: (1) a hydraulic-aware low-level loop that is hydraulic architecture-specific and (2) a path-tracking layer that coordinates joint motions and responses. Through a calibration process, our technique is applicable to load-sensing and negative-flow-control machinery. To showcase its versatility, we benchmark our approach on two excavators with different hydraulics and compare it against a commercial state-of-the-art solution. Our technique (RMSE 1.8 cm) outperforms the commercial solution (RMSE 4.7 cm) in precision by a factor of 2.6 and improves machine usage by leveraging the maximum function pressure, as opposed to commercial solutions that stall prematurely.
comment: 12 pages, 19 figures, RSS 2026
Beyond Isolation: A Unified Benchmark for General-Purpose Navigation
The pursuit of general-purpose embodied agents is hindered by fragmented evaluation protocols that isolate navigation skills and fixate on specific robot morphologies, failing to reflect real-world scenarios where agents must orchestrate diverse behaviors across varying embodiments. To bridge this gap, we introduce OmniNavBench, a benchmark for cross-skill coordination and cross-embodiment generalization. OmniNavBench introduces three paradigm shifts: (1) Compositional Complexity. We propose composite instructions that interleave sub-tasks from 6 categories (PointNav, VLN, ObjectNav, SocialNav, Human Following and EQA), compelling agents to transition between exploration, interaction, and social compliance within a single episode. (2) Morphological Universality and Sensor Flexibility. We present a simulation platform that breaks the reliance on single-morphology evaluation, enabling generalization tests across humanoid, quadrupedal, and wheeled robots, with a modular sensor interface and 170 environments blending synthetic assets with real-world scans. (3) Demonstration Quality. Moving beyond shortest-path algorithms, we curate 1779 expert trajectories via human teleoperation, capturing behavioral nuances such as exploratory glances and anticipatory avoidance. Extensive evaluations demonstrate that current methods, despite their claimed unified design, struggle with the complex, interleaved nature of general-purpose navigation. This exposes a critical disparity between existing capabilities and real-world deployment demands, underscoring OmniNavBench as a testbed for the next generation of generalist navigators. Dataset, code, and leaderboard are available at http://omninavbench.cloud-ip.cc.
comment: Accepted at RSS 2026
MAG-VLAQ: Multi-modal Aerial-Ground Query Aggregation for Cross-View Place Recognition
Multi-modal cross-view place recognition remains a fundamental challenge in computer vision and robotics due to the severe viewpoint, modality, and spatial-structure discrepancies between ground observations and aerial references. To address this challenge, we present MAG-VLAQ, a foundation-model-enhanced query aggregation framework for multi-modal aerial-ground cross-view place recognition. Specifically, our approach leverages pre-trained foundation models to extract dense visual tokens from both ground and aerial images, as well as expressive geometric tokens from ground LiDAR observations. These heterogeneous tokens are then projected into a shared embedding space for cross-modal alignment and fusion. As our main contribution, we propose ODE-conditioned VLAQ, which tightly couples neural ordinary differential equations (ODE)-based RGB-LiDAR fusion with vectors of locally aggregated queries (VLAQ). In this design, the VLAQ query centers are dynamically adapted according to the fused multi-modal state. This mechanism allows the final global descriptor to preserve globally learned retrieval prototypes while remaining responsive to scene-specific visual and geometric evidence, significantly improving aerial-ground matching. Extensive experiments on KITTI360-AG and nuScenes-AG validate the effectiveness of our proposed MAG-VLAQ. Notably, on KITTI360-AG, our MAG-VLAQ nearly doubles the state-of-the-art performance, achieving 61.1 Recall@1 in the satellite setting, compared with 34.5 from the closest competing approach.
comment: 16 pages, 4 figures, 3 tables
RePO-VLA: Recovery-Driven Policy Optimization for Vision-Language-Action Models
Vision-Language-Action (VLA) models remain brittle in long-horizon, contact-rich manipulation because success-only imitation provides little supervision for execution drift, while failed rollouts are often discarded. We introduce RePO-VLA, a recovery-driven policy optimization framework that assigns distinct roles to success, recovery, and failure trajectories. RePO-VLA first applies Recovery-Aware Initialization (RAI), slicing recovery segments and resetting history so corrective actions depend on the current adverse state rather than the preceding failure. It then learns a Progress-Aware Semantic Value Function (PAS-VF), aligning spatiotemporal trajectory features with instructions and successful references. The resulting labels salvage useful failure prefixes via reliability decay, while low-value labels mark drift and terminal breakdowns, teaching differences among nominal, failed, and corrective actions. The data engine turns adverse states into planner-generated or human-collected corrective rollouts, teaching recovery to the success manifold. Value-Conditioned Refinement (VCR) trains the policy to prefer high-progress actions. At deployment, a fixed high value ($v=1.0$) biases actions toward the learned success manifold without online failure detectors or heuristic retries. We introduce FRBench, with standardized error injection and recovery-focused evaluation. Across simulated and real-world bimanual tasks, RePO-VLA improves robustness, raising adversarial success from 20% to 75% on average and up to 80% in scaled real-world trials.
NEXUS: Continual Learning of Symbolic Constraints for Safe and Robust Embodied Planning
While Large Language Models (LLMs) have catalyzed progress in embodied intelligence, a fundamental gap remains between their inherent probabilistic uncertainty and the strict determinism and verifiable safety required in the physical world. To mitigate this gap, this paper introduces NEXUS, a modular framework designed for continual learning in embodied agents. Different from prior works that treat symbolic artifacts merely as static interfaces, NEXUS leverages them for symbolic grounding and knowledge evolution. The framework explicitly decouples physical feasibility from safety specifications: capability of agents is improved through closed-loop execution feedback, while probabilistic risk assessments are grounded into deterministic hard constraints to establish a rigorous pre-action defense. Experiments on SafeAgentBench demonstrate that NEXUS achieves superior task success rates while effectively refusing unsafe instructions, exhibiting robust defense against adversarial attacks, and progressively improving planning efficiency through knowledge accumulation.
Safety-Critical LiDAR-Inertial Odometry with On-Manifold Deterministic Protection Level
In safety-critical scenarios, the protection level of the autonomous navigation system is crucial for enabling mobile robots to perform tasks safely. However, existing studies on probabilistic navigation systems for robots usually perform offline accuracy evaluations using limited datasets and assume that the results can be applied to unknown real-world environments. As a result, current autonomous mobile robots often lack protection levels for online safety assessment. To fill this gap, we propose a safety-critical LiDAR-inertial odometry (LIO) that provides deterministic protection levels based on on-manifold deterministic state estimation. By adopting the unknown-but-bounded assumption, we derive a neat closed-form relationship between point cloud noise and the uncertainty of the estimation from the iterative closest point algorithm. Using this relationship, we design an on-manifold ellipsoidal set-membership filter and implement it within the LIO system. Leveraging the properties of the set-membership filter, our system offers the feasible sets of the estimated locations as the deterministic protection levels, serving as safety references for the robots' downstream autonomous operations. The experimental results show that our system can provide effective deterministic online safety references for diverse robots in various environments.
Mismatch-Aware Adaptive Constraint Tightening for Bicycle-Model Trajectory Optimization
Trajectory optimization for autonomous vehicles usually relies on the kinematic bicycle model because of its computational simplicity. However, when the planned trajectory is executed under the true vehicle dynamics, which include lateral slip, tire stiffness and yaw-lateral coupling, safety constraints can be violated owing to the model mismatch. In this paper, we make three theoretical contributions. First, we derive a characteristic speed $v_c=\sqrt{C_\alpha L/M}$ which separates two different mismatch regimes: below $v_c$ the dynamic bicycle initially oversteers inward (safe); above $v_c$ it understeers outward (safety-critical). Second, we prove that the peak outward deviation $\varepsilon^*$ follows a $T^2$ horizon scaling whose coefficient transitions between a transient bound $\frac{1}{2}(v^2-v_c^2)\kappa$ and a steady-state bound. Third, we obtain a simulation-free analytical coefficient $a_2^{\mathrm{anal}}=\frac{1}{2}(1-v_c^2/v_{\max}^2)T^2$ that is computable from vehicle parameters and the planning horizon alone. Putting these together, we propose Mismatch-Aware Adaptive Constraint Tightening (MACT), $\varepsilon(v,\kappa)=a_2 v^2|\kappa|$, which replaces a fixed worst-case margin by a state-dependent one that is large at high speed/curvature but nearly zero on gentle paths. Eight numerical experiments confirm the scaling laws. MACT reaches 100% safety with 84% less wasted margin than a fixed-margin baseline on the 2-DOF vehicle, extends to a nonlinear leaning bicycle, and in a closed-loop direct-shooting MPC comparison it cuts the applied margin by 34% compared with tube MPC while keeping the same safety.
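The three formulas quoted in the abstract above can be evaluated directly. The sketch below computes the characteristic speed, the analytical coefficient, and the resulting state-dependent margin. The vehicle parameters are illustrative values chosen for round numbers, and clamping the coefficient at zero when the maximum speed is below the characteristic speed is an added assumption, not something the abstract states.

```python
import math


def characteristic_speed(c_alpha, wheelbase, mass):
    """v_c = sqrt(C_alpha * L / M), in m/s for SI inputs."""
    return math.sqrt(c_alpha * wheelbase / mass)


def analytical_coefficient(v_c, v_max, horizon):
    """a_2^anal = 0.5 * (1 - v_c^2 / v_max^2) * T^2, in s^2.

    Assumption: clamp at zero when v_max <= v_c (the regime the abstract
    describes as safe), so the margin never becomes negative.
    """
    return max(0.0, 0.5 * (1.0 - (v_c / v_max) ** 2) * horizon ** 2)


def mact_margin(a2, speed, curvature):
    """State-dependent tightening eps(v, kappa) = a_2 * v^2 * |kappa|, in metres."""
    return a2 * speed ** 2 * abs(curvature)


# Illustrative (made-up) parameters: C_alpha in N/rad, L in m, M in kg.
v_c = characteristic_speed(c_alpha=80000.0, wheelbase=2.7, mass=1500.0)
a2 = analytical_coefficient(v_c, v_max=24.0, horizon=1.0)

print(f"v_c = {v_c:.2f} m/s, a2 = {a2:.3f} s^2")
print(f"fast, tight corner: {mact_margin(a2, 20.0, 0.01):.3f} m")
print(f"slow, gentle path:  {mact_margin(a2, 5.0, 0.001):.4f} m")
```

The two printed margins illustrate the qualitative claim in the abstract: the tightening is substantial at high speed and curvature and close to zero on gentle paths.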
PECMAN: Perception-enabled Collaborative Multi-Agent Navigation in Unknown Environments
Most path planners assume fully known, static environments, assumptions that fail when robots navigate in dynamic and partially observable environments. SMART-3D addresses these issues by real-time replanning, where it morphs the underlying RRT* tree whenever new obstacles or structures are discovered in the environment. Instead of rebuilding the tree entirely from scratch, SMART-3D prunes invalid nodes and edges and subsequently repairs the disjoint subtrees at hot-nodes to find a new path, thus providing high computational efficiency for real-time adaptability. We extend SMART-3D to perception-enabled collaborative multi-agent navigation (PECMAN) in unknown environments. PECMAN is built upon distributed tree morphing and shared perception strategies, where each agent reacts to environmental changes and morphs its respective tree to replan its path, while simultaneously broadcasting newly discovered structures to other agents, thus enabling them to proactively replan even in areas that have not yet been explored by them. This approach reduces redundant reactions and unnecessary replannings of the agents due to improved situational awareness. The performance of PECMAN was evaluated by 28,000 multi-agent simulations on seven 2D scenarios with different case studies. The results show that PECMAN achieves up to 52% reduction in the team-completion time, while maintaining near 100% success rates. Finally, PECMAN was tested by real experiments on two autonomous robots in a building environment.
Learning Tactile-Aware Quadrupedal Loco-Manipulation Policies
Quadrupedal loco-manipulation is commonly built on visual perception and proprioception. Yet reliable contact-rich manipulation remains difficult: vision and proprioception alone cannot resolve uncertain, evolving interactions with the environment. Tactile sensing offers direct contact observability, but a scalable tactile-aware learning framework for quadrupedal loco-manipulation remains underexplored. In this paper, we present a tactile-aware loco-manipulation policy learning pipeline with a hierarchical structure. Our approach has two key components. First, we leverage real-world human demonstrations to train a tactile-conditioned visuotactile high-level policy. This policy predicts not only end-effector trajectories for manipulation, but also the evolving tactile interaction cues that characterize how contact should develop over time. Second, we perform large-scale reinforcement learning in simulation to learn a tactile-aware whole-body control policy that tracks diverse commanded trajectories and tactile interaction cues, and transfers zero-shot to the real world. Together, these components enable coordinated locomotion and manipulation under contact-rich scenarios. We evaluate the system on real-world contact-rich tasks, including in-hand reorientation with insertion, valve tightening, and delicate object manipulation. Compared to vision-only and visuotactile baselines, our method improves performance by 28.54% on average across these tasks.
Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs
Embodied LLMs endow robots with high-level task reasoning, but they cannot reflect on what went wrong or why, turning deployment into a sequence of independent trials where mistakes repeat rather than accumulate into experience. Drawing upon human reflective practitioners, we introduce Reflective Test-Time Planning, which integrates two modes of reflection: reflection-in-action, where the agent uses test-time scaling to generate and score multiple candidate actions using internal reflections before execution; and reflection-on-action, which uses test-time training to update both its internal reflection model and its action policy based on external reflections after execution. We also include retrospective reflection, allowing the agent to re-evaluate earlier decisions and perform model updates with hindsight for proper long-horizon credit assignment. Experiments on our newly-designed Long-Horizon Household benchmark and MuJoCo Cupboard Fitting benchmark show significant gains over baseline models, with zero-shot generalization to photorealistic HM3D environments and real-robot experiments on a Franka Panda arm. Ablations confirm that reflection-in-action and reflection-on-action are mutually dependent, and that retrospective reflection achieves better credit assignment than step-wise external feedback at lower computational overhead. Qualitative analyses further highlight behavioral correction through reflection.
Learning When to Jump for Off-road Navigation
Low speed does not always guarantee safety in off-road driving. For instance, crossing a ditch may be risky at a low speed due to the risk of getting stuck, yet safe at a higher speed with a controlled, accelerated jump. Achieving such behavior requires path planning that explicitly models complex motion dynamics, whereas existing methods often neglect this aspect and plan solely based on positions or a fixed velocity. To address this gap, we introduce Motion-aware Traversability (MAT) representation to explicitly model terrain cost conditioned on actual robot motion. Instead of assigning a single scalar score for traversability, MAT models each terrain region as a Gaussian function of velocity. During online planning, we decompose the terrain cost computation into two stages: (1) predict terrain-dependent Gaussian parameters from perception in a single forward pass, (2) efficiently update terrain costs for new velocities inferred from current dynamics by evaluating these functions without repeated inference. We develop a system that integrates MAT to enable agile off-road navigation and evaluate it in both simulated and real-world environments with various obstacles. Results show that MAT achieves real-time efficiency and enhances the performance of off-road navigation, reducing path detours by 75% while maintaining safety across challenging terrains.
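The two-stage cost computation described above can be sketched as follows: Gaussian parameters are obtained once per terrain region, and the cost is then re-evaluated cheaply for any new velocity. The specific parameterisation (a cost bump with `amplitude`, `mean`, and `sigma`, peaking at the riskiest speed) and the hard-coded parameter values standing in for a perception network are assumptions for illustration; the abstract does not specify the exact functional form.

```python
import numpy as np


def terrain_cost(velocity, amplitude, mean, sigma):
    """Velocity-conditioned cost: amplitude * exp(-0.5 * ((v - mean) / sigma)^2).

    All arguments broadcast, so per-region parameter arrays can be evaluated
    for a scalar velocity or for one velocity per region.
    """
    z = (np.asarray(velocity, dtype=float) - mean) / sigma
    return amplitude * np.exp(-0.5 * z ** 2)


# Stage 1 (stand-in): parameters that a perception model would predict once
# per terrain region. Values are made up: region 0 is a ditch that is risky
# around 1 m/s, region 1 is flat ground with negligible cost.
amplitude = np.array([1.0, 0.05])
mean = np.array([1.0, 0.0])
sigma = np.array([0.5, 1.0])

# Stage 2: re-evaluate costs for new candidate velocities without re-running
# perception.
for v in (1.0, 3.0):
    print(v, terrain_cost(v, amplitude, mean, sigma))
```

Under these toy parameters the ditch region is expensive at 1 m/s and nearly free at 3 m/s, mirroring the abstract's example that a faster crossing can be the safer one.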
Q-learning with Adjoint Matching
We propose Q-learning with Adjoint Matching (QAM), a novel TD-based reinforcement learning (RL) algorithm that tackles a long-standing challenge in continuous-action RL: efficient optimization of an expressive diffusion or flow-matching policy with respect to a parameterized Q-function. Effective optimization requires exploiting the first-order information of the critic, but it is challenging to do so for flow or diffusion policies because direct gradient-based optimization via backpropagation through their multi-step denoising process is numerically unstable. Existing methods work around this either by only using the value and discarding the gradient information, or by relying on approximations that sacrifice policy expressivity or bias the learned policy. QAM sidesteps both of these challenges by leveraging adjoint matching, a recently proposed technique in generative modeling, which transforms the critic's action gradient to form a step-wise objective function that is free from unstable backpropagation, while providing an unbiased, expressive policy at the optimum. Combined with temporal-difference backup for critic learning, QAM consistently outperforms prior approaches on hard, sparse reward tasks in both offline and offline-to-online RL.
comment: 32 pages, 8 figures, 7 tables
DexWrist: A Robotic Wrist for Constrained and Dynamic Manipulation
Development of dexterous manipulation hardware has primarily focused on hands and grippers. However, these end-effectors are often paired with bulky and highly stiff wrists that limit performance in human environments. More designs have adopted backdrivable actuation, but are still difficult to model and control due to coupled kinematics or high mechanical inertia from heavy links. We present DexWrist, a robotic wrist that advances manipulation in highly constrained environments and enables dynamic, contact-rich tasks. We achieve this by combining quasi-direct drive actuation with a decoupled parallel kinematic mechanism in a compact design. It delivers 3.75 +/- 0.05 Nm rated torque, 0.33 +/- 0.06 Nm backdrive torque, 10.15 +/- 1.34 Hz torque bandwidth, +/- 40 degrees ROM in both DOFs, and a one-to-one motor-to-DOF mapping in a 0.97 kg package. In practice, these properties increase workspace in cluttered environments and stabilize contact without the need for finely tuned admittance control. We evaluate DexWrist as a drop-in wrist upgrade in simulation and on two robot arms performing representative constrained and contact-rich tasks. In learned policy evaluations, DexWrist achieved 50-76% relative improvements in success rate, and reduced autonomous task completion times by 3-5x. More details about DexWrist can be found at https://dexwrist.csail.mit.edu.
comment: 9 pages, 8 figures. Submitted to RA-L 2026
Semantic-Aware UAV Command and Control for Efficient IoT Data Collection ICASSP
Unmanned Aerial Vehicles (UAVs) have emerged as a key enabling technology for data collection from Internet of Things (IoT) devices. However, effective data collection is challenged by resource constraints and the need for real-time decision-making. In this work, we propose a novel framework that integrates semantic communication with UAV command-and-control (C&C) to enable efficient image data collection from IoT devices. Each device uses Deep Joint Source-Channel Coding (DeepJSCC) to generate a compact semantic latent representation of its image to enable image reconstruction even under partial transmission. A base station (BS) controls the UAV's trajectory by transmitting acceleration commands. The objective is to maximize the average quality of reconstructed images by maintaining proximity to each device for a sufficient duration within a fixed time horizon. To address the challenging trade-off and account for delayed C&C signals, we model the problem as a Markov Decision Process and propose a Double Deep Q-Learning (DDQN)-based adaptive flight policy. Simulation results show that our approach outperforms baseline methods such as greedy and traveling salesman algorithms in both device coverage and semantic reconstruction quality.
comment: Accepted for publication at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). v2: added clarification on the DDQN implementation and TSP algorithm
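For readers unfamiliar with Double Deep Q-Learning, the sketch below shows the standard double-Q target: the online network selects the next action and the target network evaluates it. This is the generic textbook rule, not the paper's implementation; the array shapes, the function name `double_dqn_targets`, and the toy Q-values are assumptions for illustration.

```python
import numpy as np


def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Compute y = r + gamma * (1 - done) * Q_target(s', argmax_a Q_online(s', a)).

    rewards, dones: arrays of shape (batch,)
    q_online_next, q_target_next: arrays of shape (batch, num_actions)
    """
    # Action selection uses the online network ...
    best_actions = np.argmax(q_online_next, axis=1)
    # ... while action evaluation uses the target network.
    next_values = q_target_next[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * next_values


# Toy batch of two transitions; the second one is terminal.
rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])
q_online_next = np.array([[0.2, 0.9], [0.5, 0.1]])
q_target_next = np.array([[0.4, 0.3], [0.7, 0.6]])

print(double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.9))
```

Decoupling selection from evaluation is what distinguishes this target from the plain DQN target, which would take the maximum of `q_target_next` directly.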
3DRO: Lidar-level SE(3) Direct Radar Odometry Using a 2D Imaging Radar and a Gyroscope ICRA 2026
Recently, the robotics community has regained interest in radar-based perception and state estimation. A 2D imaging radar provides dense 360-degree information about the environment. Despite the radar antenna's cone of emission and reception, the collected data is generally assumed to be limited to the plane orthogonal to the radar's spinning axis. Accordingly, most methods based on 2D imaging radars only perform SE(2) state estimation. This paper presents 3DRO, an extension of the SE(2) Direct Radar Odometry (DRO) framework to perform state estimation in SE(3). While still assuming planarity of the data through DRO's 2D velocity estimates, it integrates 3D gyroscope measurements over SO(3) to estimate SE(3) ego motion. While simple, this approach provides lidar-level odometry accuracy as demonstrated using 643 km of data from the Boreas-RT dataset.
comment: Accepted for presentation at the ICRA 2026 Workshop on Radar in Robotics (poster: https://drive.google.com/file/d/1P_iBrGxPiZL644B-dHxbvdY-UJUzd4Kp/view )
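One way to picture how planar velocity estimates and a 3D gyroscope can combine into SE(3) motion is a simple dead-reckoning step: rotate the body-frame planar velocity into the world frame with the current orientation, then update the orientation with the SO(3) exponential of the gyroscope increment. The sketch below is a generic first-order integrator under that assumption, not the 3DRO estimator; the function names and the toy inputs are hypothetical.

```python
import numpy as np


def skew(v):
    """Skew-symmetric matrix such that skew(v) @ x == np.cross(v, x)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def so3_exp(phi):
    """Rodrigues' formula: rotation matrix for the rotation vector phi."""
    angle = np.linalg.norm(phi)
    K = skew(phi)
    if angle < 1e-9:
        return np.eye(3) + K  # first-order approximation near zero
    return (np.eye(3)
            + (np.sin(angle) / angle) * K
            + ((1.0 - np.cos(angle)) / angle ** 2) * (K @ K))


def integrate_step(R, p, omega_body, v_planar_body, dt):
    """Advance orientation R and position p by one time step.

    v_planar_body is a 2D body-frame velocity (the sensor plane is assumed to
    be the body x-y plane); omega_body is the 3D gyroscope rate in rad/s.
    """
    v_body = np.array([v_planar_body[0], v_planar_body[1], 0.0])
    p_next = p + (R @ v_body) * dt
    R_next = R @ so3_exp(np.asarray(omega_body, dtype=float) * dt)
    return R_next, p_next


# Toy example: a vehicle pitched nose-up by 0.1 rad driving forward at 1 m/s.
R0 = so3_exp(np.array([0.0, -0.1, 0.0]))
R1, p1 = integrate_step(R0, np.zeros(3), omega_body=[0.0, 0.0, 0.0],
                        v_planar_body=[1.0, 0.0], dt=1.0)
print(p1)  # the z component is positive: planar velocity yields 3D motion
```

The example shows why orientation from the gyroscope is enough to lift an in-plane velocity into three dimensions, which is the intuition behind the abstract's claim.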
Language Conditioned Multi-Finger Dexterous Manipulation Enabled by Physical Compliance and Switching of Controllers
Human dexterity arises from combining high-level task reasoning with finger-level dexterity control and physical compliance at the muscle and skin layers. In robotics, large Vision-Language-Action (VLA) models demonstrate text-conditioned high-level planning across diverse manipulation tasks, typically using pincher grippers. Smaller imitation-learning policies, conversely, show success in dexterous tasks using higher degree-of-freedom (DoF) grippers, but only for limited-scope tasks. However, few approaches combine high-level reasoning with dexterous, robust low-level control, which requires both intelligent control and compliant robot design. We propose a method inspired by the two-channel hypothesis of human motor control that combines these capabilities using a switching controller integrating high-level VLAs and smaller control models. Coordination between the two channels is managed through an event-driven switching mechanism that monitors subtask progression and completion, requiring minimal demonstration data by fine-tuning the VLA to predict event signals and training lightweight subtask-level dexterous policies. This approach is applied to our custom compliant 13-DoF anthropomorphic robotic hand, where compliance can be modulated to evaluate its impact on dexterity and robustness when combined with an autonomous policy. We show that hardware-level compliance in robotic fingers enables passive adaptation to disturbances and improves contact stability. The methodology is validated across a range of language-conditioned dexterous tasks. To demonstrate modularity, we show that adaptation to additional dexterous skills and different compliant hands can be achieved without retraining the VLA model. This provides an efficient, scalable, cross-embodiment approach to dexterity that leverages compliance while retaining the advantages of large AI models.
HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System
While end-to-end Vision-Language-Action (VLA) models offer a promising paradigm for robotic manipulation, fine-tuning them on narrow control data often compromises the profound reasoning capabilities inherited from their base Vision-Language Models (VLMs). To resolve this fundamental trade-off, we propose HiVLA, a visual-grounded-centric hierarchical framework that explicitly decouples high-level semantic planning from low-level motor control. In the high-level part, a VLM planner first performs task decomposition and visual grounding to generate structured plans, each comprising a subtask instruction and a precise target bounding box. Then, to translate this plan into physical actions, we introduce in the low-level part a flow-matching Diffusion Transformer (DiT) action expert equipped with a novel cascaded cross-attention mechanism. This design sequentially fuses global context, high-resolution object-centric crops and skill semantics, enabling the DiT to focus purely on robust execution. Our decoupled architecture preserves the VLM's zero-shot reasoning while allowing independent improvement of both components. Extensive experiments in simulation and the real world demonstrate that HiVLA significantly outperforms state-of-the-art end-to-end baselines, particularly excelling in long-horizon skill composition and the fine-grained manipulation of small objects in cluttered scenes.
comment: Project Page: https://tianshuoy.github.io/HiVLA-page/
AlignDrive: Aligned Lateral-Longitudinal Planning for End-to-End Autonomous Driving
Practical autonomous driving requires models that generalize by reasoning through spatial-temporal possibilities to exclude unsafe outcomes. While state-of-the-art (SOTA) methods use parallel planning architectures, they fail to explicitly couple speed decisions with agent behavior along the driving path, leading to suboptimal coordination. To address this, we propose a cascaded framework that transforms longitudinal planning from an independent prediction task into a path-conditioned reasoning process. On the model side, we introduce an anchor-based regression design that conditions longitudinal prediction on the lateral drive path, and reformulate longitudinal planning as 1D displacement prediction along the path. This reduces geometric uncertainty and sharpens the model's focus on interaction-driven dynamics. On the data side, we introduce a planning-oriented data augmentation strategy that simulates rare safety-critical events by programmatically inserting agents and relabeling longitudinal targets to enforce collision avoidance. Evaluated on the challenging Bench2Drive benchmark, our method achieves SOTA performance with a driving score of 89.07 and a success rate of 73.18%, demonstrating significantly improved coordination and safety. Further evaluation on Fail2Drive confirms strong generalization to rare edge cases where parallel formulations typically fail. Project page: https://yanhaowu.github.io/AlignDrive/.
comment: under review
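The reformulation of longitudinal planning as 1D displacement along the lateral path can be illustrated with a minimal sketch. It assumes the path is a 2D polyline with at least two vertices and that the longitudinal output is a list of cumulative arc-length displacements; the function name and data layout are hypothetical and not taken from the paper.

```python
import math

def displacements_to_waypoints(path, displacements):
    """Map 1D arc-length displacements onto a 2D polyline path.

    path: list of (x, y) vertices of the lateral path (assumed format, >= 2 vertices).
    displacements: cumulative distances travelled along the path.
    Displacements outside [0, path length] are clamped to the path ends.
    """
    # Cumulative arc length at each vertex.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))

    waypoints = []
    for s in displacements:
        s = min(max(s, 0.0), cum[-1])
        # Find the segment that contains arc length s.
        i = 0
        while i < len(path) - 2 and cum[i + 1] < s:
            i += 1
        seg_len = cum[i + 1] - cum[i]
        t = 0.0 if seg_len == 0 else (s - cum[i]) / seg_len
        (x0, y0), (x1, y1) = path[i], path[i + 1]
        waypoints.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return waypoints
```

Under this parameterization, a stop is simply a repeated displacement value, so the speed decision cannot move the vehicle off the planned path.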
Wavelet Policy: Imitation Learning in the Scale Domain with World Prior Memory
Conventional visuomotor imitation learning usually predicts future robot actions directly in the time domain. Such formulations often have limited physical scene awareness and weak long-horizon memory. In contrast, world-model-based perception and memory-augmented policies can improve world awareness, but at substantial computational overhead. In this work, we propose Wavelet Policy, a lightweight imitation learning framework that combines World Prior Memory (WPM) with wavelet-based multi-scale action modeling. Our key idea is to encode persistent physical scene structure from static background images into compact memory tokens, which are fused into world-prior tokens and injected into the encoder during forward propagation. Based on this memory-conditioned representation, we further perform wavelet-domain decomposition over horizon-aligned latent action tokens and adopt a Single-Encoder Multiple-Decoder (SE2MD) architecture to model latent components at different temporal scales. The resulting latent subbands are reconstructed through inverse wavelet transform and finally projected into executable action chunks. To facilitate efficient world prior learning, we introduce a world-prior adaptation loss, encouraging the background encoder to retain persistent scene knowledge while remaining lightweight and stable. Extensive experiments on four simulated and six real-world robotic manipulation tasks show that Wavelet Policy consistently outperforms strong baselines. These results demonstrate that combining scale-domain action modeling with world-prior memory provides an effective and efficient solution for long-horizon embodied manipulation. We release the source code, data and model checkpoint for the simulation task at https://github.com/lurenjia384/Wavelet_Policy.
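The decompose-then-reconstruct idea behind scale-domain modeling can be seen in a minimal sketch using a single-level Haar transform on a plain 1D sequence. This is an illustration only: the abstract does not state which wavelet family is used, and the paper operates on latent action tokens with learned decoders rather than on raw values.

```python
import math

def haar_dwt(signal):
    """Single-level Haar transform: returns (approximation, detail) subbands."""
    assert len(signal) % 2 == 0, "sequence length must be even"
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleaves the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out
```

The approximation subband carries the slow trend of the sequence and the detail subband carries the fast changes, which is what allows separate decoders to specialize by temporal scale.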
Toward Reliable Sim-to-Real Predictability for MoE-based Robust Quadrupedal Locomotion
Reinforcement learning has shown strong promise for quadrupedal agile locomotion, even with proprioception-only sensing. In practice, however, the sim-to-real gap and reward overfitting in complex terrains can produce policies that fail to transfer, while physical validation remains risky and inefficient. To address these challenges, we introduce a unified framework that pairs a Mixture-of-Experts (MoE) locomotion policy for robust multi-terrain representation with RoboGauge, a predictive assessment suite that quantifies sim-to-real transferability. The MoE policy employs a gated set of specialist experts to decompose latent terrain and command modeling, achieving superior deployment robustness and generalization via proprioception alone. RoboGauge further provides multi-dimensional proprioception-based metrics via sim-to-sim tests over terrains, difficulty levels, and domain randomizations, enabling reliable MoE policy selection without extensive physical trials. Experiments on a Unitree Go2 demonstrate robust locomotion on unseen challenging terrains, including snow, sand, stairs, slopes, and 30 cm obstacles. In dedicated high-speed tests, the robot reaches 4 m/s and exhibits an emergent narrow-width gait associated with improved stability at high velocity.
comment: Accepted at Robotics Science and Systems (RSS), 2026. Project Page: https://robogauge.github.io/complete/
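A gated set of experts can be sketched as a softmax-weighted blend of expert outputs. This is a minimal sketch under assumptions: the abstract does not say whether the gate is dense or sparse (top-k), and the function names and the use of plain lists for action vectors are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of gate logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def moe_action(gate_logits, expert_actions):
    """Blend expert action vectors using softmax gate weights (dense gating assumed)."""
    weights = softmax(gate_logits)
    dim = len(expert_actions[0])
    return [sum(w * a[j] for w, a in zip(weights, expert_actions)) for j in range(dim)]
```

With equal gate logits the output is the plain average of the experts; as one logit dominates, the policy approaches that single specialist.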
Visibility-Aware Mobile Grasping in Dynamic Environments
This paper addresses the problem of mobile grasping in dynamic, unknown environments where a robot must operate under a limited field-of-view. The fundamental challenge is the inherent trade-off between "seeing" around to reduce environmental uncertainty and "moving" the body to achieve task progress in a high-dimensional configuration space, subject to visibility constraints. Previous approaches often assume known or static environments and decouple these objectives, failing to guarantee safety when unobserved dynamic obstacles intersect the robot's path during manipulation. In this paper, we propose a unified mobile grasping system comprising two core components: (1) an iterative low-level whole-body planner coupled with velocity-aware active perception to navigate dynamic environments safely; and (2) a hierarchical high-level planner based on behavior trees that adaptively generates subgoals to guide the robot through exploration and runtime failures. We provide experimental results across 400 randomized simulation scenarios and real-world deployment on a Fetch mobile manipulator. Results show that our system achieves success rates of 68.8% and 58.0% in unknown static and dynamic environments, respectively, significantly improving success rates by 22.8% and 18.0% over the \nam approach, with improved collision safety.
Learning Agile Striker Skills for Humanoid Soccer Robots from Noisy Sensory Input
Learning fast and robust ball-kicking skills is a critical capability for humanoid soccer robots, yet it remains a challenging problem due to the need for rapid leg swings, postural stability on a single support foot, and robustness under noisy sensory input and external perturbations (e.g., opponents). This paper presents a reinforcement learning (RL)-based system that enables humanoid robots to execute robust continual ball-kicking with adaptability to different ball-goal configurations. The system extends a typical teacher-student training framework -- in which a "teacher" policy is trained with ground truth state information and the "student" learns to mimic it with noisy, imperfect sensing -- by including four training stages: (1) long-distance ball chasing (teacher); (2) directional kicking (teacher); (3) teacher policy distillation (student); and (4) student adaptation and refinement (student). Key design elements -- including tailored reward functions, realistic noise modeling, and online constrained RL for adaptation and refinement -- are critical for closing the sim-to-real gap and sustaining performance under perceptual uncertainty. Extensive evaluations in both simulation and on a real robot demonstrate strong kicking accuracy and goal-scoring success across diverse ball-goal configurations. Ablation studies further highlight the necessity of the constrained RL, noise modeling, and the adaptation stage. This work presents a system for learning robust continual humanoid ball-kicking under imperfect perception, establishing a benchmark task for visuomotor skill learning in humanoid whole-body control.
Integrated Hierarchical Decision-Making in Inverse Kinematic Planning and Control
This work presents a novel and efficient nonlinear programming framework that tightly integrates hierarchical decision-making with whole-body inverse kinematic planning and control. Decision-making plays a central role in many aspects of robotics, from sparse inverse kinematic control with a minimal number of joints, to inverse kinematic planning while simultaneously selecting a discrete end-effector location from multiple candidates. Current approaches often rely on heavy computations using mixed-integer nonlinear programming, separate decision-making from inverse kinematics (sometimes approximated by reachability methods), or employ efficient but less versatile $\ell_1$-norm formulations of linear sparse programming, without addressing the underlying nonlinear problem formulations. In contrast, the proposed sparse hierarchical nonlinear programming solver is efficient, versatile, and accurate by exploiting sparse hierarchical structure and leveraging the $\ell_0$-norm, which is rarely used in robotics. The solver efficiently tackles complex nonlinear hierarchical decision-making problems previously unaddressed in the literature, such as inverse kinematic planning with simultaneous prioritized selection of end-effector locations from a large set of candidates, or inverse kinematic control with simultaneous selection of bi-manual grasp locations on a randomly rotated box.
Now You See That: Learning End-to-End Humanoid Locomotion from Raw Pixels
Achieving robust vision-based humanoid locomotion remains challenging due to two fundamental issues: the sim-to-real gap introduces significant perception noise that degrades performance on fine-grained tasks, and training a unified policy across diverse terrains is hindered by conflicting learning objectives. To address these challenges, we present an end-to-end framework for vision-driven humanoid locomotion. For robust sim-to-real transfer, we develop a high-fidelity depth sensor simulation that captures stereo matching artifacts and calibration uncertainties inherent in real-world sensing. We further propose a vision-aware behavior distillation approach that combines latent space alignment with noise-invariant auxiliary tasks, enabling effective knowledge transfer from privileged height maps to noisy depth observations. For versatile terrain adaptation, we introduce terrain-specific reward shaping integrated with multi-critic and multi-discriminator learning, where dedicated networks capture the distinct dynamics and motion priors of each terrain type. We validate our approach on two humanoid platforms equipped with different stereo depth cameras. The resulting policy demonstrates robust performance across diverse environments, seamlessly handling extreme challenges such as high platforms and wide gaps, as well as fine-grained tasks including bidirectional long-term staircase traversal.
Multiagent Systems
CalBench: Evaluating Coordination-Privacy Trade-offs in Multi-Agent LLMs
We introduce CalBench, a controlled evaluation environment for studying multi-agent coordination through calendar scheduling. In CalBench, N agents each manage a private calendar containing pre-existing commitments and must coordinate to schedule a stream of M incoming meetings while minimizing disruption costs. Because agents observe only their own calendars, successful scheduling requires communication across private information boundaries. Each scenario is generated with an oracle solution, enabling precise measurement of coordination quality via realized-to-optimal cost, as well as a Distributed Constraint Optimization (DCOP) baseline to provide a fair comparison under the same private-information constraints. CalBench enables precise verification of task success, communication efficiency, and fairness in the distribution of disruption costs. Our environment also studies privacy-preserving coordination by augmenting calendar entries with private semantic contexts of varying sensitivity and measuring whether agents reveal task-irrelevant private information during negotiation. Unlike multi-agent benchmarks where a single capable agent can often substitute for the group, CalBench is inherently decentralized: no agent has access to another agent's private calendar, yet agents must still reach mutually consistent decisions over shared meeting scheduling. CalBench therefore provides a practical and verifiable setting for studying coordination protocols, communication efficiency, negotiation strategies, fairness, and privacy leakage in multi-agent systems.
SAGE: Scalable Agentic Grounded Evaluation for Crop Disease Diagnosis
Plant disease diagnosis is critical for food security, yet training disease-recognition models that generalize across crops, pathogens, and field conditions remains challenging because labeled disease images are far less abundant and standardized than data for other biotic stresses such as insects or weeds. Frontier vision-language models offer new opportunities through improved visual reasoning, but they still struggle with fine-grained disease identification due to the lack of structured, crop-specific symptom knowledge. To address this gap, we curate the largest plant disease image-symptom dataset to date, covering 335 crops, 1,251 disease classes, and approximately 839K images, designed to support training-free, agentic disease prediction. A scalable automated pipeline generates source-grounded symptom descriptions in which each claim is linked to a verbatim web quote; domain experts validate sampled crops and reconcile disease-name variants across sources. As a baseline, we introduce an autonomous visual reasoning agent that identifies anatomical context, narrows candidate diseases using symptom knowledge, sequentially compares reference images, and produces a fully explainable reasoning trace. Incorporating symptom knowledge improves accuracy by 16.2 percentage points on average at the full reference budget, with consistent gains across all four evaluation crops. Because the framework only requires crop-specific reference images and symptom knowledge, it can be extended to new crops without retraining, while the agentic baseline can directly benefit from future improvements in foundation model capabilities. Dataset and code are available at: https://sage-dataset.github.io/.
Trajectory Supervision for Continual Tool-Use Learning in LLMs
Most language-model training data shows final artifacts, not the process that produced them. We study a tractable version of this question in tool use: when a model learns a stream of new API domains, does keeping tool-use trajectories help compared with stripping the intermediate API trace? We fine-tune Llama 3.1 8B Instruct with QLoRA on API-Bank using four sequential domain blocks. Condition A strips previous API request/response lines from the prompt and trains the model to predict the next API call. Condition B keeps the trajectory context. In a single-seed pilot, full held-out generation evaluation shows that Condition B reaches 56.9% final exact full-call accuracy compared with 39.2% for Condition A. B also improves final API-name accuracy by 7.7 points. However, B uses 25.1% more training tokens, the run uses one seed, and the task is next-call prediction rather than full dialogue success.
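The difference between the two conditions comes down to how the prompt is assembled. The sketch below is a minimal illustration, assuming each dialogue turn is one line and that API trace lines start with the markers shown; those markers and the function name are hypothetical and may not match the actual API-Bank formatting used in the paper.

```python
# Hypothetical line markers for the API trace; the real data format may differ.
API_PREFIXES = ("API-Request:", "API-Response:")

def build_prompt(dialogue_lines, keep_trajectory):
    """Assemble the prompt for next-API-call prediction.

    keep_trajectory=False mimics Condition A (API trace lines stripped);
    keep_trajectory=True mimics Condition B (full trajectory kept).
    """
    if keep_trajectory:
        kept = list(dialogue_lines)
    else:
        kept = [ln for ln in dialogue_lines if not ln.startswith(API_PREFIXES)]
    return "\n".join(kept)
```

Condition B prompts are longer by construction, which is consistent with the higher training-token count the abstract reports for that condition.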
CodeClinic: Evaluating Automation of Coding Skills for Clinical Reasoning Agents
Clinical reasoning agents based on large language models (LLMs) aim to automate tasks such as intensive care unit (ICU) monitoring and patient state tracking from electronic health records (EHRs). Existing systems typically rely on manually curated clinical tools or skills for concepts such as sepsis detection and organ failure assessment. However, maintaining these tool libraries requires substantial expert effort, while zero-shot querying or code generation often produces inefficient and unreliable reasoning chains, especially under institution-specific clinical policies. We introduce CodeClinic, a benchmark built on MIMIC-IV for evaluating whether LLM agents can synthesize and compose reusable clinical skills instead of relying on fixed toolboxes. The benchmark contains two complementary tasks: longitudinal ICU surveillance and compositional information seeking. The longitudinal setting simulates monitoring patient trajectories with structured decisions every four hours across 25 findings and eight clinical families, while the compositional setting spans 63k instances across 259 tasks in nine domains and is stratified by compositional dependency depth to evaluate increasingly complex multi-step reasoning. We further propose an offline autoformalization pipeline that converts natural-language clinical guidelines into reusable and verified Python skill libraries through iterative LLM refinement. Compared with zero-shot code generation, the resulting libraries improve consistency while reducing per-query token usage by up to 40%.
SmartEval: A Benchmark for Evaluating LLM-Generated Smart Contracts from Natural Language Specifications
We introduce SmartEval, a benchmark for systematically evaluating the quality of Solidity smart contracts generated by large language models (LLMs) from natural language specifications. SmartEval provides a corpus of 9,000 generated contracts paired with expert-written ground-truth implementations drawn from the FSMSCG dataset, a five-dimensional evaluation rubric covering functional completeness, variable fidelity, state-machine correctness, business-logic fidelity, and code quality, and a reproducible generation-and-evaluation pipeline. To validate the benchmark's reliability, we conduct three independent empirical studies: a five-condition ablation study (N=300 per condition) isolating the contribution of each pipeline component, a human expert evaluation by three Columbia University PhD researchers confirming automated scores align with expert judgment to within 0.34 points, and external security analysis via the Slither static analyzer confirming 79.4% agreement between the LLM auditor and a non-LLM rule-based tool. Systematic analysis of 9,000 generated contracts reveals characteristic failure modes (logic omissions at 35.3%, state transition errors at 23.4%, and complexity-driven degradation) and quantifies a +8.29 composite-score advantage of generated contracts over ground-truth implementations, attributable to LLMs' literal specification-following behavior. SmartEval establishes a reproducible, validated foundation for empirical research on LLM smart contract synthesis quality, with all data, evaluation code, and generated contracts publicly released.
Emergent Communication for Co-constructed Emotion Between Embodied Agents via Collective Predictive Coding
According to the theory of constructed emotion, the brain actively forms emotion categories by integrating multimodal bodily signals, and constructs emotional experiences by using these categories to predict and interpret sensory inputs. While research has advanced in modeling individual emotion construction, the social process of co-construction (how a shared understanding of emotions emerges between individuals) remains computationally underexplored. This study investigates this process by modeling emergent communication between two embodied agents using the Metropolis-Hastings Naming Game (MHNG), grounded in the Collective Predictive Coding (CPC) framework. Our experiments, using visual, auditory, and simulated interoceptive inputs, yield two main findings. First, MHNG-based communication significantly improves the alignment, clarity, and inter-agent agreement of the learned emotion categories compared to non-communicative and non-selective baselines, with the alignment effect concentrated at the symbolic layer rather than the perceptual latent representation. Second, even when the two agents have systematically divergent interoceptive dynamics, communication still produces robust categorical alignment, with distinct, category-specific reshaping patterns of each agent's emotion categories, consistent with the constructed-emotion view that interoceptive heterogeneity is constitutive of, rather than an obstacle to, shared emotional meaning. These findings provide computational support for the co-constructionist view of emotion and extend the CPC framework from physical to socially-grounded domains.
comment: 13 pages
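The selective step of a Metropolis-Hastings naming game can be sketched as follows: the speaker proposes a sign, and the listener accepts it with a probability given by the ratio of how well the proposed sign and its own current sign explain its internal state. This is a minimal sketch of the generic MHNG acceptance rule; the dictionary-based likelihood table and the function names are assumptions, and the abstract does not detail the exact variant used in the paper.

```python
import random

def mh_accept_prob(listener_lik_proposed, listener_lik_current):
    """Metropolis-Hastings acceptance probability min(1, ratio)."""
    if listener_lik_current <= 0.0:
        return 1.0  # any proposal beats a sign with zero likelihood
    return min(1.0, listener_lik_proposed / listener_lik_current)

def mhng_step(proposed_sign, current_sign, listener_lik, rng=random):
    """One listener update.

    listener_lik: dict mapping sign -> likelihood of the listener's own
    internal state under that sign (hypothetical representation).
    """
    p = mh_accept_prob(listener_lik[proposed_sign], listener_lik[current_sign])
    return proposed_sign if rng.random() < p else current_sign
```

A baseline that always accepts the proposal would drop the ratio test; the abstract's "non-selective" baseline presumably corresponds to something of this kind.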
Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning
In this paper, we propose the first VL$\underline{\textbf{M}}$ $\underline{\textbf{a}}$gentic $\underline{\textbf{r}}$easoning framework for few-$\underline{\textbf{s}}$hot multimodal $\underline{\textbf{T}}$ime $\underline{\textbf{S}}$eries $\underline{\textbf{C}}$lassification ($\textbf{MarsTSC}$), which introduces a self-evolving knowledge bank as a dynamic context iteratively refined via reflective agentic reasoning. The framework comprises three collaborative roles: i) Generator conducts reliable classification via reasoning; ii) Reflector diagnoses the root causes of reasoning errors to yield discriminative insights targeting the temporal features overlooked by Generator; iii) Modifier applies verified updates to the knowledge bank to prevent context collapse. We further introduce a test-time update strategy to enable cautious, continuous knowledge bank refinement to mitigate few-shot bias and distribution shift. Extensive experiments across 12 mainstream time series benchmarks demonstrate that $\textbf{MarsTSC}$ delivers substantial and consistent performance gains across 6 VLM backbones, outperforming both classical and foundation model-based time series baselines under few-shot conditions, while producing interpretable rationales that ground each classification decision in human-readable feature evidence.
comment: 18 pages, 12 figures, 6 tables. Preprint
PECMAN: Perception-enabled Collaborative Multi-Agent Navigation in Unknown Environments
Most path planners assume fully known, static environments, assumptions that fail when robots navigate in dynamic and partially observable environments. SMART-3D addresses these issues by real-time replanning, where it morphs the underlying RRT* tree whenever new obstacles or structures are discovered in the environment. Instead of rebuilding the tree entirely from scratch, SMART-3D prunes invalid nodes and edges and subsequently repairs the disjoint subtrees at hot-nodes to find a new path, thus providing high computational efficiency for real-time adaptability. We extend SMART-3D to perception-enabled collaborative multi-agent navigation (PECMAN) in unknown environments. PECMAN is built upon distributed tree morphing and shared perception strategies, where each agent reacts to environmental changes and morphs its respective tree to replan its path, while simultaneously broadcasting newly discovered structures to other agents, thus enabling them to proactively replan even in areas that have not yet been explored by them. This approach reduces redundant reactions and unnecessary replannings of the agents due to improved situational awareness. The performance of PECMAN was evaluated by 28,000 multi-agent simulations on seven 2D scenarios with different case studies. The results show that PECMAN achieves up to 52% reduction in the team-completion time, while maintaining near 100% success rates. Finally, PECMAN was tested by real experiments on two autonomous robots in a building environment.
A Cross-Layered Multi-Drone Coordination for Medical Supply Delivery during Disaster Response Management
Autonomous drone fleets have immense potential in medical supply delivery during disaster incident response. However, coordinating multiple drones in such settings introduces compounding challenges: dynamic environmental hazards such as wind, obstacles, and intermittent network connectivity, constrained energy budgets, and the need to serve patient locations fairly under deadlines and triage-based priority while optimizing schedule utilization. In this paper, we present CEDA, a novel CTDE Deep Q-Network algorithm for cooperative multi-drone medical delivery, designed to jointly optimize triage-priority-aware routing, multi-agent coordination, and energy-efficient navigation under dynamic uncertainty. CEDA introduces a Priority-Preserving Fair Scheduling strategy, in which a structured reward function encodes both triage weights and complementary fairness mechanisms ensuring no patient class is starved of service. We evaluate CEDA in a simulated grid environment featuring dynamic hazard zones, stochastic action failures, and dynamically spawning patients across three triage priority levels, as well as in a PX4 SITL validation using two X500 quadrotors controlled via MAVSDK in offboard position mode. Simulation results demonstrate that CEDA achieves a delivery completion rate above 85%, reduces obstacle collisions by over 90% across training, and delivers to an average of 6 patients per episode with a triage efficiency of 0.82. CEDA preserves clinical priority ordering (Critical patients are served first) while achieving near-zero mortality across lower-triage classes, confirming that priority-weighted routing does not condemn Stable or Urgent patients to neglect. PX4 SITL validation further demonstrates that the learned policy remains executable and triage-coherent under practical communication constraints and realistic multi-drone coordination in disaster response settings.
comment: 18 pages, 14 figures, 3 tables
SkillMAS: Skill Co-Evolution with LLM-based Multi-Agent System
Large language model (LLM) agent systems are increasingly expected to improve after deployment, but existing work often decouples two adaptation targets: skill evolution and multi-agent system (MAS) restructuring. This separation can create organization bottlenecks, context pressure, and mis-specialization. We present SkillMAS, a non-parametric framework for adaptive specialization in multi-agent systems that couples skill evolution with MAS restructuring. SkillMAS uses Utility Learning to assign credit from verified execution traces, bounded skill evolution to refine reusable procedures without unfiltered library growth, and evidence-gated MAS restructuring when retained failures and Executor Utility indicate a structural mismatch. Across embodied manipulation, command-line execution, and retail workflows, SkillMAS is competitive under the reported harnesses while clarifying how post-deployment specialization is attributed, updated, and applied.
comment: 21 pages, 2 figures
An Executable Benchmarking Suite for Tool-Using Agents
Closed-loop tool-using agents are increasingly evaluated in executable web, code, and micro-task environments, but benchmark reports often conflate workloads, action-generating drivers, and the evidence admitted for systems-facing claims. We present an executable benchmarking suite that makes these objects explicit under a shared evidence-admission contract. The suite connects WebArena Verified, a SWE-Gym slice with SWE-bench-compatible verification, and MiniWoB++ through common workload adapters, task manifests, event schemas, replay/freeze policy, declared drivers, and reporting pipelines. In the canonical release, the gate separates paper-facing evidence from preflight, fixture, smoke, and diagnostic rows while preserving non-admitted artifacts for audit and onboarding. The admitted evidence records latency, invalid-action behavior, patch-generation cost, verifier metadata, replay bindings, and provenance under one auditable contract. The gate is decision-relevant rather than merely clerical: in a separate WebArena Verified controller study, clean-baseline and medium live-stressed evaluation select different fixed controller variants under the same workload and admission contract. The release is scoped as a benchmarking suite and admitted evidence, not a new agent policy, model leaderboard, backend comparison, or autonomous SWE-bench solver.
comment: 20 pages, 2 figures, 20 tables, including appendices
Energy-efficient flocking with nonlinear navigational feedback
Modeling collective motion in multi-agent systems has gained significant attention. Of particular interest are sufficient conditions for flocking dynamics. We present a generalization of the multi-agent model of Olfati-Saber with nonlinear navigational feedback forces. Unlike the original model, ours is not generally dissipative and lacks an obvious Lyapunov function. We address this by proposing a method to prove the existence of an attractor without relying on LaSalle's principle. Other contributions are as follows. We prove that, under mild conditions, agents' velocities approach the center of mass velocity exponentially, with the distance between the center of mass and the virtual leader being bounded. In the dissipative case, we show existence of a broad class of nonlinear control forces for which the attractor does not contain periodic trajectories, which cannot be ruled out by LaSalle's principle. Finally, we conduct a computational investigation of the problem of reducing propulsion energy consumption by selecting appropriate navigational feedback forces.
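For reference, the navigational feedback term in the original Olfati-Saber model is linear in the position and velocity errors of agent $i$ relative to the virtual leader. The block below states that term alongside a generic nonlinear variant; the functions $\phi$ and $\psi$ are placeholders assumed for illustration, since the abstract does not specify the form of the nonlinearity.

```latex
% Original (linear) navigational feedback for agent i,
% with position q_i, velocity p_i, and virtual leader state (q_r, p_r):
f_i^{\gamma} = -c_1 (q_i - q_r) - c_2 (p_i - p_r), \qquad c_1, c_2 > 0.

% Generic nonlinear variant (assumed form; \phi and \psi are placeholders):
f_i^{\gamma} = -\phi(q_i - q_r) - \psi(p_i - p_r).
```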
enclawed: A Configurable, Sector-Neutral Hardening Framework for Single-User AI Assistant Gateways
We present enclawed, a hard-fork hardening framework built on the OpenClaw AI assistant gateway. enclawed targets deployments that need attestable peer trust, deny-by-default external connectivity, signed-module loading, and a tamper-evident audit trail -- typically regulated industries (financial services, healthcare, defense, government). The framework ships in two flavors: an open flavor preserving OpenClaw compatibility while emitting audit, classification, and data-loss-prevention (DLP) signals, and an enclaved flavor activating strict allowlists, FIPS cryptographic-module assertion, mandatory manifest signature verification, and high-assurance peer attestation for the Model Context Protocol. The classification ladder is data-driven: deployers pick from five built-in presets or supply their own JSON. We ship a 356-case test suite (261 unit + 95 adversarial pen-tests) covering tamper detection, signature forgery, egress bypass, audit-log truncation, trust-root mutation, DLP evasion, prompt injection, code injection, and biconditional admission for net-capable extensions; real-time human-in-the-loop control; a memory-bounded transaction buffer with rollback; strict-mode TypeScript typecheck; and a CI workflow. The biconditional extension-admission gate extends the skill trust schema to non-skill extensions. The four-level verification lattice is now closed at the top: four skill-formal-* primitives plus a CLI produce a signed proof-carrying bundle the runtime re-checks at load, raising a skill from tested to formal via static effect-containment, refinement-typed dispatch, and bounded model checking. enclawed is a hardening framework, not an accredited certification; hardware, validated crypto, facilities, and assessor sign-off remain the deployer's responsibility.
Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations
Operating and maintaining (O&M) large-scale online engine systems (e.g., search, recommendation and advertising) demands substantial human effort for release monitoring, alert response, and root cause analysis. Despite the inherent suitability of LLM-based agents for such operational scenarios, the critical bottleneck impeding their practical deployment lies not in reasoning but in orchestration capability: specifically, the precise selection of relevant data (encompassing metrics, logs, and change events) and applicable knowledge (including handbook-defined rules and empirically derived practitioner experience) tailored to each individual operational event. Feeding all signals indiscriminately causes dilution and hallucination, while manually curating the event-to-(data, knowledge) mapping is intractable under dozens of daily releases. Here we present Bian Que, an agentic operating framework with three contributions: (i) the unified operational paradigm, which abstracts routine daily O&M actions into three canonical patterns: release interception, proactive inspection, and alert root cause analysis; (ii) the flexible Skill Arrangement, in which each predefined Skill explicitly defines the requisite data and operational knowledge for a specific context, and Skills can be automatically generated and updated by LLM agents or iteratively optimized by on-call engineers via natural language instructions; and (iii) the unified self-evolving mechanism, where each correction signal enables two parallel evolutionary pathways: distilling event memory into knowledge, and targeted refinement of Skills. Deployed on the e-commerce search engine of KuaiShou, Bian Que reduces alert volume by 75%, achieves 80% root-cause analysis accuracy, cuts mean time to resolution by over 50%, and attains a 99.0% pass rate on offline evaluations. Code is available at https://github.com/benchen4395/BianQue_Assistant.
comment: HomePage: https://benchen4395.github.io
MAGIC: Multi-Step Advantage-Gated Causal Influence for Multi-agent Reinforcement Learning
A key challenge in multi-agent reinforcement learning (MARL) lies in designing learning signals that effectively promote coordination among agents. Designing such signals requires estimating how one agent's current action affects its teammates over future interaction steps. To address this, we introduce Multi-step Advantage-Gated Interventional Causal MARL (MAGIC), a framework that estimates multi-step action effects between agents and selectively converts them into intrinsic rewards. MAGIC uses counterfactual action interventions to compare teammate futures under factual and counterfactual branches, and introduces a gate based on advantage to direct exploration toward beneficial behaviors aligned with the task goal. Experiments on Multi-Agent Particle Environments (MPE) and StarCraft micromanagement benchmarks (SMAC and SMACv2) show that MAGIC consistently outperforms leading prior methods, with average relative final performance improvements of 26.9% and 10.1%, respectively.
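The gating idea in this abstract can be illustrated with a minimal sketch. It assumes teammate futures are summarized as short lists of scalar states, that influence is measured as a mean absolute gap between the factual and counterfactual rollouts, and that the gate is a hard threshold at zero advantage. The function names, the distance measure, and the hard gate are all illustrative assumptions; the abstract does not specify how MAGIC computes these quantities.

```python
from typing import List


def multi_step_influence(factual: List[float],
                         counterfactuals: List[List[float]]) -> float:
    """Mean absolute gap between the factual teammate rollout and each
    counterfactual rollout, averaged over steps and over branches.
    (Assumed influence measure, not taken from the paper.)"""
    per_branch = [
        sum(abs(f - c) for f, c in zip(factual, branch)) / len(factual)
        for branch in counterfactuals
    ]
    return sum(per_branch) / len(per_branch)


def gated_intrinsic_reward(influence: float, advantage: float,
                           scale: float = 1.0) -> float:
    """Pass the influence through as an intrinsic reward only when the
    action's advantage is positive (assumed hard gate)."""
    return scale * influence if advantage > 0.0 else 0.0


# Teammate states over three future steps under the action actually taken,
# and under two alternative (counterfactual) actions.
factual_rollout = [1.0, 2.0, 3.0]
counterfactual_rollouts = [[1.0, 1.0, 1.0], [1.0, 2.0, 2.0]]

influence = multi_step_influence(factual_rollout, counterfactual_rollouts)
bonus_when_helpful = gated_intrinsic_reward(influence, advantage=0.4)
bonus_when_harmful = gated_intrinsic_reward(influence, advantage=-0.5)
```

Under these assumptions an influential action earns a bonus only when it also has positive advantage, so influence that hurts the task goal is not rewarded.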
Pairwise is Not Enough: Hypergraph Neural Networks for Multi-Agent Pathfinding ICLR 2026
Multi-Agent Path Finding (MAPF) is a representative multi-agent coordination problem, where multiple agents are required to navigate to their respective goals without collisions. Solving MAPF optimally is known to be NP-hard, leading to the adoption of learning-based approaches to alleviate the online computational burden. Prevailing approaches, such as Graph Neural Networks (GNNs), are typically constrained to pairwise message passing between agents. However, this limitation leads to suboptimal behaviours and critical issues, such as attention dilution, particularly in dense environments where group (i.e. beyond just two agents) coordination is most critical. Despite the importance of such higher-order interactions, existing approaches have not been able to fully explore them. To address this representational bottleneck, we introduce HMAGAT (Hypergraph Multi-Agent Attention Network), a novel architecture that leverages attentional mechanisms over directed hypergraphs to explicitly capture group dynamics. Empirically, HMAGAT establishes a new state-of-the-art among learning-based MAPF solvers: e.g., despite having just 1M parameters and being trained on 100× less data, it outperforms the current SoTA 85M parameter model. Through detailed analysis of HMAGAT's attention values, we demonstrate how hypergraph representations mitigate the attention dilution inherent in GNNs and capture complex interactions where pairwise methods fail. Our results illustrate that appropriate inductive biases are often more critical than the training data size or sheer parameter count for multi-agent problems.
comment: Published at ICLR 2026
TinyTroupe: An LLM-powered Multiagent Persona Simulation Toolkit
Recent advances in Large Language Models (LLM) have led to a new class of autonomous agents, renewing and expanding interest in the area. LLM-powered Multiagent Systems (MAS) have thus emerged, both for assistive and simulation purposes, yet tools for realistic human behavior simulation -- with its distinctive challenges and opportunities -- remain underdeveloped. Existing MAS libraries and tools lack fine-grained persona specifications, population sampling facilities, experimentation support, and integrated validation, among other key capabilities, limiting their utility for behavioral studies, social simulation, and related applications. To address these deficiencies, in this work we introduce TinyTroupe, a simulation toolkit enabling detailed persona definitions (e.g., nationality, age, occupation, personality, beliefs, behaviors) and programmatic control via numerous LLM-driven mechanisms. This allows for the concise formulation of behavioral problems of practical interest, either at the individual or group level, and provides effective means for their solution. TinyTroupe's components are presented using representative working examples, such as brainstorming and market research sessions, thereby simultaneously clarifying their purpose and demonstrating their usefulness. Quantitative and qualitative evaluations of selected aspects are also provided, highlighting possibilities, limitations, and trade-offs. The approach, though realized as a specific Python implementation, is meant as a novel conceptual contribution, which can be partially or fully incorporated in other contexts. The library is available as open source at https://github.com/microsoft/tinytroupe.
comment: 9 pages. Preprint to be submitted to peer-review
Hierarchical Multiagent Reinforcement Learning for Multi-Group Tax Game
Reinforcement learning has increasingly been applied to economic decision-making, including taxation, public spending, and labor supply. However, existing RL-based economic models typically consider only a single government-household group, overlooking strategic interactions among competing governments. To address this limitation, we formulate taxation as a hierarchical multi-group game. Within each group, the government and households form a leader-follower game, while governments compete across groups through strategic fiscal policies. This coupled structure is difficult to solve using standard multi-agent reinforcement learning (MARL) methods. We therefore propose a bilevel MARL framework with Curriculum Learning and a Closed-Loop Sequential Update mechanism to improve training stability and convergence. We instantiate the framework in a taxation simulation environment grounded in classical economic models, supporting the evaluation of taxation policies under inter-group competition. Experiments show that the proposed method learns stable and sustainable tax policies. Compared with a two-group baseline without the proposed mechanisms, our approach avoids premature game collapse, extends the effective game duration by 60.92%, and reduces GDP disparities among governments by 44.12%.
Systems and Control (EESS)
Optimizing Server Placement for Vertical Federated Learning in Dynamic Edge/Fog Networks
We investigate the control and optimization of vertical federated learning (VFL), a class of distributed machine learning (ML) methods in which edge/fog devices contain separate data features, in dynamic edge/fog networks. Owing to heterogeneous data features and hardware across edge/fog networks, devices' contributions to VFL vary substantially, and, moreover, dynamic edge/fog networks can lead to the permanent exit or entry of select data features. In this setting, our proposed methodology, server controlled VFL in dynamic networks (SC-DN), first establishes the existence of a global first-order stationary point for every global round, and then leverages this result to jointly optimize ML model training and resource consumption based on four key control variables: (i) server placement, (ii) device-to-server transmit power, (iii) local device processor frequency, and (iv) local training iterations per global round. The resulting optimization formulation contains coupled variables as well as numerous forms of logarithmic constraints; we show that it is a mixed-integer signomial program, an NP-hard problem, for which we develop a general solver. Finally, via experiments on both image and multi-modal datasets, we show that our methodology achieves better classification/regression performance and greater resource consumption savings than even greedy methodologies.
comment: Under revision at IEEE/ACM Transactions on Networking
Dynamic Scheduling of a Parallel-Server Queueing System: A Computational Method for High-Dimensional Problems
A key operational challenge for call centers is to decide, in real time, which waiting customer should be served by which available agent. This is known as skill-based routing, and the decision becomes especially difficult in large systems with many customer classes, where standard dynamic programming methods can be computationally intractable. Focusing on the Halfin-Whitt heavy-traffic regime and an infinite-horizon discounted cost criterion, we develop a computational method that scales to high-dimensional settings with many customer classes. Our approach begins by deriving an approximating diffusion control problem in the heavy traffic limiting regime. Building on earlier work by Han et al. (2018), we develop a simulation-based method to solve this problem, relying heavily on deep neural network techniques. Using this framework, we construct a policy for the original (prelimit) call center scheduling problem. To evaluate performance, we adopt a data-driven approach. Using call center data from a large U.S. bank, we calibrate the model and construct realistic test instances. We then compare the resulting policy with benchmark policies drawn from the literature. Across all test problems considered so far, our policy performs at least as well as or better than the best benchmark identified. Moreover, the method remains computationally feasible in dimensions up to 100, corresponding to call centers with 100 or more distinct customer classes.
Action Recommendations for Sequentially Rational Strategic Agents
We consider a finite-horizon discrete-time dynamic system that is jointly controlled by two strategic agents. There is a system designer that has its own reward function but does not have direct control over the agents' actions. We consider an information structure where the current state and all past history are equally accessible by the designer and the agents. The designer sends action recommendations to the agents at each time step. Each agent can use the received recommendation and the available information to choose its action. We are interested in the setting where the designer would like to send recommendations in a way that incentivizes the agents to adopt obedient strategies, i.e., to take the action recommended by the designer. Our goal is to find an optimal action recommendation strategy for the designer that maximizes the designer's objective while ensuring that obedient strategies are sequentially rational for the agents. We provide an algorithm for the designer's problem that involves solving a family of linear programs in a backward inductive manner.
Safe Exploration for Nonlinear Processes Using Online Gaussian Process Learning
This paper proposes a safe data-driven control framework for nonlinear systems with partially known dynamics. The method ensures stability and constraint satisfaction during online learning, assuming only a stabilizable linear approximation of the process is available. Unmodeled nonlinear dynamics are captured by a Gaussian process residual learned in real time. Safety is enforced through a probabilistic control-invariant set derived from Lyapunov theory, guaranteeing high-probability stability. A convex quadratic program computes control inputs that maximize information gain while respecting probabilistic safety constraints. The framework provides finite-sample safety guarantees and allows adaptive expansion of the invariant set as uncertainty decreases. Numerical results validate the approach, demonstrating safe and informative exploration under model uncertainty: the safe set expands by about 30% while the Gaussian process root-mean-square error drops from 1.11 to 0.03.
comment: Accepted at the 23rd IFAC World Congress
Nullspace-based Fault Diagnosis for Closed-Loop Mechatronic Systems with Application to Semiconductor Equipment
Fault detection and isolation (FDI) systems are critical for modern mechatronic production equipment, as their continuous operation is heavily dependent on the ability to detect and isolate faults in a timely and efficient manner. The aim of this paper is to address closed-loop aspects for linear systems and enable the application of well-known nullspace-based FDI synthesis conditions to mechatronic systems subject to actuator and sensor faults. These tailored FDI synthesis conditions are applied to a large-scale prototype wafer stage, showcasing the proposed approach through real experiments, thereby underlining the usefulness of the derived synthesis conditions for a wide range of production machines and scientific instruments.
Minimizing Worst-Case Weighted Latency for Multi-Robot Persistent Monitoring: Theory and RL-Based Solutions
We study multi-robot persistent monitoring on weighted graphs, where node weights encode monitoring priorities and edge weights encode travel distances. The goal is to design joint robot trajectories that minimize the worst-case weighted latency across all nodes over an infinite time horizon. The widely adopted worst-case latency objective evaluates team performance over the entire time horizon and therefore may fail to distinguish strategies with poor transient behavior but strong asymptotic performance. To address this limitation, we propose a family of tail-performance objectives that generalize the standard objective and study the resulting functional optimization problems. We establish several key theoretical properties, including the existence of optimal strategies, relationships among the proposed objectives and their corresponding optimization problems, approximation by periodic solutions to arbitrary accuracy, and reductions to event-driven decision models with discretized waiting times. Building on these results, we construct an equivalent event-driven Markov decision process (MDP), called the Tail Worst-case Latency-Optimizing Markov Decision Process (TWLO-MDP), which reformulates the tail-performance objective as a standard average-reward criterion. We then develop reinforcement-learning-based solution methods for the TWLO-MDP and introduce the multi-robot monitoring benchmark (M2Bench), a unified platform that supports the evaluation and comparison of heuristic and learning-based monitoring algorithms. Experiments on synthetic and realistic monitoring scenarios show that our methods effectively reduce the worst-case weighted latency and outperform representative baselines.
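The objective in this abstract can be made concrete with a minimal single-robot sketch. It assumes a periodic tour with zero dwell time at nodes and evaluates the steady-state latency of each node as the longest gap between consecutive visits within one period. The function name and the toy graph are illustrative assumptions; the paper's multi-robot and tail-performance formulations are more general.

```python
import math
from typing import Dict, List, Tuple


def worst_case_weighted_latency(tour: List[str],
                                travel_time: Dict[Tuple[str, str], float],
                                weights: Dict[str, float]) -> float:
    """Max over nodes of (node weight) x (longest revisit gap) for a robot
    that repeats `tour` forever. Returns inf if a weighted node is never visited."""
    visits: Dict[str, List[float]] = {}
    t = 0.0
    for i, node in enumerate(tour):
        visits.setdefault(node, []).append(t)
        nxt = tour[(i + 1) % len(tour)]  # the tour wraps around
        t += travel_time[(node, nxt)]
    period = t

    worst = 0.0
    for node, w in weights.items():
        times = visits.get(node)
        if not times:
            return math.inf
        # Gaps between consecutive visits, plus the wrap-around gap.
        gaps = [b - a for a, b in zip(times, times[1:])]
        gaps.append(period - times[-1] + times[0])
        worst = max(worst, w * max(gaps))
    return worst


# Toy triangle graph: unit travel times, node A has three times the priority.
unit_times = {(u, v): 1.0 for u in "ABC" for v in "ABC" if u != v}
priorities = {"A": 3.0, "B": 1.0, "C": 1.0}

plain_cycle = worst_case_weighted_latency(["A", "B", "C"], unit_times, priorities)
a_heavy_cycle = worst_case_weighted_latency(["A", "B", "A", "C"], unit_times, priorities)
```

In this toy instance the plain cycle gives a worst-case weighted latency of 9, while revisiting the high-priority node twice per period lowers it to 6, even though the period is longer.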
PolarNet: Single-Minima Neural Network for Modeling Lyapunov Functions
Learning control strategies with provable stability guarantees continues to be a challenging problem. In this work, we examine a family of training-time behaviors exhibited by existing neural Lyapunov control methods under specific conditions, which can hinder the synthesis of a provably stable controller. We identify the root cause as the lack of neural network architectural guarantees on the learned Lyapunov function, and propose PolarNet, a network architecture that provably addresses these issues by structurally guaranteeing a single critical point. We provide theoretical guarantees regarding the properness and universality of PolarNet for modeling Lyapunov functions, and show that using it as a drop-in replacement in existing neural Lyapunov control methods can effectively circumvent particular difficulties in training. We conduct a set of numerical experiments to verify that PolarNet consistently maintains a single critical point and, when used as a drop-in replacement in existing neural Lyapunov control methods, successfully avoids training failures caused by the lack of architectural guarantees. The code of this paper is available at https://github.com/23-zy/PolarNet.
comment: 16 pages, 5 figures, preprint
Diagnostic Certificates of Data Quality and Regression Identifiability for Koopman Identification
Classical persistent excitation criteria usually assess whether an input or regressor signal is sufficiently rich. In Koopman and EDMD with control (EDMDc), however, data quality is determined by the concatenation of lifted state features and control inputs. Input-rich data can still visit a narrow state region, well-spread state samples can still produce degenerate lifted features, and both can fail to condition the final regression problem. This paper develops a diagnostic certificate framework for locating these failures. The certificates separate state-space coverage and clustering, lifted-feature nondegeneracy, and the final regression spectrum. The regression-spectrum certificate is the layer with direct theoretical guarantees: it controls the active standardized design's smallest singular value, has Fisher-information and one-step EDMDc stability interpretations, and admits a finite-sample lower bound under a population spectral gap. We also give structural examples and a Schur-complement condition showing why state, lifted, input, and regression diagnostics cannot be substituted for one another. As a sampling example, IGPE-DOPT uses these certificates to score candidate trajectory segments. Experiments on Duffing, Van der Pol, and Lorenz systems compare input-, state-, lifted-, and regression-oriented baselines. The results show that certificate layers separate, budget and weights shift bottlenecks, and downstream prediction or control performance is not monotone in any single certificate. The framework is therefore intended as an interpretable diagnostic and data-collection guide, not as a universal optimality claim.
Risk-Aware Safe Throughput Forecasting for Starlink Networks
As a representative low Earth orbit (LEO) broadband system, Starlink exhibits highly variable access throughput, making short-term forecasting essential for network resource management. Existing forecasting methods mainly optimize symmetric point-prediction metrics such as MAE and RMSE, but they do not explicitly control the asymmetric risk of overestimating future throughput, which can cause over-admission, bandwidth overbooking, and service violations. This paper formulates Starlink throughput prediction as a risk-budgeted safe forecasting problem, where the predictor must satisfy a prescribed overestimation budget while maintaining competitive accuracy. We propose Budget-Guided Coarse-to-Fine Quantile Selection (BG-CFQS), a data-driven framework that trains a family of lower-quantile predictors, locates the quantile boundary satisfying the risk budget, and refines the boundary region to select the most accurate feasible predictor. Experiments on three real-world Starlink throughput datasets show that BG-CFQS satisfies the risk budget on all datasets and achieves the lowest average MAE, mean positive error, and tail positive error among budget-feasible methods. In high-risk and severe-risk low-throughput regimes, BG-CFQS reduces harmful positive errors by 11.0% and 12.6%, respectively. An admission-control evaluation further shows that the proposed safe forecasts reduce dropped sessions, demonstrating that risk-aware forecasting can translate prediction safety into application-level benefits.
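The selection step in this abstract can be illustrated with a minimal sketch. It assumes the family of lower-quantile predictors has already been trained and that each is represented only by its forecasts on held-out data. The function names, the use of the empirical overestimation rate as the risk measure, and the toy numbers are illustrative assumptions rather than the authors' BG-CFQS implementation, and the coarse-to-fine refinement of the boundary region is omitted.

```python
from typing import Dict, List, Optional, Tuple


def overestimation_rate(pred: List[float], actual: List[float]) -> float:
    """Fraction of steps where the forecast exceeds the realized throughput."""
    return sum(p > a for p, a in zip(pred, actual)) / len(actual)


def mae(pred: List[float], actual: List[float]) -> float:
    """Mean absolute error of a forecast series."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)


def select_quantile(forecasts: Dict[float, List[float]],
                    actual: List[float],
                    budget: float) -> Optional[Tuple[float, float]]:
    """Among quantile levels whose overestimation rate stays within `budget`,
    return (level, MAE) of the most accurate one, or None if none is feasible."""
    feasible = [
        (mae(pred, actual), q)
        for q, pred in forecasts.items()
        if overestimation_rate(pred, actual) <= budget
    ]
    if not feasible:
        return None
    best_mae, best_q = min(feasible)
    return best_q, best_mae


# Toy held-out throughput (Mbps) and forecasts from three quantile levels.
observed = [100.0, 80.0, 120.0, 90.0, 110.0]
candidate_forecasts = {
    0.1: [70.0, 60.0, 90.0, 65.0, 80.0],     # very conservative
    0.3: [90.0, 75.0, 110.0, 95.0, 100.0],   # overestimates once in five
    0.5: [105.0, 85.0, 115.0, 95.0, 112.0],  # accurate but often too high
}
choice = select_quantile(candidate_forecasts, observed, budget=0.2)
```

With a 20% overestimation budget, the median-level forecast is the most accurate but infeasible, so the sketch selects the 0.3 level, which is the most accurate predictor that still respects the budget.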
Barrier Certificates for Uncertain Temporal Specifications
This paper studies the satisfaction of temporal logic specifications on stochastic dynamical systems, where the predicates evolve randomly over time. Such randomness may arise from uncertain environment models or external stochastic processes causing the sets associated with predicate satisfaction to vary in a non-deterministic manner. As a result, verifying whether a stochastic dynamical system satisfies a temporal specification depends also on the uncertainty in the predicates. We develop a certificate-based framework to bound the probability of satisfying temporal logic specifications with randomly evolving predicates. We first show that temporal logic specifications with stochastic predicates can be transformed to specifications with deterministic predicates on an augmented space that is extended to include the stochastic space of the predicates' uncertainty. We then utilize barrier certificates on this augmented space to provide tractable optimization-based conditions and to avoid the computational burden of dynamic programming. Focusing on linear dynamics and safety-type specifications, we derive analytical conditions under which barrier certificates guarantee bounds on the probability of violating the stochastic safety predicates. The approach is demonstrated on numerical case studies.
comment: 8 pages, Accepted for presentation at the 23rd IFAC World Congress
A Stochastic Hybrid Automaton for Smartphone Battery Dynamics: Electro-Thermal Coupling and First-Passage Time-to-Empty Estimation
Smartphone time-to-empty (TTE) is difficult to predict because shutdown is governed not only by remaining charge, but also by instantaneous power capability under temperature-, aging-, and load-dependent voltage sag. We develop a stochastic hybrid automaton for smartphone battery dynamics that couples a first-order Thevenin equivalent-circuit model with a lumped thermal model and a stochastic user-activity process. The continuous state includes state of charge, polarization voltage, and battery temperature; user behavior is represented as a piecewise deterministic Markov process switching among idle, social/web, video, gaming, and weak-signal modes. Shutdown is formulated as a first-passage event when terminal voltage crosses a cutoff threshold or when requested power exceeds the instantaneous feasibility envelope. The model captures a voltage-collapse mechanism that simple Coulomb-counting or linear discharge models miss: cold temperature or battery aging increases internal resistance, so high-power bursts can drive terminal voltage below cutoff even when substantial charge remains. Monte Carlo simulation yields a full TTE distribution rather than a single countdown, allowing lower-tail risk to be quantified by the 5th percentile. Sensitivity analysis identifies ambient temperature, internal resistance, weak-signal radio penalty, and screen brightness as major drivers of premature shutdown risk. These results motivate practical user guidance and an operating-system-level resistance-aware throttling policy that limits peak power in the power-limited regime. The framework provides a physically grounded, risk-aware approach for explaining and extending usable smartphone battery life under real-world uncertainty.
comment: Based on the authors' solution to Problem A of the 2026 Mathematical Contest in Modeling (MCM), awarded Finalist
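The voltage-collapse mechanism described in this abstract can be sketched with a static snapshot of a first-order Thevenin model under a constant-power load. With open-circuit voltage $V_{oc}$, polarization voltage $V_p$, series resistance $R_0$, and requested power $P$, the terminal voltage $V = V_{oc} - V_p - I R_0$ together with $P = V I$ gives $V = \tfrac{1}{2}\left(E + \sqrt{E^2 - 4 R_0 P}\right)$ with $E = V_{oc} - V_p$, which is feasible only if $P \le E^2 / (4 R_0)$. The constant-power assumption, the function names, and all parameter values below are illustrative and not taken from the paper; thermal coupling and polarization dynamics are ignored.

```python
import math
from typing import Optional


def terminal_voltage(ocv: float, v_p: float, r0: float, power: float) -> Optional[float]:
    """Terminal voltage (V) under a constant-power load, or None when the
    requested power exceeds the feasibility envelope E^2 / (4 * R0)."""
    e = ocv - v_p
    disc = e * e - 4.0 * r0 * power
    if disc < 0.0:
        return None  # no real operating point: the cell cannot deliver this power
    # Take the high-voltage (low-current) root of R0*I^2 - E*I + P = 0.
    return 0.5 * (e + math.sqrt(disc))


def shuts_down(ocv: float, v_p: float, r0: float, power: float, v_cutoff: float) -> bool:
    """First-passage condition: infeasible power or terminal voltage below cutoff."""
    v = terminal_voltage(ocv, v_p, r0, power)
    return v is None or v < v_cutoff


# Same charge level (same OCV) and the same 8 W burst; only R0 differs.
warm_fresh_cell = shuts_down(ocv=3.7, v_p=0.05, r0=0.08, power=8.0, v_cutoff=3.2)
cold_aged_cell = shuts_down(ocv=3.7, v_p=0.05, r0=0.25, power=8.0, v_cutoff=3.2)
```

In this sketch the low-resistance cell stays above cutoff during the burst, while the high-resistance cell sags below it at the same open-circuit voltage, which is the effect a pure Coulomb-counting estimate would miss.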
Transceiver-Integrated BD-RIS: Wave-Domain Signal Processing for Sustainable and Inclusive 6G
The shift toward sixth-generation (6G) wireless communications demands transceiver architectures that simultaneously support high-data-rate communications, pervasive sensing, and sub-meter-level localization. Beyond these performance targets, 6G systems are also expected to align with long-term societal goals, including sustainability and inclusiveness. Conventional radio designs, however, remain heavily reliant on digital baseband processing, whose cost, power consumption, and computational complexity scale unfavorably with increasing array size and carrier frequency, making them poorly aligned with these emerging requirements. Beyond-diagonal reconfigurable intelligent surfaces (BD-RISs) introduce a new paradigm by enabling direct manipulation of electromagnetic waves in the analog domain. This article presents BD-RIS as a wave-domain analog processing unit embedded within the transceiver aperture. By migrating linear signal processing functions from the digital baseband to the wave domain, BD-RISs significantly reduce computational load and energy consumption, enabling scalable and sustainable operation for extra-large antenna array systems. Owing to their ability to jointly provide high operational flexibility, modularity, and energy-efficient analog processing, transceiver-integrated BD-RISs offer a compelling architectural trade-off and emerge as a strong candidate for next-generation wireless transceivers.
comment: 8 pages, 4 figures, 1 table, submitted to IEEE for possible publication
Mutual Information Optimal Density Control of Linear Systems and Generalized Schrödinger Bridges with Reference Refinement
We consider a mutual information (MI) regularized version of optimal density control of a discrete-time linear system. MI optimal control has been proposed as an extension of maximum entropy optimal control to trade off between control performance and benefits provided by stochastic inputs. MI regularization induces stochasticity in the policy, which poses challenges for applications of MI optimal control in safety-critical scenarios. To remedy this situation, we impose Gaussian density constraints at specified times to directly control state uncertainty. For this MI optimal density control problem, we propose an alternating optimization algorithm and derive the closed form of each step in the algorithm. In addition, we reveal that the alternating optimization of the MI optimal density control problem coincides with that of the so-called generalized Schrödinger bridge problem associated with the discrete-time linear system.
comment: 19 pages, 5 figures
Price Distortions in Korea's Electricity Market: Barriers to Renewable Integration and Reform Pathways
Structural distortions in price signals within the Korean electricity market, governed by a cost-based pool (CBP) and a uniform pricing mechanism, fundamentally undermine the nation's energy transition goals. The current market design fails to reflect transmission constraints, real-time supply and demand dynamics, and generator-specific costs, leading to inefficient resource allocation and hindering long-term investments in renewable energy and grid flexibility. This paper identifies the key drivers of these distortions and proposes a holistic reform package to enhance market efficiency. The package includes four key reforms: (i) introducing a locational marginal pricing system to manage transmission constraints; (ii) establishing a real-time market to reflect temporal value; (iii) integrating market and system operations to resolve inconsistencies; and (iv) transitioning from CBP to a price-based bidding system. Each reform targets a distinct source of inefficiency. The broader contribution of this study, however, lies in showing that, under the current Korean market design, the market cannot readily provide effective price signals. These reforms therefore need to be implemented jointly to establish a coherent market design in which price signals are aligned with Korea's energy policy objectives.
comment: 51 pages, 14 figures, preprint
Moving MRI: Imaging a moving body with a moving magnet
Current magnetic resonance imaging (MRI) requires the subject to remain stationary to limit motion artifacts and avoid unwanted field-induced brain stimulation. However, imaging during large-scale motion could enable studies in which motion itself is central. One example is the study of brain networks involved in vestibular function, which senses head motion. Here, we demonstrate Moving MRI (mMRI), a system that enables imaging during large-scale motion by moving the subject and scanner together to minimize relative motion. We implemented a proof-of-concept platform using a compact, cryogen-free superconducting magnet mounted on a pneumatically actuated tilt mechanism that moves the magnet, gradients, and RF coil as a unit during scanning. Phantom and in vivo rat brain scans were acquired during repetitive tilting. We characterized artifacts arising from tilt-induced field shifts and residual subject-scanner motion, and partially reduced these effects. mMRI enables imaging during large-scale movement and may broaden access to naturalistic vestibular paradigms while providing a foundation for future human systems.
Distributed Nonlinear Control of Networked Two-Wheeled Robots under Adversarial Interactions SC
This paper studies distributed trajectory tracking for networks of nonholonomic mobile robots under adversarial information exchange. An exact global input--output feedback linearization scheme is developed to regulate planar position outputs, yielding linear error dynamics without prescribing internal state trajectories. To mitigate corrupted neighbor information, a resilient desired-signal construction is proposed that combines local redundancy with trusted in-neighbor signals, without requiring adversary detection or isolation. When sufficient redundancy is available, the method suppresses adversarial influence and recovers nominal tracking performance. If redundancy conditions are violated, adversarial effects enter as bounded disturbances and the tracking error remains ultimately bounded. Simulation results on star, cyclic, and path topologies validate the analysis and demonstrate the superior resilience of cyclic networks due to distributed information propagation.
comment: This paper is accepted for publication in the 15th Asian Control Conference (ASCC), Bali, Indonesia, 2026
Dynamic Quantum-Assisted Co-Design of Control Tuning and Lyapunov Stability Synthesis for Nonlinear Systems
This paper proposes a dynamic quantum-assisted co-design framework for nonlinear closed-loop systems in which controller parameters and Lyapunov-certificate parameters are redesigned jointly at successive decision epochs. Unlike conventional nonlinear control designs that typically tune controller gains offline and verify stability separately, the proposed method embeds performance improvement and Lyapunov-based stability synthesis within a unified online optimization loop. The main novelty is a two-step computational structure that first contracts the continuous admissible search region around the current operating condition using a Black-Hole-based calibration procedure and then constructs a finite binary representation only over this calibrated region. The encoded objective is obtained from sampled nonlinear closed-loop evaluations and approximated by a local quadratic pseudo-Boolean surrogate, enabling an Ising-type Hamiltonian representation suitable for quantum-assisted optimization. Quantum imaginary time evolution is then used to explore the encoded Hamiltonian, and the resulting candidate bitstrings are decoded into continuous controller and Lyapunov parameters. To reduce dependence on the surrogate model, the decoded candidates are re-evaluated using the original nonlinear closed-loop cost and Lyapunov penalties before the final update is applied. The framework can accommodate different Lyapunov decay specifications by modifying the stability penalty and is validated on first-order nonlinear consensus, second-order nonlinear consensus, and induction-motor drive control examples. The implementation code used to generate the reported results is available at https://github.com/LSU-RAISE-LAB/DQCLS-NS.
Learning Koopman Models From Data Under General Noise Conditions
This paper presents a novel identification approach of Koopman models of nonlinear systems with inputs under rather general noise conditions. The method uses deep state-space encoders based on the concept of state reconstructability and an efficient multiple-shooting formulation of the squared loss of the prediction error to estimate the dynamics and the lifted state only from input-output data. Furthermore, the Koopman model structure includes an innovation noise term that is used to handle process and measurement noise. It is shown that the proposed approach is statistically consistent (estimation error tends to zero when the number of data points goes to infinity) and computationally efficient due to the multiple-shooting formulation, by which the prediction error of the model can be calculated on multiple subsections of the data in parallel. The latter allows for efficient batch optimization of the network parameters and, at the same time, excellent long-term prediction capabilities of the obtained models. The performance of the approach is illustrated by nonlinear benchmark examples and experimental data from a Crazyflie 2.1 quadcopter.
comment: Submitted to SIAM Journal on Applied Dynamical Systems (SIADS)
Soft and Hard Scaled Relative Graphs for Nonlinear Feedback Stability
This article presents input-output stability analysis of nonlinear feedback systems based on the notion of soft and hard scaled relative graphs (SRGs). The soft and hard SRGs acknowledge the distinction between incremental positivity and incremental passivity and reconcile them from a graphical perspective. The essence of our proposed analysis is that the separation of soft SRGs or hard SRGs of two open-loop systems on the complex plane guarantees closed-loop stability. The main results generalize an existing soft SRG separation theorem for bounded open-loop systems which was proved based on interconnection properties of soft SRGs under a chordal assumption. By comparison, our analysis does not require this chordal assumption and applies to possibly unbounded open-loop systems based on their hard SRGs.
An Exact Solution Algorithm for the Bi-Level Optimization Problem of Electric Vehicles Charging Station Placement
This work addresses electric vehicle (EV) charging station placement through a bi-level optimization model, where the upper-level planner maximizes net revenue by selecting station locations under budget constraints, while EV users at the lower level choose routes and charging stations to minimize travel and charging costs. To account for range anxiety, we construct a battery-expanded network and apply a shortest path algorithm with Frank-Wolfe traffic assignment. Our primary contribution is developing the first exact solution algorithm for large scale EV charging station placement problems. We propose a Branch-and-Price-and-Cut algorithm enhanced with value function cuts and column generation. Our exact algorithm delivers globally optimal solutions with mathematical certainty. Computational experiments on the Eastern Massachusetts network (74 nodes, 248 links), the Anaheim network (416 nodes, 914 links), and the Barcelona network (110 zones, 1,020 nodes, and 2,512 links) demonstrate exceptional performance. Our algorithm terminates within minutes, while achieving optimality gaps below 1% across all instances. Controlled benchmarks against two genetic algorithms on identical instances confirm that the proposed algorithm finds equal or better solutions in 3-50 times less computation time across all tested networks. The algorithm successfully handles problems with over 300,000 feasible combinations, transforming EV charging infrastructure planning into a tractable optimization suitable for practical decision making on real-world networks with optimality guaranteed.
Unifying Hamilton-Jacobi Reachability and Reinforcement Learning
We unify Hamilton-Jacobi (HJ) reachability and Reinforcement Learning (RL) through a proposed running cost formulation. We prove that the resultant travel-cost value function is the unique bounded viscosity solution of a time-dependent Hamilton-Jacobi Bellman (HJB) Partial Differential Equation (PDE) with zero terminal data, whose negative sublevel set equals the strict backward-reachable tube. Using a forward reparameterization and a contraction inducing Bellman update, we show that fixed points of small-step RL value iteration converge to the viscosity solution of the forward discounted HJB. Experiments on a classical benchmark validate this connection by demonstrating convergence of learned value functions toward semi-Lagrangian HJB solutions and by quantifying approximation error across the state space. These results empirically support the theoretical analysis, showing that the proposed framework preserves reachability-based safety semantics while remaining compatible with deep RL implementations.
Regime-Calibrated Fleet Repositioning with a Spatial Queue-Regret Decomposition
Ride-hailing and autonomous mobility-on-demand operators reposition idle supply before future demand is fully observed. We study a retrieval-calibrated predict-then-optimize approach for this problem: historical demand regimes are matched to the current query block, combined into a calibrated demand prior, and passed to a fleet-balancing controller. The paper makes three contributions. First, we train a leakage-safe similarity gate whose objective penalizes demand error, pickup spatial mismatch, and queue shortage risk rather than retrieval rank alone. Second, we develop a spatial queue-regret decomposition for a stable queueing surrogate, linking demand-field error to wait through queueing sensitivity, allocator sensitivity, and Wasserstein pickup mismatch. Third, we evaluate learned retrieval and external-style rebalancing baselines in a common simulator. In the calibrated-demand gate experiment, across eight New York City scenarios and ten seeds, the spatial gate reduces mean wait to 82.3s, compared with 85.3s for hand-tuned similarity and 85.8s for a distributional-only baseline. In a separate replay-demand controller comparison, a scenario chance-MPC analog and a share-target transportation LP improve on Wen-style rebalancing (92.2s/92.2s vs. 100.1s), a reduced GPR chance-MPC comparator is intermediate at 94.4s, and an oracle MPC diagnostic is 91.3s.
comment: 13 pages, 4 figures, 8 tables. Code: https://github.com/IndarKarhana/regime-calibrated-dispatch
Quantifying the resilience benefits of undergrounding a circuit with utility data
We leverage historical outage data to quantify the resilience benefits of undergrounding a circuit. The historical performance of the overhead circuit is compared to the performance if the circuit had been undergrounded in the past. The number of outages, customers affected, outage duration, and customer hours lost are used as metrics to quantify the benefits of undergrounding. Results show 75% and 78% reductions in customer hours lost per year for two selected circuits, as well as a significant reduction in the average number of outages and customers affected per year, highlighting the advantages of undergrounding. The benefits of investments that result in 10% faster outage restoration are also calculated by rerunning history with the faster restoration included.
Robotics
Flame3D: Zero-shot Compositional Reasoning of 3D Scenes with Agentic Language Models
3D scene understanding spans reasoning about free space, object grounding, hypothetical object insertions, complex geometric relationships, and integrating all of these with external tools and data sources. Existing 3D understanding methods typically rely on large-scale 3D-language training or focus on object grounding and simple spatial relationships. We argue that the broad generalization that motivates 3D-language training can be achieved at inference time, without 3D-specific training. We propose Flame3D, a training-free framework that represents scenes as editable visual-textual 3D memories and exposes them to an off-the-shelf MLLM through composable spatial tools. Flame3D also lets the agent synthesize custom spatial programs at inference time, enabling open-ended reasoning over layouts, empty space, and objects not yet present in the scene. External data and corrections can be added to the memory without retraining. In addition to showing competitive performance to finetuned 3D-LMM methods on ScanQA, we study multi-hop 3D reasoning capabilities of Flame3D by evaluating it on a curated compositional spatial-reasoning benchmark, Compose3D. We find that fixed tools fall short and that the agent's ability to synthesize spatial operations at inference time is essential. These results invite the question: should future progress in 3D scene understanding focus on richer scene memories and expressive compositional abstractions?
Continuum Robot Modeling with Action Conditioned Flow Matching
Predicting the shape of tendon driven continuum robots (TDCRs) at steady state from actuation remains challenging due to continuous deformation, complex tendon routing, compliance, friction, and fabrication variability. In this paper, we address this problem as kinematic self modeling conditioned on action. We present a lightweight 3D printed TDCR hardware platform and an RGB-D data collection pipeline with multiple cameras, and we learn a point cloud flow matching model that maps motor actuation states to the robot's settled 3D geometry. The model is trained from randomly sampled quasi static configurations and evaluated on test motor commands within the same TDCR design family and actuation range. We compare against prior 3D deformable object and robot self modeling approaches in both MuJoCo simulation and real hardware experiments. Experiments on simulated 2-, 3-, and 5-module TDCRs and real 2- and 3-module robots show improved shape prediction accuracy under CD and EMD metrics. We further show in simulation that the same conditional formulation generalizes to tip payload as a conditioning input, enabling payload conditioned steady-state shape prediction. These results demonstrate a data driven self modeling framework for quasi static TDCR geometry prediction.
comment: 14 pages, 9 figures
RigidFormer: Learning Rigid Dynamics using Transformers
Learning-based simulation of multi-object rigid-body dynamics remains difficult because contact is discontinuous and errors compound over long horizons. Most existing methods remain tied to mesh connectivity and vertex-level message passing, which limits their applicability to mesh-free inputs such as point clouds and leads to high computational cost. Efficiently modeling high-fidelity rigid-body dynamics from mesh-free representations, therefore, remains challenging. We introduce RigidFormer, an object-centric Transformer-based model that learns mesh-free rigid-body dynamics with controllable integration step sizes. RigidFormer reasons at the object level and advances each object through compact anchors; Anchor-Vertex Pooling enriches these anchors with local vertex features, retaining contact-relevant geometry without dense vertex-level interaction. We propose Anchor-based RoPE to inject anchor geometry into attention while respecting the unordered nature of objects and anchors: object-token processing is permutation-equivariant, and the mean-pooled anchor descriptor is invariant to anchor reindexing while preserving shape extent. RigidFormer further enforces rigidity by projecting updates onto the rigid-body manifold using differentiable Kabsch alignment. On standard benchmarks, RigidFormer outperforms or matches mesh-based baselines using point inputs, runs faster, generalizes to unseen point resolutions and across datasets, and scales to 200+ objects; we also show a preliminary extension to command-conditioned articulated bodies by treating body parts as interacting object-level components.
comment: Project Page: https://people.csail.mit.edu/frankzydou/projects/RigidFormer/index.html
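The rigidity projection mentioned in the RigidFormer abstract relies on Kabsch alignment. The following is a minimal NumPy sketch of the classical (non-differentiable) Kabsch algorithm, not the authors' implementation: the function names `kabsch` and `project_to_rigid` are hypothetical, and the paper's differentiable variant would presumably be written in an autodiff framework instead.

```python
import numpy as np


def kabsch(P, Q):
    """Best-fit rotation R and translation t so that P @ R.T + t approximates Q.

    P, Q: (N, 3) arrays of corresponding points (illustrative inputs).
    """
    p_mean = P.mean(axis=0)
    q_mean = Q.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets.
    H = (P - p_mean).T @ (Q - q_mean)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = q_mean - R @ p_mean
    return R, t


def project_to_rigid(P, Q):
    """Replace a free-form predicted update Q by the closest rigid motion of P."""
    R, t = kabsch(P, Q)
    return P @ R.T + t
```

Here `P` would hold an object's current points and `Q` the unconstrained predicted points; the projection returns a rigidly moved copy of `P`, so the object's shape cannot drift over a rollout.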
SHIELD: Scalable Optimal Control with Certification using Duality and Convexity
We present SHIELD, a hierarchical algorithm that reduces both the decision-variable dimension and the constraint set in $\ell_1$-regularized convex programs. From strong convexity and Lagrangian duality, we derive certificates that safely discard constraints and decision variables while guaranteeing that all removed constraints remain satisfied and all removed variables are null. To further accelerate the algorithm, we propose a transformer-based deep neural network to guide the dual certificate inference. We validate SHIELD on stochastic model predictive control (SMPC) in complex, multi-modal traffic scenarios, comparing against a full-dimensional SMPC policy. Numerical simulations demonstrate order-of-magnitude computational speedups while preserving feasibility and closed-loop safety, highlighting the practicality of certifiably safe, lightweight MPC in complex driving scenes.
Beyond Self-Play: Hierarchical Reasoning for Continuous Motion in Closed-Loop Traffic Simulation
Closed-loop traffic simulation requires agents that are both scalable and behaviorally realistic. Recent self-play reinforcement learning approaches demonstrate strong scalability, but their equilibrium strategies fail to capture the socially aware behaviors of real human drivers. We propose a hierarchical architecture that goes beyond self-play by combining high-level multi-agent interaction reasoning with low-level continuous trajectory realization. Specifically, a Stackelberg-style Multi-Agent Reinforcement Learning (MARL) module generates interaction-aware intention commands. These commands condition a low-level continuous motion module, translating the strategic intent into physically consistent, scene-responsive control sequences. To mitigate distribution shift in closed-loop deployment, we introduce a hybrid co-training scheme combining MARL with auxiliary recovery supervision. Experiments on a SUMO-based urban network demonstrate that the proposed framework achieves superior control smoothness and safety compared to self-play and passive imitation baselines, while maintaining competitive traffic efficiency.
comment: Submitted to IEEE Robotics and Automation Letters (RA-L)
IMPACT: An Implicit Active-Set Augmented Lagrangian for Fast Contact-Implicit Trajectory Optimization
Contact-implicit trajectory optimization (CITO) has attracted growing attention as a unified framework for planning and control in contact-rich robotic tasks. Recent approaches have demonstrated promising results in manipulation and locomotion without requiring a prescribed contact-mode schedule. It is well known that the underlying mathematical programs with complementarity constraints (MPCCs) remain numerically ill-conditioned, and systematic, scalable solution strategies for CITO remain an active area of research. More efficient and principled solvers that can handle contact constraints are therefore essential to broaden the applicability of CITO. In this work, we develop an augmented-Lagrangian approach to CITO for solving MPCC-based CITO with stationarity guarantees. The method can be interpreted as identifying the implicit contact-mode branches on the fly during the trajectory optimization (TO) iterations; we call this approach IMPACT (IMPlicit contact ACtive-set Trajectory optimization). We provide an efficient C++ implementation tailored to trajectory-optimization workloads and evaluate it on the open-source CITO and contact-implicit model predictive control (CI-MPC) benchmarks. On CITO, IMPACT achieves 2.9x-70x speedups over strong baselines (geometric mean 13.8x). On CI-MPC, we show improved control quality for contact-rich trajectories on dexterous manipulation tasks in simulation. Finally, we demonstrate the proposed method on real robotic hardware on a T-shaped object pushing task.
comment: Accepted to Robotics: Science and Systems (RSS), 2026
HyDRA Scorpion: A Cost-effective and Modular ROV for Real-Time Underwater Inspection, Intervention, and Object Detection
A Remotely Operated Vehicle (ROV) is a tethered underwater robot used for tasks like inspection and intervention. While ROVs are essential tools for underwater science, the high cost of commercial ROVs and a persistent gap between mechanically capable platforms and those with integrated intelligence create a significant barrier to access. HyDRA Scorpion addresses these challenges by integrating an advanced, AI-driven perception stack with in-situ measurement capabilities onto a low-cost, locally manufacturable platform. The system combines 4-DoF maneuverability, dual manipulators, and a custom pressure-tested housing. Experimental results validate the system's robustness and performance. Leak-free operation was confirmed through prolonged pressure testing of the electronics housing to 4 bar, approximately equivalent to the pressure at a 304.8-meter water depth in a simulated environment, with no moisture ingress detected. The vehicle also demonstrated stable station-keeping, maintaining its position within a tight tolerance of $\pm 0.15$ meters under external disturbances. The onboard AI module achieved an underwater object detection mean Average Precision (mAP) of 0.89 with real-time inference, along with length measurement and 3D-mapping-based distance measurement. In addition, the 4-DoF manipulator arm can grip and hold objects, with a dual-function manipulator feature that supports 360-degree tangle-free rotation.
comment: 9 Pages, 11 figures, Research Paper by UIU Mariner Team
Smoothing Out the Edges: Continuous-Time Estimation with Gaussian Process Motion Priors on Factor Graphs
Continuous-time state estimation is gaining in popularity due to its abilities to provide smooth solutions, handle asynchronous sensors, and interpolate between data points. While there are two main paradigms, parametric (e.g., temporal basis functions, splines) and nonparametric (Gaussian processes), the latter has seen less adoption despite its technical advantages and relative ease of implementation. In this article, we seek to rectify this situation by providing a new simplified explanation of GP continuous-time estimation rooted in the language of factor graphs, which have become the de facto estimation paradigm in much of robotics. To simplify onboarding, we also provide three working examples implemented in the popular GTSAM estimation framework.
Octopus Protocol: One-Shot Hardware Discovery and Control for AI Agents via Infrastructure-as-Prompts
Recent agentic-robotics systems, from Code-as-Policies to modern vision-language-action (VLA) foundation models, presuppose that drivers, SDKs, or ROS-style primitives for the target hardware already exist. Writing those primitives is the dominant engineering cost of bringing up new hardware for agent control. We present Octopus Protocol, a system that collapses that cost to a single shell command. Given only raw OS access and a language-model API key, a coding agent executes a five-stage pipeline--PROBE, IDENTIFY, INTERFACE, SERVE, DEPLOY--to discover connected devices, infer their capabilities, generate a Model Context Protocol (MCP) server with typed tools, and deploy it as a live HTTP endpoint. A persistent daemon then monitors the system, heals broken code, and perceives physical state through the camera tools it generated for itself. Two architectural principles make this work: protocols are prompts, not code, and the coding agent is the runtime. We validate the system on three heterogeneous platforms (PC/WSL, Apple Silicon macOS, Raspberry Pi 4) and on a commercial 6-DOF robotic arm with USB camera feedback. One command onboards the hardware in ~10-15 minutes and exposes up to 30 MCP tools; an MCP-compliant client then performs closed-loop visual-motor control through tools no human wrote.
Automated Robotic Moisture Monitoring in Agricultural Fields
Monitoring the soil moisture level across a large-scale plantation is tedious. The main objective of this project is to use a robotic kit in collaboration with on-field moisture sensor circuits, thereby creating an efficient and economical moisture monitoring system. A large agricultural field is divided into smaller grids, and a moisture sensor is placed in each grid. Whenever a sensor reports the soil to be dry, the robot goes to the concerned field for inspection. The path to the concerned field is found by applying Dijkstra's shortest path algorithm on the aerial image of the field. Then the total moisture content of the field is calculated by the robot using suitable image processing algorithms and reported accordingly. For developing and testing this work, a small study field was set up above which a camera was mounted at an appropriate height to capture its aerial view. Thus a prototype for an automated system of monitoring agricultural fields' moisture has been developed through this work.
comment: 2018 International Seminar on Intelligent Technology and Its Applications (ISITIA)
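The path-finding step in the abstract above, Dijkstra's shortest path over an aerial image, can be illustrated on a small cost grid. This is a minimal sketch under assumptions: the abstract does not say how the image is turned into a graph, so the 4-connected grid, the per-cell entry costs, the `None` marker for blocked cells, and the function name `dijkstra_grid` are all hypothetical.

```python
import heapq


def dijkstra_grid(cost, start, goal):
    """Shortest path on a 4-connected grid.

    cost[r][c] is the cost of entering cell (r, c); None marks a blocked cell.
    Returns (path, total_cost), or (None, inf) if the goal is unreachable.
    """
    rows, cols = len(cost), len(cost[0])
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > dist.get(cell, float("inf")):
            continue  # stale heap entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and cost[nr][nc] is not None:
                nd = d + cost[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (nd, (nr, nc)))
    if goal not in dist:
        return None, float("inf")
    # Walk the predecessor links back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[goal]
```

In a setup like the one described, each cell could correspond to a patch of the aerial image, with blocked cells standing in for obstacles or crop rows the robot must not cross.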
Terminal Matters: Kinodynamic Planning with a Terminal Cost and Learned Uncertainty in Belief State-Cost Space
In many real-world robotic tasks, robots must generate dynamically feasible motions that reliably reach desired goals even under uncertainty. Yet existing sampling-based kinodynamic planners typically optimize accumulated trajectory costs and treat goal reaching as a feasibility check, rather than explicitly optimizing terminal-state quality, such as goal preference or goal-reaching reliability. In this work, we introduce a terminal-cost formulation for kinodynamic planning that allows terminal-state quality to be optimized alongside accumulated trajectory cost. We prove that AO-RRT, an asymptotically optimal kinodynamic planner, preserves its asymptotic optimality under this augmented objective. We further extend the formulation to belief space and prove that minimizing the Wasserstein distance between the terminal belief and the goal improves a lower bound on the probability of reaching the goal region. The resulting planner, KiTe, uses this terminal-cost objective to encode goal preferences and improve reliability under uncertainty. To support systems without analytical uncertainty models, we learn dynamics and process uncertainty directly from data and integrate the learned belief dynamics into planning. Experiments on Flappy Bird, Car Parking, and Planar Pushing show that KiTe consistently improves goal-reaching success under uncertainty. Real-world Planar Pushing experiments further demonstrate that KiTe can plan effectively with learned dynamics and uncertainty. Source code is available at https://github.com/elpis-lab/KiTe.
Towards Backdoor-Based Ownership Verification for Vision-Language-Action Models
Vision-Language-Action models (VLAs) support generalist robotic control by enabling end-to-end decision policies directly from multi-modal inputs. As trained VLAs are increasingly shared and adapted, protecting model ownership becomes essential for secure deployment and responsible open-source usage. In this paper, we present GuardVLA, the first backdoor-based ownership verification framework specifically designed for VLAs. GuardVLA embeds a stealthy and harmless backdoor watermark into the protected model during training by injecting secret messages into embodied visual data. For post-release verification, we propose a swap-and-detect mechanism, in which the trigger projector and an external classifier head are used to activate and detect the embedded backdoor based on prediction probabilities. Extensive experiments across multiple datasets, model architectures, and adaptation settings demonstrate that GuardVLA enables reliable ownership verification while preserving benign task performance. Further results show that the embedded watermark remains detectable under post-release model adaptation.
A low-cost mockup to simulate robotic laser cutting in nuclear decommissioning
This paper introduces a low-cost experimental mockup to simulate the laser cutting process of containers in nuclear decommissioning. It is composed of a three-axis table supporting a cuboid container with ultraviolet-sensitive faces, a six-degree-of-freedom serial manipulator holding an ultraviolet torch that simulates the laser, and a visual system based on cameras and fiducial markers. The system employs a constrained task-space adaptive motion controller that compensates for inaccurate parameters and eliminates the need to calibrate the system. Furthermore, as the motion controller explicitly accounts for geometric constraints, the robot reactively avoids collisions with obstacles while handling the ultraviolet torch. To enhance tracking of the laser-cutting path, we control the ultraviolet beam, which requires only four degrees of freedom, instead of the full end-effector pose. Experiments show that, despite an initially uncalibrated system, the overall system is capable of tracking different trajectories with an overall mean accuracy of 3.9 (sd 2.5) mm when the end-effector pose is controlled and 2.4 (sd 1.3) mm when the ultraviolet beam is controlled.
comment: 7 pages, 8 figures, 2 tables. Under Review for TAROS 2026 (Towards Autonomous Robotic Systems)
Raymoval: Raycasting-based Dynamic Object Removal for Static 3D Mapping
Static mapping is fundamental to robot navigation, providing a persistent geometric prior and a consistent reference for long-term autonomy. However, dynamic objects leave residual traces and cause surface loss, which reduces map consistency. We propose a raycasting-based module for dynamic object removal in static 3D mapping. Each scan is projected onto an azimuth-elevation grid, and for every viewing direction we compare the bin-wise minimum range with the map's first-hit distance computed by raycasting. Furthermore, we apply a raycast consistency test that separates dynamic from static points. Finally, a spatial consistency validation step refines labels, producing static maps with lower residual dynamics and reduced over-removal. We evaluate our approach quantitatively and qualitatively on SemanticKITTI and a challenging custom dataset, and show consistent static mapping results.
comment: 12 pages, 5 figures, 3 tables, Presented at RiTA 2025
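The projection step in the Raymoval abstract, binning a scan onto an azimuth-elevation grid and keeping the minimum range per bin, can be sketched as follows. This is an illustrative approximation rather than the authors' code: the grid resolution, the elevation limits, the `margin` threshold, and the direction of the comparison in `seen_through_mask` are assumptions, and the map's first-hit distances are taken as a precomputed input instead of being raycast here.

```python
import numpy as np


def min_range_image(points, n_az=360, n_el=32, el_min=-0.45, el_max=0.45):
    """Bin sensor-frame points (N, 3) by azimuth/elevation; keep the min range per bin.

    Returns an (n_el, n_az) array; empty bins hold inf.
    """
    r = np.linalg.norm(points, axis=1)
    safe_r = np.maximum(r, 1e-6)
    az = np.arctan2(points[:, 1], points[:, 0])          # in [-pi, pi]
    el = np.arcsin(np.clip(points[:, 2] / safe_r, -1.0, 1.0))
    col = np.floor((az + np.pi) / (2 * np.pi) * n_az).astype(int) % n_az
    row = np.floor((el - el_min) / (el_max - el_min) * n_el).astype(int)
    valid = (r > 1e-6) & (row >= 0) & (row < n_el)
    img = np.full((n_el, n_az), np.inf)
    # Unbuffered in-place minimum handles several points landing in one bin.
    np.minimum.at(img, (row[valid], col[valid]), r[valid])
    return img


def seen_through_mask(scan_img, map_first_hit, margin=0.5):
    """Bins where the scan sees farther than the map's first hit by more than margin.

    Assumption: such bins suggest the map surface there was left by a moving object.
    """
    with np.errstate(invalid="ignore"):
        farther = (scan_img - map_first_hit) > margin
    return np.isfinite(scan_img) & np.isfinite(map_first_hit) & farther
```

A full pipeline would still need the raycast that produces `map_first_hit` and the spatial consistency validation the abstract describes; neither is shown here.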
VISTA: A Benchmark for Real-Time Video Streaming under Network Impairments in Surgical Teleoperation ICRA 2026
Real-time video streaming is crucial in surgical teleoperation, yet reproducible evaluation under realistic network impairments remains limited. This paper presents VISTA, a benchmark designed to study how impairments along the forward video path affect received video quality, temporal continuity, and human task performance. VISTA employs Linux Traffic Control with NetEm and a Gilbert-Elliott loss model to emulate five network conditions: Hospital LAN, 5G Urban, 4G Rural, LEO Satellite, and GEO Satellite. The benchmark integrates a standardised peg transfer task with synchronized measurements of network quality of service (QoS), objective video quality (PSNR, SSIM, and VMAF), and temporal continuity through freeze rate, while maintaining a stable reverse control channel. Across 375 experimental trials, network degradation substantially reduced teleoperation performance: success rate decreased from 97% in Hospital LAN to 79% in 5G Urban, 35% in 4G Rural, 71% in LEO Satellite, and 12% in GEO Satellite, while mean task completion time for successful trials increased from 80 s in Hospital LAN to 117 s in 5G Urban, 211 s in 4G Rural, 152 s in LEO Satellite, and 255 s in GEO Satellite. These findings show that network impairments have a direct impact on task completion and success in surgical teleoperation, and provide a reproducible basis for evaluating teleoperation video under realistic network constraints. Source code available at https://github.com/Dzxx623/VISTA.
comment: Oral presentation at the Connected Autonomous Robotic Systems Workshop, ICRA 2026
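The Gilbert-Elliott loss model named in the VISTA abstract is a two-state Markov chain (good and bad) with a different packet-loss probability in each state, which produces bursty losses. The sketch below is a generic simulation of that model, not the benchmark's configuration: the transition probabilities and per-state loss rates are placeholder values, and the function names are hypothetical.

```python
import random


def gilbert_elliott(n, p_gb, p_bg, loss_good=0.0, loss_bad=1.0, seed=0):
    """Simulate n packets; return a list of booleans (True = packet lost).

    p_gb: probability of moving good -> bad before a packet.
    p_bg: probability of moving bad -> good before a packet.
    """
    rng = random.Random(seed)
    bad = False
    lost = []
    for _ in range(n):
        # Update the channel state, then draw the loss for this packet.
        if bad:
            if rng.random() < p_bg:
                bad = False
        elif rng.random() < p_gb:
            bad = True
        p_loss = loss_bad if bad else loss_good
        lost.append(rng.random() < p_loss)
    return lost


def stationary_loss_rate(p_gb, p_bg, loss_good=0.0, loss_bad=1.0):
    """Long-run loss rate implied by the chain's stationary distribution."""
    pi_bad = p_gb / (p_gb + p_bg)
    return pi_bad * loss_bad + (1.0 - pi_bad) * loss_good
```

For example, with the placeholder values `p_gb=0.1` and `p_bg=0.3` the chain spends a quarter of its time in the bad state, so the long-run loss rate is 0.25, while losses arrive in bursts with a mean length of `1 / p_bg` packets.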
Preserving Foundational Capabilities in Flow-Matching VLAs through Conservative SFT
Unconstrained fine-tuning of flow-matching Vision-Language-Action (VLA) models drives dense parameter overwrites, degrading pre-trained capabilities. We present Conservative Supervised Fine-Tuning (ConSFT), an optimization objective that adapts to target distributions while mitigating catastrophic forgetting, requiring zero prior data or architectural overhead. By dynamically scaling learning signals based on model confidence, ConSFT suppresses excessive gradients from low-confidence samples to prevent disproportionate parameter updates, thereby bounding the intrinsic parameter disruption risk. Inspired by reinforcement learning's trust-region clipping, this formulation establishes a progressive learning dynamic to secure target convergence and prior capability retention, maintaining sparse parameter updates without relying on the parallel reference networks required by explicit regularization. We evaluate ConSFT on the LIBERO and RoboTwin benchmarks across state-of-the-art flow-matching VLAs ($π_0$, $π_{0.5}$, and GR00T-N1.6-3B). The method outperforms vanilla SFT in capability retention by an average absolute margin of over 20%, matching the efficacy of data-heavy Experience Replay in a prior-data-free regime. Real-world robotic deployments confirm that ConSFT precludes spatial overfitting during downstream adaptation, preserving pre-trained physical skills while acquiring sequential target tasks.
comment: 20 pages, 9 figures
AssemPlanner: A Multi-Agent Based Task Planning Framework for Flexible Assembly System
In flexible assembly systems, existing task planning methods require a time-consuming configuration process by multiple experts to establish a production line for a new product. To address this challenge, we propose a multi-agent based task planning framework for flexible assembly systems, denoted as AssemPlanner. It takes tasks described in natural language as input, which are then converted into actionable sequential production operations. It comprises several specialized agents, including SchedAgent, KnowledgeAgent, LineBalanceAgent, and a scene graph. Within the proposed framework, SchedAgent serves as the central reasoning engine. Departing from traditional static pipelines, AssemPlanner utilizes a ReAct-based SchedAgent to adaptively adjust actions via multi-agent feedback. By observing the feedback from KnowledgeAgent, LineBalanceAgent, and the scene graph, it autonomously resolves complex industrial process constraints. To facilitate reproducibility, all code and datasets are released at https://github.com/chz332/Assemplanner.
VECTOR-Drive: Tightly Coupled Vision-Language and Trajectory Expert Routing for End-to-End Autonomous Driving
End-to-end autonomous driving requires models to understand traffic scenes, infer driving intent, and generate executable motion plans. Recent vision-language-action (VLA) models inherit semantic priors from large-scale vision-language pretraining, yet still face a coupling trade-off: fully shared backbones preserve multimodal interaction but may entangle language reasoning and trajectory prediction, whereas decoupled reasoning-action pipelines reduce task conflict but weaken semantic-motion coupling. We propose VECTOR-DRIVE, a tightly coupled VLA framework built on Qwen2.5-VL-3B. VECTOR-DRIVE keeps all tokens coupled through shared self attention and routes feed-forward computation according to token semantics. Vision and language tokens are processed by a Vision-Language Expert to preserve semantic priors, while target-point, ego-state, and noisy action tokens are routed to a Trajectory Expert for motion-specific computation. On the action-token pathway, a flow-matching planner refines noisy action tokens into future waypoints and speed profiles. This design couples semantic reasoning and motion planning within a single multimodal Transformer while separating task-specific FFN computation. On Bench2Drive, VECTOR-DRIVE achieves 88.91 Driving Score and outperforms representative end-to-end and VLA-based baselines. Qualitative results and ablations further validate the benefits of shared attention, semantic-aware expert routing, progressive training, and flow-based action decoding.
Constraint-Aware Diffusion Priors for High-Fidelity and Versatile Quadruped Locomotion
Reinforcement learning combined with imitation learning has significantly advanced biomimetic quadrupedal locomotion. However, scaling these frameworks to massive, multi-source datasets exposes fundamental bottlenecks. First, traditional GAN-based discriminators are prone to mode collapse, struggling to capture diverse motion distributions from uncurated datasets. Second, existing kinematic priors suffer from out-of-distribution (OOD) tracking conflicts, leading to severe unintended heading drifts during complex maneuvers. Furthermore, deploying unconstrained priors to physical hardware poses critical safety risks by disregarding actuator dynamics. To overcome these challenges, we propose Diff-CAST (Diffusion-guided Constraint-Aware Symmetric Tracking), a novel motion prior framework leveraging the multi-modal distribution modeling capabilities of diffusion models for stylistic rewards. Diff-CAST effectively replaces traditional GAN discriminators, unlocking robust data scaling on heterogeneous collections. To ensure high-fidelity intent execution and reliable real-world deployment, we introduce a comprehensive Sim2Real architecture integrating Symmetric Augmented Command Conditioning (SACC) for drift-free tracking, and Constrained RL for hardware safety. Experiments on a quadruped demonstrate that Diff-CAST mitigates mode collapse, enables seamless transitions between diverse skills, and ensures robust, hardware-compliant locomotion.
ElasticFlow: One-Step Physics-Consistent Policy with Elastic Time Horizons for Language-Guided Manipulation ACL 2026
Diffusion policies have demonstrated exceptional performance in embodied AI. However, their iterative denoising process results in high latency, and existing acceleration methods often sacrifice physical consistency. To address this, we propose ElasticFlow, a distillation-free, physics-consistent one-step policy framework. We reconstruct the Mean Field Theory by directly modeling the average velocity field, enabling a direct single-step mapping from noise to action. Addressing the Temporal Heterogeneity of robotic tasks, we introduce the Elastic Time Horizons mechanism. This mechanism effectively overcomes Spectral Bias by explicitly encoding control granularity, achieving efficient alignment between semantic instructions and physical execution horizons. Experiments on benchmarks such as LIBERO, CALVIN, and RoboTwin demonstrate that ElasticFlow achieves efficient 1-NFE inference (approximately 71Hz). Furthermore, it outperforms state-of-the-art methods, including OpenVLA and $π_0$, on long-horizon tasks, highlighting its potential for efficient, robust, and semantically aligned control.
comment: Accepted to Findings of ACL 2026
ProcVLM: Learning Procedure-Grounded Progress Rewards for Robotic Manipulation
Long-horizon robotic manipulation requires dense feedback that reflects how a task advances through its procedural stages, not merely whether the final outcome is successful. Existing reward models often rely on trajectory-level success labels or time-based interpolation, which can conflate elapsed time with true task progress and therefore fail to capture unfinished steps, stagnation, and failure states. We present ProcVLM, a progress-aware vision-language model that learns procedure-grounded progress as a dense reward signal for manipulation. Rather than deriving progress from terminal outcomes or temporal proxies, ProcVLM grounds progress estimation in procedural structure and intra-stage visual change, and further adopts a reasoning-before-estimation paradigm that infers the remaining atomic actions before estimating task progress. Specifically, we construct this supervision by synthesizing frame-level subtask-semantic annotations, assigning progress budgets according to subtask structure, and distributing each budget based on intra-subtask visual change. To train ProcVLM at scale, we build a standardized procedural supervision synthesis pipeline and construct ProcCorpus-60M from 30 embodied datasets with 60M annotated frames, from which we derive ProcVQA for procedure-aware pretraining, with progress estimation as the central task alongside action segmentation and future planning. Experiments on ProcVQA and reward-model benchmarks show that ProcVLM improves embodied procedural reasoning and yields more discriminative trajectory-internal progress estimates than representative baselines, supporting its use as a dense reward model for downstream reward-guided policy optimization. Project page: https://procvlm.github.io/
Omni-scale Learning-based Sequential Decision Framework for Order Fulfillment of Tote-handling Robotic Systems
Driven by the rapid expansion of e-commerce and small-batch production, the size of the intralogistics load unit of finished goods, semi-finished goods and raw materials is steadily shrinking. Totes are gradually replacing pallets as the primary handling and storage container. This shift has propelled tote-handling robotic systems to the forefront of automated order fulfillment centers. The order-fulfillment decisions of tote-handling robotic systems share a common order-tote-robot sequential decision-making nature. Existing studies primarily focus on decision mechanisms tailored to particular systems, making it difficult to generalize or transfer them to other contexts. We propose an Omni-scale Learning-based Sequential Decision Framework for Order Fulfillment of Tote-handling Robotic Systems (OLSF-TRS), a generalized and scalable sequential decision framework that combines structured combinatorial optimization with multi-agent reinforcement learning to coordinate order, tote, and robot decisions. On small-scale tote-handling robotic systems, OLSF-TRS achieves near-optimal performance with average optimality gaps below 3.5% across two distinct system configurations. In large-scale scenarios, OLSF-TRS consistently outperforms heuristic baselines across two different system types, reducing total tote movements by 8-12% and over 30% compared to SOTA rule-based approaches, while maintaining real-time responsiveness. These improvements translate into tangible operational benefits, including cost reduction, lower energy consumption, and enhanced throughput stability. The proposed framework provides efficient, unified order-fulfillment decision-making for widely deployed tote-handling robotic systems, supporting high-quality order fulfillment in both e-commerce and industrial logistics sectors.
comment: 35 pages, 5 figures
A Visuo-Tactile Data Collection System with Haptic Feedback for Coarse-to-Fine Imitation Learning
We present a visuo-tactile data-collection system that generates temporally structured, contact-rich demonstrations for imitation learning. Conventional systems often decouple the operator from contact forces, which hinders the demonstration of subtle force modulation. Our system introduces a direct-drive gripper that the operator actuates with the fingers, preserving natural haptic feedback. Integrated visual sensors and custom tactile arrays capture image streams and contact geometry. A handle-mounted push button enables the operator to annotate the task's temporal structure in real time by marking task-critical regions. By fusing in-hand force perception with in-situ temporal annotation, the system produces multimodal datasets designed for coarse-to-fine learning algorithms that exploit structural task knowledge, enabling the development of high-quality manipulation policies.
GameChat: Multi-LLM Dialogue for Safe, Agile, and Socially Optimal Multi-Agent Navigation in Constrained Environments
Safe, agile, and socially compliant multi-robot navigation in cluttered and constrained environments remains a critical challenge. This is especially difficult with self-interested agents with unique, unknown priorities in decentralized settings, where there is no central authority to resolve conflicts induced by spatial symmetry. We address this challenge by proposing an intuitive yet effective approach, GameChat, which facilitates safe, agile, and deadlock-free navigation for both cooperative and self-interested agents in cluttered environments. Key to our approach is the idea that agents should resolve conflicts on their own using natural language to communicate, much like humans. We evaluate GameChat in simulated environments with doorways and intersections. The results show that even in the worst case, GameChat reduces the time for all agents to reach their goals by over 35% from a naive baseline and by over 20% from a state-of-the-art baseline in the intersection scenario, while doubling the rate at which the agent with a higher-priority task reaches the goal first, from 50% (equivalent to random chance) to 100%. We also demonstrate how GameChat can be extended to more than two agents.
Safe and Real-Time Consistent Planning for Autonomous Vehicles in Partially Observed Environments via Parallel Consensus Optimization
Ensuring safety and driving consistency is a significant challenge for autonomous vehicles operating in partially observed environments. This work introduces a consistent parallel trajectory optimization (CPTO) approach to enable safe and consistent driving in dense obstacle environments with perception uncertainties. Utilizing discrete-time barrier function theory, we develop a consensus safety barrier module that ensures reliable safety coverage within the spatiotemporal trajectory space across potential obstacle configurations. Following this, a bi-convex parallel trajectory optimization problem is derived that facilitates decomposition into a series of low-dimensional quadratic programming problems to accelerate computation. By leveraging the consensus alternating direction method of multipliers (ADMM) for parallel optimization, each generated candidate trajectory corresponds to a possible environment configuration while sharing a common consensus trajectory segment. This ensures driving safety and consistency when executing the consensus trajectory segment for the ego vehicle in real time. We validate our CPTO framework through extensive comparisons with state-of-the-art baselines across multiple driving tasks in partially observable environments. Our results demonstrate improved safety and consistency using both synthetic and real-world traffic datasets.
comment: 16 pages, 7 figures
Supervised Mixture-of-Experts for Surgical Grasping and Retraction
Imitation learning has achieved remarkable success in robotic manipulation, yet its application to surgical robotics remains challenging due to data scarcity, constrained workspaces, and the need for an exceptional level of safety and predictability. We present a supervised Mixture-of-Experts (MoE) architecture designed for phase-structured surgical manipulation tasks, which can be added on top of any autonomous policy. Unlike prior surgical robot learning approaches that rely on multi-camera setups or thousands of demonstrations, we show that a lightweight action decoder policy like Action Chunking Transformer (ACT) can learn complex, long-horizon manipulation from fewer than 150 demonstrations using solely stereo endoscopic images, when equipped with our architecture. We evaluate our approach on the collaborative surgical task of bowel grasping and retraction, where a robot assistant interprets visual cues from a human surgeon, executes targeted grasping on deformable tissue, and performs sustained retraction. Our results show that generalist Vision-Language-Action models fail to acquire the task entirely, even under standard in-distribution conditions. Furthermore, while standard ACT achieves moderate success in-distribution, adopting a supervised MoE architecture significantly boosts its performance, yielding higher success rates in-distribution and demonstrating superior robustness in out-of-distribution scenarios, including novel grasp locations, reduced illumination, and partial occlusions. Notably, it generalizes to unseen testing viewpoints and also transfers zero-shot to ex vivo porcine tissue without additional training, offering a promising pathway toward in vivo deployment. To support this statement, we present qualitative preliminary results of policy roll-outs during in vivo porcine surgery.
comment: Accepted at Robotics: Science and Systems 2026
MapNav: A Novel Memory Representation via Annotated Semantic Maps for Vision-and-Language Navigation
Vision-and-language navigation (VLN) is a key task in Embodied AI, requiring agents to navigate diverse and unseen environments while following natural language instructions. Traditional approaches rely heavily on historical observations as spatio-temporal contexts for decision making, leading to significant storage and computational overhead. In this paper, we introduce MapNav, a novel end-to-end VLN model that leverages an Annotated Semantic Map (ASM) to replace historical frames. Specifically, our approach constructs a top-down semantic map at the start of each episode and updates it at each timestep, allowing for precise object mapping and structured navigation information. Then, we enhance this map with explicit textual labels for key regions, transforming abstract semantics into clear navigation cues to generate our ASM. The MapNav agent takes the constructed ASM as input and uses the powerful end-to-end capabilities of a VLM to perform VLN. Extensive experiments demonstrate that MapNav achieves state-of-the-art (SOTA) performance in both simulated and real-world environments, validating the effectiveness of our method. Moreover, we will release our ASM generation source code and dataset to ensure reproducibility, contributing valuable resources to the field. We believe that our proposed MapNav can be used as a new memory representation method in VLN, paving the way for future research in this field.
Tempered Sequential Monte Carlo for Trajectory and Policy Optimization with Differentiable Dynamics
We propose a sampling-based framework for finite-horizon trajectory and policy optimization under differentiable dynamics by casting controller design as inference. Specifically, we minimize a KL-regularized expected trajectory cost, which yields an optimal "Boltzmann-tilted" distribution over controller parameters that concentrates on low-cost solutions as temperature decreases. To sample efficiently from this sharp, potentially multimodal target, we introduce tempered sequential Monte Carlo (TSMC): an annealing scheme that adaptively reweights and resamples particles along a tempering path from a prior to the target distribution, while using Hamiltonian Monte Carlo rejuvenation to maintain diversity and exploit exact gradients obtained by differentiating through trajectory rollouts. For policy optimization, we extend TSMC via (i) a deterministic empirical approximation of the initial-state distribution and (ii) an extended-space construction that treats rollout randomness as auxiliary variables. Experiments across trajectory- and policy-optimization benchmarks show that TSMC is broadly applicable and compares favorably to state-of-the-art baselines.
comment: Robotics: Science and Systems 2026
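As a hedged illustration of the objective described in the TSMC abstract above: a KL-regularized expected cost has a standard closed-form minimizer. The symbols $J$ (trajectory cost), $p$ (prior over controller parameters $\theta$), $\lambda$ (temperature), and the geometric schedule $\beta_k$ are notation assumed here for illustration, not taken from the paper.

```latex
\begin{align}
q^\star &= \arg\min_{q}\; \mathbb{E}_{\theta\sim q}\big[J(\theta)\big]
          + \lambda\,\mathrm{KL}\big(q \,\|\, p\big)
\quad\Longrightarrow\quad
q^\star(\theta) \propto p(\theta)\,\exp\!\big(-J(\theta)/\lambda\big), \\
\pi_k(\theta) &\propto p(\theta)\,\exp\!\big(-\beta_k\,J(\theta)/\lambda\big),
\qquad 0=\beta_0<\beta_1<\dots<\beta_K=1 .
\end{align}
```

As $\lambda \to 0$, the mass of $q^\star$ concentrates on low-cost $\theta$, which is what makes an annealed path from $\pi_0 = p$ to $\pi_K = q^\star$ attractive. The geometric path shown is one common choice; the paper's adaptive scheme may differ.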
When to Trust Imagination: Adaptive Action Execution for World Action Models
World Action Models (WAMs) have recently emerged as a promising paradigm for robotic manipulation by jointly predicting future visual observations and future actions. However, current WAMs typically execute a fixed number of predicted actions after each model inference, leaving the robot blind to whether the imagined future remains consistent with the actual physical rollout. In this work, we formulate adaptive WAM execution as a future-reality verification problem: the robot should execute longer when the WAM-predicted future remains reliable, and replan earlier when reality deviates from imagination. To this end, we propose Future Forward Dynamics Causal Attention (FFDC), a lightweight verifier that jointly reasons over predicted future actions, predicted visual dynamics, real observations, and language instructions to estimate whether the remaining action rollout can still be trusted. FFDC enables adaptive action chunk sizes as an emergent consequence of prediction-observation consistency, preserving the efficiency of long-horizon execution while restoring responsiveness in contact-rich or difficult phases. We further introduce Mixture-of-Horizon Training to improve long-horizon trajectory coverage for adaptive execution. Experiments on the RoboTwin benchmark and in the real world demonstrate that our method achieves a strong robustness-efficiency trade-off: on RoboTwin, it reduces WAM forward passes by 69.10% and execution time by 34.02%, while improving success rate by 2.54% over the short-chunk baseline; in real-world experiments, it improves success rate by 35%.
Good in Bad (GiB): Sifting Through End-user Demonstrations for Learning a Better Policy
Imitation learning offers a promising framework for enabling robots to acquire diverse skills from human users. However, most imitation learning algorithms assume access to high-quality demonstrations, an unrealistic expectation when collecting data from non-expert users, whose demonstrations often contain inadvertent errors. Naively learning from such demonstrations can result in unsafe policy behavior, while discarding entire demonstrations due to occasional mistakes wastes valuable data, especially in low-data settings. In this work, we introduce GiB (Good-in-Bad), an algorithm that automatically identifies and discards erroneous subtasks within demonstrations while preserving high-quality subtasks. The filtered data can then be used by any policy learning algorithm to train more robust policies. GiB first trains a self-supervised model to learn latent features and assigns binary weights to label each demonstration as good or bad. It then models the latent feature distribution of high-quality segments and uses the Mahalanobis distance to detect and evaluate poor-quality subtasks. We validate GiB on the Franka robot in both simulated and real-world multi-step tasks, demonstrating improved policy performance when learning from mixed-quality human demonstrations.
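The Mahalanobis-distance step mentioned in the GiB abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the high-quality latent features are modeled by a single Gaussian, and the function names, the ridge term, and the threshold are all hypothetical.

```python
import numpy as np


def fit_gaussian(good_features, ridge=1e-6):
    """Fit mean and inverse covariance to latent features of good segments.

    good_features: array of shape (n_samples, n_dims).
    ridge: small diagonal term (an assumption) to keep the covariance invertible.
    """
    mean = good_features.mean(axis=0)
    cov = np.cov(good_features, rowvar=False)
    cov = cov + ridge * np.eye(good_features.shape[1])
    return mean, np.linalg.inv(cov)


def mahalanobis(x, mean, cov_inv):
    """Mahalanobis distance of one feature vector from the fitted Gaussian."""
    diff = x - mean
    return float(np.sqrt(diff @ cov_inv @ diff))


def is_poor_quality(x, mean, cov_inv, threshold=3.0):
    """Flag a segment whose latent feature lies far from the good distribution.

    The threshold is a hypothetical value chosen only for this sketch.
    """
    return mahalanobis(x, mean, cov_inv) > threshold


# Toy usage with made-up 2-D latent features.
good = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
mean, cov_inv = fit_gaussian(good)
near = np.array([0.5, 0.5])
far = np.array([3.0, 3.0])
print(is_poor_quality(near, mean, cov_inv), is_poor_quality(far, mean, cov_inv))
```

Unlike a plain Euclidean distance, the Mahalanobis distance rescales each latent direction by the spread of the good data, so a deviation along a low-variance direction counts for more.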
Decentralized Heterogeneous Multi-Robot Collaborative Exploration for Indoor and Outdoor 3D Environments
Heterogeneous multi-robot systems feature significant adaptability for complex environments. However, effective collaboration that fully exploits the robots' potential remains a core challenge. This paper proposes a decentralized collaborative framework for heterogeneous multi-robot systems to autonomously explore indoor and outdoor 3D environments. First, a basic perception map that integrates terrain and observation metrics is designed. Improved supervoxel segmentation is developed to simplify the map structure and form a high-level representation that supports lightweight communication. Second, the traversal and observation capabilities of heterogeneous robots are modeled to evaluate the requirements of task views derived from incomplete supervoxels. These task views are grouped by requirements and clustered to streamline assignment. Subsequently, the view-cluster assignment is formulated as a heterogeneous multi-depot multi-traveling salesman problem (HMDMTSP) that incorporates constraints between view-cluster requirements and robot capabilities. An improved genetic algorithm is developed to efficiently solve this problem while ensuring global consistency. Based on the assignments, redundant views within clusters are eliminated to refine exploration routes. Finally, conflicts between robots' motion paths are resolved. Simulations and field experiments in cluttered indoor and outdoor environments demonstrate that our approach effectively coordinates exploration tasks among heterogeneous robots, achieving superior exploration efficiency and communication savings compared to state-of-the-art approaches.
Force Policy: Learning Hybrid Force-Position Control Policy under Interaction Frame for Contact-Rich Manipulation
Contact-rich manipulation demands human-like integration of perception and force feedback: vision should guide task progress, while high-frequency interaction control must stabilize contact under uncertainty. Existing learning-based policies often entangle these roles in a monolithic network, trading off global generalization against stable local refinement, while control-centric approaches typically assume a known task structure or learn only controller parameters rather than the structure itself. In this paper, we formalize a physically grounded interaction frame, an instantaneous local basis that decouples force regulation from motion execution, and propose a method to recover it from demonstrations. Based on this, we address both issues by proposing Force Policy, a global-local vision-force policy in which a global policy guides free-space actions using vision, and upon contact, a high-frequency local policy with force feedback estimates the interaction frame and executes hybrid force-position control for stable interaction. Real-world experiments across diverse contact-rich tasks show consistent gains over strong baselines, with more robust contact establishment, more accurate force regulation, and reliable generalization to novel objects with varied geometries and physical properties, ultimately improving both contact stability and execution quality. Project page: https://force-policy.github.io/
comment: Accepted by RSS 2026
VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models
Vision-Language-Action (VLA) models typically map visual observations and linguistic instructions directly to control signals. This "black-box" mapping forces a single forward pass to simultaneously handle instruction interpretation, spatial grounding, and low-level control, often leading to poor spatial precision and limited robustness in out-of-distribution scenarios. To address these limitations, we propose VP-VLA, a dual-system framework that decouples high-level reasoning and low-level execution via a structured visual prompting interface. Specifically, a "System 2 Planner" decomposes complex instructions into sub-tasks and identifies relevant target objects and goal locations. These spatial anchors are rendered directly within the native RGB observation space as modality-consistent visual prompts, such as crosshairs and bounding boxes. This avoids the modality mismatch introduced by dense masks, affordance maps, or additional control-specific representations. Guided by these prompts and enhanced by a novel auxiliary visual grounding objective during training, a "System 1 Controller" reliably generates precise low-level execution motions. Extensive experiments in simulation and the real world demonstrate that VP-VLA surpasses state-of-the-art end-to-end baselines including QwenOFT and GR00T-N1.6. Project page: https://visualprompt-vla.github.io/
comment: Project page: https://visualprompt-vla.github.io/
Constraint-Aware Reinforcement Learning via Adaptive Action Scaling
Safe reinforcement learning (RL) seeks to mitigate unsafe behaviors that arise from exploration during training by reducing constraint violations while maintaining task performance. Existing approaches typically rely on a single policy to jointly optimize reward and safety, which can cause instability due to conflicting objectives, or they use external safety filters that override actions and require prior system knowledge. In this paper, we propose a modular cost-aware regulator that scales the agent's actions based on predicted constraint violations, preserving exploration through smooth action modulation rather than overriding the policy. The regulator is trained to minimize constraint violations while avoiding degenerate suppression of actions. Our approach integrates seamlessly with off-policy RL methods such as SAC and TD3, and achieves state-of-the-art return-to-cost ratios on Safety Gym locomotion tasks with sparse costs, reducing constraint violations by up to 126 times while increasing returns by over an order of magnitude compared to prior methods.
comment: Accepted at the 8th Annual Learning for Dynamics & Control Conference (L4DC)
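The idea of scaling actions instead of overriding them, as described in the abstract above, can be illustrated with a toy regulator. This is only a sketch under stated assumptions: the paper trains its regulator, whereas the linear rule, the `predicted_violation` input in [0, 1], and the `min_scale` floor here are hypothetical stand-ins.

```python
def regulate_action(action, predicted_violation, min_scale=0.1):
    """Shrink an action vector as the predicted constraint-violation risk rises.

    action: list of floats proposed by the policy.
    predicted_violation: risk estimate, clipped to [0, 1] (assumed input).
    min_scale: floor on the scale factor (an assumption), so the action is
        never suppressed entirely and the policy's direction is preserved.
    """
    risk = min(max(predicted_violation, 0.0), 1.0)
    scale = max(1.0 - risk, min_scale)
    return [scale * a for a in action]


# No predicted risk leaves the action unchanged; higher risk damps it smoothly.
print(regulate_action([1.0, -0.5], 0.0))
print(regulate_action([1.0, -0.5], 0.5))
print(regulate_action([1.0, -0.5], 1.0))
```

Because the output is always a positive multiple of the policy's action, the direction chosen by the policy is kept, which is the sense in which modulation differs from a safety filter that replaces the action.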
Multiagent Systems
Learning the Preferences of a Learning Agent ICLR 2026
For AI systems to be useful to humans, they must understand and act in accordance with our values and preferences. Since specifying preferences is a hard task, inverse reinforcement learning (IRL) aims to develop methods that allow for inferring preferences from observed behavior. However, IRL assumes the human to be approximately optimal. This is a big limitation in cases where the human themselves may be learning to act optimally in an environment. In this paper, we formalize the problem of learning the preferences of a learning agent: a predictor observes a learner acting online and tries to infer the underlying reward function being (initially suboptimally) optimized by the learner. We model the learner as either being no-regret, or as converging to an optimal Boltzmann policy over time. In each of these settings, we establish theoretical guarantees for various preference learning algorithms, or otherwise show that such guarantees are impossible.
comment: Published at ICLR 2026, Workshop on Multi-Agent Learning and Its Opportunities in the Era of Generative AI. 9 pages main text
MCP-Cosmos: World Model-Augmented Agents for Complex Task Execution in MCP Environments
The Model Context Protocol (MCP) has unified the interface between Large Language Models (LLMs) and external tools, yet a fundamental gap remains in how agents conceptualize the environments within which they operate. Current paradigms are bifurcated: task-level planning often ignores execution-time dynamics, while reactive execution lacks long-horizon foresight. We present MCP-Cosmos, a framework that infuses generative World Models (WM) into the MCP ecosystem to enable predictive task automation. By unifying three disparate technologies, namely MCP, World Model, and Agent, we demonstrate that a "Bring Your Own World Model" (BYOWM) strategy allows agents to simulate state transitions and refine plans in a latent space before execution. We conducted experiments using two strategies, namely ReAct and SPIRAL, with 2 planning models and 3 representative world models over 20+ MCP-Bench tasks. We observed improvements in the agent's environment-interaction KPIs, such as tool success rate and tool parameter accuracy. The framework also offers new metrics such as Execution Quality to generate new insights about the effectiveness of world models compared to the baseline.
Internal vs. External: Comparing Deliberation and Evolution for Multi-Agent Constitutional Design
Multi-agent AI systems need behavioral constitutions, but it is unresolved whether such rules should emerge internally through agent self-governance or be discovered externally through optimization. We present the first controlled comparison of internal deliberation and external evolution across three social environments: a coordination grid-world, an iterated public goods game, and a bilateral trading market. Across 180 simulation runs, evolution significantly outperforms deliberation in collective-action settings (p < 0.01), while neither method improves outcomes in bilateral trading. A multiplier ablation reveals that evolution's advantage inverts when incentives shift: at a pool multiplier of m = 0.75, the evolved constitution forces value-destroying cooperation and becomes the worst-performing method. Notably, no deliberation run across thirty trials ever proposed punishment -- the canonical cooperation-sustaining mechanism evolution reliably discovers -- suggesting external optimization wins on peaks while internal self-governance trades peaks for structural responsiveness.
comment: 20 pages
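A small worked example shows why cooperation can destroy value when the pool multiplier drops below 1. It assumes the standard linear public goods game (contributions are multiplied by m and split equally); the group size, endowment, and function name are hypothetical, and the paper's environment may differ in its details.

```python
def public_goods_payoffs(contributions, endowment, multiplier):
    """Payoffs in a linear public goods game (assumed textbook form).

    Each agent keeps what it does not contribute and receives an equal
    share of the multiplied common pool.
    """
    pool = multiplier * sum(contributions)
    share = pool / len(contributions)
    return [endowment - c + share for c in contributions]


endowment = 10.0
full_cooperation = [10.0, 10.0, 10.0, 10.0]
no_cooperation = [0.0, 0.0, 0.0, 0.0]

# With m = 1.5, pooling everything raises group welfare above the endowments.
print(sum(public_goods_payoffs(full_cooperation, endowment, 1.5)))
# With m = 0.75, every pooled unit returns only 0.75, so welfare falls.
print(sum(public_goods_payoffs(full_cooperation, endowment, 0.75)))
print(sum(public_goods_payoffs(no_cooperation, endowment, 0.75)))
```

Under this form, total welfare is the sum of endowments plus (m - 1) times total contributions, so a rule that enforces contribution helps only when m > 1.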
Robust Multi-Agent LLMs under Byzantine Faults
Large language model (LLM) agents increasingly collaborate over peer-to-peer networks to improve their reliability. However, these same interactions can also become a source of vulnerability, as unreliable or Byzantine agents may sway neighboring agents toward incorrect conclusions and degrade overall system performance. Existing methods rely on leader-based coordination or self-reported confidence, both of which are susceptible to adversarial manipulation. We study decentralized LLM multi-agent systems (LLM-MAS) and propose Self-Anchored Consensus (SAC), a fully decentralized iterative filter-and-refine protocol in which agents iteratively exchange responses, locally evaluate and filter unreliable messages, and refine their own outputs. We present $(F{+}1)$-robustness conditions for the communication graph that ensure honest agents preserve and propagate reliable information despite Byzantine influence. Experiments on mathematical and commonsense reasoning benchmarks show that SAC effectively suppresses Byzantine influence and consistently improves performance across diverse communication topologies, whereas prior methods degrade under adversarial conditions.
Octopus Protocol: One-Shot Hardware Discovery and Control for AI Agents via Infrastructure-as-Prompts
Recent agentic-robotics systems, from Code-as-Policies to modern vision-language-action (VLA) foundation models, presuppose that drivers, SDKs, or ROS-style primitives for the target hardware already exist. Writing those primitives is the dominant engineering cost of bringing up new hardware for agent control. We present Octopus Protocol, a system that collapses that cost to a single shell command. Given only raw OS access and a language-model API key, a coding agent executes a five-stage pipeline--PROBE, IDENTIFY, INTERFACE, SERVE, DEPLOY--to discover connected devices, infer their capabilities, generate a Model Context Protocol (MCP) server with typed tools, and deploy it as a live HTTP endpoint. A persistent daemon then monitors the system, heals broken code, and perceives physical state through the camera tools it generated for itself. Two architectural principles make this work: protocols are prompts, not code, and the coding agent is the runtime. We validate the system on three heterogeneous platforms (PC/WSL, Apple Silicon macOS, Raspberry Pi 4) and on a commercial 6-DOF robotic arm with USB camera feedback. One command onboards the hardware in ~10-15 minutes and exposes up to 30 MCP tools; an MCP-compliant client then performs closed-loop visual-motor control through tools no human wrote.
Not All Turns Matter: Credit Assignment for Multi-Turn Jailbreaking
Deploying LLMs in multi-turn dialogues facilitates jailbreak attacks that distribute harmful intent across seemingly benign turns. Recent training-based multi-turn jailbreak methods learn long-horizon attack strategies from interaction feedback, but often rely on coarse trajectory-level outcome signals that broadcast uniformly to every turn. However, we find that turn-level contributions in multi-turn jailbreaking are non-uniform, phase-dependent, and target-specific. Such coarse outcome supervision induces a credit assignment problem, leading to over-rewarding redundant turns in successful trajectories and under-crediting useful intermediate turns in failed ones. To address this, we propose TRACE, a turn-aware credit assignment framework for reinforcement learning (RL)-based multi-turn jailbreaking. For successful trajectories, TRACE estimates turn-level contributions via leave-one-turn-out semantic masking; for failed ones, TRACE assigns penalties based on prompt harmfulness and semantic relevance, with an additional local refusal-aware penalty. Furthermore, we reuse the attack-side credit signal for multi-turn defense alignment. Extensive experiments on open-source and closed-source targets show that TRACE achieves strong overall performance in effectiveness, transferability, and efficiency, yielding about a 25% relative improvement in attack success rate over the strongest RL baseline while also improving the safety-utility balance when reused for defense alignment.
comment: 41 pages, 10 figures
Beyond the All-in-One Agent: Benchmarking Role-Specialized Multi-Agent Collaboration in Enterprise Workflows
Large language model (LLM) agents are increasingly expected to operate in enterprise environments, where work is distributed across specialized roles, permission-controlled systems, and cross-departmental procedures. However, existing enterprise benchmarks largely evaluate single agents with broad tool access, while existing multi-agent benchmarks rarely capture realistic enterprise constraints such as role specialization, access control, stateful business systems, and policy-based approvals. We introduce EntCollabBench, a benchmark for evaluating enterprise multi-agent collaboration. EntCollabBench simulates a permission-isolated organization with 11 role-specialized agents across six departments and contains two evaluation subsets: a Workflow subset, where agents collaboratively modify enterprise system states, and an Approval subset, where agents make policy-grounded decisions. Evaluation is based on execution traces, database state verification, and deterministic policy adjudication rather than natural-language response judging. Experiments with representative LLM agents show that current models still struggle with end-to-end enterprise collaboration, especially in delegation, context transfer, parameter grounding, workflow closure, and decision commitment. EntCollabBench provides a reproducible testbed for measuring and improving agent systems intended for realistic organizational environments.
comment: 45 pages
Communicating Sound Through Natural Language
Natural language is widely used to describe, prompt, and control audio systems, but rarely serves as the representation carrying audio itself. We introduce lexical acoustic coding (LAC), a framework in which pre-trained LLM sender and receiver agents transmit sound through natural language. Under fixed system prompts, the agents write their own analysis and synthesis code, communicating only through a lexical sentence, shared vocabulary, and optional symbolic music structure. The sender analyzes an input waveform into interpretable, non-learned acoustic descriptors, quantizes each with a feature-specific interval vocabulary, and verbalizes the lexical code as English. The receiver parses the sentence back into lexical-acoustic constraints and renders a waveform through closed-loop refinement. The transmitted text serves as both a rich caption and as the transport representation itself. We frame LAC as a finite-rate lossy quantizer, exposing trade-offs between vocabulary size, rate, and fidelity. Experiments on short sounds and symbolic music transfer show that plain text preserves measurable acoustic structure while remaining interpretable, editable, and native to LLM-mediated communication.
comment: Includes link to demo page
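The quantize-then-verbalize step described in the abstract above can be illustrated with a minimal Python sketch. The descriptor (spectral centroid), the interval edges, the vocabulary words, and the sentence template are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import bisect
import math

# Hypothetical interval vocabulary for one descriptor (spectral centroid, Hz).
EDGES = [500.0, 1500.0, 4000.0]                 # interior interval boundaries
WORDS = ["dark", "warm", "bright", "piercing"]  # one word per interval


def quantize(value: float) -> str:
    """Map a descriptor value to the word naming its interval."""
    return WORDS[bisect.bisect_right(EDGES, value)]


def verbalize(word: str) -> str:
    """Sender side: wrap the lexical code in an English sentence."""
    return f"The tone is {word}."


def parse(sentence: str, lo: float = 0.0, hi: float = 20000.0) -> tuple[float, float]:
    """Receiver side: recover the interval constraint the word stands for."""
    word = next(w for w in WORDS if w in sentence)
    bounds = [lo, *EDGES, hi]
    i = WORDS.index(word)
    return bounds[i], bounds[i + 1]


def rate_bits() -> float:
    """Bits carried per descriptor if all words are equally likely."""
    return math.log2(len(WORDS))


sentence = verbalize(quantize(2000.0))
print(sentence)         # The tone is bright.
print(parse(sentence))  # (1500.0, 4000.0)
print(rate_bits())      # 2.0
```

In this toy setting the receiver only learns that the centroid lies between 1500 and 4000 Hz. A larger vocabulary would narrow that interval at the price of more bits per descriptor, which is the kind of rate-fidelity trade-off the abstract refers to.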
HULK: Large-scale Hierarchical Coordination under Continual and Uncertain Temporal Tasks
Multi-agent systems can be extremely efficient when working concurrently and collaboratively, e.g., for delivery, surveillance, search and rescue. Coordination of such teams often involves two aspects: selecting appropriate subteams for different tasks in various areas, and coordinating agents in the subteams to execute the associated subtasks. Existing work often assumes that the tasks are static and known beforehand, where an integer program can be formulated and solved offline. However, in many applications, the team-wise tasks are generated online continually by external requests, and the number of subtasks within each task is uncertain, e.g., the number of packages to deliver or victims to rescue. The aforementioned offline solution becomes inadequate as it would require constant re-computation for the whole team and global communication to broadcast the results. Thus, this work tackles the large-scale coordination problem under continual and uncertain temporal tasks, specified as temporal logic formulas over collaborative actions. The proposed hierarchical framework, HULK, consists of two interleaved layers: the rolling assignment of currently known tasks to subteams within a certain horizon, and the dynamic coordination within a subteam given the detected subtasks during online execution. Thus, coordination is performed hierarchically at different granularities and triggering conditions, improving computational efficiency and robustness. The method is validated rigorously over large-scale heterogeneous systems under various temporal tasks and environment uncertainties.
comment: Accepted to the IEEE International Conference on Robotics and Automation. 7 pages, 4 figures
AgentForesight: Online Auditing for Early Failure Prediction in Multi-Agent Systems
LLM-based multi-agent systems are increasingly deployed on long-horizon tasks, but a single decisive error is often accepted by downstream agents and cascades into trajectory-level failure. Existing work frames this as post-hoc failure attribution, diagnosing the responsible agent and step after the trajectory has ended. However, this paradigm forfeits any opportunity to intervene while the trajectory is still unfolding. In this work, we introduce AgentForesight, a framework that reframes this problem as online auditing: at each step of an unfolding trajectory, an auditor observes only the current prefix and must either continue the run or alarm at the earliest decisive error, without access to future steps. To this end, we curate AFTraj-2K, a corpus of agentic trajectories across Coding, Math, and Agentic domains, in which safe trajectories are retained under a strict curation pipeline and unsafe trajectories are annotated at the step of their decisive error via consensus among multiple LLM judges. Built on that, we develop AgentForesight-7B, a compact online auditor trained with a coarse-to-fine reinforcement learning recipe that first equips it with a risk-anticipation prior at the failure boundary on adjacent safe/unsafe prefix pairs, then sharpens this prior into precise step-level localization under a three-axis reward jointly targeting the what, where, and who of an audit verdict. Across AFTraj-2K and an external Who&When benchmark, AgentForesight-7B outperforms leading proprietary models, including GPT-4.1 and DeepSeek-V4-Pro, achieving up to +19.9% performance gain and 3$\times$ lower step localization error, opening the loop from post-hoc failure detection to deployment-time intervention. Project page: https://zbox1005.github.io/agent-foresight/
comment: 33 pages, 7 figures
MIND-Skill: Quality-Guaranteed Skill Generation via Multi-Agent Induction and Deduction
Large language model (LLM) powered AI agents have emerged as a promising paradigm for autonomous problem-solving, yet they continue to struggle with complex, multi-step real-world tasks that demand domain-specific procedural knowledge. Reusable agent skills, which encapsulate successful problem-solving strategies, offer a natural remedy by enabling agents to build on prior experience. However, curating such skills has largely remained a manual endeavor, requiring human experts to distill rich domain knowledge into actionable guidelines. In this work, we present $\textbf{M}$ulti-agent $\textbf{IN}$duction and $\textbf{D}$eduction for $\textbf{Skill}$s ($\textbf{MIND-Skill}$), a framework that automatically induces generalizable skills from successful trajectories with robust quality guarantees. MIND-Skill consists of an induction agent which is tasked to abstract reusable skills from successful trajectories, and a deduction agent which aims to reconstruct trajectories by following the induced skills. To guarantee the quality of the generated skills, we introduce a reconstruction loss that compares input and reconstructed trajectories, an outcome loss that enforces the correctness of the reconstructed trajectories, and a rubric loss that assesses the documentation quality and regularizes the abstraction level of the generated skills according to predefined criteria. These textual losses are jointly optimized with TextGrad, and the resulting skills are evaluated on held-out tasks unseen during optimization. Experiments on AppWorld and BFCL-v3 show that MIND-Skill consistently outperforms concurrent skill generation methods.
Modeling Decision-Making with Will for Cooperation in Social Dilemmas
Standard rational actor models often attribute cooperation failures in social dilemmas to insufficient incentives, overlooking the destabilizing effects of continuous utility maximization. To address this, we propose a framework of "will", defined as a mechanism that persistently pursues goals while ignoring local cost-benefit fluctuations. We formalize willed agents as potential minimizers, distinguishing them from cumulative utility maximizers. Dynamical analysis of an infinite population demonstrates that willed agents shrink the feasible state space, acting as boundary constraints that accelerate convergence in canonical social dilemmas. Through multi-agent simulations in a spatiotemporal Stag Hunt Game, we show that willed agents function as "cooperation catalysts", enabling groups to surmount high-risk thresholds where pure utility maximization fails. We find that heterogeneous will strength promotes cooperation, and that agents who autonomously suspend rational re-evaluation can significantly outperform continuous optimizers. These findings suggest that successful cooperation relies on the cognitive capacity to strategically constrain calculation.
comment: Accepted at CogSci 2026
Large Language Models over Networks: Collaborative Intelligence under Resource Constraints
Large language models (LLMs) are transforming society, powering applications from smartphone assistants to autonomous driving. Yet cloud-based LLM services alone cannot serve a growing class of applications, including those operating under intermittent connectivity, sub-second latency budgets, data-residency constraints, or sustained high-volume inference. On-device deployment is in turn constrained by limited computation and memory. No single endpoint can deliver high-quality service across this spectrum. This article focuses on collaborative intelligence, a paradigm in which multiple independent LLMs distributed across device and cloud endpoints collaborate at the task level through natural language or structured messages. Such collaboration strives for superior response quality under heterogeneous resource constraints spanning computation, memory, communication, and cost across network tiers. We present collaborative inference along two complementary and composable dimensions: vertical device-cloud collaboration and horizontal multi-agent collaboration, which can be combined into hybrid topologies in practice. We then examine learning to collaborate, addressing the training of routing policies and the development of cooperative capabilities among LLMs. Finally, we identify open research challenges including scaling under resource heterogeneity and trustworthy collaborative intelligence.
Generalization Bounds of Emergent Communications for Agentic AI Networking
The evolution of 6G networking toward agentic AI networking (AgentNet) systems requires a shift from traditional data pipelines to task-aware, agentic AI-native communication solutions. Emergent communication, a novel communication paradigm in which autonomous agents learn their own signaling protocols through interaction, is increasingly viewed as a promising solution to address the challenges posed by existing rigid, predefined protocol-based networking architecture. However, most existing emergent communication frameworks fail to account for physical networking constraints, such as bandwidth and computational complexity, and often lack a rigorous information-theoretical foundation. To address these challenges, this paper introduces a novel emergent communication framework that facilitates collaborative task-solving among heterogeneous agents through an information-theoretic lens. We propose a novel joint loss function that unifies the optimization of decision-making functions and the learning of communication signaling. Our proposed solution is grounded on the multi-agent and multi-task distributed information bottleneck (DIB) theory, which allows the quantification of the fundamental trade-off between task-relevant information representation and computational complexity. We further provide theoretical generalization bounds of the emergent communication protocol during decentralized inference across unseen environmental states. Experimental validation on a real-world hardware prototype confirms that our proposed framework significantly improves generalization performance, compared to the state-of-the-art solutions.
comment: Accepted at IEEE ISIT Workshop, Guangzhou, China, June 2026
Slipstream: Trajectory-Grounded Compaction Validation for Long-Horizon Agents
To cope with the large contexts that long-horizon LLM agents produce, modern frameworks increasingly rely on compaction -- invoking an LLM to rewrite the accumulated trajectory into a shorter summary that the agent resumes from. Today, compaction runs synchronously on the critical path of agent execution but this can unpredictably degrade accuracy due to a structural validation gap: the compactor must condense context but is fundamentally unaware of precisely what information the agent will need later. Further, because post-compaction agent steps are conditioned on the new summary, targeted validation criteria do not exist and errors silently propagate through coherent but incorrect behavior. Our key insight is that asynchronous compaction efficiently addresses this gap: by running the compactor in parallel with continued agent execution on the original context, the candidate summary and the agent's next steps are generated independently from the same pre-compaction state, yielding a validation signal independent of the summary itself. We build Slipstream, a trajectory-grounded compaction system that uses a judge to validate the candidate summary against the agent's continued reasoning, checking that it preserves both the agent's forward intent and the key facts and constraints it depends on. Across long-horizon coding (SWE-bench Verified) and web-browsing (BrowseComp) workloads, Slipstream improves task accuracy by up to 8.8 percentage points while reducing end-to-end latency by up to 39.7%.
comment: 9 pages (16 pages counting references, appendix), 6 figures, 2 tables
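The asynchronous validation idea in the abstract above can be sketched in a few lines of Python with `asyncio`. This is only an illustration under stated assumptions: the compactor, the agent step, and the judge are hypothetical stand-ins (the real system would call LLMs), and the keyword-matching judge is a toy substitute for the judge the abstract mentions.

```python
import asyncio


async def validated_compaction(context, compactor, agent_step, judge):
    """Run the compactor and the agent's next step in parallel on the same
    pre-compaction context, then let a judge compare the two results."""
    summary, step = await asyncio.gather(compactor(context), agent_step(context))
    accepted = judge(summary, step)
    return accepted, summary, step


# --- Hypothetical stand-ins for the LLM calls ---
async def good_compactor(context):
    return "goal: fix failing test in utils.py"


async def lossy_compactor(context):
    return "goal: fix a bug"


async def agent_step(context):
    # The continued step names the facts it relied on.
    return {"intent": "edit utils.py", "needs": ["utils.py", "failing test"]}


def keyword_judge(summary, step):
    # Toy judge: every fact the agent relied on must survive in the summary.
    return all(fact in summary for fact in step["needs"])


context = ["user: the test in utils.py is failing", "agent: reading utils.py"]
ok, _, _ = asyncio.run(
    validated_compaction(context, good_compactor, agent_step, keyword_judge))
bad, _, _ = asyncio.run(
    validated_compaction(context, lossy_compactor, agent_step, keyword_judge))
print(ok, bad)  # True False
```

Because the summary and the next step are both derived from the original context, the judge's signal does not depend on the summary being correct. What happens after a rejection is not described in the abstract, so the sketch simply reports the verdict.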
SkillGen: Verified Inference-Time Agent Skill Synthesis
Skills are a promising way to improve LLM agent capabilities without retraining, while keeping the added procedure reusable and controllable. However, high-quality skills are still largely written by hand. We introduce SkillGen, a multi-agent framework that synthesizes a single auditable skill from trajectories generated by a base agent. The output is a human-readable artifact that can be inspected before use. Rather than merely summarizing trajectories, SkillGen leverages contrastive induction over both successful and failed trajectories to identify reusable success patterns, recurring failure modes, and behaviors that appear in nearby successes but are missing from failures. SkillGen then generates candidate skills and iteratively refines the skill. A key novelty in SkillGen is that we model agent skills as interventions to empirically verify the net effect of skills on the overall performance. Specifically, we compare outcomes on the same instances with and without the skill, so that we account for both repairs (cases where the skill fixes a baseline failure) and regressions (cases where the skill breaks a baseline success). Across a broad range of agents and datasets, SkillGen consistently improves held-out performance, outperforms existing skill-generation baselines, and produces skills that transfer across models.
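The paired comparison described above (same instances, with and without the skill) can be made concrete with a short Python sketch. The record type, field names, and the net-gain formula are assumptions chosen for illustration, not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass
class PairedOutcome:
    """Result of one task instance run twice (hypothetical record type)."""
    instance_id: str
    baseline_success: bool  # agent without the skill
    skill_success: bool     # same agent with the skill


def net_effect(outcomes: list[PairedOutcome]) -> dict:
    """Count repairs and regressions and report the net change in success rate."""
    repairs = sum(1 for o in outcomes
                  if not o.baseline_success and o.skill_success)
    regressions = sum(1 for o in outcomes
                      if o.baseline_success and not o.skill_success)
    n = len(outcomes)
    net_gain = (repairs - regressions) / n if n else 0.0
    return {"repairs": repairs, "regressions": regressions, "net_gain": net_gain}


outcomes = [
    PairedOutcome("a", False, True),   # repair
    PairedOutcome("b", True, False),   # regression
    PairedOutcome("c", True, True),    # unchanged success
    PairedOutcome("d", False, True),   # repair
    PairedOutcome("e", False, False),  # unchanged failure
]
print(net_effect(outcomes))
# {'repairs': 2, 'regressions': 1, 'net_gain': 0.2}
```

Reporting repairs and regressions separately matters because a skill with many repairs and many regressions can show the same net gain as one that changes little.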
GameChat: Multi-LLM Dialogue for Safe, Agile, and Socially Optimal Multi-Agent Navigation in Constrained Environments
Safe, agile, and socially compliant multi-robot navigation in cluttered and constrained environments remains a critical challenge. This is especially difficult with self-interested agents with unique, unknown priorities in decentralized settings, where there is no central authority to resolve conflicts induced by spatial symmetry. We address this challenge by proposing an intuitive but very effective approach, GameChat, which facilitates safe, agile, and deadlock-free navigation for both cooperative and self-interested agents in cluttered environments. Key to our approach is the idea that agents should resolve conflicts on their own using natural language to communicate, much like humans. We evaluate GameChat in simulated environments with doorways and intersections. The results show that even in the worst case, GameChat reduces the time for all agents to reach their goals by over 35% from a naive baseline and by over 20% from a state-of-the-art baseline in the intersection scenario, while doubling the rate of ensuring the agent with a higher priority task reaches the goal first, from 50% (equivalent to random chance) to 100%. We also demonstrate how GameChat can be extended to more than two agents.
ScholarPeer: A Context-Aware Multi-Agent Framework for Automated Peer Review
The exponential growth of machine learning submissions has strained the traditional peer review process, resulting in slow feedback loops for authors and an immense burden on reviewers to rigorously audit technical soundness and verify literature. To address this, we introduce ScholarPeer, a multi-agent framework designed to operationalize the rigorous auditing workflow of a senior researcher. Rather than attempting to replace human judgment, ScholarPeer serves as a co-scientist: acting as a mentor for rapid author iteration prior to submission, and as an active verification assistant that augments human reviewers. The framework structurally decouples contextualization from critique by deploying a sub-domain historian to synthesize the field's trajectory, a baseline scout to proactively hunt for omitted state-of-the-art comparisons, and a multi-aspect Q&A engine that deeply audits technical soundness (scrutinizing internal logical consistency, experimental validity, and mathematical rigor) while cross-referencing claims against top-tier academic venues. We comprehensively evaluate ScholarPeer on ~1,800 ICLR submissions spanning 2020 through 2025. Our results show that ScholarPeer achieves significant win-rates against state-of-the-art fine-tuned models and search-augmented agentic baselines.
Detecting Multi-Agent Collusion Through Multi-Agent Interpretability
As LLM agents are increasingly deployed in multi-agent systems, they introduce risks of covert coordination that may evade standard forms of human oversight. While linear probes on model activations have shown promise for detecting deception in single-agent settings, collusion is inherently a multi-agent phenomenon, and the use of internal representations for detecting collusion between agents remains unexplored. We introduce NARCBench, a benchmark for evaluating collusion detection under environment distribution shift, and propose five probing techniques that aggregate per-agent deception scores to classify scenarios at the group level, evaluated across four open-weight models (Qwen3-32B, Llama-3.1-70B, DeepSeek-R1 32B, GPT-OSS-20B) and six probe architectures. We frame this as a distributed anomaly detection problem, identifying three collusion signatures that map onto distinct anomaly types and detection paradigms. Every model reaches 1.00 AUROC in-distribution; on our strongest model (Llama-3.1-70B), our five probing techniques achieve 0.73 to 0.93 AUROC when transferred zero-shot to structurally different multi-agent scenarios and 0.99 to 1.00 on a steganographic blackjack card-counting task, with detection performance scaling with model capability. We find that no single probing technique dominates across all collusion types, consistent with the framework's prediction that different anomaly types require different detection paradigms. This work takes a step toward multi-agent interpretability: extending white-box inspection from single models to multi-agent contexts, where detection requires aggregating signals across agents. These results suggest that model internals provide a complementary signal to text-level monitoring for detecting multi-agent collusion. Code and data available at https://github.com/aaronrose227/narcbench.
COSAC: Counterfactual Credit Assignment in Sequential Cooperative Teams
In cooperative teams where agents act in a fixed order and share a single team-level reward (multi-agent language systems, sequential robotic tasks), per-agent credit assignment is under-determined. Critic-based approaches scale poorly as the number of agents grows owing to the costly maintenance of joint/factored critic(s), whereas the existing critic-free alternatives have other issues: common credit across agents that couples every agent's signal to teammate noise, importance-sampling corrections for upstream-update staleness that incur variance exponential in team size, or per-agent counterfactual replay that isolates each agent's effect at the price of extra environment or reward calls. We propose COSAC, a critic-free per-agent policy gradient for sequential cooperative teams. COSAC fits an additive per-agent decomposition of the team reward by a single ridge regression on the rollout batch (giving each agent a learning signal decoupled from teammate noise), and computes each agent's counterfactual advantage from fictitious continuations of the current policy (policy forward passes that replace both importance-sampling reweighting and per-agent environment replay, at no extra environment or reward cost). The estimator instantiates the Sequential Aristocrat Utility (SeqAU), our extension of Wolpert and Tumer's (2001) aristocrat utility to sequential teams. We prove bias and variance bounds on SeqAU credits that stay controlled as the team grows. Our controlled study on sequential bandits demonstrates that COSAC attains the lowest advantage MSE and consistently low learning regret across team sizes up to $K = 16$. On the AI2 Reasoning Challenge (ARC) task, where four Qwen3-0.6B agents reason in turn about a grade-school science question, COSAC attains faster convergence than the other critic-free baselines.
Cost-Aware Distributed Online Learning with Strict Rejection Behavior against Adversarial Agents
Distributed online learning in multi-agent systems (MASs) is highly vulnerable to adversarial influence, especially when malicious agents cannot be fully isolated during the transient stage. While existing studies mainly pursue resilient consensus or secure fusion, they pay much less attention to the learning inefficiency and extra evolution cost accumulated during the defense process. This paper addresses this gap by developing a cost-aware distributed online learning framework with strict rejection behavior against adversarial agents. Under this mechanism, the state evolution cost of online adaptation is formulated and the cost amplification effect caused by adversarial interactions is theoretically characterized. To balance robustness, convergence efficiency, and long-term cost, we propose an adaptive adjustment mechanism for the state-evolution rate. The resulting outer-layer update can be equivalently viewed as a constrained online optimization problem. We further establish the well-posedness and regularity of the associated periodic Riccati layer, and show that the outer-layer update ensures feasibility and controlled variation. Based on these properties, closed-loop practical stability is rigorously established via a two-time-scale Lyapunov framework. Simulations demonstrate that the proposed method achieves robust and low-cost convergence under adversarial disturbances. A scenario involving a satellite-assisted IoT monitoring network for target tracking further validates the practical effectiveness of the strict rejection behavior.
comment: 13 pages, 11 figures, 2 tables
An Empirical Study of Multi-Agent Collaboration for Automated Research
As AI agents evolve, the community is rapidly shifting from single Large Language Models (LLMs) to Multi-Agent Systems (MAS) to overcome cognitive bottlenecks in automated research. However, the optimal multi-agent coordination framework for these autonomous agents remains largely unexplored. In this paper, we present a systematic empirical study investigating the comparative efficacy of distinct multi-agent structures for automated machine learning optimization. Utilizing a rigorously controlled, execution-based testbed equipped with Git worktree isolation and explicit global memory, we benchmark a single-agent baseline against two multi-agent paradigms: a subagent architecture (parallel exploration with post-hoc consolidation) and an agent team architecture (experts with pre-execution handoffs). By evaluating these systems under strictly fixed computational time budgets, our findings reveal a fundamental trade-off between operational stability and theoretical deliberation. The subagent mode functions as a highly resilient, high-throughput search engine optimal for broad, shallow optimizations under strict time constraints. Conversely, the agent team topology exhibits higher operational fragility due to multi-author code generation but achieves the deep theoretical alignment necessary for complex architectural refactoring given extended compute budgets. These empirical insights provide actionable guidelines for designing future autoresearch systems, advocating for dynamically routed architectures that adapt their collaborative structures to real-time task complexity.
GenCellAgent: Generalizable, Training-Free Cellular Image Segmentation via Large Language Model Agents
Cellular image segmentation is essential for quantitative biology yet remains difficult due to heterogeneous modalities, morphological variability, and limited annotations. We present GenCellAgent, a training-free multi-agent framework that orchestrates specialist segmenters and generalist vision-language models via a planner-executor-evaluator loop (choose tool $\rightarrow$ run $\rightarrow$ quality-check) with long-term memory. The system (i) automatically routes images to the best tool, (ii) adapts on the fly using a few reference images when imaging conditions differ from what a tool expects, (iii) supports text-guided segmentation of organelles not covered by existing models, and (iv) commits expert edits to memory, enabling self-evolution and personalized workflows. Across seven cell-segmentation benchmarks spanning diverse microscopy modalities (4,718 images), this routing consistently matches or exceeds the best individual tool on every dataset and outperforms all baselines in overall accuracy. On out-of-distribution organelle data, GenCellAgent substantially outperforms specialist models that were not trained on the target domain, recovering structures that dedicated tools fail to detect. It also segments novel objects such as the Golgi apparatus via iterative text-guided refinement, with light human correction further boosting performance. Together, these capabilities provide a practical path to robust, adaptable cellular image segmentation without retraining, while reducing annotation burden and matching user preferences.
comment: 43 pages
Systems and Control (EESS)
Data-Driven Inverse Reinforcement Learning of Linear Systems with Model Uncertainty: A Convex Optimization View
Inverse reinforcement learning (IRL) for linear systems seeks a cost function whose optimal controller reproduces an expert policy from data. Existing data-driven methods for discrete-time linear systems are largely built on iterative policy/value updates, repeated matrix inversions, and, in some cases, an initial stabilizing controller, which can limit numerical robustness and practical applicability. This paper develops a convex-optimization framework for data-driven inverse reinforcement learning of discrete-time linear systems with model uncertainty. For nominal systems, we derive a semidefinite characterization of inverse optimality and a relaxed formulation that recovers an equivalent state-cost matrix together with a stabilizing controller from expert trajectories. We then obtain a model-free, off-policy reformulation by replacing the unknown system matrices with a regressed kernel matrix identified from local input--state data. For uncertain local systems, we show that a standard LQR cost is generally insufficient to represent every stabilizing target gain and therefore introduce a generalized LQR cost with a state--input cross term. Based on this model, we develop a convex data-driven inverse-RL method and extend it to robust cost design over a population of perturbations via differentiable semidefinite programming and stochastic approximation. Simulations on a discrete-time power-system example show accurate recovery of expert behavior, improved robustness to gain-estimation error and model mismatch, and a simpler computational pipeline than classical iterative inverse-RL schemes.
Fault-Aware MPC for Robotic Fleet Communications Scheduling
Operating a fleet of remote robotic systems with intermittent communications requires scheduling limited contact opportunities to maintain fleet health awareness, complete mission objectives, and intervene on faulted assets before their permanent loss. This scheduling problem is complicated by observational ambiguity: when an asset fails to check in, the operator cannot distinguish between a lethal hardware fault and a benign communications failure. If the system's failure modes are structured through a fault model, a scheduler can exploit mode-specific lethality, timing, and recoverability properties to prioritize correctly, but only if it can distinguish between modes that produce identical observations under standard actions. We present Interacting Multiple Model Model Predictive Control (IMM-MPC), a receding-horizon framework that maintains a probabilistic belief over discrete fault modes with time-inhomogeneous dynamics and optimizes a two-term objective coupling acquisition value with information gain. We characterize when observationally aliased fault modes can be disambiguated through scheduled actions and when aliasing is permanently unresolvable. Applied to satellite launch and early orbit communications scheduling, IMM-MPC recovers 59.8% of spacecraft experiencing lethal faults versus 9.0% for binary-MPC and 2.0% for a bipartite graph-based formulation solved through matching. These results hold across 200 randomized trials, while maintaining identical acquisition of healthy satellites and near-identical solve times.
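The observational aliasing described above can be illustrated with a plain Bayesian belief update over discrete fault modes. This Python sketch is not the paper's IMM-MPC: the mode names, the prior, and all likelihood values are hypothetical, and the time-inhomogeneous mode dynamics and the MPC objective are omitted.

```python
def bayes_update(belief: dict, likelihood: dict) -> dict:
    """Posterior over modes given P(observation | mode) for the chosen action."""
    unnorm = {m: belief[m] * likelihood[m] for m in belief}
    z = sum(unnorm.values())
    return {m: p / z for m, p in unnorm.items()}


# Hypothetical prior over fault modes for one asset.
prior = {"healthy": 0.90, "comms_fault": 0.07, "lethal_fault": 0.03}

# Standard contact attempt: a missed check-in is equally likely under both
# fault modes, so the two modes are observationally aliased.
missed_standard = {"healthy": 0.05, "comms_fault": 1.0, "lethal_fault": 1.0}
after_standard = bayes_update(prior, missed_standard)

# Hypothetical probing action (e.g. a backup channel) whose "no response"
# outcome has different likelihoods under the two fault modes.
missed_probe = {"healthy": 0.05, "comms_fault": 0.2, "lethal_fault": 1.0}
after_probe = bayes_update(after_standard, missed_probe)

ratio_before = prior["comms_fault"] / prior["lethal_fault"]
ratio_standard = after_standard["comms_fault"] / after_standard["lethal_fault"]
ratio_probe = after_probe["comms_fault"] / after_probe["lethal_fault"]
print(round(ratio_before, 3), round(ratio_standard, 3), round(ratio_probe, 3))
```

The standard action leaves the ratio between the two fault-mode probabilities unchanged, so no number of repeated missed check-ins can separate them. Only an action whose observation likelihoods differ across modes shifts that ratio, which is why an information-gain term in the scheduling objective can be useful.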
Engineering Economy: A New Paradigm for Escaping the Middle-Income Trap
This paper introduces the concept of Engineering Economy as a new paradigm for understanding and managing macroeconomic policy in middle-income countries seeking to escape the middle-income trap. Drawing on Turkiye's post-2001 economic trajectory and South Korea's successful transition from a low-income to a high-income economy, the study argues that conventional frameworks, whether the Washington Consensus's market liberalization prescriptions or the institutionalist critique alone, are insufficient. Instead, it proposes treating the economy as a dynamic control system requiring continuous calibration rather than static equilibrium. The paper develops a road-surface metaphor (highway, side-road, off-road) to characterize different global economic regimes and presents eleven interconnected policy pillars spanning venture capital formation, regulatory sandboxes, technology-focused industrial policy, and human capital development. By synthesizing insights from endogenous growth theory (Romer), institutional economics (Acemoglu), the catching-up literature (Lee), cybernetic systems theory (Wiener), and Schumpeterian creative destruction, the framework reconceptualizes macroeconomic instruments through control-engineering analogies: interest rates as energy gradients, fiscal policy as energy flow, exchange rates as balance motors, and regulation as adaptive suspension. The analysis demonstrates that Turkiye's structural challenge is not merely institutional weakness but a systemic absence of R&D demand from its dominant enterprise structures, creating a vicious cycle that conventional reforms cannot break. Seven specific opportunity windows arising from US-China technological rivalry are identified, and a phased implementation roadmap is proposed.
Transfer Learning of Multiobjective Indirect Low-Thrust Trajectories Using Diffusion Models and Markov Chain Monte Carlo
Preliminary low-thrust spacecraft mission design is a global search problem characterized by a complex solution landscape, multiple objectives, and numerous local minima. During this phase, mission parameters are often not yet fully defined, requiring new solutions to be generated at a high cadence across varying parameter values. When combined with the indirect approach to optimal control, diffusion models can accelerate this search by learning distributions that represent high-quality initial costates. However, generating training data remains expensive, and opportunities exist to better exploit past data. We propose a transfer-learning framework that combines homotopy in a mission parameter with Markov chain Monte Carlo (MCMC) to generate training data more efficiently. The approach reformulates a multiobjective optimization problem as sampling from an unnormalized target distribution in costate space. We compare three MCMC algorithms on a planar multi-revolution transfer in the circular restricted three-body problem, with homotopy in the system mass parameter. The results show that gradient-based MCMC variants achieve the best trade-off between sample quality and computational cost. For the test transfer, the proposed framework generates 40% more feasible solutions and achieves a higher-quality Pareto front than a state-of-the-art indirect approach based on adjoint control transformations and gradient-based optimization. Finally, the MCMC-generated samples are used to fine-tune a diffusion model conditioned on the mass parameter, enabling it to learn a global representation of the underlying solution distribution and efficiently generate new solutions. These findings establish the transfer-learning framework as a practical method for efficiently solving indirect trajectory optimization problems with varying parameters.
HyDRA Scorpion: A Cost-effective and Modular ROV for Real-Time Underwater Inspection, Intervention, and Object Detection
A Remotely Operated Vehicle (ROV) is a tethered underwater robot used for tasks like inspection and intervention. While ROVs are essential tools for underwater science, the high cost of commercial ROVs and a persistent gap between mechanically capable platforms and those with integrated intelligence create a significant barrier to access. HyDRA Scorpion differs from conventional systems by addressing these challenges, integrating an advanced, AI-driven perception stack with in-situ measurement capabilities onto a low-cost, locally manufacturable platform. The system combines 4-DoF maneuverability, dual manipulators, and a custom pressure-tested housing. Experimental results validate the system's robustness and performance. Leak-free operation was confirmed through prolonged pressure testing of the electronics housing to 4 bar in a simulated environment, equivalent to the pressure at a water depth of approximately 304.8 meters, with no moisture ingress detected. The vehicle also demonstrated stable station-keeping, maintaining its position within a tight tolerance of $\pm 0.15$ meters under external disturbances. The onboard AI module achieved an underwater object detection mean Average Precision (mAP) of 0.89 with real-time inference, along with length measurement and 3D-mapping-based distance measurement. In addition, the 4-DoF manipulator arm can grip objects and provides a dual-function manipulator feature that supports 360-degree tangle-free rotation.
comment: 9 Pages, 11 figures, Research Paper by UIU Mariner Team
Solar Cars: A Comprehensive Review
The energy crisis has forced many countries to look for alternative sources of energy supply. Renewable energy sources, as environmentally friendly sources, play a pivotal role in producing clean energy for various sectors of industry. Gas emissions originating from the transportation industry are another contributing factor to air pollution. Hence, designing and utilizing vehicles that run on renewable energy is crucial, as it provides a dependable energy source that is naturally abundant, leaves nearly no carbon footprint, and is sustainable. Solar-powered electric cars make a significant impact on global climate change. To better understand this impact and to build upon the extensive research done on this topic, this paper aims to provide a comprehensive review of the various factors related to solar cars. Specifically, this review examines the following key factors: types and sizing of solar cars, solar vehicle power source configurations, leading solar car nations, and solar car challenges.
Locational Pricing for Generative-AI Services via Token-Flow Market Clearing
GenAI services are in an early yet fast expanding phase. Providers compete on model capability and service quality, while the underlying infrastructure remains expensive and heterogeneous across regions, workloads, and compute assets. If these services diffuse into routine daily use, the relevant engineering problem becomes not only better models but also efficient dispatch on a geographically distributed AI service infrastructure. To address this, we formulate a network-constrained token-flow market that clears AI workloads across compute nodes and communication links. The baseline model is a linear program that co-optimizes routing and processing subject to compute-capacity and bandwidth constraints; its dual variables define location- and workload-specific marginal service prices. We further introduce a transfer-aware extension that prices data movement in physical units and isolates bandwidth congestion rents. In a 5-node U.S. case study, the transfer-aware model uncovers four saturated backbone links and raises total operating cost by 2.7% relative to the token-equivalent baseline, while tightening the chatbot latency limit from 100 ms to 15 ms increases one locational price by 117%. A 20-node scale-up exhibits the same merit-order dispatch logic and becomes infeasible once demand exceeds aggregate capacity. These results suggest that locational pricing is a useful organizing principle for operating an emerging AI service infrastructure and, over time, for designing competitive markets around it.
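The merit-order dispatch logic mentioned in this abstract can be sketched for the simplest case of a single uncongested pool of compute nodes. This is an illustrative simplification, not the paper's linear program: it ignores bandwidth and latency constraints, and the node names, capacities, and costs are hypothetical. Without network constraints, the marginal price equals the cost of the last unit served, provided the marginal node is only partly loaded.

```python
def merit_order_dispatch(nodes, demand):
    """Serve demand from the cheapest nodes first.

    nodes: list of (name, capacity, marginal_cost) tuples.
    Returns (dispatch dict, marginal price).
    """
    total_capacity = sum(cap for _, cap, _ in nodes)
    if demand > total_capacity:
        raise ValueError("infeasible: demand exceeds aggregate capacity")
    dispatch = {}
    remaining = demand
    price = 0.0
    for name, cap, cost in sorted(nodes, key=lambda n: n[2]):
        served = min(cap, remaining)
        dispatch[name] = served
        if served > 0:
            price = cost  # cost of the last (marginal) unit served so far
        remaining -= served
    return dispatch, price


# Hypothetical nodes: (name, capacity in tokens/s, cost per token).
nodes = [("A", 100.0, 2.0), ("B", 50.0, 1.0), ("C", 80.0, 3.5)]
dispatch, price = merit_order_dispatch(nodes, 120.0)
```

In the full network-constrained model, saturated links would split this single price into location-specific prices, which is the effect the dual variables capture.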
Challenges in the Proper Metrological Verification of Smart Energy Meters
The most common instruments currently used to measure active/reactive energy and power quality indicators are smart energy meters (EM). Unfortunately, the verification of such meters is currently performed under ideal conditions or with simple signal models, which do not recreate actual states occurring in the power grid and do not ensure the verification of the properties of their signal chains. This paper presents challenges in the proper metrological verification of smart EM. It presents existing legal and normative requirements and scientific research directions regarding these meters. Although the meters tested comply with the normative and legal requirements, the results reveal numerous imperfections in the signal and measurement chains for the selected test signal. Based on the research results, further directions have been determined in the field of smart EM.
comment: 5 pages, 5 figures, submitted to IEEE conferences
Safe and Real-Time Consistent Planning for Autonomous Vehicles in Partially Observed Environments via Parallel Consensus Optimization
Ensuring safety and driving consistency is a significant challenge for autonomous vehicles operating in partially observed environments. This work introduces a consistent parallel trajectory optimization (CPTO) approach to enable safe and consistent driving in dense obstacle environments with perception uncertainties. Utilizing discrete-time barrier function theory, we develop a consensus safety barrier module that ensures reliable safety coverage within the spatiotemporal trajectory space across potential obstacle configurations. Following this, a bi-convex parallel trajectory optimization problem is derived that facilitates decomposition into a series of low-dimensional quadratic programming problems to accelerate computation. By leveraging the consensus alternating direction method of multipliers (ADMM) for parallel optimization, each generated candidate trajectory corresponds to a possible environment configuration while sharing a common consensus trajectory segment. This ensures driving safety and consistency when executing the consensus trajectory segment for the ego vehicle in real time. We validate our CPTO framework through extensive comparisons with state-of-the-art baselines across multiple driving tasks in partially observable environments. Our results demonstrate improved safety and consistency using both synthetic and real-world traffic datasets.
comment: 16 pages, 7 figures
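The consensus ADMM mechanism named in this abstract can be illustrated on a scalar toy problem. This sketch is not the CPTO formulation: the paper's subproblems are quadratic programs over trajectories, whereas here each "scenario" is assumed to have the hypothetical local cost 0.5*(x_i - a_i)^2, and all scenarios must agree on a single shared value z.

```python
def consensus_admm(targets, rho=1.0, iterations=100):
    """Consensus ADMM for: min sum_i 0.5*(x_i - a_i)^2  s.t.  x_i = z for all i."""
    n = len(targets)
    z = 0.0
    u = [0.0] * n  # scaled dual variables, one per scenario
    for _ in range(iterations):
        # Local updates: independent per scenario, so they can run in parallel.
        x = [(a + rho * (z - ui)) / (1.0 + rho) for a, ui in zip(targets, u)]
        # Consensus update: average of local estimates plus scaled duals.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual update: accumulate each scenario's disagreement with consensus.
        u = [ui + xi - z for ui, xi in zip(u, x)]
    return z


consensus_value = consensus_admm([1.0, 2.0, 6.0])
```

For this toy cost the consensus value converges to the mean of the targets; in a planning setting the shared variable would instead be the common trajectory segment executed by the ego vehicle.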
sumoITScontrol: Traffic Controller Collection for SUMO Traffic Simulations
Reliable benchmarking is essential for progress in intelligent traffic control research. While microscopic traffic simulators such as SUMO enable detailed modelling of individual vehicle interactions, many published control studies still rely on single-run evaluations and project-specific baseline implementations, limiting reproducibility and comparability. This paper presents sumoITScontrol, an open-source and extensible Python framework providing a curated collection of widely used traffic controllers implemented for SUMO via the TraCI interface. The framework includes established methods for both urban and freeway traffic management, such as Max Pressure signal control, SCOOT/SCATS-inspired adaptive strategies, and ramp metering algorithms including ALINEA, HERO-inspired, and METALINE. Beyond providing implementations, the paper emphasises methodological best practices for controller evaluation in stochastic microscopic environments. Through systematic calibration and replicated simulation experiments, we demonstrate the substantial impact of stochastic variability on performance metrics and highlight the necessity of variance-aware reporting and statistical hypothesis testing. By combining standardised controller implementations with reproducibility-oriented evaluation guidelines, sumoITScontrol aims to improve methodological transparency, enable fair benchmarking of novel approaches, and strengthen experimental standards within the SUMO and intelligent transportation systems research communities. Source code is available on the project's GitHub: https://github.com/DerKevinRiehl/sumoITScontrol/.
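As an illustration of one controller named in this abstract, the ALINEA ramp metering law is an integral feedback rule on the occupancy measured downstream of the ramp. The sketch below is standalone Python and does not use the sumoITScontrol or TraCI APIs; the function name, gain, and rate limits are illustrative assumptions.

```python
def alinea_step(prev_rate, measured_occupancy, target_occupancy,
                gain=70.0, min_rate=200.0, max_rate=1800.0):
    """One ALINEA update of the ramp metering rate.

    Rates are in veh/h and occupancies in percent. The rate rises when the
    measured occupancy is below the target and falls when it is above.
    """
    rate = prev_rate + gain * (target_occupancy - measured_occupancy)
    # Clamp to the physically admissible metering range.
    return max(min_rate, min(max_rate, rate))


congested = alinea_step(900.0, measured_occupancy=25.0, target_occupancy=20.0)
free_flow = alinea_step(900.0, measured_occupancy=10.0, target_occupancy=20.0)
saturated = alinea_step(1700.0, measured_occupancy=10.0, target_occupancy=20.0)
```

In a simulation loop, the occupancy would be read from a downstream detector at each control interval and the returned rate converted into a ramp signal timing.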
Constraint-Aware Reinforcement Learning via Adaptive Action Scaling
Safe reinforcement learning (RL) seeks to mitigate unsafe behaviors that arise from exploration during training by reducing constraint violations while maintaining task performance. Existing approaches typically rely on a single policy to jointly optimize reward and safety, which can cause instability due to conflicting objectives, or they use external safety filters that override actions and require prior system knowledge. In this paper, we propose a modular cost-aware regulator that scales the agent's actions based on predicted constraint violations, preserving exploration through smooth action modulation rather than overriding the policy. The regulator is trained to minimize constraint violations while avoiding degenerate suppression of actions. Our approach integrates seamlessly with off-policy RL methods such as SAC and TD3, and achieves state-of-the-art return-to-cost ratios on Safety Gym locomotion tasks with sparse costs, reducing constraint violations by up to 126 times while increasing returns by over an order of magnitude compared to prior methods.
comment: Accepted in 8th Annual Learning for Dynamics & Control Conference (L4DC)
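The idea of modulating actions smoothly instead of overriding them can be sketched as follows. In the paper the regulator is a trained module; the fixed hand-written rule below is only an assumption made for illustration, as are the function name and the `sensitivity` and `min_scale` parameters.

```python
def scale_action(action, predicted_cost, sensitivity=1.0, min_scale=0.1):
    """Shrink an action smoothly as the predicted constraint cost grows."""
    if predicted_cost < 0:
        raise ValueError("predicted_cost must be non-negative")
    scale = 1.0 / (1.0 + sensitivity * predicted_cost)
    # The floor keeps the agent from being suppressed entirely, so it
    # can still explore even in states predicted to be costly.
    scale = max(min_scale, scale)
    return [scale * a for a in action]


safe = scale_action([1.0, -2.0], predicted_cost=0.0)
risky = scale_action([1.0, -2.0], predicted_cost=1.0)
floored = scale_action([1.0], predicted_cost=100.0)
```

Because the direction of the action is preserved and only its magnitude changes, the underlying policy keeps control of what the agent does, which is the contrast with a safety filter that replaces the action.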
Featurized Occupation Measures for Structured Global Search in Numerical Optimal Control
Numerical optimal control has long been split between globally structured but dimensionally intractable Hamilton--Jacobi--Bellman (HJB) methods and scalable but local trajectory optimization. We introduce Featurized Occupation Measures (FOM), a finite-dimensional primal--dual interface for coupling numerical optimal control solvers with explicit HJB subsolutions: the certificate guides the primal search, while primal residuals tighten the certificate in a primal-dual language. Two realizations are developed. The explicit realization uses finite weak-form Liouville tests, and the implicit realization couples rollout-based search with sampled primal--dual residuals. Both are proved asymptotically consistent with the exact occupation-measure linear program under refinement, separating primal expressiveness from dual accuracy in the limit. The framework also gives structural conditions under which HJB-type certificates avoid full state-space representation. For factor graphs induced by compatible passivity-based interconnections, blockwise HJB inequalities assemble into globally feasible OM-dual certificates, and the decomposition is preserved under blockwise approximation. The curse of dimensionality is then shifted from state space to interconnection topology. Approximate certificates remain reusable under time shifts and bounded model perturbations, with explicit degradation bounds. On a static obstacle-avoidance benchmark, certificates of increasing tightness guide a sample-based optimizer toward global optima, confirming that even a coarse certificate carries useful global information.
Robotics
123D: Unifying Multi-Modal Autonomous Driving Data at Scale
The pursuit of autonomous driving has produced one of the richest sensor data collections in all of robotics. However, its scale and diversity remain largely untapped. Each dataset adopts different 2D and 3D modalities, such as cameras, lidar, ego states, annotations, traffic lights, and HD maps, with different rates and synchronization schemes. They come in fragmented formats requiring complex dependencies that cannot natively coexist in the same development environment. Further, major inconsistencies in annotation conventions prevent training or measuring generalization across multiple datasets. We present 123D, an open-source framework that unifies such multi-modal driving data through a single API. To handle synchronization, we store each modality as an independent timestamped event stream with no prescribed rate, enabling synchronous or asynchronous access across arbitrary datasets. Using 123D, we consolidate eight real-world driving datasets spanning 3,300 hours and 90,000 kilometers, together with a synthetic dataset with configurable collection scripts, and provide tools for data analysis and visualization. We conduct a systematic study comparing annotation statistics and assessing each dataset's pose and calibration accuracy. Further, we showcase two applications 123D enables: cross-dataset 3D object detection transfer and reinforcement learning for planning, and offer recommendations for future directions. Code and documentation are available at https://github.com/kesai-labs/py123d.
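The storage idea described in this abstract, one timestamped event stream per modality with no prescribed rate, can be sketched with a nearest-timestamp lookup. This is not the py123d API; the `EventStream` class and its methods are hypothetical names used only to show how synchronous access can be derived from asynchronous streams.

```python
from bisect import bisect_left


class EventStream:
    """Timestamped events of one modality, stored at whatever rate they arrive."""

    def __init__(self, timestamps, values):
        if list(timestamps) != sorted(timestamps):
            raise ValueError("timestamps must be sorted")
        self.timestamps = list(timestamps)
        self.values = list(values)

    def nearest(self, t):
        """Return the (timestamp, value) pair closest in time to t."""
        i = bisect_left(self.timestamps, t)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(self.timestamps)]
        j = min(candidates, key=lambda k: abs(self.timestamps[k] - t))
        return self.timestamps[j], self.values[j]


camera = EventStream([0.00, 0.10, 0.20], ["img0", "img1", "img2"])
lidar = EventStream([0.03, 0.13], ["scan0", "scan1"])

# Synchronous view: for each lidar sweep, fetch the closest camera frame.
pairs = [(scan, camera.nearest(t)[1])
         for t, scan in zip(lidar.timestamps, lidar.values)]
```

Asynchronous access is simply iteration over each stream on its own; the pairing step is only needed when a consumer wants aligned samples.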
6D Pose Estimation via Keypoint Heatmap Regression with RGB-D Residual Neural Networks
In this paper, we propose a modular framework for 6D pose estimation based on keypoint heatmap regression. Our approach combines YOLOv10m for object detection with a ResNet18-based network that predicts 2D heatmaps from RGB images. Keypoints extracted from these heatmaps are used to estimate the 6D object pose via the PnP RANSAC algorithm. We compare different keypoint selection strategies to assess their impact on pose accuracy. Additionally, we extend the baseline by incorporating depth data using a cross-fusion architecture, which enables interaction between RGB and depth features at multiple stages. We further explore general training improvements, such as experimenting with activation functions and learning rate scheduling strategies to improve model performance. Our best RGB-only model achieved a mean ADD-based accuracy of 84.50%, while the RGB-D fusion model reached 92.41% on the LINEMOD dataset. The code is available at https://github.com/ameermasood/HeatNet.
comment: Source code available at: https://github.com/ameermasood/HeatNet
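The step from a predicted heatmap to a 2D keypoint can be sketched as a peak search. The paper compares several keypoint selection strategies; the plain argmax below is only the simplest assumed variant, and the function name is hypothetical.

```python
def heatmap_to_keypoint(heatmap):
    """Return (x, y, score) of the peak of a 2D heatmap given as rows of floats."""
    best = (0, 0, float("-inf"))
    for y, row in enumerate(heatmap):
        for x, score in enumerate(row):
            if score > best[2]:
                best = (x, y, score)
    return best


heatmap = [
    [0.0, 0.1, 0.0],
    [0.2, 0.9, 0.3],
    [0.0, 0.1, 0.0],
]
keypoint = heatmap_to_keypoint(heatmap)
```

The resulting 2D keypoints, paired with their known 3D model points, would then be passed to a PnP RANSAC solver (for example OpenCV's `cv2.solvePnPRansac`) to recover the 6D pose.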
Active Embodiment Identification with Reinforcement Learning for Legged Robots
We present an active embodiment identification method for legged robots that jointly learns information-seeking behavior and explicit embodiment prediction. Using a history-augmented URMA architecture, the method infers joint-level and global embodiment parameters through interaction with the environment in simulation across different morphologies.
Evaluation of an Actuated Spine in Agile Quadruped Locomotion
The spine plays a crucial role in the dynamic locomotion of quadrupedal animals, improving the stability, speed, and efficiency of their gait, especially for fast-paced and highly agile movements. Therefore, the spine is also a promising and natural way to extend the capabilities of quadruped robots. This paper empirically investigates the benefits of an actuated spine for learning agile quadruped locomotion. We evaluate whether the use of the spine brings benefits in terms of high-speed running, climbing stairs, climbing high-angle slopes, hurdling, and crawling scenarios. We conducted an empirical study in MuJoCo simulation using the Silver Badger robot from MAB Robotics with an actuated 1-DOF spine in the sagittal plane. The obtained results show that the use of the spine provides the robot with increased agility and allows it to overcome higher stairs, steeper slopes, higher obstacles, and smaller passages.
TAVIS: A Benchmark for Egocentric Active Vision and Anticipatory Gaze in Imitation Learning
Active vision -- where a policy controls its own gaze during manipulation -- has emerged as a key capability for imitation learning, with multiple independent systems demonstrating its benefits in the past year. Yet there is no shared benchmark to compare approaches or quantify what active vision contributes, on which task types, and under what conditions. We introduce TAVIS, evaluation infrastructure for active-vision imitation learning, with two complementary task suites -- TAVIS-Head (5 tasks, global search via pan/tilt necks) and TAVIS-Hands (3 tasks, local occlusion via wrist cameras) -- on two humanoid torso embodiments (GR1T2, Reachy2), built on IsaacLab. TAVIS provides three evaluation primitives: a paired headcam-vs-fixedcam protocol on identical demonstrations; GALT (Gaze-Action Lead Time), a novel metric grounded in cognitive science and HRI that quantifies anticipatory gaze in learned policies; and procedural ID/OOD splits. Baseline experiments with Diffusion Policy and $π_0$ reveal that (i) active-vision generally helps, but benefits are task-conditional rather than uniform; (ii) multi-task policies degrade sharply under controlled distribution shifts on both suites; and (iii) imitation alone yields anticipatory gaze, with median lead times comparable to the human teleoperator reference. Code, evaluation scripts, demonstrations (LeRobot v3.0; ~2200 episodes) and trained baselines are released at https://github.com/spiglerg/tavis and https://huggingface.co/tavis-benchmark.
AERO-VIS: Asynchronous Event-based Real-time Onboard Visual-Inertial SLAM
The robustness of event cameras to high dynamic range and motion blur holds the potential to improve visual odometry systems in challenging environments. Although their high temporal resolution does not require synchronous processing, most event-based odometry methods still run at fixed rates, which simplifies system design but restricts latency and throughput. In this work, we present AERO-VIS, a stereo event-inertial SLAM system with an integrated, data-driven, robust, and performance-optimized keypoint detector. By processing the event stream asynchronously, the system dynamically adapts to downstream runtime demands, ensuring low-latency and real-time performance. When deploying AERO-VIS on a UAV, we achieve unprecedented accuracy in onboard event-based SLAM. These unique characteristics enable us to present the first purely event-based inertial SLAM system that demonstrates closed-loop UAV control and large-scale state estimation while relying solely on onboard compute. A video of the experiments and the source code are available at ethz-mrl.github.io/AERO-VIS.
comment: 8 pages, 4 figures
Melding LLM and temporal logic for reliable human-swarm collaboration in complex scenarios
Robot swarms promise scalable assistance in complex and hazardous environments. Task planning lies at the core of human-swarm collaboration, translating the operator's intent into coordinated swarm actions and helping determine when validation or intervention is required during execution. In long-horizon missions under dynamic scenarios, however, reliable task planning becomes difficult to maintain: emerging events and changing conditions demand continual adaptation, and sustained operator oversight imposes substantial cognitive burden. Existing LLM-based planning tools can support plan generation, yet they remain susceptible to invalid task orderings and infeasible robot actions, resulting in frequent manual adjustment. Here we introduce a neuro-symbolic framework for long-horizon human-swarm collaboration that tightly melds verifiable task planning with context-grounded LLM reasoning. We formalize mission goals and operational rules as temporal logic formulas and admissible task orderings as task automata. Conditioned on these formal constraints and live perceptual context, LLMs generate executable subtask sequences that satisfy mission rules and remain grounded in the current scene. An uncertainty-aware scheduler then assigns subtasks across the heterogeneous swarm to maximize parallelism while remaining resilient to disruptions. An event-triggered interaction protocol further limits operator involvement to sparse, high-level confirmation and guidance. Deployment on a heterogeneous robotic fleet yields similar results while remaining robust to hardware-specific actuation and communication uncertainties. Together, these results support a formal and scalable paradigm for reliable and low-overhead human-swarm collaboration in dynamic environments.
Many-to-Many Multi-Agent Pickup and Delivery
Multi-robot systems in automated warehouses must manage continuous streams of pickup-and-delivery tasks while ensuring efficiency and safety. Prior work on Multi-Agent Pickup-and-Delivery (MAPD) has largely focused on the one-to-one variant, where each task has a fixed pickup and delivery location. In contrast, real warehouses often present many-to-many MAPD scenarios, where items, tracked by stock keeping unit (SKU) identifiers, can be retrieved from or stored at multiple locations, resulting in an NP-hard four-dimensional assignment problem. To solve the many-to-many MAPD problem, we contribute our algorithm: Many-to-Many Multi-Agent Pickup and Delivery (M2M). We experiment with two variants of our algorithm: one that minimizes estimated task durations (M2M), and one which incorporates SKU distribution into the objective function (M2M-wSKU). Simulation results over 8-hour warehouse operations show that our method consistently matches or outperforms prior state of the art, with M2M completing up to 22,000 more tasks on average across different environments and warehouse inventory densities.
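The many-to-many structure described in this abstract means that a single request involves choosing an agent, a pickup location, and a delivery location together. The sketch below shows that joint choice for one task by brute-force enumeration. It is not the M2M algorithm: it ignores collisions, path planning, and the SKU-distribution objective, and it assumes Manhattan distance as a stand-in for the estimated task duration. All names are hypothetical.

```python
from itertools import product


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def best_assignment(agents, pickups, deliveries):
    """Pick the (agent, pickup, delivery) triple with the shortest estimated route.

    agents: dict name -> (x, y) position.
    pickups / deliveries: lists of (x, y) cells that hold, or can accept,
    the requested SKU.
    """
    def duration(choice):
        (_, pos), pickup, delivery = choice
        return manhattan(pos, pickup) + manhattan(pickup, delivery)

    (name, _), pickup, delivery = min(
        product(agents.items(), pickups, deliveries), key=duration)
    return name, pickup, delivery


agents = {"r1": (0, 0), "r2": (5, 5)}
pickups = [(1, 0), (6, 5)]
deliveries = [(1, 2), (9, 5)]
choice = best_assignment(agents, pickups, deliveries)
```

Enumeration is only practical for a single task with a handful of candidate locations; assigning many tasks to many agents at once is the NP-hard problem the abstract refers to.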
Text-to-CAD Evaluation with CADTests
Text-to-CAD has recently emerged as an important task with the potential to substantially accelerate design workflows. Despite its significance, there has been surprisingly little work on Text-to-CAD evaluation, and assessing CAD model generation performance remains a considerable challenge. In this work, we introduce a new evaluation perspective for Text-to-CAD based on automated testing. We propose CADTestBench, the first test-based benchmark for Text-to-CAD, based on CADTests, executable software tests that verify whether a generated CAD model satisfies the geometric and topological requirements of the input prompt. Using CADTestBench, we conduct comprehensive benchmarking of recent Text-to-CAD methods and further demonstrate that CADTests can also guide CAD model generation, yielding simple baselines that surpass the performance of current methods. CADTestBench code and data are available on GitHub and as a Hugging Face dataset.
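The notion of an executable test for a text prompt can be sketched as follows. This is not the CADTestBench code: the `Box` class is a hypothetical stand-in for a generated CAD solid, the prompt is invented, and only geometric checks (bounding box and volume) are shown, with no topological checks.

```python
import math


class Box:
    """Stand-in for a generated CAD solid (hypothetical, axis-aligned box)."""

    def __init__(self, dx, dy, dz):
        self.size = (dx, dy, dz)

    def bounding_box(self):
        return self.size

    def volume(self):
        dx, dy, dz = self.size
        return dx * dy * dz


def check_plate(model, tol=1e-6):
    """Test for the prompt 'a 40 x 20 x 10 mm rectangular plate'."""
    # Sort the dimensions so the check does not depend on model orientation.
    dims = sorted(model.bounding_box(), reverse=True)
    expected = [40.0, 20.0, 10.0]
    dims_ok = all(math.isclose(d, e, abs_tol=tol) for d, e in zip(dims, expected))
    volume_ok = math.isclose(model.volume(), 8000.0, abs_tol=tol)
    return dims_ok and volume_ok


passes = check_plate(Box(20.0, 40.0, 10.0))
fails = check_plate(Box(40.0, 20.0, 12.0))
```

A test of this kind gives a pass or fail signal per requirement, which is what makes it usable both for scoring a generator and for guiding regeneration when a check fails.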
NoiseGate: Learning Per-Latent Timestep Schedules as Information Gating in World Action Models
World Action Models (WAMs) are an emerging family of policies that tie robot action generation to future-observation modeling. In this work, we focus on the joint video--action modeling paradigm, where actions and imagined future observations are co-generated along a shared denoising or flow trajectory, so that perception, prediction, and control are coupled within one generative process. Existing WAMs typically realize this paradigm with a Mixture-of-Transformers (MoT), where video and action tokens interact through shared self-attention. This architecture can in principle assign a separate timestep $t_f$ to each predicted latent frame, yet current systems collapse this degree of freedom onto a single shared scalar $t$. Under the noise-as-masking view of Diffusion Forcing, this shared schedule imposes the unjustified prior that every predicted latent is equally reliable for action generation. We instead view the per-latent schedule as a learnable information-gating policy: by changing a latent frame's noise level, the policy modulates the reliability of its Key/Value contribution to the action tokens. We propose NoiseGate, which combines independent per-latent timestep sampling during backbone training, a lightweight Gating Policy Network that emits per-latent time increments during denoising, and task-reward optimization that trains the schedule policy without hand-crafted shape priors. Built on a joint video--action MoT backbone, NoiseGate delivers consistent gains on diverse RoboTwin random-scene manipulation tasks.
Sensitivity-Based Robust NMPC for Close-Proximity Offshore Wind Turbine Inspection with a Tilted Multirotor ICRA 2026
Close-proximity offshore wind turbine inspection requires strict clearance control around large cylindrical structures under wind and model mismatch. Nominal Nonlinear Model Predictive Control (NMPC) may violate safety constraints when mass, inertia, thrust effectiveness, drag, or wind conditions differ from nominal assumptions. We propose a sensitivity-based robust NMPC for a tilted multirotor that robustifies the tower-clearance constraint via online constraint tightening. First-order parametric state sensitivities provide a structured-uncertainty margin, while bounded gusts are handled by a stage-dependent additive margin. The formulation augments the nominal NMPC with sensitivity propagation and margin evaluation only, leaving the receding-horizon optimization structure unchanged. Monte-Carlo evaluation over 500 uncertainty realizations on a boundary-critical helical inspection trajectory shows that the proposed controller eliminates the clearance violations observed under nominal NMPC at the cost of a moderate increase in solve time.
comment: 5 pages. Accepted for presentation at the ICRA 2026 Workshop on "Aerial inspection for marine infrastructures," June 1, 2026, Vienna, Austria
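The constraint tightening described in this abstract can be sketched for a single horizon stage. The sketch assumes a first-order worst-case bound in which each parameter contributes the absolute clearance sensitivity times its deviation bound; the paper may use a different norm or margin construction. The function name, arguments, and numbers are hypothetical.

```python
def tightened_clearance(d_min, sensitivities, param_bounds, gust_margin):
    """Clearance that the nominal prediction must satisfy at one horizon stage.

    d_min: required physical clearance to the tower.
    sensitivities: d(clearance)/d(p_i) evaluated along the nominal trajectory.
    param_bounds: worst-case deviation |delta p_i| of each uncertain parameter.
    gust_margin: additive margin for bounded gusts at this stage.
    """
    if len(sensitivities) != len(param_bounds):
        raise ValueError("one bound is required per sensitivity")
    # First-order structured-uncertainty margin.
    structured = sum(abs(s) * b for s, b in zip(sensitivities, param_bounds))
    return d_min + structured + gust_margin


required = tightened_clearance(2.0, [0.5, -0.2], [0.1, 0.5], gust_margin=0.05)
```

Because only the right-hand side of the clearance constraint changes, the optimization problem keeps the same structure as the nominal NMPC, which matches the abstract's remark that the receding-horizon structure is left unchanged.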
CommandSwarm: Safety-Aware Natural Language-to-Behavior-Tree Generation for Robotic Swarms
Natural-language interfaces can make swarm robotics more accessible to non-expert operators, but they must translate ambiguous user intent into executable swarm behaviors without unsupported actions, malformed programs, or unsafe plans. This paper presents CommandSwarm, a safety-aware language-to-behavior-tree pipeline for generating XML behavior trees (BTs) from speech or text commands. The system combines multilingual translation, command-level safety filtering, constrained prompting, a LoRA-adapted large language model (LLM), and deterministic parser validation against a whitelist of executable swarm primitives. We evaluate eleven open 6.7B--14B parameter LLMs, all using 4-bit quantization, on representative swarm-control scenarios under zero-shot, one-shot, and two-shot prompting. Falcon3-Instruct-10B and Mistral-7B-v3 are the strongest prompt-engineered candidates, reaching BLEU scores above 0.60 and high syntactic validity in few-shot settings. LoRA adaptation of Falcon3-Instruct-10B on a 2,063-example synthetic instruction--BT corpus improves zero-shot BLEU from 0.267 to 0.663, ROUGE-L from 0.366 to 0.692, and parser-accepted syntactic validity from 0% to 72%. Translation experiments further show that SeamlessM4T v2-large and EuroLLM-9B provide the best quality-latency trade-offs for the multilingual front end. The results indicate that compact, quantized, domain-adapted LLMs can generate useful swarm BTs when embedded in a validated systems pipeline. They also show that parser acceptance and safety filtering remain necessary execution gates; generation quality alone is not sufficient for autonomous deployment.
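The deterministic parser gate described in this abstract can be sketched with the Python standard library. The whitelist below is hypothetical, since the abstract does not list the actual swarm primitives, and the function is an illustration of the idea, not the CommandSwarm parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist; the real executable primitives are not given here.
ALLOWED_TAGS = {"BehaviorTree", "Sequence", "Selector", "FormLine", "MoveTo", "Land"}


def validate_bt(xml_text):
    """Return (ok, reason); reject malformed XML and non-whitelisted nodes."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return False, f"malformed XML: {exc}"
    for node in root.iter():
        if node.tag not in ALLOWED_TAGS:
            return False, f"unsupported node: {node.tag}"
    return True, "ok"


good = validate_bt(
    '<BehaviorTree><Sequence><MoveTo x="1" y="2"/><Land/></Sequence></BehaviorTree>')
unsupported = validate_bt("<BehaviorTree><SelfDestruct/></BehaviorTree>")
malformed = validate_bt("<BehaviorTree><Sequence>")
```

A gate like this is independent of the language model, so a tree is rejected before execution regardless of how fluent the generated output looks.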
Offline-Online Hierarchical 3D Global Relocalization With Synthetic LiDAR Sensing and Descriptor-Space Retrieval
3D global relocalization is one of the key capabilities for mobile robots in practical applications. However, in large-scale spaces, existing methods often suffer from prolonged online relocalization time due to factors such as the massive pose search space and high computational overhead. To address these issues, this paper proposes an offline-online hierarchical framework that decouples the search space. In the offline phase, candidate positions and their corresponding geometric descriptor indices are generated in the map by simulating LiDAR scans within the grid map. In the online phase, a coarse pose estimate is first obtained via global retrieval, followed by point cloud registration to output precise 6-DoF pose estimates. Real-world experiments demonstrate that the proposed method achieves an average relocalization time of 3 s and an average localization accuracy of 8 cm in 3D environments. Compared with existing global relocalization methods, the proposed method achieves an order-of-magnitude improvement in computational efficiency while delivering comparable relocalization accuracy.
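The online retrieval step described in this abstract can be sketched as a nearest-neighbour search in descriptor space. The descriptor type, the Euclidean metric, and the brute-force search are assumptions made for illustration; the paper's descriptor and index structure are not specified here.

```python
import math


def retrieve_candidates(query, database, k=1):
    """Return the k candidate poses whose stored descriptors are nearest to query.

    database: list of (pose, descriptor) pairs built offline from
    simulated LiDAR scans at candidate positions.
    """
    ranked = sorted(database, key=lambda item: math.dist(item[1], query))
    return [pose for pose, _ in ranked[:k]]


# Hypothetical offline database: pose is (x, y, yaw), descriptor is a short vector.
database = [
    ((0, 0, 0.0), [1.0, 0.0, 0.0]),
    ((5, 2, 1.57), [0.0, 1.0, 0.0]),
    ((9, 9, 3.14), [0.0, 0.0, 1.0]),
]
coarse = retrieve_candidates([0.1, 0.9, 0.0], database)
top_two = retrieve_candidates([0.1, 0.9, 0.0], database, k=2)
```

The retrieved coarse poses would then seed point cloud registration, which refines them into the final 6-DoF estimate.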
Drifting Field Policy: A One-Step Generative Policy via Wasserstein Gradient Flow
We propose Drifting Field Policy (DFP), a non-ODE one-step generative policy built on the drifting model paradigm. We frame the policy update as a reverse-KL Wasserstein-2 gradient flow toward a soft target policy, so that each DFP update corresponds to a gradient step in probability space. By construction, this gradient is decomposed into an ascent toward higher action-value regions and a score matching with the anchor policy as a trust region. We further derive a simple, tractable surrogate of the otherwise intractable update loss, akin to behavior cloning on top-K critic-selected actions. We find empirically that this mechanism uniquely benefits the drifting backbone owing to its non-ODE parameterization. With one-step inference, DFP achieves state-of-the-art performance on several manipulation tasks across Robomimic and OGBench, outperforming ODE-based policies.
Finite-Time Analysis of MCTS in Continuous POMDP Planning
This paper presents a finite-time analysis for Monte Carlo Tree Search (MCTS) in Partially Observable Markov Decision Processes (POMDPs), with probabilistic concentration bounds in both discrete and continuous observation spaces. While MCTS-style solvers such as POMCP achieve empirical success in many applications, rigorous finite-time guarantees remain an open problem due to the nonstationarity and the interdependencies induced by heuristic action selection (e.g., UCB). In the discrete setting, we address these challenges by extending the polynomial exploration bonus to UCB in the POMDP setting, yielding polynomial concentration bounds for the empirical value estimation at the root node. For continuous observation spaces, we introduce an abstract partitioning framework and propose a finite-time bound on partitioning loss. Under mild conditions, we prove a high-probability bound on value estimates in POMDPs with continuous observation spaces. Specifically, we propose Voro-POMCPOW, a variant of POMCPOW with finite-time guarantees that adaptively partitions the continuous observation space using Voronoi cells. This approach maintains a finite branching factor while preserving the original observation generator. Empirical validation demonstrates that the proposed Voro-POMCPOW shows competitive performance while providing theoretical guarantees. Although our analysis focuses on continuous POMDPs, the techniques developed herein are also applicable to continuous MDPs, closing another gap on the MDP side.
comment: 9 pages, 1 figure
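As a rough illustration of the polynomial exploration bonus mentioned in the abstract above, the sketch below replaces the logarithmic term of classic UCB with a bonus of the form `c * N**a / n**b`, where `N` is the parent visit count and `n` the action visit count. The function name `select_action` and the constants `c`, `a`, and `b` are hypothetical placeholders; the abstract does not state the exponents the paper analyzes.

```python
import math


def select_action(q_values, visit_counts, c=1.0, a=0.25, b=0.5):
    """Pick the action maximizing q + c * N**a / n**b.

    q_values and visit_counts are dicts keyed by action. The constants
    c, a, b are placeholder values, not the ones analyzed in the paper.
    """
    total = sum(visit_counts.values())  # parent visit count N
    best_action, best_score = None, -math.inf
    for action, q in q_values.items():
        n = visit_counts[action]
        if n == 0:
            # Untried actions are explored before any scored comparison.
            return action
        score = q + c * (total ** a) / (n ** b)
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

With equal visit counts the rule reduces to picking the highest value estimate; a rarely visited action receives a bonus that grows polynomially, rather than logarithmically, in the parent count.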
PhySPRING: Structure-Preserving Reduction of Physics-Informed Twins via GNN
Physics-based digital twins aim to predict the dynamics of real-world objects under interaction, enabling real-to-sim-to-real applications in robotics. Current approaches reconstruct such twins as explicit physical models (such as spring-mass systems) to predict the dynamics, but the resulting models often inherit the resolution of the visual reconstruction rather than being reduced to the physical complexity required to reproduce task-relevant dynamics. This mismatch introduces redundant topology, making repeated forward-dynamics rollouts unnecessarily expensive. To address this challenge, we present PhySPRING, a fully differentiable GNN-based method to reduce complexity in spring--mass digital twins. PhySPRING jointly learns a hierarchy of coarsened graph topologies and their mechanical parameters from observations. At each reduction level, PhySPRING merges nodes with similar learned dynamic responses to optimize the topology, while maintaining every reduced layer as an explicit spring--mass system. On the PhysTwin benchmark, PhySPRING improves dense reconstruction and prediction accuracy over PhysTwin, while reduced models retain stable physical and visual fidelity with up to a 2.30 times speed-up. We further demonstrate the effectiveness of PhySPRING in a Real2Sim robot policy-evaluation pipeline, where the reduced models are substituted zero-shot into ACT and $π_0$ evaluations, maintaining comparable manipulation success rates across downsampling levels while improving action-sampling effectiveness. Together, PhySPRING enables efficient and structure-preserving spring--mass reduction without sacrificing fidelity or robotic utility.
comment: 16 pages and 6 pages, conference paper
Operating Within the Operational Design Domain: Zero-Shot Perception with Vision-Language Models
Over the last few years, research on autonomous systems has matured to such a degree that the field is increasingly well-positioned to translate research into practical, stakeholder-driven use cases across well-defined domains. However, for a wide-scale practical adoption of autonomous systems, adherence to safety regulations is crucial. Many regulations are influenced by the Operational Design Domain (ODD), which defines the specific conditions in which an autonomous agent can function. This is especially relevant for Automated Driving Systems (ADS), as a dependable perception of ODD elements is essential for safe implementation and auditing. Vision-language models (VLMs) integrate visual recognition and language reasoning, functioning without task-specific training data, which makes them suitable for adaptable ODD perception. To assess whether VLMs can function as zero-shot "ODD sensors" that adapt to evolving definitions, we contribute (i) an empirical study of zero-shot ODD classification and detection using four VLMs on a custom dataset and Mapillary Vistas, along with failure analyses; (ii) an ablation of zero-shot optimization strategies with a cost-performance overview; and (iii) a suite of reusable prompting templates with guidance for adaptation. Our findings indicate that definition-anchored chain-of-thought prompting with persona decomposition performs best, while other methods may result in reduced recall. Overall, our results pave the way for transparent and effective ODD-based perception in safety-critical applications.
comment: 8 pages, 4 figures
BrickCraft: Visuomotor Skill Composition with Situated Manual Guidance for Long-Horizon Interlocking Brick Assembly
Autonomous robotic assembly of interlocking bricks demands seamless integration of long-horizon task reasoning, spatial grounding, and fine-grained manipulation. This paper presents BrickCraft, a compositional framework designed for long-horizon and generalizable interlocking brick assembly. BrickCraft models the assembly process using a relative formulation, where each step is anchored to a reference brick within the partial structure, thereby decomposing complex tasks into a finite set of reusable primitive skills. BrickCraft bridges the gap between high-level assembly plans and physical execution through situated manuals, which provide explicit spatial guidance for learned visuomotor skills by projecting the assembly intent onto real-time robot observations. Finally, BrickCraft employs a compositional execution pipeline that chains these spatially grounded skills to accomplish long-horizon assembly tasks. Extensive experimental validations demonstrate that BrickCraft acquires proficient assembly skills from a limited set of demonstrations and exhibits strong compositional generalization to unseen structures. The project website is available at https://intelligent-control-lab.github.io/BrickCraft.
MemCompiler: Compile, Don't Inject -- State-Conditioned Memory for Embodied Agents
Existing memory systems for embodied agents typically inject retrieved memory as static context at episode start, a paradigm we term Ahead-of-time Monolithic Memory Injection (AMMI). However, this static design quickly becomes misaligned with the agent's evolving state and may degrade lightweight executors below the no-memory baseline. To address this, we propose MemCompiler, which reframes memory utilization as State-Conditioned Memory Compilation. A learned Memory Compiler reads a structured Brief State capturing the agent's current execution state and dynamically selects and compiles only relevant memory into executable guidance. This guidance is delivered through a text channel and a latent Soft-Mem channel that preserves perceptual information not expressible in text. Across ALFWorld, EmbodiedBench, and ScienceWorld, MemCompiler consistently improves over no-memory across open-source backbones (up to +129%), matches or approaches frontier closed-source systems, and reduces per-step latency by 60%, demonstrating that state-aware memory compilation improves both effectiveness and efficiency.
How to utilize failure demo data?: Effective data selection for imitation learning using distribution differences in attention mechanism
Imitation learning for robotic tasks has relied primarily on policies trained only on successful demonstrations, although failures are unavoidable during human data collection. Many existing approaches for exploiting failure data require additional data processing or iterative policy updates through autonomous rollouts, making it difficult to directly and stably utilize failure data accumulated during data collection. In this work, we propose a method that learns latent representations of success-failure discrepancies and incorporates them into the attention mechanism. During inference, an appropriate latent mode is selected from the initial observation to improve action stability. Furthermore, we introduce a post-training metric that quantifies the attention discrepancy between each failure sample and successful demonstrations to select failure data. Simulation results show that the proposed method improves task success rates when trained with failure data and that the proposed metric identifies failure samples that are beneficial for learning when combined with successful demonstrations. These results suggest that the proposed method can support more efficient use of collected demonstrations in robotic data collection pipelines.
comment: 15 pages, 6 figures, 2 tables
Search-based Robustness Testing of Laptop Refurbishing Robotic Software
The Danish Technological Institute (DTI) focuses on transferring advanced technologies (including robots) to the industry and the public sector. One key application is laptop refurbishment using specialized robots, aimed at promoting reuse, reducing electronic waste, and supporting the European Circular Economy Action Plan. The software of such robots often includes features that use object detection models to detect objects for various purposes, such as identifying screws for laptop disassembly or detecting stickers to remove them. Ensuring the robustness of such models to small input variations remains a critical challenge, and addressing it is important to avoid potential damage to laptops during refurbishment. In this paper, we propose PROBE, a search-based robustness testing approach that leverages multi-objective optimization to identify minimal, localized perturbations that expose failures in object detection models used in the software of laptop refurbishing robots. PROBE employs NSGA-II to systematically explore the perturbation space, optimizing for failure induction considering both localization and confidence, and perturbation magnitude, while enabling the discovery of diverse failure cases. Results show that PROBE is 3$\times$ to 7$\times$ more effective than random search in generating failure-inducing perturbations, while requiring smaller perturbation magnitudes, and that the generated perturbations transfer across models. We further show that metamorphic relations provide additional insights into model robustness, enabling the assessment of stability even in non-failing cases.
comment: 15 pages, 4 figures, 5 tables
Is the Future Compatible? Diagnosing Dynamic Consistency in World Action Models
World Action Models (WAMs) enable decision-making through imagined rollouts by predicting future observations and actions. However, the reliability of these imagined futures remains under-examined: is a generated future merely visually plausible, or is it dynamically compatible with the action sequence it claims to model? In this work, we identify action-state consistency, the alignment between predicted actions and induced state transitions, as a missing reliability axis for WAMs. Through a systematic study across representative joint-prediction and inverse-dynamics models, we find that action-state consistency systematically separates successful and failed rollouts across many tasks and follows similar success-failure trends as learned value estimates. These results suggest that consistency captures decision-relevant structure beyond visual realism. We further identify background collapse as an important boundary condition, where low-dynamics failed trajectories can become deceptively consistent because static futures are easier to predict. Building on these findings, we introduce a value-free consensus strategy for test-time selection, which ranks candidate rollouts by agreement among predicted futures. This strategy improves success rates on RoboCasa and RoboTwin 2.0 without additional training or reward modeling. Taken together, our findings establish action-state consistency as both a diagnostic tool for evaluating WAM reliability and a practical signal for value-free planning.
comment: Technical Report
PathPainter: Transferring the Generalization Ability of Image Generation Models to Embodied Navigation
Bird's-eye-view (BEV) images have been widely demonstrated to provide valuable prior information for navigation. Given the global information provided by such views, two key challenges remain: how to fully exploit this information and how to reliably use it during execution. In this paper, we propose a navigation system that uses BEV images as global priors and is designed for ground and near-ground robotic platforms. The system employs an image generation model to interpret human intent from natural language, identify the target destination, and generate traversability masks. During execution, we introduce cross-view localization to align the robot's odometry with the BEV map and mitigate long-term drift in conventional odometry. We conduct extensive benchmark experiments to evaluate the proposed method and further validate it on a UAV platform. Using only a conventional local motion planner, the UAV successfully completes a 160-meter outdoor long-range navigation task. This work demonstrates how the world-understanding capabilities of foundation models can be transferred to embodied navigation, enabling robots to benefit from the strong generalization ability of existing image generation models.
comment: Work in progress. 11 pages, 7 figures
Escaping the Diversity Trap in Robotic Manipulation via Anchor-Centric Adaptation
While Vision-Language-Action (VLA) models offer broad general capabilities, deploying them on specific hardware requires real-world adaptation to bridge the embodiment gap. Since robot demonstrations are costly, this adaptation must often occur under a strict data budget. In this work, we identify a critical diversity trap: the standard heuristic of "maximizing coverage" by collecting diverse, single-shot demonstrations can be self-defeating due to non-vanishing estimation noise. We formalize this phenomenon as a Coverage--Density Trade-off. By decomposing the policy error into estimation (density) and extrapolation (coverage) terms, we characterize an interior optimal allocation of unique conditions for a fixed budget. Guided by this analysis, we propose Anchor-Centric Adaptation (ACA), a two-stage framework that first stabilizes a policy skeleton through repeated demonstrations at core anchors, then selectively expands coverage to high-risk boundaries via teacher-forced error mining and constrained residual updates. Real-robot experiments validate our trade-off framework and demonstrate that ACA significantly improves task reliability and success rates over standard diverse sampling strategies under the same budget.
comment: 21 pages, 8 figures
MORPH-U: Multi-Objective Resilient Motion Planning for V2X-Enabled Autonomous Driving in High-Uncertainty Environments via Simulation
V2X can warn an autonomous vehicle about hazards beyond line-of-sight, but it also brings uncertainty: messages may be delayed, dropped, or even forged. Meanwhile, map knowledge may change during a trip, forcing the vehicle to replan under tight real-time budgets. This paper studies how to make motion planning and low-level control robust to such uncertain, event-driven updates. We present MORPH-U, a CARLA-based closed-loop stack that fuses LiDAR/radar/camera with V2X (CAM/DENM) into a Local Dynamic Map (LDM) and triggers Hybrid-A* replanning when validated hazards or map changes affect the planned route. We expose the planning/control trade-offs via a multi-objective formulation over tracking error, safety margin (minimum TTC), responsiveness, and smoothness, and select operating points using Pareto-frontier analysis. To avoid unsafe replanning from faulty V2X triggers, MORPH-U adds a lightweight Byzantine-inspired acceptance gate that combines a quorum rule with an on-board sensor veto. Experiments in dynamic CARLA scenarios show that V2X-augmented LDM improves downstream safety, Pareto tuning provides controllable accuracy-comfort trade-offs, and the gate prevents replanning under saturated false-DENM injection ($p_{\text{attack}}=1.0$).
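The acceptance gate described above combines a quorum rule with an on-board sensor veto. A minimal sketch of one way such a gate could be wired is shown here; the function name `accept_hazard`, the parameter `quorum`, and the choice to count distinct sender IDs are assumptions for illustration, since the abstract does not specify how the two checks are combined.

```python
def accept_hazard(sender_ids, quorum, sensor_veto):
    """Decide whether a V2X hazard report may trigger replanning.

    sender_ids: iterable of IDs of senders reporting the same hazard.
    quorum: minimum number of distinct senders required (hypothetical).
    sensor_veto: True if on-board sensors contradict the report.
    """
    if sensor_veto:
        # The on-board veto overrides any number of agreeing messages.
        return False
    # Repeated messages from one sender count only once toward the quorum.
    return len(set(sender_ids)) >= quorum
```

Counting distinct senders means a single forged source that repeats the same message cannot reach the quorum on its own, and the veto blocks replanning even when the quorum is met.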
Weather-Robust Scene Semantics with Vision-Aligned 4D Radar ICRA 2026
Cameras and LiDAR degrade in rain, fog, and snow, while millimeter-wave radar remains largely unaffected. We align a radar encoder to frozen SigLIP vision embeddings and decode structured scene captions through a frozen vision-language model (VLM) with approximately 7M trainable parameters. On K-RADAR with held-out fog, light snow, and heavy snow sequences, all radar configurations outperform a camera baseline that collapses to over 90% hallucination. We identify a token-norm mismatch as the dominant failure mode when bridging radar to a frozen VLM and show that projector-output LayerNorm resolves it. Analysis of encoder complexity, caption format, and pooling strategy reveals tradeoffs that inform future radar-VLM pipeline design.
comment: 5 pages + references, 2 appendix pages. ICRA 2026 Radar in Robotics Workshop
CSR: Infinite-Horizon Real-Time Policies with Massive Cached State Representations
Deploying massive large language models (LLMs) as continuous cognitive engines for robotics is bottlenecked by the time-to-first-token (TTFT) latency required to process extensive state histories. Existing solutions like RAG or sliding windows compromise global context or incur prohibitive re-computation costs. We formalize the optimal task structure for minimizing latency and theoretically prove that prefix stability, incremental extensibility, and asynchronous state reconciliation are necessary conditions for real-time performance. Building on these proofs, we introduce the Cached State Representation (CSR) framework as the practical instantiation of these properties, ensuring optimal KV-cache reuse. To sustain these properties over infinite horizons, we further propose an Asynchronous State Reconciliation (ASR) algorithm that offloads state memory eviction to a parallel computational resource to eliminate latency spikes. On a physical robot wirelessly connected to an on-premise GPU server, CSR achieves a 26-fold latency reduction (14.67s to 0.56s) for 120K token contexts with a 235B parameter model compared to a standard baseline. On an embodied AI benchmark, we achieve SOTA recall (0.836 vs. 0.459) while maintaining RAG-level latency. ASR is validated to sustain bounded, spike-free TTFT over 10 eviction cycles in continuous real-world operation. Together, CSR and ASR enable massive LLMs to function as continuously operating, high-frequency (> 2 Hz) embodied policies.
comment: Extended Technical Report for Paper Accepted to IEEE RA-L
AT-VLA: Adaptive Tactile Injection for Enhanced Feedback Reaction in Vision-Language-Action Models
Vision-Language-Action (VLA) models have significantly advanced the capabilities of robotic agents in executing diverse tasks; however, they still face challenges in contact-rich manipulation scenarios that require precise physical interactions. To address this limitation, recent studies have attempted to incorporate tactile signals during downstream tasks, enabling pretrained VLAs to interpret tactile feedback. Nevertheless, introducing new modalities during finetuning, which are rarely present in the pretraining stage, may disrupt the pretrained capabilities of VLAs. In addition, the inherently slow inference speed of VLAs hampers real-time responsiveness and limits the effective utilization of tactile feedback for action adjustment. To overcome these challenges, we propose Adaptive Tactile Vision-Language-Action (AT-VLA), which introduces a novel Adaptive Tactile Injection mechanism. This mechanism dynamically determines the appropriate timing and locations for tactile injection, incorporating tactile signals only when they significantly contribute to action generation, thereby minimizing interference with pretrained representations. Furthermore, to enable rapid and accurate tactile responses, we propose a Tactile Reaction Dual-Stream mechanism, which decouples sensory processing into a slow visual-language stream for low-frequency perceptual reasoning and a fast tactile control stream for high-frequency physical interaction understanding, achieving real-time closed-loop responses within 0.04 s. Real-world experiments thoroughly validate the effectiveness of AT-VLA in contact-rich manipulation tasks. The project page is available at: https://sites.google.com/view/at-vla.
BioProVLA-Agent: An Affordable, Protocol-Driven, Vision-Enhanced VLA-Enabled Embodied Multi-Agent System with Closed-Loop-Capable Reasoning for Biological Laboratory Manipulation
Biological laboratory automation can reduce repetitive manual work and improve reproducibility, but reliable embodied execution in wet-lab environments remains challenging. Protocols are often unstructured, labware is frequently transparent or reflective, and multi-step procedures require state-aware execution beyond one-shot instruction following. Existing robotic systems often rely on costly hardware, fixed workflows, dedicated instruments, or robotics-oriented interfaces. Here, we introduce BioProVLA-Agent, an affordable, protocol-driven, vision-enhanced embodied multi-agent system enabled by Vision-Language-Action (VLA) models for biological manipulation. The system uses protocols as the task interface and integrates protocol parsing, visual state verification, and embodied execution in a closed-loop workflow. A Tailored LLM Protocol Agent converts protocols into verifiable subtasks; a VLM-RAG Verification Agent assesses readiness and completion using observations, robot states, retrieved knowledge, and success/failure examples; and a VLA Embodied Agent executes verified subtasks through a lightweight policy. To improve robustness under wet-lab visual perturbations, we develop AugSmolVLA, an online augmentation strategy targeting transparent labware, reflections, illumination shifts, and overexposure. We evaluate the system on a hierarchical benchmark covering 15 atomic tasks, 6 composite workflows, and 3 bimanual tasks, including tube loading, sorting, waste disposal, cap twisting, and liquid pouring. Across normal and high-exposure settings, AugSmolVLA improves execution stability over ACT, X-VLA, and the original SmolVLA, especially for precise placement, transparent-object manipulation, composite workflows, and visually degraded scenes. These results suggest a practical route toward accessible, protocol-centered, and verification-capable embodied AI for biological manipulation.
comment: 16 pages, 7 figures
Variable Aerodynamic Damping via Co-Contraction: A Dynamic Isomorphism with Variable Stiffness Actuators
We prove that aerodynamic co-contraction in a redundant dual-rotor actuator can tune a passive, trim-defined aero-mechanical damping while keeping the commanded net force constant. In particular, we define an incremental damping coefficient as the local sensitivity of net thrust to air-relative velocity at a trim and prove that it increases monotonically along constant-force fibers under a mild aerodynamic hardening condition. We then validate the required damping and hardening properties from a first-principles Blade Element Theory derivation, which yields a minimal thrust model affine in inflow and explicitly reveals the speed--inflow coupling driving the effect. The resulting mechanism is formalized as a Variable Aerodynamic Damping Actuator (VADA) and shown to be dynamically isomorphic to stiffness modulation in antagonistic variable-stiffness actuation (VSA), similar to the co-contraction of tendons by muscle co-activation. The same fiber-density principle also enhances the active aerodynamic promptness measure of redundant multirotors. Finally, an impedance-form representation clarifies the roles of common-mode and differential-mode actuation in the control of passive impedance and the equilibrium velocity of the VADA system.
Palm-sized Omnidirectional Vision-Based UAV Exploration with Sparse Topological Map Guidance
Classic exploration methods often rely on dense occupancy maps or high-resolution point clouds for frontier detection and path planning, resulting in substantial memory consumption and computational overhead. Moreover, it is impractical to equip micro UAVs under size, weight, and power (SWaP) constraints with sensors such as LiDAR to obtain accurate geometric measurements of the environment. This paper presents a lightweight autonomous exploration system that leverages omnidirectional vision and sparse topological map guidance. Specifically, we utilize a multi-fisheye camera setup to achieve an omnidirectional Field of View (FoV) and perform depth estimation. To address the limited depth estimation accuracy, frontiers are represented as potential unexplored regions characterized by topological nodes instead of explicit boundaries, enabling efficient identification of frontier regions without maintaining occupancy grids or global point clouds. Unlike classic dense representations, our approach abstracts the environment using a sparse topological map composed of key nodes and their descriptors, reducing memory consumption and computational demands. Global path planning is performed directly on the sparse graph. The proposed method is validated both in simulation and in real-world experiments on a palm-sized vision-based UAV with an 11 cm wheelbase and a 400 g weight, demonstrating that our method can achieve efficient exploration with extremely low computational consumption.
PISTO: Proximal Inference for Stochastic Trajectory Optimization
Stochastic trajectory optimization methods like STOMP enable planning with non-differentiable costs, offering substantial flexibility over gradient-based approaches. We show that STOMP implicitly minimizes the KL divergence from a Boltzmann trajectory distribution, revealing an elegant Variational Inference (VI) structure underlying its updates. Building on this insight, we propose the \textit{Proximal Inference for Stochastic Trajectory Optimization} (PISTO) algorithm that stabilizes the updates by augmenting the objective with a KL regularization between successive Gaussian proposals. This proximal formulation admits a trust-region interpretation and yields closed-form mean updates computable as expectations under a surrogate distribution. We estimate these expectations via importance-weighted Monte Carlo sampling, producing a simple, derivative-free algorithm that inherits STOMP's ability to handle non-differentiable and discontinuous costs without modification. On robot arm motion planning benchmarks, PISTO achieves an 89\% success rate -- outperforming CHOMP (63\%) and STOMP (68\%) -- while producing shorter, smoother paths at twice the speed of competing stochastic methods. We further validate PISTO on contact-rich MuJoCo locomotion and manipulation tasks, where it consistently outperforms both CEM and MPPI baselines in reward.
comment: 8 pages
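To make the importance-weighted mean update in the PISTO abstract concrete, here is a hedged sketch of a derivative-free step: sample around the current Gaussian mean, weight samples by a Boltzmann factor of their cost, and move the mean part of the way toward the weighted average. The damping factor `eta` stands in for the effect of the KL proximal term; the paper's exact closed form is not given in the abstract, so `proximal_mean_update`, `lam`, and `eta` are assumptions, not the authors' algorithm.

```python
import math
import random


def proximal_mean_update(mu, sigma, cost, lam=1.0, eta=0.5,
                         n_samples=2000, rng=None):
    """One damped, importance-weighted update of a Gaussian proposal mean.

    mu: current mean (list of floats); sigma: isotropic std deviation.
    cost: callable mapping a sample (list of floats) to a scalar cost;
    it may be non-differentiable. lam and eta are hypothetical knobs.
    """
    if rng is None:
        rng = random.Random(0)
    samples = [[m + sigma * rng.gauss(0.0, 1.0) for m in mu]
               for _ in range(n_samples)]
    costs = [cost(s) for s in samples]
    c_min = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - c_min) / lam) for c in costs]
    total = sum(weights)
    weighted_mean = [
        sum(w * s[d] for w, s in zip(weights, samples)) / total
        for d in range(len(mu))
    ]
    # eta < 1 keeps the new mean close to the old one (trust-region flavor).
    return [m + eta * (wm - m) for m, wm in zip(mu, weighted_mean)]
```

Only cost evaluations are needed, so discontinuous costs work unchanged. Repeating the step on a simple quadratic cost pulls the mean toward the minimizer.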
TriP: A Triangle Puzzle Approach to Robust Translation Averaging
Translation averaging aims to recover camera locations from pairwise relative translation directions and is a fundamental component of global Structure-from-Motion pipelines. The problem is challenging because direction measurements contain no distance information, making the estimation problem highly ill-conditioned and highly sensitive to corrupted observations. In this paper, we propose TriP, a triangle-based framework for robust translation averaging. TriP first infers local relative edge scales from triangle geometry, and then synchronizes the scales of overlapping triangles in the logarithmic domain to recover globally consistent edge lengths and camera locations. By leveraging higher-order consistency across triangles, the proposed method is robust to adversarial, cycle-consistent, and other structured corruptions. In addition, TriP avoids the collapse issue without requiring any extra anti-collapse constraints, since log-scale synchronization excludes the degenerate zero-scale solution by construction. These structural advantages enable a particularly strong theory for exact location recovery. On the practical side, TriP is fully parallelizable, computationally efficient, and naturally scalable to graphs with millions of cameras. Moreover, it outperforms all previous translation averaging methods by a large margin on both synthetic and real datasets.
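One way to read "infers local relative edge scales from triangle geometry" is through the law of sines: in a triangle, each edge length is proportional to the sine of the opposite angle, and those sines follow from the direction measurements alone. The sketch below computes log edge scales for a single triangle under that reading. The function name `triangle_log_scales` and the argument layout are hypothetical, and this is not claimed to be TriP's actual estimator.

```python
import math


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _norm(u):
    return math.sqrt(sum(x * x for x in u))


def triangle_log_scales(d_ij, d_jk, d_ki):
    """Log edge lengths of triangle (i, j, k) up to a shared constant.

    d_ij, d_jk, d_ki are unit direction vectors along the three edges.
    For unit vectors, |u x v| is the sine of the angle between them, and
    each edge is proportional to the sine of the angle at the opposite
    vertex. Degenerate (collinear) triangles are not handled.
    """
    s_ij = _norm(_cross(d_jk, d_ki))  # sine of the angle at vertex k
    s_jk = _norm(_cross(d_ij, d_ki))  # sine of the angle at vertex i
    s_ki = _norm(_cross(d_ij, d_jk))  # sine of the angle at vertex j
    return math.log(s_ij), math.log(s_jk), math.log(s_ki)
```

Working with logarithms turns the unknown per-triangle scale into an additive offset, which is what makes synchronizing overlapping triangles a linear problem. A zero edge length would correspond to a log of minus infinity, so it cannot appear as a finite solution.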
Learning Visual Feature-Based World Models via Residual Latent Action
World models predict future transitions from observations and actions. Existing works predominantly focus on image generation only. Visual feature-based world models, on the other hand, predict future visual features instead of raw video pixels, offering a promising alternative that is more efficient and less prone to hallucination. However, current feature-based approaches rely on direct regression, which leads to blurry or collapsed predictions in complex interactions, while generative modeling in high-dimensional feature spaces still remains challenging. In this work, we discover that a new type of latent action representation, which we refer to as *Residual Latent Action* (RLA), can be easily learned from DINO residuals. We also show that RLA is predictive, generalizable, and encodes temporal progression. Building on RLA, we propose *RLA World Model* (RLA-WM), which predicts RLA values via flow matching. RLA-WM outperforms both state-of-the-art feature-based and video-diffusion world models on simulation and real-world datasets, while being orders of magnitude faster than video diffusion. Furthermore, we develop two robot learning techniques that use RLA-WM to improve policy learning. The first one is a minimalist world action model with RLA that learns from actionless demonstration videos. The second one is the first visual RL framework trained entirely inside a world model learned from offline videos only, using a video-aligned reward and no online interactions or handcrafted rewards. Project page: https://mlzxy.github.io/rla-wm
UNCOM: Zero-shot Context-Aware Command Understanding for Tabletop Scenarios
This paper presents UNCOM, a novel hybrid framework for interpreting natural human commands in tabletop scenarios. The system integrates multiple sources of information -- speech, gestures, and scene context -- to extract structured, actionable instructions for robots. Addressing the need for general-purpose human-robot interaction in domestic environments, UNCOM is designed for zero-shot operation, without reliance on predefined object models or training data specific to a given task. Using foundational and task-specific deep learning models, it allows out-of-the-box speech recognition, natural language understanding, gesture detection, and object segmentation. The modular architecture enhances transparency and explainability by explicitly parsing commands into object-action-target representations, enabling integration with symbolic robotic frameworks. We demonstrate the system on a TIAGo++ robot and evaluate it on a real-world data set of human-robot interaction scenarios, achieving an 82.39\% success rate over our benchmark data set, which highlights the robustness of the system to diversity, noise, and communication ambiguity. The data set, evaluation scenarios, and the code are publicly available to support future research.
Goal-Conditioned Decision Transformer for Multi-Goal Offline Reinforcement Learning
Reinforcement learning (RL) in robotics faces significant hurdles regarding sample efficiency and generalization across varying goals. While Offline RL mitigates the need for costly online interactions, its integration with goal-conditioned policies and transformer-based architectures remains underexplored. We introduce a Goal-Conditioned Decision Transformer adapted for offline multi-goal robotics. By explicitly incorporating goal states into the sequence modeling framework, our approach efficiently solves varying tasks using only pre-collected data. We validate this method on a newly released offline dataset for the Franka Emika Panda platform. Experimental results demonstrate that our approach outperforms state-of-the-art online baselines in complex tasks and maintains robustness in sparse-reward settings, even with limited expert demonstrations.
LineRides: Line-Guided Reinforcement Learning for Bicycle Robot Stunts
Designing reward functions for agile robotic maneuvers in reinforcement learning remains difficult, and demonstration-based approaches often require reference motions that are unavailable for novel platforms or extreme stunts. We present LineRides, a line-guided learning framework that enables a custom bicycle robot to acquire diverse, commandable stunt behaviors from a user-provided spatial guideline and sparse key-orientations, without demonstrations or explicit timing. LineRides handles physically infeasible guidelines using a tracking margin that permits controlled deviation, resolves temporal ambiguity by measuring progress via traveled distance along the guideline, and disambiguates motion details through position- and sequence-based key-orientations. We evaluate LineRides on the Ultra Mobility Vehicle (UMV) and show that the policy trained with our methods supports seamless transitions between normal driving and stunt execution, enabling five distinct stunts on command: MiniHop, LargeHop, ThreePointTurn, Backflip, and DriftTurn.
comment: Published in IEEE Robotics and Automation Letters (RA-L), 2026
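The LineRides abstract describes measuring progress by distance along the guideline and allowing deviation within a tracking margin. A minimal sketch of one way to compute both follows. It assumes a 2D polyline guideline and a closest-point projection. The function names and the penalty form are hypothetical and are not taken from the paper.

```python
import math

def project_to_polyline(p, line):
    """Return (arc_length, distance) of the point on the polyline closest to p."""
    best_s, best_d = 0.0, float("inf")
    s_start = 0.0  # arc length accumulated before the current segment
    for (x0, y0), (x1, y1) in zip(line, line[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue  # skip degenerate segments
        # Parameter of the orthogonal projection, clamped to the segment.
        u = ((p[0] - x0) * dx + (p[1] - y0) * dy) / (seg_len ** 2)
        u = min(1.0, max(0.0, u))
        cx, cy = x0 + u * dx, y0 + u * dy
        d = math.hypot(p[0] - cx, p[1] - cy)
        if d < best_d:
            best_s, best_d = s_start + u * seg_len, d
        s_start += seg_len
    return best_s, best_d

def margin_penalty(distance, margin):
    """Hypothetical penalty: zero inside the tracking margin, linear outside."""
    return max(0.0, distance - margin)

guideline = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)]
progress, deviation = project_to_polyline((2.4, 1.0), guideline)
```

Because progress is indexed by arc length rather than by time, the same guideline can be followed at any speed, which is one reading of how explicit timing is avoided.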
TAG-K: Tail-Averaged Greedy Kaczmarz for Computationally Efficient and Performant Online Inertial Parameter Estimation ICRA 2026
Accurate online inertial parameter estimation is essential for adaptive robotic control, enabling real-time adjustment to payload changes, environmental interactions, and system wear. Traditional methods often struggle to track abrupt parameter shifts or incur high computational costs, limiting their effectiveness in dynamic environments and for computationally constrained robotic systems. We introduce TAG-K, a lightweight extension of the Kaczmarz method that combines greedy randomized row selection for rapid convergence with tail averaging for robustness under noise and inconsistency. This design enables fast, stable parameter adaptation while retaining the low per-iteration complexity inherent to the Kaczmarz framework. We evaluate TAG-K in synthetic benchmarks and quadrotor tracking tasks against RLS, KF, and other Kaczmarz variants. TAG-K achieves 1.5x-1.9x faster solve times on laptop-class CPUs and 4.8x-20.7x faster solve times on embedded microcontrollers. More importantly, these speedups are paired with improved robustness to measurement noise and a 25% reduction in estimation error, leading to nearly 2x better end-to-end tracking performance. Website, documentation, and code available at: https://a2r-lab.org/TAG-K/.
comment: Accepted to ICRA 2026. 3 Figures. 3 Tables
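To make the combination of greedy row selection and tail averaging concrete, here is a minimal pure-Python sketch of a Kaczmarz iteration with both ingredients. It is an illustration under assumptions and not the authors' implementation. The selection rule (largest normalized residual within a random subset of rows), the averaging window (last half of the iterates), and all names and parameters are hypothetical.

```python
import random

def tag_kaczmarz(A, b, iters=200, subset=4, tail_frac=0.5, seed=0):
    """Approximately solve A x = b and return the average of the tail iterates."""
    rng = random.Random(seed)
    n_rows, n_cols = len(A), len(A[0])
    row_norm2 = [sum(a * a for a in row) for row in A]
    x = [0.0] * n_cols
    x_avg = [0.0] * n_cols
    tail_start = int(iters * (1.0 - tail_frac))
    n_avg = 0

    def residual(i):
        # Residual of row i at the current iterate x.
        return b[i] - sum(a * xi for a, xi in zip(A[i], x))

    for k in range(iters):
        # Assumed greedy rule: among a random subset of rows, pick the one
        # with the largest normalized squared residual.
        candidates = rng.sample(range(n_rows), min(subset, n_rows))
        i = max(candidates, key=lambda j: residual(j) ** 2 / row_norm2[j])
        # Standard Kaczmarz step: project x onto the hyperplane of row i.
        step = residual(i) / row_norm2[i]
        x = [xi + step * a for xi, a in zip(x, A[i])]
        # Tail averaging: running mean over the last tail_frac of the iterates.
        if k >= tail_start:
            n_avg += 1
            x_avg = [m + (xi - m) / n_avg for m, xi in zip(x_avg, x)]
    return x_avg

A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0], [3.0, 2.0]]
b = [0.0, -5.0, 3.0, -1.0]  # consistent with x = [1, -2]
x_hat = tag_kaczmarz(A, b)
```

Each iteration touches only the sampled rows, so the per-iteration cost stays low, while the averaged output is less sensitive to the noise in any single projection.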
Code Generation and Conic Constraints for Model-Predictive Control on Microcontrollers with Conic-TinyMPC ICRA 2026
Model-predictive control (MPC) is a state-of-the-art control method for constrained robotic systems, yet deployment on resource-limited hardware remains difficult. This challenge is magnified by expressive conic constraints, which offer greater modeling power but require significantly more computation than linear alternatives. To address this challenge, we extend recent work developing fast, structure-exploiting, cached solvers for embedded applications based on the Alternating Direction Method of Multipliers (ADMM) to provide support for second-order cones, as well as C++ code generation from Python, MATLAB, and Julia. Microcontroller benchmarks show that our solver provides up to a two-order-of-magnitude speedup, ranging from 10.6x to 142.7x, over state-of-the-art embedded solvers on QP and SOCP problems, and enables us to fit order-of-magnitude larger problems in memory. We validate our solver's deployed performance through simulation and hardware experiments, including trajectory tracking with conic constraints on a 27g Crazyflie quadrotor. Our open-source code is available at https://tinympc.org.
comment: Accepted to ICRA 2026. 4 Figures. 2 Tables. First three authors contributed equally
GATO: GPU-Accelerated and Batched Trajectory Optimization for Scalable Edge Model Predictive Control ICRA 2026
While Model Predictive Control (MPC) delivers strong performance across robotics applications, solving the underlying (batches of) nonlinear trajectory optimization (TO) problems online remains computationally demanding. Existing GPU-accelerated approaches either parallelize single solves, handle large batches at sub-real-time rates, or sacrifice model generality for speed. This leaves a large gap in solver performance for many state-of-the-art MPC applications that require real-time batches of tens to low-hundreds of solves. As such, we present GATO, an open source, GPU-accelerated, batched TO solver co-designed across algorithm, software, and computational hardware to deliver real-time throughput for these moderate batch size regimes. Our approach leverages a combination of block-, warp-, and thread-level parallelism within and across solves for ultra-high performance. We demonstrate the effectiveness of our approach through a combination of: simulated benchmarks showing speedups of 18-21x over CPU baselines and 1.4-16x over GPU baselines as batch size increases; case studies highlighting improved disturbance rejection and convergence behavior; and finally a validation on hardware using an industrial manipulator. We open source GATO to support reproducibility and adoption.
comment: Accepted to ICRA 2026. 8 pages, 8 figures, 2 tables
Docking and Persistent Operations for a Resident Underwater Vehicle
Our understanding of the oceans remains limited by sparse and infrequent observations, primarily because current methods are constrained by the high cost and logistical effort of underwater monitoring, relying either on sporadic surveys across broad areas or on long-term measurements at fixed locations. To overcome these limitations, monitoring systems must enable persistent and autonomous operations without the need for continuous surface support. Despite recent advances, resident underwater vehicles remain uncommon due to persistent challenges in autonomy, robotic resilience, and mechanical robustness, particularly under long-term deployment in harsh and remote environments. This work addresses these problems by presenting the development, deployment, and operation of a resident infrastructure using a docking station with a mini-class Remotely Operated Vehicle (ROV) at 90 m depth. The ROV is equipped with enhanced onboard processing and perception, allowing it to autonomously navigate using USBL signals, dock via ArUco marker-based visual localisation fused through an Extended Kalman Filter, and carry out local inspection routines. The system demonstrated a 90 % autonomous docking success rate and completed full inspection missions within four minutes, validating the integration of acoustic and visual navigation in real-world conditions. These results show that reliable, untethered operations at depth are feasible, highlighting the potential of resident ROV systems for scalable, cost-effective underwater monitoring.
SCOUT: Closed-Loop in-vivo System for Continuous Methane Concentration Monitoring in Cattle
Enteric methane measurement from ruminant livestock faces fundamental trade-offs between accuracy and operational feasibility. Existing methods quantify methane after eructation and atmospheric dilution, limiting temporal resolution and confounding biological signals with environmental variables. We present SCOUT (Smart Cannula-mounted Optical Unit for Trace-methane), the first autonomous system for continuous in-vivo monitoring of ruminal headspace methane concentrations. The system addresses a critical engineering barrier through closed-loop gas recirculation that maintains anaerobic ruminal conditions during persistent headspace sampling. SCOUT was deployed on cannulated Simmental heifers under contrasting dietary treatments. Headspace concentrations were 100 to 1000 times higher than concurrent ambient sniffer readings, providing substantially greater signal resolution for characterizing methane dynamics. High-frequency monitoring revealed behavior-production coupling previously inaccessible, including rapid concentration changes ($14.5 \pm 11.3k$ ppm) associated with postural transitions within 15-minute intervals. Cross-platform comparison with ambient sniffers showed scale-dependent correspondence between production and release measurements, with an optimal correlation (r = -0.564) at 40-minute averaging windows consistent with eructation cycles. These results demonstrate that the rumen headspace contains continuous, biologically interpretable methane signals that SCOUT can reliably access, establishing the measurement infrastructure necessary for developing concentration-to-flux models that would support precision phenotyping, emission proxy calibration, and mitigation strategy evaluation.
Separation Assurance between Heterogeneous Fleets of Small Unmanned Aerial Systems via Multi-Agent Reinforcement Learning
In the envisioned future dense urban airspace, multiple companies will operate heterogeneous fleets of small unmanned aerial systems (sUASs), where each fleet includes several homogeneous aircraft with identical policies and configurations, e.g., equipage, sensing, and communication ranges, making tactical deconfliction highly complex for the aircraft. This paper aims to address two core questions: (1) Can tactical deconfliction policies converge or reach an equilibrium to ensure a conflict-free airspace when companies operate heterogeneous fleets of homogeneous aircraft? (2) If so, will the converged policies discriminate against companies operating sUASs with weaker configurations? We investigate a multi-agent reinforcement learning paradigm in which homogeneous aircraft within heterogeneous fleets operate concurrently to perform package delivery missions over Dallas, Texas, USA. An attention-enhanced Proximal Policy Optimization-based Advantage Actor-Critic (PPOA2C) framework is employed to resolve intra- and inter-fleet conflicts, with each fleet independently training its own policy while preserving privacy. Experimental results show that two fleets with distinct, shared PPOA2C policies can reach an equilibrium to maintain safe separation. While two PPOA2C policies outperform two strong rule-based baselines in terms of conflict resolution, a PPOA2C policy exhibits safer interaction with a rule-based policy, indicating adaptive capabilities of PPOA2C policies. Furthermore, we conducted extensive policy-configuration evaluations, which reveal that equilibria between similar policy types tend to favor fleets with stronger configurations. Even under similar configurations but different policy types, the equilibrium favors one of the heterogeneous policies, underscoring the need for fairness-aware conflict management in heterogeneous sUAS operations.
comment: 8 pages, 3 figures, 1 table
HAIC: Humanoid Agile Object Interaction Control via Dynamics-Aware World Model
Humanoid robots show promise for complex whole-body tasks in unstructured environments. Although Human-Object Interaction (HOI) has advanced, most methods focus on fully actuated objects rigidly coupled to the robot, ignoring underactuated objects with independent dynamics and non-holonomic constraints. These introduce control challenges from coupling forces and occlusions. We present HAIC, a unified framework for robust interaction across diverse object dynamics without external state estimation. Our key contribution is a dynamics predictor that estimates high-order object states (velocity, acceleration) solely from proprioceptive history. These predictions are projected onto static geometric priors to form a spatially grounded dynamic occupancy map, enabling the policy to infer collision boundaries and contact affordances in blind spots. We use asymmetric fine-tuning, where a world model continuously adapts to the student policy's exploration, ensuring robust state estimation under distribution shifts. Experiments on a humanoid robot show HAIC achieves high success rates in agile tasks (skateboarding, cart pushing/pulling under various loads) by proactively compensating for inertial perturbations, and also masters multi-object long-horizon tasks like carrying a box across varied terrain by predicting the dynamics of multiple objects.
comment: RSS 2026. Webpage: https://haic-humanoid.github.io/
3D Generation for Embodied AI and Robotic Simulation: A Survey
Embodied AI and robotic systems increasingly depend on scalable, diverse, and physically grounded 3D content for simulation-based training and real-world deployment. While 3D generative modeling has advanced rapidly, embodied applications impose requirements far beyond visual realism: generated objects must carry kinematic structure and material properties, scenes must support interaction and task execution, and the resulting content must bridge the gap between simulation and reality. This survey reviews 3D generation for embodied AI and organizes the literature around three roles that 3D generation plays in embodied systems. In Data Generator, 3D generation produces simulation-ready objects and assets, including articulated, physically grounded, and deformable content for downstream interaction; in Simulation Environments, it constructs interactive and task-oriented worlds, spanning structure-aware, controllable, and agentic scene generation; and in Sim2Real Bridge, it supports digital twin reconstruction, data augmentation, and synthetic demonstrations for downstream robot learning and real-world transfer. We also show that the field is shifting from visual realism toward interaction readiness, and we identify the main bottlenecks, including limited physical annotations, the gap between geometric quality and physical validity, fragmented evaluation, and the persistent sim-to-real divide, that must be addressed for 3D generation to become a dependable foundation for embodied intelligence. Our project page is at https://3dgen4robot.github.io.
comment: 27 pages, 11 figures, 8 tables
Proactive Instance Navigation with Comparative Judgment for Ambiguous User Queries
Natural-language instance navigation becomes challenging when the initial user request does not uniquely specify the target instance. A practical agent should reduce the user's burden by actively asking only the information needed to distinguish the target from similar distractors, rather than requiring a detailed description upfront. Existing approaches often fall short of this goal: they may stop at the first plausible candidate before sufficiently exploring alternatives, or, even after collecting multiple candidates, ask about the target's attributes derived from individual candidates rather than questions selected to distinguish candidates in the pool. As a result, despite the dialogue, the agent may still fail to distinguish the target from distractors, leading to premature decisions and lengthy user responses. We propose Proactive Instance Navigation with Comparative Judgment (ProCompNav), a two-stage framework that first constructs a candidate pool and then identifies the target through comparative judgment. At each round, ProCompNav extracts an attribute-value pair that splits the current pool, asks a binary yes/no question, and prunes all inconsistent candidates at once. This reframes disambiguation from open-ended target description to pool-level discriminative questioning, where each question is chosen to narrow the candidate set. On CoIN-Bench, ProCompNav improves Success Rate over interactive baselines with the same minimal input and non-interactive baselines with detailed descriptions, while substantially reducing Response Length. ProCompNav also achieves state-of-the-art Success Rate on TextNav, suggesting that comparative judgment is broadly useful for instance-level navigation among similar distractors.
comment: 17 pages, 6 figures
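The pool-level questioning loop described for ProCompNav can be sketched as follows. This is a toy illustration under assumptions: candidates are plain dictionaries, the attribute-value pair is chosen to split the pool as evenly as possible (the abstract only says the pair splits the pool), and `pick_split`, `disambiguate`, and `answer_fn` are hypothetical names.

```python
def pick_split(pool):
    """Pick an (attribute, value) pair that splits the pool as evenly as possible."""
    best, best_gap = None, None
    pairs = sorted({(k, v) for c in pool for k, v in c.items() if k != "id"})
    for attr, val in pairs:
        yes = sum(1 for c in pool if c.get(attr) == val)
        if 0 < yes < len(pool):  # the question must actually split the pool
            gap = abs(2 * yes - len(pool))
            if best is None or gap < best_gap:
                best, best_gap = (attr, val), gap
    return best

def disambiguate(pool, answer_fn, max_rounds=10):
    """Ask binary questions and prune inconsistent candidates each round."""
    rounds = 0
    while len(pool) > 1 and rounds < max_rounds:
        split = pick_split(pool)
        if split is None:
            break  # remaining candidates are indistinguishable
        attr, val = split
        said_yes = answer_fn(attr, val)  # user answers "is <attr> equal to <val>?"
        pool = [c for c in pool if (c.get(attr) == val) == said_yes]
        rounds += 1
    return pool, rounds

candidates = [
    {"id": 1, "color": "red", "near": "window"},
    {"id": 2, "color": "red", "near": "door"},
    {"id": 3, "color": "blue", "near": "window"},
    {"id": 4, "color": "blue", "near": "door"},
]
target = candidates[2]
remaining, n_rounds = disambiguate(candidates, lambda a, v: target[a] == v)
```

With an even split at every round, the number of yes/no questions grows roughly with the logarithm of the pool size, which fits the stated aim of keeping user responses short.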
DynaRetarget: Dynamically-Feasible Retargeting using Sampling-Based Trajectory Optimization
In this paper, we introduce DynaRetarget, a complete pipeline for retargeting human motions to humanoid control policies. The core component of DynaRetarget is a novel Sampling-Based Trajectory Optimization (SBTO) framework that refines imperfect kinematic trajectories into dynamically feasible motions. SBTO incrementally advances the optimization horizon, enabling optimization over the entire trajectory for long-horizon tasks. We validate DynaRetarget by successfully retargeting hundreds of humanoid-object demonstrations and achieving higher success rates than the state of the art. The framework also generalizes across varying object properties, such as mass, size, and geometry, using the same tracking objective. This ability to robustly retarget diverse demonstrations opens the door to generating large-scale synthetic datasets of humanoid loco-manipulation trajectories, addressing a major bottleneck in real-world data collection.
Bluetooth Phased-array Aided Inertial Navigation Using Factor Graphs: Experimental Verification
Phased-array Bluetooth systems have emerged as a low-cost alternative for performing aided inertial navigation in GNSS-denied use cases such as warehouse logistics, drone landings, and autonomous docking. Basing a navigation system on commercial off-the-shelf components may lower the barrier to entry for phased-array radio navigation systems, albeit at the cost of significantly noisier measurements and a relatively short feasible range. In this paper, we compare robust estimation strategies for a factor graph optimisation-based estimator using experimental data collected from multirotor drone flight. We evaluate performance in loss-of-GNSS scenarios when aided by Bluetooth angular measurements, as well as range or barometric pressure.
comment: 6 pages, 5 figures, 2 tables. © 2026 the authors. This work has been accepted to IFAC for publication under a Creative Commons Licence CC-BY-NC-ND
Affordance Agent Harness: Verification-Gated Skill Orchestration
Affordance grounding requires identifying where and how an agent should interact in open-world scenes, where actionable regions are often small, occluded, reflective, and visually ambiguous. Recent systems therefore combine multiple skills (e.g., detection, segmentation, interaction-imagination), yet most orchestrate them with fixed pipelines that are poorly matched to per-instance difficulty, offer limited targeted recovery from intermediate errors, and fail to reuse experience from recurring objects. These failures expose a systems problem: test-time grounding must acquire the right evidence, decide whether that evidence is reliable enough to commit, and do so under bounded inference cost without access to labels. We propose Affordance Agent Harness, a closed-loop runtime that unifies heterogeneous skills with an evidence store and cost control, retrieves episodic memories to provide priors for recurring categories, and employs a Router to adaptively select and parameterize skills. An affordance-specific Verifier then gates commitments using self-consistency, cross-scale stability, and evidence sufficiency, triggering targeted retries before a final judge fuses accumulated evidence and trajectories into the prediction. Experiments on multiple affordance benchmarks and difficulty-controlled subsets show a stronger accuracy-cost Pareto frontier than fixed-pipeline baselines, improving grounding quality while reducing average skill calls and latency. Project page: https://tenplusgood.github.io/a-harness-page/.
comment: 43 pages, 22 figures, 8 tables. Ongoing work
Dynamic Properties and Motion Reproducibility of a Compact Pneumatically Actuated Humanoid Upper Body for Data-Driven Control
Pneumatically-actuated anthropomorphic robots with high degrees of freedom (DOF) offer significant potential for physical human-robot interaction. However, precise control of pneumatic actuators is challenging due to their inherent nonlinearities. This paper presents the development of a compact 13-DOF upper-body humanoid robot. To assess the feasibility of an effective controller, we first investigate its key dynamic properties, such as actuation time delays, and confirm that the system exhibits highly reproducible behavior. Leveraging this reproducibility, we implement a preliminary data-driven controller for a 4-DOF arm subsystem based on a multilayer perceptron with explicit time delay compensation. The network was trained on random movement data to generate pressure commands for tracking arbitrary trajectories. Comparative evaluations with a traditional PID controller demonstrate superior trajectory tracking performance, highlighting the potential of data-driven approaches for controlling complex, high-DOF pneumatic robots.
comment: 25 pages, 21 figures. Submitted to Advanced Robotics
SeedPolicy: Horizon Scaling via Self-Evolving Diffusion Policy for Robot Manipulation
Imitation Learning (IL) enables robots to acquire manipulation skills from expert demonstrations. Diffusion Policy (DP) models multi-modal expert behaviors but degrades when naively increasing stacked observation horizons, limiting long-horizon manipulation. We propose Self-Evolving Gated Attention (SEGA), a temporal module that maintains a time-evolving latent state via gated attention, enabling efficient recurrent updates that accumulate long-term context into a compact latent representation while filtering irrelevant temporal information. Integrating SEGA into DP yields Self-Evolving Diffusion Policy (SeedPolicy), which resolves the temporal modeling bottleneck and extends the effective temporal horizon with moderate overhead. On the RoboTwin 2.0 benchmark with 50 manipulation tasks, SeedPolicy outperforms DP and other IL baselines. Averaged across both CNN and Transformer backbones, SeedPolicy achieves 36.8% relative improvement in clean settings and 169% relative improvement in randomized challenging settings over the DP. Compared to vision-language-action models such as RDT with 1.2B parameters, SeedPolicy achieves stronger performance in the clean setting with one to two orders of magnitude fewer parameters, demonstrating strong efficiency. These results establish SeedPolicy as a state-of-the-art imitation learning method for long-horizon robotic manipulation. Code is available at: https://anonymous.4open.science/r/SeedPolicy-64F0/.
comment: 22 pages, 14 figures
TAIL-Safe: Task-Agnostic Safety Monitoring for Imitation Learning Policies
Recent imitation learning (IL) algorithms such as flow-matching and diffusion policies demonstrate remarkable performance in learning complex manipulation tasks. However, these policies often fail even when operating within their training distribution due to extreme sensitivity to initial conditions and irreducible approximation errors that lead to compounding drift. This makes it unsafe to deploy IL policies in the field where out-of-distribution scenarios are prevalent. A prerequisite for safe deployment is enabling the policy to determine whether it can execute a task the way it was learned from demonstrations. This paper presents TAIL-Safe, a principled approach to identify, for a trained IL policy, a safe set from where the policy empirically succeeds in completing the learned task. We propose a Lipschitz-continuous Q-value function that maps state-action pairs to a long-term safety score based on three short-term task-agnostic criteria: visibility, recognizability, and graspability. The zero-superlevel set of this function characterizes an empirical control invariant set over state-action pairs. When the nominal policy proposes an action outside this set, we apply a recovery mechanism inspired by Nagumo's theorem that uses gradient ascent to the Q-function to steer the policy back to safety. To learn this Q-function, we construct a high-fidelity digital twin using Gaussian Splatting that enables systematic collection of failure data without risk to physical hardware. Experiments with a Franka Emika robot demonstrate that flow-matching policies, which fail under run-time perturbations, achieve consistent task success when guided by the proposed TAIL-Safe.
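The recovery step described for TAIL-Safe, gradient ascent on the Q-function until the action re-enters the zero-superlevel set, can be sketched as below. This is a toy under assumptions: the quadratic `toy_q` stands in for the learned safety Q-function, the gradient is taken by finite differences, and the step size, iteration cap, and function names are hypothetical.

```python
def numerical_grad(q, state, action, eps=1e-5):
    """Central-difference gradient of q with respect to the action."""
    grad = []
    for i in range(len(action)):
        up = list(action)
        dn = list(action)
        up[i] += eps
        dn[i] -= eps
        grad.append((q(state, up) - q(state, dn)) / (2.0 * eps))
    return grad

def recover_action(q, state, action, step=0.1, max_iters=100):
    """Ascend q over the action until q(state, action) >= 0 or the cap is hit."""
    a = list(action)
    for _ in range(max_iters):
        if q(state, a) >= 0.0:
            break  # inside the zero-superlevel set: accept the action
        g = numerical_grad(q, state, a)
        a = [ai + step * gi for ai, gi in zip(a, g)]
    return a

def toy_q(state, action):
    # Hypothetical safety score: non-negative within unit distance of the state.
    return 1.0 - sum((ai - si) ** 2 for ai, si in zip(action, state))

state = [0.0, 0.0]
recovered = recover_action(toy_q, state, [2.0, 0.0])
```

An action that already scores non-negative is returned unchanged, so the nominal policy is only overridden when it proposes something outside the safe set.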
Hydra-DP3: Frequency-Aware Right-Sizing of 3D Diffusion Policies for Visuomotor Control
Diffusion-based visuomotor policies perform well in robotic manipulation, yet current methods still inherit image-generation-style decoders and multi-step sampling. We revisit this design from a frequency-domain perspective. Robot action trajectories are highly smooth, with most energy concentrated in a few low-frequency discrete cosine transform modes. Under this structure, we show that the error of the optimal denoiser is bounded by the low-frequency subspace dimension and residual high-frequency energy, implying that denoising error saturates after very few reverse steps. This also suggests that action denoising requires a much simpler denoising model than image generation. Motivated by this insight, we propose Hydra-DP3 (HDP3), a pocket-scale 3D diffusion policy with a lightweight Diffusion Mixer decoder that supports two-step DDIM inference. Our synthetic experiments validate the theory and support the sufficiency of two-step denoising. Furthermore, across RoboTwin2.0, Adroit, MetaWorld, and real-world tasks, HDP3 achieves state-of-the-art performance with fewer than 1% of the parameters of prior 3D diffusion-based policies and substantially lower inference latency.
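The claim that smooth action trajectories concentrate their energy in a few low-frequency DCT modes can be checked on a toy signal. The sketch below assumes a single-joint minimum-jerk profile as a stand-in for a robot action trajectory. It uses an orthonormal DCT-II written out by hand, and the function names are hypothetical.

```python
import math

def dct2_ortho(x):
    """Orthonormal DCT-II of a real sequence (O(n^2) reference implementation)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def low_freq_energy_fraction(x, n_modes):
    """Fraction of signal energy carried by the first n_modes DCT coefficients."""
    c = dct2_ortho(x)
    total = sum(v * v for v in c)
    return sum(v * v for v in c[:n_modes]) / total

# Toy trajectory: minimum-jerk motion from 0 to 1 over 32 steps (an assumption).
n = 32
traj = [10 * s**3 - 15 * s**4 + 6 * s**5 for s in (t / (n - 1) for t in range(n))]
fraction = low_freq_energy_fraction(traj, 4)
```

For this toy signal nearly all of the energy sits in the first four modes. Real action chunks would need to be measured separately to confirm the same pattern.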
Large Video Planner Enables Generalizable Robot Control
General-purpose robots require decision-making models that generalize across diverse tasks and environments. Recent works build robot foundation models by extending multimodal large language models (MLLMs) with action outputs, creating vision-language-action (VLA) systems. These efforts are motivated by the intuition that MLLMs' large-scale language and image pretraining can be effectively transferred to the action output modality. In this work, we explore an alternative paradigm of using large-scale video pretraining as a primary modality for building robot foundation models. Unlike static images and language, videos capture spatio-temporal sequences of states and actions in the physical world that are naturally aligned with robotic behavior. We curate an internet-scale video dataset of human activities and task demonstrations, and train, for the first time at a foundation-model scale, an open video model for generative robotics planning. The model produces zero-shot video plans for novel scenes and tasks, which we post-process to extract executable robot actions. We evaluate task-level generalization through third-party selected tasks in the wild and real-robot experiments, demonstrating successful physical execution. Together, these results show robust instruction following, strong generalization, and real-world feasibility. We release both the model and dataset to support open, reproducible video-based robot learning. Our website is available at https://www.boyuan.space/large-video-planner/.
comment: 29 pages, 16 figures
VLA-GSE: Boosting Parameter-Efficient Fine-Tuning in VLA with Generalized and Specialized Experts
Vision-language-action (VLA) models inherit rich visual-semantic priors from pre-trained vision-language backbones, but adapting them to robotic control remains challenging. Full fine-tuning (FFT) is prone to overfitting on downstream robotic data and catastrophic forgetting of pretrained vision-language capabilities. Parameter-efficient fine-tuning (PEFT) better preserves pre-trained knowledge, yet existing PEFT methods still struggle to adapt effectively to robot control tasks. To address this gap, we propose VLA-GSE, a parameter-efficient VLA fine-tuning framework that improves control adaptation while retaining PEFT's knowledge preservation advantage. Specifically, VLA-GSE (Generalized and Specialized Experts) is initialized by spectrally decomposing the frozen backbone, assigning leading singular components to generalized experts (shared experts) and disjoint residual components to specialized experts (routed experts). This decomposition improves adaptation capacity under a fixed trainable-parameter budget. Under a comparable parameter budget, VLA-GSE updates only 2.51% of the full model parameters and consistently outperforms strong FFT and PEFT baselines. It achieves 81.2% average zero-shot success on LIBERO-Plus, preserves pre-trained VLM capability comparably to LoRA on multimodal understanding benchmarks, and improves real-world manipulation success under multiple distribution shifts. Code is available at: https://github.com/YuhuaJiang2002/VLA-GSE
AGILE: Hand-Object Interaction Reconstruction from Video via Agentic Generation SIGGRAPH 2026
Reconstructing dynamic hand-object interactions from monocular videos is critical for dexterous manipulation data collection and creating realistic digital twins for robotics and VR. However, current methods face two prohibitive barriers: (1) reliance on neural rendering often yields fragmented, non-simulation-ready geometries under heavy occlusion, and (2) dependence on brittle Structure-from-Motion (SfM) initialization leads to frequent failures on in-the-wild footage. To overcome these limitations, we introduce AGILE, a robust framework that shifts the paradigm from reconstruction to agentic generation for interaction learning. First, we employ an agentic pipeline where a Vision-Language Model (VLM) guides a generative model to synthesize a complete, watertight object mesh with high-fidelity texture, independent of video occlusions. Second, bypassing fragile SfM entirely, we propose a robust anchor-and-track strategy. We initialize the object pose at a single interaction onset frame using a foundation model and propagate it temporally by leveraging the strong visual similarity between our generated asset and video observations. Finally, a contact-aware optimization integrates semantic, geometric, and interaction stability constraints to enforce physical plausibility. Extensive experiments on HO3D, DexYCB, ARCTIC, and in-the-wild videos reveal that AGILE outperforms baselines in global geometric accuracy while demonstrating exceptional robustness on challenging sequences where prior art frequently collapses. By prioritizing physical validity, our method produces simulation-ready assets validated via real-to-sim retargeting for robotic applications. Project page: https://agile-hoi.github.io.
comment: 16 pages, SIGGRAPH 2026
Continually Evolving Skill Knowledge in Vision Language Action Model
Vision-language-action (VLA) models show promising knowledge accumulation ability from pretraining, yet continual learning in VLA remains challenging, especially for efficient adaptation. Existing continual imitation learning (CIL) methods often rely on additional parameters or external modules, limiting scalability for large VLA models. We propose Stellar VLA, a knowledge-driven CIL framework without increasing network parameters. Two progressively extended variants are designed: T-Stellar for flat task-centric modeling and TS-Stellar for hierarchical task-skill structure. Stellar VLA enables self-evolving knowledge learning by jointly optimizing task representations and a learned knowledge space. We propose a knowledge-guided expert routing mechanism conditioned on knowledge relation and Top-K semantic embeddings, enabling task specialization without increasing model size. Experiments on the LIBERO benchmark show that Stellar VLAs achieve strong performance among both VLA and CIL baselines, using only 1 % data replay. Real-world evaluation on a dual-arm platform with distinct embodiment and scene configurations validates effective knowledge transfer. TS-Stellar excels in hierarchical manipulation, and visualizations reveal robust knowledge retention and task discovery. Project Website: https://stellarvla.github.io/
MolmoAct2: Action Reasoning Models for Real-world Deployment
Vision-Language-Action (VLA) models aim to provide a single generalist controller for robots, but today's systems fall short on the criteria that matter for real-world deployment. Frontier models are closed, open-weight alternatives are tied to expensive hardware, reasoning-augmented policies pay prohibitive latency for their grounding, and fine-tuned success rates remain below the threshold for dependable use. We present MolmoAct2, a fully open action reasoning model built for practical deployment, advancing its predecessor along five axes. We introduce MolmoER, a VLM backbone specialized for spatial and embodied reasoning, trained on a 3.3M-sample corpus with a specialize-then-rehearse recipe. We release three new datasets spanning low-to-medium cost platforms, including MolmoAct2-BimanualYAM, 720 hours of teleoperated bimanual trajectories that constitute the largest open bimanual dataset to date, together with quality-filtered Franka (DROID) and SO100/101 subsets. We provide OpenFAST, an open-weight, open-data action tokenizer trained on millions of trajectories across five embodiments. We redesign the architecture to graft a flow-matching continuous-action expert onto a discrete-token VLM via per-layer KV-cache conditioning. Finally, we propose MolmoThink, an adaptive-depth reasoning variant that re-predicts depth tokens only for scene regions that change between timesteps, retaining geometric grounding at a fraction of prior latency. In the most extensive empirical study of any open VLA to date, spanning 7 simulation and real-world benchmarks, MolmoAct2 outperforms strong baselines including Pi-05, while MolmoER surpasses GPT-5 and Gemini Robotics ER-1.5 across 13 embodied-reasoning benchmarks. We release model weights, training code, and complete training data. Project page: https://allenai.org/blog/molmoact2
comment: 31 pages, project page: https://allenai.org/blog/molmoact2
Governed Capability Evolution: Lifecycle-Time Compatibility Checking and Rollback for AI-Component-Based Systems, with Embodied Agents as Case Study
Software systems built from versioned AI components increasingly need lifecycle-time governance: when a capability module evolves into a new version, the hosting system must decide whether the new version may be activated safely, under what deployment conditions, with what monitoring, and when it should be rolled back. Existing software-deployment patterns (canary, blue-green, feature flags, MLOps pipelines) address parts of this loop but were designed for stateless web services rather than stateful, policy-constrained runtimes that drive AI components in the field. We study this problem in the setting of embodied agents, where capabilities are packaged as installable modules under runtime policy and recovery constraints. We formulate governed capability evolution as a first-class software-lifecycle problem for AI-component-based systems and propose a staged upgrade framework that treats every new capability version as a governed deployment candidate rather than an immediate replacement. The framework introduces four compatibility checks (interface, policy, behavioral, recovery) and organizes them into a staged pipeline of candidate validation, sandbox evaluation, shadow deployment, gated activation, online monitoring, and rollback. A reference prototype on a PyBullet/ROS 2 testbed evaluated over 6 upgrade rounds with 15 random seeds shows naive upgrade reaches 72.9% task success but drives unsafe activation to 60% by the final round, while governed upgrade retains comparable success (67.4%) with zero unsafe activations across all rounds (Wilcoxon p=0.003). Shadow deployment surfaces 40% of regressions invisible to sandbox alone, and rollback succeeds in 79.8% of post-activation drift scenarios. The work extends runtime governance from action execution to capability evolution.
comment: 42 pages, 5 figures, 10 tables, 7 appendices
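The gated-activation step described above can be illustrated with a minimal sketch. Only the four check names come from the abstract; the function name `gate_activation`, the dictionary input, and the rule that a missing result counts as a failure are assumptions made for illustration, not the paper's implementation.

```python
# Hypothetical sketch: check names follow the abstract; everything else is assumed.
CHECKS = ("interface", "policy", "behavioral", "recovery")


def gate_activation(results):
    """Decide whether a candidate capability version may be activated.

    results maps a check name to True (passed) or False (failed).
    A check with no recorded result is treated as failed (assumption).
    Returns (activate, failed_checks).
    """
    failed = [name for name in CHECKS if not results.get(name, False)]
    return (len(failed) == 0, failed)
```

Under these assumptions, a candidate is activated only when every check has an explicit passing result; otherwise the list of failed checks indicates why it stays a candidate.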
Latent Reasoning VLA: Latent Thinking and Prediction for Vision-Language-Action Models ICML 2026
Vision-Language-Action (VLA) models benefit from chain-of-thought (CoT) reasoning, but existing approaches incur high inference overhead and rely on discrete reasoning representations that mismatch continuous perception and control. We propose Latent Reasoning VLA (LaRA-VLA), a unified VLA framework that internalizes multi-modal CoT reasoning into continuous latent representations for embodied action. LaRA-VLA performs unified reasoning and prediction in latent space, eliminating explicit CoT generation at inference time and enabling efficient, action-oriented control. To realize latent embodied reasoning, we introduce a curriculum-based training paradigm that progressively transitions from explicit textual and visual CoT supervision to latent reasoning, and finally adapts latent reasoning dynamics to condition action generation. We construct two structured CoT datasets and evaluate LaRA-VLA on both simulation benchmarks and long-horizon real-robot manipulation tasks. Experimental results show that LaRA-VLA consistently outperforms state-of-the-art VLA methods while reducing inference latency by up to 90% compared to explicit CoT-based approaches, demonstrating latent reasoning as an effective and efficient paradigm for real-time embodied control. Project Page: https://loveju1y.github.io/Latent-Reasoning-VLA/
comment: Accepted by ICML 2026
DisCo-FLoc: Semantic-Free Floorplan Localization via $SE(2)$-Aware Contrastive Disambiguation
Visual Floorplan Localization (FLoc) struggles with severe structural aliasing caused by repetitive minimalist layouts. This occurs because physically distant poses share highly similar visual-geometric features, which degrades spatial separability and angular discriminability. While existing methods attempt to mitigate these ambiguities by relying on costly semantic annotations, the resulting performance gains remain inherently limited. To address the above issues, we propose DisCo-FLoc, a semantic-free method for visual-geometric Contrastive Disambiguation. First, we introduce a depth-aware Ray Regression Predictor (RRP) that serves as a dense-to-ray geometric projector. By explicitly suppressing visual clutter along the vertical dimension, RRP projects monocular RGB images into 2D ray primitives, which are matched with floorplans to produce geometry-aware FLoc candidates. Second, to resolve the remaining ambiguity among these candidates, we propose a spatially perturbed contrastive objective to align RGB images with local floorplan structures and formulate a visual-geometric compatibility function. In particular, we meticulously construct positive and negative samples at both positional and directional levels through $SE(2)$ pose perturbations for contrastive learning, effectively achieving pose smoothness, spatial separability, and angular discriminability. The compatibility function enables DisCo-FLoc to disambiguate FLoc by using richer visual context beyond pure geometric layouts, without requiring any semantic annotations. Extensive experiments on two challenging visual FLoc benchmarks demonstrate that DisCo-FLoc significantly outperforms state-of-the-art semantic-based methods, especially narrowing the performance gap between positional and directional FLoc accuracy.
comment: 9 pages, 3 figures
Agent-Centric Observation Adaptation for Robust Visual Control under Dynamic Perturbations
Real-world visual systems face time-varying perturbations, including weather, sensor noise, compression artifacts, and background distractions. Existing image restoration methods are typically designed for fixed corruption types and optimized for pixel-level fidelity, leaving open two questions: how restoration behaves under non-stationary corruption switching, and whether pixel-level fidelity preserves the task-relevant information needed by downstream models. To study this setting, we introduce the Visual Degraded Control Suite (VDCS), a benchmark that injects Markov-switching physical degradations into rendered scenes. We further identify a fundamental failure mode of reconstruction-based representations: faithfully reconstructing corrupted observations forces the latent state to encode corruption-specific nuisance information, thereby contaminating downstream models. From an information-bottleneck perspective, anchoring the representation to the clean foreground eliminates this contamination. Motivated by this analysis, we propose Agent-Centric Observations with Mixture-of-Experts (ACO-MoE), a frozen, plug-and-play observation adapter that combines a routed bank of restoration experts with a foreground-mask branch. ACO-MoE is pretrained entirely offline on synthetic rendered data with automatically generated degradation pairs and simulation-derived foreground masks, requiring no manual annotation. At inference time, it takes only corrupted RGB as input without corruption labels, clean reference frames, or foreground masks. Across VDCS, DMC-GB, and RoboSuite, ACO-MoE consistently improves downstream control with both model-free and model-based backbones, recovering 95.3% of clean-input performance under challenging Markov-switching corruptions. It also generalizes zero-shot to unseen visual perturbations excluded from adapter pretraining.
comment: Source code is available at https://github.com/fangzr/aco-moe-code
Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions
Bayesian Reinforcement Learning (BRL), a subclass of Meta-Reinforcement Learning (Meta-RL), provides a principled framework for generalisation by explicitly incorporating Bayesian task parameters into transition and reward models. However, classical BRL methods assume known forms of transition and reward models. While recent deep BRL methods incorporate model learning to address this, applying neural networks directly to joint data and task parameters necessitates variational inference. This often yields indistinct task representations, compromising the resulting BRL policies. To overcome these limitations, we introduce Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions (GLiBRL). Our approach features fully tractable Bayesian inference over task parameters and model noise, alongside exact marginal likelihood evaluation for learning transition and reward models. The permutation-invariant nature of exact Bayesian inference in GLiBRL enables seamless integration with both on-policy and off-policy RL algorithms. We further show that GLiBRL admits a closed-form relationship between the $\mathcal{L}_2$ distance of its task representations and empirical kernel-based correspondence between task samples, which is to our knowledge the first such structural result for online deep BRL. GLiBRL is compared against representative and recent Meta-RL methods, and improves state-of-the-art performance on both MuJoCo and MetaWorld benchmarks by up to 1.8$\times$.
Contact-Grounded Policy: Dexterous Visuotactile Policy with Generative Contact Grounding
Contact-rich dexterous manipulation with multi-finger hands remains an open challenge in robotics because task success depends on multi-point contacts that continuously evolve and are highly sensitive to object geometry, frictional transitions, and slip. Recently, tactile-informed manipulation policies have shown promise. However, most use tactile signals as additional observations rather than modeling contact state or how their action outputs interact with low-level controller dynamics. We present Contact-Grounded Policy (CGP), a visuotactile policy that grounds multi-point contacts by predicting coupled trajectories of actual robot state and tactile feedback, and using a learned contact-consistency mapping to convert these predictions into executable target robot states for a compliance controller. CGP consists of two components: (i) a conditional diffusion model that forecasts future robot state and tactile feedback in a compressed latent space, and (ii) a learned contact-consistency mapping that converts the predicted robot state-tactile pair into executable targets for a compliance controller, enabling it to realize the intended contacts. We evaluate CGP using a physical four-finger Allegro V5 hand with Digit360 fingertip tactile sensors, and a simulated five-finger Tesollo DG-5F hand with dense whole-hand tactile arrays. Across a range of dexterous tasks including in-hand manipulation, delicate grasping, and tool use, CGP outperforms visuomotor and visuotactile diffusion-policy baselines.
Multiagent Systems
The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents
Context window expansion is often treated as a straightforward capability upgrade for LLMs, but we find it systematically fails in multi-agent social dilemmas. Across 7 LLMs and 4 games over 500 rounds, expanding accessible history degrades cooperation in 18 of 28 model--game settings, a pattern we term the memory curse. We isolate the underlying mechanism through three analyses. First, lexical analysis of 378,000 reasoning traces associates this breakdown with eroding forward-looking intent rather than rising paranoia. We validate this using targeted fine-tuning as a cognitive probe: a LoRA adapter trained exclusively on forward-looking traces mitigates the decay and transfers zero-shot to distinct games. Second, memory sanitization holds prompt length fixed while replacing visible history with synthetic cooperative records, which restores cooperation substantially, proving the trigger is memory content, not length alone. Finally, ablating explicit Chain-of-Thought reasoning often reduces the collapse, showing that deliberation paradoxically amplifies the memory curse. Together, these results recast memory as an active determinant of multi-agent behavior: longer recall can either destabilize or support cooperation depending on the reasoning patterns it elicits.
Nash without Numbers: A Social Choice Approach to Mixed Equilibria in Context-Ordinal Games
Nash equilibrium serves as a fundamental mathematical tool in economics and game theory. However, it classically assumes knowledge of player utilities, whereas economics generally regards preferences as more fundamental. To leverage equilibrium analysis in strategic scenarios, one must first elicit numerical utilities consistent with player preferences, a delicate and time-consuming process. In this work, we forgo precise utilities and generalize the Nash equilibrium to a setting where we only assume a player is capable of providing an ordinal ranking of their actions within the context of other players' joint actions. The key technical challenge is to rethink the definition of a best-response. While the classical definition identifies actions maximizing expected payoff, we naturally look towards social choice theory for how to aggregate preferences to identify the most preferred actions. We define this generalized notion of a context-ordinal Nash equilibrium, establish its existence under mild conditions on aggregation methods, introduce notions of regularization, approximation, and regret, explore complexity for simple settings, and develop learning rules for computing such equilibria. In doing so, we provide a generalization of Nash equilibrium and demonstrate its direct applicability to elicited preferences in human experiments.
TraceFix: Repairing Agent Coordination Protocols with TLA+ Counterexamples
We present TraceFix, a verification-first pipeline for Large Language Model (LLM) multi-agent coordination. An agent synthesizes a protocol topology as a structured intermediate representation (IR) from a task description, generates PlusCal coordination logic, and iteratively repairs the protocol using counterexamples from the TLA+ model checker (TLC) until verification succeeds. Verified process bodies are compiled into per-agent system prompts and executed under a runtime monitor that rejects out-of-topology coordination operations. On 48 tasks spanning 16 scenario families, all tasks reach full TLC verification; 62.5% pass on the first attempt and none requires more than four repair iterations. State spaces span six orders of magnitude yet verification completes in under 60 s for every task. A 3,456-run runtime comparison shows that topology-monitored execution achieves the highest task completion (89.4% average, 81.5% full) and that runtimes using the verified protocol degrade at roughly half the rate of prompt-only and chat-only baselines when model capability is reduced. A paired ablation under a fixed runtime shows that TLC-verified protocols cut deadlock/livelock (DL/LL) from 31.1% to 14.1%, with the largest separation under fault injection.
Many-to-Many Multi-Agent Pickup and Delivery
Multi-robot systems in automated warehouses must manage continuous streams of pickup-and-delivery tasks while ensuring efficiency and safety. Prior work on Multi-Agent Pickup-and-Delivery (MAPD) has largely focused on the one-to-one variant, where each task has a fixed pickup and delivery location. In contrast, real warehouses often present many-to-many MAPD scenarios, where items, tracked by stock keeping unit (SKU) identifiers, can be retrieved from or stored at multiple locations, resulting in an NP-hard four-dimensional assignment problem. To solve the many-to-many MAPD problem, we contribute our algorithm: Many-to-Many Multi-Agent Pickup and Delivery (M2M). We experiment with two variants of our algorithm: one that minimizes estimated task durations (M2M), and one which incorporates SKU distribution into the objective function (M2M-wSKU). Simulation results over 8-hour warehouse operations show that our method consistently matches or outperforms prior state of the art, with M2M completing up to 22,000 more tasks on average across different environments and warehouse inventory densities.
Emergence of Social Reality of Emotion through a Social Allostasis Model with Dynamic Interpretants
The theory of constructed emotion defines social reality as the community-level consensus on emotion concepts assigned to interoceptive sensations arising from bodily allostasis and social interaction. In this study, we simulate this emergence process using a computational model that integrates symbol emergence with degrees of freedom in symbol interpretation and active inference. Two agents receive interoceptive signals, exchange inferred symbols, and simultaneously adapt their bodily control goals and symbol interpretations to each other. Experimental results show that the interoceptive prior preferences and symbol probability distributions of the two agents converge, confirming the emergence of social reality grounded in social consensus.
comment: 10 pages, 4 figures
The Endogeneity of Miscalibration: Impossibility and Escape in Scored Reporting
Eliciting truthful reports from autonomous agents is a core problem in scalable AI oversight: a principal scores the agent's report using a strictly proper scoring rule, but the agent also benefits from the report through a non-accuracy channel (approval for autonomous action, allocation share, downstream control). The same structure appears in classical mechanism-design settings such as marketplace operation. Our main result is an endogeneity: the principal's optimal oversight necessarily uses a non-affine approval function to screen types, yet any non-affine approval makes truthful reporting suboptimal under the combined objective whenever deviation is undetectable. The principal cannot avoid the perturbation that undermines calibration. This impossibility holds for all strictly proper scoring rules, with a closed-form perturbation formula. A constructive escape exists: a step-function approval threshold achieves first-best screening for every strictly proper scoring rule, because the agent's binary inflate-or-not choice creates a type-space threshold regardless of the generator's curvature. Under the Brier score specifically, the type-independent inflation cost yields a welfare equivalence between second-best and first-best; we prove this equivalence is unique to Brier (the welfare gap under smooth $C^1$ oversight is bounded below by $Ω(\text{Var}(1/G'') (γ/β)^2)$ for every non-Brier rule). Two instances develop the framework: AI agent oversight (the lead motivating setting) and marketplace operation (a parallel mechanism-design domain). The message for AI alignment is direct: smooth scoring-based oversight cannot elicit truthful reports from a strategic agent; sharp thresholds are the calibration-preserving design.
comment: 38 pages, no figures. Targeting ACM Transactions on Economics and Computation (TEAC); preprint
Learning to Communicate Locally for Large-Scale Multi-Agent Pathfinding
Multi-agent pathfinding (MAPF) is a widely used abstraction for multi-robot trajectory planning problems, where multiple homogeneous agents move simultaneously within a shared environment. Although solving MAPF optimally is NP-hard, scalable and efficient solvers are critical for real-world applications such as logistics and search-and-rescue. To this end, the research community has proposed various decentralized suboptimal MAPF solvers that leverage machine learning. Such methods frame MAPF (from a single-agent perspective) as a Dec-POMDP in which, at each time step, an agent must decide on an action based on its local observation, and they typically solve the problem via reinforcement learning or imitation learning. We follow the same approach but additionally introduce a learnable communication module tailored to enhance cooperation between agents via efficient feature sharing. We present Local Communication for Multi-agent Pathfinding (LC-MAPF), a generalizable pre-trained model that applies multi-round communication between neighboring agents to exchange information and improve their coordination. Our experiments show that the introduced method outperforms the existing learning-based MAPF solvers, including IL and RL-based approaches, across multiple metrics in a diverse range of (unseen) test scenarios. Remarkably, the introduced communication mechanism does not compromise LC-MAPF's scalability, a common bottleneck for communication-based MAPF solvers.
Synchronizing Minds through Collective Predictive Coding: A Computational Model of Parent-Infant Homeostatic Co-Regulation
Inter-brain synchrony (IBS) observed in real-time dyadic interactions, including parent--infant exchanges, suggests that two agents come to share aligned latent representations through interaction. Yet computational accounts of how such alignment can arise between agents that have only local sensory access and asymmetric internal knowledge remain underdeveloped. We propose a constructive model of parent--infant homeostatic co-regulation that integrates a POMDP formulation of active interoceptive inference with the Metropolis--Hastings Naming Game (MHNG) derived from the Collective Predictive Coding (CPC) hypothesis. In our model, the parent observes the infant only through an exteroceptive signal while the infant directly senses its own interoceptive state; the two agents agree on regulatory actions through a shared communicative variable whose acceptance is determined by a locally computable Metropolis--Hastings probability. The agents are further endowed with asymmetric generative-model knowledge: the parent knows how actions transform visceral states but must learn what the infant's body is communicating, whereas the infant perceives its visceral state directly but must learn how actions affect it. In a $6 \times 6$ visceral-state grid world, MHNG-mediated interaction regulated the infant's visceral state more adaptively than one-sided control conditions, and the two posteriors became rapidly aligned. Notably, this latent-state alignment emerged far earlier than the convergence of the learned generative matrices, indicating that representational synchrony does not presuppose fully shared world models. These results offer a minimal constructive account of latent-state alignment compatible with IBS reported in hyperscanning studies and support CPC as a candidate computational basis for inter-brain alignment.
comment: 9 pages, 4 figures
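The "locally computable Metropolis--Hastings probability" mentioned in the abstract can be illustrated generically. The sketch below assumes the acceptance decision depends only on the ratio of the probabilities that the receiving agent assigns to the proposed and the current value of the shared variable. The abstract does not give the exact form used in the paper, so the function names and the zero-probability convention are assumptions.

```python
import random


def acceptance_probability(p_proposed, p_current):
    """Metropolis-Hastings style acceptance probability min(1, p_proposed / p_current).

    Both arguments are probabilities computed by the receiving agent alone.
    If the current value has zero probability, the proposal is always accepted
    (a convention assumed here to avoid division by zero).
    """
    if p_current <= 0.0:
        return 1.0
    return min(1.0, p_proposed / p_current)


def accept(p_proposed, p_current, rng=None):
    """Draw the accept/reject decision using an optional random.Random instance."""
    rng = rng or random.Random()
    return rng.random() < acceptance_probability(p_proposed, p_current)
```

A proposal that the receiver finds at least as probable as its current value is always accepted; a less probable one is accepted only some of the time, which lets the agents agree without either one accessing the other's internal state.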
HBEE: Human Behavioral Entropy Engine -- Pre-Registered Multi-Agent LLM Simulation of Peer-Suspicion-Based Detection Inversion
Insider threat detection assumes that an adaptive insider leaves behavioral residue distinguishing them from legitimate users. We test this assumption against an LLM-driven adaptive insider in a controlled multi-agent simulator. Our pre-registered five-condition study isolates defender mode (cascade vs. blind UEBA) crossed with adversary type (naive vs. adaptive OPSEC) plus a no-mole control, across 100 runs (95 valid after pre-committed exclusions). The primary finding is a detection inversion: at T_60, the adaptive mole's suspicion in-degree is statistically lower than a randomly selected innocent agent (Cliff's delta = -0.694, 95% BCa CI [-0.855, -0.519], Mann-Whitney p << 0.01). The pre-registered prediction was the opposite direction. A pre-registered equivalence test (H2) shows adaptive OPSEC produces no detectable shift in the mole's UEBA rank under either defender mode. The two detection signals (peer suspicion graph in-degree and per-agent UEBA rank) decouple under adaptive adversary behavior. We bound generalization explicitly: a pre-registered Gini calibration check (H4) returns FAIL, with HBEE pairwise message-exposure Gini (0.213) diverging from the SNAP Enron reference (0.730) by |Delta Gini| = 0.52, exceeding the equivalence bound by 5x. The paper makes a narrow but surprising claim: in a controlled environment where adaptive OPSEC is implementable as an LLM directive, peer-suspicion-cascade detection inverts. We release the simulator, pre-registration document, frozen scenarios, raw telemetry, and analysis pipeline under an open-source license.
comment: 14 pages, 6 figures. Pre-registration document and full deviation log included in artifact
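The effect size reported above, Cliff's delta, has a standard definition: the fraction of cross-group pairs in which the first group's value is larger, minus the fraction in which it is smaller. A minimal sketch of that definition follows; the function name and the toy inputs in the usage note are illustrative and are not taken from the paper's analysis pipeline.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: (#pairs with x > y - #pairs with x < y) / (len(xs) * len(ys)).

    Ranges from -1 (every x below every y) to +1 (every x above every y).
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))
```

For example, `cliffs_delta([1, 2], [3, 4])` is -1.0 because every value in the first sample is below every value in the second. A negative delta, as in the abstract, means the first group tends to have lower values than the comparison group.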
OrchJail: Jailbreaking Tool-Calling Text-to-Image Agents by Orchestration-Guided Fuzzing
Tool-calling text-to-image (T2I) agents can plan and execute multi-step tool chains to accomplish complex generation and editing queries. However, this capability introduces a new safety attack surface: harmful outputs may arise from tool orchestration, where individually benign steps combine into unsafe results, making prompt-only jailbreak techniques insufficient. We present OrchJail, an orchestration-guided fuzzing framework for jailbreaking tool-calling T2I agents. Its core idea is to exploit high-risk tool-orchestration patterns: by learning from successful jailbreak tool-calling traces and their causal relationships to prompt wording, OrchJail directly guides the fuzzing search toward prompts that are more likely to trigger unsafe multi-step tool behaviors, rather than relying on surface-level textual perturbations. Extensive experiments demonstrate that OrchJail improves jailbreak effectiveness and efficiency across representative tool-calling T2I agents, achieving higher attack success rates, better image fidelity, and lower query costs, while remaining robust against common jailbreak defenses. Our work highlights tool orchestration as a critical, previously unexplored attack surface and provides a novel framework for uncovering safety risks in T2I agents.
MORPH-U: Multi-Objective Resilient Motion Planning for V2X-Enabled Autonomous Driving in High-Uncertainty Environments via Simulation
V2X can warn an autonomous vehicle about hazards beyond line-of-sight, but it also brings uncertainty: messages may be delayed, dropped, or even forged. Meanwhile, map knowledge may change during a trip, forcing the vehicle to replan under tight real-time budgets. This paper studies how to make motion planning and low-level control robust to such uncertain, event-driven updates. We present MORPH-U, a CARLA-based closed-loop stack that fuses LiDAR/radar/camera with V2X (CAM/DENM) into a Local Dynamic Map (LDM) and triggers Hybrid-A* replanning when validated hazards or map changes affect the planned route. We expose the planning/control trade-offs via a multi-objective formulation over tracking error, safety margin (minimum TTC), responsiveness, and smoothness, and select operating points using Pareto-frontier analysis. To avoid unsafe replanning from faulty V2X triggers, MORPH-U adds a lightweight Byzantine-inspired acceptance gate that combines a quorum rule with an on-board sensor veto. Experiments in dynamic CARLA scenarios show that V2X-augmented LDM improves downstream safety, Pareto tuning provides controllable accuracy-comfort trade-offs, and the gate prevents replanning under saturated false-DENM injection ($p_{\text{attack}}=1.0$).
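The acceptance gate described above combines a quorum rule with an on-board sensor veto. A minimal sketch of one way such a gate could work is shown below. The function name, the default quorum of 2, and the choice to count distinct sender identifiers are assumptions for illustration; the abstract does not specify these details.

```python
def accept_hazard(sender_ids, quorum=2, sensor_veto=False):
    """Decide whether a V2X-reported hazard may trigger replanning.

    sender_ids: identifiers of the senders that reported the same hazard.
    quorum: minimum number of distinct senders required (assumed value).
    sensor_veto: True if on-board sensors contradict the reported hazard.
    """
    if sensor_veto:
        # On-board perception overrides any number of V2X reports.
        return False
    # Repeated messages from one sender count once toward the quorum.
    return len(set(sender_ids)) >= quorum
```

With this design, a single sender flooding duplicate messages never reaches the quorum, and even a quorum of forged reports is rejected when the on-board sensors contradict them.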
Rethinking Priority Scheduling for Sequential Multi-Agent Decision Making in Stackelberg Games
Current research applying N-level Stackelberg Game to multi-agent systems often uses the default decision order of agents provided by the environment. However, this raises the question: does the order of agents necessarily affect the final equilibrium point of the game? To address this, we formally analyze the N-level Stackelberg Game, where changing the order in which agents make decisions typically leads to an overdetermined system. As a result, the equilibrium point shifts unless special structural conditions are satisfied. Based on this analysis, we propose the Hierarchical Priority Adjustment (HPA) method, which adjusts and selects the agents' decision order. At the upper level, an upper policy dynamically selects the optimal decision order of agents based on the current game state. At the lower level, agents execute strategies in the Spatio-Temporal Sequential Markov Game (STMG) according to the selected order. To coordinate learning across time scales, we employ a slow-fast update scheme with shared intrinsic rewards derived from the advantage function of the upper policy. Experimental results on high-precision control tasks, including multi-agent MuJoCo, show that HPA outperforms benchmark algorithms and robustly adapts to changing environments. These results highlight the crucial role of optimizing the agents' decision order in N-level Stackelberg Game.
Switchcraft: AI Model Router for Agentic Tool Calling
Agentic AI systems that invoke external tools are powerful but costly, leading developers to default to large models and overspend inference budgets. Model routing can mitigate this, but existing routers are designed for chat completion rather than tool use. We present Switchcraft, the first (to the best of our knowledge) model router optimized for agentic tool calling. Switchcraft operates inline, selecting the lowest-cost model subject to correctness. We construct an evaluation framework on five function-calling benchmarks and train a DistilBERT-based classifier, deployed under a latency budget. Switchcraft achieves 82.9% accuracy -- matching or exceeding the best individual model -- while reducing inference cost by 84%, saving over $3,600 per million queries. We find that larger models do not consistently outperform smaller ones on tool-use tasks, and that nominally cheaper models can incur higher total cost due to token-intensive reasoning. Our work enables cost-aware agentic AI deployment without sacrificing correctness.
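The routing rule, selecting the lowest-cost model subject to correctness, can be sketched as follows. This is not Switchcraft's implementation: the `Candidate` type, the `p_correct` field (standing in for a classifier's predicted probability of a correct tool call), the threshold, and the fallback rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    cost_per_query: float  # hypothetical cost units
    p_correct: float       # assumed classifier-predicted probability of a correct tool call


def route(candidates, threshold=0.8):
    """Return the cheapest candidate predicted to be correct.

    A candidate is eligible if p_correct >= threshold. If none is eligible,
    fall back to the candidate with the highest predicted correctness
    (an assumed fallback rule).
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    eligible = [c for c in candidates if c.p_correct >= threshold]
    if eligible:
        return min(eligible, key=lambda c: c.cost_per_query)
    return max(candidates, key=lambda c: c.p_correct)
```

The threshold trades cost against reliability: raising it pushes more queries to larger models, while lowering it favors cheaper ones.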
ARMOR: An Agentic Framework for Reaction Feasibility Prediction via Adaptive Utility-aware Multi-tool Reasoning
Reaction feasibility prediction, as a fundamental problem in computational chemistry, has benefited from diverse tools enabled by recent advances in artificial intelligence, particularly large language models. However, the performance of individual tools varies substantially across reactions, making it difficult for any single tool to consistently perform well across all cases. This raises a critical challenge: how to effectively leverage multiple tools to obtain more accurate feasibility predictions. To address this, we propose ARMOR, an agentic framework that explicitly models tool-specific utilities, adaptively prioritizes tools, and further resolves potential tool conflicts to produce the final prediction for each reaction. Unlike existing approaches that rely on simple aggregation or heuristic assignment over various tools, ARMOR organizes tools into a hierarchy that prioritizes top-performing tools and defers others when needed, characterizes their strengths through tool-specific patterns, and resolves conflicts via memory-augmented reasoning. Extensive experiments on a public dataset demonstrate that ARMOR consistently outperforms strong baselines, including single-tool methods as well as various tool aggregation and tool selection approaches. Further analysis shows that the improvements are particularly significant on reactions with conflicting tool predictions, highlighting the effectiveness of ARMOR in leveraging the complementary strengths of multiple tools. The code is available via https://anonymous.4open.science/r/ARMOR-E13F.
Decentralized Diffusion Policy Learning for Enhanced Exploration in Cooperative Multi-agent Reinforcement Learning
Cooperative multi-agent reinforcement learning (MARL) involves complex agent interactions and requires effective exploration strategies. A prominent class of MARL algorithms, decentralized softmax policy gradient (DecSPG), addresses this through energy-based policy updates. In practice, however, such energy-based policies are intractable to maintain and are commonly projected onto the Gaussian policy class. In this work, we show that the limited expressiveness of Gaussian policies severely hinders exploration in DecSPG, and this limitation worsens as the number of agents grows. To address this issue, we propose decentralized diffusion policy learning (DDPL), which parameterizes each agent's policy with a denoising diffusion probabilistic model, an expressive generative model that captures multi-modal action distributions for enhanced exploration. DDPL enables efficient online training of diffusion policies via importance sampling score matching (ISSM), a novel training method with theoretical guarantee. We evaluate DDPL on representative continuous-action MARL benchmarks, including multi-agent particle environment, multi-agent MuJoCo, IsaacLab, and JAX-reimplemented StarCraft multi-agent challenge, and observe consistently improved performance.
Social Theory Should Be a Structural Prior for Agentic AI: A Formal Framework for Multi-Agent Social Systems
Agentic AI systems are increasingly deployed not in isolation, but inside social environments populated by other agents and humans, such as social media platforms, multi-agent LLM pipelines, or autonomous robotics fleets. In these settings, system behavior emerges not from individual agents alone, but from multi-agent interactions over time. The emergent dynamics of individuals in a social group have long been studied by social scientists in human contexts. This position paper argues that agentic AI systems must be modeled with social theory as a structural prior, and formalizes a Multi-Agent Social Systems (MASS) framework for how agents interact and influence one another to generate system-level outcomes. We represent MASS as a class of dynamical systems of information generation, local influence and interaction structure, formulated by four structural priors anchored in social theory: strategic heterogeneity, networked-constrained dependence, co-evolution and distributional instability. We demonstrate the importance of each structural prior through formal propositions, and articulate a research agenda for how MASS should be modeled, evaluated and governed.
Too Many Specialists: Emergent Inefficiencies and Bottlenecks for Multi-agent Ad-hoc Collaboration AAMAS 2026
Computational models of collaboration without prior coordination often overlook how heterogeneous agent traits and complex task structures jointly produce systemic bottlenecks, inefficiencies, and contribution inequalities. We address this by using an agent-based model of ad-hoc teamwork in a kitchen environment. Our model integrates diverse agent personas with tasks that combine serial and parallel dependencies. We identify a specialist's dilemma, where rigid role assertion generates system-level bottlenecks, amplifies workload inequality, and fosters fragmented, homophilous networks. We also find that team size and communication overhead interact with problem structure to generate diminishing returns and redundant collaboration. Linking micro-level behavior to macro-level outcomes provides insights into emergent collaboration and design principles for effective multi-agent teamwork.
comment: Published in Proceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)
SceneFactory: GPU-Accelerated Multi-Agent Driving Simulation with Physics-Based Vehicle Dynamics
Autonomous-driving simulators typically trade physical fidelity for scalable parallelism. Physics-based platforms such as CARLA and MetaDrive provide articulated vehicle dynamics and contact, but their non-vectorized interfaces make batched training difficult. GPU-batched systems such as Waymax and GPUDrive scale to hundreds of scenarios by replacing rigid-body physics with simplified kinematic models, omitting tire-road interaction, suspension, contact dynamics, and road-condition-dependent friction. We introduce SceneFactory, a GPU-vectorized platform for procedural scene construction, physics-based multi-agent simulation, and RL in autonomous-driving environments. Built on NVIDIA Isaac Sim + Isaac Lab, SceneFactory represents worlds and agents as batched tensors: control, observations, rewards, resets, and policy inference run as GPU tensor operations over the Isaac Lab tensor API. SceneFactory converts Waymo Open Motion Dataset road topologies into simulation-ready USD worlds, runs many worlds concurrently on one GPU, populates each with multiple articulated PhysX vehicles, and maps precipitation and road-surface type to PhysX material friction coefficients. With GPU vectorization, SceneFactory achieves up to 127$\times$ higher throughput than a non-vectorized PhysX baseline on the same GPU and physics solver, reaching 19,250 controlled-agent simulation steps per second at 256 worlds $\times$ 16 agents. Cross-simulator transfer reveals an asymmetric dynamics gap: physics-grounded RL policies transfer to a simplified kinematic bicycle model with 99.5% success, whereas reverse transfer drops to 47.3%. Under wet-road friction, friction-aware policies reduce mean peak DRAC from 58.7 to 27.8 m/s$^2$ without sacrificing goal reach. SceneFactory shows that scalable autonomous-driving training need not discard articulated rigid-body dynamics or physically grounded road-condition variation.
LLM Wardens: Mitigating Adversarial Persuasion with Third-Party Conversational Oversight
LLMs are increasingly capable of persuasion, which raises the question of how to protect users against manipulation. In a preregistered user study (N=120) across four decision-making scenarios, we find that an adversarial LLM with a hidden goal succeeds in steering users' decisions 65.4% of the time. We then introduce a "warden" model: a secondary LLM that monitors the human-AI interaction trace in real time and issues non-binding, private advisories to the user when it detects manipulation. Adding a warden more than halves the adversary's success rate to 30.4%, with a much smaller (8.6 percentage points) reduction for genuine interactions. To probe the mechanism behind these results, we release COAX-Bench, a simulation benchmark spanning 14 decision-making scenarios, including hiring, voting, and file access. Across 16,212 simulated multi-agent interactions, capable adversarial LLMs achieve their hidden goals in 34.7% of cases, which warden models reduce to 12.3%. Notably, even warden models substantially weaker than the adversary they oversee provide meaningful protection, suggesting a path for scalable oversight of more capable models.
Insider Attacks in Multi-Agent LLM Consensus Systems
Large language models (LLMs) are increasingly deployed in multi-agent systems where agents communicate in natural language to solve tasks jointly. A key capability in such systems is consensus formation, where agents iteratively exchange messages and update decisions to reach a shared outcome. However, most existing multi-agent LLM frameworks assume that all participating agents are aligned with the system objective. In practice, a malicious insider may participate as a legitimate member of the group while pursuing a hidden adversarial goal. In this work, we study insider manipulation in multi-agent LLM consensus systems. We formalize the problem as a sequential decision-making task in which a malicious agent seeks to delay or prevent agreement among benign agents. To make attack optimization tractable, we propose a world-model-based framework that learns surrogate dynamics over the latent behavioral states of benign agents and then trains an attacker using reinforcement learning based on this learned model. Preliminary results show that the trained attacker reduces the benign consensus rate and prolongs disagreement more effectively than the direct malicious-prompt baseline. These results suggest that combining latent world models with reinforcement learning is a promising direction for adaptive insider attacks in language-based multi-agent systems.
From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems
Compound AI Systems (CAIS) are an emerging paradigm that integrates large language models (LLMs) with external components, including retrievers, agents, tools, and orchestrators, to overcome the limitations of standalone models in tasks requiring memory, reasoning, real-time grounding, and multimodal understanding. These systems enable more capable and context-aware behaviors by composing multiple specialized modules into cohesive workflows. Despite growing adoption in both academia and industry, the CAIS landscape remains fragmented and lacks a unified framework for analysis, taxonomy, and evaluation. In this survey, we define the concept of CAIS, propose a multi-dimensional taxonomy based on component roles and orchestration strategies, and analyze four foundational paradigms: Retrieval-Augmented Generation (RAG), LLM Agents, Multimodal LLMs (MLLMs), and Orchestration. We review representative systems, compare design trade-offs, and summarize evaluation methodologies across these paradigms. Finally, we identify key challenges - including scalability, interoperability, benchmarking, and coordination - and outline promising directions for future research. This survey aims to provide researchers and practitioners with a comprehensive foundation for understanding, developing, and advancing the next generation of system-level artificial intelligence.
Distributed Task Allocation for Multi-Agent Systems: A Submodular Optimization Approach
This paper addresses dynamic task allocation in resource-constrained multi-agent systems (MASs) with sequentially updated assignments. We develop a submodular maximization framework integrated with $q$-independence systems, demonstrating greater flexibility than conventional matroid-based constraints for modeling heterogeneous resource limitations. The proposed distributed greedy bundles algorithm (DGBA) addresses communication limitations in MASs while providing rigorous approximation guarantees for submodular maximization under a $q$-independence system constraint, ensuring low computational complexity. DGBA achieves feasible task allocation in polynomial time with reduced space complexity compared to existing methods. Extensive Monte Carlo simulations in a micro-satellite observation scenario demonstrate that DGBA consistently outperforms benchmark algorithms in total utility, resource efficiency, and assignment stability, while maintaining real-time computational feasibility.
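The greedy principle behind submodular maximization can be illustrated with a minimal sketch. This is not DGBA: it is the classical centralized greedy rule under a plain cardinality budget, a much simpler constraint than the $q$-independence systems the abstract describes, and the coverage objective, the task names, and the `budget` parameter are all hypothetical choices made for illustration.

```python
def coverage(selected, task_sets):
    """Number of distinct targets covered by the selected tasks (a submodular objective)."""
    covered = set()
    for task in selected:
        covered |= task_sets[task]
    return len(covered)


def greedy_select(task_sets, budget):
    """Pick up to `budget` tasks, each time adding the one with the largest marginal gain."""
    selected = []
    for _ in range(budget):
        base = coverage(selected, task_sets)
        best_task, best_gain = None, 0
        for task in sorted(task_sets):  # sorted for deterministic tie-breaking
            if task in selected:
                continue
            gain = coverage(selected + [task], task_sets) - base
            if gain > best_gain:
                best_task, best_gain = task, gain
        if best_task is None:  # no remaining task adds any value
            break
        selected.append(best_task)
    return selected


# Hypothetical observation tasks, each covering a set of target identifiers.
tasks = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 7}}
chosen = greedy_select(tasks, budget=2)
print(chosen, coverage(chosen, tasks))
```

Marginal gains shrink as the selection grows, which is the diminishing-returns property that makes greedy selection a reasonable baseline for this class of problems.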
Separation Assurance between Heterogeneous Fleets of Small Unmanned Aerial Systems via Multi-Agent Reinforcement Learning
In the envisioned future dense urban airspace, multiple companies will operate heterogeneous fleets of small unmanned aerial systems (sUASs), where each fleet includes several homogeneous aircraft with identical policies and configurations, e.g., equipage, sensing, and communication ranges, making tactical deconfliction highly complex for the aircraft. This paper aims to address two core questions: (1) Can tactical deconfliction policies converge or reach an equilibrium to ensure a conflict-free airspace when companies operate heterogeneous fleets of homogeneous aircraft? (2) If so, will the converged policies discriminate against companies operating sUASs with weaker configurations? We investigate a multi-agent reinforcement learning paradigm in which homogeneous aircraft within heterogeneous fleets operate concurrently to perform package delivery missions over Dallas, Texas, USA. An attention-enhanced Proximal Policy Optimization-based Advantage Actor-Critic (PPOA2C) framework is employed to resolve intra- and inter-fleet conflicts, with each fleet independently training its own policy while preserving privacy. Experimental results show that two fleets with distinct, shared PPOA2C policies can reach an equilibrium to maintain safe separation. While two PPOA2C policies outperform two strong rule-based baselines in terms of conflict resolution, a PPOA2C policy exhibits safer interaction with a rule-based policy, indicating adaptive capabilities of PPOA2C policies. Furthermore, we conducted extensive policy-configuration evaluations, which reveal that equilibria between similar policy types tend to favor fleets with stronger configurations. Even under similar configurations but different policy types, the equilibrium favors one of the heterogeneous policies, underscoring the need for fairness-aware conflict management in heterogeneous sUAS operations.
comment: 8 pages, 3 figures, 1 table
When Does Multi-Agent Collaboration Help? An Entropy Perspective
Multi-agent systems (MAS) have emerged as a prominent paradigm for leveraging large language models (LLMs) to tackle complex tasks. However, the mechanisms governing the effectiveness of MAS built upon publicly available LLMs, specifically the underlying rationales for their success or failure, remain largely unexplored. In this paper, we revisit MAS through the perspective of entropy, considering both intra- and inter-agent dynamics by investigating entropy transitions during problem-solving across various topologies, six reasoning benchmarks, and two agentic tasks. By analyzing 245 features spanning token-, agent-, and round-level entropy, we counterintuitively find that a single agent outperforms MAS in approximately 43.3% of cases, and that entropy dynamics are largely determined during the first round of interaction. Furthermore, we provide three key observations: 1) Certainty Preference: peak entropy directly harms and stable entropy directly benefits MAS correctness; 2) Base Entropy: base models with lower entropy during problem-solving causally drive MAS performance; and 3) Task Awareness: entropy dynamics of MAS play varying roles across different tasks. Building on these insights, we introduce a simple yet effective algorithm, the Entropy Judger, to select solutions from MAS's pass@$k$ results, leading to consistent accuracy improvements across all MAS configurations and tasks. Our source code is available at https://github.com/AgenticFinLab/multiagent-entropy.
comment: arXiv preprint
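For readers unfamiliar with token-level entropy, the basic quantity can be sketched as follows. This is only an illustration of Shannon entropy over next-token distributions; the abstract does not specify how the 245 features or the Entropy Judger are computed, so the averaging function and the example distributions here are assumptions.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_token_entropy(step_distributions):
    """Average token-level entropy over the decoding steps of one response."""
    entropies = [shannon_entropy(dist) for dist in step_distributions]
    return sum(entropies) / len(entropies)


# Hypothetical next-token distributions for two short responses.
confident = [[0.97, 0.01, 0.01, 0.01], [0.90, 0.05, 0.03, 0.02]]
uncertain = [[0.25, 0.25, 0.25, 0.25], [0.40, 0.30, 0.20, 0.10]]

print(mean_token_entropy(confident), mean_token_entropy(uncertain))
```

A response generated from sharply peaked distributions yields a lower average than one generated from near-uniform distributions, which is the kind of signal an entropy-based selector could rank on.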
Dynamic one-time delivery of critical data by small and sparse UAV swarms: a model problem for MARL scaling studies
This work studies the application of Multi-Agent Reinforcement Learning (MARL) to decentralized control of unmanned aerial vehicles to relay a critical data package to a known position. For this purpose, a family of deterministic games is introduced, designed for MARL scaling studies. A robust baseline policy is proposed which restricts agent motion and applies Dijkstra's shortest path algorithm. Computational experiment results show that two off-the-shelf MARL algorithms perform competitively with the baseline for a small number of agents, but face scalability issues as the number of agents increases. Source code and animations are available online at https://github.com/mikapersson/Information-Relaying.
comment: Accepted to the 2026 IFAC World Congress
Active Learning for Communication Structure Optimization in LLM-Based Multi-Agent Systems
Optimizing the communication structure of large language model based multi-agent systems (LLM-MAS) has been shown to improve downstream performance and reduce token usage. Existing methods typically rely on randomly sampled training tasks. However, tasks may differ substantially in difficulty and domain, and thus they are not equally informative for updating communication structure, making optimization under limited training budgets often unstable and highly sensitive to the particular training set. To actively identify the most valuable tasks for communication-structure optimization, we propose an ensemble-based information-theoretic task selection framework. The proposed method estimates task informativeness by how much a candidate task changes the distribution over graph parameters, using ensemble Kalman inversion as an efficient and derivative-free approximation of the corresponding Bayesian update. The resulting estimator is especially suitable for black-box and noisy multi-agent systems. To enhance scalability, we construct a compact candidate pool through embedding-based representative selection and combine the informative selection with surrogate modeling and batch Thompson sampling. We validate our method in both benign settings and settings with agent attacks, demonstrating its effectiveness for communication-structure optimization under constrained computational budgets.
ToolRosella: Translating Code Repositories into Standardized Tools for Scientific Agents
Large Language Model (LLM)-based agent systems are increasingly used for scientific tasks, yet their practical capability remains constrained by the narrow scope of manually curated tools they can invoke. Much scientific computational functionality already exists in open-source code repositories, but these resources remain difficult to standardize, operationalize, and invoke reliably for agent use. Here we present ToolRosella, a framework that automatically transforms heterogeneous scientific code repositories into standardized, agent-invocable tools. ToolRosella combines repository analysis, tool interface construction, execution testing, and iterative repair to address the problem of repository-to-tool standardization. Across 122 GitHub repositories spanning 35 subdisciplines in six domains, ToolRosella reaches a 61.5% repository conversion success rate after iterative repair, with a 4.4$\times$ speedup over human engineers. The resulting 1,580 callable tools support a downstream task success rate of 84.0% and improve performance when integrated into other agent frameworks, particularly on tasks whose required tools are absent from fixed, curated inventories.
comment: 20 pages
A Multi-Memory Segment System for Generating High-Quality Long-Term Memory Content in Agents
In the field of agent memory, memory retrieval has been explored extensively, yet few studies have focused on memory content. Most research simply stores summarized versions of historical dialogues, as exemplified by methods like A-MEM and MemoryBank. However, when humans form long-term memories, the process involves multi-dimensional and multi-component generation, rather than merely creating simple summaries. The low-quality memory content generated by existing methods can adversely affect recall performance and response quality. To better construct high-quality long-term memory content, we have designed a multi-memory segment system (MMS) inspired by cognitive psychology theory. The system processes short-term memory into multiple long-term memory segments, and constructs retrieval memory units and contextual memory units based on these segments, with a one-to-one correspondence between the two. During the retrieval phase, MMS matches the most relevant retrieval memory units based on the user's query. The corresponding contextual memory units are then obtained as the context for the response stage to enhance knowledge, thereby effectively utilizing historical data. We conducted experiments on the LoCoMo dataset and further performed ablation experiments, experiments on the robustness regarding the number of input memories, and overhead experiments, which demonstrated the effectiveness and practical value of our method.
comment: The content has been significantly revised and the author has also changed. Therefore, the paper will be withdrawn for revision and then uploaded after the completion of the modifications
Caesar: Deep Agentic Web Exploration for Creative Answer Synthesis
To advance from passive retrieval to creative discovery of new ideas, autonomous agents must be capable of deep, associative synthesis. However, current agentic frameworks prioritize convergent search, often resulting in derivative summaries that lack creativity. Caesar is an agentic architecture designed to bridge the gap between information gathering and synthesis of new insights. Unlike existing agents that treat the web as a flat sequence of disconnected documents, Caesar performs a deep web traversal to construct a dynamic knowledge graph. This graph then serves as a navigational scaffold, guiding the agent to diverse, non-obvious information that flat retrieval would never encounter. Caesar thus consists of two components: (1) exploration driven by a dynamic context-aware policy that maximizes information coverage across the web's topological structure, and (2) synthesis through adversarial refinement that actively seeks novel perspectives rather than confirming established priors. Caesar demonstrates the ability to generate artifacts and answers characterized by high novelty and structural coherence, achieving 13% to 23% improvement over state-of-the-art deep research agents in creative synthesis challenges, with strong dominance across all output formats.
LLM-Based Agents for Competitive Landscape Mapping in Drug Asset Due Diligence
In this paper, we describe and benchmark a competitor-discovery component used within an agentic AI system for fast drug asset due diligence. A competitor-discovery AI agent, given an indication, retrieves all drugs comprising the competitive landscape of that indication and extracts canonical attributes for these drugs. The competitor definition is investor-specific, and data is paywalled/licensed, fragmented across registries, ontology-mismatched by indication, alias-heavy for drug names, multimodal, and rapidly changing. Although considered the best tool for this problem, the current LLM-based AI systems aren't capable of reliably retrieving all competing drug names, and there is no accepted public benchmark for this task. To address the lack of evaluation, we use LLM-based agents to transform five years of multi-modal, unstructured diligence memos from a private biotech VC fund into a structured evaluation corpus mapping indications to competitor drugs with normalized attributes. We also introduce a competitor validating LLM-as-a-judge agent that filters out false positives from the list of predicted competitors to maximize precision and suppress hallucinations. On this benchmark, our competitor-discovery agent achieves 83% recall, exceeding OpenAI Deep Research (65%) and Perplexity Labs (60%). The system is deployed in production with enterprise users; in a case study with a biotech VC investment fund, analyst turnaround time dropped from 2.5 days to $\sim$3 hours ($\sim$20x) for the competitive analysis.
ATHENA: Agentic Team for Hierarchical Evolutionary Numerical Algorithms
Bridging the gap between theoretical conceptualization and computational implementation is a major bottleneck in Scientific Computing (SciC) and Scientific Machine Learning (SciML). We introduce ATHENA (Agentic Team for Hierarchical Evolutionary Numerical Algorithms), an agentic framework designed as an Autonomous Lab to manage the end-to-end computational research lifecycle. Its core is the HENA loop, a knowledge-driven diagnostic process framed as a Contextual Bandit problem. Acting as an online learner, the system analyzes prior trials to select structural "actions" ($A_n$) from combinatorial spaces guided by expert blueprints (e.g., Universal Approximation, Physics-Informed constraints). These actions are translated into executable code ($S_n$) to generate scientific rewards ($R_n$). ATHENA transcends standard automation: in SciC, it autonomously identifies mathematical symmetries for exact analytical solutions or derives stable numerical solvers where foundation models fail. In SciML, it performs deep diagnosis to tackle ill-posed formulations and combines hybrid symbolic-numeric workflows (e.g., coupling PINNs with FEM) to resolve multiphysics problems. The framework achieves super-human performance, reaching validation errors of $10^{-14}$. Furthermore, collaborative "human-in-the-loop" intervention allows the system to bridge stability gaps, improving results by an order of magnitude. This paradigm shifts the focus from implementation mechanics to methodological innovation, accelerating scientific discovery.
Formal Policy Enforcement for Real-World Agentic Systems
Security policy enforcement in contemporary agentic systems predominantly consists of embedding natural-language policies within an agent's system prompt and delegating compliance to the agent's reasoning. This approach admits no formal enforcement guarantee and cannot express policies whose satisfaction depends on the causal history of an execution, a gap that becomes acute in multi-agent systems, where enforcement must reason across agents. We argue that policy enforcement in agentic systems is most naturally understood as a cross-cutting concern, and propose a framework grounded in aspect-oriented programming that specifies policies independent of the agent's reasoning and enforces them at every policy-relevant decision. Policies are written in Datalog over a set of abstract predicates describing the execution context, an observability service governed by a formal assume/guarantee contract maintains these predicates, and a reference monitor consults the policy at each action to produce an enforcement decision. When the environment contract holds, enforcement decisions coincide with the policy's intended semantics. We adopt Datalog as the policy language, a natural fit because it supports declarative rule specification, admits recursion for policies over transitive relationships, and yields deterministic enforcement. Datalog further admits tractable static analyses for contradiction, redundancy, subsumption, and conditional reachability, enabling authors to verify policy intent and surface ambiguities inherent in natural-language specifications. We realize the framework in FORGE, which enforces policies over agentic deployments without modification to the underlying agents. We evaluate FORGE on three case studies: information flow policies for prompt injection defense, approval workflows in a multi-agent pharmacovigilance system, and organizational policies for customer service.
Systems and Control (EESS)
Adaptive Domain Decomposition Physics-Informed Neural Networks for Traffic State Estimation with Sparse Sensor Data
Traffic state estimation from sparse fixed sensors is challenging because physics-informed neural networks (PINNs) tend to over-smooth the shockwaves admitted by the Lighthill-Whitham-Richards (LWR) model. This study proposes Adaptive Domain Decomposition Physics-Informed Neural Networks (ADD-PINN), a two-stage residual-guided framework for LWR-based offline speed-field reconstruction. A coarse global PINN is first trained; its spatial residual profile is then used to place subdomain boundaries and initialize child subnetworks in a decomposition-enabled mode, while a data-driven shock indicator can retain a single-domain fallback when localized evidence of transition is weak. The primary offline I-24 MOTION evaluation spans five days, five sensor configurations, and ten seeds per configuration, yielding 1,500 runs in total. Against neural and physics-informed baselines, ADD-PINN attains the lowest relative L2 error in 18 of 25 configurations and in 14 of 15 sparse-sensing cases, while training 2.4 times faster than the extended PINN (XPINN) baseline. An ablation study supports spatial-only decomposition as an effective default for fixed-sensor traffic reconstruction in the evaluated settings. Supplementary Next Generation Simulation (NGSIM) experiments serve as a negative control: the shock indicator suppresses decomposition in all 50 runs, and the default single-domain fallback ranks first across all sensor configurations. These results support residual-guided spatial decomposition as an effective PINN-family design for offline reconstruction when sparse fixed sensing coincides with localized transition regions.
comment: 56 pages, 5 figures, 12 tables. Submitted to Transportation Research Part C
Entropic Value-at-Risk for Inter-Vehicle Collision in Platoons: Network- and Delay-Induced Bounds on Risk Due to Extreme Events
Safe operation of connected vehicle platoons under stochastic disturbances and time-delayed dynamics requires accurate quantification of rare but dangerous events, such as inter-vehicle collisions. We propose a rigorous framework for quantifying the risk of inter-vehicle collisions in connected vehicle platoons subject to time-delayed stochastic dynamics. We adopt the entropic value-at-risk (EVaR) as a conservative metric to capture risk due to extreme events, highlighting its advantages over conventional Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR). By expressing the inter-vehicle distance covariance in terms of the Laplacian eigenvalues of the communication network, we derive network- and time-delay-induced bounds on both the minimum inherent risk and the worst-case risk. Specifically, the algebraic connectivity dictates the maximum EVaR, while the largest Laplacian eigenvalue determines the minimum risk inherently induced by the network structure. Numerical simulations illustrate how network topology and time delay shape collision risk, offering actionable insights for the safe design of vehicle platoons operating under stochastic disturbances.
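The claim that EVaR is more conservative than VaR and CVaR can be checked numerically in the simplest case. The sketch below uses the standard closed forms for a Gaussian loss variable at confidence level $1-\alpha$; it is a generic illustration, not the paper's platoon formulation, and the function name and the choice of a scalar Gaussian are assumptions.

```python
import math
from statistics import NormalDist


def gaussian_risk_measures(mu, sigma, alpha):
    """VaR, CVaR and EVaR at confidence level 1 - alpha for a loss X ~ N(mu, sigma^2)."""
    standard = NormalDist()
    z = standard.inv_cdf(1.0 - alpha)                       # (1 - alpha) quantile of N(0, 1)
    var = mu + sigma * z                                    # value-at-risk
    cvar = mu + sigma * standard.pdf(z) / alpha             # expected loss beyond the VaR
    evar = mu + sigma * math.sqrt(-2.0 * math.log(alpha))   # entropic value-at-risk
    return var, cvar, evar


var, cvar, evar = gaussian_risk_measures(mu=0.0, sigma=1.0, alpha=0.05)
print(var, cvar, evar)
```

For the same confidence level the three values are ordered VaR, then CVaR, then EVaR, so a design that keeps EVaR acceptable also keeps the other two acceptable.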
Allocation of Dynamic Operating Envelopes in Radial Distribution Networks
This paper provides an in-depth analysis of how different aspects of the dynamic operating envelope (DOE) formulation impact the computation and allocation of network capacity. We show that the envelopes are significantly affected by the power flow model (non-linear or linear), the binding network constraint (thermal or voltage), and the calculation case (import or export envelope). We also propose a novel DOE algorithm (LACE) that offers transparent and scalable computation, which is useful for larger networks or for operating in tandem with other optimization engines. We run numerical simulations with different test feeders, including a realistic low-voltage feeder with real-world data from Belgium. This paper provides crucial insights and tools for distribution system operators (DSOs), stakeholders, and academics alike to ensure that DOE calculation achieves desirable and efficient outcomes.
comment: Conference paper
Robust Capacity Expansion under Wildfire Ignition Risk and High Renewable Penetration
In power systems, the risk of wildfire ignition has increased significantly in recent years. The impact and severity of these events on energy dispatch, as well as their societal ramifications, make wildfire prevention critical for power system planning and operation. A common intervention by system operators is to de-energize transmission lines to mitigate the risk of fire caused by equipment failures. With the growing integration of variable renewable generation, managing and preparing the system for de-energization under wildfire risk has become even more challenging. In this context, mitigation decisions such as installing battery energy storage systems and undergrounding transmission lines can reduce the risk and adverse effects associated with de-energization and renewable generation variability. This paper presents a robust optimization model to determine the optimal siting of battery storage and transmission-line undergrounding investments, utilizing representative weeks and uncertainty sets to capture the temporal relationship of uncertain variables. Specifically, this paper addresses: (i) the worst-case realization of ignition risk leading to the de-energization of transmission lines, combined with the worst-case realization of renewable energy availability, and (ii) the optimal investment decisions for energy storage capacity and undergrounding of transmission lines that are exposed to ignition risk. The proposed model is formulated as a mixed-integer linear programming (MILP) problem, employing duality theory and binary decomposition to address nonlinearities, and is solved using a column-and-constraint generation algorithm. The proposed framework is evaluated on a model of the San Diego power system, demonstrating its practical effectiveness in improving resilience to wildfire risk.
Sampling-based Model Predictive Control Using Trust Regions
Sampling-based model predictive control (MPC) algorithms, such as model predictive path integral (MPPI), enable approximate, gradient-free solutions to optimal control problems by drawing samples from a proposal distribution, evaluating their trajectory costs, and updating the proposal parameters accordingly. However, these approaches typically rely on heuristics for adjusting hyperparameters, such as temperature or momentum, or manual tuning. We propose a trust region formulation for sampling-based MPC that constrains updates of the proposal distribution via a principled Kullback--Leibler (KL) divergence bound and, optionally, an entropy lower bound. This replaces heuristic hyperparameter adaptation with values that are optimal w.r.t. the underlying Lagrangian. We further improve sample efficiency and convergence by combining the trust region update with deterministic localized cumulative distribution (LCD)-based sampling. Experiments on two benchmark environments demonstrate that the proposed trust region update achieves faster convergence and better sample efficiency in low-sample and low-iteration regimes, especially when paired with deterministic LCD-based sampling.
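The sample, evaluate, and update loop described above can be sketched for a one-dimensional toy problem. This is the standard MPPI-style exponentially weighted mean update with a fixed, hand-picked temperature, i.e. the kind of heuristic setting the abstract says the trust region formulation replaces; it is not the proposed method, and the quadratic cost, the fixed proposal standard deviation, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def mppi_step(mean, std, cost_fn, temperature, num_samples, rng):
    """One update of the proposal mean from exponentially weighted samples."""
    samples = [rng.gauss(mean, std) for _ in range(num_samples)]
    costs = [cost_fn(u) for u in samples]
    min_cost = min(costs)  # subtract the minimum cost for numerical stability
    weights = [math.exp(-(c - min_cost) / temperature) for c in costs]
    total = sum(weights)   # at least 1.0, because the best sample has weight 1
    return sum(w * u for w, u in zip(weights, samples)) / total


def run_mppi(iterations=20, seed=0):
    """Iterate the update on a toy cost whose minimizer is u = 2."""
    rng = random.Random(seed)
    mean = 0.0
    for _ in range(iterations):
        mean = mppi_step(mean, 0.5, lambda u: (u - 2.0) ** 2, 0.1, 256, rng)
    return mean


final_mean = run_mppi()
print(final_mean)
```

A lower temperature concentrates the weights on the best samples and speeds up convergence at the cost of noisier updates, which is the trade-off that manual tuning or adaptation heuristics normally have to manage.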
Interactive Trajectory Planning with Learning-based Distributionally Robust Model Predictive Control and Markov Systems
We investigate interactive trajectory planning subject to uncertainty in the decisions of surrounding agents. To control the ego-agent, we aim to first learn the decision distribution and solve a Stochastic Model Predictive Control (SMPC) problem. To account for errors in the learned distribution, we show that it is possible to utilize Probably Approximately Correct (PAC) learning in combination with Distributionally Robust (DR) optimization to obtain a solution which accounts for the errors induced by the learning model. The results indicate that our PAC learning-based DR-MPC framework provides a method to interpolate between a robust MPC and an omnipotent SMPC, based on the available number of samples.
Beam-Aware Radio Map Estimation With Physics-Consistent Parametric Modeling for Unknown Multiple Satellites
Satellite networks with dense low Earth orbit (LEO) constellations rely on aggressive spectrum reuse, making co-channel interference a dominant and rapidly varying factor that limits link availability and complicates spectrum sharing and compliance. Satellite radio map (RM) construction is therefore essential for interference cognition, yet it is challenging because the active satellite set is unknown, beam footprints and pointing are not directly observable, and received signal strength (RSS) measurements are difficult to calibrate under coupled link budget variations and noise. These latent uncertainties yield a severely underdetermined inverse problem with strong signature coherence, where existing methods often trade detection recall for precision and still fail to recover a faithful continuous RSS field. This paper proposes a beam-aware RM estimation framework that unifies active satellite identification and RSS field reconstruction through physics-consistent parametric modeling. An interpretable structural prior links geometry and beam shaping to spatial RSS formation, and an adaptive model order selection strategy infers the number of active satellites from measurements by balancing fit and complexity. Extensive experiments across varying signal to noise ratio (SNR), total satellite count, and active satellite count demonstrate consistently higher RSS spatial correlation, lower root mean squared error (RMSE), and improved F1 score, validating the proposed approach for interference-aware satellite RM construction in satellite networks.
Control and Scheduling of Behind-the-Meter Battery Energy Storage Systems for Stacked Grid and Building Services
This paper proposes and experimentally validates a two-stage scheduling and control strategy for a behind-the-meter battery energy storage system (BESS) delivering both local and grid services. The services considered are the maximization of PV self-consumption, peak-load reduction, and secondary frequency control (aFRR). The day-ahead stage allocates battery capacity across local and balancing services using a scenario-based approach, reflecting potential remuneration from aFRR participation without committing to fixed power availability; in the real-time stage, BESS set-points are computed periodically at a high time resolution based on updated information on balancing prices, net load realization, and BESS state of charge. The strategy is experimentally validated on a building at the Energypolis Campus of HES-SO Valais (Sion, Switzerland), which exhibits a peak power demand of 300 kW and is equipped with a 264 kWh / 140 kW lithium-ion BESS. The experimental results demonstrate the effectiveness of the proposed framework in scheduling and actuating the provision of both behind-the-meter and front-of-the-meter services.
Efficient MILP-based Urban Network Traffic Control in Mixed Autonomy with Dynamic Saturation Rates
This paper introduces a novel control strategy to optimize urban network traffic in mixed autonomy settings, featuring Connected and Automated Vehicles (CAVs) alongside Human-Driven Vehicles (HDVs). Unlike previous control strategies, where the impact of driver behaviour of CAVs and HDVs is not explicitly considered, we propose a dynamic, queue-responsive saturation rate to account for autonomy-driven variations in traffic flow characteristics. The proposed method is based on an extended multi-commodity store-and-forward model to a mixed autonomy environment, integrating optimized routing for CAVs via infrastructure-linked connectivity, and signal timings at every signalized intersection. The problem is formulated as a Non-Convex Quadratic Program (NQP), which accounts for queue evolution, spillback, green time allocation, and CAVs routing. To enable computational efficiency for real-time applications, we transform the NQP into a sequence of convex subproblems, leveraging under- and over-estimators to reformulate it as a Mixed Integer Linear Program (MILP). Experimental results via microscopic simulations validate the efficiency and robustness of the proposed methodology. The results reflect that the proposed model outperforms the existing multi-commodity approach, thus demonstrating its potential for real-time traffic optimization in future urban mobility systems.
Cascade PID Control of an Inverted Pendulum on a Cart System: Simulation and Experimental Analysis
This study investigates the performance of cascade PID control architecture applied to an inverted pendulum on a cart system through both simulation and experimental implementation. A nonlinear model of the system was developed using Simscape Multibody in Simulink, while a physical prototype was constructed using a DC motor-driven cart, pendulum, rotary encoder, ultrasonic sensor, and an Arduino. The cascade PID control structure consists of an inner loop regulating the pendulum angle and an outer loop controlling the cart position. Simulation results demonstrated effective stabilization of the pendulum and satisfactory position tracking under idealized conditions. Experimental results confirmed successful real-time stabilization but revealed notable differences from simulation, particularly in controller gains, transient behavior, and disturbance response due to sensor noise, unmodeled friction, and implementation constraints. The study also highlights the limitations of cascade PID control in disturbance rejection and large position commands, particularly under limited track length. A comparative analysis using an LQR-based inner loop demonstrated better disturbance rejection and reduced overshoot. The results provide practical insights into the applicability and limitations of cascade PID control of the inverted pendulum system.
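The cascade structure described above (outer loop on cart position, inner loop on pendulum angle) can be sketched as follows. This is a minimal illustration, not the study's implementation: the `PID` class, the `cascade_step` helper, the saturation limit `max_angle`, and the sign conventions are all assumptions.

```python
class PID:
    """Textbook PID controller with a rectangular-rule integral."""

    def __init__(self, kp, ki=0.0, kd=0.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, error, dt):
        self.integral += error * dt
        # No derivative action on the very first sample.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def cascade_step(outer, inner, x_ref, x, theta, dt, max_angle):
    """One control update: position error -> angle setpoint -> motor command."""
    # Outer loop: cart position error produces a (saturated) angle setpoint.
    angle_ref = outer.step(x_ref - x, dt)
    angle_ref = max(-max_angle, min(max_angle, angle_ref))
    # Inner loop: angle error produces the motor command.
    return inner.step(angle_ref - theta, dt)
```

Saturating the angle setpoint is one simple way to keep large position commands from asking the inner loop for an unrecoverable tilt, which relates to the limitation the abstract notes for large position commands.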
Spatiotemporal Trust Evaluation for Collaborator Selection via Customized GNN-Mamba
The successful completion of collaborative tasks relies on the effective selection of trustworthy collaborators. To accurately evaluate the trustworthiness of potential collaborators, it is necessary to combine insights from their past collaborations with assessments of their resource capabilities under specific task contexts. However, the coexistence of diverse trust perspectives, along with complex spatiotemporal dependencies among devices, makes accurate trust evaluation particularly challenging. To address these challenges, we propose a customized Graph Neural Network (GNN)-Mamba (GM) model for trust evaluation and collaborator selection. In this model, the GNN model performs spatial trust fusion by leveraging inter-device spatial dependencies extracted from historical collaborations, while the Mamba-based temporal model captures both short-term fluctuations and long-term evolution of device trust. In addition, task-specific resource trust is incorporated to reflect the practical capabilities of devices under varying task conditions. Experimental results demonstrate that the proposed GM model outperforms baseline approaches in terms of the accuracy and stability of trust evaluation.
comment: IEEE ICC 2026
Electric Axle and Wheel Module Driveline Concepts for Self-propelled Agricultural Machinery and Equipment Carriers
Direct electric drivelines without power-split open new design freedom for frame and suspension design, along with often lower energy losses. This paper focuses on self-propelled agricultural machinery (combine and forage harvesters, root crop harvesters), equipment carriers, propelled trailers and field robots. For a typical vehicle with four driven wheels, the electric motors can be packaged as two axle modules or four wheel modules, both defined herein as self-contained mechatronic units with integrated power electronics, distributed control intelligence and steering. Axle module and wheel module concepts are compared in detail against engineering requirements including loads, efficiency, steerability, controllability, braking, suspension, structural load support, asymmetric wheel loading and manufacturing cost. The wheel module offers maximum design freedom, redundancy and controllability, while the axle module provides lower cost, structural rigidity, automatic load sharing through the differential and the ability to be used in existing vehicle structures. Both concepts are defined such that distributed control intelligence and steering are integral to each unit, requiring only a DC power bus and communication interface from the vehicle.
Distributionally Robust Data-Driven Predictive Control for Stochastic LTI Systems
We propose a distributionally robust data-driven predictive control framework for stochastic linear time-invariant systems with unknown dynamics and disturbance distributions. We use an offline trajectory to fit the subspace predictive control (SPC) predictor via least squares and construct an empirical distribution of the prediction residuals as a proxy for the unknown disturbance distribution. We then center a Wasserstein ambiguity set around this estimate and minimize the worst-case expected cost while enforcing probabilistic output constraint satisfaction over all distributions in the set. The resulting problem admits a tractable reformulation with an equivalent direct data-driven form, eliminating the need for explicit predictor identification. Using finite-sample concentration results, we provide a data-driven Wasserstein radius such that, with high probability, the true expected cost is bounded above by the tractable objective and output constraints are satisfied with respect to the true disturbance distribution. Numerical simulations validate the framework against existing methods under various disturbance conditions and cost functions.
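The first step described above, fitting a multi-step predictor by least squares and keeping the residuals as an empirical disturbance proxy, can be sketched as below. This is an illustration under assumptions, not the paper's code: it treats a single-input single-output system, and the names `fit_spc`, `t_ini`, and `horizon` are hypothetical. The Wasserstein ambiguity set and the robust optimization step are not shown.

```python
import numpy as np

def fit_spc(u, y, t_ini, horizon):
    """Least-squares multi-step predictor from one offline trajectory.

    Each regressor stacks past inputs, past outputs, and future inputs;
    the target is the block of future outputs.
    """
    u = np.asarray(u, dtype=float)
    y = np.asarray(y, dtype=float)
    span = t_ini + horizon
    rows_z, rows_f = [], []
    for k in range(len(u) - span + 1):
        past_u = u[k:k + t_ini]
        past_y = y[k:k + t_ini]
        future_u = u[k + t_ini:k + span]
        future_y = y[k + t_ini:k + span]
        rows_z.append(np.concatenate([past_u, past_y, future_u]))
        rows_f.append(future_y)
    Z = np.array(rows_z)
    F = np.array(rows_f)
    theta, *_ = np.linalg.lstsq(Z, F, rcond=None)
    residuals = F - Z @ theta  # empirical samples of the prediction error
    return theta, residuals
```

For noiseless data from a linear system the residuals are numerically zero; with disturbances, their empirical distribution is what an ambiguity set would be centered on.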
Deadline-Driven Hierarchical Agentic Resource Sharing for AI Services and RAN Functions in AI-RAN
AI-RAN consolidates AI services and Radio Access Network (RAN) functions onto a unified, GPU-accelerated infrastructure at the network edge. However, compute sharing between real-time RAN functions and highly heterogeneous AI services requires coordination of scheduling decisions at mismatched timescales, and placement adaptation may require service migration across nodes with non-negligible interruptions. This paper proposes a hierarchical agentic framework (HAF) for compute sharing in AI-RAN that combines a large language model (LLM)-based agent for slow-timescale placement of AI services and RAN functions with a closed-form, deadline-aware convex algorithm for fast-timescale GPU/CPU allocation. The LLM agent is further equipped with a predictive critic that filters out migrations when the induced service interruption outweighs the expected service-level objective (SLO) benefit. Experimental results show that HAF reaches 90.0% overall SLO fulfillment, a 20.5% improvement over the strongest baseline, and raises AI service request fulfillment from 51% to 85.3%. Further evaluations show that HAF retains its advantage under diverse load conditions, while the critic consistently improves SLO fulfillment across multiple open-source LLM agents.
Learning Neural Hybrid Surrogates for Gradient-Based Falsification
Falsification of hybrid dynamical systems remains challenging due to mode-dependent dynamics and discrete transitions. In this work, we propose a surrogate-based falsification approach that supports hybrid systems by learning a differentiable hybrid automaton model from data. This extends previous surrogate-based falsification methods, which were limited to purely continuous dynamics. Specifically, we employ neural hybrid automata to learn both a latent mode encoder and the corresponding mode-conditioned vector fields. Once the surrogate has paired each mode with an associated vector field, the transition guards are inferred using existing trajectory data. The learned surrogate is subsequently subjected to a gradient-based optimal control formulation, which minimizes a smooth approximation of the safety specification to find safety violations. In the last step, an experiment with the optimal control solution is carried out on the original system to ensure soundness. The proposed method consistently uncovers counterexamples on a majority of evaluated benchmark specifications; on these cases, it achieves sample efficiency competitive with or better than that of other tools while using a reduced simulation budget.
Resilience of IEC 61850 Sampled Values-Based Protection Systems Under Coordinated False Data Injections
This paper assesses the resilience of IEC 61850 digital substations under False Data Injection Attacks (FDIAs) targeting the Sampled Values (SV) protocol. The multicast nature of SV, while enabling time-critical automation, exposes substations to cyber intrusions capable of disrupting protection functions and causing large-scale outages. To evaluate these risks, coordinated attack vectors involving both physical and cyber access at the bay level are experimentally analyzed using an advanced setup based on industrial-grade intelligent electronic devices (IEDs). The proposed attacks simultaneously manipulate multiple electrical parameters in a coordinated and physically consistent manner. Experimental results confirm the feasibility of stealthy multi-vector FDIAs that can trigger false protection actions, conceal real faults, or block protection mechanisms while maintaining realistic signal behavior. The Power Hardware-in-the-Loop (PHIL) testbed enables closed-loop evaluation under strict timing, communication, and protection logic constraints, reflecting real device behavior beyond simulation and controller-level HIL environments. The findings reveal critical vulnerabilities in SV-based protection schemes that directly affect grid reliability, particularly under realistic attacker positioning. To address these challenges, a defense strategy covering deterrence, prevention, detection, mitigation, and resilience is analyzed, with emphasis on bay-level infrastructure. Furthermore, a resilience-oriented method based on trusted independent channels and cross-verification of SV data within the protection logic is outlined as a complementary countermeasure for scenarios where existing standardized security mechanisms are insufficient.
comment: 11 pages, 8 figures
Stochastic Differential Dynamic Programming for Trajectory Optimization under Partial Observability
Designing spacecraft trajectories remains challenging in the presence of stochastic effects such as maneuver execution errors and observation uncertainties. Although covariance control and belief-space planning provide useful tools for designing robust control policies and information-aware trajectories under uncertainty, practical methods remain limited for partially observable trajectory optimization problems in which trajectory design, orbit determination, and correction maneuver planning are tightly coupled. This paper presents a stochastic differential dynamic programming algorithm for such coupled problems. The proposed method optimizes the nominal control sequence and feedback gains subject to belief dynamics and general mission constraints, explicitly accounting for the dependence of covariance propagation on the nominal trajectory without relying on the separation principle. Numerical examples demonstrate that the proposed algorithm produces navigation-aware and uncertainty-robust solutions across a range of dynamical systems, observation models, and uncertainty levels. In particular, the circular restricted three-body problem shows that the proposed method can exploit the coupling between trajectory design and orbit determination to obtain navigation-aware solutions with substantially lower fuel consumption than those from deterministic local optimization starting from the same initial guess.
comment: 43 pages, 13 figures; submitted to the Journal of Guidance, Control, and Dynamics
Data-Driven Contextual-Aware Uncertainty Set for Robust Dispatch of Power Systems
Both the level of conservativeness and the computational burden in robust optimization are critically influenced by uncertainty set design. However, contextual side information is rarely exploited in robust dispatch of power systems characterized by irregular data distributions, which hinders the explicit characterization of the relationship between covariates and uncertain parameters. To address this issue, a data-driven method for constructing contextual-aware uncertainty set is proposed in this letter. Based on a conditional Gaussian mixture model, a set of covariates is leveraged as side information to design uncertainty sets tailored to historical data exhibiting irregular distributions. The resulting set is formulated as a union-of-subsets formulation, and a mixed integer linear reformulation is adopted to describe the worst-case realization across all subsets. Finally, the effectiveness of the proposed method is demonstrated through numerical experiments applied to robust unit commitment.
Learning myopic mixed-integer nonlinear model predictive control from expert demonstrations
Applying nonlinear model predictive control (NMPC) to systems with hybrid dynamics or discrete actions typically yields mixed-integer nonlinear programs (MINLPs), whose real-time solution remains a major challenge and limits the applicability of mixed-integer NMPC (MINMPC). This paper proposes a myopic MINMPC framework that incorporates value-function approximation to substantially reduce the online computational burden. Using Bellman's principle of optimality, we shorten the prediction horizon and append a value function learned offline from expert state-action demonstrations via inverse optimization with optimality residual minimization. A central feature is the dual treatment of discrete decisions, whereby integer constraints are relaxed during offline learning to enable KKT-residual-based value function synthesis, while the online controller enforces the true integer constraints to ensure feasibility. The learned value function induces a policy that is approximately policy-consistent with the expert demonstrations. The resulting controller achieves high closed-loop performance with a significantly shorter horizon, enabling real-time MINMPC. The effectiveness of the approach is demonstrated on the Lotka-Volterra fishing problem and a satellite attitude control system with discrete actuators.
comment: Accepted proceedings 23rd IFAC World Congress, Busan Korea
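The horizon-shortening idea in the myopic MINMPC abstract above can be written as a worked formula. This is a generic sketch of a value-function-terminated MPC problem, not the paper's exact formulation; the symbols $\ell$ (stage cost), $f$ (dynamics), $\hat{V}$ (learned value function), $N$ (short horizon), and $\mathcal{U}$ (admissible inputs, with integrality imposed on the discrete components) are assumptions for illustration.

```latex
\begin{aligned}
\min_{u_0,\dots,u_{N-1}} \quad & \sum_{k=0}^{N-1} \ell(x_k, u_k) + \hat{V}(x_N) \\
\text{s.t.} \quad & x_{k+1} = f(x_k, u_k), \quad k = 0,\dots,N-1, \\
& x_0 = x(t), \qquad u_k \in \mathcal{U}, \quad k = 0,\dots,N-1.
\end{aligned}
```

By Bellman's principle of optimality, if $\hat{V}$ equaled the true optimal cost-to-go, this short-horizon problem would attain the same optimal cost as the long-horizon one and any of its minimizers would start an optimal long-horizon input sequence. With a learned approximation the match is only approximate.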
MORPH-U: Multi-Objective Resilient Motion Planning for V2X-Enabled Autonomous Driving in High-Uncertainty Environments via Simulation
V2X can warn an autonomous vehicle about hazards beyond line-of-sight, but it also brings uncertainty: messages may be delayed, dropped, or even forged. Meanwhile, map knowledge may change during a trip, forcing the vehicle to replan under tight real-time budgets. This paper studies how to make motion planning and low-level control robust to such uncertain, event-driven updates. We present MORPH-U, a CARLA-based closed-loop stack that fuses LiDAR/radar/camera with V2X (CAM/DENM) into a Local Dynamic Map (LDM) and triggers Hybrid-A* replanning when validated hazards or map changes affect the planned route. We expose the planning/control trade-offs via a multi-objective formulation over tracking error, safety margin (minimum TTC), responsiveness, and smoothness, and select operating points using Pareto-frontier analysis. To avoid unsafe replanning from faulty V2X triggers, MORPH-U adds a lightweight Byzantine-inspired acceptance gate that combines a quorum rule with an on-board sensor veto. Experiments in dynamic CARLA scenarios show that V2X-augmented LDM improves downstream safety, Pareto tuning provides controllable accuracy-comfort trade-offs, and the gate prevents replanning under saturated false-DENM injection ($p_{\text{attack}}=1.0$).
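The acceptance gate described above combines a quorum rule with an on-board sensor veto. A minimal sketch of that combination follows; the function name, its arguments, and the choice to count distinct senders are assumptions, since the abstract does not give the exact rule.

```python
def accept_hazard(reporter_ids, quorum, sensor_veto):
    """Accept a V2X hazard report only if enough distinct senders agree
    and the on-board sensors do not contradict it.

    reporter_ids: iterable of sender identifiers that reported the hazard
    quorum:       minimum number of distinct senders required (assumed rule)
    sensor_veto:  True when on-board perception contradicts the report
    """
    distinct_reporters = len(set(reporter_ids))  # repeated messages count once
    return distinct_reporters >= quorum and not sensor_veto
```

Counting distinct senders means a single forged source cannot reach quorum by repeating itself, and the veto still blocks replanning even when every V2X message is false.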
Stability-Certified Koopman Observer Design for Nonlinear Systems via Generalized Persidskii Dynamics
This paper addresses the problem of nonlinear state estimation for dynamical systems whose governing equations are approximated through Koopman operator liftings. While Koopman-based predictors have demonstrated broad approximation capability for nonlinear dynamics, certifying observer convergence under model mismatch and measurement noise has remained a largely open problem. To resolve this, we establish a structural correspondence between the error dynamics of a Koopman latent-space observer and the class of generalized Persidskii systems, which admits diagonal Lyapunov functions and incremental sector characterizations. Exploiting this connection, we design a nonlinear correction term whose gain is computed via a linear matrix inequality (LMI) that simultaneously certifies input-to-state stability (ISS) of the estimation error with respect to both lifting residuals and external disturbances. Exponential convergence in the nominal case and ultimate boundedness under bounded perturbations are established analytically. Numerical validation on the Van der Pol oscillator and a nonlinear robotic arm with friction uncertainty demonstrates that the proposed observer substantially outperforms both the Extended Kalman Filter and a linear Koopman observer in terms of estimation accuracy and robustness, achieving up to a 42% reduction in steady-state RMSE under lifting mismatch.
Variational PMB filter via coordinate descent Kullback-Leibler divergence minimisation
This paper presents a new derivation of the variational Poisson multi-Bernoulli (V-PMB) filter for multi-target estimation proposed in [#Williams15]. The proposed derivation is based on considering an augmented space that includes the set of target states with their track indices and the global hypothesis variable. Then, we show that the V-PMB projection performs a coordinate descent Kullback-Leibler divergence (KLD) minimisation on this augmented space to fit the best possible PMB density to the Poisson multi-Bernoulli mixture (PMBM) posterior. We also show that this V-PMB projection keeps the probability hypothesis density of the posterior. The paper also includes a comparison with the PMBM filter and other PMB filter variants, including a track-oriented Murty-based implementation, a track-oriented loopy belief propagation implementation and a global nearest neighbour implementation, showing the benefits of the V-PMB filter compared to the other PMB filters when targets get in close proximity and then separate.
comment: Accepted in Proceedings of the 29th International Conference on Information Fusion, 2026. Matlab code available at https://github.com/Agarciafernandez/MTT
Variable Aerodynamic Damping via Co-Contraction: A Dynamic Isomorphism with Variable Stiffness Actuators
We prove that aerodynamic co-contraction in a redundant dual-rotor actuator can tune a passive, trim-defined aero-mechanical damping while keeping the commanded net force constant. In particular, we define an incremental damping coefficient as the local sensitivity of net thrust to air-relative velocity at a trim and prove that it increases monotonically along constant-force fibers under a mild aerodynamic hardening condition. We then validate the required damping and hardening properties from a first-principles Blade Element Theory derivation, which yields a minimal thrust model affine in inflow and explicitly reveals the speed--inflow coupling driving the effect. The resulting mechanism is formalized as a Variable Aerodynamic Damping Actuator (VADA) and shown to be dynamically isomorphic to stiffness modulation in antagonistic variable-stiffness actuation (VSA), similar to the co-contraction of tendons by muscle co-activation. The same fiber-density principle also enhances the active aerodynamic promptness measure of redundant multirotors. Finally, an impedance-form representation clarifies the roles of common-mode and differential-mode actuation in the control of passive impedance and the equilibrium velocity of the VADA system.
Cost-Ordered Feasibility for Multi-Armed Bandits with Cost Subsidy
The classic multi-armed bandit (MAB) problem tackles the challenge of accruing maximum reward while making decisions under uncertainty. However, in applications, often the goal is to minimize cost subject to a constraint on the minimum permissible reward, an objective captured by multi-armed bandits with cost-subsidy (MAB-CS). Of interest to this paper is the setting where the quality (reward) constraint is specified relative to the unknown best reward and the cost of each arm is known. We characterize the expected sub-optimal samples required by any policy by proving instance-dependent lower bounds that offer new insight into the problem and are a strict generalization of prior bounds. Then, we propose an algorithm called Cost-Ordered Feasibility (COF) that leverages our insight and intelligently combine samples from all arms to gauge the feasibility of a cheap arm. Thereafter, we analyze COF to establish instance-dependent upper bounds on its expected cumulative cost and quality regret, i.e., relative to the cheapest feasible arm. Finally, we empirically validate the merits of COF, comparing it to baselines from the literature through extensive simulation experiments on the MovieLens and Goodreads datasets as well as representative synthetic instances. Not only does our paper develop qualitatively better theoretical regret upper bounds, but COF also convincingly demonstrates improved empirical performance.
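The objective described above, the cheapest arm whose reward stays within a tolerance of the best reward, can be made concrete with a small oracle computation. It assumes the relative constraint takes the form mean reward >= (1 - alpha) times the best mean reward, with alpha a hypothetical subsidy parameter, and it uses known means. A learning algorithm such as COF has to estimate the means from samples, which is not shown here.

```python
def cheapest_feasible_arm(mean_rewards, costs, alpha):
    """Index of the cheapest arm meeting the relative reward constraint.

    An arm is feasible when its mean reward is at least (1 - alpha) times
    the best mean reward (assumed form of the constraint).
    """
    threshold = (1.0 - alpha) * max(mean_rewards)
    feasible = [i for i, mu in enumerate(mean_rewards) if mu >= threshold]
    return min(feasible, key=lambda i: costs[i])
```

For example, with mean rewards `[0.9, 0.8, 0.5]`, costs `[3, 2, 1]`, and `alpha = 0.2`, the threshold is 0.72, so arms 0 and 1 are feasible and arm 1 is the cheaper of the two.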
Symplectic H2 Model Reduction for High-Dimensional Linear Quantum Systems
The $\mathcal{H}_2$ model reduction problem for high-dimensional linear quantum systems is studied under the constraint of physical realizability (PR). This constraint requires preservation of the canonical commutation relations and the quantum input-output structure, and therefore prevents the direct use of standard projection methods. A symplectic Petrov-Galerkin framework is presented, in which reduced-order models automatically satisfy the PR identities by construction. Within this framework, a symplectic variant of the iterative rational Krylov algorithm is developed and referred to as Quantum IRKA (Q-IRKA). At each iteration, an enriched tangential rational Krylov pool is generated from shifted linear solves. A symplectic basis is then extracted by a Gram-Schmidt-type procedure, paired with symplectic conjugates, and normalized so that the reduced trial space satisfies the canonical symplectic constraint. The interpolation points are updated from selected mirror images of the poles of the current reduced-order model, while the reduced-order matrices are obtained exclusively by structure-preserving projection. Numerical experiments on low-channel oscillator-chain systems and on a bosonic Kitaev-chain-inspired benchmark show that Q-IRKA is effective for large-scale linear quantum systems. Symplecticity and PR are preserved to machine precision, and accurate reduced-order models are obtained with moderate computational cost. The results also show that reduction quality depends substantially on dissipation geometry, channel placement, heterogeneity, and reduced order. These findings indicate that scalable $\mathcal{H}_2$ model reduction of linear quantum systems can be achieved while strictly preserving the underlying physical structure.
comment: 28 pages, 9 figures. Comments are welcome.
A Behavioral Framework for Data-Driven Modeling of Nonlinear Systems in Vector-Valued Reproducing Kernel Hilbert Spaces
We generalize Jan Willems' behavioral approach to a class of discrete-time nonlinear systems in a vector-valued reproducing kernel Hilbert space (RKHS). Apart from linear time-invariant systems, this class covers nonlinear systems modeled by Volterra series and their autoregressive variants, as well as systems admitting Hammerstein-type state-space realizations. We apply the proposed framework to the problem of data-driven modeling of such systems, i.e., when simulation or control objectives for an unknown system are carried out without an explicit system identification step. To that end, we link the behavioral approach to two data-driven modeling methods in a vector-valued RKHS: (1) minimum-norm interpolation and (2) subspace identification.
comment: 12 pages
Model-Reference Adaptive Flight Control of the 95-mg Bee++
We introduce a model-reference adaptive control (MRAC) architecture for high-performance positional tracking of the Bee++, a 95-mg insect-scale flapping-wing aerial vehicle. The suitability, functionality, and high performance of the proposed approach are demonstrated using data from real-time flight experiments.
comment: Extended abstract to appear in the proceedings of the LSU Symposium on Control, Learning, and Intelligent Systems
Generalized Global Self-Optimizing Control for Chemical Processes: Part II Objective-Guided Controlled Variable Learning Approach
Self-optimizing control (SOC) aims to maintain near-optimal process operation by judiciously selecting controlled variables (CVs). In this series of work, the generalized global SOC (g2SOC) approach is proposed, which extends the concept of SOC to the whole operation space and uses general nonlinear functions to design CVs instead of linear combinations. In the first part of this series, two numerical approaches for g2SOC are proposed: the optimization-based approach and the regression-based approach, based on a theoretical analysis of the existence of perfect self-optimizing CVs. The CVs designed by the former perform better, but are usually infeasible for large-scale problems. In this paper, we propose an algorithm called objective-guided controlled variable learning (OGCVL) that combines the advantages of both and has better scalability. OGCVL is proposed for efficient CV design that seamlessly integrates symbolic and numerical computation techniques. Finally, the effectiveness of the OGCVL method is verified in two numerical examples. Both examples show that the OGCVL method is able to achieve good results while maintaining computational efficiency and is also feasible in large-scale problems.
Transcription-Induced Failure Modes in 6-DOF Rocket Landing Trajectory Optimization
Solving optimal control problems via large-scale NLP solvers depends on discretizing continuous dynamics. Yet, this transcription step hides critical vulnerabilities, most notably truncation error and invariant drift, that can drive solvers toward dynamically infeasible or suboptimal trajectories. To expose these hidden failures, we introduce a problem- and transcription-agnostic adversarial objective that leverages the structure of local truncation-error bounds to aggressively amplify such defects. When applied to a 6-DOF rocket-landing problem, we reveal a stark reliability gap: of fourteen transcription methods tested, only three satisfy rigorous validation criteria. These results also expose a striking performance inversion: even in the absence of classical stiffness, a fourth-order implicit scheme (GL2) matches the fidelity of a sixth-order explicit method (RK6). Using B-series expansions and symplectic Runge-Kutta theorems, we isolate the specific truncation errors and quaternion-invariant drift responsible for these failures. Crucially, these theoretical vulnerabilities dictate operational performance: in practical lateral-divert scenarios, the implicit GL2 consistently outperforms the explicit RK6 in both end-to-end solve speed and robustness.
comment: This work has been submitted to the IEEE for possible publication
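The quaternion-invariant drift mentioned in the abstract above can be illustrated with a small sketch. It is not the paper's setup: it assumes a constant body rate `w` (a hypothetical value), a scalar-first quaternion, and it swaps in explicit Euler and the implicit midpoint rule (the one-stage Gauss-Legendre method) as stand-ins for the RK6 and GL2 schemes. For this linear system the midpoint update is a Cayley transform of a skew-symmetric matrix, so it preserves the unit norm, while explicit Euler inflates it at every step.

```python
import numpy as np

def omega_matrix(w):
    """Skew-symmetric 4x4 matrix with dq/dt = 0.5 * Omega(w) @ q, q = [qw, qx, qy, qz]."""
    wx, wy, wz = w
    return np.array([
        [0.0, -wx, -wy, -wz],
        [wx, 0.0, wz, -wy],
        [wy, -wz, 0.0, wx],
        [wz, wy, -wx, 0.0],
    ])

def integrate(q0, w, h, steps, method):
    """Integrate quaternion kinematics with a fixed step h and constant body rate w."""
    A = 0.5 * omega_matrix(w)
    I = np.eye(4)
    q = q0.copy()
    for _ in range(steps):
        if method == "explicit_euler":
            q = q + h * (A @ q)
        elif method == "implicit_midpoint":
            # Solve (I - h/2 A) q_next = (I + h/2 A) q, i.e. a Cayley transform.
            q = np.linalg.solve(I - 0.5 * h * A, (I + 0.5 * h * A) @ q)
        else:
            raise ValueError(f"unknown method: {method}")
    return q

q0 = np.array([1.0, 0.0, 0.0, 0.0])   # unit quaternion
w = (1.0, 0.5, -0.2)                  # hypothetical constant body rate in rad/s
euler_norm = np.linalg.norm(integrate(q0, w, 0.01, 1000, "explicit_euler"))
midpoint_norm = np.linalg.norm(integrate(q0, w, 0.01, 1000, "implicit_midpoint"))
print(f"explicit Euler norm:    {euler_norm:.6f}")
print(f"implicit midpoint norm: {midpoint_norm:.6f}")
```

Higher-order explicit methods drift far more slowly than Euler, but in general they do not preserve the quadratic invariant exactly, which is the kind of defect the abstract attributes to explicit transcriptions.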
Anatomical Landmark-Guided Deep Reinforcement Learning for Autonomous Gastric Navigation
Wireless capsule endoscopy (WCE) enables painless visualization of the gastrointestinal tract, but its diagnostic potential is limited by incomplete mucosal coverage and poor transferability of existing navigation methods across patient anatomies. We propose a transferable, anatomical landmark-guided deep reinforcement learning (AL-DRL) framework for autonomous gastric navigation. Leveraging a lightweight edge-contour-depth fusion module, our policy operates on stable, low-dimensional landmark coordinates rather than high-dimensional video streams, effectively bridging the sim-to-real gap. In simulations across eight patient-derived models, the method achieves over 97% coverage within 50 seconds, significantly outperforming vanilla PPO, SAC, and DQN agents. A two-stage sim-to-real pipeline with an adaptive dynamic programming controller actively mitigates physical disturbances. Ex-vivo experiments demonstrate a mean coverage of 87% and a 53% reduction in procedure time compared with expert manual control.
Code Generation and Conic Constraints for Model-Predictive Control on Microcontrollers with Conic-TinyMPC ICRA 2026
Model-predictive control (MPC) is a state-of-the-art control method for constrained robotic systems, yet deployment on resource-limited hardware remains difficult. This challenge is magnified by expressive conic constraints, which offer greater modeling power but require significantly more computation than linear alternatives. To address this challenge, we extend recent work developing fast, structure-exploiting, cached solvers for embedded applications based on the Alternating Direction Method of Multipliers (ADMM) to provide support for second-order cones, as well as C++ code generation from Python, MATLAB, and Julia. Microcontroller benchmarks show that our solver provides up to a two-order-of-magnitude speedup, ranging from 10.6x to 142.7x, over state-of-the-art embedded solvers on QP and SOCP problems, and enables us to fit order-of-magnitude larger problems in memory. We validate our solver's deployed performance through simulation and hardware experiments, including trajectory tracking with conic constraints on a 27g Crazyflie quadrotor. Our open-source code is available at https://tinympc.org.
comment: Accepted to ICRA 2026. 4 Figures. 2 Tables. First three authors contributed equally
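ADMM-style solvers commonly handle second-order cone constraints through a Euclidean projection onto the cone in the slack update. The sketch below shows that standard projection only. It is not taken from the Conic-TinyMPC code base, and the function name `project_soc` and the convention of storing the scalar bound as the last entry of the vector are assumptions made for illustration.

```python
import numpy as np

def project_soc(z):
    """Project z = [v, s] onto the second-order cone {(v, s): ||v||_2 <= s}."""
    v, s = z[:-1], z[-1]
    nv = np.linalg.norm(v)
    if nv <= s:
        # Already inside the cone.
        return z.copy()
    if nv <= -s:
        # Inside the polar cone: the closest point is the origin.
        return np.zeros_like(z)
    # Otherwise project onto the cone boundary.
    scale = 0.5 * (nv + s)
    return np.concatenate([scale * v / nv, [scale]])

print(project_soc(np.array([3.0, 4.0, 0.0])))   # lands on the boundary
print(project_soc(np.array([1.0, 0.0, 2.0])))   # unchanged, already feasible
```

The projection is closed-form and costs one norm evaluation, which is one reason cone constraints can be practical in projection-based solvers on small processors.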
GATO: GPU-Accelerated and Batched Trajectory Optimization for Scalable Edge Model Predictive Control ICRA 2026
While Model Predictive Control (MPC) delivers strong performance across robotics applications, solving the underlying (batches of) nonlinear trajectory optimization (TO) problems online remains computationally demanding. Existing GPU-accelerated approaches either parallelize single solves, handle large batches at sub-real-time rates, or sacrifice model generality for speed. This leaves a large gap in solver performance for many state-of-the-art MPC applications that require real-time batches of tens to low-hundreds of solves. As such, we present GATO, an open source, GPU-accelerated, batched TO solver co-designed across algorithm, software, and computational hardware to deliver real-time throughput for these moderate batch size regimes. Our approach leverages a combination of block-, warp-, and thread-level parallelism within and across solves for ultra-high performance. We demonstrate the effectiveness of our approach through a combination of: simulated benchmarks showing speedups of 18-21x over CPU baselines and 1.4-16x over GPU baselines as batch size increases; case studies highlighting improved disturbance rejection and convergence behavior; and finally a validation on hardware using an industrial manipulator. We open source GATO to support reproducibility and adoption.
comment: Accepted to ICRA 2026. 8 pages, 8 figures, 2 tables
Online Adaptive Probabilistic Safety Certificate with Language Guidance
Achieving long-term safety in uncertain or extreme environments while accounting for human preferences remains a fundamental challenge for autonomous systems. Existing methods often trade off long-term guarantees for fast real-time control and cannot adapt to variability in human preferences or risk tolerance. To address these limitations, we propose a language-guided adaptive probabilistic safety certificate (PSC) framework that guarantees long-term safety for stochastic systems under environmental uncertainty while accommodating diverse human preferences. The proposed framework integrates natural-language inputs from users and Bayesian estimators of the environment into adaptive safety certificates that explicitly account for user preferences, system dynamics, and quantified uncertainties. Our key technical innovation leverages probabilistic invariance (a generalization of forward invariance to a probability space) to obtain myopic safety conditions with long-term safety guarantees. We validate the framework through numerical simulations of autonomous lane-keeping with human-in-the-loop guidance under uncertain and extreme road conditions, demonstrating enhanced safety-performance trade-offs, adaptability to changing environments, and personalization to different user preferences. Code is available at https://github.com/hoshino06/adaptive_lane_keeping.
Learning Reachability of Energy Storage Arbitrage
Power systems face increasing weather-driven variability and, therefore, increasingly rely on flexible but energy-limited storage resources. Energy storage can buffer this variability, but its value depends on intertemporal decisions under uncertain prices. Without accounting for the future reliability value of stored energy, batteries may act myopically, discharging too early or failing to preserve reserves during critical hours. This paper introduces a stopping-time reward that, together with a state-of-charge (SoC) range target penalty, aligns arbitrage incentives with system reliability by rewarding storage that maintains sufficient SoC before critical hours. We formulate the problem as an online optimization with a chance-constrained terminal SoC and embed it in an end-to-end (E2E) learning framework, jointly training the price predictor and control policy. The proposed design enhances reachability of target SoC ranges, improves profit under volatile conditions, and reduces its standard deviation.
Sample-Efficient Model-Free Policy Gradient Methods for Stochastic LQR via Robust Linear Regression
Policy gradient algorithms are widely used in reinforcement learning and belong to the class of approximate dynamic programming methods. This paper studies two key policy gradient algorithms, the Natural Policy Gradient and the Gauss-Newton Method, for solving the Linear Quadratic Regulator (LQR) problem in unknown stochastic linear systems. The main challenge lies in obtaining an unbiased gradient estimate from noisy data due to errors-in-variables in linear regression. This issue is addressed by employing a primal-dual estimation procedure. Using this novel gradient estimation scheme, the paper establishes convergence guarantees with a sample complexity of order O(1/epsilon). Theoretical results are further supported by numerical experiments, which demonstrate the effectiveness of the proposed algorithms.
Kirigami-Structured Electronic Capsule for Long-Term Continuous Gastric Monitoring
Ingestible electronic systems enable non-invasive, in situ sensing within the gastrointestinal (GI) tract, yet clinical translation has been limited by uncontrolled transit, short operational lifetimes, and unreliable wireless communication that prevent continuous monitoring. Here, we present a gastric-resident ingestible robotic platform that achieves week-long operation through integration of a bioinspired, electrically triggered release mechanism with a kirigami-enabled electronic architecture. A kirigami-patterned flexible printed circuit board spans the capsule body and deployable superelastic arms, enabling high-density integration of sensing, power management, and wireless modules within a constrained volume while tolerating large mechanical deformation during gastric residence. Stable retention and on-demand disassembly are achieved using thermally responsive polycaprolactone joints that transition from rigid to compliant states under electrical activation, avoiding dependence on variable chemical triggers. Reliable telemetry in the highly attenuating gastric environment is maintained using a dual-band Bluetooth Low Energy and sub-gigahertz module with RSSI- and throughput-aware adaptive transmission, balancing link robustness and energy consumption. We demonstrate long-term, continuous monitoring of gastric radiation exposure, enabling early detection of dose accumulation and providing a promising in vivo alternative to wearable or handheld dosimeters. Swine studies confirm stable gastric residence, sustained real-time telemetry, and safe gastrointestinal passage following triggered disassembly. This work establishes kirigami-enabled integration as a scalable strategy for long-term gastric-resident robotic systems.
comment: This submission is withdrawn because the author/contributor information in the current version was submitted before explicit confirmation had been obtained from all relevant team members. We are withdrawing the article to avoid an inaccurate or unverified authorship/contribution record
A Directivity-Dependent Rician K-Factor Model for Indoor Industrial Channels
We derive a physics-based, closed-form model linking antenna directivity to the root-mean-square (RMS) delay spread and mean excess delay in large reverberant indoor environments. Starting from the Rician K-factor, the ratio of line-of-sight (LOS) to scattered power, we show that K scales with the total transmit-plus-receive (Tx+Rx) antenna gain through a single reverberance factor that quantifies scatter anisotropy. For an arbitrary scatter power delay profile (PDP), we derive a general identity connecting sigma, tau, and K; the exponential scatter model is the physically motivated special case. Ray-tracing simulations over 100 random link placements in a 57300 m^3 industrial hall at 75 GHz validate the model. Compact design rules map target delay-spread values to the minimum required antenna gain, enabling wideband mmWave industrial links.
comment: unresolved results issue/ re-framing required
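For the exponential special case named in the abstract above, the link between K and the delay statistics can be checked numerically. The sketch below uses a textbook setup, not the paper's own identity: the LOS component carries power K at zero excess delay, and the scatter PDP is a unit-power exponential with decay constant `tau0` starting at the same instant. Under those assumptions the mean excess delay is tau0/(K+1) and the RMS delay spread is tau0*sqrt(2K+1)/(K+1). All function names are hypothetical.

```python
import numpy as np

def delay_stats_closed_form(K, tau0):
    """Mean excess delay and RMS delay spread for LOS power K plus unit-power exponential scatter."""
    mean = tau0 / (K + 1.0)
    rms = tau0 * np.sqrt(2.0 * K + 1.0) / (K + 1.0)
    return mean, rms

def delay_stats_numeric(K, tau0, n=200_001, span=40.0):
    """Same quantities from direct trapezoidal integration of the power delay profile."""
    t = np.linspace(0.0, span * tau0, n)
    scatter = np.exp(-t / tau0) / tau0

    def integral(f):
        return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t))

    total = K + integral(scatter)          # LOS power sits at t = 0
    m1 = integral(t * scatter) / total     # LOS adds nothing to the moments
    m2 = integral(t**2 * scatter) / total
    return m1, np.sqrt(m2 - m1**2)

K, tau0 = 3.0, 20.0   # hypothetical values; tau0 in nanoseconds
print(delay_stats_closed_form(K, tau0))
print(delay_stats_numeric(K, tau0))
```

Both statistics shrink as K grows. That is consistent with the abstract's design rule that maps a target delay spread to a minimum antenna gain, given its claim that K scales with the total antenna gain.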
A Resilience Framework for Bi-Criteria Combinatorial Optimization with Bandit Feedback
We study bi-criteria combinatorial optimization under noisy function evaluations. While resilience and black-box offline-to-online reductions have been studied in single-objective settings, extending these ideas to bi-criteria problems introduces new challenges due to the coupled degradation of approximation guarantees for objectives and constraints. We introduce a notion of $(α,β,δ,\texttt{N})$-resilience for bi-criteria approximation algorithms, capturing how joint approximation guarantees degrade under bounded (possibly worst-case) oracle noise, and develop a general black-box framework that converts any resilient offline algorithm into an online algorithm for bi-criteria combinatorial multi-armed bandits with bandit feedback. The resulting online guarantees achieve sublinear regret and cumulative constraint violation of order $\tilde{O}(δ^{2/3}\texttt{N}^{1/3}T^{2/3})$ without requiring structural assumptions such as linearity, submodularity, or semi-bandit feedback on the noisy functions. We demonstrate the applicability of the framework by establishing resilience for several classical greedy algorithms in submodular optimization.
Covariance Stabilization for a class of Stochastic Discrete-time Linear Systems using the S-Variable Approach
This paper deals with the problem of covariance stabilization for a class of linear stochastic discrete-time systems in the Stochastic Model Predictive Control (SMPC) framework. The considered systems are affected by independent and identically distributed (i.i.d.) additive and parametric stochastic uncertainties (potentially unbounded), in addition to polytopic deterministic uncertainties bounding the mean of the state and input parameters. The design conditions presented in this paper are formulated as Linear Matrix Inequalities (LMIs), using the S-variable approach in order to reduce the potential conservatism. These conditions are derived using a deterministic exact characterization of the covariance dynamics, which involves bilinear terms in the control gain. A technique to linearize such dynamics is presented; it results in a descriptor representation from which sufficient conditions for the design of a covariance-stabilizing controller are derived. The derived condition is first compared with a known necessary and sufficient stability condition for systems without deterministic uncertainties and additive stochastic noise. Although more conservative, the proposed condition is more numerically tractable, with an LMI size scaling as O(n^2) instead of O(n^3). Then, the same condition is used to design controllers that are robust to both deterministic and stochastic uncertainties. Several numerical examples are presented for comparison and illustration.
Bluetooth Phased-array Aided Inertial Navigation Using Factor Graphs: Experimental Verification
Phased-array Bluetooth systems have emerged as a low-cost alternative for performing aided inertial navigation in GNSS-denied use cases such as warehouse logistics, drone landings, and autonomous docking. Basing a navigation system off of commercial-off-the-shelf components may reduce the barrier of entry for phased-array radio navigation systems, albeit at the cost of significantly noisier measurements and relatively short feasible range. In this paper, we compare robust estimation strategies for a factor graph optimisation-based estimator using experimental data collected from multirotor drone flight. We evaluate performance in loss-of-GNSS scenarios when aided by Bluetooth angular measurements, as well as range or barometric pressure.
comment: 6 pages, 5 figures, 2 tables. © 2026 the authors. This work has been accepted to IFAC for publication under a Creative Commons Licence CC-BY-NC-ND
Dynamic one-time delivery of critical data by small and sparse UAV swarms: a model problem for MARL scaling studies
This work studies the application of Multi-Agent Reinforcement Learning (MARL) to decentralized control of unmanned aerial vehicles to relay a critical data package to a known position. For this purpose, a family of deterministic games is introduced, designed for MARL scaling studies. A robust baseline policy is proposed which restricts agent motion and applies Dijkstra's shortest path algorithm. Computational experiment results show that two off-the-shelf MARL algorithms perform competitively with the baseline for a small number of agents, but face scalability issues as the number of agents increases. Source code and animations are available online at https://github.com/mikapersson/Information-Relaying.
comment: Accepted to the 2026 IFAC World Congress
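The baseline policy in the abstract above applies Dijkstra's shortest path algorithm. The sketch below shows that algorithm on a small weighted relay graph. It is not the paper's baseline, which also restricts agent motion; the node names and edge costs are hypothetical.

```python
import heapq

def dijkstra(graph, source):
    """Return shortest-path costs from source; graph maps node -> list of (neighbor, cost)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path to u was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical relay graph: edge cost is the time to hand the package over.
relay_graph = {
    "base": [("uav1", 2.0), ("uav2", 5.0)],
    "uav1": [("uav2", 1.0), ("target", 6.0)],
    "uav2": [("target", 2.0)],
    "target": [],
}
costs = dijkstra(relay_graph, "base")
print(costs["target"])
```

Here the cheapest route passes through both relays (base, uav1, uav2, target) rather than taking either direct hop to the target.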
Cooperative $\mathcal{H}_\infty$ Fault-Tolerant Tracking with ISS Guarantees for Networked Systems with Sensor Faults SC
This paper develops a cooperative fault-tolerant tracking framework for heterogeneous networked linear systems subject to sensor faults and external disturbances. Each unit employs an augmented $\mathcal{H}_\infty$ observer that jointly reconstructs the system state and unknown sensor fault, providing disturbance-attenuated estimation guarantees. An inner state-feedback gain is synthesized through convex $\mathcal{H}_\infty$ Linear Matrix Inequalities (LMIs) to ensure robust closed-loop stabilization and disturbance rejection, while an outer distributed integral action eliminates steady-state tracking offsets and enables cooperative tracking of a setpoint source. The resulting cooperative error dynamics are shown to satisfy an Input-to-State Stability (ISS) property with respect to disturbances and residual estimation uncertainty, and converge exponentially to zero in the disturbance-free case. Furthermore, vanishing cooperative error guarantees network-wide consensus tracking of the desired setpoint. Numerical studies on heterogeneous DC-motor networks with star, cyclic, and path communication topologies demonstrate accurate state and fault estimation, robust cooperative tracking, and resilience against disturbances and time-varying sensor faults. The proposed framework provides a scalable and robust coordination strategy for interconnected systems operating under sensing imperfections and uncertain environments.
comment: This paper is accepted for publication at the 15th Asian Control Conference (ASCC) 2026
ART for Diffusion Sampling: A Reinforcement Learning Approach to Timestep Schedule
We consider time discretization for score-based diffusion models to generate samples from a learned reverse-time dynamic on a finite grid. Uniform and hand-crafted grids can be suboptimal given a budget on the number of time steps. We introduce Adaptive Reparameterized Time (ART), which controls the clock speed of a reparameterized time variable to redistribute computation along the sampling trajectory while preserving the terminal time, with the objective of minimizing the aggregate Euler discretization error. We derive a randomized companion ART-RL that recasts ART as a continuous-time reinforcement learning problem with Gaussian policies, and prove a two-directional bridge between the two: the deterministic ART optimum lifts to an optimal Gaussian policy, and conversely any optimal Gaussian policy must recover the ART control through its mean. This bridge turns continuous-time actor-critic learning into a principled, rather than heuristic, route to the deterministic timestep optimum. Within the official EDM pipeline, ART-RL improves FID on CIFAR-10 across a wide range of budgets; after one-time offline training, the distilled deterministic schedule transfers without retraining to AFHQv2, FFHQ, and ImageNet at no extra inference cost.
comment: 25 pages, 8 figures, 5 tables
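The idea of redistributing a fixed step budget while preserving the terminal time can be pictured with a toy construction. The sketch below is not the ART method, which learns the control: it only normalizes a given positive clock-speed profile into a time grid whose endpoint is fixed. The function name and the example speeds are hypothetical.

```python
import numpy as np

def grid_from_clock_speed(speed, terminal_time):
    """Turn positive per-step clock speeds into a time grid ending exactly at terminal_time."""
    speed = np.asarray(speed, dtype=float)
    if np.any(speed <= 0):
        raise ValueError("clock speeds must be positive")
    # Step lengths are proportional to the speeds and sum to terminal_time.
    increments = terminal_time * speed / speed.sum()
    grid = np.concatenate([[0.0], np.cumsum(increments)])
    grid[-1] = terminal_time   # remove floating-point round-off at the endpoint
    return grid

uniform = grid_from_clock_speed([1, 1, 1, 1], terminal_time=1.0)
adapted = grid_from_clock_speed([1, 1, 2, 4], terminal_time=1.0)
print(uniform)
print(adapted)
```

Both grids use four steps and end at the same time. The second one simply spends short steps early and long steps late, which is the degree of freedom a learned schedule would exploit.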
Analytical PI Tuning for Second-Order Plants with Monotonic Response and Minimum Settling Time
This study presents two analytical closed-form PI controller tuning solutions for second-order plants with real poles, each achieving monotonic step response and minimum settling time. The first solution employs pole-zero cancellation, placing the controller zero at the slower plant pole and reducing the closed-loop dynamics to a critically damped second-order system. The second solution, applicable when the plant pole ratio is less than two, places all three closed-loop poles at a common location without cancelling any plant pole, yielding a closed-loop transfer function with a triple real pole and a zero. Despite retaining a closed-loop zero, this solution achieves strictly faster settling time than the pole-zero cancellation method in its region of applicability. The two solutions coincide at the boundary pole ratio of two and together form a continuous piecewise-analytical tuning covering the full range of plant pole ratios. This study further establishes that closed-loop transfer functions of the form a^n/(s + a)^n possess a maximum sensitivity Ms that is independent of the pole location a and depends solely on the order n, yielding universal robustness constants for each n. Numerical verification confirms the analytical results across multiple plant configurations.
comment: 4 figures
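The claim above that a closed-loop transfer function T(s) = a^n/(s + a)^n has a maximum sensitivity Ms depending only on n can be checked numerically. The sketch below evaluates S = 1 - T on a fixed logarithmic frequency grid for several pole locations. The grid bounds and the chosen values of `a` are assumptions made for illustration. For n = 2 the peak of |S| works out by hand to sqrt(4/3), which the code uses as a reference.

```python
import numpy as np

def max_sensitivity(a, n, num=400_001):
    """Peak of |S(jw)| for S = 1 - T with T(s) = a**n / (s + a)**n."""
    w = np.logspace(-6, 6, num)            # fixed frequency grid in rad/s
    T = a**n / (1j * w + a) ** n
    return np.max(np.abs(1.0 - T))

for n in (2, 3, 4):
    values = [max_sensitivity(a, n) for a in (0.1, 1.0, 10.0)]
    print(n, np.round(values, 6))

ms_n2 = max_sensitivity(1.0, 2)
print(ms_n2, np.sqrt(4.0 / 3.0))
```

The independence from a follows from the substitution s = a*x, which turns T into 1/(1 + x)^n, so changing a only rescales the frequency axis.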
Certifying Set Attractivity for Discrete-Time Uncertain Nonlinear Switched Systems
We introduce a new class of functions, called Attractivity Guarantee (AG)-functions, to certify the attractivity of sets for uncertain nonlinear switched systems in discrete time. The existence of an AG-function associated with a set guarantees the robust local attractivity of that set under the system dynamics. We propose a constructive method for obtaining piecewise-continuous AG-functions based on contractive sets for the system, and show that the existence of a robust control contractive set for the dynamics implies the existence of an appropriate AG-function, and hence the robust local attractivity of the set itself. We illustrate the proposed framework through examples that elucidate the theoretical concepts, and through the case study of a nonlinear switched system modelling antimicrobial resistance, which highlights the practical relevance of the approach to the analysis of biological systems.
Risk-Based PV-Rich Distribution System Planning Using Generative AI
Hosting capacity (HC) assessment plays a critical role in distribution system planning under increasing penetration of distributed energy resources (DERs) and associated uncertainties in load and generation. However, conventional approaches often rely on deterministic worst-case evaluation, leading to overly conservative HC estimates. This paper introduces a risk-based framework for HC assessment that explicitly accounts for the frequency, intensity, and duration of voltage violations under uncertain operating conditions. A generative AI-based approach is employed to generate realistic, time-correlated load demand scenarios conditioned on projected energy consumption growth levels. These scenarios are then used to assess voltage violations and quantify their risk using probabilistic intensity, duration, and frequency (IDF) metrics. The results show that extreme-percentile (zero-risk) approaches significantly underestimate PV-HC by treating all violations equally, regardless of their likelihood or persistence. For instance, allowing a 5% risk level increases HC by approximately 18% for a 15 min violation duration. The proposed approach provides a practical tool for risk-informed distribution system planning under uncertainty.
TEACar: An Open-Source Autonomous Driving Platform
Intelligent Transportation Systems (ITS) increasingly rely on vision-based perception and learning-based control, necessitating experimental platforms that support realistic hardware-in-the-loop validation. Small-scale platforms for autonomous racing offer a practical path to hardware validation, but often suffer from limited modularity, high integration complexity, or restricted extensibility. This paper presents TEACAR, a 1/14- to 1/16-scale autonomous driving platform designed with modular mechanical architecture, hardware abstraction, and ROS 2-based software. The system adopts a four-layer deck structure that physically decouples sensing, computation, actuation, and power subsystems, improving structural rigidity while simplifying reconfiguration. We constructed and comprehensively evaluated the prototype of TEACAR. Its mechanical stability, structural characteristics, and software performance were quantified based on three CNN-based steering controllers. Inference latency, power consumption, and system operating time were measured to evaluate computational capability and robustness. Our experiments demonstrated that TEACAR offers a scalable, modular, and cost-effective testbed for ITS research, education, and development. Our project repository is available on GitHub.
Goal-Oriented Sensor Reporting Scheduling for Non-linear Dynamic System Monitoring
Goal-oriented communication (GoC) is a form of semantic communication where the effectiveness of information transmission is measured by its impact on achieving the desired goal. In Internet-of-Things (IoT) networks, GoC can enable sensors to selectively transmit data relevant to intended goals of the receiver, thereby facilitating timely decision-making, reducing network congestion, and enhancing spectral efficiency. In this paper, we consider an IoT scenario where an edge node polls sensors monitoring the state of a non-linear dynamic system (NLDS) to respond to the queries of several clients. This work addresses the foregoing GoC problem and proposes a solution, which we term goal-oriented scheduling (GoS). The latter utilizes deep reinforcement learning (DRL) with a carefully designed action space, state space, and reward function. A long short-term memory network is used to estimate the inter-query duration and the corresponding estimation standard deviation. This allows the proposed DRL scheduler to make sound decisions even when no queries are posed, which later leads to a lower mean square error (MSE) of the query responses. Numerical analysis demonstrates that the proposed GoS obtains a smaller MSE compared to the benchmark scheduling methods while being of lower complexity. Moreover, this is attained without polling sensors during 77%-88% of the testing phase, which is beneficial in terms of energy efficiency.
Simple Trajectory Smoothing for UAV Reference Path Planning Based on Decoupling, Spatial Modeling and Linear Programming
A method for trajectory smoothing for UAV reference path planning is presented. It is derived based on the dynamics of a Dubins airplane model, and involves a decoupling step, spatial modeling and linear programming. The decoupling step enables algebraic control laws for flight-path angle and speed control. An optimization step is applied only for roll angle control, involving the solution of a small linear program. Two variations are discussed. They differ by reference centerline tracking and the introduction of a path shaping constraint. The benefit of natural dimensionality reduction for spatial modeling is discussed. The simplicity of the overall method is highlighted. An extension to aerobatic flight is outlined, which comes at the cost of a model approximation but maintains the general model structure. An extension of the method to tractor path planning along 3D terrain is discussed. The method is validated in simulations.
comment: 7 pages, 6 figures
ReasonSTL: Bridging Natural Language and Signal Temporal Logic via Tool-Augmented Process-Rewarded Learning
Signal Temporal Logic (STL) is an expressive formal language for specifying spatio-temporal requirements over real-valued, real-time signals. It has been widely used for the verification and synthesis of autonomous systems and cyber-physical systems. In practice, however, users often express their requirements in natural language rather than in structured STL formulas, making natural-language-to-STL translation a critical yet challenging task. Manual specification requires temporal-logic expertise and cannot scale, while prompting commercial LLM APIs incurs substantial token costs and may expose sensitive system requirements to third-party services, raising privacy concerns for industrial deployment. To address these challenges, we present ReasonSTL, a tool-augmented framework that adapts local open-source language models for natural-language-to-STL generation. ReasonSTL decomposes the translation process into explicit reasoning, deterministic tool calls, and structured formula construction. We further introduce process-rewarded training to supervise both tool-use trajectories and final formulas, together with STL-Bench, a bilingual, computation-aware benchmark grounded in real-world signals. Experiments show that a 4B model trained with ReasonSTL achieves state-of-the-art performance in both automatic metrics and human evaluations, demonstrating that ReasonSTL provides a transparent, low-cost, and privacy-preserving alternative for formal specification drafting.
Robotics
Multi-Robot Coordination in V2X Environments SC
This paper presents a Vehicle-to-Everything (V2X) communication framework that enables decentralized cooperation among social robots operating in complex urban traffic environments. Building on ETSI Cooperative Awareness and Maneuver Coordination services, the framework introduces two robot-centric facility-layer services: the Robot Awareness Service (RAS) and the Robot Maneuver Coordination Service (RMCS), realized through the Robot Awareness Message (RAM) and the Robot Maneuver Coordination Message (RMCM), respectively. RAS enables role-aware, task-oriented robot awareness while integrating externally detected Vulnerable Road Users (VRUs), including non-V2X pedestrians, into cooperative awareness. RMCS supports event-driven, low-latency coordination of robot maneuvers under explicitly established roles, without centralized infrastructure or prior pairing. A real-world proof of concept demonstrates deterministic multi-robot coordination between a humanoid robot and a quadrupedal robot assisting a pedestrian during a road-crossing scenario, governed by a formally specified finite-state coordination model. Complementary simulations evaluate robot-mediated VRU clustering in mixed V2X environments, showing that RAS-based clustering integrates non-V2X VRUs in safety-critical areas while reducing redundant transmissions from V2X-enabled VRUs, thereby lowering channel load. Together, the proposed services provide a scalable and standards-aligned foundation for integrating cooperative robots into future Connected, Cooperative, and Automated Mobility ecosystems.
comment: Accepted for publication at the IEEE Intelligent Transportation Systems Conference (ITSC), 2026
Cross-Modal Navigation with Multi-Agent Reinforcement Learning
Robust embodied navigation relies on complementary sensory cues. However, high-quality and well-aligned multi-modal data is often difficult to obtain in practice. Training a monolithic model is also challenging as rich multi-modal inputs induce complex representations and substantially enlarge the policy space. Cross-modal collaboration among lightweight modality-specialized agents offers a scalable paradigm. It enables flexible deployment and parallel execution, while preserving the strength of each modality. In this paper, we propose CRONA, a Multi-Agent Reinforcement Learning (MARL) framework for Cross-Modal Navigation. CRONA improves collaboration by leveraging control-relevant auxiliary beliefs and a centralized multi-modal critic with global state. Experiments on visual-acoustic navigation tasks show that multi-agent methods significantly improve performance and efficiency over single-agent baselines. We find that homogeneous collaboration with limited modalities is sufficient for short-range navigation under salient cues; heterogeneous collaboration among agents with complementary modalities is generally efficient and effective; and navigation in large, complex environments requires both richer multi-modal perception and increased model capacity.
ReActor: Reinforcement Learning for Physics-Aware Motion Retargeting SIGGRAPH 2026
Retargeting human kinematic reference motion onto a robot's morphology remains a formidable challenge. Existing methods often produce physical inconsistencies, such as foot sliding, self-collisions, or dynamically infeasible motions, which hinder downstream imitation learning. We propose a bilevel optimization framework that jointly adapts reference motions to a robot's morphology while training a tracking policy using reinforcement learning. To make the optimization tractable, we derive an approximate gradient for the upper-level loss. Our framework requires only a sparse set of semantic rigid-body correspondences and eliminates the need for manual tuning by identifying optimal values for a parameterization expressive enough to preserve characteristic motion across different embodiments. Moreover, by integrating retargeting directly with physics simulation, we produce physically plausible motions that facilitate robust imitation learning. We validate our method in simulation and on hardware, demonstrating challenging motions for morphologies that differ significantly from a human, including retargeting onto a quadruped.
comment: SIGGRAPH 2026
Lie Group Formulation of Recursive Dynamics Algorithms of Higher Order for Floating-Base Robots
In this paper, we describe procedures for computing higher-order time derivatives of the Lie-group Newton-Euler, Articulated-Body Inertia, and hybrid dynamics algorithms for floating-base trees, where the base configuration evolves on SE(3) and the attached mechanism is an open kinematic tree with configuration on the (n1+n2)-dimensional manifold T^{n1} \times R^{n2}, using spatial representation of twists. After presenting the algorithms, we collect the resulting recursions into closed-form equations of motion, identifying an admissible Coriolis matrix satisfying the passivity property, and showing that the articulated inertia tensor remains unchanged across all time derivatives. We then apply the developed methods to a 12-DoF aerial manipulator to derive analytical expressions for its geometric forward and inverse dynamics along with their first time derivatives, whereas numerical simulations successfully evaluate these dynamics up to fifth order. Finally, to demonstrate their practical utility, we benchmark the proposed extensions and show that, in the considered tests, their computational cost scales quadratically with the derivative order, whereas the automatic-differentiation baseline exhibits exponential scaling.
OA-WAM: Object-Addressable World Action Model for Robust Robot Manipulation
World Action Models (WAMs) enhance Vision-Language-Action policies by jointly predicting scene evolution and robot actions, but existing methods usually represent the predicted world as holistic images, video tokens, or global latents. These representations are difficult for an action decoder to address when an instruction refers to a particular object, especially under scene shifts where object identity is entangled with context. We propose OA-WAM, an Object-Addressable World Action Model for robust robot manipulation. OA-WAM decomposes each frame into N+1 slot states, with one robot slot and N object slots. Each slot contains a persistent address vector and a time-varying content vector, and is fused with text, image, proprioception, and past-action tokens in a block-causal sequence. A world head predicts next-frame slot states, while a flow-matching action head decodes a 16-step continuous action chunk in the same forward pass. Addressability is enforced by routing cross-slot attention through address-only keys and resetting the address slice at every transformer layer, separating which object to act on from what that object currently is without adding extra tokens. OA-WAM matches strong VLA and WAM baselines on LIBERO (97.8%) and SimplerEnv (79.3%), reaches state-of-the-art performance on the most relevant LIBERO-Plus geometric axes, and remains competitive on the seven-axis aggregate. A causal slot-intervention test yields a swap-binding cosine of 0.87, versus at most 0.09 for holistic baselines. These results suggest that addressable object states provide an effective interface for robust world-action modeling under scene perturbations.
GA3T: A Ground-Aerial Terrain Traversability Dataset for Heterogeneous Robot Teams in Unstructured Environments
Heterogeneous air-ground robot teams combine complementary sensing modalities, mobility characteristics, and spatial viewpoints that can significantly enhance perception in complex outdoor environments. However, progress in multi-robot collaborative perception has been constrained by the lack of real-world datasets featuring overlapping multi-modal observations from platforms operating in unstructured terrain. We present GA3T (Ground-Aerial Team for Terrain Traversal), a real-world multi-robot collaborative perception dataset collected using a Clearpath Husky UGV and an Autel EVO II UAV across diverse unstructured environments, including forest trails, rocky paths, muddy terrain, snow piles, and grass-covered fields. The ground platform provides 3D LiDAR, stereo camera, IMU, and GPS data, while the aerial platform contributes RGB imagery, thermal/infrared observations, and GPS from a complementary overhead viewpoint, allowing for rich cross-modal and cross-view perception. The dataset is collected in 4 unique environments, with over 13,000 synchronized frames across approximately 29 minutes of operation, and includes both SAM 3-based zero-shot segmentation and over 8,000 manually labeled images. A unique aspect of the dataset is its early-spring collection period, during which sparse tree canopies allow the aerial robot to partially observe the ground robot and terrain through the trees, allowing for occlusion-aware collaborative perception. Unlike prior multi-robot datasets that focus on SLAM or simulated cooperative driving, GA3T is specifically designed to support research on cross-view perception, air-ground viewpoint fusion, traversability estimation, and collaborative scene understanding in real off-road environments.
comment: For DARS 2026
TouchDrive: Electronics-Free Tactile Sensing Interface for Assistive Grasping ICRA 2026
Assistive robotic grasping plays an important role in enabling safe and adaptive manipulation of diverse objects. However, existing systems often rely on electronic sensing and multi-stage processing pipelines, increasing system complexity and reducing accessibility. To address these limitations, we present TouchDrive, a cost-effective, electronics-free tactile sensing interface for assistive grasping. TouchDrive directly converts contact forces into pneumatic feedback through valve-mediated switching, integrating sensing, signal generation, and feedback within a single passive mechanical loop. The system can be built from a normally closed pneumatic valve, a compressed air tank, a sensing element, and a haptic feedback actuator, with no electronics. By delivering tactile cues, TouchDrive empowers users to modulate grasp forces, enabling precise and robust manipulation of compliant and fragile objects. The interface has been validated across diverse robotic platforms, consistently demonstrating reliable performance and practical applicability in assistive grasping tasks, such as handling fruits and everyday items (up to 20 objects).
comment: Accepted at ICRA 2026 workshop on Visuo-Tactile Perception, Learning, Control for Manipulation: Embodied Tactile Intelligence in Predictive Perception, Learning & Control in Grasping & Manipulation, Emerging the Role of Embodiment and Visuo-Tactile-LLM Foundation Models (ICRA RoboTac 2026)
Reconstruction or Semantics? What Makes a Latent Space Useful for Robotic World Models
World model-based policy evaluation is a practical proxy for testing real-world robot control by rolling out candidate actions in action-conditioned video diffusion models. As these models increasingly adopt latent diffusion modeling (LDM), choosing the right latent space becomes critical. While the status quo uses autoencoding latent spaces like VAEs that are primarily trained for pixel reconstruction, recent work suggests benefits from pretrained encoders with representation-aligned semantic latent spaces. We systematically evaluate these latent spaces for action-conditioned LDM by comparing six reconstruction and semantic encoders to train world model variants under a fixed protocol on the BridgeV2 dataset, and show effective world model training in high-dimensional representation spaces with and without dimension compression. We then propose three axes to assess robotic world model performance: visual fidelity, planning and downstream policy performance, and latent representation quality. Our results show that visual fidelity alone is insufficient for world model selection. While reconstruction encoders like VAE and Cosmos achieve strong pixel-level scores, semantic encoders such as V-JEPA 2.1 (strongest overall on policy), Web-DINO, and SigLIP 2 generally excel across the other two axes at all model scales. Our study advocates semantic latent spaces as a stronger foundation for policy-relevant robotics diffusion world models.
comment: 9 pages
AssistDLO: Assistive Teleoperation for Deformable Linear Object Manipulation
Manipulating Deformable Linear Objects (DLOs) is challenging in robotics due to their infinite-dimensional configuration space and complex nonlinear dynamics. In teleoperation, depth uncertainty hinders state perception and reaction. AssistDLO addresses this challenge as an assistive teleoperation framework for DLO manipulation that combines real-time multi-view state estimation, visual assistance (VA), and a geometry-aware shared-autonomy controller based on Control Barrier Functions (SA-CBF). While traditional shared autonomy methods often rely on simple geometric attractors and may fail to preserve DLO geometry, SA-CBF acts as a geometry-aware funnel, facilitating precise grasping while preserving the operator's high-level authority. The framework is evaluated in a bimanual knot-untangling user study (N = 22) using ropes with varying length and rigidity. Results show that the effectiveness of the assistance depends strongly on operator expertise and DLO properties. SA-CBF provides the strongest gains for naive users, acting as a skill equalizer that increases task success from 71% to 88%, and is effective for stiffer ropes. Conversely, expert users prefer VA, and highly compliant, long ropes benefit more from visual support than localized action assistance. Ultimately, these findings demonstrate that effective DLO teleoperation cannot rely on a fixed strategy, highlighting the critical need for adaptive, user-aware, and material-aware shared autonomy.
comment: 20 pages, 14 figures. Submitted to a peer-reviewed journal
Toward Visually Realistic Simulation: A Benchmark for Evaluating Robot Manipulation in Simulation
Reliable simulation evaluation of robot manipulation policies serves as a high-fidelity proxy for real-world performance. Although existing benchmarks cover a wide range of task categories, they lack visual realism, creating a large domain gap between simulation and reality. This undermines the reliability of simulation-based evaluation in predicting real-world performance. To mitigate the sim-to-real visual gap, we conduct a systematic analysis to isolate the effects of lighting and material. Our results show that these factors play a critical role in geometric reasoning and spatial grounding, yet are largely overlooked in existing benchmarks. Motivated by the analysis, we propose VISER, a visually realistic benchmark for evaluating robot manipulation in simulation. VISER features a high-fidelity dataset of over 1,000 3D assets with physically-based rendering (PBR) materials, along with 3D scenes created from these assets through curated layouts or generation. To this end, we propose an automated pipeline leveraging Multi-modal Large Language Models (MLLMs) for material-aware part segmentation and material retrieval, enabling scalable generation of physically plausible assets. Building on the high-fidelity 3D asset dataset, we construct diverse evaluation tasks, such as grasping, placing, and long-horizon tasks, enabling scalable and reproducible assessment of Vision-Language-Action (VLA) models. Our benchmark shows a strong correlation between simulation and real-world performance, achieving an average Pearson correlation coefficient of 0.92 across different policies.
CKT-WAM: Parameter-Efficient Context Knowledge Transfer Between World Action Models
World action models (WAMs) provide a powerful generative framework for embodied control, yet transferring knowledge across heterogeneous WAMs remains challenging due to mismatched latent interfaces, high adaptation cost, and the rigidity of conventional distillation objectives. We propose CKT-WAM, a parameter-efficient Context Knowledge Transfer framework that transfers a teacher WAM's knowledge into a student WAM through a compact context in the text embedding space, rather than output imitation or dense hidden-state matching. Specifically, CKT-WAM extracts intermediate teacher hidden states, reduces the number of tokens via compressors' learnable-query cross attention (LQCA), and transforms them through an always-on generalized adapter, a lightweight router, and sparsely activated specialized adapters. The resulting context is then appended to the student's conditioning textual embeddings, thereby injecting the transferred knowledge into the student with minimal architectural modification. Experiments show that CKT-WAM consistently improves zero-shot generalization and achieves the best overall performance on LIBERO-Plus, reaching 86.1% total success rate with only 1.17% trainable parameters, while approaching full fine-tuning performance. Beyond simulation, CKT-WAM also demonstrates strong real-world long-horizon manipulation ability, achieving the best average success rate of 83.3% across four multi-step and long-horizon tasks. Code is available at https://github.com/YuhuaJiang2002/CKT-WAM.
Structure-Preserving Gaussian Processes Via Discrete Euler-Lagrange Equations
In this paper, we propose Lagrangian Gaussian Processes (LGPs) for probabilistic and data-efficient learning of dynamics via discrete forced Euler-Lagrange equations. Importantly, the geometric structure of the Lagrange-d'Alembert principle, which governs the motion of dynamical systems, is preserved by construction in the absence of external forces. This allows learning physically consistent models that overcome erroneous drift in the system's energy, thereby providing stable long-term predictions. At the core of our approach lie linear operators for Gaussian process conditioning, constructed from discrete forced Euler-Lagrange equations and variational discretization schemes. Thereby and unlike prior work, the method enables learning dynamics from discrete position snapshots, i.e., without access to a system's velocities or momenta. This is particularly relevant for a large class of practical scenarios where only position measurements are available, for instance, in motion capture or visual servoing applications. We demonstrate the data-efficiency and generalization capabilities of the LGPs in various synthetic and real-world case studies, including a real-world soft robot with hysteresis. The experimental results underscore that the LGPs learn physically consistent dynamics with uncertainty quantification solely from sparse positional data and enable stable long-term predictions.
comment: 30 pages
RobotEQ: Transitioning from Passive Intelligence to Active Intelligence in Embodied AI
Embodied AI is a prominent research topic in both academia and industry. Current research centers on completing tasks based on explicit user instructions. However, for robots to integrate into human society, they must understand which actions are permissible and which are prohibited, even without explicit commands. We refer to the user-guided AI as passive intelligence and the unguided AI as active intelligence. This paper introduces RobotEQ, the first benchmark for active intelligence, aiming to assess whether existing models can comprehend and adhere to social norms in embodied scenarios. First, we construct RobotEQ-Data, a dataset consisting of 1,900 egocentric images, spanning 10 representative embodied categories and 56 subcategories. Through extensive manual annotation, we provide 5,353 action judgment questions and 1,286 spatial grounding questions, specifying appropriate robot actions across diverse scenarios. Furthermore, we establish RobotEQ-Bench to evaluate the performance of state-of-the-art models on this task. Experimental results show that current models still fall short in achieving reliable active intelligence, particularly in spatial grounding. Meanwhile, we observe that leveraging RAG techniques to incorporate external social norm knowledge bases can generally enhance performance. This work can facilitate the transition of robotics from user-guided passive manipulation to active social compliance.
Proactive Instance Navigation with Comparative Judgment for Ambiguous User Queries
Natural-language instance navigation becomes challenging when the initial user request does not uniquely specify the target instance. A practical agent should reduce the user's burden by actively asking only the information needed to distinguish the target from similar distractors, rather than requiring a detailed description upfront. Existing approaches often fall short of this goal: they may stop at the first plausible candidate before sufficiently exploring alternatives, or, even after collecting multiple candidates, ask about the target's attributes derived from individual candidates rather than questions selected to distinguish candidates in the pool. As a result, despite the dialogue, the agent may still fail to distinguish the target from distractors, leading to premature decisions and lengthy user responses. We propose Proactive Instance Navigation with Comparative Judgment (ProCompNav), a two-stage framework that first constructs a candidate pool and then identifies the target through comparative judgment. At each round, ProCompNav extracts an attribute-value pair that splits the current pool, asks a binary yes/no question, and prunes all inconsistent candidates at once. This reframes disambiguation from open-ended target description to pool-level discriminative questioning, where each question is chosen to narrow the candidate set. On CoIN-Bench, ProCompNav improves Success Rate over interactive baselines with the same minimal input and non-interactive baselines with detailed descriptions, while substantially reducing Response Length. ProCompNav also achieves state-of-the-art Success Rate on TextNav, suggesting that comparative judgment is broadly useful for instance-level navigation among similar distractors.
comment: 17 pages, 6 figures
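The pool-level questioning loop described in the ProCompNav abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the dictionary-based candidates, the `pick_splitting_pair` heuristic (prefer the attribute-value pair that splits the pool most evenly), and the `oracle` callback standing in for the user are all hypothetical. The abstract states only that each question must split the current pool and that all inconsistent candidates are pruned at once.

```python
from typing import Callable, Dict, List, Optional, Tuple

Candidate = Dict[str, str]  # hypothetical representation: attribute name -> value


def pick_splitting_pair(pool: List[Candidate]) -> Optional[Tuple[str, str]]:
    """Return the (attribute, value) pair whose yes/no split is most balanced.

    Assumption: balance is used as the selection criterion for illustration.
    """
    best, best_gap = None, None
    pairs = sorted({(a, v) for c in pool for a, v in c.items()})
    for attr, val in pairs:
        yes = sum(1 for c in pool if c.get(attr) == val)
        no = len(pool) - yes
        if yes == 0 or no == 0:
            continue  # this pair does not split the pool, so asking is useless
        gap = abs(yes - no)
        if best_gap is None or gap < best_gap:
            best, best_gap = (attr, val), gap
    return best


def disambiguate(
    pool: List[Candidate], oracle: Callable[[str, str], bool]
) -> Tuple[List[Candidate], int]:
    """Ask binary questions until one candidate remains or nothing splits the pool."""
    questions = 0
    while len(pool) > 1:
        pair = pick_splitting_pair(pool)
        if pair is None:
            break  # remaining candidates are indistinguishable by known attributes
        attr, val = pair
        answer = oracle(attr, val)  # "Is the target's <attr> equal to <val>?"
        questions += 1
        # Prune every candidate inconsistent with the answer in one pass.
        pool = [c for c in pool if (c.get(attr) == val) == answer]
    return pool, questions
```

With a balanced split, each yes/no answer removes roughly half of the pool, which is one way to read the reported reduction in user response length.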
When to Trust Imagination: Adaptive Action Execution for World Action Models
World Action Models (WAMs) have recently emerged as a promising paradigm for robotic manipulation by jointly predicting future visual observations and future actions. However, current WAMs typically execute a fixed number of predicted actions after each model inference, leaving the robot blind to whether the imagined future remains consistent with the actual physical rollout. In this work, we formulate adaptive WAM execution as a future-reality verification problem: the robot should execute longer when the WAM-predicted future remains reliable, and replan earlier when reality deviates from imagination. To this end, we propose Future Forward Dynamics Causal Attention (FFDC), a lightweight verifier that jointly reasons over predicted future actions, predicted visual dynamics, real observations, and language instructions to estimate whether the remaining action rollout can still be trusted. FFDC enables adaptive action chunk sizes as an emergent consequence of prediction-observation consistency, preserving the efficiency of long-horizon execution while restoring responsiveness in contact-rich or difficult phases. We further introduce Mixture-of-Horizon Training to improve long-horizon trajectory coverage for adaptive execution. Experiments on the RoboTwin benchmark and in the real world demonstrate that our method achieves a strong robustness-efficiency trade-off: on RoboTwin, it reduces WAM forward passes by 69.10% and execution time by 34.02%, while improving success rate by 2.54% over the short-chunk baseline; in real-world experiments, it improves success rate by 35%.
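The execute-longer-or-replan-earlier behavior can be sketched as a simple control loop. The sketch below is a toy under stated assumptions, not FFDC itself: the state is a scalar, the learned verifier is replaced by a fixed tolerance on the gap between the predicted and observed next state, and `plan`, `predict`, and `step` are hypothetical callbacks.

```python
from typing import Callable, List, Tuple


def execute_adaptively(
    plan: Callable[[float], List[float]],
    predict: Callable[[float, float], float],
    step: Callable[[float, float, int], float],
    state: float,
    horizon: int,
    tol: float = 0.1,
) -> Tuple[float, int]:
    """Run `horizon` environment steps; return (final_state, planner_calls)."""
    calls, t = 0, 0
    while t < horizon:
        chunk = plan(state)  # one model inference yields a chunk of actions
        calls += 1
        for action in chunk:
            if t >= horizon:
                break
            expected = predict(state, action)  # imagined next state
            state = step(state, action, t)     # actual next state
            t += 1
            if abs(expected - state) > tol:
                # Imagination and reality disagree: discard the rest of the
                # chunk and replan from the newly observed state.
                break
    return state, calls
```

When predictions stay consistent, the whole chunk is executed and the number of planner calls matches fixed long-chunk execution; a single mismatch costs one extra call instead of forcing short chunks everywhere.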
EA-WM: Event-Aware Generative World Model with Structured Kinematic-to-Visual Action Fields
Pretrained video diffusion models provide powerful spatiotemporal generative priors, making them a natural foundation for robotic world models. While recent world-action models jointly optimize future videos and actions, they predominantly treat video generation as an auxiliary representation for policy learning. Consequently, they insufficiently explore the inverse problem: leveraging action signals to guide video synthesis, thereby often failing to preserve precise robot spatial geometry and fine-grained robot-object interaction dynamics in the generated rollouts. To bridge this gap, we present EA-WM, an Event-Aware Generative World Model that effectively closes the loop between kinematic control and visual perception. Rather than injecting joint or end-effector actions as abstract, low-dimensional tokens, EA-WM projects actions and kinematic states directly into the target camera view as Structured Kinematic-to-Visual Action Fields. To fully exploit this geometrically grounded representation, we introduce event-aware bidirectional fusion blocks that modulate cross-branch attention, capturing object state changes and interaction dynamics. Evaluated on the comprehensive WorldArena benchmark, EA-WM achieves state-of-the-art performance, outperforming existing baselines by a significant margin.
comment: Preprint. 22 pages, 10 figures
VLA-GSE: Boosting Parameter-Efficient Fine-Tuning in VLA with Generalized and Specialized Experts
Vision-language-action (VLA) models inherit rich visual-semantic priors from pre-trained vision-language backbones, but adapting them to robotic control remains challenging. Full fine-tuning (FFT) is prone to overfitting on downstream robotic data and catastrophic forgetting of pretrained vision-language capabilities. Parameter-efficient fine-tuning (PEFT) better preserves pre-trained knowledge, yet existing PEFT methods still struggle to adapt effectively to robot control tasks. To address this gap, we propose VLA-GSE, a parameter-efficient VLA fine-tuning framework that improves control adaptation while retaining PEFT's knowledge preservation advantage. Specifically, VLA-GSE (Generalized and Specialized Experts) is initialized by spectrally decomposing the frozen backbone, assigning leading singular components to generalized experts (shared experts) and disjoint residual components to specialized experts (routed experts). This decomposition improves adaptation capacity under a fixed trainable-parameter budget. Under a comparable parameter budget, VLA-GSE updates only 2.51% of the full model parameters and consistently outperforms strong FFT and PEFT baselines. It achieves 81.2% average zero-shot success on LIBERO-Plus, preserves pre-trained VLM capability comparably to LoRA on multimodal understanding benchmarks, and improves real-world manipulation success under multiple distribution shifts. Code is available at: https://github.com/YuhuaJiang2002/VLA-GSE
CredibleDFGO: Differentiable Factor Graph Optimization with Credibility Supervision
Global navigation satellite system (GNSS) positioning is widely used for urban navigation, but the covariance reported by the GNSS solver is often unreliable in urban canyons. Existing differentiable factor graph optimization (DFGO) methods already learn measurement weighting through the solver, but they still use position-only objectives. As a result, the mean estimate may improve while the reported covariance remains too small, too large, or wrong in shape. In this work, we propose CredibleDFGO (CDFGO), a differentiable GNSS factor graph framework that makes covariance credibility an explicit training target. The Weighting Generation Network (WGN) predicts per-satellite reliability weights. The differentiable Gauss-Newton solver maps these weights to a position estimate and posterior covariance, and proper scoring rules supervise the East-North predictive distribution end-to-end. We study negative log-likelihood (NLL), Energy Score (ES), and their combination. Results on three UrbanNav test scenes show consistent gains in uncertainty credibility. Positioning accuracy also improves on the medium-urban and harsh-urban scenes, and the mean horizontal error and 95th-percentile error improve on the deep-urban scene. On the harsh-urban Mong Kok (MK) scene, CDFGO-Combined reduces the mean horizontal error from 13.77 m to 11.68 m, reduces NLL from 40.63 to 6.59, and reduces ES from 12.31 to 9.05. The case studies link the MK improvement to better axis-wise consistency, more credible local covariance ellipses, and satellite-level reweighting.
comment: Submitted to NAVIGATION: Journal of the Institute of Navigation
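To see why a position-only objective cannot detect a covariance that is too small, it helps to evaluate the Gaussian negative log-likelihood directly. The sketch below is a generic illustration and is not taken from the paper: it assumes a zero-mean 2-D Gaussian over the East-North error, and the function name and example numbers are hypothetical.

```python
import math
from typing import Tuple

Cov2 = Tuple[Tuple[float, float], Tuple[float, float]]


def gaussian_nll_2d(err_e: float, err_n: float, cov: Cov2) -> float:
    """NLL of an East-North error under N(0, cov), with cov = ((a, b), (b, c))."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    if a <= 0.0 or det <= 0.0:
        raise ValueError("covariance must be positive definite")
    # Squared Mahalanobis distance using the closed-form 2x2 inverse.
    maha = (c * err_e ** 2 - 2.0 * b * err_e * err_n + a * err_n ** 2) / det
    # 0.5 * maha + 0.5 * log|cov| + (d / 2) * log(2 * pi), with d = 2.
    return 0.5 * (maha + math.log(det)) + math.log(2.0 * math.pi)


# Same 5 m horizontal error (3 m East, 4 m North), two reported covariances.
overconfident = gaussian_nll_2d(3.0, 4.0, ((1.0, 0.0), (0.0, 1.0)))
calibrated = gaussian_nll_2d(3.0, 4.0, ((9.0, 0.0), (0.0, 16.0)))
```

Both cases have identical position error, yet the overconfident covariance receives a much larger NLL, which is the kind of signal a credibility-aware training target can exploit.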
Monitoring autonomous persistent surveillance missions using invariance ICRA 2026
This paper studies runtime monitoring for persistent surveillance by autonomous robots when the autonomy stack is a black box. The environment is partitioned into finitely many parts, each carrying an uncertainty state that decreases when observed and increases otherwise. We model the closed loop as a state-dependent hybrid system with linear parameter varying dynamics and design a monitor based on an invariant computed offline. As this invariant is typically hard to obtain for large to-be-surveyed spaces, we propose a compositional monitor obtained by decentralized computation of low-dimensional invariant sets for each uncertainty region, and checking their conjunction online. Under common independence assumptions, the compositional monitor is sound and complete with respect to the full-system invariant. The approach is applied in a case study with a real robot persistently monitoring a labyrinth, emphasizing its applicability in practice.
comment: Accepted at IEEE ICRA 2026
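The compositional idea, checking one low-dimensional invariant per region and taking their conjunction online, can be sketched as follows. This is a deliberately simplified toy and not the paper's construction: the multiplicative growth and decay rates, the interval-shaped invariant `[0, bound]`, and all names are hypothetical stand-ins for the invariant sets that the paper computes offline for a hybrid system.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Region:
    growth: float   # hypothetical factor (> 1) applied when the region is unobserved
    decay: float    # hypothetical factor (< 1) applied when the region is observed
    bound: float    # hypothetical invariant: uncertainty must stay in [0, bound]
    u: float = 1.0  # current uncertainty state

    def update(self, observed: bool) -> None:
        self.u *= self.decay if observed else self.growth

    def invariant_holds(self) -> bool:
        return 0.0 <= self.u <= self.bound


def monitor(regions: Dict[str, Region], visits: List[str]) -> Optional[int]:
    """Return the first step at which some region leaves its invariant, else None."""
    for step_idx, visited in enumerate(visits):
        for name, region in regions.items():
            region.update(observed=(name == visited))
        # Online check: conjunction of the per-region invariants.
        if not all(r.invariant_holds() for r in regions.values()):
            return step_idx
    return None
```

The monitor only sees which region the black-box autonomy stack visits at each step, so it needs no access to the planner's internals.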
Accurate Trajectory Tracking with MPCC for Flapping-Wing MAVs
Flapping-wing micro aerial vehicles offer quieter and safer operation than rotary-wing drones, yet achieving precise autonomous control of bird-scale ornithopters remains challenging: lift, airspeed, and turning authority are tightly coupled and governed by only a few control inputs. Conventional cascaded controllers treat altitude, speed, and heading independently, producing persistent tracking errors during complex maneuvers, while time-parameterized trajectory tracking requires predefined speed profiles that existing methods cannot robustly produce for these coupled dynamics. We address both limitations simultaneously with a Model Predictive Contouring Control (MPCC) approach that tracks arc-length-parameterized trajectories while optimizing progress online, eliminating the need for predefined timing. However, MPCC requires a dynamical model that captures the coupled aerodynamics without exceeding the computational budget of real-time nonlinear optimization. Here, we propose a compact, continuously differentiable model that captures the dominant couplings of bird-scale ornithopters, enabling real-time predictive control. We validated the method with the XFly ornithopter flying along circular and three-dimensional racing trajectories and achieved a mean deviation from the reference trajectory between 6.5 and 9 cm at speeds up to 3 m/s, which represents an almost 10-fold improvement over prior ornithopter control methods.
comment: 7 pages, 6 figures
Plug-and-Play Label Map Diffusion for Universal Goal-Oriented Navigation ICML 2026
In embodied vision, Goal-Oriented Navigation (GON) requires robots to locate a specific goal within an unexplored environment. The primary challenge of GON arises from the need to construct a Bird's-Eye-View (BEV) map to understand the environment while simultaneously localizing an unobserved goal. Existing map-based methods typically employ self-centered semantic maps, often facing challenges such as reliance on complete maps or inconsistent semantic association. To this end, we propose Plug-and-Play Label Map Diffusion (PLMD), which defines a novel map completion diffusion model based on Denoising Diffusion Probabilistic Models (DDPM). PLMD generates obstacle and semantic labels for unobserved regions through a diffusion-based completion process, thereby enabling goal localization even in partially observed environments. Moreover, it mitigates inconsistent semantic association by leveraging structural consistency between known and unknown obstacle layouts and integrating obstacle priors into the semantic denoising process. By substituting predicted labels for unobserved regions, robots can accurately localize the specified objects. Extensive experiments demonstrate that PLMD (I) effectively expands the region of unknown maps, (II) integrates seamlessly into existing navigation strategies that rely on semantic maps, and (III) achieves state-of-the-art performance on three GON tasks.
comment: 21 pages, 10 figures, Extended Version of accepted ICML 2026 Paper
DexSynRefine: Synthesizing and Refining Human-Object Interaction Motion for Physically Feasible Dexterous Robot Actions
Learning dexterous manipulation from human-object interaction (HOI) data is a scalable alternative to teleoperation, but HOI demonstrations are sparse and provide only kinematic motion that is not directly executable under embodiment mismatch and contact-rich dynamics. We present DexSynRefine, a framework with three coupled components: HOI-MMFP, a task- and object-initial-state-conditioned extension of motion manifold primitives that synthesizes coordinated hand-object trajectories from sparse HOI demonstrations; a task-space residual RL policy that physically grounds the synthesized reference while inheriting its kinematic structure; and a contact-and-dynamics adaptation module that enables sim-to-real transfer from proprioceptive history. Across five dexterous manipulation tasks spanning pick-and-place, tool use, and object reorientation, our task-space residual policy outperforms prior action-representation baselines in simulations and transfers to a real robot on all five tasks, improving over kinematic retargeting by 50-70 percentage points.
comment: Project page: https://dexsynrefine.github.io/
Generating Roadside LiDAR Datasets from Vehicle-Side Datasets via Novel View Synthesis
Intelligent Transportation Systems (ITS) require reliable environmental perception to support safe and efficient transportation. With the rapid development of Vehicle-to-everything (V2X), roadside perception has become an effective means to extend sensing coverage and improve traffic safety. However, the scarcity of large-scale annotated roadside LiDAR datasets poses a major challenge for training high-performance roadside perception models. In this paper, we introduce Vehicle-to-Roadside LiDAR Synthesis (VRS), a data synthesis framework that generates labeled roadside LiDAR datasets from vehicle-side datasets via LiDAR novel view synthesis. To mitigate the vehicle-to-roadside domain gap, VRS employs vehicle point cloud completion to compensate for missing geometry in vehicle-side observations, and introduces an occupancy-based visibility constraint to handle large viewpoint changes during cross-view rendering. The proposed framework enables flexible multi-view rendering for scalable roadside data generation. Extensive experiments on roadside 3D object detection demonstrate that the synthesized data effectively complements real roadside data, mitigates the limitations of limited real-world roadside data, and improves generalization to unseen roadside viewpoints.
Cycle-resolved Cephalopod-Inspired Pulsed-Jet Robot With High-Volume Expulsion and Drag-Reduced Gliding
Cephalopod pulsed-jet locomotion is not a single isolated expulsion event, but a coordinated cycle involving jet expulsion, passive gliding, and mantle refilling. Inspired by this cycle-resolved biological strategy, this paper presents a cephalopod-inspired pulsed-jet robot with a rigid-soft hybrid origami mantle that enables large, actively driven, and geometry-guided body deformation. The proposed mantle integrates rigid folding panels with a compliant silicone framework, allowing a 75% effective cavity-volume reduction during expulsion and reducing the projected cross-sectional drag area by approximately 75.7% in the contracted gliding configuration. Using this platform, we formulate a cycle-resolved framework to separately investigate how expelled volume, glide duration, and refill pathway influence whole-cycle locomotion performance. Experiments show that the robot reaches a peak speed of approximately 0.5 m/s (3.8 BL/s) and an average speed exceeding 0.2 m/s (1.5 BL/s) within the first jetting cycle. The results further demonstrate the roles of high expelled-volume-ratio contraction in speed generation, reduced-drag-area gliding under different glide durations, and mantle-aperture-inspired passive inlet valves in assisting refill. This work provides both a robotic implementation of actively deformable cephalopod-like jet propulsion and a unified experimental platform for studying expulsion-gliding-refilling dynamics in pulsed-jet locomotion.
comment: This work has been submitted to the IEEE for possible publication
A Comparative Study of INDI and NDI with Nonlinear Disturbance Observer for Aerial Robotics
This work presents a simulation-based comparative robustness analysis of Incremental Nonlinear Dynamic Inversion (INDI) and Nonlinear Dynamic Inversion augmented with a nonlinear disturbance observer (NDI+NDO) for fully actuated aerial robots. A systematic simulation campaign across representative operating scenarios is conducted, in which we compare tracking performance, robustness, and control effort under parametric variations, external disturbances, and measurement noise. Results show that INDI demonstrates stronger robustness in several model-mismatch and combined-stress cases, while NDI+NDO primarily matches nominal performance but exhibits greater sensitivity under several non-ideal conditions. These findings provide practical guidance on the relative strengths and limitations of incremental and observer-based inversion strategies for aerial robotic applications.
Resource-Constrained Robotic Planning in the face of Mixed Uncertainty
Robots operate under significant uncertainty, from quantifiable noise to unquantifiable unknowns, and must account for strict operational constraints, such as limited resources. In this paper, we consider the problem of synthesizing robust strategies to guide a robot's actions in fulfilling a given task, while ensuring the system never exhausts its resources. To solve this problem, we first model the robotic system as a Consumption Markov Decision Process with Set-valued Transitions (CMDPST), a unified framework modelling nondeterministic actions, quantifiable and unquantifiable uncertainty, and resource consumption. Then, we combine the CMDPST with the task specification, expressed as a Linear Temporal Logic over finite traces (LTLf) formula. Lastly, we address the resource-constrained optimal robust strategy synthesis problem, which aims to synthesize a strategy that maximizes the probability of satisfying the LTLf objective without resource exhaustion. Our solution involves two techniques: a direct unrolling-based method and a more efficient, optimized approach that leverages state-space pruning for better performance. Experiments on a warehouse transportation network show the effectiveness of the proposed solutions.
MaMi-HOI: Harmonizing Global Kinematics and Local Geometry for Human-Object Interaction Generation
Generating realistic 3D Human-Object Interactions (HOI) is a fundamental task for applications ranging from embodied AI to virtual content creation, which requires harmonizing high-level semantic intent with strict low-level physical constraints. Existing methods excel at semantic alignment; however, they struggle to maintain precise object contact. We reveal a key finding termed \textit{Geometric Forgetting}: as diffusion model depth increases, semantic features tend to overshadow object geometry features, causing the model to lose its perception of object geometry. To address this, we propose MaMi-HOI, a hierarchical framework reconciling \textbf{Ma}cro-level kinematic fluidity with \textbf{Mi}cro-level spatial precision. First, to counteract geometric forgetting, we introduce the Geometry-Aware Proximity Adapter (GAPA), which explicitly re-injects dense object details to perform residual snapping corrections for precise contact. Nevertheless, such aggressive local enforcement can disrupt global dynamics, leading to robotic stiffness. In response, we introduce the Kinematic Harmony Adapter (KHA), which proactively aligns whole-body posture with spatial objectives, ensuring the skeleton actively accommodates constraints without compromising naturalness. Extensive experiments validate that MaMi-HOI simultaneously achieves natural motion and precise contact. Crucially, it extends generation capabilities to long-term tasks with complex trajectories, effectively bridging the gap between global navigation and high-fidelity manipulation in 3D scenes. Code is available at https://github.com/DON738110198/MaMi-HOI.git
TriRelVLA: Triadic Relational Structure for Generalizable Embodied Manipulation
Vision-language-action (VLA) models perform well on training-seen robotic tasks but struggle to generalize to unseen scenes and objects. A key limitation lies in their implicit visual representations, which entangle object appearance, background, and scene layout. This makes policies sensitive to visual variations. Prior work improves transferability through structured intermediate representations that objectify visual content. However, these representations mainly capture scene semantics instead of action-relevant relations. As a result, action prediction remains tied to appearance statistics. We observe that manipulation actions depend on the object-hand-task relational structure, which governs interactions among task requirements, robot states, and object properties. Based on this observation, we propose TriRelVLA, a triadic relational VLA framework for generalizable embodied manipulation. Our approach consists of three components: 1) We construct explicit object-hand-task triadic representations from multimodal inputs as relational primitives. 2) We build a task-grounded relational graph. Task-guided cross-attention forms nodes, and a relation-aware graph transformer models interactions among them. 3) We perform relation-conditioned action generation. The relational structure is compressed into a bottleneck space and projected into the LLM for action prediction. This triadic relational bottleneck reduces reliance on appearance statistics and enables transfer across scenes, objects, and task compositions. We further introduce a real-world robotic dataset for fine-tuning. Experiments show strong performance on fine-tuned tasks and clear gains in cross-scene, cross-object, and cross-task generalization.
On the Emergence of Pendular Structure in Multi-Contact Locomotion
LIPM is everywhere in legged-locomotion control, but almost always as a modeling choice rather than as something the controller's cost actually prefers. This note tries to make that link more explicit. Working from a small centroidal OCP that penalizes the rate of angular momentum, we look at what its optimum tends to look like. Three things come out. With full-rank stance, the optimum drifts toward a pendular force pattern at a rate determined by the SVD of the moment Jacobian; the constant is set by foot-span geometry and matches the experiments to within 16%. With N=2 stance, as in trot, the friction cone introduces a lower bound on $\|\dot{H}_G\|$ that no amount of weight tuning fixes; we also see a non-smooth feasibility kink at a critical horizontal acceleration that we can write in closed form. Adding a task term that asks for a nonzero $\dot{H}_G$ moves the optimum off the pendular set in a predictable way. None of this is far from the classical ZMP/DCM picture. We test these claims on a point-mass quadruped and on the Unitree Go1 in MuJoCo (open-loop QP and a torque-level closed-loop controller), and we note where the asymptotic story stops being a good description of what the closed loop actually does.
Leveraging Image Generators to Address Training Data Scarcity: The Gen4Regen Dataset for Forest Regeneration Mapping
Sustainable forest management relies on precise species composition mapping, yet traditional ground surveys are labour-intensive and geographically constrained. While Uncrewed Aerial Vehicles (UAVs) offer scalable data collection, the transition to deep learning-based interpretation is bottlenecked by the severe scarcity of expert-annotated imagery, particularly in complex, visually heterogeneous regeneration zones. This paper addresses the dual challenges of data scarcity and extreme class imbalance in the semantic segmentation of fine-grained forest regeneration species by providing a scalable framework that reduces reliance on manual photo-interpretation for high-resolution, millimetre-level aerial imagery. Importantly, we leverage the large-scale vision-language Nano Banana Pro model to simultaneously generate high-fidelity images and their corresponding pixel-aligned semantic masks from prompts. We introduce WilDReF-Q-V2, an expansion of a natural forest dataset with 13 977 new unlabelled and 50 labelled real images, as well as the Gen4Regen dataset, featuring 2101 pairs of synthetic images and semantic masks. Our methodology integrates real-world data with AI-generated images, highlighting that AI-generated data is highly complementary to real-world data, with unified training yielding an F1 score improvement of over 15 %pt compared to purely supervised baselines. Furthermore, we demonstrate that even small quantities of prompt-generated data significantly improve performance for underrepresented species, some of which saw per-species F1 score gains of up to 30 %pt. We conclude that vision-language models can serve as agile data generators, effectively bootstrapping perception tasks for niche AI domains where expert labels are scarce or unavailable. Our datasets, source code, and models will be available at https://norlab-ulaval.github.io/gen4regen.
comment: 36 pages, 17 figures
Maximal Controlled Invariant-MPC: Enhancing Feasibility and Reducing Conservatism through Terminal CBF Constraint in Safety-Critical Control
Optimal control for safety-critical systems is often dependent on the conservativeness of constraints. Control Barrier Functions (CBFs) serve as a medium to represent such constraints, but constructing a minimally conservative CBF is a computationally intractable problem. Therefore, approaches that can guarantee safety while reducing conservatism will help improve the optimality of the system under consideration. Here, we present a Model Predictive Control (MPC) formulation using CBF as a terminal constraint, which is proven to improve feasibility and reachable sets with increasing prediction horizon. The constructive nature of the proofs allows for warm-starting the nonlinear optimization problem, thereby reducing the computational time substantially. Simulations are set up for a simple nonholonomic system to numerically validate the results, and it is observed that the number of infeasible points decreased by a factor of 1.7 to 2.7. The increase in reachable state space was demonstrated by the ability of the system to track trajectories that are entirely inside the unsafe region of the control barrier function.
comment: Under review
Adaptive Q-Chunking for Offline-to-Online Reinforcement Learning
Offline-to-online reinforcement learning with action chunking eliminates multi-step off-policy bias and enables temporally coherent exploration, but all existing methods use a fixed chunk size across every state. This is suboptimal: near contact events the agent needs short chunks for reactive control, while during free-space motion long chunks provide better credit assignment. The natural solution is to train critics for several chunk sizes and select the best one at each state, but naive comparison of learned critic values systematically collapses to the shortest chunk due to discount-scale mismatch, and degrades to noise in low-value states. We propose Adaptive Q-Chunking (AQC), which resolves both failures by comparing the advantage of each chunk size relative to a per-horizon baseline, normalized by the discount factor. This criterion converts biased wrong answers into unbiased near-random choices when no genuine signal exists, and becomes discriminative when a particular scale enables better planning. We prove theoretical bounds on the advantage selector's noise immunity and on the value dominance of adaptive chunking over any fixed chunk size. We demonstrate that AQC achieves state-of-the-art offline and online success rates on OGBench and Robomimic, and can be applied to enhance the performance of large-scale VLA models that predict action sequences, significantly boosting performance on RoboCasa-GR1 tasks.
Real-world Latency Analysis of Vehicular Visible Light Communication with Multiple LED Transmitters and an Event-Based Camera
Event cameras offer high temporal resolution, low latency, and wide dynamic range, making them promising receivers for visible light communication (VLC) in vehicle-to-everything (V2X) applications. This work presents an event-camera-based VLC system addressing three key challenges: bandwidth saturation, multi-transmitter reception, and latency characterization. We adopt a positive-event-only mode and design a protocol that suppresses event generation while maintaining communication distance and a wide field of view. We also propose a method to identify multiple transmitters and demonstrate simultaneous reception from up to three LEDs. Finally, we evaluate end-to-end latency in real vehicular scenarios and show that the system meets cooperative perception requirements. These results demonstrate that event-camera-based VLC is a feasible complement to existing V2X technologies (e.g., RF).
comment: 5 pages, IEEE VTC2026-Spring
Dr-BA: Separable Optimization for Direct Radar Bundle Adjustment & Localization
This paper introduces Dr-BA, a first-of-its-kind radar bundle adjustment (BA) framework that operates directly on 2D spinning radar intensity images. Unlike camera or lidar sensors, radar is largely unaffected by precipitation, making it a critical modality for autonomous systems that require all-weather robustness. Existing state estimation approaches using spinning radar typically extract sparse point clouds from range-azimuth-intensity measurements and apply point cloud alignment techniques to estimate vehicle motion, scene structure, or to localize within an existing map. In contrast, Dr-BA uses the full radar returns from multiple scans to jointly estimate dense maps and sensor poses. By formulating the problem as a separable optimization, we derive an efficient and general solution that decouples pose estimation from mapping. In addition to solving the BA problem, this formulation naturally extends to direct radar-only localization (DRL) within a previously built map. Dr-BA achieves state-of-the-art radar-based BA and cross-session localization performance, demonstrated on more than 200 km of on-road data across five distinct routes. Our implementation is publicly available at https://github.com/utiasASRL/dr_ba.
comment: Accepted for presentation at RSS 2026
Learning Material-Aware Hamiltonian Risk Fields for Safe Navigation
Risk-aware navigation should be selective: a policy should expose evasive degrees of freedom only when the local scene admits a lower-risk feasible maneuver, and suppress them when no safer alternative exists. We show that adding one context-energy term to a port-Hamiltonian navigation policy produces a learned force channel with exactly this falsifiable signature. When the local risk field contains a feasible lower-risk direction, the induced context force activates toward it; when the apparent escape is blocked or not yet available, a route-aware gate suppresses lateral force rather than hallucinating an unsafe maneuver. A CVaR tail-risk objective focuses gradient updates on rare but consequential risk transitions. We validate the selectivity signature across four settings. In the primary delayed-required-escape benchmark, route-aware CVaR reduces premature force activation from 0.950 to 0.180 versus DWA while raising success from 0.480 to 0.810 with zero replans. On real off-road terrain (RELLIS-3D), route-aware enrichment achieves correct activation rate 0.837 and false activation rate 0.114, compared to 0.378/0.752 for scalar risk gradients. On static semantic maps (DFC2018), enrichment reduces catastrophic failure from 0.60 to 0.10 and oscillation by 90.7% while preserving path efficiency. In highway traffic, collisions drop from 100% to 0% when a lane escape is feasible; when no escape exists, the policy suppresses the lateral maneuver. The selectivity property follows from the gradient structure of the context energy rather than from training-time tuning.
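The CVaR tail-risk objective mentioned in the abstract above can be illustrated with a minimal sketch. It computes the empirical CVaR, the mean of the worst (1 - alpha) fraction of sampled losses. The function name, the alpha level, and the loss values are hypothetical and are not taken from the paper; the paper's actual training objective may differ.

```python
import math

def cvar(losses, alpha=0.9):
    """Empirical CVaR: mean of the worst (1 - alpha) fraction of losses."""
    if not losses:
        raise ValueError("losses must be non-empty")
    ordered = sorted(losses, reverse=True)  # worst (largest) losses first
    # Round before ceil so float error in (1 - alpha) * n cannot add a sample.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    tail = ordered[:k]
    return sum(tail) / len(tail)

# Hypothetical per-episode risk values: mostly benign, two rare severe events.
losses = [0.1] * 18 + [5.0, 9.0]
mean_loss = sum(losses) / len(losses)
tail_loss = cvar(losses, alpha=0.9)
print(f"mean loss = {mean_loss:.3f}, CVaR(0.9) = {tail_loss:.3f}")
```

The mean is dominated by the benign episodes, while CVaR is determined only by the two severe ones, which is why a CVaR objective concentrates updates on rare but consequential events.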
Intention assimilation control for accurate tracking with variable impedance in teleoperation
Robot systems for teleoperation commonly use a spring-like force pulling the follower robot towards the leader's position to track their movements. With this control strategy, the tracking accuracy deteriorates when the follower's stiffness is low, but high stiffness poses a danger to objects or people in the follower robot's environment. To address this trade-off between tracking accuracy and safety, we propose an alternative intention assimilation control (IAC) strategy where the robot's tracking accuracy can be ensured without high stiffness. Different from traditional approaches, which transmit the leader's current position to the follower, this new controller estimates the leader's target position and transmits it to the follower. With this strategy, the follower impedance can be changed on-the-fly to continuously reflect the user's desired impedance or modulated automatically to fulfill the task requirements. Our controller was validated on two 7 degree-of-freedom manipulators, yielding high tracking accuracy with varying impedance. Four experiments were conducted to compare teleoperation with IAC to tele-impedance control (TIC) during free tracking, interaction with a balloon, peg insertion, and table polishing with force feedback. The results show that IAC increases tracking accuracy, improves task completion rate and reduces completion time. IAC enables the robot to accurately replicate the user's movement while giving them freedom to modulate the impedance according to their intention, providing an unprecedented level of control of the follower's position and its impedance during unilateral and bilateral teleoperation.
AirBender: Adaptive Transportation of Bendable Objects Using Dual UAVs
The interaction of robots with bendable objects in midair presents significant challenges in control, often resulting in performance degradation and potential crashes, especially for aerial robots due to their limited actuation capabilities and constant need to remain airborne. This paper presents an adaptive controller that enables two aerial vehicles to collaboratively follow a trajectory while transporting a bendable object without relying on explicit elasticity models. Our method allows on-the-fly adaptation to the object's unknown deformable properties, ensuring stability and performance in trajectory-tracking tasks. We use Lyapunov analysis to demonstrate that our adaptive controller is asymptotically stable. Our method is evaluated through hardware experiments in various scenarios, demonstrating the capabilities of using multirotor aerial vehicles to handle bendable objects.
The Cost of Consensus: Malignant Epistemic Herding and Adaptive Gating in Distributed Multi-Agent Search
Distributed agents in real-world settings frequently must coordinate under uncertainty with only partial observations. Coordination is necessary to share beliefs to aid in task completion, but communication costs bandwidth, introduces latency, and if done poorly, can degrade collective reasoning. This tension is especially acute in bandwidth-constrained deployments such as distributed sensing networks, autonomous reconnaissance, and collaborative cyber defense, where excessive transmission carries direct operational costs. Existing work has focused on multi-agent exploration and communication strategies, but not on how communication frequency and content jointly shape the collective belief state. Central to this challenge is the degree to which agents maintain compatible internal beliefs about the environment, a property we term \textit{epistemic alignment}. When agents share beliefs effectively, they converge on correct hypotheses; when communication is poorly designed, agents may converge confidently on wrong ones. We formalize this distinction and show it is not detectable from coordination metrics alone such as Jensen-Shannon Divergence or rate to consensus.
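The claim that consensus metrics alone cannot detect confident convergence on a wrong hypothesis can be illustrated with a small sketch. It computes the Jensen-Shannon divergence (base 2) between two agents' belief vectors. The belief values and the choice of cell 0 as the true target are hypothetical and are not drawn from the paper.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute zero."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

TRUE_CELL = 0  # hypothetical ground-truth target location

# Team A: perfect agreement, but on the wrong cell.
team_a = ([0.05, 0.90, 0.05], [0.05, 0.90, 0.05])
# Team B: some disagreement, yet both favor the correct cell.
team_b = ([0.60, 0.30, 0.10], [0.50, 0.20, 0.30])

for name, (b1, b2) in (("A", team_a), ("B", team_b)):
    mass_on_truth = (b1[TRUE_CELL] + b2[TRUE_CELL]) / 2
    print(f"team {name}: JSD = {jsd(b1, b2):.4f}, mean mass on truth = {mass_on_truth:.2f}")
```

Team A scores a divergence of zero while placing almost no probability on the true cell, so a low divergence by itself says nothing about correctness.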
Traffic Scenario Orchestration from Language via Constraint Satisfaction ICRA 2026
Autonomous vehicles (AVs) require extensive testing in simulation, but test case generation for driving scenarios is laborious. The desired scenarios are often out-of-distribution and have precise requirements on interactions with the AV policy under test. Manually programming scenarios allows for precise controllability but is difficult to scale. On the other hand, statistical models can leverage compute and data, but struggle with precise controllability when out-of-distribution. We cast scenario orchestration as a constraint-solving problem and present a language-in, simulation-out scenario orchestrator for closed-loop testing of AVs. Our approach leverages foundation model reasoning to translate general, natural language descriptions into a set of constraints as a scenario representation. This then allows us to leverage off-the-shelf solvers to solve for actor behaviors which meet precise testing intentions in closed-loop. Under a benchmark of carefully crafted and diverse scenario descriptions, our approach greatly outperforms our baselines in orchestration success rate. We further show that our closed-loop approach is especially important for scenarios which require ego-reactive specifications.
comment: 19 pages, 10 figures; full version of paper accepted for poster presentation at ICRA 2026
CARMEN: CORDIC-Accelerated Resource-Efficient Multi-Precision Inference Engine for Deep Learning
This paper presents CARMEN, a runtime-adaptive, CORDIC-accelerated multi-precision vector engine for resource-efficient deep learning inference. The key insight is that CORDIC iteration depth directly governs computational accuracy, enabling dynamic switching between approximate and accurate execution modes without hardware modification. The architecture integrates a low-resource iterative CORDIC-based MAC unit with a time-multiplexed multi-activation function block, supporting flexible 8/16-bit precision and high hardware utilization. ASIC implementation in 28 nm CMOS achieves up to 33% reduction in computation cycles and 21% power savings per MAC stage; a 256-PE configuration delivers 4.83 TOPS/mm² compute density and 11.67 TOPS/W energy efficiency. FPGA deployment on PynqZ2 validates 154.6 ms latency at 0.43 W for real-time object detection.
comment: Under Review (VDAT 2026)
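The statement that CORDIC iteration depth governs accuracy can be seen in a minimal floating-point sketch of linear-mode CORDIC multiplication. This is the textbook shift-and-add scheme, assumed here for illustration; the abstract does not specify which CORDIC mode or number format CARMEN uses, and the function name and operands are hypothetical.

```python
def cordic_multiply(x, z, iterations):
    """Linear-mode CORDIC: approximate x * z for |z| < 2 with shift-and-add steps.

    Each iteration drives the residual z toward zero by +/- 2**-i and
    accumulates the matching +/- x * 2**-i into y, so the residual (and thus
    the error) is bounded by 2**-(iterations - 1) times |x|.
    """
    y = 0.0
    for i in range(iterations):
        step = 2.0 ** -i              # a right shift by i bits in hardware
        d = 1.0 if z >= 0 else -1.0   # direction that reduces |z|
        y += d * x * step
        z -= d * step
    return y

x, z = 0.75, 0.6  # hypothetical operands
exact = x * z
for n in (4, 8, 16):
    approx = cordic_multiply(x, z, n)
    print(f"{n:2d} iterations: {approx:.6f} (abs error {abs(approx - exact):.2e})")
```

Running fewer iterations gives a cheaper, coarser product, and running more gives a more accurate one on the same datapath. This is the trade-off the abstract describes as switching between approximate and accurate modes.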
Bi3: A Biplatform, Bicultural, Biperson Dataset for Social Robot Navigation ICRA 2026
We contribute Bi3, a dataset of social robot navigation among groups of people in a constrained lab space. Compared to prior data collection efforts for social robot navigation, our dataset is unique in that it features: an original experiment design giving rise to close navigation encounters between two humans and a robot; five different navigation algorithms; two different robot platforms; a diverse participant pool of 74 people recruited from two sites in the USA and France; multimodal data streams including 10.5 hours of human and robot ground-truth motion tracks, RGB video, and user impressions over robot performance. Our analysis of the collected dataset through metrics like interaction density and human velocity suggests that Bi3 represents a benchmark of unique diversity and modeling complexity. Bi3 contributes towards understanding how humans and robots can productively mesh their activities in constrained environments, and can be a resource for training models of human motion prediction and robot control policies for navigation in densely crowded spaces.
comment: ICRA 2026
Randomness is sometimes necessary for coordination
Full parameter sharing is standard in cooperative multi-agent reinforcement learning (MARL) for homogeneous agents. Under permutation-symmetric observations, however, a shared deterministic policy outputs identical action distributions for every agent, making role differentiation impossible. This failure can theoretically be resolved using symmetry breaking among anonymous identical processors, which requires randomness. We propose Diamond Attention, a cross-attention architecture in which each agent samples a scalar random number per timestep, inducing a transient rank ordering that masks lower-ranked peers from agent-to-agent attention while leaving task attention fully unmasked. This realizes a random-bit coordination protocol in a single broadcast round, and the set-based attention enables zero-shot deployment to teams of different sizes. We evaluate across three regimes that isolate when structured randomness matters. On the perfectly symmetric XOR game, our method achieves $1.0$ success while all deterministic baselines plateau near $0.5$. On control coordination tasks, a policy trained on $N=4$ generalizes zero-shot to $N \in [2,8]$. On SMACLite cross-scenario transfer, we achieve zero-shot transfer where standard baselines cannot transfer due to structural limitations. Furthermore, replacing the structured mask with standard dropout-based randomness results in a 0% win rate, confirming that protocol-space structure, not stochastic noise, is the operative ingredient. https://anonymous.4open.science/r/randomness-137A/
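The rank-ordering idea described above can be sketched in a few lines. Each agent draws one random scalar per timestep, and an agent may attend only to peers whose draw is at least as large as its own. This is an illustrative reading of the abstract, not the authors' implementation; the function name, the convention that an agent can always attend to itself, and the treatment of higher draws as higher rank are assumptions.

```python
import random

def rank_mask(num_agents, rng):
    """Build an agent-to-agent attention mask from per-agent random draws.

    mask[i][j] is True when agent i may attend to peer j, i.e. when j's draw
    is at least i's draw (so lower-ranked peers are masked out). Allowing
    self-attention (i == j) is an assumption made for this sketch.
    """
    draws = [rng.random() for _ in range(num_agents)]
    mask = [[draws[j] >= draws[i] for j in range(num_agents)]
            for i in range(num_agents)]
    return draws, mask

draws, mask = rank_mask(4, random.Random(0))
for i, row in enumerate(mask):
    visible = [j for j, allowed in enumerate(row) if allowed]
    print(f"agent {i}: draw = {draws[i]:.3f}, attends to {visible}")
```

Because the draws are distinct with probability one, every agent sees a different set of peers, so identical network weights no longer force identical outputs. The mask is resampled each timestep, which makes the ordering transient.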
An Aerial Manipulator for Perception-Driven Flower Targeting Toward Contactless Pollination in Vertical Farming
The decline of natural pollinators has created a major challenge for crop production in controlled indoor agriculture, particularly in vertical farming environments where natural insect pollination is absent. This motivates the development of robotic systems capable of performing precise flower targeting tasks while minimizing physical interference with delicate floral structures. This paper presents an aerial manipulator platform for perception-driven flower detection, localization, and approach in vertical farming environments. The proposed system integrates onboard RGB-D-based perception, model predictive path integral (MPPI) based unmanned aerial vehicle (UAV) control on a PX4 platform, and a lightweight 2-DoF manipulator for precise end-effector positioning. The platform is evaluated in both MuJoCo simulation and UAV lab experiments using a flower targeting testbed. The experimental results demonstrate stable UAV flight, reliable flower localization, and centimeter-level end-effector positioning accuracy. In simulation, the proposed controller achieves consistent trajectory convergence and accurate target alignment. In the real-world UAV lab environment, the integrated perception-control-manipulation framework enables stable flower-targeted positioning and end-effector alignment under constrained aerial operation. These results validate the proposed aerial manipulator as a robust robotic carrier and positioning framework for future contactless pollination systems. While the current study focuses on perception-guided targeting and positioning, the developed platform provides a practical foundation for integrating advanced contactless end effectors, including acoustic-based pollen manipulation modules, in future work.
comment: This paper has been accepted for publication in the Proceedings of the 2026 4th International Conference on Robotics, Control and Vision Engineering (RCVE 2026), 10-12 July, 2026, Tokyo, Japan
R$^3$L: Reasoning 3D Layouts from Relative Spatial Relations ICML 2026
Relative spatial relations provide a compact representation of spatial structure and are fundamental to relative spatial reasoning in 3D layout generation. Recent works leverage Multimodal Large Language Models (MLLMs) to infer such relations, but the inferred relations are often unreliable and are typically handled with post-hoc heuristics. In this paper, we propose R$^3$L, a general framework that improves the reliability and consistency of relative spatial reasoning for 3D layout generation. Our key motivation is that multi-hop reasoning requires repeated reference-frame transformations, which accumulate errors in inferred relations and lead to semantic and metric drift. To mitigate this, we propose invariant spatial decomposition to break coupled relation chains, and consistent spatial imagination to promote self-consistency through an imagine-and-revise loop. We further introduce supportive spatial optimization to ease pose optimization via global-to-local coordinate re-parameterization. Extensive experiments across diverse scene types and instructions demonstrate that R$^3$L produces more physically feasible and semantically consistent layouts. Notably, our analysis shows that resolving frame-induced inconsistencies is crucial for reliable multi-hop relative spatial reasoning. The code is available at https://github.com/Neal2020GitHub/R3L.
comment: ICML 2026
HumanNet: Scaling Human-centric Video Learning to One Million Hours
Progress in embodied intelligence increasingly depends on scalable data infrastructure. While vision and language have scaled with internet corpora, learning physical interaction remains constrained by the lack of large, diverse, and richly annotated human activity data. We present HumanNet, a one-million-hour human-centric video corpus that captures how humans interact with the physical world at scale. HumanNet spans both first-person and third-person perspectives and covers fine-grained activities, human-object interactions, tool use, and long-horizon behaviors across diverse real-world environments. Beyond raw video, the dataset provides interaction-centric annotations, including captions, motion descriptions, and hand and body-related signals, enabling motion-aware and interaction-aware learning. Beyond scale, HumanNet introduces a systematic data curation paradigm for embodied learning, where human-centric filtering, temporal structuring, viewpoint diversity, and annotation enrichment are treated as first-class design principles. This design transforms unstructured internet video into a scalable substrate for representation learning, activity understanding, motion generation, and human-to-robot transfer. We conduct a first-step validation on the value of this design through controlled vision-language-action ablation: under a fixed set of validation data, continued training from the Qwen VLM model with 1000 hours of egocentric video drawn from HumanNet surpasses the continued training with 100 hours of real-robot data from Magic Cobot, indicating that egocentric human video could be a scalable and cost-effective substitute for robot data. By building this project, we aim to explore the opportunity to scale embodied foundation models using human-centric videos, rather than relying solely on robot-specific data.
comment: Github: https://github.com/DAGroup-PKU/HumanNet Project website: https://dagroup-pku.github.io/HumanNet/
Flexible Agent Alignment with Goal Inference from Open-Ended Dialog
We introduce Open-Universe Assistance Games (OU-AGs), a formal framework extending assistance games to LLM-based agents. Effective assistance requires reasoning over human preferences that are unbounded, underspecified, and evolving. Current LLM agents struggle in multi-turn interactions and with maintaining accurate models of user intent in collaborative settings. Existing assistance game formulations assume fixed, predefined preferences, an assumption that breaks down in open-ended dialogue where goals are revised incrementally and expressed in natural language. Grounded in cognitive science accounts of preference construction, we represent human preferences as a dynamically updated distribution over discrete natural-language goals. To operationalize OU-AGs, we introduce GOOD (GOals from Open-ended Dialogue), a data-efficient online method that extracts and ranks candidate goals during interaction, using LLM-simulated users to perform probabilistic inference over goal hypotheses. This allows for interpretable, uncertainty-aware preference representations without large offline datasets. We evaluate GOOD across three text-based domains: grocery shopping, household robotics (AI2-THOR), and coding. Compared to baselines without explicit goal tracking, GOOD produces semantically coherent goal representations and improves alignment with user intent across domains.
comment: Previous version of the paper was titled: Open-Universe Assistance Games
SwarmCoDe: A Scalable Co-Design Framework for Heterogeneous Robot Swarms via Dynamic Speciation
Robot swarms offer inherent robustness and the capacity to execute complex, collaborative tasks surpassing the capabilities of single-agent systems. Co-designing these systems is critical, as marginal improvements in individual performance or unit cost compound significantly at scale. However, under traditional frameworks, this scale renders co-design intractable due to exponentially large, non-intuitive design spaces. To address this, we propose SwarmCoDe, a novel Collaborative Co-Evolutionary Algorithm (CCEA) that utilizes dynamic speciation to automatically scale swarm heterogeneity to match task complexity. Inspired by biological signaling mechanisms for inter-species cooperation, the algorithm uses evolved genetic tags and a selectivity gene to facilitate the emergent identification of symbiotically beneficial partners without predefined species boundaries. Additionally, an evolved dominance gene dictates the relative swarm composition, decoupling the physical swarm size from the evolutionary population. We apply SwarmCoDe to simultaneously optimize task planning and hardware morphology under fabrication budgets, successfully evolving specialized swarms of up to 200 agents -- four times the size of the evolutionary population. This framework provides a scalable, computationally viable pathway for the holistic co-design of large-scale, heterogeneous robot swarms.
comment: 8 pages, 9 figures
Approximation-Free Control Barrier Functions for Prescribed-Time Reach-Avoid of Unknown Systems
We study the prescribed-time reach-avoid (PT-RA) control problem for nonlinear systems with unknown dynamics operating in environments with moving obstacles. Unlike robust or learning-based Control Barrier Function (CBF) methods, the proposed framework requires neither online model learning nor uncertainty bound estimation. A CBF-based Quadratic Program (CBF-QP) is solved on a simple virtual system to generate a safe reference satisfying PT-RA conditions with respect to time-varying, tightened obstacle and goal sets. The true system is confined to a Virtual Confinement Zone (VCZ) around this reference using an approximation-free feedback law. This construction guarantees real-time safety and prescribed-time target reachability under unknown dynamics and dynamic constraints without explicit model identification or offline precomputation. Simulation results illustrate reliable dynamic obstacle avoidance and timely convergence to the target set.
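As a rough illustration of the CBF-QP step only, the sketch below filters a nominal velocity for a single-integrator virtual system around one moving circular obstacle. The barrier h(x, t) = ||x - c(t)||^2 - r^2, the linear class-K gain `alpha`, and the function name `cbf_qp_filter` are all assumptions made for this example; the paper's tightened sets, prescribed-time conditions, and VCZ feedback law are not reproduced. With a single affine constraint the QP has a closed-form projection, so no solver is needed.

```python
def cbf_qp_filter(x, u_nom, obs_center, obs_vel, obs_radius, alpha=1.0):
    """Minimally modify u_nom so that dh/dt + alpha * h >= 0 (hypothetical helper).

    Assumes a single-integrator virtual system x_dot = u and the barrier
    h(x, t) = ||x - c(t)||^2 - r^2 for an obstacle with center c(t) and radius r.
    """
    d = [xi - ci for xi, ci in zip(x, obs_center)]
    h = sum(di * di for di in d) - obs_radius ** 2
    # dh/dt = 2 d . (u - c_dot), so the constraint reads a . u + b >= 0.
    a = [2.0 * di for di in d]
    b = -sum(ai * vi for ai, vi in zip(a, obs_vel)) + alpha * h
    slack = sum(ai * ui for ai, ui in zip(a, u_nom)) + b
    if slack >= 0.0:
        return list(u_nom)  # nominal input already satisfies the constraint
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        return list(u_nom)  # degenerate case: state exactly at the obstacle center
    # Closed-form solution of min ||u - u_nom||^2 s.t. a . u + b >= 0:
    # project u_nom onto the constraint boundary.
    scale = slack / norm_sq
    return [ui - scale * ai for ui, ai in zip(u_nom, a)]
```

For example, with the state at the origin, a static unit-radius obstacle centered at (2, 0), and a nominal velocity of (1, 0), the filter slows the approach until the constraint holds with equality.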
LaST-R1: Reinforcing Robotic Manipulation via Adaptive Physical Latent Reasoning
Robotic foundation models require reasoning over complex visual scenes to execute adaptive actions in dynamic environments. While recent studies on latent-reasoning Vision-Language-Action (VLA) models have demonstrated the capability to capture fine-grained physical dynamics, they remain predominantly confined to static imitation learning, severely limiting their adaptability and generalization. In this paper, we present LaST-R1, a novel reinforcement learning (RL) post-training framework designed to effectively harness "latent reasoning-before-acting" policies. Specifically, we propose Latent-to-Action Policy Optimization (LAPO), a core RL algorithm that jointly optimizes the latent reasoning process and the action generation. By explicitly embedding latent Chain-of-Thought (CoT) reasoning directly within the RL optimization loop, LAPO stimulates profound physical world modeling, which in turn drives robust execution in interactive environments. Furthermore, an adaptive latent CoT mechanism is introduced, allowing the policy to dynamically modulate its reasoning horizon based on diverse environment states. Experiments show that LaST-R1 achieves a near-perfect 99.9% average success rate on the LIBERO benchmark with only one-shot supervised warm-up, significantly improving convergence speed and performance over prior state-of-the-art (SOTA) methods. In real-world deployments, LaST-R1 yields up to a 22.5% average improvement over a SOTA supervised fine-tuning approach across four complex tasks, including both single-arm and dual-arm settings. Finally, LaST-R1 demonstrates strong generalization across simulated and real-world environments.
AsyncVLA: Asynchronous Flow Matching for Vision-Language-Action Models
Vision-language-action (VLA) models have recently emerged as a powerful paradigm for building generalist robots. However, traditional VLA models that generate actions through flow matching (FM) typically rely on rigid and uniform time schedules, i.e., synchronous FM (SFM). Without action context awareness and asynchronous self-correction, SFM becomes unstable in long-horizon tasks, where a single action error can cascade into failure. In this work, we propose asynchronous flow matching VLA (AsyncVLA), a novel framework that introduces temporal flexibility in asynchronous FM (AFM) and enables self-correction in action generation. AsyncVLA breaks from the vanilla SFM in VLA models by generating the action tokens in a non-uniform time schedule with action context awareness. Besides, our method introduces the confidence rater to extract confidence of the initially generated actions, enabling the model to selectively refine inaccurate action tokens before execution. Moreover, we propose a unified training procedure for SFM and AFM that endows a single model with both modes, improving KV-cache utilization. Extensive experiments on robotic manipulation benchmarks demonstrate that AsyncVLA is data-efficient and exhibits self-correction ability. AsyncVLA outperforms existing methods across both simulation and real-world evaluations. Our code is available at https://github.com/YuhuaJiang2002/AsyncVLA.
MARVL: Multi-Stage Guidance for Robotic Manipulation via Vision-Language Models
Designing dense reward functions is pivotal for efficient robotic Reinforcement Learning (RL). However, most dense rewards rely on manual engineering, which fundamentally limits the scalability and automation of reinforcement learning. While Vision-Language Models (VLMs) offer a promising path to reward design, naive VLM rewards often misalign with task progress, struggle with spatial grounding, and show limited understanding of task semantics. To address these issues, we propose MARVL (Multi-stAge guidance for Robotic manipulation via Vision-Language models). MARVL fine-tunes a VLM for spatial and semantic consistency and decomposes tasks into multi-stage subtasks with task direction projection for trajectory sensitivity. Empirically, MARVL significantly outperforms existing VLM-reward methods on the Meta-World benchmark, demonstrating superior sample efficiency and robustness on sparse-reward manipulation tasks.
Spectral Alignment in Forward-Backward Representations via Temporal Abstraction
Forward-backward (FB) representations provide a powerful framework for learning the successor representation (SR) in continuous spaces by enforcing a low-rank factorization. However, a fundamental spectral mismatch often exists between the high-rank transition dynamics of continuous environments and the low-rank bottleneck of the FB architecture, making accurate low-rank representation learning difficult. In this work, we analyze temporal abstraction as a mechanism to mitigate this mismatch. By characterizing the spectral properties of the transition operator, we show that temporal abstraction acts analogously to a low-pass filter that suppresses high-frequency spectral components. This suppression reduces the effective rank of the induced SR while preserving a formal bound on the resulting value function error. Empirically, we show that this alignment is a key factor for stable FB learning, particularly at high discount factors where bootstrapping becomes error-prone. Our results identify temporal abstraction as a principled mechanism for shaping the spectral structure of the underlying MDP and enabling effective long-horizon representations in continuous control.
Visibility-Aware Mobile Grasping in Dynamic Environments
This paper addresses the problem of mobile grasping in dynamic, unknown environments where a robot must operate under a limited field-of-view. The fundamental challenge is the inherent trade-off between "seeing" around to reduce environmental uncertainty and "moving" the body to achieve task progress in a high-dimensional configuration space, subject to visibility constraints. Previous approaches often assume known or static environments and decouple these objectives, failing to guarantee safety when unobserved dynamic obstacles intersect the robot's path during manipulation. In this paper, we propose a unified mobile grasping system comprising two core components: (1) an iterative low-level whole-body planner coupled with velocity-aware active perception to navigate dynamic environments safely; and (2) a hierarchical high-level planner based on behavior trees that adaptively generates subgoals to guide the robot through exploration and runtime failures. We provide experimental results across 400 randomized simulation scenarios and real-world deployment on a Fetch mobile manipulator. Results show that our system achieves success rates of 68.8% and 58.0% in unknown static and dynamic environments, respectively, improving on the \nam approach by 22.8% and 18.0%, with improved collision safety.
Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising
We propose X-WAM, a Unified 4D World Model that unifies real-time robotic action execution and high-fidelity 4D world synthesis (video + 3D reconstruction) in a single framework, addressing the critical limitations of prior unified world models (e.g., UWM) that only model 2D pixel-space and fail to balance action efficiency and world modeling quality. To leverage the strong visual priors of pretrained video diffusion models, X-WAM imagines the future world by predicting multi-view RGB-D videos, and obtains spatial information efficiently through a lightweight structural adaptation: replicating the final few blocks of the pretrained Diffusion Transformer into a dedicated depth prediction branch for the reconstruction of future spatial information. Moreover, we propose Asynchronous Noise Sampling (ANS) to jointly optimize generation quality and action decoding efficiency. ANS applies a specialized asynchronous denoising schedule during inference, which rapidly decodes actions with fewer steps to enable efficient real-time execution, while dedicating the full sequence of steps to generate high-fidelity video. Rather than entirely decoupling the timesteps during training, ANS samples from their joint distribution to align with the inference distribution. Pretrained on over 5,800 hours of robotic data, X-WAM achieves 79.2% and 90.7% average success rate on RoboCasa and RoboTwin 2.0 benchmarks, while producing high-fidelity 4D reconstruction and generation surpassing existing methods in both visual and geometric metrics.
comment: Project website: https://sharinka0715.github.io/X-WAM/
An Efficient Insect-inspired Approach for Visual Point-goal Navigation
In this work we develop a novel insect-inspired model for visual point-goal navigation. This combines abstracted models of two insect brain structures that have been implicated, respectively, in associative learning and path integration. We draw an analogy between the formal benchmark of the Habitat point-goal navigation task and the ability of insects to discover, learn, and refine visually guided paths around obstacles between a discovered food location and their nest. We demonstrate that the simple insect-inspired model exhibits performance comparable to recent state-of-the-art models at many orders of magnitude less computational cost. Testing in a more realistic simulated environment shows the approach is robust to perturbations.
comment: This work has been submitted to the IEEE for possible publication
Information Filtering via Variational Regularization for Robot Manipulation
Diffusion-based visuomotor policies built on 3D visual representations have achieved strong performance in learning complex robotic skills. However, most existing methods employ an oversized denoising decoder. While increasing model capacity can improve denoising, empirical evidence suggests that it also introduces redundancy and noise in intermediate feature blocks. Crucially, we find that randomly masking backbone features in U-Net or skipping intermediate layers in DiT at inference time (without changing training) can improve performance, confirming the presence of task-irrelevant noise in intermediate features. To this end, we propose Variational Regularization (VR), a plug-and-play module that imposes a context-conditioned Gaussian over the noisy features and applies a KL-divergence regularizer, forming an adaptive information bottleneck. Extensive experiments on three simulation benchmarks, RoboTwin2.0, Adroit, and MetaWorld, show that our approach consistently improves task success rates over the baseline for both DP3-UNet and DP3-DiT, achieving new state-of-the-art results. Real-world experiments further demonstrate that our method performs well in practical deployments.
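To make the KL-regularized bottleneck concrete, the sketch below computes the closed-form KL divergence between a diagonal Gaussian over features and a standard normal, plus the usual reparameterized sample. The standard-normal reference distribution, the weighting `beta`, and all function names are assumptions for illustration; the abstract does not specify the reference distribution or how the context conditions the Gaussian.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over feature dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def sample_features(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def regularized_loss(task_loss, mu, log_var, beta=1e-3):
    """Task loss plus a beta-weighted KL penalty (beta is a hypothetical setting)."""
    return task_loss + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.3, -0.1], [0.0, -1.0]
    z = sample_features(mu, log_var, rng)
    print(len(z), regularized_loss(1.0, mu, log_var))
```

The penalty is zero only when the feature distribution matches the reference, so a larger `beta` pushes more feature dimensions toward carrying no information.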
Continually Evolving Skill Knowledge in Vision Language Action Model
Vision-language-action (VLA) models show promising knowledge accumulation ability from pretraining, yet continual learning in VLA remains challenging, especially for efficient adaptation. Existing continual imitation learning (CIL) methods often rely on additional parameters or external modules, limiting scalability for large VLA models. We propose Stellar VLA, a knowledge-driven CIL framework without increasing network parameters. Two progressively extended variants are designed: T-Stellar for flat task-centric modeling and TS-Stellar for hierarchical task-skill structure. Stellar VLA enables self-evolving knowledge learning by jointly optimizing task representations and a learned knowledge space. We propose a knowledge-guided expert routing mechanism conditioned on knowledge relation and Top-K semantic embeddings, enabling task specialization without increasing model size. Experiments on the LIBERO benchmark show that Stellar VLAs achieve strong performance among both VLA and CIL baselines, using only 1% data replay. Real-world evaluation on a dual-arm platform with distinct embodiment and scene configurations validates effective knowledge transfer. TS-Stellar excels in hierarchical manipulation, and visualizations reveal robust knowledge retention and task discovery. Project Website: https://stellarvla.github.io/
Vibration Damping in Underactuated Cable-suspended Artwork -- Flying Belt Motion Control
This paper presents a comprehensive refurbishment of the interactive robotic art installation Standards and Double Standards by Rafael Lozano-Hemmer. The installation features an array of belts suspended from the ceiling, each actuated by stepper motors and dynamically oriented by a vision-based tracking system that follows the movements of exhibition visitors. The original system was limited by oscillatory dynamics, resulting in torsional and pendulum-like vibrations that constrained rotational speed and reduced interactive responsiveness. To address these challenges, the refurbishment involved significant upgrades to both hardware and motion control algorithms. A detailed mathematical model of the flying belt system was developed to accurately capture its dynamic behavior, providing a foundation for advanced control design. An input shaping method, formulated as a convex optimization problem, was implemented to effectively suppress vibrations, enabling smoother and faster belt movements. Experimental results demonstrate substantial improvements in system performance and audience interaction. This work exemplifies the integration of robotics, control engineering, and interactive art, offering new solutions to technical challenges in real-time motion control and vibration damping for large-scale kinetic installations.
comment: 10 pages, 10 figures
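For readers unfamiliar with input shaping, the sketch below implements the classic two-impulse zero-vibration (ZV) shaper for a single oscillatory mode. This is a swapped-in textbook technique, not the convex-optimization formulation described in the abstract, and the function names and the single-mode assumption are made up for this example.

```python
import math


def zv_shaper(omega_n, zeta):
    """Two-impulse ZV shaper as [(time, amplitude), ...] for one mode.

    omega_n: undamped natural frequency in rad/s; zeta: damping ratio in [0, 1).
    """
    if not 0.0 <= zeta < 1.0:
        raise ValueError("zeta must satisfy 0 <= zeta < 1")
    root = math.sqrt(1.0 - zeta ** 2)
    k = math.exp(-zeta * math.pi / root)
    half_period = math.pi / (omega_n * root)  # half the damped period
    return [(0.0, 1.0 / (1.0 + k)), (half_period, k / (1.0 + k))]


def residual_vibration(impulses, omega_n, zeta):
    """Residual vibration amplitude of an impulse sequence, relative to a unit impulse."""
    omega_d = omega_n * math.sqrt(1.0 - zeta ** 2)
    t_last = impulses[-1][0]
    c = sum(a * math.exp(zeta * omega_n * t) * math.cos(omega_d * t) for t, a in impulses)
    s = sum(a * math.exp(zeta * omega_n * t) * math.sin(omega_d * t) for t, a in impulses)
    return math.exp(-zeta * omega_n * t_last) * math.hypot(c, s)


def shape_command(samples, dt, impulses):
    """Convolve a sampled command with the shaper (delays rounded to whole samples)."""
    shifts = [(round(t / dt), a) for t, a in impulses]
    out = [0.0] * (len(samples) + max(shift for shift, _ in shifts))
    for shift, amplitude in shifts:
        for i, value in enumerate(samples):
            out[i + shift] += amplitude * value
    return out
```

The shaped command reaches its final value half a damped period later than the raw command, which is the price paid for cancelling the residual oscillation of the modeled mode.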
asRoBallet: Closing the Sim2Real Gap via Friction-Aware Reinforcement Learning for Underactuated Spherical Dynamics
We introduce asRoBallet, to the best of our knowledge, the first end-to-end reinforcement learning (RL) locomotion policy deployed on a humanoid ballbot hardware platform. Historically, ballbots have served as a canonical benchmark for underactuated and nonholonomic control, which are characterized by a reality gap in complex friction models for wheel-ball-floor interactions. While current literature demonstrates successful handling of 3D balancing with LQR and MPC, transitioning to actual hardware for a humanoid ballbot using RL is currently hindered by critical gaps in contact modeling, actuator latency & jitter, and safe hardware exploration. This study proposes a high-fidelity MuJoCo simulation that explicitly models the discrete roller mechanics of ETH-type omni-wheels, thereby capturing parasitic vibrations and contact discontinuities that have previously been ignored. We also developed a Friction-Aware Reinforcement Learning framework that achieves zero-shot Sim2Real transfer by mastering the coupled rolling, lateral, and torsional friction channels at the wheel-ball and ball-floor interfaces. We designed asRoBallet through subtractive reconfiguration, repurposing key components from an overconstrained quadruped and integrating them into a newly designed structural frame to achieve a robust research platform at low cost. We also developed a generalized iOS ecosystem that transforms consumer electronics into a low-latency interface, enabling a single operator to orchestrate expressive humanoid maneuvers via intuitive natural motion.
comment: 10 pages, 9 figures, accepted for RSS 2026. For Supplementary Videos, see https://bionicdl.ancorasir.com/?p=2238
Many-vs-Many Missile Guidance via Virtual Targets
This paper presents a novel approach to many-vs-many missile guidance using virtual targets (VTs) generated by a Normalizing Flows-based trajectory predictor. Rather than assigning n interceptors directly to m physical targets through conventional weapon target assignment algorithms, we propose a centralized strategy that constructs n VT trajectories representing probabilistic predictions of maneuvering target behavior. Each interceptor is guided toward its assigned VT using Zero-Effort-Miss guidance during midcourse flight, transitioning to Proportional Navigation guidance for terminal interception. This approach treats many-vs-many engagements as many-vs-distribution scenarios, exploiting numerical superiority (n > m) by distributing interceptors across diverse trajectory hypotheses rather than pursuing identical deterministic predictions. Monte Carlo simulations across various target-interceptor configurations (1-6 targets, 1-8 interceptors) demonstrate that the VT method matches or exceeds baseline straight-line prediction performance by 0-4.1% when n = m, with improvements increasing to 5.8-14.4% when n > m. The results confirm that probabilistic VTs enable effective exploitation of numerical superiority, significantly increasing interception probability in many-vs-many scenarios.
comment: Subsequent investigations showed that the proposed method does not generalize beyond the specific scenario considered in this manuscript
PEPA: a Persistently Autonomous Embodied Agent with Personalities
Living organisms exhibit persistent autonomy through internally generated goals and self-sustaining behavioral organization, yet current embodied agents remain driven by externally scripted objectives. This dependence on predefined task specifications limits their capacity for long-term deployment in dynamic, unstructured environments where continuous human intervention is impractical. We propose that personality traits provide an intrinsic organizational principle for achieving persistent autonomy. Analogous to genotypic biases shaping biological behavioral tendencies, personalities enable agents to autonomously generate goals and sustain behavioral evolution without external supervision. To realize this, we develop PEPA, a three-layer cognitive architecture that operates through three interacting systems: Sys3 autonomously synthesizes personality-aligned goals and refines them via episodic memory and daily self-reflection; Sys2 performs deliberative reasoning to translate goals into executable action plans; Sys1 grounds the agent in sensorimotor interaction, executing actions and recording experiences. We validate the framework through real-world deployment on a quadruped robot in a multi-floor office building. Operating without reliance on fixed task specifications, the robot autonomously arbitrates between user requests and personality-driven motivations, navigating elevators and exploring environments accordingly. Quantitative analysis across five distinct personality prototypes demonstrates stable, trait-aligned behaviors. The results confirm that personality-driven cognitive architectures enable sustained autonomous operation characteristic of persistent embodied systems. Code and demo videos are available at https://sites.google.com/view/pepa-persistent/.
Mitigating Error Accumulation in Continuous Navigation via Memory-Augmented Kalman Filtering ICML 2026
Continuous navigation in complex environments is critical for Unmanned Aerial Vehicles (UAVs). However, existing Vision-Language Navigation (VLN) models follow a dead-reckoning scheme, which iteratively updates the agent's position for the next waypoint prediction and subsequently constructs the complete trajectory. This stepwise procedure inevitably accumulates position errors over time, resulting in misalignment between the internal belief and the objective coordinates, which is known as "state drift" and ultimately compromises the full trajectory prediction. Drawing inspiration from classical control theory, we propose to correct these errors by formulating sequential prediction as a recursive Bayesian state estimation problem. In this paper, we design NeuroKalman, a novel framework that decouples navigation into two complementary processes: a Prior Prediction based on motion dynamics and a Likelihood Correction from historical observations. We first mathematically associate Kernel Density Estimation of the measurement likelihood with the attention-based retrieval mechanism, which then allows the system to rectify the latent representation using retrieved historical anchors without gradient updates. Comprehensive experiments on the TravelUAV benchmark demonstrate that, with fine-tuning on only 10% of the training data, our method clearly outperforms strong baselines and regulates drift accumulation.
comment: ICML 2026 Camera Ready
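The link between kernel density estimation and attention mentioned in the abstract can be illustrated with a well-known identity: normalized Gaussian-kernel weights around a query are exactly a softmax over negative scaled squared distances. The sketch below checks this numerically and uses the weights to retrieve a blend of stored values. The function names, the Gaussian kernel, and the `bandwidth` parameter are assumptions for this example and are not taken from NeuroKalman.

```python
import math


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gaussian_kernel_weights(query, keys, bandwidth):
    """Normalized Gaussian kernel weights, as used in kernel density/regression estimates."""
    raw = [math.exp(-_sq_dist(query, k) / (2.0 * bandwidth ** 2)) for k in keys]
    total = sum(raw)
    return [r / total for r in raw]


def softmax_attention_weights(query, keys, bandwidth):
    """Softmax over logits -||q - k||^2 / (2 h^2), computed in a numerically stable way."""
    logits = [-_sq_dist(query, k) / (2.0 * bandwidth ** 2) for k in keys]
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve(query, keys, values, bandwidth):
    """Attention-style readout: weighted average of stored values (no gradient updates)."""
    weights = softmax_attention_weights(query, keys, bandwidth)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
```

A small bandwidth makes the readout close to nearest-neighbor lookup, while a large bandwidth averages over all stored anchors.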
Leveraging Analytic Gradients in Provably Safe Reinforcement Learning
The deployment of autonomous robots in safety-critical applications requires safety guarantees. Provably safe reinforcement learning is an active field of research that aims to provide such guarantees using safeguards. These safeguards should be integrated during training to reduce the sim-to-real gap. While there are several approaches for safeguarding sampling-based reinforcement learning, analytic gradient-based reinforcement learning often achieves superior performance from fewer environment interactions. However, there is no safeguarding approach for this learning paradigm yet. Our work addresses this gap by developing the first effective safeguard for analytic gradient-based reinforcement learning. We analyse existing, differentiable safeguards, adapt them through modified mappings and gradient formulations, and integrate them into a state-of-the-art learning algorithm and a differentiable simulation. Using numerical experiments on three control tasks, we evaluate how different safeguards affect learning. The results demonstrate safeguarded training without compromising performance. Additional visuals are provided at timwalter.github.io/safe-agb-rl.github.io.
comment: 21 pages, 10 figures
Action-to-Action Flow Matching
Diffusion-based policies have recently achieved remarkable success in robotics by formulating action prediction as a conditional denoising process. However, the standard practice of sampling from random Gaussian noise often requires multiple iterative steps to produce clean actions, leading to high inference latency that creates a major bottleneck for real-time control. In this paper, we challenge the necessity of uninformed noise sampling and propose Action-to-Action flow matching (A2A), a novel policy paradigm that shifts from random sampling to initialization informed by the previous proprioceptive action. Unlike existing methods that treat proprioceptive action feedback as static conditions, A2A leverages historical proprioceptive sequences, embedding them into a high-dimensional latent space as the starting point for action generation. This design bypasses costly iterative denoising while effectively capturing the robot's physical dynamics and temporal continuity. Extensive experiments demonstrate that A2A exhibits high training efficiency, fast inference speed, and improved generalization. Notably, A2A enables high-quality action generation in as few as a single inference step, and exhibits superior robustness to visual perturbations and enhanced generalization to unseen configurations. Lastly, we also extend A2A to video generation, demonstrating its broader versatility in temporal modeling. Project site: https://lorenzo-0-0.github.io/A2A_Flow_Matching.
comment: 20 pages, 19 figures
Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions
Bayesian Reinforcement Learning (BRL), a subclass of Meta-Reinforcement Learning (Meta-RL), provides a principled framework for generalisation by explicitly incorporating Bayesian task parameters into transition and reward models. However, classical BRL methods assume known forms of transition and reward models. While recent deep BRL methods incorporate model learning to address this, applying neural networks directly to joint data and task parameters necessitates variational inference. This often yields indistinct task representations, compromising the resulting BRL policies. To overcome these limitations, we introduce Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions (GLiBRL). Our approach features fully tractable Bayesian inference over task parameters and model noise, alongside exact marginal likelihood evaluation for learning transition and reward models. The permutation-invariant nature of exact Bayesian inference in GLiBRL enables seamless integration with both on-policy and off-policy RL algorithms. We further show that GLiBRL admits a closed-form relationship between the $\mathcal{L}_2$ distance of its task representations and empirical kernel-based correspondence between task samples, which is to our knowledge the first such structural result for online deep BRL. GLiBRL is compared against representative and recent Meta-RL methods, and improves state-of-the-art performance on both MuJoCo and MetaWorld benchmarks by up to 1.8$\times$.
Risk-Averse Traversal of Graphs with Stochastic and Correlated Edge Costs for Safe Global Planetary Mobility
In robotic planetary surface exploration, strategic mobility planning is an important task that involves finding candidate long-distance routes on orbital maps and identifying segments with uncertain traversability. Then, expert human operators establish safe, adaptive traverse plans based on the actual navigation difficulties encountered in these uncertain areas. In this paper, we formalize this challenge as a new, risk-averse variant of the Canadian Traveller Problem (CTP) tailored to global planetary mobility. The objective is to find a traverse policy minimizing a conditional value-at-risk (CVaR) criterion, which is a risk measure with an intuitive interpretation. We propose a novel search algorithm that finds exact CVaR-optimal policies. Our approach leverages well-established optimal AND-OR search techniques intended for (risk-agnostic) expectation minimization and extends these methods to the risk-averse domain. We validate our approach through simulated long-distance planetary surface traverses; we employ real orbital maps of the Martian surface to construct problem instances and use terrain maps to express traversal probabilities in uncertain regions. Our results illustrate different adaptive decision-making schemes depending on the level of risk aversion. Additionally, our problem setup allows accounting for traversability correlations between similar areas of the environment. In such a case, we empirically demonstrate how information-seeking detours can mitigate risk.
comment: Published in the Autonomous Robots journal
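The CVaR criterion referred to in the abstract can be illustrated on a discrete cost distribution: CVaR at level alpha is the expected cost over the worst (1 - alpha) fraction of outcomes. The sketch below is a minimal evaluation routine only, not the paper's AND-OR search; the function name and the two example routes are hypothetical.

```python
def cvar(costs, probs, alpha):
    """Conditional value-at-risk of a discrete cost distribution.

    Returns the mean cost over the worst (1 - alpha) probability mass.
    alpha = 0 gives the plain expectation; alpha -> 1 approaches the worst case.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    tail = 1.0 - alpha
    if tail <= 0.0:
        return max(costs)
    remaining = tail
    total = 0.0
    # Walk outcomes from most to least costly, consuming tail probability mass.
    for cost, prob in sorted(zip(costs, probs), reverse=True):
        take = min(prob, remaining)
        total += take * cost
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / tail


if __name__ == "__main__":
    # Hypothetical routes: A has a certain cost, B is cheaper on average but risky.
    route_a = ([30.0], [1.0])
    route_b = ([10.0, 20.0, 100.0], [0.5, 0.4, 0.1])
    for name, (c, p) in (("A", route_a), ("B", route_b)):
        print(name, cvar(c, p, 0.0), cvar(c, p, 0.8))
```

In this made-up example an expectation-minimizing planner prefers route B, while a planner minimizing CVaR at alpha = 0.8 prefers route A, which mirrors how the level of risk aversion changes the chosen traverse policy.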
Special Unitary Parameterized Estimators of Rotation ICLR 2026
This paper revisits the topic of rotation estimation through the lens of special unitary matrices. We begin by reformulating Wahba's problem using $SU(2)$ to derive multiple solutions that yield linear constraints on corresponding quaternion parameters. We then explore applications of these constraints by formulating efficient methods for related problems. Finally, from this theoretical foundation, we propose two novel continuous representations for learning rotations in neural networks. Extensive experiments validate the effectiveness of the proposed methods.
comment: Published at ICLR 2026; clarified paper contribution and theoretical narrative; 33 pages
DiffeoMorph: Learning to Morph 3D Shapes Using Differentiable Agent-Based Simulations
Biological systems can form complex three-dimensional structures through the collective behavior of agents that share a common update rule and operate without central control. How such distributed control gives rise to precise global patterns remains a central question not only in developmental biology but also in distributed robotics, programmable matter, and multi-agent learning. Here, we introduce DiffeoMorph, an end-to-end differentiable framework for learning a morphogenesis protocol that guides a population of agents to morph into a target 3D shape. Each agent updates its position and internal state using an SE(3)-equivariant graph neural network, based on its own internal state and signals received from other agents. To train this system, we introduce a new shape-matching loss based on 3D Zernike polynomials, which compares the predicted and target shapes as continuous spatial distributions, not as discrete point clouds, and is invariant to agent ordering, number of agents, and global orientation. To achieve rotation invariance while preserving reflection sensitivity, we include an alignment step that optimally rotates the predicted Zernike spectrum to match the target before computing the loss. We perform benchmarking to establish the advantages of our shape-matching loss over other standard distance metrics for shape comparison tasks. We then demonstrate that DiffeoMorph can form a range of complex shapes from minimally patterned initial conditions. DiffeoMorph provides a general framework for learning distributed control strategies for morphogenesis, swarm robotics, and programmable self-assembly.
A Cost-Effective and Climate-Resilient Air Pressure System for Rain Effect Reduction on Automated Vehicle Cameras
Recent advances in automated vehicles have focused on improving perception performance under adverse weather conditions; however, research on physical hardware solutions remains limited, despite their importance for perception-critical applications such as vehicle platooning. Existing approaches, such as hydrophilic or hydrophobic lenses and sprays, provide only partial mitigation, while industrial protection systems are costly and do not scale to automotive deployment. To address these limitations, this paper presents a cost-effective hardware solution for rainy conditions, designed to be compatible with multiple cameras simultaneously. Beyond its technical contribution, the proposed solution supports sustainability goals in transportation systems. By enabling compatibility with existing camera-based sensing platforms, the system extends the operational reliability of automated vehicles without requiring additional high-cost sensors or hardware replacements. This approach reduces resource consumption, supports modular upgrades, and promotes more cost-efficient deployment of automated vehicle technologies, particularly in challenging weather conditions where system failures would otherwise lead to inefficiencies and increased emissions. The proposed system increased the pedestrian detection accuracy of a Deep Learning model from 8.3% to 41.6%.
Balancing Act: Trading Off Odometry and Map Registration for Efficient Lidar Localization
Most autonomous vehicles rely on accurate and efficient localization, which is achieved by comparing live sensor data to a preexisting map, to navigate their environment. Balancing the accuracy of localization with computational efficiency remains a significant challenge, as high-accuracy methods often come with higher computational costs. In this paper, we present two ways of improving lidar localization efficiency and study their impact on performance. First, we integrate two lightweight odometry estimators, a correspondence-free Doppler-inertial estimator and a low-cost wheel odometer-gyroscope (OG) method, into a topometric localization pipeline and compare them against a state-of-the-art (SOTA) iterative closest point (ICP) baseline. We highlight the trade-offs between these approaches: the Doppler and OG estimators offer faster, lightweight updates, while ICP provides higher accuracy at the cost of increased computational load. Second, by controlling the frequency of localization updates and leveraging odometry estimates between them, we demonstrate that accurate localization can be maintained while optimizing for computational efficiency using any of the presented methods. We evaluate these approaches using over 100 km of unique real-world driving data in different on-road environments. By varying the localization interval, we demonstrate that computational effort can be reduced by 27%, 80%, and 91% for the ICP, Doppler, and OG estimators, respectively, while maintaining SOTA accuracy.
comment: 8 pages
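The interval idea in the abstract above (run map registration only every few frames and carry the pose forward with odometry in between) can be illustrated with a toy 1D sketch. Everything here is an assumption made for illustration: the function name `run_pipeline`, the 2% odometry bias, and the perfect `localize` callback are hypothetical and are not taken from the paper.

```python
def run_pipeline(true_steps, odom_steps, localize, interval):
    """Dead-reckon with odometry; correct against the map every `interval` frames.

    Returns (number of localization calls, maximum pose error observed).
    """
    pose = 0.0
    truth = 0.0
    calls = 0
    max_err = 0.0
    for i, (d_true, d_odom) in enumerate(zip(true_steps, odom_steps), start=1):
        truth += d_true
        pose += d_odom                      # cheap odometry update every frame
        max_err = max(max_err, abs(pose - truth))
        if i % interval == 0:
            pose = localize(truth)          # expensive map registration (assumed exact here)
            calls += 1
    return calls, max_err


true_steps = [1.0] * 100
odom_steps = [1.02] * 100                   # hypothetical 2% odometry bias
dense_calls, dense_err = run_pipeline(true_steps, odom_steps, lambda t: t, interval=1)
sparse_calls, sparse_err = run_pipeline(true_steps, odom_steps, lambda t: t, interval=10)
```

In this toy setting the sparse schedule makes a tenth of the localization calls while the drift stays bounded by what the odometry accumulates within one interval; real savings and accuracy depend on the estimator, as the abstract reports.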
GustPilot: A Hierarchical DRL-INDI Framework for Wind-Resilient Quadrotor Navigation
Wind disturbances remain a key barrier to reliable autonomous navigation for lightweight quadrotors, where the rapidly varying airflow can destabilize both planning and tracking. This paper introduces GustPilot, a hierarchical wind-resilient navigation stack in which a deep reinforcement learning (DRL) policy generates inertial-frame velocity references for gate traversal, while a geometric Incremental Nonlinear Dynamic Inversion (INDI) controller provides low-level tracking with fast residual disturbance rejection. The INDI layer achieves this by providing incremental feedback on both specific linear acceleration and angular acceleration rate, using onboard sensor measurements to reject wind disturbances rapidly. Robustness is obtained through a two-level strategy: wind-aware planning learned via fan-jet domain randomization during training, and rapid execution-time disturbance rejection by the INDI tracking controller. We evaluate GustPilot in real flights on a 50 g quadcopter platform against a DRL-PID baseline across four scenarios ranging from no-wind to fully dynamic conditions with a moving gate and a moving disturbance source. Despite being trained only in a minimal single-gate and single-fan setup, the policy generalizes to significantly more complex environments (up to six gates and four fans) without retraining. Across 80 experiments, DRL-INDI achieves an average Overall Success Rate (OSR) of 94.7% versus 55.0% for DRL-PID, reduces tracking RMSE by up to 50%, and sustains speeds up to 1.34 m/s under wind disturbances up to 3.5 m/s. These results demonstrate that combining DRL-based velocity planning with structured INDI disturbance rejection provides a practical and generalizable approach to wind-resilient autonomous flight navigation.
comment: 8 pages, 5 figures
Multiagent Systems
Recursive Agent Optimization
We introduce Recursive Agent Optimization (RAO), a reinforcement learning approach for training recursive agents: agents that can spawn and delegate sub-tasks to new instantiations of themselves recursively. Recursive agents implement an inference-time scaling algorithm that naturally allows agents to scale to longer contexts and generalize to more difficult problems via divide-and-conquer. RAO provides a method to train models to best take advantage of such recursive inference, teaching agents when and how to delegate and communicate. We find that recursive agents trained in this way enjoy better training efficiency, can scale to tasks that go beyond the model's context window, generalize to tasks much harder than the ones the agent was trained on, and can enjoy reduced wall-clock time compared to single-agent systems.
Cross-Modal Navigation with Multi-Agent Reinforcement Learning
Robust embodied navigation relies on complementary sensory cues. However, high-quality and well-aligned multi-modal data is often difficult to obtain in practice. Training a monolithic model is also challenging as rich multi-modal inputs induce complex representations and substantially enlarge the policy space. Cross-modal collaboration among lightweight modality-specialized agents offers a scalable paradigm. It enables flexible deployment and parallel execution, while preserving the strength of each modality. In this paper, we propose CRONA, a Multi-Agent Reinforcement Learning (MARL) framework for Cross-Modal Navigation. CRONA improves collaboration by leveraging control-relevant auxiliary beliefs and a centralized multi-modal critic with global state. Experiments on visual-acoustic navigation tasks show that multi-agent methods significantly improve performance and efficiency over single-agent baselines. We find that homogeneous collaboration with limited modalities is sufficient for short-range navigation under salient cues; heterogeneous collaboration among agents with complementary modalities is generally efficient and effective; and navigation in large, complex environments requires both richer multi-modal perception and increased model capacity.
Coordination Matters: Evaluation of Cooperative Multi-Agent Reinforcement Learning
Cooperative multi-agent reinforcement learning (MARL) benchmarks commonly emphasize aggregate outcomes such as return, success rate, or completion time. While essential, these metrics often fail to reveal how agents coordinate, particularly in settings where agents, tasks, and joint assignment choices scale combinatorially. We propose a coordination-aware evaluation perspective that supplements return with process-level diagnostics. We instantiate this perspective using STAT, a controlled commitment-constrained spatial task-allocation testbed that systematically varies agents, tasks, and environment size while holding observation access and task rules fixed. We evaluate six representative value-based MARL methods across varying levels of centralization. Our results show that similar return trends can reflect distinct coordination mechanisms, including differences in redundant assignment, assignment diversity, and task-completion efficiency. We find that in commitment-constrained task allocation, performance under scale is shaped not only by nominal action-space size, but also by assignment pressure, sparse decision opportunities, and redundant choices among interdependent agents. Our findings motivate coordination-aware evaluation as a necessary complement to return-based benchmarking for cooperative MARL.
comment: 27 pages. Submitted and under review
Sustaining Cooperation in Populations Guided by AI: A Folk Theorem for LLMs
Large language models (LLMs) are increasingly used to provide instructions to many agents who interact with one another. Such shared reliance couples agents who appear to act independently: they may in fact be guided by a common model. This coupling can change the prospects for cooperation among agents with misaligned incentives. We study settings in which multiple LLMs each advise a population of clients who participate in instances of an underlying game, creating strategic interaction at the level of the LLMs themselves. This induces a meta-game among the LLMs, mediated through clients. We first analyze the one-shot setting, where shared instructions can change equilibrium behavior only when an LLM may influence more than one role in the same interaction; in such cases, cooperation may emerge, and the effect of client share can be beneficial, harmful, or non-monotone, depending on the base game. Our main result concerns the repeated setting. We prove a folk theorem for LLMs: despite indirect observation and the clients' inability to identify which LLM advised their opponents, all feasible and individually rational outcomes can be sustained as $\varepsilon$-equilibria. The result does not follow from the standard folk theorem and requires new proof techniques. Together, these results show that shared LLM guidance can sustain cooperation among populations of agents even when the underlying incentives are misaligned.
Optimizing Social Utility in Sequential Experiments
Regulatory approval of products in high-stakes domains such as drug development requires statistical evidence of safety and efficacy through large-scale randomized controlled trials. However, the high financial cost of these trials may deter developers who lack absolute certainty in their product's efficacy, ultimately stifling the development of 'moonshot' products that could offer high social utility. To address this inefficiency, in this paper, we introduce a statistical protocol for experimentation where the product developer (the agent) conducts a randomized controlled trial sequentially and the regulator (the principal) partially subsidizes its cost. By modeling the protocol using a belief Markov decision process, we show that the agent's optimal strategy can be found efficiently using dynamic programming. Further, we show that the social utility is a piecewise linear and convex function over the subsidy level the principal selects, and thus the socially optimal subsidy can also be found efficiently using divide-and-conquer. Simulation experiments using publicly available data on antibiotic development and approval demonstrate that our statistical protocol can be used to increase social utility by more than $35\%$ relative to standard, non-sequential protocols.
AgenticPrecoding: LLM-Empowered Multi-Agent System for Precoding Optimization
Precoding is a key technique for interference management and performance improvement in multi-antenna wireless systems. However, existing precoding methods are typically developed for specific system models, objectives, and constraint sets, which limits their adaptability to the heterogeneous and evolving scenarios expected in future 6G networks. To address this limitation, we propose AgenticPrecoding, a universal multi-agent framework that automates end-to-end precoding derivation directly from user-level communication requirements. Specifically, AgenticPrecoding decomposes the derivation process into four coordinated stages: problem formulation, solver selection, prompt upsampling, and code generation, assigning each stage to a specialized agent tailored to its specific reasoning demands. We employ two LoRA-adapted reasoning agents to inject precoding-specific domain knowledge for problem formulation and solver selection, while two general-purpose Large Language Models (LLMs) handle prompt refinement and executable code generation. Furthermore, a feedback-driven refinement mechanism is incorporated to enhance code executability, constraint feasibility, and solution quality. Extensive experiments across 10 representative precoding scenarios demonstrate that AgenticPrecoding achieves superior cross-scenario adaptability compared to conventional optimization-based and LLM-based baselines.
Independent Learning of Nash Equilibria in Partially Observable Markov Potential Games with Decoupled Dynamics
We study Nash equilibrium learning in partially observable Markov games (POMGs), a multi-agent reinforcement learning framework in which agents cannot fully observe the underlying state. Prior work in this setting relies on centralization or information sharing, and suffers from sample and computational complexity that scales exponentially in the number of players. We focus on a subclass of POMGs with independent state transitions, where agents remain coupled through their rewards, and assume that the underlying fully observed Markov game is a Markov potential game. For this class, we present an independent learning algorithm in which players, observing only their own actions and observations and without communication, jointly converge to an approximate Nash equilibrium. Due to partial observability, optimal policies may in general depend on the full action-observation history. Under a filter stability assumption, we show that policies based on finite history windows provide sufficient approximation guarantees. This enables us to approximate the POMG by a surrogate Markov game that is near-potential, leading to quasi-polynomial sample and computational complexity for independent Nash equilibrium learning in the underlying POMG.
From Agent Loops to Deterministic Graphs: Execution Lineage for Reproducible AI-Native Work
Large language model systems are increasingly deployed as agentic workflows that interleave reasoning, tool use, memory, and iterative refinement. These systems are effective at producing answers, but they often rely on implicit conversational state, making it difficult to preserve stable work products, isolate irrelevant updates, or propagate changes through intermediate artifacts. We introduce execution lineage: an execution model in which AI-native work is represented as a directed acyclic graph (DAG) of artifact-producing computations with explicit dependencies, stable intermediate boundaries, and identity-based replay. The goal is not to make the model a better one-shot writer, but to make evolving AI-generated work maintainable under change. We compare execution-lineage replay against loop-centric update baselines on two controlled policy-memo update tasks. In an unrelated-branch update, DAG replay preserved the final memo exactly in all runs, with zero churn and zero unrelated-branch contamination, while loop baselines regenerated the memo and frequently imported unrelated context. In an intermediate-artifact edit, all systems reflected the new constraint in the final memo, but only DAG replay achieved perfect upstream preservation, downstream propagation, unaffected-artifact preservation, and cross-artifact consistency. These results show that final answer quality and maintained-state quality are distinct. Strong loop baselines can remain competitive at producing polished final outputs when the task is a bounded synthesis/update problem and all current sources fit in context, but immediate task success can mask partial state inconsistency that may compound over future revisions. Execution lineage provides stronger guarantees about what should change, what should remain stable, and how work evolves across revisions.
comment: 16 pages, 1 figure
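The identity-based replay described in the abstract above can be sketched as a small content-addressed graph: each artifact's identity is derived from its name and the identities of its inputs, so an update to one source only recomputes the artifacts downstream of it. This is a minimal sketch under stated assumptions, not the paper's system; the class `LineageGraph`, its methods, and the memo example are hypothetical names introduced for illustration.

```python
import hashlib


class LineageGraph:
    """Toy DAG of artifact-producing steps with identity-based caching."""

    def __init__(self):
        self.nodes = {}        # artifact name -> (function, dependency names)
        self.cache = {}        # identity key -> stored artifact
        self.recomputed = []   # names recomputed since the list was last cleared

    def add(self, name, fn, deps=()):
        self.nodes[name] = (fn, tuple(deps))

    def build(self, name, sources):
        """Return (artifact, identity) for `name`, reusing cached artifacts."""
        if name in sources:
            value = sources[name]
            key = hashlib.sha256(f"src:{name}:{value}".encode()).hexdigest()
            return value, key
        fn, deps = self.nodes[name]
        built = [self.build(dep, sources) for dep in deps]
        # Identity depends only on the node name and its inputs' identities.
        raw = name + "|" + "|".join(key for _, key in built)
        key = hashlib.sha256(raw.encode()).hexdigest()
        if key not in self.cache:
            self.cache[key] = fn(*[value for value, _ in built])
            self.recomputed.append(name)
        return self.cache[key], key


g = LineageGraph()
g.add("budget_summary", lambda b: f"Budget: {b}", ["budget"])
g.add("staffing_summary", lambda s: f"Staff: {s}", ["staffing"])
g.add("memo", lambda b: f"MEMO[{b}]", ["budget_summary"])
g.add("staffing_report", lambda s: f"REPORT[{s}]", ["staffing_summary"])

sources = {"budget": "10M", "staffing": "40 people"}
memo_v1, _ = g.build("memo", sources)
g.build("staffing_report", sources)

# Update a source on an unrelated branch, then replay both outputs.
g.recomputed.clear()
sources["staffing"] = "45 people"
memo_v2, _ = g.build("memo", sources)
report_v2, _ = g.build("staffing_report", sources)
```

After the unrelated staffing update, the memo is returned from cache unchanged and only the staffing branch is recomputed, which mirrors the "zero churn" behavior the abstract attributes to DAG replay.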
Improving the Efficiency of Language Agent Teams with Adaptive Task Graphs
Large language models (LLMs) are increasingly deployed in teams, yet existing coordination approaches often occupy two extremes. Highly structured methods rely on fixed roles, pipelines, or task decompositions assigned a priori. In contrast, fully unstructured teams enable adaptability and exploration but suffer from inefficiencies such as error propagation, inter-agent conflicts, and wasted resources (measured in time, tokens, or file operations). We introduce Language Agent Teams for Task Evolution (LATTE), a framework for coordinating LLM teams inspired by distributed systems, where processors must operate under partial observability and communication constraints. In LATTE, a team of agents collaboratively construct and maintain a shared, evolving coordination graph which encodes sub-task dependencies, individual agent assignment, and the current state of sub-task progress. This protocol maintains consistency while empowering agents to dynamically allocate work, adapt coordination, and discover new tasks. Across multiple collaborative tasks and a variety of base models, we demonstrate how LATTE reduces token usage, wall-clock time, communication, and coordination failures (e.g. file conflicts and redundant outputs) while matching or exceeding the accuracy of standard designs including MetaGPT, decentralized teams, top-down Leader-Worker hierarchies, and static decompositions.
Power-Efficiency and Scalability Analysis of Magnetically-Actuated Satellite Swarms via Convex Optimization
This correspondence presents a convex-optimization-based evaluation framework of satellite-swarm-based apertures maintained by magnetic-field interactions. Spaceborne distributed apertures are composed of multiple satellites and are attractive for scientific and commercial missions because their scalability enables high-gain, narrow-beam, and large-aperture capabilities beyond the launch-size limitations. A key challenge is that the long-term maintenance of such virtual structures requires consistent formation control amid unstable orbital dynamics, and magnetic interactions generated by satellite-mounted magnetorquers offer a desirable propellant-free position-control strategy. However, the nonlinearities of the electromagnetic force and torque model lead to a nonconvex power-consumption constraint, making system-level configuration analysis difficult. To address this issue, we develop a convex optimization-based framework to analyze the power consumption of large magnetically actuated satellite swarms. The resulting analysis shows that increasing the number of satellites can improve formation-keeping power efficiency. This indicates that magnetically actuated swarm architectures provide a power-efficient alternative to the conventional few-satellite electromagnetic formation-flight concept for constructing large-scale space systems.
comment: Submitted to IEEE Transactions on Aerospace and Electronic Systems (Correspondence)
Multiagent Stochastic Shortest Path Problem IJCAI 2026
We introduce and study the multi-agent stochastic shortest path (MSSP) problem, in which $k$ agents strive to reach a target state, aiming to minimize the expected time to reach the target by any agent. We analyze the computational and strategy-complexity of the problem in both autonomous and coordinated settings, and we design efficient strategy-synthesis algorithms. The algorithms are experimentally evaluated on instances of increasing size against natural baselines.
comment: A full version of the paper that was presented at IJCAI 2026
BioResearcher: Scenario-Guided Multi-Agent for Translational Medicine
Translational medicine turns underspecified development goals into evidence synthesis that must combine literature, trials, patents, and quantitative multi-omics analysis while preserving identifiers, uncertainty, and retrievable provenance. General-purpose foundation models and off-the-shelf tool-augmented or multi-agent systems are not built for this: they tend to produce single-shot answers or run open-endedly, and fall short on the auditable, scenario-specific workflows that heterogeneous biomedical sources demand. This paper introduces Ingenix BioResearcher, a scenario-guided multi-agent system that maps queries to versioned research playbooks, delegates to specialized subagents over 30+ tools and machine-learning endpoints, mixes structured database access with sandboxed code for genome-scale analyses, and applies claim-level multi-model reconciliation before editorial assembly. We evaluate BioResearcher across unit-level capabilities, open-ended biomedical reasoning, and end-to-end clinical discovery. It leads evaluated baselines on 109 single-step tests (83.49% pass rate; 0.892 average score), achieves strong biomedical benchmark performance (89.33% on BixBench-Verified-50 and the top 0.758 mean score on BaisBench Scientific Discovery), and leads on a 30-query clinical end-to-end benchmark with the highest positive hit rate (74.7% $\pm$ 3.3%) and negative clear rate (96.8% $\pm$ 0.2%). These results show broad, competitive performance across unit-level, open-ended, and end-to-end clinical evaluations.
comment: 5 pages (main text), 21 pages (appendix), 8 figures, 11 tables
Auto Research with Specialist Agents Develops Effective and Non-Trivial Training Recipes
We study auto research as a closed empirical loop driven by external measurement. Each submitted trial carries a hypothesis, an executable code edit, an evaluator-owned outcome, and feedback that shapes the next proposal. The output is not a generated paper or a single model checkpoint, but an auditable trajectory of proposals, code diffs, experiments, scores, and failure labels. We instantiate this loop with specialist agents that partition recipe surfaces and share measured lineage across trials. The central empirical finding is that lineage feedback lets agents turn evaluator outcomes, including crashes, budget overruns, size failures, and accuracy-gate misses, into later program-level recipe edits rather than one-shot suggestions. Across 1,197 headline-run trials plus 600 Parameter Golf control trials after one-time setup and launch, humans did not choose proposals, edit recipes, override scores, or repair failed trials during the search. In the three headline runs, the same submitted-trial loop reduces Parameter Golf validation bpb by $0.81\%$, raises NanoChat-D12 CORE by $38.7\%$, and reduces CIFAR-10 Airbench96 wallclock by $4.59\%$, with each task measured by its own external evaluator and legality checks. The trace includes a strict architecture-domain audit of 157 headline-run submissions and program rewrites such as a NanoChat attention-kernel path change. Within this scope the loop autonomously writes code, submits experiments, absorbs feedback, applies and combines known techniques inside each environment, and improves public starting recipes.
Active Learning for Communication Structure Optimization in LLM-Based Multi-Agent Systems
Optimizing the communication structure of large language model based multi-agent systems (LLM-MAS) has been shown to improve downstream performance and reduce token usage. Existing methods typically rely on randomly sampled training tasks. However, tasks may differ substantially in difficulty and domain, and thus they are not equally informative for updating communication structure, making optimization under limited training budgets often unstable and highly sensitive to the particular training set. To actively identify the most valuable tasks for communication-structure optimization, we propose an ensemble-based information-theoretic task selection framework. The proposed method estimates task informativeness by how much a candidate task changes the distribution over graph parameters, using ensemble Kalman inversion as an efficient and derivative-free approximation of the corresponding Bayesian update. The resulting estimator is especially suitable for black-box and noisy multi-agent systems. To enhance scalability, we construct a compact candidate pool through embedding-based representative selection and combine the informative selection with surrogate modeling and batch Thompson sampling. We validate our method in both benign settings and settings with agent attacks, demonstrating its effectiveness for communication-structure optimization under constrained computational budgets.
Retrieval-Conditioned Topology Selection with Provable Budget Conservation for Multi-Agent Code Generation NeurIPS 2026
Multi-agent LLM systems for code generation face a fundamental routing problem: the optimal orchestration topology depends on the structural complexity of the code under modification, yet existing systems select topologies without consulting the codebase. We present Retrieval-Guided Adaptive Orchestration (RGAO), an architecture that closes this loop by extracting a structural complexity vector from a hierarchical code index before selecting the orchestration topology. RGAO operates within Code-Agent, a multi-agent framework whose sub-agents are governed by formal contracts with six-dimensional budget vectors. Our headline contribution is the composition of two previously separate lines of work -- complexity-conditioned LLM routing and formal resource algebras -- yielding a property neither admits alone: provable budget conservation under retrieval-conditioned dynamic topology selection. Concretely we contribute: (1) a complexity-conditioned topology router that reduces proxy-measured misrouting from 30.1% to 8.2%; (2) a budget algebra with a structural-induction conservation theorem; and (3) a hierarchical code retrieval engine. Empirical evaluation demonstrates sub-millisecond DAG construction and linear tree-index scalability.
comment: 30 pages, 9 figures. NeurIPS 2026 Evaluations and Datasets Track submission, under review
Learning Material-Aware Hamiltonian Risk Fields for Safe Navigation
Risk-aware navigation should be selective: a policy should expose evasive degrees of freedom only when the local scene admits a lower-risk feasible maneuver, and suppress them when no safer alternative exists. We show that adding one context-energy term to a port-Hamiltonian navigation policy produces a learned force channel with exactly this falsifiable signature. When the local risk field contains a feasible lower-risk direction, the induced context force activates toward it; when the apparent escape is blocked or not yet available, a route-aware gate suppresses lateral force rather than hallucinating an unsafe maneuver. A CVaR tail-risk objective focuses gradient updates on rare but consequential risk transitions. We validate the selectivity signature across four settings. In the primary delayed-required-escape benchmark, route-aware CVaR reduces premature force activation from 0.950 to 0.180 versus DWA while raising success from 0.480 to 0.810 with zero replans. On real off-road terrain (RELLIS-3D), route-aware enrichment achieves correct activation rate 0.837 and false activation rate 0.114, compared to 0.378/0.752 for scalar risk gradients. On static semantic maps (DFC2018), enrichment reduces catastrophic failure from 0.60 to 0.10 and oscillation by 90.7% while preserving path efficiency. In highway traffic, collisions drop from 100% to 0% when a lane escape is feasible; when no escape exists, the policy suppresses the lateral maneuver. The selectivity property follows from the gradient structure of the context energy rather than from training-time tuning.
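For readers unfamiliar with the CVaR tail-risk objective mentioned above, the following is a minimal sketch of the standard empirical estimator: the mean of the worst (1 - alpha) fraction of sampled losses. The function name `cvar`, the choice of alpha, and the sample losses are assumptions for illustration; the paper's actual training objective may differ in detail.

```python
import math


def cvar(losses, alpha=0.9):
    """Empirical CVaR: mean of the worst (1 - alpha) fraction of losses."""
    if not losses:
        raise ValueError("losses must be non-empty")
    ordered = sorted(losses, reverse=True)          # largest losses first
    # Round before ceil to avoid floating-point artifacts in the tail size.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    tail = ordered[:k]
    return sum(tail) / len(tail)


# Mostly small losses with two rare but large ones (hypothetical values).
losses = [0.1, 0.2, 0.1, 0.3, 5.0, 0.2, 0.1, 0.2, 0.1, 4.0]
mean_loss = sum(losses) / len(losses)
tail_loss = cvar(losses, alpha=0.8)   # averages the worst 20% of samples
```

Because the tail average is dominated by the two large losses, optimizing it concentrates updates on rare, consequential events, whereas the plain mean is diluted by the many small losses.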
The Cost of Consensus: Malignant Epistemic Herding and Adaptive Gating in Distributed Multi-Agent Search
Distributed agents in real-world settings frequently must coordinate under uncertainty with only partial observations. Coordination is necessary to share beliefs to aid in task completion, but communication costs bandwidth, introduces latency, and, if done poorly, can degrade collective reasoning. This tension is especially acute in bandwidth-constrained deployments such as distributed sensing networks, autonomous reconnaissance, and collaborative cyber defense, where excessive transmission carries direct operational costs. Existing work has focused on multi-agent exploration and communication strategies, but not on how communication frequency and content jointly shape the collective belief state. Central to this challenge is the degree to which agents maintain compatible internal beliefs about the environment, a property we term \textit{epistemic alignment}. When agents share beliefs effectively, they converge on correct hypotheses; when communication is poorly designed, agents may converge confidently on wrong ones. We formalize this distinction and show that it is not detectable from coordination metrics alone, such as Jensen-Shannon Divergence or rate to consensus.
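A small numerical sketch shows why an agreement metric such as Jensen-Shannon divergence cannot by itself separate correct consensus from confident convergence on a wrong hypothesis. The belief vectors and the index of the true hypothesis below are hypothetical values chosen for illustration; this is not the paper's formalization.

```python
import math


def kl(p, q):
    """KL divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence between two discrete distributions (bits)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


TRUE_HYPOTHESIS = 0   # assumed ground-truth index

# Two teams, each in full internal agreement.
right_a = right_b = [0.90, 0.05, 0.05]   # agree on the true hypothesis
wrong_a = wrong_b = [0.05, 0.90, 0.05]   # agree on a false hypothesis

jsd_right = jsd(right_a, right_b)
jsd_wrong = jsd(wrong_a, wrong_b)
jsd_split = jsd([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])   # total disagreement

mass_on_truth_right = right_a[TRUE_HYPOTHESIS]
mass_on_truth_wrong = wrong_a[TRUE_HYPOTHESIS]
```

Both teams score a divergence of zero, so the metric reports perfect coordination in each case, yet only one team places its belief mass on the true hypothesis. Distinguishing them requires reference to ground truth, which agreement metrics do not use.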
Multi-Objective Constraint Inference using Inverse Reinforcement Learning
Constraint inference is widely considered essential to align reinforcement learning agents with safety boundaries and operational guidelines by observing expert demonstrations. However, existing approaches typically assume homogeneous demonstrations (i.e., generated by a single expert or multiple experts with identical objectives). They also have limited ability to capture individual preferences and often suffer from computational inefficiencies. In this paper, we introduce Multi-Objective Constraint Inference (MOCI), a novel framework designed to jointly extract shared constraints and individual preferences from heterogeneous expert trajectories, where multiple experts pursue different objectives. MOCI effectively models and learns from diverse, and potentially conflicting, behaviors. Empirical evaluations demonstrate that MOCI significantly outperforms existing baselines, achieving improved predictive performance, and maintaining competitive computational efficiency on a standard grid-world benchmark. These results establish MOCI as an accurate, flexible, and computationally practical approach for real-world constraint inference and preference learning tasks.
Bridging the Last Mile of Circuit Design: PostEDA-Bench, a Hierarchical Benchmark for PPA Convergence and DRC Fixing
LLM-based agents are increasingly applied to the "last mile" of Electronic Design Automation (EDA): repairing residual sign-off Design Rule Check (DRC) violations and converging Power-Performance-Area (PPA) targets after tool runs. Existing EDA-LLM benchmarks, however, omit DRC fixing entirely and rely on flat hierarchies tied to a single toolchain. We introduce PostEDA-Bench, a hierarchical benchmark with 145 tasks across DRC-Essential, DRC-Reasoning, PPA-Mono, and PPA-Multi, supported by EDA toolchains with machine-checkable evaluation. Across eight commercial and open-source LLMs under multiple agent scaffolds, we find that agents handle synthetic DRC-Essential and single-objective PPA-Mono reasonably well but degrade sharply on the more practical DRC-Reasoning, where the best success rate is 36.66%, and PPA-Multi, where the best success rate is 20.00%; vision augmentation consistently enhances DRC-Bench; and trade-off reasoning, rather than knob knowledge, is the dominant PPA-Multi bottleneck.
MAGIQ: A Post-Quantum Multi-Agentic AI Governance System with Provable Security
Our computing ecosystem is being transformed by two emerging paradigms: the increased deployment of agentic AI systems and advancements in quantum computing. With respect to agentic AI systems, one of the most critical problems is creating secure governing architectures that ensure agents follow their owners' communication and interaction policies and can be held accountable for the messages they exchange with other agents. With respect to quantum computing, existing systems must be retrofitted and new cryptographic mechanisms must be designed to ensure long-term security and quantum resistance. In fact, NIST recommends that standard public-key cryptographic algorithms, including RSA, Diffie-Hellman (DH), and elliptic-curve constructions (ECC), be deprecated starting in 2030 and disallowed after 2035. In this paper, we present MAGIQ, a framework for policy definition and enforcement in multi-agent AI systems using novel, highly efficient, quantum-resistant cryptographic protocols with proven security guarantees. MAGIQ (i) allows users to define rich communication and access-control policy budgets for agent-to-agent sessions and tasks, including global budgets for one-to-many agent sessions; (ii) enforces such policies using post-quantum cryptographic primitives; (iii) supports session-based enforcement of policies for agent-to-agent and one-to-many agent sessions; and (iv) provides accountability of agents to their users through message attribution. We formally model and prove the correctness and security of the system using the Universal Composability (UC) framework. We evaluate the computation and communication overhead of our framework and compare it with the state-of-the-art agentic AI framework SAGA. MAGIQ is a first step toward post-quantum-secure solutions for agentic AI systems.
Generalising Travel Time Prediction To Varying Route Choices In Urban Networks
Previous methods that predict system-wide travel time, predominantly grounded in graph neural networks, remain limited to typical and recurring demand patterns. While they successfully predict future congestion following the daily commute, they inherently approximate a single demand realisation and fail to capture varying route choices. In this work, we propose a Generalised Travel Time Predictor (GenTTP) that successfully differentiates route choices and offers accurate flow and travel time predictions. Our framework learns to uncover complex spatiotemporal traffic patterns and microscopic relationships between route choices and the resulting travel times. This addresses a critical gap: the lack of travel time prediction models that generalise across varying route assignments, where the same demand can produce substantially different network-wide outcomes depending on how travellers are distributed over available paths.
Beyond the Black Box: Interpretability of Agentic AI Tool Use
AI agents are promising for high-stakes enterprise workflows, but dependable deployment remains limited because tool-use failures are difficult to diagnose and control. Agents may skip required tool calls, invoke tools unnecessarily, or take actions whose consequence becomes visible only after execution. Existing observability methods are mostly external: prompts reveal correlations, evaluations score outputs, and logs arrive only after the model has already acted. In long-horizon settings, these failures are especially costly because an early tool mistake can alter the rest of the trajectory, increase token consumption, and create downstream safety and security risk. We introduce a mechanistic-interpretability toolkit built on Sparse Autoencoders (SAEs) and linear probes. The framework reads model states before each action and infers both whether a tool is needed and how consequential the next tool action is likely to be. By decomposing activations into sparse features, it identifies the internal layers and features most associated with tool decisions and tests their functional importance through feature ablation. We train the probes on multi-step trajectories from the NVIDIA Nemotron function-calling dataset and apply the same workflow to GPT-OSS 20B and Gemma 3 27B models. The goal is not to replace external evaluation, but to add a missing layer: visibility into what the model signaled internally before action. This helps surface deeper causes of agent failure, especially in long-horizon runs where an early mistake can reshape the rest of the agentic interaction. More broadly, the paper shows how mechanistic interpretability can support practical internal observability for monitoring tool calls and risk in agent systems.
comment: 12 pages, 4 figures, 17 tables
Conformal Agent Error Attribution
When multi-agent systems (MAS) fail, identifying where the decisive error occurred is the first step for automated recovery to an earlier state. Error attribution remains a fundamental challenge due to the long interaction traces that large language model-based MAS generate. This paper presents a framework for error attribution based on conformal prediction (CP), which provides finite-sample, distribution-free coverage guarantees. We introduce new algorithms for filtration-based CP designed for sequential data such as agent trajectories. Unlike existing CP algorithms, our approach predicts sets that are contiguous sequences to enable efficient recovery and debugging. We verify our theoretical guarantees on a variety of agents and datasets, show that errors can be precisely isolated, and then use prediction sets to roll back MAS so that they can correct their own errors. Our overall approach is model-agnostic and offers a principled uncertainty layer for MAS error attribution. We release code at https://github.com/layer6ai-labs/conformal-agent-error-attribution.
comment: 10 pages
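As a rough illustration of the coverage idea behind conformal prediction, the following is a minimal sketch of standard split conformal prediction, not the filtration-based algorithm of the paper. The function names, the nonconformity scores, and the calibration values are all hypothetical, and the resulting set is not forced to be contiguous as the paper's sets are.

```python
import math

def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the set must include everything.
        return math.inf
    return sorted(scores)[k - 1]

def prediction_set(candidate_scores, threshold):
    """Keep every candidate step whose nonconformity score is within the threshold."""
    return [i for i, s in enumerate(candidate_scores) if s <= threshold]

# Hypothetical nonconformity scores from held-out calibration trajectories.
calibration = [0.1, 0.4, 0.2, 0.8, 0.3, 0.5, 0.6, 0.7, 0.9]
q = conformal_quantile(calibration, alpha=0.25)

# Hypothetical per-step scores for a new trajectory; lower means "more likely the error step".
steps = prediction_set([0.95, 0.3, 0.75, 0.85], q)
```

Under exchangeability of calibration and test scores, a set built this way contains the true error step with probability at least 1 - alpha; enforcing contiguity requires the additional machinery the abstract describes.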
MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems ICML 2026
Large language model (LLM)-based multi-agent systems (MAS) have shown promise in tackling complex collaborative tasks, where agents are typically orchestrated via role-specific prompts. While the quality of these prompts is pivotal, jointly optimizing them across interacting agents remains a non-trivial challenge, primarily due to the misalignment between local agent objectives and holistic system goals. To address this, we introduce MASPO, a novel framework designed to automatically and iteratively refine prompts across the entire system. A core innovation of MASPO is its joint evaluation mechanism, which assesses prompts not merely by their local validity, but by their capacity to facilitate downstream success for successor agents. This effectively bridges the gap between local interactions and global outcomes without relying on ground-truth labels. Furthermore, MASPO employs a data-driven evolutionary beam search to efficiently navigate the high-dimensional prompt space. Extensive empirical evaluations across 6 diverse tasks demonstrate that MASPO consistently outperforms state-of-the-art prompt optimization methods, achieving an average accuracy improvement of 2.9. We release our code at https://github.com/wangzx1219/MASPO.
comment: Accepted at ICML 2026
Designing Intelligent Enterprise Agents: A Capability-Aligned Multi-Agent Architecture
Enterprise interest in multi-agent systems has shifted from generic software agents to large-language-model (LLM) based intelligent agents that plan, use tools, maintain contextual memory, inspect intermediate results, collaborate with other agents, and sometimes act in systems of record. This paper revises the enterprise architecture thesis around a design-first claim: governance is necessary, but it cannot be the primary organizing abstraction. The primary abstraction must be agent design - capability boundaries, autonomy allocation, interaction protocols, tool and data authority, state and memory design, verification design, and human interaction design. We propose CEAD (Capability-Aligned Enterprise Agent Design), a reference architecture for intelligent agents that uses service-oriented architecture (SOA) as an exemplar for contracts, registries, loose coupling, and policy-aware integration, while explicitly rejecting the idea that services are agents. It treats microservices as a cautionary precedent: decomposition without design discipline produces distributed complexity, cost, operational fragility, and agent proliferation. We evaluate CEAD over 10,000 enterprise tasks, comparing five architectures: a prompt-first mono-agent, a role-based micro-agent swarm, SOA-brokered agents, a governance-first but design-poor agent grid, and the proposed CEAD architecture. CEAD achieves 70.6% safe success, versus 45.2% for the mono-agent baseline, 23.1% for the ungoverned micro-agent swarm, 58.8% for SOA-brokered agents, and 50.8% for the control-heavy, design-poor grid. The results support the conclusion that design quality is the first-order enterprise concern; governance, security, policy, audit, and assurance should support and enforce good design rather than substitute for it.
AGMARL-DKS: An Adaptive Graph-Enhanced Multi-Agent Reinforcement Learning for Dynamic Kubernetes Scheduling
State-of-the-art cloud-native applications require intelligent schedulers that can effectively balance system stability, resource utilisation, and associated costs. While Kubernetes provides feasibility-based placement by default, recent research efforts have explored the use of reinforcement learning (RL) for more intelligent scheduling decisions. However, current RL-based schedulers have three major limitations. First, most of these schedulers use monolithic centralised agents, which are non-scalable for large heterogeneous clusters. Second, the ones that use multi-objective reward functions assume simple, static, linear combinations of the objectives. Third, no previous work has produced a stress-aware scheduler that can react adaptively to dynamic conditions. To address these gaps in current research, we propose the Adaptive Graph-enhanced Multi-Agent Reinforcement Learning Dynamic Kubernetes Scheduler (AGMARL-DKS). AGMARL-DKS addresses these gaps by introducing three major innovations. First, we construct a scalable solution by treating the scheduling challenge as a cooperative multi-agent problem, where every cluster node operates as an agent, employing centralised training methods before decentralised execution. Second, to be context-aware and yet decentralised, we use a Graph Neural Network (GNN) to build a state representation of the global cluster context at each agent. This represents an improvement over methods that rely solely on local observations. Finally, to make trade-offs between these objectives, we use a stress-aware lexicographical ordering policy instead of a simple, static linear weighting of these objectives. The evaluations in Google Kubernetes Engine (GKE) reveal that AGMARL-DKS significantly outperforms the default scheduler in terms of fault tolerance, utilisation, and cost, especially in scheduling batch and mission-critical workloads.
SwarmCoDe: A Scalable Co-Design Framework for Heterogeneous Robot Swarms via Dynamic Speciation
Robot swarms offer inherent robustness and the capacity to execute complex, collaborative tasks surpassing the capabilities of single-agent systems. Co-designing these systems is critical, as marginal improvements in individual performance or unit cost compound significantly at scale. However, under traditional frameworks, this scale renders co-design intractable due to exponentially large, non-intuitive design spaces. To address this, we propose SwarmCoDe, a novel Collaborative Co-Evolutionary Algorithm (CCEA) that utilizes dynamic speciation to automatically scale swarm heterogeneity to match task complexity. Inspired by biological signaling mechanisms for inter-species cooperation, the algorithm uses evolved genetic tags and a selectivity gene to facilitate the emergent identification of symbiotically beneficial partners without predefined species boundaries. Additionally, an evolved dominance gene dictates the relative swarm composition, decoupling the physical swarm size from the evolutionary population. We apply SwarmCoDe to simultaneously optimize task planning and hardware morphology under fabrication budgets, successfully evolving specialized swarms of up to 200 agents -- four times the size of the evolutionary population. This framework provides a scalable, computationally viable pathway for the holistic co-design of large-scale, heterogeneous robot swarms.
comment: 8 pages, 9 figures
High entropy leads to symmetry equivariant policies in Dec-POMDPs
We prove that in any Dec-POMDP, sufficiently high entropy regularization ensures that the policy gradient flow with tabular softmax parametrization always converges, for any initialization, to the same joint policy, and that this joint policy is equivariant w.r.t. all symmetries of the Dec-POMDP. In particular, policies coming from different initializations will be fully compatible, in that their cross-play returns are equal to their self-play returns. Through extensive evaluation of independent PPO, arguably the standard baseline deep multi-agent policy gradient algorithm, in the Hanabi, Overcooked and Yokai environments, we find that the entropy coefficient has a massive influence on the cross-play returns between independently trained policies, and that the decrease in self-play returns coming from increased entropy regularization can often be counteracted by greedifying the learned policies after training. In Hanabi in particular we achieve a new SOTA in inter-seed cross-play this way. While we give examples of Dec-POMDPs in which one cannot learn the optimal symmetry equivariant policy this way, both our theoretical and empirical results suggest that one should consider far higher entropy coefficients during hyperparameter sweeps in Dec-POMDPs than is typically done.
AI Agents Alone Are Not (Yet) Sufficient for Social Simulation
Recent advances in large language models (LLMs) have spurred growing interest in using LLM-integrated agents for social simulation, often under the implicit assumption that realistic population dynamics will emerge once role-specified agents are placed in a networked multi-agent setting. This position paper argues that LLM-based agents alone are not (yet) sufficient for social simulation. We attribute this over-optimism to a systematic mismatch between what current agent pipelines are typically optimized and validated to produce and what simulation-as-science requires. Concretely, role-playing plausibility does not imply faithful human behavioral validity; collective outcomes are frequently mediated by agent-environment co-dynamics rather than agent-agent messaging alone; and results can be dominated by interaction protocols, scheduling, and initial information priors. To make these underlying mechanisms explicit and auditable, we propose a unified formulation of AI agent-based social simulation as an environment-involved Markov game with explicit exposure and scheduling mechanisms, from which we derive concrete actions for design, evaluation, and interpretation.
comment: 16 pages
Mapping Human Anti-collusion Mechanisms to Multi-agent AI Systems
As multi-agent AI systems become increasingly autonomous, evidence shows they can develop collusive strategies similar to those long observed in human markets and institutions. While human domains have accumulated centuries of anti-collusion mechanisms, it remains unclear how these can be adapted to AI settings. This paper addresses that gap by (i) developing a taxonomy of human anti-collusion mechanisms, including sanctions, leniency & whistleblowing, monitoring & auditing, market design, and governance, and (ii) mapping them to potential interventions for multi-agent AI systems. For each mechanism, we propose implementation approaches. We also highlight open challenges, such as the attribution problem (difficulty attributing emergent coordination to specific agents), identity fluidity (agents being easily forked or modified), the boundary problem (distinguishing beneficial cooperation from harmful collusion), and adversarial adaptation (agents learning to evade detection).
Neural Power-Optimal Magnetorquer Solution for Multi-Agent Formation and Attitude Control
This paper presents a learning-based current calculation model to achieve power-optimal magnetic-field interaction for multi-agent formation and attitude control. In aerospace engineering, electromagnetic coils are referred to as magnetorquer (MTQ) coils and are used as satellite attitude actuators in Earth's orbit and for long-term formation and attitude control. This study derives a unique, continuous, and power-optimal current solution via sequential convex programming and approximates it using a multilayer perceptron model. The effectiveness of our strategy was demonstrated through numerical simulations and experimental trials on formation and attitude control.
comment: IEEE Robotics and Automation Letters. Preprint Version. Accepted April, 2026 (DOI: https://doi.org/10.1109/LRA.2026.3692064)
DiffeoMorph: Learning to Morph 3D Shapes Using Differentiable Agent-Based Simulations
Biological systems can form complex three-dimensional structures through the collective behavior of agents that share a common update rule and operate without central control. How such distributed control gives rise to precise global patterns remains a central question not only in developmental biology but also in distributed robotics, programmable matter, and multi-agent learning. Here, we introduce DiffeoMorph, an end-to-end differentiable framework for learning a morphogenesis protocol that guides a population of agents to morph into a target 3D shape. Each agent updates its position and internal state using an SE(3)-equivariant graph neural network, based on its own internal state and signals received from other agents. To train this system, we introduce a new shape-matching loss based on 3D Zernike polynomials, which compares the predicted and target shapes as continuous spatial distributions, not as discrete point clouds, and is invariant to agent ordering, number of agents, and global orientation. To achieve rotation invariance while preserving reflection sensitivity, we include an alignment step that optimally rotates the predicted Zernike spectrum to match the target before computing the loss. We perform benchmarking to establish the advantages of our shape-matching loss over other standard distance metrics for shape comparison tasks. We then demonstrate that DiffeoMorph can form a range of complex shapes from minimally patterned initial conditions. DiffeoMorph provides a general framework for learning distributed control strategies for morphogenesis, swarm robotics, and programmable self-assembly.
Discovering Multiagent Learning Algorithms with Large Language Models
Much of the advancement in Multi-Agent Reinforcement Learning (MARL) for imperfect-information games has historically depended on the manual, iterative refinement of algorithmic baselines. Recently, evolutionary coding agents powered by Large Language Models (LLMs) have emerged as powerful tools to automate this discovery process. In this work, we deploy one such agentic framework, AlphaEvolve, to navigate the design spaces of two distinct game-theoretic paradigms: counterfactual regret minimization (CFR) and policy-space response oracles (PSRO). This automated search yielded two algorithms: Volatility-Adaptive Discounted (VAD-) CFR and Smoothed Hybrid Optimistic Regret (SHOR-) PSRO, which are consistently competitive with state-of-the-art human-designed baselines across an 18-game evaluation suite spanning Poker, Goofspiel, Liar's Dice, Blotto, and Battleship variants. However, because the LLM optimizes for fitness on a specific training set, it often constructs highly synergistic, complex mechanisms tailored to those environments. Through systematic ablation studies, we demonstrate that while these mechanisms are tightly coupled, the true driver of generalization lies in a minimal algorithmic core. By distilling the LLM's discoveries down to their most fundamental principles, we produce two minimal solvers: Warm-started Optimistic Predictive (WOP-)CFR and Projection Matching (PM-)PSRO. These distilled versions achieve superior performance on generalization with greatly reduced structural complexity, providing a clear methodology for using LLMs in algorithmic discovery.
comment: More experiments and analysis on algorithmic distillation
Systems and Control (EESS)
Quantifying Trade-Offs Between Stability and Goal-Obfuscation
Safety-critical autonomy in adversarial settings demands more than Lyapunov stability of tracking error signals. An agent executing a goal-directed trajectory is intrinsically legible to a passive observer running online Bayesian inference, because the contractive dynamics of any Lyapunov basin of attraction concentrates posterior belief over the latent intent parameters. We initiate the study of intent privacy over a continuous state space as a joint control problem on the physical state combined with the latent belief state of a putative observer. With the main challenges concentrated around the analysis of the belief-state dynamics, the agent dynamics are assumed to be simple, modeled by the differential inclusion $\dot{x}\in u+\bar{d}\mathbb{B}$. That is, the agent is fully actuated with a bounded unknown disturbance to the control input. The observer's intent inference process is modeled as a discrete-time stochastic dynamical system evolving over the belief state space of a Rao-Blackwellized particle filter (RBPF) reasoning over large random samples of possible agent goals. The agent's control input is modeled as a piecewise constant signal, with jumps matching the RBPF update times. Building on a prior intent-inference framework and its KL-based information leakage measurement, a privacy constraint is imposed, which amounts to maintaining information leakage above a prescribed threshold with high probability, using probabilistic discrete-time control barrier functions (PCBFs). A key technical contribution is the derivation of separate PCBF results for the Bayesian update step and the resampling step of the RBPF, enabling a PCBF result for the full update as well as integration of the privacy constraint with the agent's task-side tracking requirement. Finally, a joint feasibility analysis is carried out by examining the interplay between the privacy constraint and the tracking envelope.
comment: 11 pages
Lie Group Formulation of Recursive Dynamics Algorithms of Higher Order for Floating-Base Robots
In this paper, we describe procedures for computing higher-order time derivatives of the Lie-group Newton-Euler, Articulated-Body Inertia, and hybrid dynamics algorithms for floating-base trees, where the base configuration evolves on SE(3) and the attached mechanism is an open kinematic tree with configuration on the (n1+n2)-dimensional manifold T^{n1} \times R^{n2}, using spatial representation of twists. After presenting the algorithms, we collect the resulting recursions into closed-form equations of motion, identifying an admissible Coriolis matrix satisfying the passivity property, and showing that the articulated inertia tensor remains unchanged across all time derivatives. We then apply the developed methods to a 12-DoF aerial manipulator to derive analytical expressions for its geometric forward and inverse dynamics along with their first time derivatives, whereas the numerical simulations successfully evaluate these dynamics up to fifth order. Finally, to demonstrate their practical utility, we benchmark the proposed extensions and show that, in the considered tests, their computational cost scales quadratically with the derivative order, whereas the automatic-differentiation baseline exhibits exponential scaling.
Global self-optimizing control of batch processes
This work aims to achieve near-optimal operation for a class of batch processes by employing self-optimizing control (SOC). Compared with a continuous process, a batch process exhibits stronger nonlinearity in its dynamics because of its non-steady operating conditions. This necessitates a global version of SOC to achieve satisfactory performance. Meanwhile, it also makes the existing global SOC (gSOC) not directly applicable to batch processes due to the causality amongst variables. Therefore, it is necessary to extend the original gSOC to batch processes. In addition to the nonconvexity challenge of the original gSOC problem, the new extension for batch processes has to face even more challenges. In particular, the causality due to the dynamics of batch processes introduces structural constraints on controlled variables (CVs), making the CV selection problem even more difficult. To address these challenges, the gSOC problem is recast in a vectorized formulation, and it is proved that the structural constraints considered are linear in the vectorized formulation. Moreover, a novel shortcut method is proposed to efficiently find sub-optimal but more transparent solutions for this problem. The effectiveness of the new approach is validated through a case study of a fed-batch reactor, where CVs are constructed through a combination matrix with a repetitive structure, resulting in a simple SOC scheme. This simplicity facilitates the implementation of the SOC approach and enhances its practical applicability and robustness.
Dynamic Controlled Variables Based Dynamic Self-Optimizing Control
Self-optimizing control is a strategy for selecting controlled variables, where the economic objective guides the selection and design of controlled variables, with the expectation that maintaining the controlled variables at constant values can achieve optimization effects, translating the process optimization problem into a process control problem. Currently, self-optimizing control is widely applied to steady-state optimization problems. However, the development of process systems exhibits a trend towards refinement, highlighting the importance of optimizing dynamic processes such as batch processes and grade transitions. This paper formally introduces the self-optimizing control problem for dynamic optimization, termed the dynamic self-optimizing control problem, extending the original definition of self-optimizing control. A novel concept, "dynamic controlled variables" (DCVs), is proposed, and an implicit control policy is presented based on this concept. The paper theoretically analyzes the advantages and generality of DCVs compared to explicit control strategies and elucidates the relationship between DCVs and traditional controllers. Moreover, this paper puts forth a data-driven approach to designing self-optimizing DCVs, which considers DCV design as a mapping identification problem and employs deep neural networks to parameterize the variables. Three case studies validate the efficacy and superiority of DCVs in approximating multi-valued and discontinuous functions, as well as their application to dynamic optimization problems with non-fixed horizons, which traditional self-optimizing control methods are unable to address.
Performance guaranteed MPC Policy Approximation via Cost Guided Learning
Model predictive control (MPC) is widely used in industry, but implementing it poses challenges due to hardware or time constraints. A promising solution is to approximate the MPC policy using function approximators such as neural networks. Existing methods focus on minimizing the error between the approximator's outputs and the MPC optimal control actions on training data, which is called the error-guided learning approach in this paper. However, the goal of control law design is not to minimize the fitting error but to minimize the operation cost. This paper proposes a novel cost-guided learning approach that utilizes the cost sensitivity information from the MPC problem to directly minimize the loss in closed-loop performance. A theoretical analysis shows cost-guided learning provides tighter guarantees on optimality loss compared to traditional error-guided learning. Experiments on a continuous stirred tank reactor (CSTR) benchmark demonstrate that the proposed technique results in approximate MPC policies that achieve substantially better closed-loop performance. This work makes an important contribution by connecting the fitting errors with operational objectives, overcoming key limitations of existing approximation methods. The core idea could be applied more broadly for data-driven control.
Probabilistic Assessment of Rare Transient Instability Events via Kriging-based Active Learning Framework
The increasing uncertainty in modern power systems, driven by the integration of intermittent energy sources and variable loads, underscores the need for probabilistic transient stability assessment. However, existing assessment methods primarily focus on average system stability behavior and may struggle or incur high computational cost when identifying rare transient instability events, which in turn are critical for ensuring system resilience. To address this, the paper proposes a Kriging-based active learning framework to accurately characterize rare instability regions within the input uncertainty space and estimate the associated small instability probability, while requiring only a limited number of expensive time-domain simulations. The proposed active learning (AL) framework is tested on a modified IEEE 59-bus system with simulated load and wind uncertainties, and a WECC 240-bus system incorporating real-world wind and solar generation data. Comparative studies with the existing random forest-based active learning method and three non-AL methods demonstrate that the proposed AL framework achieves superior accuracy and computational efficiency.
comment: Accepted by International Journal of Electrical Power and Energy Systems for future publication
Distributed Online Learning for Time-Critical Communication in 6G Industrial Subnetworks
6G industrial in-X subnetworks are expected to support highly time-critical alarm reporting in large-scale environments characterized by mobility, bursty event-driven traffic, and limited radio resources. In such settings, conventional medium access solutions are ill-suited to guarantee reliable delivery of critical traffic, e.g., emergency alarms, within strict deadlines, especially when multiple subnetworks become simultaneously active after a common alarm event, a scenario widely referred to as medium access with a shared message. This paper proposes a distributed deep reinforcement learning (DRL)-based medium access control protocol for timely alarm transmission in time-critical industrial subnetworks. The proposed method enables each local access point (LAP) to learn, in an online manner, to infer contention conditions from a broadcast contention-signature signal and to autonomously select a transmission pattern over the available channels using a lightweight deep neural network and an $\epsilon$-greedy policy. Simulation results demonstrate that the proposed approach consistently achieves a higher probability of in-time alarm delivery than benchmark random-access schemes, while exhibiting better scalability with increasing network density. For instance, the proposed method improves the probability of in-time alarm delivery by at least 7% with a network size of 40 subnetworks, while the gain increases to 21% when the number of subnetworks increases to 60.
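The $\epsilon$-greedy selection step mentioned above can be sketched as follows. This is a generic illustration only: the function name, the value estimates, and the three candidate transmission patterns are hypothetical, and in the paper the values would come from the lightweight deep neural network rather than a fixed list.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random pattern with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

rng = random.Random(0)

# Hypothetical value estimates for three candidate transmission patterns.
q_values = [0.1, 0.7, 0.3]

# With epsilon = 0 the choice is purely greedy.
greedy_choice = epsilon_greedy(q_values, epsilon=0.0, rng=rng)
```

A larger `epsilon` trades short-term delivery performance for exploration of patterns whose value estimates are still uncertain.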
Residual-Corrected Equivalent-Circuit Model with Universal Differential Equations for Robust Battery Voltage Prediction under Operating-Condition Shift
Accurate terminal-voltage prediction underpins model-based battery management, yet low-order equivalent-circuit models (ECM) lack expressiveness under transient conditions, whereas purely data-driven predictors sacrifice interpretability and may degrade under operating-condition shift. This paper introduces a residual-corrected hybrid formulation in which a first-order Thevenin ECM (ECM-RC) provides the dominant voltage structure, and a compact neural network embedded as a universal differential equation (UDE) corrects only the latent polarization mismatch. The ECM-RC parameters identified by nonlinear least squares warm-start the hybrid model so that the learned component operates in a low-residual regime. Experiments on a public Panasonic 18650PF dataset compare the proposed ECM-UDE with standalone ECM-RC and Long Short-Term Memory (LSTM) baselines across four axes: matched-condition prediction on UDDS at 25 °C, inference-time perturbation of the supplied state-of-charge (SOC, denoted $z$) input, zero-shot temperature transfer (25 °C to -20 °C), and zero-shot drive-cycle transfer to US06, LA92, and HWFET. The proposed ECM-UDE achieves the lowest voltage error in every setting, reducing mean absolute error (MAE) by 48% relative to the LSTM under matched conditions and showing an order-of-magnitude lower inter-seed variability (coefficient of variation: 0.44% vs. 6.20%). Substantial gains persist under challenging distribution shifts, indicating that the physical model anchors prediction where a purely learned model is most vulnerable. These results position residual-corrected ECM-UDE as a lightweight and interpretable enhancement of low-order circuit models for voltage prediction in battery management systems (BMS).
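For readers unfamiliar with the physical backbone, a first-order Thevenin equivalent-circuit model can be simulated in a few lines. The sketch below is a generic textbook form under stated assumptions, not the paper's implementation: it uses forward-Euler integration, a constant open-circuit voltage (in practice it depends on state of charge), positive current for discharge, and hypothetical parameter values. The neural residual correction described in the abstract is not included.

```python
def simulate_thevenin(current, dt, ocv, r0, r1, c1, v1_init=0.0):
    """Terminal voltage of a first-order Thevenin model under a current profile.

    v_terminal = ocv - i * r0 - v1, with dv1/dt = -v1 / (r1 * c1) + i / c1.
    """
    v1 = v1_init  # polarization voltage across the RC pair
    voltages = []
    for i in current:
        voltages.append(ocv - i * r0 - v1)
        # Forward-Euler update of the polarization state.
        v1 += dt * (-v1 / (r1 * c1) + i / c1)
    return voltages

# Hypothetical parameters: 4.0 V open-circuit voltage, 1 A discharge for two 1 s steps.
trace = simulate_thevenin([1.0, 1.0], dt=1.0, ocv=4.0, r0=0.01, r1=0.02, c1=1000.0)
```

In a residual-corrected hybrid of the kind the abstract describes, a learned term would be added to the right-hand side of the polarization dynamics, leaving the circuit structure above as the dominant part of the prediction.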
Unbalanced Optimal Transport and Density Control for Discrete-Time Linear Systems
This article studies unbalanced optimal transport (UOT) and its dynamical extension, unbalanced density control (UDC), for a class of constrained discrete-time linear systems. UOT compares measures with unequal total mass by balancing transport cost and fidelity to reference measures, while UDC incorporates system dynamics and constraints into this framework. Focusing on Gaussian references and discrete-time linear systems, we show that both problems admit globally optimal convex formulations, analogous to covariance steering. A numerical experiment is provided to illustrate our approach.
comment: To appear in the Proceedings of MTNS 2026 (extended abstracts). Submitted on February 15, 2026; accepted on April 20, 2026. A significantly expanded version containing additional theoretical results, complete proofs, and numerical experiments, is available at: arXiv:2605.04246v1
Synthesis of Limit Cycles and Reference Tracking via Switching Affine Systems
This paper introduces a novel method to approximate limit cycles of nonlinear ODEs by use of switching affine dynamics in order to ease data-based modeling and analysis. Previous approaches to approximating limit cycles by switching systems have been largely confined to simple partitions into two regions or low-dimensional (often planar) settings. In contrast, this study utilizes more general partitions in higher-dimensional state spaces, augmented by external signals, to develop a synthesis scheme that guarantees a globally stable limit cycle. The synthesis task is formulated and solved based on constrained numerical optimization. Starting from sampled data of the nonlinear dynamics, the method minimizes the error between the data and the limit cycle generated by the switching affine model, while employing stability constraints to ensure global stability. Based on the obtained model, the paper tackles the problem of reference tracking for switching affine systems with periodic behavior. While the approximation scheme is based on a common Lyapunov function, the reference tracking approach uses multiple Lyapunov functions to achieve less conservative convergence results. The principle and effectiveness of the proposed methods are illustrated through a set of examples.
Unifying Goal-Conditioned RL and Unsupervised Skill Learning via Control-Maximization
Unsupervised pretraining has driven empirical advances in goal-conditioned reinforcement learning (GCRL), but its theoretical foundations remain poorly understood. In particular, an influential class of methods, mutual information skill learning (MISL), discovers behaviorally diverse skills that can later be used for downstream goal-reaching. However, it remains a theoretical mystery why skills learned through MISL should support goal-reaching. A subtle challenge is that both GCRL and MISL are umbrella terms: different GCRL tasks use distinct criteria for measuring goal-reaching performance, while different MISL methods optimize distinct notions of behavioral diversity. We address this challenge and unify GCRL and MISL as instances of control maximization. We identify three canonical GCRL formulations and prove that they are fundamentally inequivalent: they can induce incompatible optimal policies even in the same environment. Nevertheless, they all share a common interpretation: a well-performing goal-conditioned policy is one whose future trajectory is highly sensitive to the commanded goal, with the precise notion of sensitivity determined by the GCRL formulation. Noting that MISL objectives can be understood as measures of skill-sensitivity akin to goal-sensitivity, we show that MISL objectives are bounded by formulation-specific downstream goal-sensitivities. These bounds establish a precise correspondence between MISL methods and downstream GCRL tasks: for every GCRL formulation, there exists a matching MISL objective for which more diverse skills afford greater downstream goal sensitivity. Our results thus lay a theoretical foundation for RL pretraining and have important practical implications, such as suggesting which pretraining objectives to use when a user cares about a specific class of downstream tasks.
Absolute Stability of Nonlinear Negative Imaginary Systems with Application to Potential Energy Shaping
This paper establishes absolute stability conditions for nonlinear negative imaginary (NI) systems interconnected with static nonlinear feedback. We first show that the NI property is preserved when the feedback nonlinearity can be expressed as the gradient of a continuously differentiable function, and the composite storage of the resulting system remains positive definite. This condition provides a direct connection between nonlinear static feedback and storage-function shaping along the measured output channels. Building on this result, conditions are derived for absolute stability of the closed-loop system under mild assumptions. The linear specialization of the results strictly generalizes prior absolute stability results for linear NI systems, allowing coupled nonlinearities not covered by existing slope-restricted or sector-bounded frameworks. Finally, the proposed theory is illustrated through a linear example highlighting this generalization and a nonlinear example that shows the utility of the proposed results in potential energy shaping.
comment: 8 pages, 7 figures
Safety Certification is Classification
The goal of this paper is certifying safety of dynamical systems subject to uncertainty. Existing approaches use trajectory data to estimate transition probabilities, and compute safety probabilities recursively via dynamic programming (DP). This recursion may lead to compounding errors in the certified safety probability, thus collapsing to a vacuous lower bound for growing horizons $T$. We propose a kernel embedding framework that treats safety certification as a classification problem on trajectory data, directly estimating the $T$-step safety probability without recursion. We show that the framework subsumes well-established approaches from the literature (e.g., barrier certificates, robust Markov models) as special cases, and allows us to go beyond their limitations. As the main consequence, it bypasses compounding error across the horizon and enables certification for systems with non-Markovian dynamics. We demonstrate that direct estimators remain stable independent of the certification horizon and in the non-Markovian setting, whilst DP-based certificates silently go unsound -- confirmed in simulation on a neural-controlled quadrotor.
comment: 32 pages, 18 figures
Monitoring autonomous persistent surveillance missions using invariance ICRA 2026
This paper studies runtime monitoring for persistent surveillance by autonomous robots when the autonomy stack is a black box. The environment is partitioned into finitely many parts, each carrying an uncertainty state that decreases when observed and increases otherwise. We model the closed loop as a state-dependent hybrid system with linear parameter varying dynamics and design a monitor based on an invariant computed offline. As this invariant is typically hard to obtain for large to-be-surveyed spaces, we propose a compositional monitor obtained by decentralized computation of low-dimensional invariant sets for each uncertainty region, and checking their conjunction online. Under common independence assumptions, the compositional monitor is sound and complete with respect to the full-system invariant. The approach is applied in a case study with a real robot persistently monitoring a labyrinth, emphasizing its applicability in practice.
comment: Accepted at IEEE ICRA 2026
Arbitrage and the Stability of AMM Price Tracking
Automated market makers (AMMs) quote prices from pool state rather than from a limit order book. AMM pools often stay close to a reference price because arbitrageurs correct profitable mispricing. A large part of decentralized finance therefore relies on a simple economic premise: once the AMM price drifts away from the reference price, arbitrage incentives push it back. This paper studies when that premise is strong enough to guarantee block-scale stability. We model the gap between the reference price and the AMM price as a stochastic tracking error, treat arbitrage as the corrective input, and place blockchain execution inside the loop through fees, discrete blocks, transaction ordering, delays, and transaction failure. The detailed execution layer is reduced to the total successful correction confirmed in each block. Under a block-level correction condition, we prove geometric ergodicity of the tracking error and obtain explicit one-step bounds that connect tracking quality to liquidity and execution quality. We also show in a constant-product example how fees, fixed execution costs, and local liquidity map into the no-trade band and the optimal corrective trade. Finally, we build empirical proxies for the theorem quantities from realized block data and use them to organize reduced and mechanism-focused simulations whose comparative statics are consistent with the theory. The contribution is to turn a basic economic intuition behind decentralized finance into a quantitative stability statement together with a tractable calibration interface.
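To make the constant-product example concrete, here is a minimal Python sketch of the no-trade band and the profit-maximizing corrective trade. It assumes a pool with reserves `x` (risky asset) and `y` (numeraire), a proportional fee charged on the trade input and kept outside the reserves, and no fixed execution cost; the function names and these simplifications are illustrative assumptions rather than the paper's model, which also accounts for fixed costs and execution frictions.

```python
import math
from typing import Tuple


def no_trade_band(x: float, y: float, fee: float) -> Tuple[float, float]:
    """Reference prices for which no arbitrage trade is profitable.

    The pool's marginal price is p = y / x. Selling to the pool earns at most
    (1 - fee) * p per unit, and buying from it costs at least p / (1 - fee).
    """
    p = y / x
    return (1.0 - fee) * p, p / (1.0 - fee)


def optimal_arbitrage(x: float, y: float, fee: float, ref_price: float):
    """Return (side, gross_input, (x_new, y_new)) for the profit-maximizing trade.

    Assumes the fee is skimmed from the input, so the invariant k = x * y
    is preserved by the post-fee amounts.
    """
    k = x * y
    lo, hi = no_trade_band(x, y, fee)
    if lo <= ref_price <= hi:
        return "none", 0.0, (x, y)
    if ref_price > hi:
        # Pool is too cheap: pay numeraire in, take the asset out.
        y_new = math.sqrt((1.0 - fee) * ref_price * k)
        x_new = k / y_new
        return "buy", (y_new - y) / (1.0 - fee), (x_new, y_new)
    # Pool is too expensive: pay the asset in, take numeraire out.
    x_new = math.sqrt((1.0 - fee) * k / ref_price)
    y_new = k / x_new
    return "sell", (x_new - x) / (1.0 - fee), (x_new, y_new)
```

Under these assumptions the optimal trade moves the pool price exactly to the nearest edge of the band around the reference price and no further, which is why a residual tracking error of roughly the fee size persists even with perfectly executed arbitrage.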
Kirigami-Structured Electronic Capsule for Long-Term Continuous Gastric Monitoring
Ingestible electronic systems enable non-invasive, in situ sensing within the gastrointestinal (GI) tract, yet clinical translation has been limited by uncontrolled transit, short operational lifetimes, and unreliable wireless communication that prevent continuous monitoring. Here, we present a gastric-resident ingestible robotic platform that achieves week-long operation through integration of a bioinspired, electrically triggered release mechanism with a kirigami-enabled electronic architecture. A kirigami-patterned flexible printed circuit board spans the capsule body and deployable superelastic arms, enabling high-density integration of sensing, power management, and wireless modules within a constrained volume while tolerating large mechanical deformation during gastric residence. Stable retention and on-demand disassembly are achieved using thermally responsive polycaprolactone joints that transition from rigid to compliant states under electrical activation, avoiding dependence on variable chemical triggers. Reliable telemetry in the highly attenuating gastric environment is maintained using a dual-band Bluetooth Low Energy and sub-gigahertz module with RSSI- and throughput-aware adaptive transmission, balancing link robustness and energy consumption. We demonstrate long-term, continuous monitoring of gastric radiation exposure, enabling early detection of dose accumulation and providing a promising in vivo alternative to wearable or handheld dosimeters. Swine studies confirm stable gastric residence, sustained real-time telemetry, and safe gastrointestinal passage following triggered disassembly. This work establishes kirigami-enabled integration as a scalable strategy for long-term gastric-resident robotic systems.
SOPF-Based Adaptive Droop Control for Hybrid AC--HVDC Grids Under Offshore Wind Uncertainty
The integration of massive offshore wind into hybrid AC-HVDC grids demands robust DC voltage regulation, yet conventional fixed-gain droop controllers struggle under severe stochastic volatility. This paper bridges the gap between system-level economic dispatch and converter-level control by proposing a novel Stochastic Optimal Power Flow (SOPF)-based adaptive droop framework. Rather than relying on heuristic or reactive tuning, wind forecast uncertainty is modeled using a zone-wise Beta distribution that accurately captures the heteroscedastic nature of wind errors across low, mid, and high power regimes. By leveraging Polynomial Chaos Expansion (PCE) within a chance-constrained SOPF, the system's stochastic states are formulated analytically. Crucially, the optimal adaptive droop gain is extracted directly from the first-order PCE coefficients via a Jacobian-free sensitivity analysis, embedding statistical voltage-security guarantees directly into the local converter control. Validation on a 4-terminal AC-HVDC system demonstrates that scenario-adaptive gains significantly outperform standard fixed-coefficient approaches, effectively minimizing active-power tracking errors during extreme wind disturbances.
Community-to-Vehicle: Integrating Electric Vehicles into Energy Communities -- A Swiss Case Study
The institutional separation between local energy communities and public electric vehicle (EV) charging limits the efficient use of locally generated renewable energy. This paper introduces the concept of community-to-vehicle (C2V) as an institutional design mechanism to bridge this gap by enabling EV charging within the community boundary, where locally generated photovoltaic (PV) surplus is preferentially allocated and offered to external users at a community charging price. Building on the recently introduced local electricity community framework in Switzerland, we design scenarios that capture the transition from full separation to coordinated EV charging and evaluate their impacts on EV users and the community. The results show that C2V significantly improves local PV utilization and enhances economic performance, reducing EV charging costs relative to commercial alternatives while generating additional revenue streams for the community. These findings highlight the potential of C2V as a practical, implementable mechanism for integrating EV charging into local energy communities, providing a clear pathway for adopting coordinated community-EV interaction within existing regulatory frameworks.
Foundation Twins: A New Generation of Power Systems Digital Twins using Foundation AI Models
Power systems are inherently multi-timescale systems, with different physical phenomena and decision-making processes spanning multiple timescales, time horizons, and geographic scopes. I envision power systems digital twins (DTs) as powerful modeling and simulation tools that can accelerate and improve decision-making across different time scales and geographic scopes. However, until now, research has not delivered such a vision, and power systems DTs remain a concept distant from implementation. This is not a regular research paper. This is a position paper that outlines my vision for developing a new generation of power systems DTs that leverage recent advances in artificial intelligence (AI) and machine learning (ML). I call these Foundation Twins. Foundation Twins combines the generalization features of foundation models with the decision-making capabilities of reinforcement learning (RL) architectures to deliver the envisioned power systems DTs.
comment: 6 pages
Consideration of Control-Loop Interaction in Transient Stability of Grid-Following Inverters using Bandwidth Separation Method
Grid-following inverters have been widely adopted as a grid interface for renewable energy, and ensuring their small-signal and large-signal stability is critical to modern power systems. Their large-signal, or transient, stability is a significant challenge to analyze because of the interaction of the phase-locked loop (PLL), which must maintain synchronism with various outer-loop controllers. Simple analysis in which outer-loop controllers are idealized is insufficient, and the interactions between the nonlinear dynamics of the PLL and the dynamics of the DC-link voltage control (DVC), as well as the AC terminal voltage control (TVC) when present, must be considered. An asymptotic analysis approach, termed the bandwidth separation method, is proposed. This method enables simplification and order reduction of the original differential equations when sufficient bandwidth separation exists. Through this method, the interaction between the DVC and PLL is explicitly characterized, revealing that such interaction degrades system stability and shrinks the stability region. The analysis also indicates that voltage instability, rather than PLL loss of synchronization alone, is often the root cause of transient instability. Optimal bandwidth configurations for the PLL and DVC are identified under various grid fault conditions: a larger PLL bandwidth improves resilience to phase-jump faults, while a larger DVC bandwidth enhances tolerance to power fluctuations. In addition, the influence of the TVC loop is analyzed, showing that a high TVC bandwidth can mitigate the destabilizing effects of PLL-DVC interaction and further improve transient stability. All analytical findings are validated through hardware-in-the-loop (HIL) experiments.
PREFER: Personalized Review Summarization with Online Preference Learning
Product reviews significantly influence purchasing decisions on e-commerce platforms. However, the sheer volume of reviews can overwhelm users, obscuring the information most relevant to their specific needs. Current e-commerce summarization systems typically produce generic, static summaries that fail to account for the fact that (i) different users care about different product characteristics, and (ii) these preferences may evolve with interactions. To address the challenge of unknown latent preferences, we propose an online learning framework that generates personalized summaries for each user. Our system iteratively refines its understanding of user preferences by incorporating feedback directly from the generated summaries over time. We provide a case study using the Amazon Reviews'23 dataset, showing in controlled simulations that online preference learning improves alignment with target user interests while maintaining summary quality.
Investigation of Wound Field Synchronous Machines using Soft Magnetic Composites for Automotive Applications
This paper investigates the application of soft magnetic composites (SMCs) in the stators of wound field synchronous machines for automotive traction. While SMCs are traditionally employed in axial flux topologies, this study examines their use in radial-flux electrically excited synchronous machines (EESMs). Multiple SMC materials and lamination thicknesses are evaluated, with the optimal configuration combining an SMC material in the stator and 0.35 mm NO35 laminated steel in the rotor. This combination delivers improved torque and efficiency compared to conventional designs. When integrated into a full electric drive unit (EDU), this motor achieves 89.7% efficiency over the WLTP drive cycle, representing a 1.4 percentage point improvement over a reference permanent magnet synchronous machine-based EDU. The proposed solution eliminates rare-earth materials, reduces cost through thicker laminations, and offers environmental benefits through SMC utilization. This novel material combination, previously unexplored for radial EESMs, presents a promising direction for affordable, high-efficiency, rare-earth-free automotive traction machines.
A Disaster-Aware Integrated TN-NTN System-Level Simulator for Resilient 6G Wireless Networks
Non-terrestrial networks (NTN) have been standardized by the 3rd generation partnership project (3GPP) as a key component of future 6G systems to enhance coverage and resilience. In particular, NTN technologies such as low-earth orbit (LEO) satellites, high-altitude platform stations (HAPS), and unmanned aerial vehicles (UAVs) are expected to support terrestrial networks (TN) during extreme events and disasters. In this paper, we present a lightweight system-level simulator for evaluating post-failure fallback behavior in integrated TN-NTN wireless networks under a partial-failure disaster model. The simulator follows 3GPP Rel-17/18 modeling principles, supports probabilistic terrestrial next-generation node B (gNB) failures, and service migration to NTN. The simulator supports comparative analysis of throughput, packet reception ratio (PRR), and latency under different user loads, disaster severities, and NTN provisioning levels. Results show the expected capacity-delay tradeoff of terrestrial operation, the reliability and stability of non-terrestrial service, and the balanced resilience behavior of hybrid TN-NTN operation. The proposed framework provides a tractable tool for studying wireless network resilience and traffic management in future integrated 6G mobile systems.
comment: 6 pages, 4 figures, IEEE PIMRC
Comparative Analysis of Direct-to-Cell (D2C) and 3GPP Non-Terrestrial Networks (NTN) for Global Connectivity
The quest for ubiquitous mobile coverage has catalyzed two fundamentally distinct architectural paradigms: Direct-to-Cell (D2C) and standardized 3GPP Non-Terrestrial Networks (NTN). D2C, pioneered by SpaceX Starlink and AST SpaceMobile, leverages existing terrestrial spectrum and unmodified consumer handsets to provide emergency connectivity as a market-driven overlay. In contrast, 3GPP NTN, standardized across Releases 17-19, offers a systematic satellite-native framework designed for long-term scalability, high-throughput broadband, and deep integration with terrestrial 5G/6G networks. This paper presents a comprehensive technical comparison of these approaches, analyzing their standardization trajectories, network architectures, physical-layer innovations, security postures, and operational trade-offs. We further examine their implications for emerging 6G use cases, particularly autonomous driving, where safety-critical redundancy motivates a hybrid tri-link architecture combining terrestrial 5G, NTN broadband, and D2C emergency fallback. Our analysis shows that, although D2C enables rapid market entry through legacy-device compatibility, NTN provides superior performance, security, and scalability, positioning it as the foundational framework for 6G satellite-terrestrial convergence. A hybrid model that combines the strengths of both paradigms is identified as the most practical path toward truly global connectivity.
comment: 7 pages, 2 figures, IEEE VTC fall 2026
WARP: A Benchmark for Primal-Dual Warm-Starting of Interior-Point Solvers
Solving AC Optimal Power Flow (AC-OPF) is of central importance in electricity market operations, where interior-point methods (IPMs) such as IPOPT are the standard solvers. A growing body of work uses machine learning to predict primal warm-start iterates, reporting iteration reductions of 30-46%. We show that these reported gains rest on an inappropriate evaluation baseline: prior methods benchmark against the flat start $V_m = 1, V_a = 0$, whereas the solver's actual default, the variable-bound midpoint $(l+u)/2$, is near-optimal for log-barrier centrality. Against this corrected baseline, no primal-only warm-start method reduces solver iterations. We trace the failure to a geometric property of interior-point methods: primal prediction accuracy is anticorrelated with convergence speed, and providing the ground-truth optimal solution $x^*$ without dual variables causes the solver to diverge. Oracle experiments establish that the complete primal-dual-barrier state $(x^*, \lambda^*, z^*, \mu^*)$ reduces IPOPT iterations from 23 to 3, an 85% reduction that is structurally inaccessible to primal-only methods. To enable rigorous evaluation of warm-start methods on this task, we release a benchmark suite comprising dual-labeled AC-OPF datasets with IPOPT-extracted solutions, a corrected evaluation protocol, and WARP, a topology-conditioned encode-process-decode interaction network that predicts the full interior-point state $(\hat{x}, \hat{\lambda}, \hat{z}, \hat{\mu})$ on the heterogeneous constraint graph. WARP achieves a 76% reduction in IPOPT iterations while natively accommodating N-1 contingency topology variations without retraining.
Space-Time Diversity in Observability and Estimation on Product Lie Groups
Robust state estimation in coupled dynamical systems depends critically not only on sensor quality but on the structural alignment between observation channels and the system's intrinsic dynamics. This paper develops a rigorous framework for analyzing spatial and temporal diversity in dynamical state estimation on product Lie groups, drawing structural parallels to diversity gains in space-time coding. Three main results are established: (i) coupling-based necessary and sufficient conditions for cross-factor observability, showing that a sensor local to one group factor renders another factor observable if and only if the dynamics propagate error directions across the corresponding Lie algebra components; (ii) a spatial diversity saturation theorem identifying precisely when additional observation channels fail to expand the propagated observation subspace and thus provide no structural benefit; and (iii) a time-space diversity decomposition that exactly separates instantaneous spatial information from accumulated temporal information in the estimation error covariance. The framework is applied to planar SE(2) and spatial SE(3) navigation, yielding exact observability guarantees for redundant and non-redundant sensor architectures in modern robotics and autonomous vehicles. These results extend classical observability theory beyond Euclidean state spaces, exposing structural constraints invisible to standard rank-based analysis that fundamentally govern robust inference in coupled dynamical systems.
comment: 6 pages (two columns), 1 figure, 2 tables, and an algorithm. This work has been submitted to the IEEE for possible publication
Maximal Controlled Invariant-MPC: Enhancing Feasibility and Reducing Conservatism through Terminal CBF Constraint in Safety-Critical Control
Optimal control for safety-critical systems is often dependent on the conservativeness of constraints. Control Barrier Functions (CBFs) serve as a medium to represent such constraints, but constructing a minimally conservative CBF is a computationally intractable problem. Therefore, approaches that can guarantee safety while reducing conservatism will help improve the optimality of the system under consideration. Here, we present a Model Predictive Control (MPC) formulation using CBF as a terminal constraint, which is proven to improve feasibility and reachable sets with increasing prediction horizon. The constructive nature of the proofs allows for warm-starting the nonlinear optimization problem, thereby reducing the computational time substantially. Simulations are set up for a simple nonholonomic system to numerically validate the results, and it is observed that the number of infeasible points decreased by a factor of 1.7 to 2.7. The increase in reachable state space was demonstrated by the ability of the system to track trajectories that are entirely inside the unsafe region of the control barrier function.
comment: Under review
AirBender: Adaptive Transportation of Bendable Objects Using Dual UAVs
The interaction of robots with bendable objects in midair presents significant challenges in control, often resulting in performance degradation and potential crashes, especially for aerial robots due to their limited actuation capabilities and constant need to remain airborne. This paper presents an adaptive controller that enables two aerial vehicles to collaboratively follow a trajectory while transporting a bendable object without relying on explicit elasticity models. Our method allows on-the-fly adaptation to the object's unknown deformable properties, ensuring stability and performance in trajectory-tracking tasks. We use Lyapunov analysis to demonstrate that our adaptive controller is asymptotically stable. Our method is evaluated through hardware experiments in various scenarios, demonstrating the capabilities of using multirotor aerial vehicles to handle bendable objects.
A Semi-smooth Newton Method for the Constrained Optimal Control of Continuous-Time Linear Systems
This paper details a novel indirect method for solving constrained optimal control problems (OCPs) directly in continuous-time function space. The KKT conditions are embedded in a non-smooth complementarity function, which enables their reformulation as a rootfinding problem in Banach space. This problem is then solved using a non-smooth Newton method. Finally, the paper shows that the Newton update can be obtained by solving a modified differential Riccati equation, where the cost terms are reweighted at every iteration based on the constraint multipliers. Numerical simulations show the effectiveness of the method, which converges superlinearly up to the tolerance of the ODE solver.
Decentralized Time-Varying Optimization for Streaming Data via Temporal Weighting
Classical optimization theory largely focuses on fixed objective functions, whereas many modern learning systems operate in dynamic environments where data arrive sequentially and decisions must be updated continuously. In this work, we study optimization with streaming data over a distributed network of agents. We adopt a structured, weight-based formulation that explicitly captures the streaming-data origin of the time-varying objective: at each time step, every agent receives a new sample, and the network seeks to track the minimizer of a temporally weighted objective formed from all samples observed across the network so far. We focus on decentralized gradient descent (DGD) with a limited communication/computation budget, where at each time step, only a limited number of DGD iterations can be performed before the objective changes again. For strongly convex and smooth losses, we analyze the tracking error with respect to the time-varying minimizer through a fixed-point theory lens. Our analysis reveals that the tracking error decomposes into a fixed-point tracking term and a bias term induced by data heterogeneity across agents. We specialize the analysis to two natural weighting strategies: uniform weights, which treat all samples equally, and exponentially discounted weights, which geometrically decay the influence of older data. Under uniform weighting, DGD tracks the fixed-point at a rate $\mathcal{O}(1/t)$, whereas discounted weighting yields a non-vanishing fixed-point tracking floor controlled by the discount factor. In both cases, decentralization induces an additional non-zero bias floor under a constant step size. We validate our theoretical findings through numerical simulations.
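The setting in this abstract can be sketched on a toy problem. The Python example below runs decentralized gradient descent on scalar quadratic losses over a ring network, with each agent keeping an exponentially discounted running average of its own samples and only a fixed number of DGD iterations allowed per time step. The quadratic loss, the ring mixing matrix, and all names (`ring_mixing`, `dgd_stream`, `gamma`, `alpha`, `iters_per_step`) are assumptions chosen for illustration; the paper's analysis covers general strongly convex and smooth losses.

```python
from typing import List, Sequence, Tuple


def ring_mixing(n: int) -> List[List[float]]:
    """Doubly stochastic mixing matrix for a ring: weight 1/2 on self, 1/4 on each neighbour."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] += 0.5
        a[i][(i - 1) % n] += 0.25
        a[i][(i + 1) % n] += 0.25
    return a


def dgd_stream(
    samples: Sequence[Sequence[float]],   # samples[t][i]: sample of agent i at time t
    gamma: float,                         # discount factor; gamma = 1.0 gives uniform weights
    alpha: float,                         # constant step size
    iters_per_step: int,                  # DGD iterations allowed before new data arrive
) -> Tuple[List[float], float]:
    """Track the minimizer of the normalized, temporally weighted objective
    sum_i sum_{s<=t} gamma^(t-s) * 0.5 * (x - samples[s][i])^2.

    Returns the agents' final iterates and the exact global minimizer at the last step.
    """
    n = len(samples[0])
    a = ring_mixing(n)
    x = [0.0] * n
    s = [0.0] * n          # per-agent discounted sums of samples
    w = 0.0                # discounted sum of weights (identical for all agents)
    targets = list(x)
    for batch in samples:
        w = gamma * w + 1.0
        s = [gamma * si + ai for si, ai in zip(s, batch)]
        targets = [si / w for si in s]    # local minimizers of the weighted losses
        for _ in range(iters_per_step):
            mixed = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
            # DGD update: consensus step minus a local gradient step.
            x = [mixed[i] - alpha * (x[i] - targets[i]) for i in range(n)]
    return x, sum(targets) / n
```

In this toy case the network average of the iterates converges to the global minimizer when the data are stationary, while the individual agents settle at different values whenever their local data differ. That persistent disagreement is a simple instance of the heterogeneity-induced bias under a constant step size that the abstract describes.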
A New Simple-to-Configure Self-Perturbing Multivariable Extremum-Seeking Controller
This paper presents a new stochastic relay-based extremum-seeking controller (ESC) for multi-input-single-output (MISO) systems. The goal of this work was to create an algorithm that is much simpler to configure than alternative approaches, making deployment to real-world problems easier. A solution is developed first for a static map and then adapted for a general class of dynamic systems. The number of configurable parameters is one per input channel for the static case, and only one additional parameter is needed for the dynamic version. The problem of gradient identification is solved via the use of stochastic relay gains, and a simple stability proof for the static case is presented. Simulation tests demonstrate the performance of the strategy for optimizing both static and dynamic systems.
Physics-based Digital Twins for Integrated Thermal Energy Systems Using Active Learning
Real-time supervisory control of thermal energy distribution systems requires digital twins that are accurate, interpretable, and uncertainty-aware, yet remain data- and computationally efficient. High-fidelity simulations alone are costly, while purely data-driven surrogates often lack robustness. To address these challenges, this work proposes an active learning (AL) framework that couples system-level Modelica simulations with four simpler physics-informed and data-driven surrogate modeling approaches: deterministic Sparse Identification of Nonlinear Dynamics with Control (SINDyC), its probabilistic multivariate-Gaussian extension (MvG-SINDyC), a feedforward neural network (FNN), and a gated recurrent unit (GRU) network. Tailored to each surrogate, model-specific AL query strategies are employed, including Mahalanobis-distance sampling in coefficient space for MvG-SINDyC and error-based sampling in prediction space for SINDyC, FNN, and GRU, allowing the learning process to prioritize dynamically informative trajectories. The proposed approach is demonstrated on the glycol heat exchanger (GHX) subsystem of the Thermal Energy Distribution System (TEDS) at Idaho National Laboratory. Across key GHX outputs, namely the bypass mass flow rate $\dot{m}_{\mathrm{GHX}}$ and heat transfer rate $Q_{\mathrm{GHX}}$, the AL framework achieves comparable predictive accuracy using as few as one-fifth of the simulation trajectories required by random sampling. Among the evaluated surrogates, the GRU achieves the highest predictive fidelity, while SINDyC remains the most computationally efficient and interpretable. The probabilistic MvG-SINDyC surrogate further enables uncertainty quantification and exhibits the largest computational gains under AL.
comment: 23 pages, 12 figures, and 2 tables
Shared Situational Awareness Using Hybrid Zonotopes with Confidence Metric
Situational awareness for connected and automated vehicles describes the ability to perceive and predict the behavior of other road-users in the near surroundings. However, pedestrians can become occluded by vehicles or infrastructure, creating significant safety risks due to limited visibility. Vehicle-to-everything communication enables the sharing of perception data between connected road-users, allowing for a more comprehensive awareness. The main challenge is how to fuse perception data when measurements are inconsistent with the true locations of pedestrians. Inconsistent measurements can occur due to sensor noise, false positives, or unmodeled disturbances. This paper employs set-based estimation with constrained zonotopes to compute a confidence metric for the measurement set from each sensor. Estimated sets and their confidences are then fused using hybrid zonotopes. This method can account for inconsistent measurements, enabling reliable and robust fusion of the sensor data. The effectiveness of the proposed method is demonstrated in both simulation and real experiments.
A Measure-Theoretic Formulation of Behavioral Systems
In Willems' behavioral systems theory, a dynamical system is identified with the set of all trajectories compatible with its laws of motion. In the linear time-invariant setting this trajectory set is a linear subspace, and its algebraic structure underpins the Fundamental Lemma: a single persistently exciting data trajectory generates the entire finite-horizon behavior. For nonlinear or stochastic systems, however, the admissible trajectory set is generally nonconvex, obstructing direct optimization over the behavior. In this paper, we lift the behavioral viewpoint from trajectories to probability measures on trajectories by representing a finite-horizon dynamical system with the set of all Borel probability measures supported on its admissible trajectories. For deterministic systems, this behavioral-measure set is convex and weakly closed even when the dynamics are nonlinear, because convex combinations of trajectory distributions remain dynamically admissible even when convex combinations of trajectories do not. Its extreme points are precisely the Dirac masses on individual admissible trajectories, so the classical deterministic theory is embedded as the extremal skeleton of the richer measure-valued object. On this foundation we establish two core deterministic results and outline a stochastic extension based on history-conditional kernel consistency.
comment: 29 pages, 2 figures. Corrected proofs from previous version
NEO-Grid: A Neural Approximation Framework for Optimization and Control in Distribution Grids
The rise of distributed energy resources (DERs) is reshaping modern distribution grids, introducing new challenges in attaining voltage stability under dynamic and decentralized operating conditions. This paper presents NEO-Grid, a unified learning-based framework for volt-var optimization (VVO) and volt-var control (VVC) that leverages neural network surrogates for power flow and deep equilibrium models (DEQs) for closed-loop control. Our method replaces traditional linear approximations with piecewise-linear ReLU networks trained to capture the nonlinear relationship between power injections and voltage magnitudes. For control, we model the recursive interaction between voltage and inverter response using DEQs, allowing direct fixed-point computation and efficient training via implicit differentiation. We evaluated NEO-Grid on the IEEE 33-bus system, demonstrating that it significantly improves voltage regulation performance compared to standard linear and heuristic baselines in both optimization and control settings. Our results establish NEO-Grid as a scalable, accurate, and interpretable solution for learning-based voltage regulation in distribution grids.
Approximation-Free Control Barrier Functions for Prescribed-Time Reach-Avoid of Unknown Systems
We study the prescribed-time reach-avoid (PT-RA) control problem for nonlinear systems with unknown dynamics operating in environments with moving obstacles. Unlike robust or learning based Control Barrier Function (CBF) methods, the proposed framework requires neither online model learning nor uncertainty bound estimation. A CBF-based Quadratic Program (CBF-QP) is solved on a simple virtual system to generate a safe reference satisfying PT-RA conditions with respect to time-varying, tightened obstacle and goal sets. The true system is confined to a Virtual Confinement Zone (VCZ) around this reference using an approximation-free feedback law. This construction guarantees real-time safety and prescribed-time target reachability under unknown dynamics and dynamic constraints without explicit model identification or offline precomputation. Simulation results illustrate reliable dynamic obstacle avoidance and timely convergence to the target set.
Herd Behavior in Decentralized Balancing Models: A Case Study in Belgium
In a decentralized balancing model, Balance Responsible Parties (BRPs) are encouraged by the Transmission System Operator (TSO) to deviate from their schedule to help the system restore balance, also referred to as implicit balancing. This could reduce balancing costs for the grid operator and lower the entry barrier for flexible assets compared to explicit balancing services. However, these implicit reactions may overshoot when their total capacity is high, potentially requiring more explicit activations. This study analyses the effect of increased participation in the decentralized balancing model in Belgium. To this end, we develop a market simulator that produces price signals on minute-level and simulate the implicit reactions for battery assets with different risk profiles. Besides the current price formula, we also study two potential candidates for the near-term presented by the TSO. A simulation study is conducted using Belgian market data for the year 2023. The findings indicate that, while having a significant positive effect on the balancing costs at first, the risk of overshoots can outweigh the potential benefits when the total capacity of the implicit reactions becomes too large. Furthermore, even when the balancing costs start to increase for the TSO, BRPs were still found to benefit from implicit balancing.
On Fast Attitude Filtering Using Matrix Fisher Distributions with Stability Guarantee
This paper addresses two interrelated problems of the nonlinear filtering mechanism and fast attitude filtering with the matrix Fisher distribution (MFD) on the special orthogonal group. By analyzing the distribution evolution along Bayes' rule, we reveal two essential properties that enhance the performance of Bayesian attitude filters with MFDs, particularly in challenging conditions. Benefiting from the new understanding of the filtering mechanism associated with MFDs, two closed-form filters with MFDs are then proposed. These filters avoid the burdensome computations in previous MFD-based filters by introducing linearized error systems with right-invariant errors while retaining the two advantageous properties. The proposed filter with right-invariant error is proven to be almost globally asymptotically stable for any trajectory on $SO(3)$, leveraging its closed-form iteration and global uncertainty representation with MFDs. Moreover, we prove the local exponential stability of the filter for single-axis rotations to reveal the effect of the two properties on the convergence rate. These stability results support the performance of the proposed filter with large initial error from a theoretical viewpoint, which, to our knowledge, is not achieved by existing directional statistics-based filters. Numerical simulations demonstrate that the proposed filters are as accurate as recent MFD-based Bayesian filters in challenging circumstances but consume far less computation time (about 1/5 to 1/100 of previous MFD-based attitude filters).
A Distributed Primal-Dual Method for Constrained Multi-agent Reinforcement Learning with General Parameterization
This paper proposes a novel distributed approach for solving a cooperative Constrained Multi-agent Reinforcement Learning (CMARL) problem, where agents seek to minimize a global objective function subject to shared constraints. Unlike existing methods that rely on centralized training or coordination, our approach enables fully decentralized online learning, with each agent maintaining local estimates of both primal and dual variables. Specifically, we develop a distributed primal-dual algorithm based on actor-critic methods, leveraging local information to estimate Lagrangian multipliers. We establish consensus among the Lagrangian multipliers across agents and prove the convergence of our algorithm to an equilibrium point, analyzing the sub-optimality of this equilibrium compared to the exact solution of the unparameterized problem. Furthermore, we introduce a constrained cooperative Cournot game with stochastic dynamics as a test environment to evaluate the algorithm's performance in complex, real-world scenarios.
Vibration Damping in Underactuated Cable-suspended Artwork -- Flying Belt Motion Control
This paper presents a comprehensive refurbishment of the interactive robotic art installation Standards and Double Standards by Rafael Lozano-Hemmer. The installation features an array of belts suspended from the ceiling, each actuated by stepper motors and dynamically oriented by a vision-based tracking system that follows the movements of exhibition visitors. The original system was limited by oscillatory dynamics, resulting in torsional and pendulum-like vibrations that constrained rotational speed and reduced interactive responsiveness. To address these challenges, the refurbishment involved significant upgrades to both hardware and motion control algorithms. A detailed mathematical model of the flying belt system was developed to accurately capture its dynamic behavior, providing a foundation for advanced control design. An input shaping method, formulated as a convex optimization problem, was implemented to effectively suppress vibrations, enabling smoother and faster belt movements. Experimental results demonstrate substantial improvements in system performance and audience interaction. This work exemplifies the integration of robotics, control engineering, and interactive art, offering new solutions to technical challenges in real-time motion control and vibration damping for large-scale kinetic installations.
comment: 10 pages, 10 figures
Risk-aware stochastic scheduling of multi-market energy storage systems
Energy storage promotes the integration of renewables by operating with charge and discharge policies that balance an intermittent power supply. A key challenge in this emerging sector is how to optimize the operation of storage assets given future price uncertainties and the need to recover the costs of project finance while ensuring an attractive return on equity and hedging against downside risk. This study investigates the scheduling of energy storage assets under price uncertainty, with a focus on electricity markets. A two-stage stochastic risk-constrained approach is employed, whereby electricity price trajectories or specific power markets are observed, allowing for recourse in the schedule. Conditional value-at-risk is used to quantify risk in the optimization problems; this allows for explicit specification of a probabilistic risk limit. The proposed approach is tested in an integrated hydrogen system (IHS) and a battery energy storage system (BESS). In the joint design and operation context for the IHS, the risk constraint results in large installed unit capacities, increasing capital cost but enabling more inventory to buffer price uncertainty. In both case studies, there is an operational trade-off between risk and expected reward; this is reflected in higher expected costs (or lower expected profits) with increasing risk aversion. Despite the decrease in expected reward (up to 500\$k), both systems exhibit substantial benefits of increasing risk aversion (up to 1.5\$mn) with respect to risk-neutral settings. This work provides a general method to address uncertainties in energy storage scheduling, allowing operators to input their level of risk tolerance on asset decisions.
comment: 49 pages, 11 figures, 7 tables
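The conditional value-at-risk mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a simple empirical tail-average estimator over sampled scenario losses; the function name `cvar` and the sample values are hypothetical and are not taken from the paper, whose optimization formulation is not reproduced here.

```python
import math


def cvar(losses, alpha):
    """Empirical CVaR: mean of the worst (1 - alpha) fraction of losses.

    Assumption: a plain tail-average estimator over equally likely
    scenarios; higher loss means a worse outcome.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(losses, reverse=True)  # worst outcomes first
    # Round before ceil to guard against floating-point noise in (1 - alpha) * n.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    return sum(ordered[:k]) / k


# Ten hypothetical scenario losses, equally likely.
scenario_losses = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
tail_risk = cvar(scenario_losses, 0.8)     # mean of the two worst losses
risk_neutral = cvar(scenario_losses, 0.0)  # plain expected loss
```

Setting `alpha` to 0 recovers the expected loss, so raising `alpha` corresponds to increasing risk aversion in this sketch.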
Data-based Moving Horizon Estimation under Irregularly Measured Data
In this work, we introduce a sample- and data-based moving horizon estimation framework for linear systems. We perform state estimation in a sample-based fashion in the sense that only a few, irregular output measurements are assumed to be available. This setting is encountered in applications where measuring is expensive or time-consuming. Furthermore, the state estimation framework does not rely on a standard mathematical model, but on an implicit system representation based on measured data. We prove sample-based practical robust exponential stability of the proposed estimator under mild assumptions. Furthermore, we apply the proposed scheme to estimate the states of a gastrointestinal tract absorption system.
comment: Extended online version of IFAC World Congress 2026 paper
Many-vs-Many Missile Guidance via Virtual Targets
This paper presents a novel approach to many-vs-many missile guidance using virtual targets (VTs) generated by a Normalizing Flows-based trajectory predictor. Rather than assigning n interceptors directly to m physical targets through conventional weapon target assignment algorithms, we propose a centralized strategy that constructs n VT trajectories representing probabilistic predictions of maneuvering target behavior. Each interceptor is guided toward its assigned VT using Zero-Effort-Miss guidance during midcourse flight, transitioning to Proportional Navigation guidance for terminal interception. This approach treats many-vs-many engagements as many-vs-distribution scenarios, exploiting numerical superiority (n > m) by distributing interceptors across diverse trajectory hypotheses rather than pursuing identical deterministic predictions. Monte Carlo simulations across various target-interceptor configurations (1-6 targets, 1-8 interceptors) demonstrate that the VT method matches or exceeds baseline straight-line prediction performance by 0-4.1% when n = m, with improvements increasing to 5.8-14.4% when n > m. The results confirm that probabilistic VTs enable effective exploitation of numerical superiority, significantly increasing interception probability in many-vs-many scenarios.
comment: Subsequent investigations showed that the proposed method does not generalize beyond the specific scenario considered in this manuscript
Mitigating Error Accumulation in Continuous Navigation via Memory-Augmented Kalman Filtering ICML 2026
Continuous navigation in complex environments is critical for Unmanned Aerial Vehicles (UAVs). However, existing Vision-Language Navigation (VLN) models follow a dead-reckoning scheme, which iteratively updates the position for the next waypoint prediction and subsequently constructs the complete trajectory. Such a stepwise procedure inevitably accumulates position errors over time, resulting in misalignment between the internal belief and the objective coordinates, which is known as "state drift" and ultimately compromises the full trajectory prediction. Drawing inspiration from classical control theory, we propose to correct these errors by formulating the sequential prediction as a recursive Bayesian state estimation problem. In this paper, we design NeuroKalman, a novel framework that decouples navigation into two complementary processes: a Prior Prediction, based on motion dynamics, and a Likelihood Correction, based on historical observations. We first mathematically associate Kernel Density Estimation of the measurement likelihood with the attention-based retrieval mechanism, which then allows the system to rectify the latent representation using retrieved historical anchors without gradient updates. Comprehensive experiments on the TravelUAV benchmark demonstrate that, with fine-tuning on only 10% of the training data, our method clearly outperforms strong baselines and regulates drift accumulation.
comment: ICML 2026 Camera Ready
KAN-Therm: A Lightweight Battery Thermal Model Using Kolmogorov-Arnold Network
A battery management system (BMS) relies on real-time estimation of the temperature distribution in battery cells to ensure safe and optimal operation of Lithium-ion batteries. However, a physical BMS often lacks the memory and computational resources required by high-fidelity models. Temperature estimation of batteries for safety-critical systems using physics-based models on a physical BMS can become challenging due to their higher computational time. In contrast, neural network-based approaches offer faster estimation but require greater memory overhead. To address these challenges, we propose a Kolmogorov-Arnold network (KAN) based thermal model, KAN-therm, to estimate the core temperature of a cylindrical battery. Unlike traditional neural network architectures, KAN uses learnable nonlinear activation functions that can effectively capture system complexity using relatively lean models. We have compared the memory overhead and estimation time of our model with state-of-the-art neural network and tree-based models to demonstrate the applicability and potential scalability of KAN-therm on a physical BMS.
comment: 16 pages, 9 figures
Assumed Density Filtering and Smoothing with Neural Network Surrogate Models
The Kalman filter and Rauch-Tung-Striebel (RTS) smoother are optimal for state estimation in linear dynamic systems. With nonlinear systems, the challenge consists in how to propagate uncertainty through the state transitions and output function. For the case of a neural network model, we enable accurate uncertainty propagation using a recent state-of-the-art analytic formula for computing the mean and covariance of a deep neural network with Gaussian input. We argue that cross entropy is a more appropriate performance metric than RMSE for evaluating the accuracy of filters and smoothers. We demonstrate the superiority of our method for state estimation on a stochastic Lorenz system and a Wiener system, and find that our method enables more optimal linear quadratic regulation when the state estimate is used for feedback. Code available at https://github.com/simontheflutist/analytic-moments.
comment: To appear at Learning for Decision and Control 2026
Synchrophasors and Synchrowaveforms for the Distribution Grid: The SoCal 28-Bus Dataset
We provide an open-access dataset of phasor & waveform measurement units (PMUs/WMUs) of a real-world electrical distribution network. The network consists of diverse sets of generation resources (including solar panels, fuel cells, natural gas generators, and utility interconnections), loads (including large-scale electric vehicle charging, data centers, central cooling, offices), topology changes (such as line outages and load transfers), as well as a mixture of single- and three-phase networks. We describe a densely deployed PMU sensor network in a distribution grid, in which all buses with non-zero power injections are measured. This approach enables a range of applications such as state estimation, system identification, power flow optimization, and feedback control, several of which are discussed in this paper. Additionally, we provide a synchronized waveform dataset which allows the analysis of harmonics, transient events, dynamic grid impedance, and stability. Data collection started in 2023 while new data is generated continuously and made available online. A characterization of measurement error is provided. Finally, we provide circuit topology and parameters as a part of the dataset. Together, the circuit and timeseries data offer an opportunity for researchers to develop and test algorithms on a real-world system.
A Review of Community-Centric Power System Resilience: Strategies, Data-Driven Methods, and Techno-Legal Perspectives
This paper presents a comprehensive review of community-centric power system resilience, emphasizing the integration of community-level resilience considerations and techno-legal governance frameworks with engineering-based resilience enhancement strategies and data-driven approaches to address extreme events. Recent large-scale outages have demonstrated that power disruptions can cascade beyond electrical infrastructure and disproportionately affect vulnerable communities, critical services, and interconnected urban systems, highlighting the need for resilience approaches that integrate technical, social, and regulatory dimensions. Within this community-centric perspective, the review first summarizes state-of-the-art strategies for enhancing power system resilience, including network hardening, resource allocation, optimal scheduling, and system reconfiguration techniques, while highlighting the growing role of artificial intelligence (AI) and data-driven analytics in supporting resilience planning and operational decision-making. It then examines the interdependencies between power system resilience and community resilience, addressing socioeconomic and behavioral dimensions, cross-infrastructure interconnections, and the emerging role of resilience hubs. The paper further examines the techno-legal frameworks governing resilient energy systems by comparing the regulatory landscapes of the European Union (EU) and the United States, highlighting key similarities and distinctions that shape resilience planning and implementation. By analyzing state-of-the-art engineering-based, AI-driven, and techno-legal methods for assessing and mitigating the impacts of high-impact, low-probability (HILP) events, the review identifies critical research gaps and outlines promising directions for future investigation.
comment: This paper has been accepted for publication in the Electric Power Systems Research (EPSR) journal
A Performance Bound for the Greedy Algorithm in a Generalized Class of String Optimization Problems
We present a simple performance bound for the greedy scheme in string optimization problems that obtains strong results. Our approach vastly generalizes the group of previously established greedy curvature bounds by Conforti and Cornuéjols (1984). We consider three constants, $α_G$, $α_G'$, and $α_G''$ introduced by Conforti and Cornuéjols (1984), that are used in performance bounds of greedy schemes in submodular set optimization. We first generalize both of the $α_G$ and $α_G''$ bounds to string optimization problems in a manner that includes maximizing submodular set functions over matroids as a special case. We then derive a much simpler and computable bound that allows for applications to a far more general class of functions with string domains. We prove that our bound is superior to both the $α_G$ and $α_G''$ bounds and provide a counterexample to show that the $α_G'$ bound is incorrect under the assumptions in Conforti and Cornuéjols (1984). We conclude with two applications. The first is an application of our result to sensor coverage problems. We demonstrate our performance bound in cases where the objective function is set submodular and string submodular. The second is an application to a social welfare maximization problem with black-box utility functions.
comment: This is the accepted version of the paper for IEEE Transactions on Automatic Control
A LiDAR-Driven Fallback Longitudinal Controller for Safer Following in Sudden Braking Scenarios
Adaptive Cruise Control has seen significant advancements, with Collaborative Adaptive Cruise Control leveraging Vehicle-to-Vehicle communication to enhance coordination and stability. However, the reliance on stable communication channels limits its reliability. Research on reducing information dependencies in Adaptive Cruise Control systems has remained limited, despite its critical role in mitigating collision risks during sudden braking scenarios. This study proposes a novel fallback longitudinal controller that relies solely on LiDAR-based distance measurements and the velocity of a follower vehicle. The controller is designed to be time-independent, ensuring operation in the presence of sensor delays or synchronization issues. Simulation results demonstrate that the proposed controller enables vehicle-following from standstill and prevents collisions during emergency braking, even under minimal onboard information.
Realization of Precise Perforating Using Dynamic Threshold and Physical Plausibility Algorithm for Self-Locating Perforating in Oil and Gas Wells
Accurate depth measurement is critical for targeting designated perforation intervals to maximize hydrocarbon recovery. While next-generation automated wireless perforating techniques reduce reliance on costly surface infrastructure and personnel, they lack the continuous depth correlation provided by conventional wireline cables. Consequently, correlating real-time casing collar locator (CCL) signals with a pre-recorded casing tally is essential for automatic depth determination. However, implementing this measurement remains challenging: downhole instruments must process CCL signals in real-time to identify collar signatures from complex interference, a task severely restricted by the limited computational resources and power budget of high-temperature downhole electronics. To address these constraints, this work proposes the Dynamic Threshold and Physical Plausibility Depth Measurement and Perforation Control (DTPPMP) system. This integrated solution enables in situ depth calibration by correlating CCL signals with the casing tally using lightweight algorithms for dynamic-threshold-based collar recognition and physical plausibility verification. Field tests demonstrate a collar recognition F1 score of 98.6% at a throughput of 1000 Sa/s. Notably, the algorithm requires only 1.5 μs per sample, confirming its computational efficiency and suitability for deployment on resource-constrained, high-temperature downhole platforms.
comment: This work has been submitted to the IEEE for possible publication
Automating the Wildfire Detection and Scheduling Pipeline with Maneuverable Earth Observation Satellites
Wildfires are becoming increasingly frequent, with potentially devastating consequences, including loss of life, infrastructure destruction, and severe environmental damage. Low Earth orbit satellites equipped with onboard sensors can capture critical information relative to active wildfires and enable near real-time detection through machine learning algorithms applied to the acquired data. We propose a framework that automates the complete wildfire detection and satellite scheduling pipeline, entitled the WildFire-applicable Intelligent and Responsive Ensemble for Detection and Scheduling (WildFIRE-DS). This paper develops an algorithm to realize the vision of the WildFIRE-DS as a proof of concept, integrating three key components: wildfire detection in satellite imagery, statistical updating that incorporates data from repeated flyovers, and multi-satellite scheduling optimization. The algorithm enables wildfire detection using convolutional neural networks with sensor fusion techniques, incorporates subsequent flyover information via Bayesian statistics, and schedules a constellation of satellites using the state-of-the-art Reconfigurable Earth Observation Satellite Scheduling Problem. Simulated experiments conducted using real-world wildfire locations and the orbits of operational Earth observation satellites demonstrate that this autonomous detection and scheduling approach effectively enhances wildfire monitoring capabilities.
comment: 45 pages, Journal of Aerospace Information Systems (Accepted)
Pricing Short-Circuit Current via a Primal-Dual Formulation for Preserving Integrality Constraints
Synchronous Generators (SGs) currently provide important levels of Short-Circuit Current (SCC), a critical ancillary service that ensures line protections trip during short-circuit faults. Given the ongoing replacement of SGs by power-electronics-based generation, which has a hard limit on current injection, it has become relevant to optimize the procurement of SCC services provided by remaining SGs. Pricing this service is, however, challenging due to the integrality constraints in Unit Commitment (UC). Existing methods, e.g., dispatchable pricing and restricted pricing, attempt to address this issue but exhibit limitations in handling non-convexities, resulting in SCC prices that either fail to cover the operating costs of units or lack interpretability. To overcome these pitfalls, we adopt a primal-dual formulation of the SCC-constrained dispatch that preserves the binary UC for effectively computing shadow prices of SCC services. Using a modified IEEE 30-bus system, the proposed method is compared with the previously developed pricing schemes. It is demonstrated that, under the proposed pricing method, revenue-adequate and explicit service prices can be assigned without the need for uplift payments, an advantage that cannot be achieved by other pricing approaches.
Comprehensive Approach to Directly Addressing Estimation Delays in Stochastic Guidance
In realistic pursuit-evasion scenarios, abrupt target maneuvers generate unavoidable periods of elevated uncertainty that result in estimation delays. Such delays can degrade interception performance to the point of causing a miss. Existing delayed-information guidance laws fail to provide a complete remedy, as they typically assume constant and known delays. Moreover, in practice they are fed by filtered estimates, contrary to these laws' foundational assumptions. We present an overarching strategy for tracking and interception that explicitly accounts for time-varying estimation delays. We first devise a guidance law that incorporates two time-varying delays, thereby generalizing prior deterministic formulations. This law is driven by a particle-based fixed-lag smoother that provides it with appropriately delayed state estimates. Furthermore, using semi-Markov modeling of the target's maneuvers, the delays are estimated in real-time, enabling adaptive adjustment of the guidance inputs during engagement. The resulting framework consistently conjoins estimation, delay modeling, and guidance. Its effectiveness and superior robustness over existing delayed-information guidance laws are demonstrated via an extensive Monte Carlo study.
comment: Submitted for journal publication. 48 pages, 12 figures
Switching-time bioprocess control with pulse-width-modulated optogenetics
Biotechnology can benefit from dynamic control to improve production efficiency. In this context, optogenetics enables modulation of gene expression using light as an external input, allowing fine-tuning of protein levels to unlock dynamic metabolic control and regulation of cell growth. Optogenetic systems can be actuated by light intensity. However, relying solely on intensity-driven control (i.e., signal amplitude) may fail to properly tune optogenetic bioprocesses when the dose-response relationship (i.e., light intensity versus gene-expression strength) is steep. In these cases, tunability is effectively constrained to either fully active or fully repressed gene expression, with little intermediate regulation. Pulse-width modulation can alleviate this issue by alternating between fully ON and OFF light intensity within forcing periods, thereby smoothing the average response and enhancing process controllability. Optimizing pulse-width-modulated optogenetics entails a switching-time optimal control problem with a binary input over multiple forcing periods. While this can be formulated as a mixed-integer optimization problem on a refined control grid with monotonic input constraints, the number of decision variables can grow rapidly with increasing control-grid resolution within forcing periods and with the total number of forcing periods, complicating the task. Here, we propose an alternative solution based on reinforcement learning. We parametrize control actions via the duty cycle, a continuous proxy variable that encodes the ON-to-OFF switching time within each forcing period, thereby respecting the intrinsic binary nature of the light intensity while avoiding fine-grid binary decision variables.
comment: Accepted conference paper: IFAC World Congress 2026
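As a rough illustration of the duty-cycle parametrization described in the abstract above, the sketch below maps one continuous duty cycle per forcing period to a binary light signal. The function names, the period length, and the sampling scheme are assumptions made for illustration and are not taken from the paper.

```python
def light_input(t, period, duty_cycles):
    """Binary light signal (1 = ON, 0 = OFF) at time t.

    duty_cycles[k] in [0, 1] is the fraction of forcing period k during
    which the light is ON, so the ON-to-OFF switch in period k happens at
    k * period + duty_cycles[k] * period. (Hypothetical helper.)
    """
    k = int(t // period)
    if k < 0 or k >= len(duty_cycles):
        return 0  # outside the control horizon: light OFF
    d = min(max(duty_cycles[k], 0.0), 1.0)  # clip to a valid duty cycle
    switch_time = k * period + d * period
    return 1 if t < switch_time else 0


def average_input(period, duty_cycles, samples_per_period=1000):
    """Time-averaged light input in each forcing period (midpoint sampling)."""
    step = period / samples_per_period
    averages = []
    for k in range(len(duty_cycles)):
        on = sum(
            light_input(k * period + (i + 0.5) * step, period, duty_cycles)
            for i in range(samples_per_period)
        )
        averages.append(on / samples_per_period)
    return averages
```

The averaged input per period equals the duty cycle, which is why a continuous action can stand in for the underlying binary switching decision.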
Robotics
A Closed-Form Dual-Barrier CBF Safety Filter for Holonomic Robots on Incrementally Built Occupancy Grid Maps
We present a dual-barrier control barrier function (CBF) safety filter for real-time, safety-critical velocity control of holonomic robots operating in incrementally built occupancy grid maps. As a robot explores an unknown environment, unmapped regions introduce irreducible uncertainty, since obstacle geometry beyond the explored frontier is unknown, making entry into such regions a source of collision risk, especially with front-facing sensors. To address this, we enforce two constraints: avoidance of mapped obstacles and restriction from unexplored regions. Both constraints are derived analytically from the occupancy grid's signed distance field, yielding a closed-form safety filter that requires only a small linear system solve per cycle. On resource-constrained platforms such as the Raspberry Pi, where SLAM and planning already consume significant compute, the low overhead of the proposed filter preserves resources. An adaptive gain schedule relaxes the frontier constraint in information-rich regions and tightens it in well-mapped areas, improving exploration efficiency while maintaining safety. The filter operates in velocity space as a minimally invasive correction and composes with arbitrary nominal controllers, including learning-based methods. Hardware flight experiments on a PX4-controlled quadrotor demonstrate zero collisions across multiple indoor runs.
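To make the idea of a closed-form two-constraint filter concrete, here is a minimal sketch of a generic minimum-norm velocity correction under two linear constraints. It assumes single-integrator dynamics, so each barrier condition takes the form a_i . u + b_i >= 0, where a_i would be a barrier gradient and b_i a gain times the barrier value. These symbols, the function names, and the active-set enumeration are illustrative assumptions, not the authors' implementation.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dual_barrier_filter(u_nom, a1, b1, a2, b2, tol=1e-9):
    """Velocity closest to u_nom that satisfies a_i . u + b_i >= 0, i = 1, 2.

    Assumes a1 and a2 are nonzero. Solves the two-constraint projection by
    enumerating active sets; the worst case is one 2x2 linear solve.
    """
    c1 = dot(a1, u_nom) + b1
    c2 = dot(a2, u_nom) + b2
    # Case 1: the nominal command already satisfies both constraints.
    if c1 >= -tol and c2 >= -tol:
        return list(u_nom)
    # Case 2: exactly one constraint is active.
    for a, c, a_other, b_other in ((a1, c1, a2, b2), (a2, c2, a1, b1)):
        if c < -tol:
            lam = -c / dot(a, a)  # positive multiplier
            u = [x + lam * y for x, y in zip(u_nom, a)]
            if dot(a_other, u) + b_other >= -tol:
                return u
    # Case 3: both constraints active; solve the 2x2 Gram system G lam = -c.
    g11, g12, g22 = dot(a1, a1), dot(a1, a2), dot(a2, a2)
    det = g11 * g22 - g12 * g12
    if abs(det) < tol:
        raise ValueError("constraint gradients are (nearly) parallel")
    lam1 = (-c1 * g22 + c2 * g12) / det
    lam2 = (-c2 * g11 + c1 * g12) / det
    return [x + lam1 * y + lam2 * z for x, y, z in zip(u_nom, a1, a2)]
```

Because the correction is zero whenever the nominal command is already admissible, a filter of this shape can wrap any upstream controller without changing its behavior in safe regions.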
When Life Gives You BC, Make Q-functions: Extracting Q-values from Behavior Cloning for On-Robot Reinforcement Learning
Behavior Cloning (BC) has emerged as a highly effective paradigm for robot learning. However, BC lacks a self-guided mechanism for online improvement after demonstrations have been collected. Existing offline-to-online learning methods often cause policies to replace previously learned good actions due to a distribution mismatch between offline data and online learning. In this work, we propose Q2RL, Q-Estimation and Q-Gating from BC for Reinforcement Learning, an algorithm for efficient offline-to-online learning. Our method consists of two parts: (1) Q-Estimation extracts a Q-function from a BC policy using a few interaction steps with the environment, followed by online RL with (2) Q-Gating, which switches between BC and RL policy actions based on their respective Q-values to collect samples for RL policy training. Across manipulation tasks from D4RL and robomimic benchmarks, Q2RL outperforms SOTA offline-to-online learning baselines on success rate and time to convergence. Q2RL is efficient enough to be applied in an on-robot RL setting, learning robust policies for contact-rich and high precision manipulation tasks such as pipe assembly and kitting, in 1-2 hours of online interaction, achieving success rates of up to 100% and up to 3.75x improvement against the original BC policy. Code and video are available at https://pages.rai-inst.com/q2rl_website/
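The gating step described above can be sketched as a simple comparison of Q-values. The abstract does not spell out the exact rule, so the comparison below (take the RL action only when its estimated Q-value is strictly higher) and all function names are assumptions for illustration.

```python
def q_gate(state, bc_policy, rl_policy, q_bc, q_rl):
    """Pick the action to execute from a BC policy and an RL policy.

    bc_policy / rl_policy map a state to an action; q_bc / q_rl map
    (state, action) to a scalar value estimate. All four callables are
    hypothetical stand-ins for learned models.
    """
    a_bc = bc_policy(state)
    a_rl = rl_policy(state)
    # Assumed rule: prefer the RL action only if it is estimated to be better.
    if q_rl(state, a_rl) > q_bc(state, a_bc):
        return a_rl, "rl"
    return a_bc, "bc"
```

Defaulting to the BC action on ties keeps early data collection close to the demonstrated behavior, which is one plausible way to avoid overwriting good actions.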
ConsisVLA-4D: Advancing Spatiotemporal Consistency in Efficient 3D-Perception and 4D-Reasoning for Robotic Manipulation CVPR 2026
Current Vision-Language-Action (VLA) models primarily focus on mapping 2D observations to actions, but exhibit notable limitations in spatiotemporal perception and reasoning: 1) spatial representations often rely on additional sensors, introducing substantial computational overhead; 2) visual reasoning is typically limited to future-frame prediction, lacking alignment with the instruction-grounded scene and thus compromising spatiotemporal consistency. To address these challenges, we propose ConsisVLA-4D, a unified and efficient framework that enhances spatiotemporal consistency in 3D perception and 4D reasoning. Specifically, we design: 1) CV-Aligner, which ensures cross-view object semantic consistency by filtering instruction-relevant regions and aligning object identities across multiple viewpoints; 2) CO-Fuser, which guarantees cross-object spatial geometric consistency by eliminating spatial relation ambiguities between objects across views using compact latent representations. Building upon these, we introduce 3) CS-Thinker to achieve cross-scene spatiotemporal consistency as actions unfold. It learns implicit knowledge of local dynamics from object-semantic tokens of CV-Aligner and global depth from geometric tokens of CO-Fuser, thereby enhancing efficient visual reasoning under scene variations. Extensive experiments demonstrate that, benefiting from its efficient spatiotemporal consistency design, ConsisVLA-4D achieves 21.6% and 41.5% performance improvements, along with 2.3-fold and 2.4-fold inference speedups compared to OpenVLA on the LIBERO benchmark and real-world platforms, respectively. ConsisVLA-4D is open-sourced and publicly available.
comment: Accepted to CVPR 2026, Project Page: https://github.com/iLearn-Lab/CVPR26-ConsisVLA-4D
LineRides: Line-Guided Reinforcement Learning for Bicycle Robot Stunts
Designing reward functions for agile robotic maneuvers in reinforcement learning remains difficult, and demonstration-based approaches often require reference motions that are unavailable for novel platforms or extreme stunts. We present LineRides, a line-guided learning framework that enables a custom bicycle robot to acquire diverse, commandable stunt behaviors from a user-provided spatial guideline and sparse key-orientations, without demonstrations or explicit timing. LineRides handles physically infeasible guidelines using a tracking margin that permits controlled deviation, resolves temporal ambiguity by measuring progress via traveled distance along the guideline, and disambiguates motion details through position- and sequence-based key-orientations. We evaluate LineRides on the Ultra Mobility Vehicle (UMV) and show that the policy trained with our methods supports seamless transitions between normal driving and stunt execution, enabling five distinct stunts on command: MiniHop, LargeHop, ThreePointTurn, Backflip, and DriftTurn.
comment: Published in IEEE Robotics and Automation Letters (RA-L), 2026
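The notion of measuring progress by traveled distance along a guideline, with a margin that tolerates deviation, can be illustrated with a polyline projection. This is a sketch under assumed conventions (the guideline as a list of waypoints, progress as arc length to the closest point); the function name and the margin check are hypothetical and not taken from the paper.

```python
import math


def progress_along_guideline(point, guideline):
    """Return (arc_length, distance) of the guideline point closest to `point`.

    `guideline` is a list of waypoints (tuples of equal dimension) forming
    a polyline; arc_length is measured from the first waypoint.
    """
    best_dist, best_s = math.inf, 0.0
    s_start = 0.0  # arc length at the start of the current segment
    for p, q in zip(guideline, guideline[1:]):
        seg = [b - a for a, b in zip(p, q)]
        seg_len2 = sum(c * c for c in seg)
        if seg_len2 == 0.0:
            continue  # skip duplicate waypoints
        seg_len = math.sqrt(seg_len2)
        # Projection parameter of `point` onto the segment, clamped to [0, 1].
        t = sum((x - a) * c for x, a, c in zip(point, p, seg)) / seg_len2
        t = min(max(t, 0.0), 1.0)
        closest = [a + t * c for a, c in zip(p, seg)]
        d = math.dist(point, closest)
        if d < best_dist:
            best_dist, best_s = d, s_start + t * seg_len
        s_start += seg_len
    return best_s, best_dist


def within_margin(point, guideline, margin):
    """True if `point` deviates from the guideline by at most `margin`."""
    return progress_along_guideline(point, guideline)[1] <= margin
```

Indexing progress by arc length rather than by time means the reward does not prescribe when the robot should reach each part of the guideline.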
Driver-WM: A Driver-Centric Traffic-Conditioned Latent World Model for In-Cabin Dynamics Rollout
Safe L2/L3 driving automation requires anticipating human-in-the-loop reactions during shared-control transitions. While most driving world models forecast the external environment, in-cabin intelligence remains strictly recognition-oriented and lacks multi-step rollout capabilities for driver dynamics. We introduce Driver-WM, a driver-centric latent world model that rolls out in-cabin dynamics causally conditioned on out-cabin traffic context. This formulation unifies physical kinematics forecasting with auxiliary behavioral and emotional semantic recognition. Operating in a compact latent space constructed from frozen vision-language features, Driver-WM adopts a dual-stream architecture to separately encode external traffic and internal driver states. These streams are directionally coupled via a gated causal injection mechanism, which uses a learned vector gate to modulate external contextual perturbations while strictly enforcing temporal causality. Evaluations on a multi-task assistive driving benchmark demonstrate that Driver-WM yields robust long-horizon geometric forecasting for reactive high-motion maneuvers and improves semantic alignment for both driver and traffic states. Finally, the explicit external-to-internal conditioning allows for controlled test-time interventions to systematically analyze mechanism responses.
Reduced-order Neural Modeling with Differentiable Simulation for High-Detail Tactile Perception
Tactile perception is key to dexterous manipulation, yet simulating high-resolution elastomer deformation remains computationally prohibitive. Finite element methods (FEM) deliver high fidelity but demand costly remeshing, while Material Point Methods (MPM) suffer from heavy particle-memory tradeoffs. We propose a reduced-order neural simulation framework that couples coarse-grained MPM dynamics with an implicit neural decoder to reconstruct sub-particle tactile details from compact latent states. The framework learns a continuous deformation manifold from paired high- and low-resolution simulations, enabling physically consistent, differentiable inference. Compared to TacIPC, our method achieves over 65% faster simulation and 40% lower memory usage, while maintaining better geometric fidelity. In tactile rendering and 3D surface reconstruction, our method further improves accuracy by 25% and produces realistic depth images and surface meshes at a faster inference speed. These results demonstrate that the proposed reduced-order neural model enables high-detail, physically grounded tactile simulation with substantial efficiency gains for robotic interaction and optimization.
comment: IEEE RoboSoft 2026
Position: Embodied AI Requires a Privacy-Utility Trade-off ICML 2026
Embodied AI (EAI) systems are rapidly transitioning from simulations into real-world domestic and other sensitive environments. However, recent EAI solutions have largely demonstrated advancements within isolated stages such as instruction, perception, planning and interaction, without considering their coupled privacy implications in high-frequency deployments where privacy leakage is often irreversible. This position paper argues that optimizing these components independently creates a systemic privacy crisis when deployed in sensitive settings, thereby advancing the position that privacy in EAI is a life cycle-level architectural constraint rather than a stage-local feature. To address these challenges, we propose Secure Privacy Integration in Next-generation Embodied AI (SPINE), a unified privacy-aware framework that treats privacy as a dynamic control signal governing cross-stage coupling throughout the entire EAI life cycle. SPINE decomposes the EAI pipeline into various stages and establishes a multi-criterion privacy classification matrix to orchestrate contextual sensitivity across stage boundaries. We conduct preliminary simulation and real-world case studies to conceptually validate how privacy constraints propagate downstream to reshape system behavior, illustrating the insufficiency of fragmented privacy patches and motivating future research directions into secure yet functional embodied AI systems. We detail the SPINE framework and case studies at https://github.com/rminshen03/EAI_Privacy_Position.
comment: Accepted at ICML 2026. 10 pages, 3 figures
Modular Reinforcement Learning For Cooperative Swarms
A cooperative robot swarm is a collective of computationally-limited robots that share a common goal. Each robot can only interact with a small subset of its peers, without knowing how this affects the collective utility. Recent advances in distributed multi-agent reinforcement learning have demonstrated that it is possible for robots to learn how to interact effectively with others, in a manner that is aligned with the common goal, despite each robot learning independently of others. However, this requires each robot to represent a potentially combinatorial number of interaction states, challenging the memory capabilities of the robots. This paper proposes an alternative approach for representing spatial interaction states for multi-robot reinforcement learning in swarms. A modular (decomposed) representation is used, where each feature of the state is handled by a separate learning procedure, and the results are aggregated. We demonstrate the efficacy of the approach in numerous experiments with simulated robot swarms carrying out foraging.
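A minimal sketch of a modular (decomposed) value representation is given below: one small Q-table per state feature, with action values aggregated across tables. The abstract does not specify the learning rule or the aggregation, so tabular Q-learning per module and aggregation by summation are assumptions, as are the class and method names.

```python
from collections import defaultdict


class ModularQLearner:
    """One Q-table per state feature; action values are summed across tables."""

    def __init__(self, n_features, actions, alpha=0.1, gamma=0.9):
        self.actions = list(actions)
        self.alpha = alpha
        self.gamma = gamma
        # Memory grows with the sum of the feature ranges, not their product.
        self.tables = [defaultdict(float) for _ in range(n_features)]

    def aggregate(self, features, action):
        """Aggregated value of `action` (assumed aggregation: plain sum)."""
        return sum(t[(f, action)] for t, f in zip(self.tables, features))

    def best_action(self, features):
        return max(self.actions, key=lambda a: self.aggregate(features, a))

    def update(self, features, action, reward, next_features):
        """Independent Q-learning update in every module."""
        for table, f, nf in zip(self.tables, features, next_features):
            target = reward + self.gamma * max(
                table[(nf, a)] for a in self.actions
            )
            table[(f, action)] += self.alpha * (target - table[(f, action)])
```

With k features of n values each, the tables hold on the order of k * n entries per action instead of n ** k, which is the memory saving the decomposition is after.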
Koopman Identification of Nonlinear Systems via Reservoir Liftings
Learning tractable linear representations of nonlinear dynamical systems via Koopman operator theory is often hindered by dictionary selection, temporal memory encoding, and numerical ill-conditioning. Inspired by the Reservoir Computing (RC) paradigm, this paper introduces the RC-Koopman framework, which interprets the reservoir as a stateful, finite-dimensional Koopman dictionary whose temporal depth is explicitly controlled by its spectral radius. We show that the Echo State Property (ESP) guarantees well-posedness and favorable numerical conditioning of the lifted Koopman approximation. A correlation-based spectral radius selection algorithm aligns reservoir memory with dominant system timescales. Analysis reveals how the finite memory of the reservoir determines which Koopman eigenfunctions remain observable from the lifted features. Evaluation on synthetic benchmarks demonstrates that RC-Koopman achieves a favorable balance between reconstruction accuracy of the underlying nonlinear dynamics and dynamical stability, compared to Extended Dynamic Mode Decomposition (EDMD) and Hankel-based lifting approaches. Code available at: https://github.com/NEAR-the-future/RC-Koopman.git
Optimal Uncertainty-Aware Calibration for the AX=YB Problem
This article proposes a general optimization framework for solving the hand-eye calibration problem. Unlike traditional methods, an iterative algorithm based on Lie algebra that achieves approximately globally optimal solutions is developed. During the optimization process, the method strictly preserves the structural constraints of the calibration parameters and enables synchronized updates between calibration parameters. Data used in real-world hand-eye calibration often contain uncertainty, especially in overloading and large-workspace industrial robot scenarios, which can significantly degrade accuracy. Because accurately modeling such uncertainty is inherently difficult, this article avoids explicit uncertainty modeling. Instead, an uncertainty metric to evaluate the relative uncertainty between data sources is introduced and used to dynamically refine the iterative process. To further enhance convergence efficiency, an effective initial solution generation method that improves overall stability and accuracy is designed. Numerical simulations and real-world experiments validate the effectiveness of the proposed approach; on synthetic datasets, it improves estimation accuracy by at least 67% under high-uncertainty conditions compared with existing methods.
comment: 23 pages, 26 figures, under review in IJRR
Dr-PoGO: Direct Radar Pose-Graph Optimization ICRA 2026
This paper introduces Dr-PoGO, a method for Simultaneous Localization And Mapping (SLAM) using a 2D spinning radar. Unlike cameras or lidars that require line-of-sight, millimetre-wave radars can 'see' through dust, falling snow, rain, etc. Accordingly, radar is a great modality for robust perception regardless of the weather conditions. While most existing radar-based SLAM methods rely on the extraction of point clouds or features to perform ego-motion estimation, Dr-PoGO leverages direct registration techniques for odometry (DRO) and loop-closure registration. An off-the-shelf radar-focused place recognition algorithm, RaPlace, provides loop-closure candidates. As RaPlace does not provide relative transformations, Dr-PoGO introduces a coarse-to-fine registration that uses visual features and descriptors to obtain an initial guess for the direct transformation refinement. The global trajectory is optimized in a pose-graph optimization. Dr-PoGO demonstrates state-of-the-art performance over 300 km of data in various real-world automotive environments. Our implementation is publicly available: https://github.com/utiasASRL/dr_pogo.
comment: Accepted for presentation at ICRA 2026 Cite as @inproceedings{legentil2026drpogo, title={Dr-PoGO: Direct Radar Pose-Graph Optimization}, author={{Le Gentil}, Cedric and Weican, Li and Brizi, Leonardo and Barfoot, Timothy D.}, booktitle={IEEE International Conference on Robotics and Automation (ICRA)}, year={2026} }
Gaze4HRI: Zero-shot Benchmarking Gaze Estimation Neural-Networks for Human-Robot Interaction
While zero-shot appearance-based 3D gaze estimation offers significant cost-efficiency by directly mapping RGB images to gaze vectors, its reliability in Human-Robot Interaction (HRI) settings remains uncertain. Existing benchmarks frequently overlook fundamental HRI conditions, such as dynamic camera viewpoints and moving targets in video. Furthermore, current cross-dataset evaluations often suffer from a complexity gap, where methods trained on diverse datasets are tested on significantly smaller and less varied sets, failing to assess true robustness. To bridge these gaps, we introduce Gaze4HRI, a large-scale dataset (50+ subjects, 3,000+ videos, 600,000+ frames) designed to evaluate state-of-the-art performance against critical HRI variables: illumination, head-gaze conflict, as well as the motion of camera and gaze target in video. Our benchmark reveals that all evaluated methods fail in at least one condition, identifying steeply-downward gaze as a universal failure point. Notably, PureGaze trained on the ETH-X-Gaze dataset uniquely maintains resilience across all other conditions. These results challenge the recent focus in the literature on complex spatial-temporal modeling and Transformer-based architectures. Instead, our findings suggest that extensive data diversity, as exemplified by the ETH-X-Gaze dataset, serves as the primary driver of zero-shot robustness in unconstrained environments, while resilience-enhancing frameworks, such as PureGaze's self-adversarial loss for gaze feature purification, provide a substantial further improvement. Ultimately, this study establishes a rigorous benchmark that provides practical guidelines for practitioners and helps reshape future research. The dataset and code are available at https://gazeforhri.github.io.
comment: Accepted to the 2026 IEEE International Conference on Automatic Face and Gesture Recognition (FG 2026)
3D Printing of Passively Actuated Self-Folding Robots with Integrated Functional Modules ICRA 2026
We introduce an elastic-driven self-folding approach that fabricates robots directly from flat 3D-printed conductive PLA nets. Elastic bands routed through printed hooks store energy that folds the sheet into programmed 3D geometries, while the flat state allows accurate placement of electronics and magnets before deployment. The same substrate doubles as electrodes for capacitive touch and supports a reusable platform I/O palette with Hall sensors and eccentric rotating mass (ERM) motors for docking detection and vibration actuation. We also derive a closed-form folding model that balances hinge stiffness with elastic band moment to predict equilibrium fold angles; experiments validate the model and yield a design map linking hinge thickness, band size, and hook spacing to target angles. Using this workflow we realize multiple polyhedral modules and demonstrate three applications: a cube that highlights the potential of self-folding for scalable modular robot collectives, a deployable gripper, and a tendon-driven finger. The method is low cost, stimulus-free, and integrates actuation and sensing.
comment: 8 pages, 10 figures. Accepted to ICRA 2026
ELVIS: Ensemble-Calibrated Latent Imagination for Long-Horizon Visual MPC
A central challenge of visual control with model-based reinforcement learning (RL) is reliable long-horizon planning: long rollouts with learned latent dynamics exhibit branching futures and multi-modal action-value distributions. In addition, compounding model errors amplified by visual occlusions make deep imagination brittle. We present ELVIS, a latent model predictive controller (MPC) designed to make long-horizon planning practical. ELVIS plans in a Dreamer-style recurrent state space model (RSSM) and replaces standard unimodal model predictive path integral (MPPI) with a Gaussian-mixture MPPI that maintains multiple coherent hypotheses over long horizons, avoiding mode averaging under branching rollouts. In parallel, ELVIS stabilizes deep imagination with a shared uncertainty-aware lambda-return: an ensemble of latent critics defines an upper-confidence-bound (UCB) score that gates a time-varying lambda, adaptively trading off bootstrapping versus look-ahead to limit compounding error during planning. The same return is used both to train an actor-critic prior from imagined rollouts and to score candidate trajectories inside GMM-MPPI, aligning RL objectives with the planner's long-horizon optimization. On fourteen DeepMind Control Suite visual tasks, ELVIS establishes state-of-the-art performance compared with TD-MPC2 and DreamerV3. Finally, ELVIS transfers zero-shot to a real-world sand-spraying task with severe occlusions, improving surface-quality metrics and demonstrating robustness beyond simulation.
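The two ingredients named above, an ensemble-based upper-confidence-bound score and a lambda-return with a time-varying lambda, can be sketched as follows. The recursion is the standard lambda-return; how the UCB score is mapped to lambda is not specified in the abstract, so the per-step lambdas are simply passed in. The function names and the bonus weight `k` are assumptions.

```python
import statistics


def ucb_score(critic_values, k=1.0):
    """Ensemble mean plus k standard deviations (a common UCB form)."""
    return statistics.mean(critic_values) + k * statistics.pstdev(critic_values)


def lambda_return(rewards, values, lambdas, gamma=0.99):
    """Lambda-returns with a per-step lambda.

    rewards[t] and lambdas[t] are given for t = 0..H-1, and values[t] for
    t = 0..H, where values[H] bootstraps the tail of the rollout.
    Recursion: G_t = r_t + gamma * ((1 - lam_t) * V_{t+1} + lam_t * G_{t+1}).
    """
    horizon = len(rewards)
    if len(values) != horizon + 1 or len(lambdas) != horizon:
        raise ValueError("expected len(values) == H + 1 and len(lambdas) == H")
    g = values[horizon]
    returns = [0.0] * horizon
    for t in reversed(range(horizon)):
        g = rewards[t] + gamma * (
            (1.0 - lambdas[t]) * values[t + 1] + lambdas[t] * g
        )
        returns[t] = g
    return returns
```

Setting lambda near 0 at a step leans on the critic's one-step bootstrap, while lambda near 1 trusts the imagined rollout further ahead, which is the trade-off a gate on model uncertainty would control.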
From Pixels to Tokens: A Systematic Study of Latent Action Supervision for Vision-Language-Action Models
Latent actions serve as an intermediate representation that enables consistent modeling of vision-language-action (VLA) models across heterogeneous datasets. However, approaches to supervising VLAs with latent actions are fragmented and lack a systematic comparison. This work structures the study of latent action supervision from two perspectives: (i) regularizing the trajectory via image-based latent actions, and (ii) unifying the target space with action-based latent actions. Under a unified VLA baseline, we instantiate and compare four representative integration strategies. Our results reveal a formulation-task correspondence: image-based latent actions benefit long-horizon reasoning and scene-level generalization, whereas action-based latent actions excel at complex motor coordination. Furthermore, we find that directly supervising the VLM with discrete latent action tokens yields the most effective performance. Finally, our experiments offer initial insights into the benefits of latent action supervision in mixed-data, suggesting a promising direction for VLA training. Code is available at https://github.com/RUCKBReasoning/From_Pixels_to_Tokens.
AI-Aided Advancements in Autonomous Underwater Vehicle Navigation
Autonomous underwater vehicles (AUVs) have become indispensable for deep-sea exploration, spanning critical scientific research and commercial applications. The rapid attenuation of electromagnetic waves renders satellite radio signals unavailable, while the dynamic unpredictability of the marine environment presents formidable navigation challenges. This chapter explores recent advancements in AI-aided AUV positioning, specifically focusing on advanced sensor fusion architectures that integrate inertial navigation systems with Doppler velocity logs and cameras. Beyond traditional model-based filtering, we examine the transformative emergence of AI-driven learning approaches in enhancing inertial dead-reckoning tasks and adaptive fusion algorithms. By addressing these recent milestones, this chapter provides a comprehensive roadmap for achieving the high-precision navigation essential for autonomous underwater missions.
From Reach to Insert: Tactile-Augmented Precision Assembly under Sub-Millimeter Tolerances
High-precision assembly frequently involves tight-tolerance insertions, where even slight pose errors can cause jamming or excessive interaction forces, making robust and safe insertion policies difficult to obtain. This paper proposes a tactile-augmented two-stage method that combines Imitation Learning (IL) and Reinforcement Learning (RL) for precision insertion tasks. In the first stage, IL learns a reaching policy with position generalization that grasps the peg and brings it to the vicinity of the target region. In the second stage, RL executes the insertion and enables recovery from failures during contact-rich interactions. To better exploit tactile feedback, we introduce tactile group sampling to increase coverage of critical contact segments during training, and design a tactile critic to more accurately evaluate policy values, improving insertion performance while maintaining low contact forces. We conduct systematic experiments across five hole geometries and three clearance settings. Results show that our method substantially improves insertion performance across all settings; under the most challenging 0.05 mm clearance, it achieves a 67% success rate while keeping contact forces low, reducing the maximum interaction force by 60% and torque by 44%, thereby validating both effectiveness and safety for precision assembly.
comment: 8 pages, 9 figures
ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving
We introduce ReflectDrive-2, a masked discrete diffusion planner with a separate action expert for autonomous driving that represents plans as discrete trajectory tokens and generates them through parallel masked decoding. This discrete token space enables in-place trajectory revision: AutoEdit rewrites selected tokens using the same model, without requiring an auxiliary refinement network. To train this capability, we use a two-stage procedure. First, we construct structure-aware perturbations of expert trajectories along longitudinal progress and lateral heading directions and supervise the model to recover the original expert trajectory. We then fine-tune the full decision-draft-reflect rollout with reinforcement learning (RL), assigning terminal driving reward to the final post-edit trajectory and propagating policy-gradient credit through full-rollout transitions. Full-rollout RL proves crucial for coupling drafting and editing: under supervised training alone, inference-time AutoEdit improves PDMS by at most 0.3, whereas RL increases its gain to 1.9. We also co-design an efficient reflective decoding stack for the decision-draft-reflect pipeline, combining shared-prefix KV reuse, Alternating Step Decode, and fused on-device unmasking. On NAVSIM, ReflectDrive-2 achieves 91.0 PDMS with camera-only input and 94.8 PDMS in a best-of-6 oracle setting, while running at 31.8 ms average latency on NVIDIA Thor.
Active Contact Sensing for Robust Robot-to-Human Object Handover
Robot-to-human object handover is an essential skill for robot assistants, from serving drinks at home to passing surgical tools in the operating room. We expect robots to perform handover robustly -- to release the object only after a firm human grasp while ignoring incidental touches. Existing passive-sensing methods struggle to generalize across diverse objects and human behaviors, as they lack informative perturbations to disambiguate different contact conditions, such as firm grasp versus incidental touch. We propose an active sensing approach for robust handovers: the robot applies information-gathering motions and senses the resulting human-applied forces to infer the contact state. A firm grasp produces forces in multiple directions, while an accidental touch does not. To capture this distinction, we model the contact state with a Bayesian linear model: a distribution over piecewise-linear mappings from robot motions to human-applied forces. This model enables firm grasp detection and active information gathering. In experiments with 12 participants and 30 diverse rigid objects, our method achieved a 97.5% success rate -- over 30% higher than two common baselines.
Right Model, Right Time: Real-Time Cascaded-Fidelity MPC for Bipedal Walking ICRA 2026
This paper presents a multi-phase whole-body model predictive control approach for bipedal walking, combining a detailed whole-body model in the near horizon with a simplified single-rigid-body model in the later prediction steps. This reduces computational complexity while retaining prediction capabilities. The resulting nonlinear optimal control problem is solved using sequential quadratic programming (SQP) in acados. Using a pre-specified contact schedule and a target walking speed, the controller optimizes joint torques without depending on preselected footstep locations. The controller is validated in MuJoCo simulation on the 18-DoF bipedal robot HyPer-2.
comment: Accepted to IEEE ICRA 2026 Workshop "2nd Workshop on Frontiers of Optimization for Robotics"
Dream-MPC: Gradient-Based Model Predictive Control with Latent Imagination
State-of-the-art model-based Reinforcement Learning (RL) approaches either use gradient-free, population-based methods for planning, learned policy networks, or a combination of policy networks and planning. Hybrid approaches that combine Model Predictive Control (MPC) with a learned model and a policy prior to leverage the advantages of both paradigms have shown promising results. However, these approaches typically rely on gradient-free optimization methods, which can be computationally expensive for high-dimensional control tasks. While gradient-based methods are a promising alternative, recent works have empirically shown that gradient-based methods often perform worse than their gradient-free counterparts. We propose Dream-MPC, a novel approach that generates few candidate trajectories from a rolled-out policy and optimizes each trajectory by gradient ascent using a learned world model, uncertainty regularization and amortization of optimization iterations over time by reusing previously optimized actions. Our results on 24 continuous control tasks show that Dream-MPC can significantly improve the performance of the underlying policy and can outperform gradient-free MPC and state-of-the-art baselines. We will open source our code and more at https://dream-mpc.github.io.
Practical validation of synthetic pre-crash scenarios
The representativeness of synthetic pre-crash scenarios is crucial for assessing the safety impact of Driving Automation Systems through virtual simulations. However, a gap remains in the robust evaluation of synthetic pre-crash scenarios' practical equivalence to their real-world counterparts; that is, whether they are similar enough for the intended assessment purpose. Conventional significance testing is inadequate, as it focuses on detecting differences rather than establishing practical equivalence. This study addresses the research gap by extending our previous work on a Bayesian Region of Practical Equivalence (ROPE)-based equivalence testing framework by introducing a binning-based approach to define appropriate statistics and equivalence criteria. Two binning-based statistics are proposed to measure practically meaningful distributional differences between datasets in the context of safety impact assessment. The framework's applicability is demonstrated through a case study, which tests the practical equivalence of two synthetic rear-end pre-crash datasets with a previously developed reference dataset in the context of the safety impact assessment of an Automatic Emergency Braking system. The results show that the framework provides informative quantitative assessments of practical equivalence as well as diagnostic insights into the divergence of datasets. Although the demonstration focuses on rear-end pre-crash scenarios, the framework is generic and extensible to broader validation contexts, providing an interpretable and principled basis for practical equivalence assessment across diverse synthetic data applications.
HDFlow: Hierarchical Diffusion-Flow Planning for Long-horizon Tasks ICML 2026
Recent advances in generative models have shown promise in generating behavior plans for long-horizon, sparse reward tasks. While these approaches have achieved promising results, they often lack a principled framework for hierarchical decomposition and struggle with the computational demands of real-time execution, due to their iterative denoising process. In this work, we introduce Hierarchical Diffusion-Flow (HDFlow), a novel hierarchical planning framework that optimally leverages the strengths of diffusion and rectified flow models to overcome the limitations of single-paradigm generative planners. HDFlow employs a high-level diffusion planner to generate sequences of strategic subgoals in a learned latent space, capitalizing on diffusion's powerful exploratory capabilities. These subgoals then guide a low-level rectified flow planner that generates smooth and dense trajectories, exploiting the speed and efficiency of ordinary differential equation (ODE)-based trajectory generation. We evaluate HDFlow on four challenging furniture assembly tasks in both simulation and the real world, where it significantly outperforms state-of-the-art methods. Furthermore, we also showcase our method's generalizability on two long-horizon benchmarks comprising diverse locomotion and manipulation tasks. Project website: https://hdflow-page.github.io/
comment: ICML 2026 (Spotlight)
Tightly-Coupled Estimation and Guidance for Robust Low-Thrust Rendezvous via Adaptive Homotopy
Minimum-fuel low-thrust rendezvous guidance yields bang-bang control structures highly sensitive to estimation errors, sensor anomalies, and solver regularization, making aggressive closed-loop execution brittle for uncooperative proximity operations. This paper proposes a tightly-coupled estimation and guidance architecture where navigation confidence directly modulates the homotopy parameter of a receding-horizon indirect optimal control solver. Relative motion is modeled in the Clohessy-Wiltshire frame. The translational state is estimated via a linear Kalman filter augmented by a Multiple Tuning Factors (MTF) covariance inflation mechanism that suppresses suspicious innovation directions. A composite score from the normalized innovation and MTF activity is mapped online to the homotopy parameter, allowing the controller to relax toward a smoother, conservative regime when confidence degrades, and recover fuel-efficient bang-bang control as sensing improves. Numerical results under severe measurement degradation show fixed bang-bang guidance remains brittle; both plain-KF and MTF-KF fixed-epsilon controllers yield large terminal miss distances. Conversely, the proposed MTF-adaptive homotopy controller reduces terminal miss by roughly two orders of magnitude, from hundreds of meters to sub-meter levels, requiring only a moderate increase in control effort versus the open-loop fuel-optimal benchmark. A comparison indicates adaptive homotopy is the dominant robustness mechanism, while MTF provides additional accuracy and efficiency improvements. The receding-horizon implementation exhibits consistently fast and reliable solution times, supporting the practical online viability of the proposed method.
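The coupling between navigation confidence and the homotopy parameter can be sketched as follows. The abstract does not give the composite score or the mapping, so the weights, thresholds, and function names here are hypothetical; the sketch also assumes that MTF activity is already scaled to [0, 1] and that a larger epsilon corresponds to the smoother, more conservative regime.

```python
import numpy as np

def normalized_innovation(innovation, S):
    """Normalized innovation squared, divided by the measurement dimension."""
    nu = np.asarray(innovation, dtype=float)
    return float(nu @ np.linalg.solve(np.asarray(S, dtype=float), nu)) / nu.size

def homotopy_from_confidence(nis, mtf_activity, eps_min=1e-3, eps_max=1.0,
                             nis_ok=1.0, nis_bad=9.0, w=0.5):
    """Map a composite confidence score to a homotopy parameter (hypothetical mapping).

    Low score (good sensing) returns eps_min, close to bang-bang control;
    high score (degraded sensing) returns eps_max, the smoothed regime.
    """
    nis_term = np.clip((nis - nis_ok) / (nis_bad - nis_ok), 0.0, 1.0)
    score = np.clip((1.0 - w) * nis_term + w * mtf_activity, 0.0, 1.0)
    return float(eps_min + (eps_max - eps_min) * score)

nis = normalized_innovation([1.0, 1.0], np.eye(2))
eps_nominal = homotopy_from_confidence(nis, mtf_activity=0.0)
eps_degraded = homotopy_from_confidence(20.0, mtf_activity=1.0)
```

In a receding-horizon loop, the returned value would be passed to the indirect solver at each replanning step, so the control profile sharpens again once the innovations return to their nominal range.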
CRAFT: Counterfactual-to-Interactive Reinforcement Fine-Tuning for Driving Policies
Open-loop imitation learning has advanced modern autonomous driving policy architectures, but closed-loop deployment remains vulnerable to policy-induced distribution shift. Existing post-training paradigms exhibit fundamental trade-offs: closed-loop RL fine-tuning provides grounded feedback from executed actions but is constrained by the sparsity of informative events, whereas counterfactual fine-tuning provides dense supervision over candidate futures but inherits bias from imperfect future estimates. We introduce Counterfactual-to-Interactive Reinforcement Fine-Tuning (CRAFT), an on-policy framework that formulates closed-loop post-training as proxy-residual optimization. CRAFT uses group-normalized counterfactual advantages as a dense proxy for real closed-loop advantages and aligns this proxy with the closed-loop world through grounded residual correction from interaction-critical events. To stabilize adaptation, CRAFT regularizes the online policy toward an EMA teacher via asymmetric KL self-distillation. Theoretically, CRAFT decomposes the real closed-loop policy gradient into proxy and residual terms under the same visited-state distribution, reducing residual variance with an aligned proxy while mitigating proxy bias through grounded residual approximation. Empirically, CRAFT achieves the strongest closed-loop gains on Bench2Drive across hierarchical planning, vision-language-action, and vocabulary-scoring architectures. Ablations, scaling behavior, stability analyses, and transfer results further validate the complementary roles of dense counterfactual proxy and grounded residual correction. Project page: https://currychen77.github.io/CRAFT.
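As an illustration of group normalization, the sketch below standardizes the scores of several candidate futures evaluated from the same state. It assumes the common mean-and-standard-deviation form of group normalization; the abstract does not specify CRAFT's exact formula, and the function name and scores are hypothetical.

```python
import numpy as np

def group_normalized_advantages(scores, eps=1e-8):
    """Standardize candidate scores within one group (assumed normalization form)."""
    scores = np.asarray(scores, dtype=float)
    # Subtract the group mean and divide by the group standard deviation;
    # eps avoids division by zero when every candidate scores the same.
    return (scores - scores.mean()) / (scores.std() + eps)

# Four counterfactual candidates rolled out from the same driving state.
adv = group_normalized_advantages([0.2, 0.5, 0.9, 0.4])
flat = group_normalized_advantages([0.3, 0.3, 0.3])
```

Candidates that score above the group mean receive positive advantages, and a group with identical scores contributes no gradient signal.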
Autonomous Laparoscope Control through Unified Mechanics-Based Representation of Multimodal Intraoperative Information
Laparoscope-holding robots can provide surgeons with a stable laparoscopic field of view (FOV) and reduce the burden on human assistants. To maintain an ideal intraoperative FOV, the robot must continuously adjust the laparoscope pose according to intraoperative information. However, intraoperative multimodal signals, such as position, force/torque, and images, differ markedly in physical meaning and units, making it difficult to build a unified representation and to generate control commands that can be used directly for laparoscope control. To address this issue, we propose a laparoscope-holding robot control method based on unified mechanics modeling of multimodal information. First, we design mapping strategies for multiple intraoperative sources, including position, force/torque, and images, and unify them into an equivalent-wrench representation in the operational space. Then, using a task-priority scheme, we inject the wrenches into the task space and the null space, respectively, and synthesize laparoscope control commands via task-priority projection, thereby achieving consistent representation and coordinated fusion of multimodal information within a single framework. Finally, taking the intraoperative remote center of motion (RCM) position, force/torque sensor readings, and laparoscopic images as examples, we construct an RCM-constraint wrench to enforce the RCM geometric constraint and reduce the contact force at the trocar site, a laparoscope-manipulation wrench to enable compliant dragging, and an instrument-tracking wrench to achieve autonomous visual tracking of the instruments. Experiments on a surgical phantom and in vivo porcine trials demonstrate that the proposed method supports multi-task operation, including compliant laparoscope manipulation and autonomous instrument tracking, while maintaining the RCM constraint and reducing sustained trocar-site loading.
Conditional Flow-VAE for Safety-Critical Traffic Scenario Generation ICRA 2026
Safety-critical scenarios are essential for the development of autonomous vehicles (AVs) but are rare in real-world driving data. While simulation offers a way to generate such scenarios, manually designed test cases lack scalability, and adversarial optimization often produces unrealistic behaviors. In this work, we introduce a conditional latent flow matching approach for scalable and realistic safety-critical scenario generation. Our method uses distribution matching to transform nominal scenes into safety-critical rollouts. Furthermore, we demonstrate that incorporating both simulation and real-world data enables our framework to efficiently generate diverse, data-driven scenarios. Experimental results highlight that our approach is able to more consistently and realistically generate novel safety-critical scenarios, making it a valuable tool for training and benchmarking AV systems.
comment: ICRA 2026
Robust $\mathcal{H}_\infty$ Controller Design For INDI-Controlled Quadrotor Using Online Parameter Identification
It has recently been shown that all physical parameters of an Incremental Nonlinear Dynamic Inversion (INDI) controller can be estimated onboard a multirotor within half a second, which is fast enough to do the full identification during a throw in the air. However, a robust method to tune outer loop gains for this feedback-linearizing INDI controller depending on the model parameters is still missing. This work presents the design of a robust gain-scheduled controller for attitude control of a quadrotor, using an INDI-based inner loop with online identification of its system parameters. A gain-scheduled cascaded attitude controller with a feedforward filter is synthesized for a symmetric quadrotor using signal-based $\mathcal{H}_\infty$ closed-loop shaping. The resulting controller exhibits good stability margins, with nonlinear simulations confirming effective tracking performance under uncertainty. Experimental evaluation is also conducted through flight tests with full online parameter identification. Even though the identified parameters during these tests are far outside the defined uncertainty range, acceptable flight performance comparable to simulation results is maintained for actuator time constants below 40 ms.
comment: 8 pages, 11 figures, Accepted to the ICUAS 2026 conference
Contact-Free Grasp Stability Prediction with In-Hand Time-of-Flight Sensors
Current approaches to grasp planning for robotics demonstrate high success rates, but degrade with noisy sensors and other factors. Previous works have proposed tactile-based grasp stability classifiers to detect failures, but these approaches rely on making contact and grasping the object to do so. We propose a contact-free grasp stability predictor using multi-zone time-of-flight sensors mounted in the distal links of a gripper. Our method, as it does not require grasping the object to make a prediction, significantly speeds up the stability classification process, cycling at 15 Hz. We collected over 2,500 real-world grasps across 15 objects to train a classifier. Additionally, we conducted grasp attempts over six additional unseen objects, three for validation and model selection, and three for model testing. Our approach demonstrated strong classification performance, with an accuracy of 85.5% on validation and 86.0% on test objects.
Creative Robot Tool Use by Counterfactual Reasoning
We propose a causal reasoning framework for creative robot tool use where a suitable tool for a task is correctly identified for use beyond its primary objectives. The proposed framework first discovers the causal relationships between the tool and the task by conducting simulated experiments in a dynamics model. We decouple the causal discovery problem into two complementary components: VLM-based feature suggestion and counterfactual tool generation via targeted geometric and physical feature perturbations. Then, novel objects are classified based on identified causal features, and the tool use skill is transferred via keypoint matching conditioned on the identified causal features. By reconstructing the task in a dynamics model, our approach grounds tool use in the physics of the problem. We illustrate our approach in reaching a distant object with different sticks, scooping candies from a bowl using diverse items, and using different boxes or crates as stepping platforms to retrieve an object from a high shelf. Our baseline comparisons show that identifying causal features and grounding them in physical tool properties leads to more reliable tool selection and stronger skill keypoint transfer.
comment: Under review
Passive Fault Tolerance through Tension-to-Thrust Feed-Forward: Hybrid Input-to-State Stability for Decentralized Multi-UAV Slung-Load Transport under Abrupt Cable Severance
Abrupt cable severance in multi-UAV slung-load transport redistributes load and changes the active constraint set, leaving limited time for fault diagnosis and reconfiguration. Existing controllers rely on coordinated force allocation, peer-state exchange, or fixed cable topology, and therefore lack a certified decentralized recovery mechanism for unannounced severance. We present a passive architecture that routes each vehicle's measured cable tension directly into its altitude thrust command, $T_i^{\mathrm{ff}}=T_i$, while a surrounding proportional-derivative, anti-swing, and projection cascade preserves local tracking feasibility. The main contribution is a conditional hybrid practical input-to-state-stability certificate that composes a slack-excursion-bounded taut-cable reduction, bounded post-severance Lyapunov jumps, inter-fault decay, and per-fault-cycle contraction $ρ\in (0,1)$ into an explicit recovery envelope under stated actuator, slack, and dwell assumptions. We validate the controller in Drake multibody simulation with five vehicles, a 10 kg payload, Kelvin-Voigt cables, Dryden wind, and single- and dual-severance schedules: the closed loop attains 0.312-0.328 m RMSE, 76.1-95.2 mm peak sag, and recovery within one payload-pendulum period. Disabling the identity inflates cruise error by 34-39% and peak sag by 3.6x-4.0x, identifying local tension feed-forward as the dominant passive recovery mechanism in the tested decentralized cascade.
comment: Submitted for review at IEEE Transactions on Control Systems Technology. For the paper and simulation code, see: https://github.com/Hadi-Hajieghrary/Tether_Grace.git
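The tension feed-forward identity $T_i^{\mathrm{ff}}=T_i$ described in the abstract above can be sketched as a single altitude-channel thrust law. This is a simplified illustration, not the authors' controller: the gains, the gravity compensation term, and the treatment of the cable as vertical are assumptions, and the anti-swing and projection stages are omitted.

```python
def altitude_thrust(z_ref, z, vz_ref, vz, tension, mass,
                    kp=8.0, kd=4.0, g=9.81, feed_forward=True):
    """Thrust command for one vehicle: PD term + gravity + measured cable tension.

    Hypothetical gains kp, kd; the cable is assumed vertical so the full
    measured tension is added to the thrust command.
    """
    pd_term = mass * (kp * (z_ref - z) + kd * (vz_ref - vz))
    gravity_term = mass * g
    # Feed-forward identity: the measured tension is passed through unchanged.
    t_ff = tension if feed_forward else 0.0
    return pd_term + gravity_term + t_ff

# Hovering on the reference: intact cable carrying 20 N, then after severance.
before = altitude_thrust(5.0, 5.0, 0.0, 0.0, tension=20.0, mass=1.5)
after = altitude_thrust(5.0, 5.0, 0.0, 0.0, tension=0.0, mass=1.5)
```

Because the measured tension drops to zero when a cable is severed, the command falls by the same amount without any fault detection or message exchange, which is the sense in which the recovery is passive.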
Track A*: Fast Visibility-Aware Trajectory Planning for Active Target Tracking
Offline reference trajectories for active target tracking are needed both for building multi-modal tracking datasets and for benchmarking online tracking planners under repeatable conditions. We present Track A star (TA star), an offline search-based trajectory planner that targets the visibility-aware target tracking objective on a discretized four-dimensional spatio-temporal grid (x, y, z, t). TA star combines a layered Directed Acyclic Graph (DAG) search with three engineering optimizations: cross-time obstacle distance caching against a Bounding Volume Hierarchy (BVH), per-layer beam pruning, and a configurable multi-ray visibility evaluator. TA star employs a beam-pruned heuristic search on this discrete graph to efficiently find high-quality tracking trajectories. While it trades strict theoretical optimality for practical scalability, our empirical results demonstrate robust, near-baseline visibility performance at a fraction of the computational cost. On a 1000-scenario stress test across eight CARLA Optimized maps, TA star converges on all scenarios and completes in 45 s using 32 workers; on a 248-scenario controlled comparison against an unoptimized priority-queue A star baseline (BinaryHeap implementation) under identical scenario inputs and a 5 x 10^6 expansion cap, TA star reduces mean planning time by 23.0x and worst-case planning time by 11.8x, while raising convergence from 56.9% to 100%. On the n=141 baseline-converged subset, TA star changes average visibility by only -0.15 percentage points (pp), with no scenario exceeding a 5 pp drop. We position TA star as a practical offline reference planner under these specific conditions, with limitations and failure cases discussed for environments such as Town07 dense vegetation.
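The combination of a layered (time-indexed) graph with per-layer beam pruning can be sketched on a toy problem. This is not the TA star implementation: the one-dimensional state, the cost function standing in for a visibility penalty, and all names are hypothetical, and obstacle-distance caching and ray casting are left out.

```python
def beam_layered_search(num_layers, successors, step_cost, start, beam_width=3):
    """Expand one time layer at a time, keeping the cheapest beam_width states."""
    frontier = {start: (0.0, [start])}
    for t in range(num_layers):
        candidates = {}
        for state, (cost, path) in frontier.items():
            for nxt in successors(state, t):
                new_cost = cost + step_cost(state, nxt, t)
                # Keep only the cheapest way of reaching each state in this layer.
                if nxt not in candidates or new_cost < candidates[nxt][0]:
                    candidates[nxt] = (new_cost, path + [nxt])
        # Per-layer beam pruning: discard all but the best beam_width states.
        best = sorted(candidates.items(), key=lambda kv: kv[1][0])[:beam_width]
        frontier = dict(best)
    return min(frontier.values(), key=lambda v: v[0])

def target(t):
    # Toy target that moves one cell per time step.
    return t

def successors(state, t):
    # The tracker may stay in place or move one cell left or right.
    return [state - 1, state, state + 1]

def step_cost(state, nxt, t):
    # Stand-in for a visibility penalty: distance to the target at the next step.
    return abs(nxt - target(t + 1))

cost, path = beam_layered_search(4, successors, step_cost, start=0)
```

Since edges only connect layer t to layer t + 1, the graph is acyclic by construction; pruning bounds the work per layer at the price of the optimality guarantee, which matches the trade-off stated in the abstract.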
Query2Uncertainty: Robust Uncertainty Quantification and Calibration for 3D Object Detection under Distribution Shift CVPR 2026
Reliable uncertainty estimation for 3D object detection is critical for deploying safe autonomous systems, yet modern detectors remain poorly calibrated, especially under distribution shifts. Although post-hoc calibration methods address this issue and provide improved calibration for in-distribution tests, they fail to adapt in distribution-shifted scenarios. In this work, we address this issue and introduce a density-aware calibration method that couples post-hoc calibrators with the feature density of latent object queries from DETR-style 3D object detectors. These queries form a compact, location- and class-aware feature, ideal for density estimation, allowing our approach to adjust model confidences in distribution-shift scenarios. By fitting a density estimator on these query features, our approach jointly recalibrates both classification and bounding box regression uncertainties. On both a multi-view camera and a LiDAR-based detector, our approach consistently outperforms standard post-hoc methods in both in-distribution and distribution-shifted scenarios. Code is available at https://tillbeemelmanns.github.io/query2uncertainty/
comment: Accepted for publication at CVPR 2026
Modular Lie Algebraic PDE Control of Multibody Flexible Manipulators
This paper addresses PDE-based control for flexible multibody robotic systems, presenting a subsystem-based framework for serial manipulators with arbitrary links in 3D space. The approach uses a screw-theoretic Lie-algebraic model where motion, deformation, and forces are expressed as body-fixed twists and wrenches in $se(3)$. By substituting a strain-based deformation PDE into the dynamics, distributed elastic acceleration is eliminated, yielding a model governed by twist acceleration and the deformation field. Subsystem twist trajectories are generated from task-space endpoints via deflection-compensating inverse kinematics, providing real-time correction for tip deformation. A nominal controller for each link ensures exponential decay of twist errors via a Lyapunov function $\nu_i$. An adaptive modification replaces physical parameters with online estimates, establishing exponential convergence of both twist and parameter errors. Summing over all links, composite Lyapunov functions $V = \sum_i \nu_i$ and $V^a = \sum_i \nu_i^a$ yield time derivatives where inter-link interaction power terms telescope to zero. This cancellation is ensured by Newton's third law and the frame invariance of the power pairing on $se(3) \times se^*(3)$, establishing global exponential convergence of tracking errors. Bounded elastic deformation is guaranteed by an Euler-Bernoulli energy argument. The screw-theoretic structure renders interaction cancellation exact, making the stability certificate modular and scalable to chains of arbitrary length. Numerical simulations demonstrate the scheme's physical consistency.
Stability of Control Lyapunov Function Guided Reinforcement Learning
Reinforcement learning (RL) has become the de facto method for achieving locomotion on humanoid robots in practice, yet stability analysis of the corresponding control policies is lacking. Recent work has attempted to merge control theoretic ideas with reinforcement learning through control guided learning. A notable example of this is the use of a control Lyapunov function (CLF) to synthesize the reinforcement learning rewards, a technique known as CLF-RL, which has shown practical success. This paper investigates the stability properties of optimal controllers using CLF-RL with the goal of bridging experimentally observed stability with theoretical guarantees. The RL problem is viewed as an optimal control problem and exponential stability is proven in both continuous and discrete time using both core CLF reward terms and the additional terms used in practice. The theoretical bounds are numerically verified on systems such as the double integrator and cart-pole. Finally, the CLF guided rewards are implemented for a walking humanoid robot to generate stable periodic orbits.
comment: This work has been submitted to the IEEE for possible publication
Towards Adaptive Humanoid Control via Multi-Behavior Distillation and Reinforced Fine-Tuning
Humanoid robots show promise for learning a diverse set of human-like locomotion behaviors, including standing up, walking, running, and jumping. However, existing methods predominantly require training independent policies for each skill, yielding behavior-specific controllers that exhibit limited generalization and brittle performance when deployed on irregular terrains and in diverse situations. To address this challenge, we propose Adaptive Humanoid Control (AHC), which adopts a two-stage framework to learn an adaptive humanoid locomotion controller across different skills and terrains. Specifically, we first train several primary locomotion policies and perform a multi-behavior distillation process to obtain a basic multi-behavior controller, facilitating adaptive behavior switching based on the environment. Then, we perform reinforced fine-tuning by collecting online feedback while performing adaptive behaviors on more diverse terrains, enhancing the controller's terrain adaptability. We conduct experiments both in simulation and in the real world on Unitree G1 robots. The results show that our method exhibits strong adaptability across various situations and terrains. Project website: https://ahc-humanoid.github.io.
The Field of Safe Motion: Operationalizing Affordances in the Field of Safe Travel Using Reachability Analysis
We present the Field of Safe Motion (FSM), a quantitative safety model for determining whether a driver maintains a collision-free escape route, or "out," at any given moment by accounting for that driver's physical capabilities and the foreseeable actions of other road users. The Field of Safe Travel (FST) provides a framework for representing the types of sensory information and actions available to drivers. However, the FST has remained conceptual in nature since its initial publication almost 90 years ago -- and a concrete computational operationalization is still lacking. At the same time, reachability analysis provides a quantitative basis for assessing the possible actions available to road users, using interpretable kinematic models, but reachability models have so far remained confined largely to the engineering and robotics literature. Bringing these two approaches together provides for an interpretable, quantitative tool for assessing driving behavior across a wide range of driving scenarios. Beyond being interpretable, our approach relies on a relatively small set of basic assumptions that are easy to enumerate and reason about. Furthermore, an interpretable reachability model paired with kinematic assumptions provides a way to bound uncertainty about road users' reasonably foreseeable future locations. We demonstrate the applicability of the FSM to different driving scenarios and discuss the strengths and weaknesses of the model.
CLAMP: Contrastive Learning for 3D Multi-View Action-Conditioned Robotic Manipulation Pretraining
Leveraging pre-trained 2D image representations in behavior cloning policies has achieved great success and has become a standard approach for robotic manipulation. However, such representations fail to capture the 3D spatial information about objects and scenes that is essential for precise manipulation. In this work, we introduce Contrastive Learning for 3D Multi-View Action-Conditioned Robotic Manipulation Pretraining (CLAMP), a novel 3D pre-training framework that utilizes point clouds and robot actions. From the merged point cloud computed from RGB-D images and camera extrinsics, we re-render multi-view four-channel image observations with depth and 3D coordinates, including dynamic wrist views, to provide clearer views of target objects for high-precision manipulation tasks. The pre-trained encoders learn to associate the 3D geometric and positional information of objects with robot action patterns via contrastive learning on large-scale simulated robot trajectories. During encoder pre-training, we pre-train a Diffusion Policy to initialize the policy weights for fine-tuning, which is essential for improving fine-tuning sample efficiency and performance. After pre-training, we fine-tune the policy on a limited amount of task demonstrations using the learned image and action representations. We demonstrate that this pre-training and fine-tuning design substantially improves learning efficiency and policy performance on unseen tasks. Furthermore, we show that CLAMP outperforms state-of-the-art baselines across six simulated tasks and five real-world tasks. The project website and videos can be found at https://clamp3d.github.io/CLAMP/.
comment: Accepted to the Robotics: Science and Systems (RSS) 2026
A Universal Large Language Model -- Drone Command and Control Interface
The use of artificial intelligence (AI) for drone control can have a transformative impact on drone capabilities, especially when real-world information can be integrated with drone sensing, command, and control, part of a growing field of physical AI. Large language models (LLMs) can be advantageous if trained at scale on general knowledge, particularly when the training data includes information such as the detailed geography and topology of the entire planet, and when the model can access real-time situational data such as weather. However, challenges remain in the interface between drones and LLMs in general, with each application requiring a tedious, labor-intensive effort to connect the LLM's trained knowledge to drone command and control. Here, we solve that problem, using an interface strategy that is LLM-agnostic and drone-agnostic, providing the first universal, versatile, comprehensive, and easy-to-use drone control interface. We do this using the new Model Context Protocol (MCP) standard, an open standard that provides a universal way for AI systems to access external data, tools, and services. We develop and deploy a cloud-based Linux machine hosting an MCP server that supports the MAVLink protocol, a ubiquitous drone control language used almost universally by millions of drones, including those built on the ArduPilot and PX4 frameworks. We demonstrate flight control of a real unmanned aerial vehicle. In further testing, we demonstrate extensive flight planning and control capability in a simulated drone, integrated with a Google Maps MCP server for up-to-date, real-time navigation information. This demonstrates a universal approach to integration of LLMs with drone command and control, a paradigm that combines virtually all of the modern AI industry with drone technology in an easy-to-use interface that translates natural language to drone control.
LaST-R1: Reinforcing Robotic Manipulation via Adaptive Physical Latent Reasoning
Robotic foundation models require reasoning over complex visual scenes to execute adaptive actions in dynamic environments. While recent studies on latent-reasoning Vision-Language-Action (VLA) models have demonstrated the capability to capture fine-grained physical dynamics, they remain predominantly confined to static imitation learning, severely limiting their adaptability and generalization. In this paper, we present LaST-R1, a novel reinforcement learning (RL) post-training framework designed to effectively harness "latent reasoning-before-acting" policies. Specifically, we propose Latent-to-Action Policy Optimization (LAPO), a core RL algorithm that jointly optimizes the latent reasoning process and the action generation. By explicitly embedding latent Chain-of-Thought (CoT) reasoning directly within the RL optimization loop, LAPO stimulates profound physical world modeling, which in turn drives robust execution in interactive environments. Furthermore, an adaptive latent CoT mechanism is introduced, allowing the policy to dynamically modulate its reasoning horizon based on diverse environment states. Experiments show that LaST-R1 achieves a near-perfect 99.9% average success rate on the LIBERO benchmark with only one-shot supervised warm-up, significantly improving convergence speed and performance over prior state-of-the-art (SOTA) methods. In real-world deployments, LaST-R1 yields up to a 22.5% average improvement over the SOTA supervised fine-tuning approach across four complex tasks, including both single-arm and dual-arm settings. Finally, LaST-R1 demonstrates strong generalization across simulated and real-world environments.
comment: LaST-R1 Technical Report
Efficient Model-Based Reinforcement Learning for Robot Control via Online Optimization
We present an online model-based reinforcement learning algorithm suitable for controlling complex robotic systems directly in the real world. Unlike prevailing sim-to-real pipelines that rely on extensive offline simulation and model-free policy optimization, our method builds a dynamics model from real-time interaction data and performs policy updates guided by the learned dynamics model. This efficient model-based reinforcement learning scheme significantly reduces the number of samples needed to train control policies, enabling direct training on real-world rollout data. Training on real data in turn limits the influence of bias in simulated data and facilitates the search for high-performance control policies. We adopt online optimization analysis to derive sublinear regret bounds under stochastic online optimization assumptions, providing formal guarantees on performance improvement as more interaction data are collected. Experimental evaluations were performed on a hydraulic excavator arm and a soft robot arm, where the algorithm demonstrates strong sample efficiency compared to model-free reinforcement learning methods, reaching comparable performance within hours. Robust adaptation to shifting dynamics was also observed when the payload condition was randomized. Our approach paves the way toward efficient and reliable on-robot learning for a broad class of challenging control tasks.
Encoding Predictability and Legibility for Style-Conditioned Diffusion Policy
Striking a balance between efficiency and transparent motion is a core challenge in human-robot collaboration, as highly expressive movements often incur unnecessary time and energy costs. In collaborative environments, legibility gives a human observer a better understanding of the robot's actions, increasing safety and trust. However, legible behaviors result in sub-optimal and exaggerated trajectories that are redundant in low-ambiguity scenarios where the robot's goal is already obvious. To address this trade-off, we propose Style-Conditioned Diffusion Policy (SCDP), a modular framework that constrains the trajectory generation of a pre-trained diffusion model toward either legibility or efficiency based on the environment's configuration. Our method utilizes a post-training pipeline that freezes the base policy and trains a lightweight scene encoder and conditioning predictor to modulate the diffusion process. At inference time, an ambiguity detection module activates the appropriate conditioning, prioritizing expressive motion only for ambiguous goals and reverting to efficient paths otherwise. We evaluate SCDP on manipulation and navigation tasks, and results show that it enhances legibility in ambiguous settings while preserving optimal efficiency when legibility is unnecessary, all without retraining the base policy.
comment: Accepted to the 18th International Conference on Social Robotics (ICSR 2026)
Pack it in: Packing into Partially Filled Containers Through Contact
The automation of warehouse operations is crucial for improving productivity and reducing human exposure to hazardous environments. One operation frequently performed in warehouses is bin-packing where items need to be placed into containers, either for delivery to a customer, or for temporary storage in the warehouse. Whilst prior bin-packing works have largely been focused on packing items into empty containers and have adopted collision-free strategies, it is often the case that containers will already be partially filled with items, often in suboptimal arrangements due to transportation about a warehouse. This paper presents a contact-aware packing approach that exploits purposeful interactions with previously placed objects to create free space and enable successful placement of new items. This is achieved by using a contact-based multi-object trajectory optimizer within a model predictive controller, integrated with a physics-aware perception system that estimates object poses even during inevitable occlusions, and a method that suggests physically-feasible locations to place the object inside the container.
comment: 8 pages, 5 figures
Scalable Multi Agent Diffusion Policies for Coverage Control
We propose MADP, a novel diffusion-model-based approach for collaboration in decentralized robot swarms. MADP leverages diffusion models to generate samples from complex and high-dimensional action distributions that capture the interdependencies between agents' actions. Each robot conditions policy sampling on a fused representation of its own observations and perceptual embeddings received from peers. To evaluate this approach, we task a team of holonomic robots piloted by MADP to address coverage control, a canonical multi-agent navigation problem. The policy is trained via imitation learning from a clairvoyant expert on the coverage control problem, with the diffusion process parameterized by a spatial transformer architecture to enable decentralized inference. We evaluate the system under varying numbers, locations, and variances of importance density functions, capturing the robustness demands of real-world coverage tasks. Experiments demonstrate that our model inherits valuable properties from diffusion models, generalizing across agent densities and environments, and consistently outperforming state-of-the-art baselines.
RLDX-1 Technical Report
While Vision-Language-Action models (VLAs) have shown remarkable progress toward human-like generalist robotic policies through the versatile intelligence (i.e. broad scene understanding and language-conditioned generalization) inherited from pre-trained Vision-Language Models, they still struggle with complex real-world tasks requiring broader functional capabilities (e.g. motion awareness, long-term memory, and physical sensing). To address this, we introduce RLDX-1, a general-purpose robotic policy for dexterous manipulation built on the Multi-Stream Action Transformer (MSAT), an architecture that unifies these capabilities by integrating heterogeneous modalities through modality-specific streams with cross-modal joint self-attention. RLDX-1 further combines this architecture with system-level design choices, including data synthesis for rare manipulation scenarios, learning procedures specialized for human-like manipulation, and inference optimizations for real-time deployment. Through empirical evaluation, we show that RLDX-1 consistently outperforms recent frontier VLAs (e.g. $π_{0.5}$ and GR00T N1.6) across both simulation benchmarks and real-world tasks that require broad functional capabilities beyond general versatility. In particular, RLDX-1 shows superiority in ALLEX humanoid tasks by achieving success rates of 86.8% while $π_{0.5}$ and GR00T N1.6 achieve around 40%, highlighting the ability of RLDX-1 to control a high-DoF humanoid robot under diverse functional demands. Together, these results position RLDX-1 as a promising step toward reliable VLAs for complex, contact-rich, and dynamic real-world dexterous manipulation.
comment: Project page: https://rlwrld.ai/rldx-1
3D Generation for Embodied AI and Robotic Simulation: A Survey
Embodied AI and robotic systems increasingly depend on scalable, diverse, and physically grounded 3D content for simulation-based training and real-world deployment. While 3D generative modeling has advanced rapidly, embodied applications impose requirements far beyond visual realism: generated objects must carry kinematic structure and material properties, scenes must support interaction and task execution, and the resulting content must bridge the gap between simulation and reality. This survey reviews 3D generation for embodied AI and organizes the literature around three roles that 3D generation plays in embodied systems. In Data Generator, 3D generation produces simulation-ready objects and assets, including articulated, physically grounded, and deformable content for downstream interaction; in Simulation Environments, it constructs interactive and task-oriented worlds, spanning structure-aware, controllable, and agentic scene generation; and in Sim2Real Bridge, it supports digital twin reconstruction, data augmentation, and synthetic demonstrations for downstream robot learning and real-world transfer. We also show that the field is shifting from visual realism toward interaction readiness, and we identify the main bottlenecks, including limited physical annotations, the gap between geometric quality and physical validity, fragmented evaluation, and the persistent sim-to-real divide, that must be addressed for 3D generation to become a dependable foundation for embodied intelligence. Our project page is at https://3dgen4robot.github.io.
comment: 27 pages, 11 figures, 8 tables
Overcoming Environmental Meta-Stationarity in MARL via Adaptive Curriculum and Counterfactual Group Advantage
Multi-agent reinforcement learning (MARL) has reached competitive performance on cooperative tasks against scripted adversaries, yet most methods train agents at a single fixed difficulty throughout the entire run. We term this static-difficulty regime environmental meta-stationarity and show that it caps policy generalization and steers learning toward shallow local optima. To break this regime, we propose CL-MARL, a dynamic curriculum learning framework that adapts opponent strength online from win-rate signals, advancing or regressing the task as agents master it. Its scheduler, FlexDiff, fuses momentum-based trend estimation with sliding-window dual-curve monitoring of training and evaluation returns, yielding stable difficulty transitions without manual tuning. Because a moving curriculum amplifies non-stationarity and sparsifies global rewards, we introduce the Counterfactual Group Relative Policy Advantage (CGRPA), which extends GRPO-style group-relative optimization with counterfactual baselines to disentangle each agent's contribution under shifting team dynamics. On the StarCraft Multi-Agent Challenge (SMAC), CL-MARL attains a 40% mean win rate on the super-hard maps with an average episode return of 17.85, exceeding the QMIX, OW-QMIX, DER, EMC, and MARR baselines by +2.94 on average, while reaching its peak win rate roughly 1.28× faster on 8m_vs_9m and 1.42× faster on 3s5z_vs_3s6z than the strongest baseline. The implementation is publicly available at https://github.com/NICE-HKU/CL2MARL-SMAC.
comment: 23 pages, 15 figures
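The win-rate-driven curriculum described in the abstract above can be illustrated with a minimal sketch. This is not the FlexDiff scheduler (which the abstract says also uses momentum-based trend estimation and dual-curve monitoring); it only shows the basic idea of advancing or regressing a difficulty level from a sliding-window win rate. The class name, thresholds, and window size are all hypothetical.

```python
from collections import deque


class WinRateCurriculum:
    """Hypothetical scheduler: move opponent difficulty up or down
    based on the win rate over a sliding window of recent episodes."""

    def __init__(self, levels, window=20, up=0.8, down=0.3):
        self.levels = levels      # number of difficulty levels (assumed discrete)
        self.window = window      # episodes per win-rate estimate
        self.up = up              # promote when win rate reaches this value
        self.down = down          # demote when win rate falls to this value
        self.level = 0
        self.results = deque(maxlen=window)

    def record(self, won):
        """Record one episode outcome and return the (possibly updated) level."""
        self.results.append(1 if won else 0)
        if len(self.results) < self.window:
            return self.level     # not enough evidence yet
        rate = sum(self.results) / len(self.results)
        if rate >= self.up and self.level < self.levels - 1:
            self.level += 1
            self.results.clear()  # restart the estimate at the new level
        elif rate <= self.down and self.level > 0:
            self.level -= 1
            self.results.clear()
        return self.level
```

Clearing the window after each transition is one simple way to avoid judging a new difficulty level by outcomes collected at the previous one; a real scheduler would likely smooth this more carefully.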
Enhancing Glass Surface Reconstruction via Depth Prior for Robot Navigation
Indoor robot navigation is often compromised by glass surfaces, which severely corrupt depth sensor measurements. While foundation models like Depth Anything 3 provide excellent geometric priors, they lack an absolute metric scale. We propose a training-free framework that leverages depth foundation models as a structural prior, employing a robust local RANSAC-based alignment to fuse it with raw sensor depth. This naturally avoids contamination from erroneous glass measurements and recovers an accurate metric scale. Furthermore, we introduce GlassRecon, a novel RGB-D dataset with geometrically derived ground truth for glass regions. Extensive experiments demonstrate that our approach consistently outperforms state-of-the-art baselines, especially under severe sensor depth corruption. The dataset and related code will be released at https://github.com/jarvisyjw/GlassRecon.
comment: 9 pages, 8 figures
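To make the RANSAC-based alignment in the abstract above concrete, here is a minimal sketch that fits a scale and shift mapping relative prior depth to metric sensor depth while ignoring corrupted samples. It is a global, one-dimensional simplification: the paper describes a local alignment, and whether its model is scale-only or scale-plus-shift is not stated in the abstract. The function name, tolerance, and iteration count are assumptions.

```python
import random


def ransac_scale_shift(prior, sensor, iters=200, tol=0.05, seed=0):
    """Fit sensor ~= s * prior + t with RANSAC.

    prior  : relative depths from a depth model (hypothetical input)
    sensor : metric depths from the sensor, possibly corrupted on glass
    Returns ((s, t), inlier_count) for the best two-point hypothesis.
    """
    rng = random.Random(seed)
    n = len(prior)
    best, best_inliers = (1.0, 0.0), -1
    for _ in range(iters):
        i, j = rng.sample(range(n), 2)
        if prior[i] == prior[j]:
            continue  # degenerate pair, cannot define a line
        s = (sensor[i] - sensor[j]) / (prior[i] - prior[j])
        t = sensor[i] - s * prior[i]
        # Count samples whose aligned prior agrees with the sensor reading.
        inliers = sum(
            1 for p, d in zip(prior, sensor) if abs(s * p + t - d) < tol
        )
        if inliers > best_inliers:
            best, best_inliers = (s, t), inliers
    return best, best_inliers
```

Samples that fall on glass would show up as outliers to the dominant fit, so the aligned prior `s * p + t` could then replace the sensor reading at those pixels.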
MAD-BA: 3D LiDAR Bundle Adjustment -- from Uncertainty Modelling to Structure Optimization
The joint optimization of sensor poses and 3D structure is fundamental for state estimation in robotics and related fields. Current LiDAR systems often prioritize pose optimization, with structure refinement either omitted or treated separately using implicit representations. This paper introduces a framework for simultaneous optimization of sensor poses and 3D map, represented as surfels. A generalized LiDAR uncertainty model is proposed to address less reliable measurements in varying scenarios. Experimental results on public datasets demonstrate improved performance over most comparable state-of-the-art methods. The system is provided as open-source software to support further research.
comment: 8 pages, 7 figures. This work has been accepted to IEEE Robotics and Automation Letters (RA-L)
Evaluating Generative Models as Interactive Emergent Representations of Human-Like Collaborative Behavior
Human-AI collaboration requires AI agents to understand human behavior for effective coordination. While advances in foundation models show promising capabilities in understanding and showing human-like behavior, their application in embodied collaborative settings needs further investigation. This work examines whether embodied foundation model agents exhibit emergent collaborative behaviors indicating underlying mental models of their collaborators, which is an important aspect of effective coordination. This paper develops a 2D collaborative game environment where large language model agents and humans complete color-matching tasks requiring coordination. We define five collaborative behaviors as indicators of emergent mental model representation: perspective-taking, collaborator-aware planning, introspection, theory of mind, and clarification. An automated behavior detection system using LLM-based judges identifies these behaviors, achieving fair to substantial agreement with human annotations. Results from the automated behavior detection system show that foundation models consistently exhibit emergent collaborative behaviors without being explicitly trained to do so. These behaviors occur at varying frequencies during collaboration stages, with distinct patterns across different LLMs. A user study was also conducted to evaluate human satisfaction and perceived collaboration effectiveness, with the results indicating positive collaboration experiences. Participants appreciated the agents' task focus, plan verbalization, and initiative, while suggesting improvements in response times and human-like interactions. This work provides an experimental framework for human-AI collaboration, empirical evidence of collaborative behaviors in embodied LLM agents, a validated behavioral analysis methodology, and an assessment of collaboration effectiveness.
comment: Under review
Low-Latency Quasi-Static Modeling of UAV Tether Aerodynamics
One of the main limitations of multirotor UAVs is their short flight time due to battery constraints. A practical solution for continuous operation is to power the drone from the ground via a tether. While this approach has been demonstrated for stationary systems, scenarios with a fast-moving base vehicle or strong wind conditions require modeling the tether forces, including aerodynamic effects. In this work, we propose two complementary approaches for low-latency quasi-static tether modeling with aerodynamics. The first is an analytical method based on catenary theory with a uniform drag assumption, achieving very fast solve times below 1 ms. The second is a numerical method that discretizes the tether into segments and lumped masses, solving the equilibrium equations using CasADi and IPOPT. By leveraging initialization strategies, such as warm starting and analytical initialization, low-latency performance was achieved with a solve time of 5 ms, while allowing for flexible force formulations. Both approaches were validated in real-world tests using a load cell to measure the tether force. The results show that the analytical method provides sufficient accuracy for most tethered UAV applications with minimal computational cost, while the numerical method offers higher flexibility and physical accuracy when required. These approaches form a lightweight and extensible framework for low-latency tether simulation, applicable to both offline optimization and online tasks such as simulation, control, and trajectory planning.
comment: Accepted at ICUAS 2026
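As background for the analytical method mentioned in the abstract above, the sketch below solves the classical drag-free catenary for its shape parameter a, using the standard relation sqrt(L^2 - h^2) = 2 a sinh(d / (2 a)) for a tether of length L spanning horizontal distance d and height difference h. It does not include the uniform drag assumption the paper adds, and the function names and bisection bracket are illustrative choices.

```python
import math


def catenary_parameter(span, rise, length, rel_tol=1e-12):
    """Solve sqrt(L^2 - h^2) = 2 a sinh(d / (2 a)) for a by bisection.

    span   : horizontal distance d between the two anchor points
    rise   : vertical distance h between the two anchor points
    length : tether length L (must exceed the straight-line distance)
    """
    if span <= 0.0 or length <= math.hypot(span, rise):
        raise ValueError("need span > 0 and length > straight-line distance")
    target = math.sqrt(length ** 2 - rise ** 2)

    def residual(a):
        # Decreases monotonically in a; tends to span - target < 0 as a grows.
        return 2.0 * a * math.sinh(span / (2.0 * a)) - target

    lo = span / 100.0  # keeps the sinh argument at 50, far from overflow
    if residual(lo) <= 0.0:
        raise ValueError("tether too slack for this bracket")
    hi = span
    while residual(hi) > 0.0:
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def horizontal_tension(a, weight_per_length):
    """Horizontal tension component of an ideal catenary: H = w * a."""
    return weight_per_length * a
```

Because the solve is a one-dimensional root find on a monotonic function, it needs only a few dozen evaluations, which is consistent in spirit with the sub-millisecond solve times the abstract reports for the analytical approach.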
AnyPos: Automated Task-Agnostic Actions for Bimanual Manipulation
Learning generalizable manipulation policies hinges on data, yet robot manipulation data is scarce and often entangled with specific embodiments, making both cross-task and cross-platform transfer difficult. We tackle this challenge with task-agnostic embodiment modeling, which learns embodiment dynamics directly from task-agnostic action data and decouples them from high-level policy learning. By focusing on exploring all feasible actions of the embodiment to capture what is physically feasible and consistent, task-agnostic data takes the form of independent image-action pairs with the potential to cover the entire embodiment workspace, unlike task-specific data, which is sequential and tied to concrete tasks. This data-driven perspective bypasses the limitations of traditional dynamics-based modeling and enables scalable reuse of action data across different tasks. Building on this principle, we introduce AnyPos, a unified pipeline that integrates large-scale automated task-agnostic exploration with robust embodiment modeling through inverse dynamics learning. AnyPos generates diverse yet safe trajectories at scale, then learns embodiment representations by decoupling arm and end-effector motions and employing a direction-aware decoder to stabilize predictions under distribution shift, which can be seamlessly coupled with diverse high-level policy models. In comparison to the standard baseline, AnyPos achieves a 51% improvement in test accuracy. On manipulation tasks such as operating a microwave, toasting bread, folding clothes, watering plants, and scrubbing plates, AnyPos raises success rates by 30-40% over strong baselines. These results highlight data-driven embodiment modeling as a practical route to overcoming data scarcity and achieving generalization across tasks and platforms in visuomotor control. Project page: https://embodiedfoundation.github.io/vidar_anypos.
Software Engineering for Self-Adaptive Robotics: A Research Agenda
Self-adaptive robotic systems operate autonomously in dynamic and uncertain environments, requiring robust real-time monitoring and adaptive behaviour. Unlike traditional robotic software with predefined logic, self-adaptive robots exploit artificial intelligence (AI), machine learning, and model-driven engineering to adapt continuously to changing conditions, thereby ensuring reliability, safety, and optimal performance. This paper presents a research agenda for software engineering in self-adaptive robotics, structured along two dimensions. The first concerns the software engineering lifecycle (requirements, design, development, testing, and operations), tailored to the challenges of self-adaptive robotics. The second focuses on enabling technologies such as digital twins and AI-driven adaptation, which support runtime monitoring, fault detection, and automated decision-making. We identify open challenges, including verifying adaptive behaviours under uncertainty, balancing trade-offs between adaptability, performance, and safety, and integrating self-adaptation frameworks like MAPE-K/MAPLE-K. By consolidating these challenges into a roadmap toward 2030, this work contributes to the foundations of trustworthy and efficient self-adaptive robotic systems capable of meeting the complexities of real-world deployment.
Multi-Source Human-in-the-Loop Digital Twin Testbed for Connected and Autonomous Vehicles in Mixed Traffic Flow
In the emerging mixed traffic environments, Connected and Autonomous Vehicles (CAVs) have to interact with surrounding human-driven vehicles (HDVs). This paper introduces MSH-MCCT (Multi-Source Human-in-the-Loop Mixed Cloud Control Testbed), a novel CAV testbed that captures complex interactions between various CAVs and HDVs. Utilizing the Mixed Digital Twin concept, which combines Mixed Reality with Digital Twin, MSH-MCCT integrates physical, virtual, and mixed platforms, along with multi-source control inputs. Bridged by the mixed platform, MSH-MCCT allows human drivers and CAV algorithms to operate both physical and virtual vehicles within multiple fields of view. Particularly, this testbed facilitates the coexistence and real-time interaction of physical and virtual CAVs and HDVs, significantly enhancing the experimental flexibility and scalability. Experiments on vehicle platooning in mixed traffic showcase the potential of MSH-MCCT to conduct CAV testing with multi-source real human drivers in the loop through driving simulators of diverse fidelity. The videos for the experiments are available at our project website: https://dongjh20.github.io/MSH-MCCT.
SlotVLA: Towards Modeling of Object-Relation Representations in Robotic Manipulation ICRA 2026
Inspired by how humans reason over discrete objects and their relationships, we explore whether compact object-centric and object-relation representations can form a foundation for multitask robotic manipulation. Most existing robotic multitask models rely on dense embeddings that entangle both object and background cues, raising concerns about both efficiency and interpretability. In contrast, we study object-relation-centric representations as a pathway to more structured, efficient, and explainable visuomotor control. Our contributions are two-fold. First, we introduce LIBERO+, a fine-grained benchmark dataset designed to enable and evaluate object-relation reasoning in robotic manipulation. Unlike prior datasets, LIBERO+ provides object-centric annotations that enrich demonstrations with box- and mask-level labels as well as instance-level temporal tracking, supporting compact and interpretable visuomotor representations. Second, we propose SlotVLA, a slot-attention-based framework that captures both objects and their relations for action decoding. It uses a slot-based visual tokenizer to maintain consistent temporal object representations, a relation-centric decoder to produce task-relevant embeddings, and an LLM-driven module that translates these embeddings into executable actions. Experiments on LIBERO+ demonstrate that object-centric slot and object-relation slot representations drastically reduce the number of required visual tokens, while providing competitive generalization. Together, LIBERO+ and SlotVLA provide a compact, interpretable, and effective foundation for advancing object-relation-centric robotic manipulation.
comment: Accepted at ICRA 2026
Learning to Feel the Future: DreamTacVLA for Contact-Rich Manipulation
Vision-Language-Action (VLA) models have shown remarkable generalization by mapping web-scale knowledge to robotic control, yet they remain blind to physical contact. Consequently, they struggle with contact-rich manipulation tasks that require reasoning about force, texture, and slip. While some approaches incorporate low-dimensional tactile signals, they fail to capture the high-resolution dynamics essential for such interactions. To address this limitation, we introduce DreamTacVLA, a framework that grounds VLA models in contact physics by learning to feel the future. Our model adopts a hierarchical perception scheme in which high-resolution tactile images serve as micro-vision inputs coupled with wrist-camera local vision and third-person macro vision. To reconcile these multi-scale sensory streams, we first train a unified policy with a Hierarchical Spatial Alignment (HSA) loss that aligns tactile tokens with their spatial counterparts in the wrist and third-person views. To further deepen the model's understanding of fine-grained contact dynamics, we finetune the system with a tactile world model that predicts future tactile signals. To mitigate tactile data scarcity and the wear-prone nature of tactile sensors, we construct a hybrid large-scale dataset sourced from both high-fidelity digital twin and real-world experiments. By anticipating upcoming tactile states, DreamTacVLA acquires a rich model of contact physics and conditions its actions on both real observations and imagined consequences. Across contact-rich manipulation tasks, it outperforms state-of-the-art VLA baselines, achieving up to 95% success, highlighting the importance of understanding physical contact for robust, touch-aware robotic agents.
Governed Capability Evolution for Embodied Agents: Safe Upgrade, Compatibility Checking, and Runtime Rollback for Embodied Capability Modules
Embodied agents are increasingly expected to improve over time by updating their executable capabilities rather than rewriting the agent itself. Prior work has separately studied modular capability packaging, capability evolution, and runtime governance. However, a key systems problem remains underexplored: once an embodied capability module evolves into a new version, how can the hosting system deploy it safely without breaking policy constraints, execution assumptions, or recovery guarantees? We formulate governed capability evolution as a first-class systems problem for embodied agents. We propose a lifecycle-aware upgrade framework in which every new capability version is treated as a governed deployment candidate rather than an immediately executable replacement. The framework introduces four upgrade compatibility checks -- interface, policy, behavioral, and recovery -- and organizes them into a staged runtime pipeline comprising candidate validation, sandbox evaluation, shadow deployment, gated activation, online monitoring, and rollback. We evaluate over 6 rounds of capability upgrade with 15 random seeds. Naive upgrade achieves 72.9% task success but drives unsafe activation to 60% by the final round; governed upgrade retains comparable success (67.4%) while maintaining zero unsafe activations across all rounds (Wilcoxon p=0.003). Shadow deployment reveals 40% of regressions invisible to sandbox evaluation alone, and rollback succeeds in 79.8% of post-activation drift scenarios.
comment: 46 pages, 3 figures, 10 tables, 7 appendices
Collision-Aware Object-Goal Visual Navigation via Two-Stage Deep Reinforcement Learning
Object-goal visual navigation aims to reach a specific target object using egocentric visual observations. Recent deep reinforcement learning (DRL) approaches have achieved promising success rates but often neglect collisions during evaluation, limiting real-world deployment. To address this issue, this letter introduces a collision-aware evaluation metric, namely collision-free success rate (CF-SR), to explicitly measure navigation performance under collision constraints. In addition, collision-free success weighted by path length (CF-SPL) is adopted to further evaluate navigation efficiency. Furthermore, a two-stage DRL training framework with collision prediction is proposed to improve collision-free navigation performance. In the first stage, a collision prediction module is trained by supervising the agent's collision states during exploration. In the second stage, leveraging the trained collision prediction, the agent learns to navigate toward target objects while avoiding collision. Extensive experiments across multiple navigation models in the AI2-THOR environment demonstrate consistent improvements in both CF-SR and CF-SPL. Real-world experiments further validate the effectiveness and generalization capability of the proposed framework.
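The two metrics named in the abstract above can be sketched as follows, assuming that an episode counts as a collision-free success only when it reaches the target with zero collisions, and that CF-SPL applies the standard SPL weighting (shortest path length divided by the larger of the actual and shortest path lengths) to those episodes. The letter's exact definitions may differ; the field names are hypothetical.

```python
def collision_free_metrics(episodes):
    """Compute (CF-SR, CF-SPL) over a list of episode records.

    Each episode is a dict with hypothetical keys:
      success    : bool, target reached
      collisions : int, number of collisions during the episode
      shortest   : float, shortest-path length from start to target
      path       : float, length of the path the agent actually took
    """
    n = len(episodes)
    cf_successes = 0
    cf_spl_sum = 0.0
    for ep in episodes:
        if ep["success"] and ep["collisions"] == 0:
            cf_successes += 1
            # Standard SPL weighting, applied only to collision-free successes.
            cf_spl_sum += ep["shortest"] / max(ep["path"], ep["shortest"])
    return cf_successes / n, cf_spl_sum / n
```

Under this assumed definition CF-SR can never exceed the ordinary success rate, so the gap between the two indicates how many successes depended on collisions.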
Shepherding UAV Swarm with Action Prediction Based on Movement Constraints
In this study, we propose a new sheepdog-inspired control method for a swarm of small unmanned aerial vehicles (UAVs), which predicts the swarm behavior while explicitly accounting for the motion constraints of real robots. Sheepdog-inspired guidance control refers to a framework in which a small number of navigator agents (sheepdog agents) indirectly drive a large number of autonomous agents (a flock of sheep agents) so as to steer the group toward a target position. In conventional studies on sheepdog-inspired guidance, both types of agents have typically been modeled as point masses, and the guidance law for the navigator agents has been designed using simple interaction vectors based on the instantaneous relative positions between the agents. However, when implementing such methods on real robots such as drones, it is necessary to consider each agent's motion constraints, including upper bounds on velocity and acceleration. Moreover, we argue that guidance can be made more efficient by predicting the future behavior of the autonomous swarm that is observable to the navigator agents. To this end, we propose a three-dimensional guidance control law based on behavior prediction of autonomous agents under motion constraints, inspired by the Dynamic Window Approach (DWA). At each control cycle, the navigator agent generates a set of feasible motion candidates that satisfy its motion constraints, and predicts the short-horizon swarm evolution using an internal model of the autonomous agents maintained within the navigator agent. The motion candidates are then evaluated according to criteria such as the progress velocity toward the target, the positioning strategy with respect to the swarm, and safety margins, and the optimal motion is selected to achieve safe and efficient guidance. Numerical simulation results demonstrate the effectiveness of the proposed guidance control law.
comment: Incomplete results were found in the paper
Atomic-Probe Governance for Skill Updates in Compositional Robot Policies
Skill libraries in deployed robotic systems are continually updated through fine-tuning, fresh demonstrations, or domain adaptation, yet existing typed-composition methods (BLADE, SymSkill, Generative Skill Chaining) treat the library as frozen at test time and do not analyze how composition outcomes change when a skill is replaced. We introduce a paired-sampling cross-version swap protocol on robosuite manipulation tasks to characterize this dimension of compositional skill learning. On a dual-arm peg-in-hole task we discover a dominant-skill effect: one ECM achieves 86.7% atomic success rate while every other ECM is at or below 26.7%, and whether this dominant ECM enters a composition shifts the success rate by up to +50pp. We characterize the boundary on a simpler pick task where all atomic policies saturate at 100% and the effect is undefined. Across three tasks we further find that off-policy behavioral distance metrics fail to identify the dominant ECM, ruling out the natural cheap predictor. We propose an atomic-quality probe and a Hybrid Selector combining per-skill probes (zero per-decision cost) with selective composition revalidation (full cost), and characterize its Pareto frontier on 144 skill-update decisions. On T6 the atomic-only probe sits 23pp below full revalidation (64.6% vs 87.5% oracle match) at zero per-decision cost; a Hybrid Selector with m=10 closes most of that gap to ~12pp at 46% of full-revalidation cost. On the cross-task average over 144 events, atomic-only is within 3pp of full revalidation under a mixed-oracle caveat. The atomic-quality probe is, to our knowledge, the first principled, deployment-ready primitive for skill-update governance in compositional robot policies.
comment: 8 pages main text + appendix; 3 figures, 12 tables
KGLAMP: Knowledge Graph-guided Language model for Adaptive Multi-robot Planning and Replanning
Heterogeneous multi-robot systems are increasingly used in long-horizon missions requiring coordinated planning across diverse capabilities. However, existing planning approaches struggle to construct accurate symbolic representations and maintain plan consistency in dynamic environments. Classical PDDL planners require manually crafted symbolic models, while LLM-based planners often ignore agent heterogeneity and environmental uncertainty. We introduce KGLAMP, a knowledge-graph-guided LLM planning framework for heterogeneous multi-robot teams. The framework maintains a structured knowledge graph encoding object relations, spatial reachability, and robot capabilities, which guides the LLM in generating accurate PDDL problem specifications. The knowledge graph serves as a persistent, dynamically updated memory that incorporates new observations and triggers replanning upon detecting inconsistencies, enabling symbolic plans to adapt to evolving world states. Experiments on the MAT-THOR benchmark show that KGLAMP improves performance by at least 25.3% over both LLM-only and PDDL-based variants.
VOFA: Visual Object Goal Pushing with Force-Adaptive Control for Humanoids
The ability to push large objects in a goal-directed manner using onboard egocentric perception is an essential skill for humanoid robots to perform complex tasks such as material handling in warehouses. To robustly manipulate heavy objects to arbitrary goal configurations, the robot must cope with unknown object mass and ground friction, noisy onboard perception, and actuation errors, all in a real-time feedback loop. Existing solutions either rely on privileged object-state information without onboard perception or lack robustness to variations in goal configurations and object physical properties. In this work, we present VOFA, a visual goal-conditioned humanoid loco-manipulation system capable of pushing objects with unknown physical properties to arbitrary goal positions. VOFA consists of a two-level hierarchical architecture with a high-level visuomotor policy and a low-level force-adaptive whole-body controller. The high-level policy processes noisy onboard observations and generates goal-conditioned commands to operate in closed loop across diverse object-goal configurations, while the low-level whole-body controller provides robustness to variations in object physical properties. VOFA is extensively evaluated in both simulation and real-world experiments on the Booster T1 humanoid robot. Our results demonstrate strong performance, achieving over 90% success in simulation and over 80% success in real-world trials. Moreover, VOFA successfully pushes objects weighing up to 17 kg, exceeding half of the Booster T1's body weight.
A Position Statement on Endovascular Models and Effectiveness Metrics for Mechanical Thrombectomy Navigation, on behalf of the Stakeholder Taskforce for AI-assisted Robotic Thrombectomy (START)
While we are making progress in overcoming infectious diseases and cancer, one of the major medical challenges of the mid-21st century will be the rising prevalence of stroke. Large vessel occlusions are especially debilitating, yet effective treatment (needed within hours to achieve best outcomes) remains limited due to geography. One solution for improving timely access to mechanical thrombectomy in geographically diverse populations is the deployment of robotic surgical systems. Artificial intelligence (AI) assistance may enable the upskilling of operators in this emerging therapeutic delivery approach. Our aim was to establish consensus frameworks for developing and validating AI-assisted robots for thrombectomy. Objectives included standardizing effectiveness metrics and defining reference testbeds across in silico, in vitro, ex vivo, and in vivo environments. To achieve this, we convened experts in neurointervention, robotics, data science, health economics, policy, statistics, and patient advocacy. Consensus was built through an incubator day, a Delphi process, and a final Position Statement. We identified that the four essential testbed environments each had distinct validation roles. Realism requirements vary: simpler testbeds should include realistic vessel anatomy compatible with guidewire and catheter use, while standard testbeds should incorporate deformable vessels. More advanced testbeds should include blood flow, pulsatility, and disease features. There are two macro-classes of effectiveness metrics: one for in silico, in vitro, and ex vivo stages focusing on technical navigation, and another for in vivo stages, focused on clinical outcomes. Patient safety is central to this technology's development. One requisite patient safety task needed now is to correlate in vitro measurements to in vivo complications.
comment: Published in Journal of the American Heart Association
BOIL: Learning Environment Personalized Information
Navigating complex environments poses challenges for multi-agent systems, requiring efficient extraction of insights from limited information. In this paper, we introduce the Blackbox Oracle Information Learning (BOIL) process, a scalable solution for extracting valuable insights from the environment structure. Leveraging the PageRank algorithm and common information maximization, BOIL facilitates the extraction of information to guide long-term agent behavior applicable to problems such as coverage, patrolling, and stochastic reachability. Through experiments, we demonstrate the efficacy of BOIL in generating strategy distributions conducive to improved performance over extended time horizons, surpassing heuristic approaches in complex environments.
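For readers unfamiliar with the PageRank component mentioned in the abstract above, the following is a minimal power-iteration sketch on an environment graph given as an adjacency dictionary. It shows only textbook PageRank, not the BOIL process or its common information maximization step; the damping factor and iteration count are conventional defaults rather than values from the paper.

```python
def pagerank(adj, damping=0.85, iters=100):
    """Textbook PageRank by power iteration.

    adj : dict mapping each node to a list of nodes it links to
          (every linked node must also appear as a key).
    Returns a dict of node -> rank, summing to 1.
    """
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Teleportation term shared by every node.
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = adj[v]
            if out:
                share = damping * rank[v] / len(out)
                for u in out:
                    new[u] += share
            else:
                # Dangling node: spread its mass uniformly over all nodes.
                share = damping * rank[v] / n
                for u in nodes:
                    new[u] += share
        rank = new
    return rank
```

On a graph of locations, the resulting ranks can be read as the long-run visit frequencies of a random walker, which is one plausible reason such a score would be useful for coverage or patrolling.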
Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching
While recent advances in humanoid locomotion have achieved stable walking on varied terrains, capturing the agility and adaptivity of highly dynamic human motions remains an open challenge. In particular, agile parkour in complex environments demands not only low-level robustness, but also human-like motion expressiveness, long-horizon skill composition, and perception-driven decision-making. In this paper, we present Perceptive Humanoid Parkour (PHP), a modular framework that enables humanoid robots to autonomously perform long-horizon, vision-based parkour across challenging obstacle courses. Our approach first leverages motion matching, formulated as nearest-neighbor search in a feature space, to compose retargeted atomic human skills into long-horizon kinematic trajectories. This framework enables the flexible composition and smooth transition of complex skill chains while preserving the elegance and fluidity of dynamic human motions. Next, we train motion-tracking reinforcement learning (RL) expert policies for these composed motions, and distill them into a single depth-based, multi-skill student policy, using a combination of DAgger and RL. Crucially, the combination of perception and skill composition enables autonomous, context-aware decision-making: using only onboard depth sensing and a discrete 2D velocity command, the robot selects and executes whether to step over, climb onto, vault or roll off obstacles of varying geometries and heights. We validate our framework with extensive real-world experiments on a Unitree G1 humanoid robot, demonstrating highly dynamic parkour skills such as climbing tall obstacles up to 1.25 m (96% of robot height), as well as long-horizon multi-obstacle traversal with closed-loop adaptation to real-time obstacle perturbations.
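The abstract above formulates motion matching as nearest-neighbor search in a feature space. A minimal sketch of that idea is shown below, using a brute-force weighted squared distance over a small database of per-frame feature vectors. What the features contain (for example pose, velocity, or future trajectory samples) and how they are weighted are assumptions here, not details from the paper.

```python
def match_motion(query, database, weights=None):
    """Return the index of the database frame closest to the query features.

    query    : list of floats describing the current state and desired motion
    database : list of feature vectors, one per candidate motion frame
    weights  : optional per-dimension weights (hypothetical tuning knob)
    """
    if weights is None:
        weights = [1.0] * len(query)
    best_idx, best_dist = -1, float("inf")
    for idx, feat in enumerate(database):
        # Weighted squared Euclidean distance in feature space.
        dist = sum(w * (q - f) ** 2 for w, q, f in zip(weights, query, feat))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx
```

In a full system the returned index would select the clip and frame to transition to next; larger databases would typically replace the linear scan with a spatial index.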
Separation Assurance between Heterogeneous Fleets of Small Unmanned Aerial Systems via Multi-Agent Reinforcement Learning
In the envisioned future dense urban airspace, multiple companies will operate heterogeneous fleets of small unmanned aerial systems (sUASs), where each fleet includes several homogeneous aircraft with identical policies and configurations, e.g., equipage, sensing, and communication ranges, making tactical deconfliction highly complex for the aircraft. This paper aims to address two core questions: (1) Can tactical deconfliction policies converge or reach an equilibrium to ensure a conflict-free airspace when companies operate heterogeneous fleets of homogeneous aircraft? (2) If so, will the converged policies discriminate against companies operating sUASs with weaker configurations? We investigate a multi-agent reinforcement learning paradigm in which homogeneous aircraft within heterogeneous fleets operate concurrently to perform package delivery missions over Dallas, Texas, USA. An attention-enhanced Proximal Policy Optimization-based Advantage Actor-Critic (PPOA2C) framework is employed to resolve intra- and inter-fleet conflicts, with each fleet independently training its own policy while preserving privacy. Experimental results show that two fleets with distinct, shared PPOA2C policies can reach an equilibrium to maintain safe separation. While two PPOA2C policies outperform two strong rule-based baselines in terms of conflict resolution, a PPOA2C policy exhibits safer interaction with a rule-based policy, indicating adaptive capabilities of PPOA2C policies. Furthermore, we conducted extensive policy-configuration evaluations, which reveal that equilibria between similar policy types tend to favor fleets with stronger configurations. Even under similar configurations but different policy types, the equilibrium favors one of the heterogeneous policies, underscoring the need for fairness-aware conflict management in heterogeneous sUAS operations.
comment: 8 pages, 3 figures, 1 table
Multiagent Systems
Graph-SND: Sparse Aggregation for Behavioral Diversity in Multi-Agent Reinforcement Learning
System Neural Diversity (SND) measures behavioral heterogeneity in multi-agent reinforcement learning by averaging pairwise distances over all $\binom{n}{2}$ agent pairs, making each call quadratic in team size. We introduce Graph-SND, which replaces this complete-graph average with a weighted average over the edges of an arbitrary graph $G$. Three regimes follow: $G=K_n$ recovers SND exactly; a fixed sparse $G$ defines a localized diversity measure at $O(|E|)$ cost; and random edge samples yield an unbiased Horvitz-Thompson estimator and a normalized sample mean with $O(1/\sqrt{m})$ concentration in the sampled edge count $m$. For fixed sparse graphs we prove forwarding-index distortion bounds for expanders and a spectral refinement under low-rank distance structure; for random $d$-regular graphs we prove an unconditional probabilistic $\widetilde{\mathcal{O}}(D_{\max}/\sqrt{n})$ bound. On VMAS we verify recovery, unbiasedness, concentration, and wall-clock scaling, with a PettingZoo TVD panel checking non-Gaussian transfer. In a 500-iteration $n=100$ PPO run, Bernoulli-$0.1$ Graph-SND tracks full SND while reducing per-call metric time by about $10\times$, and frozen-policy GPU timing up to $n=500$ follows the predicted $\binom{n}{2}/|E|$ speedup. Random $d$-regular expanders empirically achieve $\mathrm{SND}_{G}^{\mathrm{u}}/\mathrm{SND} \in [0.9987, 1.0013]$ at $\Theta(n \log n)$ edges. In DiCo diversity control at $n=50$, Bernoulli-$0.1$ Graph-SND preserves set-point tracking with paired reward differences indistinguishable from zero across nine matched cells while cutting per-call metric cost by ${\sim}9.5\times$. Together, these results show that the SND aggregation bottleneck can be removed without changing the metric's semantics, yielding a drop-in sparse alternative that scales beyond complete-graph SND and supports both passive measurement and closed-loop diversity control.
comment: 22 pages, 12 figures, 7 tables
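The three regimes described in the Graph-SND abstract above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the `dist(i, j)` callback, and the toy scalar "policies" are assumptions made for illustration, and a real behavioral distance between agent policies would replace the absolute difference used here.

```python
import itertools
import random


def full_snd(dist, n):
    """Average pairwise distance over all n-choose-2 agent pairs (G = K_n)."""
    pairs = list(itertools.combinations(range(n), 2))
    return sum(dist(i, j) for i, j in pairs) / len(pairs)


def graph_snd(dist, edges, weights=None):
    """Weighted average of pairwise distances over the edges of a graph G."""
    if weights is None:
        weights = [1.0] * len(edges)  # uniform weights (an assumption)
    total_weight = sum(weights)
    return sum(w * dist(i, j) for (i, j), w in zip(edges, weights)) / total_weight


def horvitz_thompson_snd(dist, n, p, rng):
    """Bernoulli-p edge sampling with inverse-probability weighting.

    Each sampled distance is divided by its inclusion probability p, so the
    expectation of the result equals full SND. Only sampled pairs call dist.
    """
    num_pairs = n * (n - 1) // 2
    total = 0.0
    for i, j in itertools.combinations(range(n), 2):
        if rng.random() < p:
            total += dist(i, j) / p
    return total / num_pairs


# Toy example: four agents summarized by a scalar (hypothetical data).
xs = [0.0, 1.0, 3.0, 6.0]
toy_dist = lambda i, j: abs(xs[i] - xs[j])
exact = full_snd(toy_dist, 4)
estimate = horvitz_thompson_snd(toy_dist, 4, 0.5, random.Random(0))
```

With `edges` set to every pair, `graph_snd` reproduces `full_snd`; with a sparse edge list it yields the localized measure at a cost proportional to the number of edges.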
Evolving Idea Graphs with Learnable Edits-and-Commits for Multi-Agent Scientific Ideation
LLM-empowered multi-agent systems offer new potential to accelerate scientific discovery by generating novel research ideas. However, existing methods typically coordinate agents through temporary texts, such as drafts or chat logs; it is difficult to pinpoint the weaknesses in the generated ideas and how the agents refine them. To this end, we introduce Evolving Idea Graphs (EIG), a graph-based multi-agent scientific ideation framework that can generate high-performance research ideas across various benchmark-native metrics, such as novelty, feasibility, and clarity. Instead of coordinating solely through texts, EIG represents a partially formed proposal as an evolving idea graph, where nodes capture scientific claims and edges encode relations (e.g., support and conflict), enabling unresolved weaknesses to remain identifiable throughout the idea evolving process. Specifically, a learned two-head controller operates over the evolving graph to guide the ideation: one head selects graph edits for agents to execute, while the other decides when the graph is ready for commit as final proposal synthesis. On AI Idea Bench 2025 and LiveIdeaBench, EIG outperforms all compared systems on both automatic benchmark scores and blind expert ratings. Ablations further show that explicit graph state provides the main performance gains, and learned edit-and-commit control adds consistent improvements.
Tree-based Credit Assignment for Multi-Agent Memory System
Memory systems are widely adopted to enhance LLMs for long-horizon tasks, and are commonly organized as multi-agent pipelines with memory building, summarizing, and retrieval agents. To empower this system, existing RL-based methods either apply final downstream task rewards (e.g., QA accuracy) for all agents uniformly, which are coarse and ambiguous, or design task-specific rewards for agents on different subtasks, which require costly annotations (e.g., key evidence) and are difficult to define reliably. To address these limitations, we propose Tree-based Credit Assignment for Multi-Agent Memory Systems (TreeMem), which derives agent-specific credit from the final reward without task-specific annotations. Specifically, TreeMem extends the multi-agent pipeline (builder--summarizer--retrieval) into a tree structure, where each agent's outputs are expanded into multiple subsequent branches. The contribution of each agent is estimated via Monte Carlo averaging over its subsequent branches, capturing how intermediate agent actions may influence the final reward. This converts the coarse final reward into agent-specific optimization signals. These signals are then used to update all agent policies simultaneously, helping heterogeneous agents specialize effectively. Experiments on long-horizon benchmarks show that TreeMem improves memory system performance over strong baselines, validating the effectiveness of tree-structured credit assignment for the multi-agent memory system.
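The Monte Carlo averaging step described in the TreeMem abstract above can be sketched as follows. This is an illustrative toy, not the paper's code: the nested-list tree encoding (a leaf is a final reward, an internal node is a list of child branches) and the function names are assumptions.

```python
def leaf_rewards(node):
    """Collect the final rewards at all leaves under a node."""
    if isinstance(node, (int, float)):
        return [float(node)]
    rewards = []
    for child in node:
        rewards.extend(leaf_rewards(child))
    return rewards


def branch_value(node):
    """Monte Carlo estimate: mean final reward over all leaves under the node."""
    rewards = leaf_rewards(node)
    return sum(rewards) / len(rewards)


# Hypothetical two-level tree: two outputs from an upstream agent, each
# expanded into two downstream branches with a final reward of 0 or 1.
tree = [[1.0, 0.0], [1.0, 1.0]]
upstream_values = [branch_value(child) for child in tree]
```

Here the second upstream output receives a higher estimated value than the first, even though only the final reward was observed.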
Bridging Perception and Action: A Lightweight Multimodal Meta-Planner Framework for Robust Earth Observation Agents
Autonomous Earth Observation (EO) agents are transitioning from passive perception to complex, multi-step task execution. However, current architectures that integrate planning and execution within a single model often struggle with combinatorial complexity and reasoning errors in dynamic EO scenarios. To resolve these challenges, we propose the Lightweight Multimodal Meta-Planner (LMMP) framework. LMMP incorporates a dual-awareness mechanism that grounds strategic plans in both multimodal image features and high-level task semantics. Crucially, we introduce a Meta Task Library to inject remote sensing expert knowledge directly into the workflow, which standardizes domain logic and ensures plans are physically feasible. We further implement a two-stage training pipeline, initializing the Meta-Planner via expert-distilled Supervised Fine-Tuning and refining it through Direct Preference Optimization based on execution feedback. Extensive experiments on a dataset derived from EarthBench and ThinkGeo demonstrate that LMMP significantly improves tool-calling accuracy and task success rates. Moreover, the framework exhibits strong "plug-and-play" versatility, consistently enhancing the performance of diverse executor backbones across previously unseen EO missions.
Hierarchical Multiagent Reinforcement Learning for Multi-Group Tax Game
Reinforcement learning has increasingly been used to study economic decision-making, such as taxation, public spending, and labour supply. However, most existing RL-based economic models focus on a single government--household group, thereby overlooking the strategic interactions that arise when multiple governments compete while managing their own populations. In practice, many economic systems (e.g., taxation) exhibit a multi-group structure, where each government must optimize its fiscal policy in response not only to household behaviour within its jurisdiction, but also to the policies of other competing governments. To capture this structure, we formulate taxation as a hierarchical multi-group game. Within each group, the interaction between the government and households is modelled as a leader--follower game; across groups, governments are modelled as players in a competitive game. This results in a hybrid hierarchical game that is difficult to solve using standard multi-agent reinforcement learning algorithms. We therefore propose a bi-level training framework built on multi-agent reinforcement learning, together with Curriculum Learning and a Closed-Loop Sequential Update strategy, to stabilize training and promote convergence. We instantiate this framework in a taxation game simulation environment grounded in classical economic models. The environment supports the evaluation of different taxation algorithms and provides multiple economic indicators for assessing policy performance. Experiments show that our approach can learn stable tax policies that benefit all participating groups. Compared with a two-group baseline without the proposed update mechanisms, our method avoids premature game collapse, extends the effective game duration by 60.92%, produces more sustainable and robust tax policies, and reduces GDP disparities among governments by 44.12%.
SWE-WebDevBench: Evaluating Coding Agent Application Platforms as Virtual Software Agencies
The emergence of "vibe coding" platforms, where users describe applications in natural language and AI agents autonomously generate full-stack software, has created a need for rigorous evaluation beyond code-level benchmarks. In order to assess them as virtual software development agencies on understanding business requirements, making architectural decisions, writing production code, handling iterative modifications, and maintaining business readiness, we introduce SWE-WebDev Bench, a 68-metric evaluation framework spanning 25 primary and 43 diagnostic metrics across seven groups, organized along three dimensions: Interaction Mode (App Creation Request (ACR) vs. App Modification Request (AMR)), Agency Angle (Product Manager (PM), Engineering, Ops), and Complexity Tier (T4 multi-role SaaS, T5 AI-native). Our evaluation (six platforms, three domains, 18 evaluation cells) reveals four recurring shortcomings in the current generation of AI app builders: (1) A specification bottleneck, where platforms compress rich business requirements into oversimplified technical plans, (2) A pervasive frontend-backend decoupling, where visually polished UIs mask absent or broken backend infrastructure, (3) A steep production-readiness cliff, where no platform scores above 60% on engineering quality and post-generation human effort varies substantially across platforms and (4) Widespread security and infrastructure failures, with no platform exceeding 65% Security Score against a 90% target and concurrency handling as low as 6%. These observations are descriptive of our sample and require larger-scale replication to establish generality. We release SWE-WebDev Bench as a community benchmark to enable such replication and help platform builders identify and address these gaps. Code and benchmark resources are available at: https://github.com/snowmountainAi/webdevbench and https://webdevbench.com/.
comment: 35 pages, 12 figures, 18 tables
Autonomous Synchronization of Discrete-Time Heterogeneous Multiagent Systems
This paper investigates the autonomous synchronization problem for discrete-time heterogeneous multiagent systems. The synchronization problem is transformed into the asymptotic decoupling problem of stable modes in a class of discrete-time linear time-varying systems, for which we provide a sufficient condition. Leveraging this condition, synchronization conditions are established. The synchronization conditions are based on the average of the agents' initial dynamic matrices, without requiring the differences among these matrices to be small. This approach reduces the conservativeness of existing conditions and achieves a unification of both homogeneous and heterogeneous systems. Numerical simulation results are provided to support the theoretical findings.
comment: 9 pages, 7 figures, submitted to IEEE Transactions on Control of Network Systems
YOTOnet: Zero-Shot Cross-Domain Fault Diagnosis via Domain-Conditioned Mixture of Experts
Mechanical equipment forms the critical backbone of modern industrial production, yet domain shift severely limits the generalization of deep learning based fault diagnosis models across different equipment and operating conditions. Inspired by the success of foundation models in achieving zero-shot generalization, we propose YOTOnet (You Only Train Once), a novel architecture specifically designed for cross-domain fault diagnosis in mechanical equipment. YOTOnet comprises three core components: (1) a physics-aware Invariant Feature Distiller that extracts domain-agnostic representations using multi-scale dilated convolutions and FFT-based time-frequency fusion, (2) Domain-Conditioned Sparse Experts (DC-MoE) that adaptively route inputs to specialized processors via learned gating without external meta-data, and (3) a dual-head classification system with auxiliary supervision. Extensive validation on five public bearing datasets (CWRU, MFPT, XJTU, OTTAWA, HUST) through 30 cross-dataset protocols demonstrates the superiority of YOTOnet compared with other state-of-the-art methods. Critically, we observe a clear scaling effect: average test F1 improves from 0.5339 (1 training dataset) to 0.705 (4 datasets), with a clear gain when moving from 3 to 4 datasets. These findings provide empirical evidence that foundation model principles can enable robust, train-once deployment for industrial fault diagnosis.
DAO-enabled decentralized physical AI: A new paradigm for human-machine collaboration
We propose DAO-enabled decentralized physical AI (DePAI), a democratic architecture for coordinating humans and autonomous machines in the operation and governance of physical-digital systems. We (1) synthesize foundations in blockchains, decentralized autonomous organizations (DAOs), and cryptoeconomics; (2) connect DAO design with digital-democracy research on deliberation and voting, showing how each can advance the other; (3) position DAO-governed decentralized physical infrastructure networks (DePIN) within a vertically integrated stack that links energy and sensing to connectivity, storage/compute, models, and robots; (4) show how these elements specify workflows that couple machine execution with human oversight, enabling enhanced self-organization of techno-socio-economic systems, which we call DePAI; and (5) analyze risks, including security, centralization, incentive failure, legal exposure, and the crowding-out of intrinsic motivation, and argue for value-sensitive design and continuously adaptive governance. DePAI offers a path to scalable, resilient self-organization that integrates physical infrastructure, AI, and community ownership under transparent rules, on-chain incentives, and permissionless participation, aiming to preserve human autonomy.
FinRAG-12B: A Production-Validated Recipe for Grounded Question Answering in Banking ACL 2026
Large language models (LLMs) are rapidly being adopted across various domains. However, their adoption in the banking industry faces resistance due to demands for high accuracy, regulatory compliance, and the need for verifiable and grounded responses. We present a unified, data-efficient framework for training grounded domain-specific LLMs that optimizes answer quality, citation grounding, and calibrated refusal under real-world deployment constraints. First, we describe a data generation pipeline that combines LLM-as-a-Judge filtering, citation annotation, and curriculum learning with only 143M tokens. The resulting 12B model achieves high answer quality, outperforming GPT-4.1 on citation grounding, with a modest citation tradeoff versus the untuned base. Second, we propose a calibrated refusal mechanism: training on 22% unanswerable examples yields a 12% "I don't know" rate, substantially improving over the base model's unsafe 4.3% rate while avoiding GPT-4.1's over-refusal (20.2%). Third, we present an end-to-end methodology spanning from data curation to quantized serving. The system is deployed at 40+ financial institutions, achieving a 7.1 percentage point improvement in query resolution (p < 0.001). Additionally, the model delivers 3-5x faster responses at 20-50x lower cost compared to GPT-4.1.
comment: 7 pages, ACL 2026 conference
Caesar: Deep Agentic Web Exploration for Creative Answer Synthesis
To advance from passive retrieval to creative discovery of new ideas, autonomous agents must be capable of deep, associative synthesis. However, current agentic frameworks prioritize convergent search, often resulting in derivative summaries that lack creativity. Caesar is an agentic architecture designed to bridge the gap between information gathering and synthesis of new insights. Unlike existing agents that treat the web as a flat sequence of disconnected documents, Caesar performs a deep web traversal to construct a dynamic knowledge graph. This graph then serves as a navigational scaffold, guiding the agent to diverse, non-obvious information that flat retrieval would never encounter. Caesar thus consists of two components: (1) exploration driven by a dynamic context-aware policy that maximizes information coverage across the web's topological structure, and (2) synthesis through adversarial refinement that actively seeks novel perspectives rather than confirming established priors. Caesar demonstrates the ability to generate artifacts and answers characterized by high novelty and structural coherence, achieving 13% to 23% improvement over state-of-the-art deep research agents in creative synthesis challenges, with strong dominance across all output formats.
When Reasoning Models Hurt Behavioral Simulation: A Solver-Sampler Mismatch in Multi-Agent LLM Negotiation
Behavioral simulation and strategic problem solving are different tasks. Large language models are increasingly explored as agents in policy-facing institutional simulations, but stronger reasoning need not improve behavioral sampling. We study this solver-sampler mismatch in three multi-agent negotiation environments: two trading-limits scenarios with different authority structures and a grid-curtailment case in emergency electricity management. Across two primary model families, native reasoning and often no reflection collapse toward authority-heavy outcomes. The sharpest case is DeepSeek native reasoning in the grid-curtailment transfer: it reaches action entropy 1.256 and a concession-arc rate of 0.933, yet still ends in authority decision in 15 of 15 runs. A direct OpenAI extension shows the same pressure at provider breadth: GPT-5.2 native reasoning ends in authority decisions in 45 of 45 runs across the three environments. Budget-matched no-reflection controls and orthogonal private-state controls remain rigid, while the negotiation-structured scaffold condition is the only condition that consistently opens negotiated outcomes. These diagnostics are failure screens within a fixed negotiation grammar, not evidence of external behavioral realism or policy-forecasting validity. The results show that neither more output space nor generic extra private state rescues solver-like sampler failure. For institutional simulation, solver strength and sampler qualification are different objectives: models should be evaluated for the behavioral role they are meant to play, not only for strategic capability.
comment: 12 pages, 7 figures, supplementary material included as ancillary file
When Stress Becomes Signal: Detecting Antifragility-Compatible Regimes in Multi-Agent LLM Systems
Multi-agent LLM systems are increasingly used to solve complex tasks through decomposition, debate, specialization, and ensemble reasoning. However, these systems are usually evaluated in terms of robustness: whether performance is preserved under perturbation. This paper studies a different question: whether semantic stress exposes structured variation that could support future antifragile learning. We introduce CAFE (Cognitive Antifragility Framework for Evaluation), a statistical framework for detecting antifragility-compatible regimes in multi-agent architectures. CAFE models a controlled expected distribution of semantic stressors, reconstructs an architecture-specific observed effective stress distribution from multi-dimensional judge signals, and compares both distributions using a distributional Jensen Gap under a convex stress potential. A positive gap does not imply immediate performance improvement; instead, it indicates a convex-expansive deformation of the observed stress distribution, suggesting that the architecture exposes learnable stress structure. We evaluate CAFE on a banking-risk analysis benchmark with five multi-agent architectures: flat, hierarchical, debate, meta-adaptive, and ensemble. Across all architectures, semantic stress reduces average judged quality by roughly one third. Yet all architectures exhibit positive distributional Jensen Gaps with bootstrap confidence intervals above zero. These results show that immediate quality degradation can coexist with statistically detectable antifragility-compatible stress geometry. CAFE is therefore not an antifragile learner itself, but a measurement layer for identifying when and where antifragility learning may be worth applying.
KGLAMP: Knowledge Graph-guided Language model for Adaptive Multi-robot Planning and Replanning
Heterogeneous multi-robot systems are increasingly used in long-horizon missions requiring coordinated planning across diverse capabilities. However, existing planning approaches struggle to construct accurate symbolic representations and maintain plan consistency in dynamic environments. Classical PDDL planners require manually crafted symbolic models, while LLM-based planners often ignore agent heterogeneity and environmental uncertainty. We introduce KGLAMP, a knowledge-graph-guided LLM planning framework for heterogeneous multi-robot teams. The framework maintains a structured knowledge graph encoding object relations, spatial reachability, and robot capabilities, which guides the LLM in generating accurate PDDL problem specifications. The knowledge graph serves as a persistent, dynamically updated memory that incorporates new observations and triggers replanning upon detecting inconsistencies, enabling symbolic plans to adapt to evolving world states. Experiments on the MAT-THOR benchmark show that KGLAMP improves performance by at least 25.3% over both LLM-only and PDDL-based variants.
Separation Assurance between Heterogeneous Fleets of Small Unmanned Aerial Systems via Multi-Agent Reinforcement Learning
In the envisioned future dense urban airspace, multiple companies will operate heterogeneous fleets of small unmanned aerial systems (sUASs), where each fleet includes several homogeneous aircraft with identical policies and configurations, e.g., equipage, sensing, and communication ranges, making tactical deconfliction highly complex for the aircraft. This paper aims to address two core questions: (1) Can tactical deconfliction policies converge or reach an equilibrium to ensure a conflict-free airspace when companies operate heterogeneous fleets of homogeneous aircraft? (2) If so, will the converged policies discriminate against companies operating sUASs with weaker configurations? We investigate a multi-agent reinforcement learning paradigm in which homogeneous aircraft within heterogeneous fleets operate concurrently to perform package delivery missions over Dallas, Texas, USA. An attention-enhanced Proximal Policy Optimization-based Advantage Actor-Critic (PPOA2C) framework is employed to resolve intra- and inter-fleet conflicts, with each fleet independently training its own policy while preserving privacy. Experimental results show that two fleets with distinct, shared PPOA2C policies can reach an equilibrium to maintain safe separation. While two PPOA2C policies outperform two strong rule-based baselines in terms of conflict resolution, a PPOA2C policy exhibits safer interaction with a rule-based policy, indicating adaptive capabilities of PPOA2C policies. Furthermore, we conducted extensive policy-configuration evaluations, which reveal that equilibria between similar policy types tend to favor fleets with stronger configurations. Even under similar configurations but different policy types, the equilibrium favors one of the heterogeneous policies, underscoring the need for fairness-aware conflict management in heterogeneous sUAS operations.
comment: 8 pages, 3 figures, 1 table
Systems and Control (EESS)
A Closed-Form Dual-Barrier CBF Safety Filter for Holonomic Robots on Incrementally Built Occupancy Grid Maps
We present a dual-barrier control barrier function (CBF) safety filter for real-time, safety-critical velocity control of holonomic robots operating in incrementally built occupancy grid maps. As a robot explores an unknown environment, unmapped regions introduce irreducible uncertainty, since obstacle geometry beyond the explored frontier is unknown, making entry into such regions a source of collision risk, especially with front-facing sensors. To address this, we enforce two constraints: avoidance of mapped obstacles and restriction from unexplored regions. Both constraints are derived analytically from the occupancy grid's signed distance field, yielding a closed-form safety filter that requires only a small linear system solve per cycle. On resource-constrained platforms such as the Raspberry Pi, where SLAM and planning already consume significant compute, the low overhead of the proposed filter preserves resources. An adaptive gain schedule relaxes the frontier constraint in information-rich regions and tightens it in well-mapped areas, improving exploration efficiency while maintaining safety. The filter operates in velocity space as a minimally invasive correction and composes with arbitrary nominal controllers, including learning-based methods. Hardware flight experiments on a PX4-controlled quadrotor demonstrate zero collisions across multiple indoor runs.
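For intuition about the closed-form filtering idea in the abstract above, the single-constraint case can be written down directly. The sketch below is not the paper's dual-barrier filter: it assumes a velocity-controlled (single-integrator) robot, one barrier value `h` with gradient `grad_h` (for example, a signed distance minus a safety margin), and a linear class-K gain `alpha`. All names are hypothetical. It solves min ||u - u_nom||^2 subject to grad_h . u + alpha * h >= 0.

```python
def cbf_filter(u_nom, grad_h, h, alpha=1.0):
    """Minimally invasive velocity correction for one CBF constraint.

    Returns u_nom unchanged when it already satisfies the constraint;
    otherwise projects it onto the constraint boundary along grad_h.
    """
    a_dot_u = grad_h[0] * u_nom[0] + grad_h[1] * u_nom[1]
    residual = a_dot_u + alpha * h          # >= 0 means the command is safe
    norm_sq = grad_h[0] ** 2 + grad_h[1] ** 2
    if residual >= 0.0 or norm_sq == 0.0:
        return u_nom
    lam = -residual / norm_sq               # Lagrange multiplier of the QP
    return (u_nom[0] + lam * grad_h[0], u_nom[1] + lam * grad_h[1])


# Obstacle boundary 0.5 m away along -x; the nominal command drives toward it.
safe_u = cbf_filter(u_nom=(-2.0, 1.0), grad_h=(1.0, 0.0), h=0.5)
```

Only the velocity component along `grad_h` is modified, so motion tangent to the obstacle is preserved. Handling two simultaneously active constraints, as in the paper, additionally requires solving a small linear system, which is not shown here.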
Input-Output Specifications and Dynamic Droop Coefficients: Stability and Performance Conditions for Grid-Forming IBRs
This paper proposes dynamic stability and performance conditions for grid-connected inverter-based resources (IBRs). To this end, we extend the notion of steady-state droop coefficients to dynamic droop coefficients to capture the small-signal dynamics of IBRs and synchronous generators (SGs). Notably, the dynamic droop coefficients can be obtained from input-output data collected at the unit's (e.g., IBR or SG) point of interconnection without requiring prior knowledge of IBR internals or controls structure. To obtain frequency stability conditions, this IBR model is combined with a lightweight dynamic transmission network model that accounts for uncertainty of line dynamics. The resulting stability conditions are highly scalable and, given a few key network parameters, can be verified at the unit level. To make the conditions practical and offer intuitive and illustrative interpretations, we map the frequency stability conditions to bounds on the Bode plot of the dynamic droop coefficient for two broad types of IBR responses. Moreover, our specifications on the dynamic droop coefficient (i) translate basic frequency control ancillary services into verifiable requirements, and (ii) provide insights into the much-debated question of how to certify an IBR as grid-forming (GFM). The results are illustrated using dynamic droop coefficients obtained using detailed simulations of GFM and GFL IBRs as well as SGs.
Minimizing the Expected Cost of Synchronization in Lossless Power Networks
The reliable operation of large-scale electric power networks is increasingly challenging, particularly with the integration of stochastic renewable generation. In this work, we address the problem of minimizing network transients by optimally modifying the underlying network. We formulate the problem in terms of graph Laplacian matrices and show that, under certain assumptions, the problem is convex. We derive a linear matrix inequality whose feasibility guarantees the existence and uniqueness of phase cohesive steady-state angles; this condition can be directly incorporated as a convex constraint in the optimization framework and we provide several geometric interpretations of the optimization problem. The proposed method is validated on the IEEE 30-bus test system, where results demonstrate that our approach effectively identifies critical links on the network. Dynamic simulations show a significant reduction in network transients and overall improvements across several performance metrics. We explore the sparsity-optimality trade-off using a reweighted $\ell_1$ heuristic.
Look Once, Beam Twice: Camera-Primed Real-Time Double-Directional mmWave Beam Management for Vehicular Connectivity
Millimeter-wave (mmWave) frequencies promise multi-gigabit connectivity for vehicle-to-everything (V2X) networks, but face challenges in terms of severe path loss and mobility-related beam misalignment. Reliable V2X connectivity requires fast, double-directional beam alignment. However, existing methods suffer from high training overhead and limited generalization to unseen scenarios. This paper presents VIsion-based BEamforming (VIBE), a hybrid model-based, closed-loop, learning architecture for real-time double-directional mmWave beam management primed by camera sensing. VIBE fuses machine learning, model-based reasoning, and closed-loop RF feedback to balance beam-pair establishment latency with link quality. VIBE bypasses exhaustive training overhead and accelerates link establishment by leveraging camera observations to reduce the beam-search space. Lightweight beam refinement and offset tracking mechanisms adaptively refine beams in response to dynamic application requirements. VIBE is implemented and evaluated across online indoor/outdoor testbeds, public datasets, and real-time vehicular experiments, demonstrating strong generalization capabilities, making it suitable for real-time V2X communication. Comparisons with 5G NR hierarchical beamforming show that VIBE consistently maintains lower outage rates. Furthermore, VIBE outperforms state-of-the-art end-to-end ML models for beam selection when evaluated on public datasets and achieves outage rates as low as 1.1-1.4%. The results show that a hybrid model-based, closed-loop learning architecture is better suited for real-world mmWave vehicular connectivity than end-to-end trained ML models. For reproducibility, we publish our code to https://github.com/UNL-CPN-Lab/Look-Once-Beam-Twice.
comment: Accepted to the 2026 IEEE International Conference on Sensing, Communication, and Networking (IEEE SECON 2026). Code and models available at: https://github.com/UNL-CPN-Lab/Look-Once-Beam-Twice
Kinematic Discriminants of Deceleration Behavior Modes in Car-Following: Evidence from NGSIM Trajectory Data
Gap-closing rate and visual looming swap discriminative dominance depending on deceleration intensity - a finding that reconciles a long-standing conflict in the car-following literature and challenges spacing-centered assumptions in traditional driver behavior models. This study presents a two-stage analytical framework that distinguishes between information availability (kinematic variables measurable in the environment) and information utilization (variables that demonstrably separate driver behavioral patterns), applied to 1,060,119 valid car-following observations from the NGSIM trajectory dataset (2,932 vehicles). Six kinematic features are extracted, and deceleration events are detected under two threshold conditions (-0.5 m/s^2 and -0.3 m/s^2). K-means clustering identifies behavioral modes, and one-way ANOVA with eta-squared effect sizes ranks each feature's discriminative power. Three key findings emerge: (1) threshold selection fundamentally shapes behavioral inference - the stricter threshold yields three interpretable modes while the permissive threshold collapses these to two; (2) hard braking prioritizes gap-closing rate (eta^2 = 0.715) while moderate braking emphasizes visual looming (eta^2 = 0.574); and (3) spacing headway is negligible (eta^2 <= 0.014) across both thresholds. These findings provide empirically grounded candidates for perceptual cue prioritization and have direct implications for ADAS warning system design and autonomous vehicle control.
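The eta-squared effect size used above to rank features is the ratio of between-group to total sum of squares in a one-way ANOVA. A minimal sketch follows, assuming cluster labels are already available; the function name and the toy values are hypothetical and are not taken from the NGSIM data.

```python
from statistics import fmean

def eta_squared(groups):
    """One-way ANOVA effect size: SS_between / SS_total."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total if ss_total > 0 else 0.0

# Hypothetical feature values for three behavioral modes (illustration only).
hard = [-3.1, -2.8, -3.4]
moderate = [-1.0, -1.2, -0.9]
mild = [-0.2, -0.1, -0.3]
eta2 = eta_squared([hard, moderate, mild])
```

A value near 1 means the feature separates the clusters almost completely, while a value near 0 (as reported for spacing headway) means the cluster means barely differ relative to the overall spread.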
Quantized Probabilistic AI for Gear Fault Diagnosis in Motor Drives
Deploying large artificial intelligence (AI) models in power electronics often demands high computational resources. Driven by the quantization paradigm, this digest proposes a quantization-aware training (QAT) principle to substantially minimize the number of bits required and simultaneously maximize the accuracy of computations in pre-trained AI models. Considering a pre-trained probabilistic Bayesian Neural Network (BNN) for gear fault diagnosis in motor drives as an example, we quantize its weights and activation functions from floating-point FP32 to low-precision INT8 values, which enhances the computational efficiency by a significant margin of 30-45% (for different model versions) without any compromise in the accuracy and uncertainty estimates. This substantiates a sustainable mechanism of deploying most quantized light-weight AI models into low-cost edge processors for power electronic applications.
comment: This manuscript has been accepted for publication in 2026 International Power Electronics Conference, IPEC-Nagasaki 2026 -ECCE Asia-
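The FP32-to-INT8 mapping mentioned in the abstract above can be illustrated with a quantize-dequantize step of the kind that quantization-aware training commonly places in the forward pass. The sketch below assumes symmetric per-tensor scaling, which the abstract does not specify; the function names and weight values are hypothetical, and no training loop or Bayesian layer is modeled.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to the INT8 range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the INT8 range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map integer levels back to approximate floating-point values."""
    return [qi * scale for qi in q]

weights = [0.42, -1.27, 0.003, 0.9]  # hypothetical FP32 weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Under this scheme the round-trip error of each weight is bounded by half the scale, which is why small weights such as 0.003 can collapse to zero.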
Unlocking Embodied Probabilistic Computational Features in Motor Drives
Artificial intelligence (AI)-driven fault diagnosis in motor drives often requires significant computational efforts and time for re-training, in addition to the limited knowledge behind the model and suitability of training and learning mechanisms. This work bridges this gap by proposing a structured mechanism of transforming untapped labeled fault data into AI parameters to leverage probabilistic data-driven learning. This novel AI reservoir modeling framework for power electronics not only eliminates exogenous efforts behind learning data patterns and its optimization, but also provides intuitive guidelines for power electronics engineers behind sizing of AI models. This alignment between data and system physics makes the proposed model transparent and interpretable, bridging practical understanding with data-driven learning. Experimental data demonstrate its computational efficiency and show that structured, physics-aware reservoirs achieve higher diagnostic accuracy and clearer explanations than conventional black-box AI methods.
comment: This manuscript has been accepted for publication in 2026 International Power Electronics Conference, IPEC-Nagasaki 2026 -ECCE Asia-
Adaptive Contention-based Random Access for Uplink Reporting in 3GPP Ambient IoT Networks
Ambient Internet of Things (A-IoT) targets energy harvesting (EH), battery-less devices as a simple connectivity solution for extensive ultra-low-power deployments. These devices typically face intermittent energy availability, making uplink reports increasingly susceptible to access collisions and energy outages. In this paper, we build upon the cellular standardization of A-IoT and examine the paging-triggered contention-based random access (CBRA) framework for uplink reporting. We analyze the effects of energy availability and collisions on these systems and introduce an EH-aware access control mechanism. In this mechanism, the reader broadcasts an access probability in the paging message, which helps regulate the number of devices attempting random access. Results show that, unlike the baselines, the proposed method scales well under dense deployments by keeping collisions nearly constant, improving access efficiency, and substantially reducing the number of paging rounds required for successful reporting. These results highlight the importance of lightweight reader-side access control for reliable and resource-efficient reporting in A-IoT environments.
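The effect of a broadcast access probability can be illustrated with a textbook single-slot contention model, which is an assumption here and not the paper's CBRA or energy-harvesting model. If each of n devices transmits independently with probability p, exactly one transmits with probability n p (1 - p)^(n - 1), which is maximized at p = 1/n.

```python
def success_probability(n_devices, p):
    """Probability that exactly one of n_devices transmits in a slot."""
    return n_devices * p * (1 - p) ** (n_devices - 1)

n = 50                                            # hypothetical device count
fixed = success_probability(n, 0.2)               # fixed access probability
adaptive = success_probability(n, 1 / n)          # probability scaled to load
```

With a fixed probability the slot is almost always lost to collisions as the deployment grows denser, whereas scaling the probability with the number of contenders keeps the per-slot success rate roughly constant (it approaches 1/e for large n).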
Fast Full-Wave Simulation of Indoor RSS Maps for Pre-Measurement Validation in Device-Free Localization
Human localization is gaining momentum in security, healthcare, logistics, and smart spaces applications. While global navigation systems are unreliable indoors, device-free (a.k.a. passive) localization methods that exploit human-induced perturbations of radio propagation can be effectively used. This paper investigates the use of a compact full-wave electromagnetic (EM) setup as a fast and reliable tool to simulate indoor Wi-Fi propagation for human sensing. The goal is to provide a practical baseline for validating simplified propagation models, such as diffraction-based descriptions, and to reduce the need for costly measurement campaigns. Two-dimensional attenuation maps from received signal strength are generated and compared in controlled environments, focusing on attenuation statistics and interference patterns. The simulations reproduce the main spatial features, though discrepancies remain due to simplified material characterization. Diffraction-aware refinements are proposed to mitigate these effects. Overall, the approach provides an efficient pre-measurement reference to support device-free system design and to guide experimental planning.
423.7 + 426.5 Tb/s GMI Bi-Directional HCF Transmission
We demonstrate OESCL-band same-wavelength bi-directional transmission over 60 km HCF with 42.5 THz bandwidth, achieving GMIs comparable with the highest unidirectional SMF data-rates in both directions, with an aggregate of 423.7 + 426.5 Tb/s.
comment: 4 pages, 5 figures, submitted to ECOC 2026
ADMM-based decomposed DNN+RLT Relaxations for Completely Positive Models in Electricity Market Clearing
The day-ahead electricity market clearing with nonconvex order types can be formulated as a mixed-integer linear program (MILP), but its LP relaxation may provide weak bounds, and exact solutions can become computationally intractable in large-scale or extended market settings. We study a welfare-maximizing clearing model with elementary hourly orders, block orders with logical acceptance constraints, and flexible hourly orders. Starting from a compact MILP formulation, we derive an equivalent completely positive programming (CPP) reformulation via matrix lifting and propose relaxed CPP variants that further reduce the modeling burden while maintaining strong bounds. We then develop tractable doubly nonnegative (DNN) relaxations, including decomposed formulations that exploit the problem structure by using smaller positive semidefinite matrices. To further strengthen these bounds, we introduce reformulation-linearization technique (RLT) inequalities tailored to the decomposed structure. To tackle the challenge of large-scale DNNs, we design an alternating direction method of multipliers (ADMM) with adaptive penalty updates and rigorous dual lower bounds, enabling certified early termination. Computational experiments on synthetic instances show that the proposed DNN+RLT relaxations substantially tighten LP bounds, while decomposition and first-order methods significantly reduce computational effort.
Toward less conservative distributed stability analysis of power systems via matrix-valued differential passivity indices
Passivity indices have been widely adopted to derive distributed stability certificates for power systems. Nevertheless, conventional passivity indices remain scalar-valued even for multi-input-multi-output (MIMO) systems, which can introduce excessive conservatism and compromise analysis accuracy. To overcome these limitations, this paper extends the differential passivity index to a matrix-valued formulation that captures both channel-wise passivity properties and inter-channel coupling effects in MIMO subsystems. On this basis, semi-distributed and fully distributed stability criteria are developed for power systems with heterogeneous nonlinear devices. It is shown that system stability is guaranteed when the aggregate passivity excess of devices compensates for the passivity shortage imposed by the network. Furthermore, analytical passivity matrix expressions for typical power system components are derived, facilitating compositional stability analysis. Case studies on a three-bus system and a modified IEEE 118-bus system validate the effectiveness of the proposed framework.
comment: 18 pages, 9 figures
Negative Imaginary and Passivity Properties of Synchronous Machine Systems
The recent rapid proliferation of renewable energy is fundamentally changing the dynamic operations of power systems, necessitating new approaches to assess stability for these highly nonlinear systems. In this paper, we prove that synchronous machine systems, modeled in the nonlinear dq-frame, possess fundamental dissipativity properties. Specifically, we show passivity from current input to voltage output and a nonlinear negative imaginary property from torque input to rotor angle output. For the nonlinear system shifted around an equilibrium point, we derive explicit conditions for both passivity and the NI property to hold. Finally, we demonstrate that interconnection with passive droop controllers preserves these dissipativity properties with identical supply rates, thereby ensuring closed-loop stability.
Equilibrium points and stability of synchronous machine systems
This paper investigates equilibrium points and stability in two synchronous machine configurations: (i) a single generator with an impedance load and (ii) two interconnected machines with co-located loads. We consider both abc and dq reference frames to show that the equilibrium condition reduces to a cubic polynomial in the single-machine case and to an 18th-degree polynomial in the two-machine case. For the single-machine system, Lyapunov stability analysis and linearization-based stability analysis are carried out. For the two-machine system, local stability is assessed through linearization and eigenvalue analysis. Illustrative examples confirm the existence of multiple equilibria and illustrate the impact of parameter variation on stability. Our results provide insight into the stability of synchronous machine systems.
From open-loop representations to closed-loop feedback implementations in differential games: A numerical case study
Solutions to pursuit-evasion and surveillance-evasion differential games are typically computed and expressed using open-loop representations, with the synthesis of feedback strategies significantly less common. We propose a numerical scheme for obtaining feedback strategies for the recently introduced prying-pedestrian surveillance-evasion differential game. The scheme involves computing feedback strategies as input-output maps approximated via neural networks trained using data obtained from open-loop representations of solutions. Simulations show the effectiveness of neural networks trained with an appropriate learning-loss function. Since optimal feedback strategies are discontinuous, as a second contribution, the potential loss/gain of individual players is subsequently studied for players using sample-and-hold feedback compared to continuous-time feedback.
Sequential Monte Carlo for Resilient Networks: Assessment, Mitigation, and Generative Modeling
Resilience is becoming crucial for future wireless networks, which must withstand, adapt to, and recover from rare but potentially cascading disruptions. This paper develops a sequential Monte Carlo (SMC) simulation framework for such systems, in which resilience failures are formulated as path-dependent rare events arising from staged degradation and delayed recovery, and are decomposed into semantically interpretable levels defined by a reaction coordinate. Building on this structure, we present a fixed-level splitting approach with budget-aware population control, enabling efficient estimation of rare non-recovery probabilities. We discuss the potential reuse of SMC checkpoints as representative near-critical states for policy evaluation and simulation-based selection. We further extend the methodology to learned stochastic simulation by using generative sequence models as restartable surrogates within data-driven digital twins. We showcase the framework in a delay-critical wireless network use case, where SMC substantially improves over standard Monte Carlo in rare-event regimes with both physical and learned simulators.
VC-FeS: Viewpoint-Conditioned Feature Selection for Vehicle Re-identification in Thermal Vision
Identification of less-articulated objects using single-channel images, such as thermal images, is important in many applications, such as surveillance. However, in this domain, existing methods show poor performance due to high similarity among objects of the same category in the absence of color information (overlooking shape information) and de-emphasized texture information. Furthermore, variability in viewpoint adds more complexity as the features vary from side to side. We address these issues by constructing viewpoint-conditioned feature vectors and area-specific feature comparisons in separate feature spaces. These interventions enable leveraging the advancements of existing RGB-pre-trained ViT feature extractors while effectively adapting them to address the challenges specific to the thermal domain. We test our system with RGBNT100 (IR) vehicle dataset and a thermal maritime dataset acquired by us. Our results surpass the state-of-the-art methods by 19.7% and 12.8% for the above datasets in mAP scores, respectively. We also plan to make our thermal dataset available, the first of its kind for maritime vessel identification.
ELVIS: Ensemble-Calibrated Latent Imagination for Long-Horizon Visual MPC
A central challenge of visual control with model-based reinforcement learning (RL) is reliable long-horizon planning: long rollouts with learned latent dynamics exhibit branching futures and multi-modal action-value distributions. In addition, compounding model errors amplified by visual occlusions make deep imagination brittle. We present ELVIS, a latent model predictive controller (MPC) designed to make long-horizon planning practical. ELVIS plans in a Dreamer-style recurrent state space model (RSSM) and replaces standard unimodal model predictive path integral (MPPI) with a Gaussian-mixture MPPI that maintains multiple coherent hypotheses over long horizons, avoiding mode averaging under branching rollouts. In parallel, ELVIS stabilizes deep imagination with a shared uncertainty-aware lambda-return: an ensemble of latent critics defines an upper-confidence-bound (UCB) score that gates a time-varying lambda, adaptively trading off bootstrapping versus look-ahead to limit compounding error during planning. The same return is used both to train an actor-critic prior from imagined rollouts and to score candidate trajectories inside GMM-MPPI, aligning RL objectives with the planner's long-horizon optimization. On fourteen DeepMind Control Suite visual tasks, ELVIS establishes state-of-the-art performance compared with TD-MPC2 and DreamerV3. Finally, ELVIS transfers zero-shot to a real-world sand-spraying task with severe occlusions, improving surface-quality metrics and demonstrating robustness beyond simulation.
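The role of a time-varying lambda can be seen in the standard lambda-return recursion with a per-step coefficient. The sketch below is a generic formulation and not the authors' implementation: the abstract does not state how the UCB score is mapped to lambda, so the `lambdas` list is simply taken as given, and all names and numbers are hypothetical.

```python
def lambda_return(rewards, values, lambdas, gamma=0.99):
    """Backward recursion G_t = r_t + gamma * ((1 - l_t) * V_{t+1} + l_t * G_{t+1}).

    rewards and lambdas have length H; values has length H + 1 and its
    last entry is the bootstrap value at the end of the rollout.
    """
    g = values[-1]
    returns = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * ((1 - lambdas[t]) * values[t + 1] + lambdas[t] * g)
        returns[t] = g
    return returns

rewards = [1.0, 1.0, 1.0]
values = [0.0, 0.0, 0.0, 10.0]
# lambda = 1 trusts the imagined rollout; lambda = 0 bootstraps after one step.
look_ahead = lambda_return(rewards, values, [1.0, 1.0, 1.0], gamma=0.5)
bootstrap = lambda_return(rewards, values, [0.0, 0.0, 0.0], gamma=0.5)
```

Lowering lambda at steps where the critics disagree shortens the effective look-ahead at those steps, which is one way to limit the influence of compounding model error.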
Towards Lag Consensus with Noisy Digital Twins Perception in Second-order Multi-agent Cyber-physical Systems
In this paper, we study second-order lag consensus in multi-agent cyber-physical networks subject to random noise and input failures, within a framework modeling the interactions and perceptions between physical twins and digital twins. We propose a lag consensus protocol and establish sufficient conditions for the mean-square (exponential) stability of the resulting stochastic lag error dynamics. The consensus criteria are derived via Lyapunov analysis using the Itô formula, ensuring robustness to random perturbations and intermittent input failures. Numerical examples illustrate the effectiveness of the proposed method.
comment: accepted by IFAC WC 26
Adaptive MPC for Constrained Trajectory Tracking of Uncertain LTI System with Input-Rate Limits
This paper addresses the trajectory-tracking problem for discrete-time linear time-invariant systems with bounded parametric uncertainty, subject to hard constraints on system states, control inputs, and input rates. Unlike existing methods, which often consider only partial uncertainty, omit input-rate or state constraints, or focus on regulation problems, this work provides a systematic adaptive model predictive control (MPC) solution for constrained trajectory tracking under full parametric uncertainty. Determining the control input required to achieve zero tracking error under unknown parameters is challenging. Simultaneously, trajectory tracking under uncertainty with input-rate constraints induces temporal coupling in the control sequence, resulting in a time-varying admissible control set and rendering standard recursive feasibility arguments inapplicable. These challenges are overcome by systematically utilizing the estimated system parameters, coupled with a suitably designed adaptive learning process within a reformulated MPC framework. The recursive feasibility of the proposed MPC optimization routine is then rigorously established despite the time-varying admissible control set induced by input-rate constraints. Closed-loop stability is guaranteed via Lyapunov-based analysis, ensuring convergence of the tracking error and boundedness of system states. Simulation results validate the effectiveness of the pr
Counter-Dyna: Data-Efficient RL-Based HVAC Control using Counterfactual Building Models
Model-based reinforcement learning (MBRL) offers a promising approach for data-efficient energy management in buildings, combining the strengths of predictive modeling and reinforcement learning. While previous MBRL methods applied to HVAC control have reduced training data requirements, they still require several months of interaction with the building to learn a satisfactory control policy. A key reason is that existing surrogate models attempt to predict the entire state-space, including weather and electricity prices that are unaffected by control actions, or completely ignore these variables. Addressing these issues, we propose Counter-Dyna, a method that enhances the data-efficiency of Dyna, an MBRL method. We create data-efficient counterfactual surrogate models (CSM) by leveraging invariances in the state-space. Using a CSM in Dyna speeds up RL training measured in environment interaction data compared to previous results. In comparison with the previous state of the art, which used 6-12 months of environment interactions, our method needs only 5 weeks. We evaluate our method in a large simulation study using the literature-standard BOPTEST framework and proximal policy optimization (PPO) as the RL algorithm. Our results show cost-saving potentials of 5.3% to 17.0% in a hypothetical deployment scenario. Our work is a significant step towards making real-world deployment of RL algorithms in HVAC control practically viable.
Tightly-Coupled Estimation and Guidance for Robust Low-Thrust Rendezvous via Adaptive Homotopy
Minimum-fuel low-thrust rendezvous guidance yields bang-bang control structures highly sensitive to estimation errors, sensor anomalies, and solver regularization, making aggressive closed-loop execution brittle for uncooperative proximity operations. This paper proposes a tightly-coupled estimation and guidance architecture where navigation confidence directly modulates the homotopy parameter of a receding-horizon indirect optimal control solver. Relative motion is modeled in the Clohessy-Wiltshire frame. The translational state is estimated via a linear Kalman filter augmented by a Multiple Tuning Factors (MTF) covariance inflation mechanism that suppresses suspicious innovation directions. A composite score from the normalized innovation and MTF activity is mapped online to the homotopy parameter, allowing the controller to relax toward a smoother, conservative regime when confidence degrades, and recover fuel-efficient bang-bang control as sensing improves. Numerical results under severe measurement degradation show fixed bang-bang guidance remains brittle; both plain-KF and MTF-KF fixed-epsilon controllers yield large terminal miss distances. Conversely, the proposed MTF-adaptive homotopy controller reduces terminal miss by roughly two orders of magnitude, from hundreds of meters to sub-meter levels, requiring only a moderate increase in control effort versus the open-loop fuel-optimal benchmark. A comparison indicates adaptive homotopy is the dominant robustness mechanism, while MTF provides additional accuracy and efficiency improvements. The receding-horizon implementation exhibits consistently fast and reliable solution times, supporting the practical online viability of the proposed method.
Queue-Aware and Resilient Routing in LEO Satellite Networks Using Multi-Agent Reinforcement Learning
The rapid growth in data demand and the stringent latency requirements of modern applications have driven significant interest in Low Earth Orbit (LEO) satellite constellations as an emerging solution for global Internet coverage. However, routing in LEO networks remains a fundamental challenge due to highly dynamic topologies, time-varying traffic conditions, and susceptibility to link failures. Conventional routing algorithms typically assume static link metrics and fail to account for queue backlogs or real-time system variations, making them less effective in such environments. We propose a queue-aware multi-agent deep reinforcement learning (MA-DRL) framework for routing in LEO satellite networks. Each satellite is modeled as an independent agent responsible for making local routing decisions, enabling a distributed and scalable solution. The proposed framework formulates a latency-aware optimization problem that incorporates background traffic, queue dynamics at each satellite, and a resilience score to improve robustness. We evaluate the proposed approach against the state-action-reward-state-action (SARSA) and Dijkstra algorithms. While Dijkstra achieves the lowest end-to-end latency under ideal conditions, its computational and signaling overhead becomes a significant bottleneck as the network scales. In contrast, our proposed approach incurs significantly lower overhead (approximately 50% of Dijkstra at a 5 s recalculation interval), scales efficiently with network size, and effectively manages queue backlogs and resilience under increasing traffic load, demonstrating enhanced robustness and scalability in LEO satellite networks while maintaining competitive latency and resilience scores.
Optimize-at-Capture: Highly-adaptive Exposure Controlling for In-Vehicle Non-contact Heart-rate Monitoring
Remote photoplethysmography (rPPG) holds great promise for continuous heart-rate monitoring of drivers in intelligent vehicles. However, its performance is severely degraded by highly dynamic illumination changes. A critical yet overlooked factor is the lack of exposure control during video acquisition -- most existing systems rely on either fixed exposure settings or the camera's built-in auto-exposure, both of which fail to maintain stable facial brightness under rapidly changing lighting conditions during driving. To address this gap, we propose a highly-adaptive exposure controlling framework that proactively adjusts exposure parameters based on predictive modeling of historical skin reflections. Unlike standard auto-exposure, our method is specifically optimized for rPPG measurement, ensuring the skin region of interest (ROI) remains within the optimal dynamic range for rPPG signal extraction. As an important contribution of this study, we introduce ExpDrive, a public in-vehicle physiological monitoring dataset comprising synchronized facial video and reference ECG from 48 subjects captured under real driving conditions. Extensive experiments demonstrate that our method consistently outperforms fixed exposure and standard auto-exposure strategies. Specifically, it reduces the Mean Absolute Error (MAE) by 6.31 bpm (from 14.1 to 7.79 bpm) and significantly increases the success rate by 32.3 percentage points (p < 0.001) (from 24.9% to 57.2%) across challenging driving scenarios. Notably, it clearly improves the performance of non-contact heart-rate monitoring in both low-light (rainy) and high-glare (sunny) conditions, validating the efficacy of exposure-aware acquisition design.
Experiment-as-Code Labs: A Declarative Stack for AI-Driven Scientific Discovery
To unleash the full potential of AI for Science, we must untether the agents from a purely digital environment. The agent's ability to control and explore in real-world labs is essential because the physical lab remains foundational to scientific discovery. While some tasks can be performed on a computer (e.g., data analysis, running simulated experiments), Eureka moments could occur at any time while operating lab instruments (e.g., when a scientist notices unexpected clues, intuition may prompt a real-time course change). Although autonomous labs, which expose programmable APIs to control scientific instruments via software, are on the rise, bridging the gap between increasingly powerful AI agents and automated lab equipment requires innovation that draws insights from computer systems. We propose a new paradigm called ``Experiment-as-Code (EaC) Labs,'' where a core concept is to encode experiments as declarative configurations that can be compiled down to device-level APIs. AI agents come up with hypotheses and experiments, written as an ensemble of declarative configurations. The systems layer performs program analysis, safety checks, resource assignment, and job orchestration. Finally, programmatic experimentation occurs via actuating the device APIs. This is a general stack that is science-, lab-, and instrument-independent, representing a novel synthesis across the physical, systems, and intelligence layers to unleash the next breakthrough in AI for Science.
comment: Experiment-as-Code (EaC) white paper
Worst-Case Discovery and Runtime Protection for RL-Based Network Controllers
RL-based controllers achieve strong average-case performance in networking tasks such as congestion control and adaptive bitrate streaming. Yet their performance can degrade severely under network conditions where strong performance is still achievable. Identifying such conditions and quantifying the resulting performance gap is intractable by enumeration, while the sequential and closed-loop nature of RL controllers makes formal verification methods impractical. We present ReGuard, a framework that discovers worst-case scenarios for a given RL controller and protects it against them at inference time without retraining. Discovery is formulated as a bilevel regret-maximization problem, which yields a certified lower bound on the worst-case performance gap. The discovered trajectories are then analyzed as counterfactuals and compiled into lightweight logic rules that intervene only when a risky state is detected, leaving the controller's behavior unchanged otherwise. We evaluate ReGuard across three RL-based network controllers: Pensieve, Sage, and Park. ReGuard discovers scenarios in which the controller's performance is 43$-$64% worse than what is achievable. ReGuard not only discovers gaps 57% to 6$\times$ larger than those found by the strongest baselines but also shrinks them by 79$-$85% via lightweight rule-based protection while preserving nominal performance. ReGuard's protection extends beyond the scenarios it discovers, improving performance across a wider range of network conditions.
comment: 23 pages, 12 figures, 4 tables
Online Nonstochastic Prediction: Logarithmic Regret via Predictive Online Least Squares
We study online prediction for marginally stable, partially observed linear dynamical systems under nonstochastic disturbances. Our objective is to minimize the cumulative squared prediction loss and compete with the best-in-hindsight Luenberger predictor. Standard online learning methods typically rely on bounded domains/gradients, and thus their guarantees may fail to deal with potentially unbounded trajectories in marginally stable systems. In this paper, we introduce an unconstrained online least squares method that stabilizes the learning process via tailored predictive hints. With model knowledge, we prove that hints constructed from any stabilizing Luenberger predictor render the hint residuals uniformly bounded, achieving logarithmic regret despite unbounded trajectory growth. We also discuss model-free prediction and introduce a simple universal hint for symmetric systems, under which logarithmic regret is maintained without model knowledge. Our results provide an adaptive, instance-wise optimal online predictor compared to classical fixed-gain observers under nonstochastic disturbances.
A Computationally Tractable Path-Planning Method for Airborne Wind Energy Systems
Airborne Wind Energy Systems (AWES) have emerged as a promising renewable energy technology that exploits stronger, more consistent high-altitude winds via tethered airborne devices. Among the various concepts, crosswind systems, where efficient flight control is essential to maximise energy output, offer significant potential. This paper addresses the problem of reference selection for crosswind flight control, focusing on the design of power-maximising geometric flight paths for the reel-out phase of Groundgen systems. To overcome the computational challenges associated with optimal control approaches, a computationally tractable framework is proposed in which a path-planning problem is formulated as a nonlinear program. The method optimises the parameters of a Lissajous curve to maximise the average power production over the reel-out phase, while incorporating curvature constraints. The proposed approach provides an efficient alternative to existing optimal control and learning-based methods.
comment: This paper is based on a chapter of the PhD thesis of the first author, with minor revisions and extensions
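The curvature constraint on a Lissajous path mentioned in the abstract above can be checked numerically. The sketch below assumes a planar figure-eight x = a sin(t), y = b sin(2t); the paper's actual parametrization and coordinates are not given in the abstract, so the curve, function names, and parameter values are all hypothetical.

```python
import math

def curvature(t, a, b):
    """Curvature of the planar curve x = a*sin(t), y = b*sin(2t)."""
    dx, dy = a * math.cos(t), 2 * b * math.cos(2 * t)
    ddx, ddy = -a * math.sin(t), -4 * b * math.sin(2 * t)
    speed_sq = dx * dx + dy * dy  # never zero when a and b are nonzero
    return abs(dx * ddy - dy * ddx) / speed_sq ** 1.5

def max_curvature(a, b, samples=2000):
    """Largest curvature over one period, sampled uniformly in t."""
    return max(curvature(2 * math.pi * k / samples, a, b) for k in range(samples))

def is_feasible(a, b, kappa_max):
    """True if the path respects a maximum-curvature limit."""
    return max_curvature(a, b) <= kappa_max

tight = max_curvature(1.0, 0.5)
wide = max_curvature(2.0, 1.0)
```

A nonlinear program over (a, b) could use `is_feasible` as its constraint; scaling the whole curve up by a factor s divides the curvature by s, so wider figure-eights are easier to fly but sweep a larger region.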
Stability of Control Lyapunov Function Guided Reinforcement Learning
Reinforcement learning (RL) has become the de facto method for achieving locomotion on humanoid robots in practice, yet stability analysis of the corresponding control policies is lacking. Recent work has attempted to merge control theoretic ideas with reinforcement learning through control guided learning. A notable example of this is the use of a control Lyapunov function (CLF) to synthesize the reinforcement learning rewards, a technique known as CLF-RL, which has shown practical success. This paper investigates the stability properties of optimal controllers using CLF-RL with the goal of bridging experimentally observed stability with theoretical guarantees. The RL problem is viewed as an optimal control problem and exponential stability is proven in both continuous and discrete time using both core CLF reward terms and the additional terms used in practice. The theoretical bounds are numerically verified on systems such as the double integrator and cart-pole. Finally, the CLF guided rewards are implemented for a walking humanoid robot to generate stable periodic orbits.
comment: This work has been submitted to the IEEE for possible publication
Personalized Spiking Neural Networks with Ferroelectric Synapses for EEG Signal Processing
Electroencephalography (EEG)-based brain-computer interfaces (BCIs) are strongly affected by non-stationary neural signals that vary across sessions and individuals, limiting the generalization of subject-agnostic models and motivating adaptive and personalized learning on resource-constrained platforms. Programmable memristive hardware offers a promising substrate for such post-deployment adaptation; however, practical realization is challenged by limited weight resolution, device variability, nonlinear programming dynamics, and finite device endurance. In this work, we show that spiking neural networks (SNNs) can be deployed on ferroelectric memristive synaptic devices for adaptive EEG-based motor imagery decoding under realistic device constraints, achieving classification performance comparable to software-based SNNs. We fabricate, characterize, and model the weight update in ferroelectric synapses. We then evaluate the deployment of a convolutional-recurrent SNN architecture using two strategies. First, we adapt a mixed-precision strategy to SNNs, in which gradient-based updates are accumulated digitally and converted into discrete programming events only when a threshold is exceeded. Additionally, the weight update is device-aware and accounts for the nonlinear, state-dependent programming dynamics. During learning and adaptation, this scheme mitigates possible endurance and energy constraints. Second, we evaluate the transfer of software-trained weights followed by low-overhead on-device re-tuning. We show that subject-specific transfer learning, achieved by retraining only the final network layers, improves classification accuracy. These results demonstrate that programmable ferroelectric hardware can support robust, low-overhead adaptation in spiking neural networks, opening a practical path toward personalized neuromorphic processing of neural signals.
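The threshold-gated update described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name, the fixed programming step, and the parameter values are all hypothetical, and the constant step deliberately ignores the nonlinear, state-dependent device dynamics that the paper models.

```python
class ThresholdedSynapse:
    """Toy synapse: updates accumulate digitally and only trigger a
    discrete programming event once the accumulator crosses a threshold.
    Assumption: each event changes the weight by a fixed step."""

    def __init__(self, threshold, step):
        self.threshold = threshold  # accumulator magnitude that triggers an event
        self.step = step            # weight change per programming event (assumed constant)
        self.accumulator = 0.0      # high-precision digital accumulator
        self.weight = 0.0           # coarse device weight
        self.events = 0             # programming events issued so far

    def apply_update(self, delta):
        """Add a gradient-based update; program the device only when needed."""
        self.accumulator += delta
        while abs(self.accumulator) >= self.threshold:
            sign = 1.0 if self.accumulator > 0 else -1.0
            self.weight += sign * self.step
            self.accumulator -= sign * self.threshold
            self.events += 1


synapse = ThresholdedSynapse(threshold=1.0, step=0.5)
for _ in range(4):
    synapse.apply_update(0.25)
print(synapse.weight, synapse.events)
```

In this toy setting, four small updates produce a single programming event, which is the mechanism by which such a scheme would reduce device wear and programming energy.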
Dynamic Modeling and Control of Multi-Stack Alkaline Water Electrolysis Systems with Shared Gas Separators and Lye Circulation: An Experiment-Based Study
An emerging approach for large-scale renewable hydrogen production is integrating multiple alkaline water electrolysis (AWE) stacks into one balance-of-plant (BoP) system, sharing gas-lye separation and lye circulation components. While this configuration, termed $N$-in-1, reduces cost and complexity, its dynamic performance under fluctuating power remains unclear compared with conventional 1-in-1 systems. This paper develops a state-space model of the multi-stack AWE system, capturing lye circulation, temperature, and hydrogen-to-oxygen (HTO) dynamics, calibrated via experiments on a 4,000 Nm$^3$/h-rated 4-in-1 system. A nonlinear model predictive controller (NMPC) is then designed to coordinate inter-stack current distribution, lye flow, and cooling for load tracking and operational stability. Simulations on the experimentally validated model show that a 4-in-1 system can achieve performance very similar to that of four parallel 1-in-1 systems. Differences in load-tracking error, temperature stabilization, and specific energy consumption remain below 0.015 MW, 0.346 K, and 0.001 kWh/Nm$^3$, respectively, under wind power supply.
A Policy-Driven DRL Framework for System-Level Tradeoff Control in NR-U/Wi-Fi Coexistence
The coexistence of NR-U and Wi-Fi in unlicensed spectrum introduces a system-level resource coordination problem, where heterogeneous channel access mechanisms lead to a significant imbalance in spectrum utilization and degraded Wi-Fi performance. To address this challenge, we propose a policy-driven deep reinforcement learning (DRL) framework for adaptive TXOP control, in which the coexistence process is formulated as a Markov decision process (MDP) and a deep Q-network (DQN) learns control policies through online interaction. A key contribution is the introduction of a policy layer via reward design, enabling explicit control of system-level tradeoffs among fairness, throughput, and quality of service (QoS). Three policies, namely absolute fairness, moderate fairness, and utility-based fairness, are developed to achieve different operating points. Simulation results show that the proposed framework achieves a Jain fairness index above 0.9 under strict fairness control. Compared to absolute fairness, moderate fairness improves aggregate throughput by 68.22%, while the utility-based policy further enhances utility by 177.6%. These results demonstrate that policy-driven control provides a flexible and effective solution for managing tradeoffs in heterogeneous coexistence networks.
comment: 13 pages, 13 figures, 1 table, submitted to IEEE Open Journal of Vehicular Technology
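For readers unfamiliar with the fairness metric quoted in this abstract, the Jain fairness index of n per-node throughputs x_i is (sum x_i)^2 / (n * sum x_i^2), ranging from 1/n (one node takes everything) to 1 (perfectly equal shares). The sketch below computes it; the throughput values are made up for illustration and are not taken from the paper.

```python
def jain_index(throughputs):
    """Jain fairness index: (sum x)^2 / (n * sum x^2), in [1/n, 1]."""
    n = len(throughputs)
    total = sum(throughputs)
    squares = sum(x * x for x in throughputs)
    if n == 0 or squares == 0:
        raise ValueError("need at least one nonzero throughput")
    return total * total / (n * squares)


# Hypothetical per-network throughputs in Mbit/s (illustrative only).
print(jain_index([30.0, 10.0]))
print(jain_index([20.0, 20.0]))
```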
Fast Switching in Mixed-Integer Model Predictive Control
We deduce stability results for finite control set and mixed-integer model predictive control with a downstream oversampling phase. The presentation rests upon the inherent robustness of model predictive control with stabilizing terminal conditions and techniques for solving mixed-integer optimal control problems by continuous optimization. Partial outer convexification and binary relaxation transform mixed-integer problems into common optimal control problems. We deduce nominal asymptotic stability for the resulting relaxed system formulation and implement sum-up rounding to efficiently restore integer feasibility on an oversampling time grid. If fast control switching is technically possible and inexpensive, we can approximate the relaxed system behavior in the state space arbitrarily closely. We integrate input-perturbed model predictive control with practical asymptotic stability. Numerical experiments illustrate the practical relevance of fast control switching.
comment: This preprint was revised based on the feedback from the reviewers and resubmitted to the IEEE. The previous version has been conditionally accepted for publication
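As a rough illustration of the sum-up rounding step mentioned in this abstract, the sketch below handles the simplest case of a single relaxed binary control on a time grid. The function and variable names are hypothetical, and the paper's setting (multiple controls after partial outer convexification, on an oversampling grid) is more general than this.

```python
def sum_up_rounding(alpha, dt):
    """Round a relaxed control alpha[i] in [0, 1] to binary values.

    alpha: relaxed control value on each grid interval
    dt:    length of each grid interval (same length as alpha)
    The binary control is switched on whenever the accumulated integral of
    the relaxed control is ahead of the binary one by at least half an interval.
    """
    binary = []
    relaxed_integral = 0.0
    binary_integral = 0.0
    for a, h in zip(alpha, dt):
        relaxed_integral += a * h
        b = 1 if relaxed_integral - binary_integral >= 0.5 * h else 0
        binary.append(b)
        binary_integral += b * h
    return binary


# A relaxed control held at 0.5 becomes a fast on/off switching pattern.
print(sum_up_rounding([0.5, 0.5, 0.5, 0.5], [1.0, 1.0, 1.0, 1.0]))  # [1, 0, 1, 0]
```

Refining the grid shrinks the gap between the integrals of the relaxed and rounded controls, which is why fast switching lets the binary system track the relaxed one closely.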
Model Predictive Control and Moving Horizon Estimation using Statistically Weighted Data-Based Ensemble Models
This paper presents a model predictive control (MPC) framework leveraging an ensemble of data-based models to optimally control complex systems under multiple operating conditions. A novel combination rule for ensemble models is proposed, based on the statistical Mahalanobis distance, enabling the ensemble weights to suitably vary across the prediction window based on the system input. In addition, a novel state observer for ensemble models is developed using moving horizon estimation (MHE). The effectiveness of the proposed methodology is demonstrated on a benchmark energy system operating under multiple conditions.
comment: 6 pages, 4 figures, published in ECC 2026
Second-Order MPC-Based Distributed Q-Learning
The state of the art for model predictive control (MPC)-based distributed Q-learning is limited to first-order gradient updates of the MPC parameterization. In general, using second-order information can significantly improve the speed of convergence for learning, allowing the use of higher learning rates without introducing instability. This work presents a second-order extension to MPC-based Q-learning with updates distributed across local agents, relying only on locally available information and neighbor-to-neighbor communication. In simulation, the approach is demonstrated to significantly outperform first-order distributed Q-learning.
comment: 6 pages, 2 figures, published in IFAC World Congress 2026
Local Safety Filters for Networked Systems via Two-Time-Scale Design
Safety filters based on Control Barrier Functions (CBFs) provide formal guarantees of forward invariance, but are often difficult to implement in networked dynamical systems. This is due to global coupling and communication requirements. This paper develops locally implementable approximations of networked CBF safety filters that require no coordination across subsystems. The proposed approach is based on a two-time-scale dynamic implementation inspired by singular perturbation theory, where a small parameter $ε$ separates fast filter dynamics from the plant dynamics; then, a local implementation is enabled via derivative estimation. Explicit bounds are derived to quantify the mismatch between trajectories of the systems with dynamic filter and with the ideal centralized safety filter. These results characterize how safety degradation depends on the time-scale parameter $ε$, estimation errors, and filter activation time, thereby quantifying trade-offs between safety guarantees and local implementability.
comment: Longer version of a paper accepted for publication in IEEE LCSS; this version has additional data for the simulations
Mapping High-Performance Regions in Battery Scheduling across Data Uncertainty, Battery Design, and Planning Horizons
This study presents a controlled parametric framework for analyzing energy storage planning under uncertainty in a multi-stage model predictive control setting. The framework enables a broad and systematic exploration through parametrized generation of synthetic datasets in the context of energy price arbitrage. It facilitates the study of the joint effects of battery characteristics, signal structure, forecast uncertainty, and planning horizon on revenue performance in energy storage optimization, which are rarely considered together. The analysis is driven by two objectives. First, it characterizes how these interacting factors influence operational revenue and its sensitivity to planning horizon selection, including economic losses caused by deviations from optimal horizons. This provides guidance on expected horizon ranges and their impact on revenue and computational cost. Second, it enables a compact parametrization of the relationships between battery properties, data characteristics, forecast uncertainty, and horizon-dependent performance, providing a basis for future modelling of optimal planning horizon length. Results show that the framework captures consistent structural dependencies across configurations and provides meaningful guidance for horizon selection under uncertainty. In particular, increasing forecast uncertainty systematically reduces the optimal planning horizon across battery types, reflecting the diminishing value of long-term information under increasingly unreliable forecasts. Comparison with real market data shows that the parametrization reproduces the main qualitative trends of optimal horizon behavior, suggesting its potential as a lightweight surrogate for more complex simulation-based analysis.
comment: Research supported by Enefit
Towards Optimal Passive Feedback Control of LTI Systems under LQR Performance
We study state-feedback design for continuous-time LTI systems with a control input and an external input-output pair. Our objective is to determine feedback gains that render the closed-loop system (strictly) passive with respect to the external port while minimizing the standard LQR cost in the disturbance-free case. The resulting constrained optimization problem is intractable due to bilinear matrix inequalities. We analyze the set of passivating gains, showing it is unbounded, possibly nonconvex, path-connected, and contractible. We propose an indirect approach, in which the set of passivating feedback gains is inner-approximated by a compact, convex polytope. A projected gradient flow is employed to compute a gain within this polytope that minimizes the LQR cost. Numerical examples illustrate the effectiveness of the method.
PM-EKF: A Physiological Model-Based Extended Kalman Filter for Daily-Life Physical Activity Energy Expenditure Estimation
Monitoring physical activity energy expenditure (PAEE) in daily life is essential for characterizing individual health and metabolic status. Although indirect calorimetry provides gold-standard PAEE measurements, it is impractical for continuous daily-life monitoring. Consequently, wearable sensing approaches using inertial measurement units (IMUs) and heart rate (HR) sensors have attracted substantial interest. However, most existing IMU- and HR-based methods are purely data-driven and offer limited physiological interpretability. In this work, we propose a simplified physiological model that explicitly links body movement during activities of daily living to the underlying metabolic gas-exchange processes governing PAEE. The model is formulated as a nonlinear state-space system and embedded within an Extended Kalman Filter (EKF), enabling principled handling of measurement noise, model uncertainty, and system nonlinearities. The proposed framework provides personalized, interpretable PAEE estimates without employing black-box models. Our model was validated on a dataset of 9 subjects, with around 50 minutes of measurements per subject, collected in our lab under simulated free-living conditions. Using the respiratory data measured by COSMED K5 as reference and explained variance (R^2) as evaluation metric, our model's predicted PAEE yielded median (min-max) R^2 = 0.72 (0.60--0.87), using three IMUs (pelvis and two thighs) for capturing the body-center-of-mass motion and measured HR for the time-varying cardiac output. Our model outperformed a linear regression (LR) model (R^2 = 0.52 (0.23--0.92)) and a CNN-LSTM model (R^2 = 0.65 (0.46--0.78)) on the same dataset. Notably, excluding the HR sensor measurement did not significantly degrade PAEE estimation for any of the three models, indicating that IMU-captured mechanical workload dominated PAEE estimation performance in our protocol.
comment: The main body consists of 11 pages. A 2-page supplementary material is included in the source file as pdf
Model-free practical PI-Lead control design by ultimate sensitivity principle
Practical design and tuning of feedback controllers often has to get by without a model of the dynamic process at hand. Only some general assumptions about the system dynamics (in this work, type-one and stable) may be available to engineers, for instance in motion control and many other applications. This paper proposes a practical and easily realized procedure for designing a robust PI-Lead control without modeling. The developed method derives from the ultimate sensitivity principle, known from the empirical Ziegler-Nichols tuning of PID controllers, and makes use of some general characteristics of loop shaping. A three-step procedure is proposed to determine the integration time constant, the control gain, and the Lead element so as to guarantee a sufficient phase margin, while all steps require only experimental monitoring of the output value. The proposed method is demonstrated and discussed with experiments carried out on a noise-perturbed electro-mechanical actuator system.
comment: 6 pages, 10 figures
Load constrained wind farm flow control through multi-objective multi-agent reinforcement learning
This study presents a multi-agent reinforcement learning (MARL) framework for load-constrained wind farm flow control (WFFC). While wake steering can enhance total wind farm power, it often introduces increased structural loads on downstream turbines. To address this, we integrate an Independent Soft Actor-Critic (I-SAC) architecture with a data-driven, local inflow sector-averaged surrogate model to provide real-time estimates of Damage Equivalent Loads (DELs). By incorporating these estimates into a shaped reward function, turbine-specific agents are trained to maximize power production while adhering to specific load-increase thresholds ($Δ_{max}$) of 10%, 20%, and 30% relative to a baseline controller. The framework is implemented within the WindGym environment using the DYNAMIKS flow solver with Dynamic Wake Meandering (DWM) model to capture non-stationary wake physics. Results indicate that the MARL agents successfully learn collaborative policies that prioritise power gain while actively retreating from high-DEL control strategies.
comment: Submitted to Journal of Physics: Conference Series (Torque 2026). This is the Accepted Manuscript version of an article accepted for publication in Journal of Physics: Conference Series. IOP Publishing Ltd is not responsible for any errors or omissions in this version of the manuscript or any version derived from it. This Accepted Manuscript is published under a CC BY licence
Multi-Source Human-in-the-Loop Digital Twin Testbed for Connected and Autonomous Vehicles in Mixed Traffic Flow
In the emerging mixed traffic environments, Connected and Autonomous Vehicles (CAVs) have to interact with surrounding human-driven vehicles (HDVs). This paper introduces MSH-MCCT (Multi-Source Human-in-the-Loop Mixed Cloud Control Testbed), a novel CAV testbed that captures complex interactions between various CAVs and HDVs. Utilizing the Mixed Digital Twin concept, which combines Mixed Reality with Digital Twin, MSH-MCCT integrates physical, virtual, and mixed platforms, along with multi-source control inputs. Bridged by the mixed platform, MSH-MCCT allows human drivers and CAV algorithms to operate both physical and virtual vehicles within multiple fields of view. In particular, this testbed facilitates the coexistence and real-time interaction of physical and virtual CAVs and HDVs, significantly enhancing the experimental flexibility and scalability. Experiments on vehicle platooning in mixed traffic showcase the potential of MSH-MCCT to conduct CAV testing with multi-source real human drivers in the loop through driving simulators of diverse fidelity. The videos for the experiments are available at our project website: https://dongjh20.github.io/MSH-MCCT.
A Tutorial on Learning-Based Radio Map Construction: Data, Paradigms, and Physics-Awareness
The integration of artificial intelligence into next-generation wireless networks necessitates the accurate construction of radio maps (RMs) as a foundational prerequisite for electromagnetic digital twins. An RM provides the digital representation of the wireless propagation environment, mapping complex geographical and topological boundary conditions to critical spatial-spectral metrics that range from received signal strength to full channel state information matrices. This tutorial presents a comprehensive survey of learning-based RM construction, systematically addressing three intertwined dimensions: data, paradigms, and physics-awareness. From the data perspective, we review physical measurement campaigns, ray tracing simulation engines, and publicly available benchmark datasets, identifying their respective strengths and fundamental limitations. From the paradigm perspective, we establish a core taxonomy that categorizes RM construction into source-aware forward prediction and source-agnostic inverse reconstruction, and examine five principal neural architecture families spanning convolutional neural networks, vision transformers, graph neural networks, generative adversarial networks, and diffusion models. We further survey optics-inspired methods adapted from neural radiance fields and 3D Gaussian splatting for continuous wireless radiation field modeling. From the physics-awareness perspective, we introduce a three-level integration framework encompassing data-level feature engineering, loss-level partial differential equation regularization, and architecture-level structural isomorphism. Open challenges including foundation model development, physical hallucination detection, and amortized inference for real-time deployment are discussed to outline future research directions. The project page is at https://github.com/UNIC-Lab/Awesome-Radio-Map-Categorized.
Sampled-data Robust Control of Electrically Stimulated Engineered Cell Factories
Closed-loop bioelectronic regulation of engineered secretory cell systems is challenging because electric-field (EF) stimulation acts indirectly through transcription-factor activation, in the presence of delayed, nonlinear, and noisy intracellular dynamics, sparse measurements, and constrained burst-based actuation. We develop a framework for robust closed-loop endocrine regulation in electrically stimulated engineered cell factories, illustrated through extracellular thyroid hormone \(T_4\) production in engineered thyroid-like cells. The plant is modeled by a control-oriented ODE formulation combining a reduced mechanistic \(T_4\) pathway, an EF-responsive Hill module, and a linear-chain Erlang cascade representing distributed intracellular delay. On this basis, we design a sampled-data adaptive proportional-integral-derivative (PID) controller with derivative filtering, anti-windup, saturation and rate limits, and hysteretic band-locking, together with a robust adaptive extension that accounts for parameter mismatch, sensor noise and bias, actuator mismatch, delay/jitter, and exogenous rhythmic disturbance through a scenario-based risk-aware update. We provide local sampled-data input-to-state stability interpretations for both APID and RAPID, showing that, under standard local Lyapunov and bounded-disturbance conditions, the sampled tracking error is ultimately bounded by a disturbance-dependent constant. In silico experiments demonstrate sustained regulation of extracellular \(T_4\) across prescribed targets despite significant uncertainty.
Theoretical Studies of Sub-THz Active Split-Ring Resonators for Near-Field Imaging
This paper develops a theoretical framework for the design of Active Split-Ring Resonators (ASRRs). An ASRR is a Split-Ring Resonator (SRR) equipped with a tunable negative resistor, enabling both switchability and quality factor boosting and tuning. These properties make ASRRs well-suited for integration into dense arrays on silicon chips, where pixelated near-fields are generated and leveraged for high-resolution 2D imaging of samples. Such imagers pave the way for real-time, non-invasive, and low-cost imaging of human body tissue. The paper investigates ASRR coupling to host transmission lines, nonlinear effects, signal flow, and the influence of various noise sources on detection performance. Verified through simulations, these studies provide design guidelines for optimizing the Signal-to-Noise Ratio (SNR) and power consumption of a single pixel, while adhering to the constraints of a scalable array.
comment: IEEE Transactions on Circuits and Systems I: Regular Papers
Robustness Measures in Distributionally Robust Optimization
Distributionally Robust Optimization (DRO) is a worst-case approach to decision making when there is model uncertainty. It is also well known that for certain uncertainty sets, DRO is approximated by a regularized nominal problem. We show that the regularizer is not just a penalty function but the worst-case sensitivity (WCS) of the expected cost with respect to deviations from the nominal model, giving it the interpretation of a robustness measure. This has substantial consequences for robust modeling. It shows that DRO is fundamentally a tradeoff between performance and robustness, where the robustness measure is determined by the uncertainty set. The robustness measure reveals properties of a cost distribution that affect sensitivity to misspecification. This leads to a systematic approach to selecting uncertainty sets. The family of DRO solutions obtained by varying the size of the uncertainty set traces a near Pareto-optimal performance--robustness frontier that can be used to select its size. The frontier identifies problem instances where the price of robustness is high and provides insight into effective ways of redesigning the system to reduce this cost. We derive WCS for a collection of commonly used uncertainty sets, and illustrate these ideas in a number of applications.
Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching
While recent advances in humanoid locomotion have achieved stable walking on varied terrains, capturing the agility and adaptivity of highly dynamic human motions remains an open challenge. In particular, agile parkour in complex environments demands not only low-level robustness, but also human-like motion expressiveness, long-horizon skill composition, and perception-driven decision-making. In this paper, we present Perceptive Humanoid Parkour (PHP), a modular framework that enables humanoid robots to autonomously perform long-horizon, vision-based parkour across challenging obstacle courses. Our approach first leverages motion matching, formulated as nearest-neighbor search in a feature space, to compose retargeted atomic human skills into long-horizon kinematic trajectories. This framework enables the flexible composition and smooth transition of complex skill chains while preserving the elegance and fluidity of dynamic human motions. Next, we train motion-tracking reinforcement learning (RL) expert policies for these composed motions, and distill them into a single depth-based, multi-skill student policy, using a combination of DAgger and RL. Crucially, the combination of perception and skill composition enables autonomous, context-aware decision-making: using only onboard depth sensing and a discrete 2D velocity command, the robot decides whether to step over, climb onto, vault, or roll off obstacles of varying geometries and heights, and executes the chosen skill. We validate our framework with extensive real-world experiments on a Unitree G1 humanoid robot, demonstrating highly dynamic parkour skills such as climbing tall obstacles up to 1.25 m (96% of the robot's height), as well as long-horizon multi-obstacle traversal with closed-loop adaptation to real-time obstacle perturbations.
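The phrase "motion matching, formulated as nearest-neighbor search in a feature space" can be sketched in a few lines. This is only an illustration under assumed inputs: the feature vectors, their dimensionality, and the brute-force search are hypothetical and say nothing about the features or index structure used in PHP.

```python
import math


def match_motion(query, database):
    """Return the index of the database feature vector closest to the query.

    query:    feature vector describing the current pose and desired motion
    database: list of feature vectors, one per candidate motion frame
    Distance is plain Euclidean; real systems typically weight features.
    """
    best_index, best_dist = -1, math.inf
    for i, feature in enumerate(database):
        d = math.dist(query, feature)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index


# Hypothetical 2D features for three candidate frames.
frames = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
print(match_motion((0.9, 0.1), frames))  # 1
```

Repeating this lookup as the query changes is what lets a motion-matching system stitch short clips into longer kinematic trajectories.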
Safe Adaptive-Sampling Control via Robust M-Step Hold Model Predictive Control
In adaptive-sampling control, the control frequency can be adjusted during task execution. Ensuring that these changes do not jeopardize the safety of the system being controlled requires attention. We introduce robust M-step hold model predictive control (MPC) to address this. Our formulation provides robust constraint satisfaction for an uncertain discrete-time system model with a fixed sampling time subject to an adaptable multi-step input hold (referred to as M-step hold). We show how to ensure recursive feasibility of the MPC utilizing M-step hold extensions of robust invariant sets, and demonstrate how to enable safe adaptive-sampling control via the online selection of M. We evaluate the utility of the robust M-step hold MPC formulation in a cruise control example.
Robotics
Task-Aware Scanning Parameter Configuration for Robotic Inspection Using Vision Language Embeddings and Hyperdimensional Computing
Robotic laser profiling is widely used for dimensional verification and surface inspection, yet measurement fidelity is often dominated by sensor configuration rather than robot motion. Industrial profilers expose multiple coupled parameters, including sampling frequency, measurement range, exposure time, receiver dynamic range, and illumination, that are still tuned by trial-and-error; mismatches can cause saturation, clipping, or missing returns that cannot be recovered downstream. We formulate instruction-conditioned sensing parameter recommendation: given a pre-scan RGB observation and a natural-language inspection instruction, infer a discrete configuration over key parameters of a robot-mounted profiler. To benchmark this problem, we develop Instruct-Obs2Param, a real-world multimodal dataset linking inspection intents and multi-view pose and illumination variation across 16 objects to canonical parameter regimes. We then propose ScanHD, a hyperdimensional computing framework that binds instruction and observation into a task-aware code and performs parameter-wise associative reasoning with compact memories, matching discrete scanner regimes while yielding stable, interpretable, low-latency decisions. On Instruct-Obs2Param, ScanHD achieves 92.7% average exact accuracy and 98.1% average Win@1 accuracy across the five parameters, with strong cross-split generalization and low-latency inference suitable for deployment, outperforming rule-based heuristics, conventional multimodal models, and multimodal large language models. This work enables autonomous, instruction-conditioned sensing configuration from task intent and scene context, eliminating manual tuning and elevating sensor configuration from a static setting to an adaptive decision variable.
comment: 20 pages, 13 figures
Evaluating Generative Models as Interactive Emergent Representations of Human-Like Collaborative Behavior
Human-AI collaboration requires AI agents to understand human behavior for effective coordination. While advances in foundation models show promising capabilities in understanding and showing human-like behavior, their application in embodied collaborative settings needs further investigation. This work examines whether embodied foundation model agents exhibit emergent collaborative behaviors indicating underlying mental models of their collaborators, which is an important aspect of effective coordination. This paper develops a 2D collaborative game environment where large language model agents and humans complete color-matching tasks requiring coordination. We define five collaborative behaviors as indicators of emergent mental model representation: perspective-taking, collaborator-aware planning, introspection, theory of mind, and clarification. An automated behavior detection system using LLM-based judges identifies these behaviors, achieving fair to substantial agreement with human annotations. Results from the automated behavior detection system show that foundation models consistently exhibit emergent collaborative behaviors without being explicitly trained to do so. These behaviors occur at varying frequencies during collaboration stages, with distinct patterns across different LLMs. A user study was also conducted to evaluate human satisfaction and perceived collaboration effectiveness, with the results indicating positive collaboration experiences. Participants appreciated the agents' task focus, plan verbalization, and initiative, while suggesting improvements in response times and human-like interactions. This work provides an experimental framework for human-AI collaboration, empirical evidence of collaborative behaviors in embodied LLM agents, a validated behavioral analysis methodology, and an assessment of collaboration effectiveness.
comment: Under review
SigLoMa: Learning Open-World Quadrupedal Loco-Manipulation from Ego-Centric Vision
Designing an open-world quadrupedal loco-manipulation system is highly challenging. Traditional reinforcement learning frameworks utilizing exteroception often suffer from extreme sample inefficiency and massive sim-to-real gaps. Furthermore, the inherent latency of visual tracking fundamentally conflicts with the high-frequency demands of precise floating-base control. Consequently, existing systems lean heavily on expensive external motion capture and off-board computation. To eliminate these dependencies, we present SigLoMa, a fully onboard, ego-centric vision-based pick-and-place framework. At the core of SigLoMa is the introduction of Sigma Points, a lightweight geometric representation for exteroception that guarantees high scalability and native sim-to-real alignment. To bridge the frequency divide between slow perception and fast control, we design an ego-centric Kalman Filter to provide robust, high-rate state estimation. On the learning front, we alleviate sample inefficiency via an Active Sampling Curriculum guided by Hint Poses, and tackle the robot's structural visual blind spots using temporal encoding coupled with simulated random-walk drift. Real-world experiments validate that, relying solely on a 5Hz (200 ms latency) open-vocabulary detector, SigLoMa successfully executes dynamic loco-manipulation across multiple tasks, achieving performance comparable to expert human teleoperation.
comment: Project website: https://11chens.github.io/SigLoMa/
SOAR: Real-Time Joint Optimization of Order Allocation and Robot Scheduling in Robotic Mobile Fulfillment Systems
Robotic Mobile Fulfillment Systems (RMFS) rely on mobile robots for automated inventory transportation, coordinating order allocation and robot scheduling to enhance warehousing efficiency. However, optimizing RMFS is challenging due to strict real-time constraints and the strong coupling of multi-phase decisions. Existing methods either decompose the problem into isolated sub-tasks to guarantee responsiveness at the cost of global optimality, or rely on computationally expensive global optimization models that are unsuitable for dynamic industrial environments. To bridge this gap, we propose SOAR, a unified Deep Reinforcement Learning framework for real-time joint optimization. SOAR transforms order allocation and robot scheduling into a unified process by utilizing soft order allocations as observations. We formulate this as an Event-Driven Markov Decision Process, enabling the agent to perform simultaneous scheduling in response to asynchronous system events. Technically, we employ a Heterogeneous Graph Transformer to encode the warehouse state and integrate phased domain knowledge. Additionally, we incorporate a reward shaping strategy to address sparse feedback in long-horizon tasks. Extensive experiments on synthetic and real-world industrial datasets, in collaboration with Geekplus, demonstrate that SOAR reduces global makespan by 7.5% and average order completion time by 15.4% with sub-100 ms latency. Furthermore, sim-to-real deployment confirms its practical viability and significant performance gains in production environments. The code is available at https://github.com/200815147/SOAR.
comment: 13 pages, 6 figures
RoboAlign-R1: Distilled Multimodal Reward Alignment for Robot Video World Models
Existing robot video world models are typically trained with low-level objectives such as reconstruction and perceptual similarity, which are poorly aligned with the capabilities that matter most for robot decision making, including instruction following, manipulation success, and physical plausibility. They also suffer from error accumulation in long-horizon autoregressive prediction. We present RoboAlign-R1, a framework that combines reward-aligned post-training with stabilized long-horizon inference for robot video world models. We construct RobotWorldBench, a benchmark of 10,000 annotated video-instruction pairs collected from four robot data sources, and train a multimodal teacher judge, RoboAlign-Judge, to provide fine-grained six-dimensional evaluation of generated videos. We then distill the teacher into a lightweight student reward model for efficient reinforcement-learning-based post-training. To reduce long-horizon rollout drift, we further introduce Sliding Window Re-encoding (SWR), a training-free inference strategy that periodically refreshes the generation context. Under our in-domain evaluation protocol, RoboAlign-R1 improves the aggregate six-dimension score by 10.1% over the strongest baseline, including gains of 7.5% on Manipulation Accuracy and 4.6% on Instruction Following; these ranking improvements are further supported by an external VLM-based cross-check and a blinded human study. Meanwhile, SWR improves long-horizon prediction quality with only about 1% additional latency, yielding a 2.8% gain in SSIM and a 9.8% reduction in LPIPS. Together, these results show that reward-aligned post-training and stabilized long-horizon decoding improve task consistency, physical realism, and long-horizon prediction quality in robot video world models.
Say the Mission, Execute the Swarm: Agent-Enhanced LLM Reasoning in the Web-of-Drones
Large Language Models (LLMs) are increasingly explored as high-level reasoning engines for cyber-physical systems, yet their application to real-time UAV swarm management remains challenging due to heterogeneous interfaces, limited grounding, and the need for long-running closed-loop execution. This paper presents a mission-agnostic, agent-enhanced LLM framework for UAV swarm control, where users express mission objectives in natural language and the system autonomously executes them through grounded, real-time interactions. The proposed architecture combines an LLM-based Agent Core with a Model Context Protocol (MCP) gateway and a Web-of-Drones abstraction based on W3C Web of Things (WoT) standards. By exposing drones, sensors, and services as standardized WoT Things, the framework enables structured tool-based interaction, continuous state observation, and safe actuation without relying on code generation. We evaluate the framework using ArduPilot-based simulation across four swarm missions and six state-of-the-art LLMs. Results show that, despite strong reasoning abilities, current general-purpose LLMs still struggle to achieve reliable execution - even for simple swarm tasks - when operating without explicit grounding and execution support. Task-specific planning tools and runtime guardrails substantially improve robustness, while token consumption alone is not indicative of execution quality or reliability.
comment: 15 pages, 5 figures. This paper has been accepted for presentation at the 27th IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM 2026)
Robust Visual SLAM for UAV Navigation in GPS-Denied and Degraded Environments: A Multi-Paradigm Evaluation and Deployment Study
Reliable localization in GPS-denied, visually degraded environments is critical for autonomous UAV operations. This paper presents a systematic comparative evaluation of five V-SLAM systems (ORB-SLAM3, DPVO, DROID-SLAM, DUSt3R, and MASt3R) spanning classical, deep learning, recurrent, and Vision Transformer (ViT) paradigms. Experiments are conducted on curated sequences from four public benchmarks (TUM RGB-D, EuRoC MAV, UMA-VI, SubT-MRS) and a custom monocular indoor dataset under five controlled degradation conditions (normal, low light, dust haze, motion blur, and combined), with sub-millimeter Vicon ground truth. Results show that ORB-SLAM3 fails critically under severe degradation (62.4% overall tracking success rate, TSR; 0% under dense haze), while learning-based methods remain robust: MASt3R achieves the lowest degraded ATE (0.027 m) and DUSt3R the highest tracking success (96.5%). DPVO offers the best efficiency-robustness trade-off (18.6 FPS, 3.1 GB GPU memory, 86.1% TSR), making it the preferred choice for memory-constrained embedded platforms. Embedded deployment analysis across NVIDIA Jetson platforms provides actionable guidelines for SLAM selection under SWaP-constrained UAV scenarios.
comment: 24 pages
FUS3DMaps: Scalable and Accurate Open-Vocabulary Semantic Mapping by 3D Fusion of Voxel- and Instance-Level Layers
Open-vocabulary semantic mapping enables robots to spatially ground previously unseen concepts without requiring predefined class sets. Current training-free methods commonly rely on multi-view fusion of semantic embeddings into a 3D map, either at the instance-level via segmenting views and encoding image crops of segments, or by projecting image patch embeddings directly into a dense semantic map. The latter approach sidesteps segmentation and 2D-to-3D instance association by operating on full uncropped image frames, but existing methods remain limited in scalability. We present FUS3DMaps, an online dual-layer semantic mapping method that jointly maintains both dense and instance-level open-vocabulary layers within a shared voxel map. This design enables further voxel-level semantic fusion of the layer embeddings, combining the complementary strengths of both semantic mapping approaches. We find that our proposed semantic cross-layer fusion approach improves the quality of both the instance-level and dense layers, while also enabling a scalable and highly accurate instance-level map where the dense layer and cross-layer fusion are restricted to a spatial sliding window. Experiments on established 3D semantic segmentation benchmarks as well as a selection of large-scale scenes show that FUS3DMaps achieves accurate open-vocabulary semantic mapping at multi-story building scales. Additional material and code will be made available: https://githanonymous.github.io/FUS3DMaps/.
comment: This work has been submitted to the IEEE for possible publication
Sensorless State Estimation and Control for Agile Cable-Suspended Payload Transport by Quadrotors
This work proposes a novel control and estimation approach for aerial manipulation of a cable-suspended load using Unmanned Aerial Vehicles (UAVs). Common approaches in the state of the art have practical limitations, relying on direct load measurements and Lagrangian methods for dynamic modeling. The lack of a straightforward dynamic model of the system led us to propose adopting the Udwadia-Kalaba method to explicitly incorporate the cable's geometric constraints. This formulation allowed for the consistent derivation of the tension force and its direct integration into the NMPC prediction model. Additionally, we propose a sensorless load state estimation based on the same geometric constraints. Results from real-robot experiments demonstrated that the explicit inclusion of load dynamics in the optimization problem significantly reduces trajectory-tracking errors and yields better overall performance compared to strategies based on incomplete models.
comment: 8 pages, 6 figures
Feasibility-aware Hybrid Control for Motion Planning under Signal Temporal Logics
In this work, a novel method for planar task and motion planning based on hybrid modeling is proposed. By virtue of a discrete variable which models local constraint satisfaction and enables local feasibility analysis, the proposed control architecture unifies planning with control design. Concurrently, control barrier functions are designed on a transformed disk version of the original nonconvex and geometrically complex robotic workspace, thus amending the issue of deadlocks. Simulations of the proposed method indicate effective handling of multiple overlapping spatio-temporal tasks even in the face of input saturation.
Jiao: Bridging Isolation and Customization in Mixed Criticality Robotics
Consumer robotics demands consolidation of safety-critical control, perception pipelines, and user applications on shared multicore platforms. While static partitioning hypervisors provide hardware-enforced isolation, directly transplanting automotive architectures encounters an expertise asymmetry problem in which end-users modifying robot behavior lack the systems knowledge that platform developers possess. We present an architecture addressing this challenge through three integrated components. A Safe IO Cell provides hardware-level override capability. A Parameter Synchronization Service encapsulates cross-domain complexity. A Safety Communication Layer implements IEC 61508-aligned verification. Our empirical evaluation on an ARM Cortex-A55 platform demonstrates that partition isolation reduces cycle-period jitter by 84.5% and cuts tail timing error by nearly an order of magnitude (p99 |jitter| from 69.0 μs to 7.8 μs), eliminating all >50 μs excursions.
comment: Accepted by Infocom'26 Embodied Intelligence Networks workshop
Bridging the Embodiment Gap: Disentangled Cross-Embodiment Video Editing
Learning robotic manipulation from human videos is a promising solution to the data bottleneck in robotics, but the distribution shift between humans and robots remains a critical challenge. Existing approaches often produce entangled representations, where task-relevant information is coupled with human-specific kinematics, limiting their adaptability. We propose a generative framework for cross-embodiment video editing that directly addresses this by learning explicitly disentangled task and embodiment representations. Our method factorizes a demonstration video into two orthogonal latent spaces by enforcing a dual contrastive objective: it minimizes mutual information between the spaces to ensure independence while maximizing intra-space consistency to create stable representations. A parameter-efficient adapter injects these latent codes into a frozen video diffusion model, enabling the synthesis of a coherent robot execution video from a single human demonstration, without requiring paired cross-embodiment data. Experiments show our approach generates temporally consistent and morphologically accurate robot demonstrations, offering a scalable solution to leverage internet-scale human video for robot learning.
BifrostUMI: Bridging Robot-Free Demonstrations and Humanoid Whole-Body Manipulation
High-quality data collection is a fundamental cornerstone for training humanoid whole-body visuomotor policies. Current data acquisition paradigms predominantly rely on robot teleoperation, which is often hindered by limited hardware accessibility and low operational efficiency. Inspired by the Universal Manipulation Interface (UMI), we propose BifrostUMI, a portable, efficient, and robot-free data collection framework tailored for humanoid robots. BifrostUMI leverages lightweight VR devices to capture human demonstrations as sparse keypoint trajectories while simultaneously recording wrist-mounted visual data. These multimodal data are subsequently utilized to train a high-level policy network that predicts future keypoint trajectories conditioned on the captured visual features. Through a robust keypoint retargeting pipeline, keypoint trajectories are precisely mapped onto the robot's morphology and executed via a whole-body controller. This approach enables the seamless transfer of diverse and agile behaviors from natural human demonstrations to humanoid embodiments. We demonstrate the efficacy and versatility of the proposed framework across two distinct experimental scenarios.
Learning Reactive Dexterous Grasping via Hierarchical Task-Space RL Planning and Joint-Space QP Control
In this work, we propose a hybrid hierarchical control framework for reactive dexterous grasping that explicitly decouples high-level spatial intent from low-level joint execution. We introduce a multi-agent reinforcement learning architecture, specialized into distinct arm and hand agents, that acts as a high-level planner by generating desired task-space velocity commands. These commands are then processed by a GPU-parallelized quadratic programming controller, which translates them into feasible joint velocities while strictly enforcing kinematic limits and collision avoidance. This structural isolation not only accelerates training convergence but also strictly enforces hardware safety. Furthermore, the architecture unlocks zero-shot steerability, allowing system operators to dynamically adjust safety margins and avoid dynamic obstacles without retraining the policy. We extensively validate the proposed framework through a rigorous simulation-to-reality pipeline. Real-world hardware experiments on a 7-DoF arm equipped with a 20-DoF anthropomorphic hand demonstrate highly robust zero-shot transferability for dexterous grasping to a diverse set of unseen objects, highlighting the system's ability to reactively recover from unexpected physical disturbances in unstructured environments.
comment: 18 pages
TACO: Trajectory Aligning Cross-view Optimisation
Cross-View Geo-localisation (CVGL) matches ground imagery against satellite tiles to give absolute position fixes, an alternative to GNSS where signals are occluded, jammed, or spoofed. Recent fine-grained CVGL methods regress sub-tile metric pose, but have only been evaluated as one-shot localisers, never as the primary fix in a live pipeline. Inertial sensing provides high-rate relative motion, but accumulates unbounded drift without an absolute anchor. We propose TACO, a tightly-coupled IMU + fine-grained CVGL pipeline that consumes a single GNSS reading at start-up and thereafter operates on onboard sensing alone. A closed-form cross-track error model triggers CVGL before IMU drift exceeds the matcher's capture radius, and a forward-biased five-point multi-crop search keeps inference cost fixed at five forward passes per fix. A yaw-residual gate rejects fixes that disagree with the onboard compass, and an anisotropic body-frame noise model scales each Unscented Kalman Filter update by per-fix confidence. A factor graph with vetted loop closures provides an offline smoothed trajectory. On the KITTI raw dataset, TACO reduces median Absolute Trajectory Error (ATE) from 97.0 m (IMU-only) to 16.3 m, a 5.9-fold reduction, at <0.1 ms per-frame fusion cost and a 5-10% camera duty cycle. Code is available: github.com/tavisshore/TACO.
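The yaw-residual gate mentioned in the TACO abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `wrap_angle` and `accept_fix` and the 15-degree threshold are hypothetical, chosen only to show how a fix whose heading disagrees with the compass might be rejected while handling the wrap-around at ±180 degrees.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def accept_fix(fix_yaw, compass_yaw, max_residual=math.radians(15.0)):
    """Return True if the fix heading agrees with the compass.

    max_residual is an assumed, illustrative threshold (radians),
    not a value taken from the paper.
    """
    residual = wrap_angle(fix_yaw - compass_yaw)
    return abs(residual) <= max_residual


# Headings of +179 and -179 degrees differ by only 2 degrees once wrapped.
near_wrap = accept_fix(math.radians(179.0), math.radians(-179.0))
# A 90-degree disagreement is rejected under the assumed threshold.
far_off = accept_fix(math.radians(90.0), 0.0)
```

Wrapping the difference before thresholding is the key step; comparing raw yaw values would wrongly reject fixes near the ±180 degree boundary.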
Height Control and Optimal Torque Planning for Jumping With Wheeled-Bipedal Robots
This paper studies accurate jump-height control of wheeled-bipedal robots based on torque planning and energy consumption optimization. Because the jumping process is underactuated, involves nonlinear estimation, and includes instantaneous impacts, accurately controlling the jumping height of a wheeled-bipedal robot is difficult. In practice, robots often jump higher than necessary to ensure safety, which causes additional motor loss, larger ground reaction forces, and higher energy consumption. To solve this problem, a novel wheeled-bipedal jumping dynamical model (W-JBD) is proposed to achieve accurate height control. It performs well but is not suitable for a real robot because the resulting torque has a sharp step. Therefore, a Bayesian optimization for torque planning method (BOTP) is proposed, which obtains the optimal torque plan without an accurate dynamic model and within a few iterations. With a continuous torque curve, BOTP reduces height error by 82.3% and energy cost by 26.9%. This result is validated on the Webots simulation platform. By using the torque curve obtained from the W-JBD model to narrow the search space, BOTP converges quickly (40 iterations on average). Combining the W-JBD model with the BOTP method makes it possible to achieve height control on real robots with a reasonable number of experiments.
comment: 6 pages, 16 figures. Accepted for publication at ICARM 2021
On Surprising Effects of Risk-Aware Domain Randomization for Contact-Rich Sampling-based Predictive Control
Domain randomization (DR) is widely used in policy learning to improve robustness to modeling error, but remains underexplored in contact-rich sampling-based predictive control (SPC), where rollout quality is highly sensitive to uncertainty. In this work, we take the first step by studying risk-aware DR in predictive sampling on a simple yet representative Push-T task, comparing average, optimistic, and pessimistic rollout aggregations under randomized model instances. Our initial results suggest that DR affects not only robustness to model error, but also the effective cost landscape seen by the sampling-based optimizer, by reshaping the basin of attraction around contact-producing actions. This opens up potential for exploring better grounded risk-aware contact-rich SPC under model uncertainty. Video: https://youtu.be/f1F0ALXxhSM
comment: 5 pages, 3 figures
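The three rollout aggregations compared in the abstract above (average, optimistic, pessimistic) can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function name `aggregate_costs`, the cost-matrix layout, and the toy numbers are assumptions, not taken from the paper.

```python
import numpy as np


def aggregate_costs(costs, mode):
    """Collapse per-model rollout costs into one score per candidate.

    costs: array of shape (num_candidates, num_models); lower is better.
    mode:  "average", "optimistic" (best case over models),
           or "pessimistic" (worst case over models).
    """
    costs = np.asarray(costs, dtype=float)
    if mode == "average":
        return costs.mean(axis=1)
    if mode == "optimistic":
        return costs.min(axis=1)
    if mode == "pessimistic":
        return costs.max(axis=1)
    raise ValueError(f"unknown mode: {mode}")


# Toy data: 3 candidate action sequences, each rolled out under
# 3 randomized model instances (hypothetical numbers).
costs = np.array([
    [1.0, 1.2, 1.1],  # consistently mediocre
    [0.2, 3.0, 0.3],  # usually good, occasionally very bad
    [0.5, 0.5, 2.0],  # in between
])

best = {
    mode: int(np.argmin(aggregate_costs(costs, mode)))
    for mode in ("average", "optimistic", "pessimistic")
}
```

In this toy case each aggregation selects a different candidate, which is one simple way to see how the choice of aggregation reshapes the cost landscape that the sampling-based optimizer works on.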
Neural Control: Adjoint Learning Through Equilibrium Constraints
Many physical AI tasks are governed by implicit equilibrium: an agent actuates a subset of degrees of freedom (boundary DoFs), while the remaining free DoFs settle by minimizing a total potential energy. Even seemingly basic tasks such as bending a deformable linear object (DLO) to a target shape can exhibit strongly nonlinear behavior due to multi-stability: the same boundary conditions may yield multiple equilibrium shapes depending on the actuation trajectory. However, learning and control in such systems is brittle because the actuation-to-configuration map is defined only implicitly, and naive backpropagation through iterative equilibrium solvers is memory- and compute-intensive. We propose Neural Control, a boundary-control framework that computes trajectory-dependent, memory-efficient proxy gradients by differentiating equilibrium conditions via an adjoint formulation, avoiding unrolling of solver iterations. To improve robustness over long horizons, we integrate these sensitivities into a receding-horizon MPC scheme that repeatedly re-anchors optimization to realized equilibria and mitigates basin-switching in multi-stable regimes. We evaluate Neural Control in simulation and on physical robots manipulating DLOs, and show improved performance over gradient-free baselines such as SPSA and CEM.
RLDX-1 Technical Report
While Vision-Language-Action models (VLAs) have shown remarkable progress toward human-like generalist robotic policies through the versatile intelligence (i.e. broad scene understanding and language-conditioned generalization) inherited from pre-trained Vision-Language Models, they still struggle with complex real-world tasks requiring broader functional capabilities (e.g. motion awareness, memory-aware decision making, and physical sensing). To address this, we introduce RLDX-1, a general-purpose robotic policy for dexterous manipulation built on the Multi-Stream Action Transformer (MSAT), an architecture that unifies these capabilities by integrating heterogeneous modalities through modality-specific streams with cross-modal joint self-attention. RLDX-1 further combines this architecture with system-level design choices, including synthesizing training data for rare manipulation scenarios, learning procedures specialized for human-like manipulation, and inference optimizations for real-time deployment. Through empirical evaluation, we show that RLDX-1 consistently outperforms recent frontier VLAs (e.g. $π_{0.5}$ and GR00T N1.6) across both simulation benchmarks and real-world tasks that require broad functional capabilities beyond general versatility. In particular, RLDX-1 shows superiority in ALLEX humanoid tasks by achieving success rates of 86.8% while $π_{0.5}$ and GR00T N1.6 achieve around 40%, highlighting the ability of RLDX-1 to control a high-DoF humanoid robot under diverse functional demands. Together, these results position RLDX-1 as a promising step toward reliable VLAs for complex, contact-rich, and dynamic real-world dexterous manipulation.
comment: Project page: https://rlwrld.ai/rldx-1
Robust Path Tracking for Vehicles via Continuous-Time Residual Learning: An ICODE-MPPI Approach
Model Predictive Path Integral (MPPI) control is a powerful sampling-based strategy for nonlinear autonomous systems. However, its performance is often bottlenecked by the fidelity of nominal dynamics. We propose ICODE-MPPI, a robust framework that leverages Input Concomitant Neural Ordinary Differential Equations (ICODEs) to learn and compensate for unmodeled residual dynamics. Unlike discrete-time learners, ICODEs maintain physical consistency and temporal continuity during the MPPI prediction horizon. High-fidelity simulations on complex trajectories demonstrate that ICODE-MPPI achieves up to a 69% reduction in cross-tracking error under persistent disturbances compared to standard MPPI control. Furthermore, our analysis confirms that ICODE-MPPI significantly suppresses control chattering, yielding smoother steering commands and superior robust performance.
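For readers unfamiliar with the baseline, the standard MPPI control update can be sketched as below. This covers only the generic importance-weighted averaging step, not the ICODE residual model; in a setup like the one described, the learned dynamics would presumably enter through the rollouts that produce `costs`. The function name `mppi_update` and the `temperature` parameter name are assumptions for illustration.

```python
import numpy as np


def mppi_update(nominal, noise, costs, temperature=1.0):
    """One MPPI update of a nominal control sequence.

    nominal: (T, m) nominal controls over the horizon.
    noise:   (K, T, m) sampled perturbations, one per rollout.
    costs:   (K,) total cost of each perturbed rollout.
    Returns the updated (T, m) sequence and the (K,) weights.
    """
    costs = np.asarray(costs, dtype=float)
    # Subtract the minimum cost for numerical stability.
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    # Weighted average of perturbations, summed over the K rollouts.
    updated = nominal + np.tensordot(weights, noise, axes=1)
    return updated, weights


# Two rollouts: one pushes the controls up (cheap), one down (expensive).
nominal = np.zeros((2, 1))
noise = np.stack([np.ones((2, 1)), -np.ones((2, 1))])
updated, weights = mppi_update(nominal, noise, costs=[0.0, 1000.0])
```

Because the second rollout is far more expensive, nearly all of the weight goes to the first perturbation and the nominal sequence moves toward it.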
From Language to Logic: A Theoretical Architecture for VLM-Grounded Safe Navigation
We propose an architecture for integrating high-level, human-provided safety rules and operator-aligned semantic preferences into autonomous robot navigation in unstructured outdoor environments. In our approach, natural-language rules are translated into Signal Temporal Logic (STL) specifications that guide planning and navigation during runtime. Persistent, environment-centric rules and terrain preferences are grounded into a 2D cost map, while temporally dynamic requirements are expressed as STL specifications to be monitored during runtime. We hypothesize the use of Vision-Language Models (VLMs) for zero-shot scene understanding, enabling mapping between human instructions, semantic features, and environmental constraints. Within this framework, we construct an illustrative navigation model that is designed to satisfy a set of STL-encoded specifications and soft operator preferences through formal satisfaction metrics embedded into environmental properties and runtime monitoring.
comment: 8 pages, 3 figures, to be published in ICUAS 2026 conference proceedings
Beyond Fixed Thresholds and Domain-Specific Benchmarks for Explainable Multi-Task Classification in Autonomous Vehicles
Scene understanding is a vital part of autonomous driving systems, which requires the use of deep learning models. Deep learning methods are intrinsically black box models, which lack transparency and safety in autonomous driving. To make these systems transparent, multi-task visual understanding has become crucial for explainable autonomous driving perception systems, where simultaneous prediction of multiple driving behaviors and their underlying explanations is essential for safe navigation and human trust in autonomous vehicles. In order to design an accurate and cross-cultural explainable autonomous driving system, we introduce a comprehensive confidence threshold sensitivity analysis that evaluates various threshold values to identify optimal decision boundaries for different tasks. Our analysis demonstrates that traditional fixed threshold approaches are suboptimal for multi-task scenarios. Through extensive evaluation, we demonstrate that our adaptive threshold selection methodology improves F1-scores across different tasks. In addition, we introduce IUST-XAI-AD, a novel dataset consisting of 958 images with human annotations for driving decisions and corresponding reasoning. This dataset addresses the critical gap in domain-specific evaluation benchmarks for distinct driving contexts and provides a more challenging test environment compared to existing datasets. Experimental results demonstrate that confidence threshold sensitivity analysis can significantly improve model performance, while the introduction of the IUST-XAI-AD dataset reveals important insights about cross-cultural driving behavior patterns. The combined contributions of this work provide both methodological advances and practical evaluation tools that can accelerate the development of more reliable, explainable, and culturally-adaptive autonomous driving systems for global deployment.
OPENJ: A Conceptual Framework for Open-Source Digital Human Modeling and Ergonomic Assessment in a CAD Environment
Industrial workplace challenges range from musculoskeletal disorders -- a leading cause of occupational injury -- to suboptimal workstation layouts, inefficient task sequences, and poor human-equipment fit. Digital human modeling (DHM) tools address several of these challenges by placing a scalable virtual mannequin in a computer-aided design (CAD) environment, enabling engineers to evaluate ergonomic risk through standardized assessment methods (RULA, REBA, NIOSH Lifting Equation, OWAS), optimize workstation layouts for reach and visibility, predict task postures through inverse kinematics, and simulate operations before physical implementation. Despite four decades of development since the Jack system originated at the University of Pennsylvania in the 1980s, the integrated DHM capability set -- anthropometric mannequin, posture prediction, ergonomic assessment, and CAD integration -- remains exclusive to commercial platforms such as Siemens Tecnomatix Jack (Process Simulate), Dassault DELMIA, Humanetics RAMSIS, and the University of Iowa's Santos system. These platforms operate under proprietary, vendor-quoted pricing models, and their acquisition and operating costs, together with closed-source implementations, have been repeatedly identified as practical adoption barriers for individual researchers, small-to-medium enterprises, and educational institutions. Organizations without access resort to manual observational methods -- paper-based worksheets applied to photographs or video -- sacrificing the predictive power and reproducibility that computational analysis provides. The paper serves as a design blueprint for (OpenJane/Joe), positioning the project for subsequent open-source implementation and community adoption.
comment: 11 pages, 2 figures, submitted to ASME IMECE 2026
Globally Solving Unbalanced Optimal Transport and Density Control for Gaussian Distributions
In this article, we study unbalanced optimal transport (UOT) and establish a control-theoretic dynamical extension, which we call the unbalanced density control (UDC), for a class of Gaussian reference measures. In the static setting, we consider UOT with quadratic transport cost and Kullback-Leibler penalties on the marginals relative to prescribed Gaussian measures. We show that the infinite-dimensional variational problem admits an exact Gaussian reduction, yielding a finite-dimensional optimization over masses, means, and covariances, together with a closed-form expression for the optimal transported mass. We then formulate UDC for discrete-time linear systems, where the initial and terminal state measures are imposed softly through KL penalties and the intermediate evolution is governed by controlled linear dynamics with quadratic control cost. For this problem, we prove that any feasible solution can be replaced, without loss of optimality, by a Gaussian initial measure and an affine-Gaussian control policy. This leads to an exact finite-dimensional reformulation and, after a standard covariance-steering lifting, to an SDP-based optimization for fixed mass, again coupled with a closed-form mass update. We further establish existence of optimal solutions and identify a sufficient condition under which the affine-Gaussian UDC policy is deterministic. These results provide globally optimal solution methods for both Gaussian UOT and Gaussian UDC. Finally, we illustrate our results with several numerical examples.
comment: 28 pages; submitted to a journal
ipc_shared_ptr: A Publish/Subscribe-Aware Smart Pointer for Cross-Process Object Lifetime Management
True zero-copy Inter-Process Communication (IPC) in publish/subscribe (pub/sub) middleware such as Robot Operating System 2 (ROS 2) requires subscribers to reference message objects in publisher-owned shared memory. Objects must not be reclaimed while referenced, yet must eventually be reclaimed, with correct handling of crash recovery and Transient Local QoS retention requirements. We propose ipc_shared_ptr, a pub/sub-aware smart pointer for cross-process message lifetime management. ipc_shared_ptr exploits pub/sub structural properties to specialize Birrell's reference listing, limiting global metadata updates to per-subscriber 0<->1 transitions and achieving an order-of-magnitude reduction in global communication over general-purpose distributed reference counting. We analyze the key metadata management tradeoff: scalability versus implementation simplicity. Owner-driven reclaim offers greater scalability, but concurrent membership changes and reclamation decisions produce races that widen the correctness-verification state space. Single-writer achieves structural atomicity, eliminating this complexity at the cost of a centralized bottleneck. iceoryx2 (owner-driven reclaim) and Agnocast -- a true zero-copy ROS 2 IPC middleware sharing the publisher's heap with subscribers and adopting ipc_shared_ptr with single-writer -- embody each architecture. Comparative evaluation at the scale of Autoware -- the largest open-source ROS 2 application -- confirms that single-writer achieves sufficient scalability: at 200 topics, two subscribers per topic and 100 Hz, Agnocast's E2E p99.9 is 2.9x lower than iceoryx2's, justifying implementation simplicity over owner-driven reclaim.
comment: Accepted for publication in the 2026 IEEE 29th International Symposium on Real-Time Distributed Computing (ISORC); 10 pages, 8 figures
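The idea of limiting global metadata updates to per-subscriber 0<->1 transitions can be illustrated with a small single-process model. This is a conceptual sketch, not the ipc_shared_ptr API: the class names `MessageEntry` and `SubscriberHandle` are hypothetical, and shared memory, concurrency, and crash recovery are all omitted.

```python
class MessageEntry:
    """Global metadata for one message: which subscribers hold it."""

    def __init__(self):
        self.holders = set()
        self.global_updates = 0  # number of writes to global metadata

    def add(self, sub_id):
        self.holders.add(sub_id)
        self.global_updates += 1

    def remove(self, sub_id):
        self.holders.discard(sub_id)
        self.global_updates += 1

    def reclaimable(self):
        return not self.holders


class SubscriberHandle:
    """Per-subscriber local count; global metadata changes only on 0<->1."""

    def __init__(self, sub_id, entry):
        self.sub_id = sub_id
        self.entry = entry
        self.local_count = 0

    def acquire(self):
        if self.local_count == 0:
            self.entry.add(self.sub_id)  # 0 -> 1 transition
        self.local_count += 1

    def release(self):
        if self.local_count == 0:
            raise RuntimeError("release without matching acquire")
        self.local_count -= 1
        if self.local_count == 0:
            self.entry.remove(self.sub_id)  # 1 -> 0 transition


entry = MessageEntry()
sub_a = SubscriberHandle("a", entry)
sub_b = SubscriberHandle("b", entry)

for _ in range(5):
    sub_a.acquire()
for _ in range(3):
    sub_b.acquire()
held_while_referenced = not entry.reclaimable()

for _ in range(5):
    sub_a.release()
for _ in range(3):
    sub_b.release()
```

Sixteen local acquire and release calls produce only four global updates here, one per 0<->1 transition, and the message becomes reclaimable once the holder set is empty.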
ARMATA: Auto-Regressive Multi-Agent Task Assignment
Coordinating multi-agent systems over spatially distributed areas requires solving a complex hierarchical problem: first distributing areas among agents (allocation) and subsequently determining the optimal visitation order (routing). Existing methods typically decouple these stages, ignoring inter-stage dependencies, or rely on decentralized heuristics that lack global context. In this work, we propose a centralized, fully end-to-end auto-regressive framework that jointly generates allocation decisions and routing sequences. The core contribution of our approach is a multi-stage decoding mechanism that unifies high-level allocation and low-level routing in a single auto-regressive pass while maintaining a centralized global state. This enables the model to implicitly balance workload distribution with routing efficiency, avoiding local optima common in decentralized methods. Extensive experiments demonstrate that our method significantly outperforms diverse baselines, achieving up to a 20% improvement in solution quality over industrial solvers such as Google OR-Tools, IBM CPLEX, and LKH-3, while reducing computation time from hours to seconds.
Safety by Invariance, Liveness through Refinement: Heterogeneous Contract Framework for Co-Design of Layered Control
Real-world control systems must achieve long-horizon objectives (liveness) while respecting continuous-time safety constraints, a combination that motivates hierarchical layered control architectures (LCAs). Existing LCA research, however, lacks (i) a uniform specification language across discrete planning and continuous execution, (ii) formal guarantees that specifications are preserved when interconnecting subsystems at heterogeneous time scales, and (iii) compositional separation between layers, owing to reliance on naive input-filtering laws. This paper addresses all three gaps by importing the safety--liveness decomposition into a heterogeneous assume--guarantee framework: safety is enforced by invariance at the continuous-time layer, while liveness is achieved through refinement at the discrete-time layer, with inter-layer coordination formalized via vertical refinement and timing-compatibility conditions. We instantiate this contract with a novel LCA combining an MPC planner, an input-to-state stabilizing (ISS) low-level controller, and a reference-governor bridge, and validate it on a Hybrid Energy Storage System (HESS) comprising a battery and a supercapacitor.
comment: 22 pages
Constraint-Enhanced Reinforcement Learning Based on Dynamic Decoupled Spherical Radial Squashing
When deploying reinforcement learning policies to physical robots, actuator rate constraints -- hard limits on how fast each joint can move per control step -- are unavoidable. These limits vary substantially across joints due to differences in motor inertia, power bandwidth, and transmission stiffness, creating pronounced heterogeneity that existing methods fail to handle geometrically: the per-joint feasible region forms a high-dimensional box in action-increment space, yet QP projection and spherical parameterization methods impose isotropic ball-shaped constraints, exponentially under-covering the true feasible set as heterogeneity grows. This paper proposes Dynamic Decoupled Spherical Radial Squashing (DD-SRad), which resolves this mismatch by computing a position-adaptive radius independently for each actuator, achieving tight alignment with the true per-joint feasible region. DD-SRad satisfies per-step hard constraints with probability 1, preserves well-conditioned gradients throughout training, and admits exact policy gradient backpropagation with zero runtime solver overhead. MuJoCo benchmark experiments demonstrate the highest task return at zero constraint violation -- matching the unconstrained upper bound -- with 30%--50% improvement in constraint-space coverage over spherical baselines. High-fidelity IsaacLab simulations with Unitree H1 and G1 humanoid robots confirm end-to-end optimality parameterized directly from official joint specifications, validating a systematic pathway from hardware datasheets to safe deployment.
comment: 27 pages, 60 figures
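The geometric mismatch described in the DD-SRad abstract above (a ball-shaped constraint inside a box-shaped feasible region) can be quantified with a small sketch. This is not the DD-SRad method itself. It assumes, purely for illustration, that the isotropic ball must fit inside the box and therefore takes the smallest per-joint limit as its radius; the function name and the example limits are hypothetical.

```python
import math


def ball_to_box_volume_ratio(rate_limits):
    """Fraction of the per-joint box covered by the largest inscribed ball.

    rate_limits[i] is the (assumed symmetric) limit on joint i's action
    increment, so the feasible box has side length 2 * rate_limits[i].
    """
    n = len(rate_limits)
    radius = min(rate_limits)  # assumption: ball must stay inside the box
    # Volume of an n-dimensional ball: pi^(n/2) / Gamma(n/2 + 1) * r^n.
    ball_volume = math.pi ** (n / 2) / math.gamma(n / 2 + 1) * radius ** n
    box_volume = math.prod(2.0 * limit for limit in rate_limits)
    return ball_volume / box_volume


if __name__ == "__main__":
    for joints in (2, 6, 12):
        uniform = ball_to_box_volume_ratio([1.0] * joints)
        mixed = ball_to_box_volume_ratio([1.0] + [3.0] * (joints - 1))
        print(joints, uniform, mixed)
```

Under these assumptions the covered fraction shrinks both with the number of joints and with the spread between the tightest and loosest limits, which is consistent with the under-coverage the abstract describes.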
VOFA: Visual Object Goal Pushing with Force-Adaptive Control for Humanoids
The ability to push large objects in a goal-directed manner using onboard egocentric perception is an essential skill for humanoid robots to perform complex tasks such as material handling in warehouses. To robustly manipulate heavy objects to arbitrary goal configurations, the robot must cope with unknown object mass and ground friction, noisy onboard perception, and actuation errors, all within a real-time feedback loop. Existing solutions either rely on privileged object-state information without onboard perception or lack robustness to variations in goal configurations and object physical properties. In this work, we present VOFA, a visual goal-conditioned humanoid loco-manipulation system capable of pushing objects with unknown physical properties to arbitrary goal positions. VOFA consists of a two-level hierarchical architecture with a high-level visuomotor policy and a low-level force-adaptive whole-body controller. The high-level policy processes noisy onboard observations and generates goal-conditioned commands to operate in closed loop across diverse object-goal configurations, while the low-level whole-body controller provides robustness to variations in object physical properties. VOFA is extensively evaluated in both simulation and real-world experiments on the Booster T1 humanoid robot. Our results demonstrate strong performance, achieving over 90% success in simulation and over 80% success in real-world trials. Moreover, VOFA successfully pushes objects weighing up to 17 kg, exceeding half of the Booster T1's body weight.
Kinematic Kitbashing
We introduce Kinematic Kitbashing, an optimization framework that synthesizes articulated 3D objects by assembling reusable parts conditioned on an abstract kinematic graph. Given the graph and a library of articulated parts, our method optimizes per-part similarity transformations that place, orient, and scale each component into a coherent articulated object; optional graph edits further enable novel assemblies beyond the prescribed connectivity. Central to our method is an exemplar-based analogy for part placement: each reused component is paired with a single source asset that exemplifies how it attaches to its parent. We capture this attachment context using vector distance fields and measure consistency by integrating the matching error over the joint's full motion range. This yields a kinematics-aware attachment energy that favors placements that preserve the exemplar's local attachment neighborhood throughout articulation. To incorporate task-level functionality, we use this attachment energy as a prior in an annealed Langevin sampling framework, enabling gradient-free optimization of black-box functionality objectives. We demonstrate the versatility of kinematic kitbashing across diverse applications, including instantiating kinematic graphs from user-selected or automatically retrieved parts, synthesizing assemblies with user-defined functionality, and re-targeting articulations via graph edits.
Hi-WM: Human-in-the-World-Model for Scalable Robot Post-Training
Post-training is essential for turning pretrained generalist robot policies into reliable task-specific controllers, but existing human-in-the-loop pipelines remain tied to physical execution: each correction requires robot time, scene setup, resets, and operator supervision in the real world. Meanwhile, action-conditioned world models have been studied mainly for imagination, synthetic data generation, and policy evaluation. We propose Human-in-the-World-Model (Hi-WM), a post-training framework that uses a learned world model as a reusable corrective substrate for failure-targeted policy improvement. A policy is first rolled out in closed loop inside the world model; when the rollout becomes incorrect or failure-prone, a human intervenes directly in the model to provide short corrective actions. Hi-WM caches intermediate states and supports rollback and branching, allowing a single failure state to be reused for multiple corrective continuations and yielding dense supervision around behaviors that the base policy handles poorly. The resulting corrective trajectories are then added back to the training set for post-training. We evaluate Hi-WM on three real-world manipulation tasks spanning both rigid and deformable object interaction, and on two policy backbones. Hi-WM improves real-world success by 37.9 points on average over the base policy and by 19.0 points over a world-model closed-loop baseline, while world-model evaluation correlates strongly with real-world performance (r = 0.953). These results suggest that world models can serve not only as generators or evaluators, but also as effective corrective substrates for scalable robot post-training.
comment: Project Page: https://hi-wm.github.io/
Fisher Decorator: Refining Flow Policy via a Local Transport Map
Recent advances in flow-based offline reinforcement learning (RL) have achieved strong performance by parameterizing policies via flow matching. However, they still face critical trade-offs among expressiveness, optimality, and efficiency. In particular, existing flow policies interpret the $L_2$ regularization as an upper bound of the 2-Wasserstein distance ($W_2$), which can be problematic in offline settings. This issue stems from a fundamental geometric mismatch: the behavioral policy manifold is inherently anisotropic, whereas the $L_2$ (or upper bound of $W_2$) regularization is isotropic and density-insensitive, leading to systematically misaligned optimization directions. To address this, we revisit offline RL from a geometric perspective and show that policy refinement can be formulated as a local transport map: an initial flow policy augmented by a residual displacement. By analyzing the induced density transformation, we derive a local quadratic approximation of the KL-constrained objective governed by the Fisher information matrix, enabling a tractable anisotropic optimization formulation. By leveraging the score function embedded in the flow velocity, we obtain a corresponding quadratic constraint for efficient optimization. Our results reveal that the optimality gap in prior methods arises from their isotropic approximation. In contrast, our framework achieves a controllable approximation error within a provable neighborhood of the optimal solution. Extensive experiments demonstrate state-of-the-art performance across diverse offline RL benchmarks. See project page: https://github.com/ARC0127/Fisher-Decorator.
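The isotropic-versus-anisotropic point in the Fisher Decorator abstract above can be made concrete with a toy case. The sketch below is not the paper's objective. It uses the textbook fact that shifting a Gaussian with fixed covariance by a displacement d costs exactly KL = 0.5 * d^T Sigma^{-1} d, where Sigma^{-1} is the Fisher information of the shift. The diagonal covariance and the function name are assumptions made for illustration.

```python
def gaussian_shift_kl(displacement, variances):
    """KL(N(mu, diag(variances)) || N(mu + displacement, diag(variances))).

    For equal covariances the KL divergence reduces to a quadratic form in
    the displacement, weighted by the inverse variances (the Fisher matrix).
    """
    return 0.5 * sum(d * d / v for d, v in zip(displacement, variances))


if __name__ == "__main__":
    variances = [1.0, 0.01]      # wide along axis 0, narrow along axis 1
    along_wide = [0.1, 0.0]      # same L2 norm in both cases
    along_narrow = [0.0, 0.1]
    print(gaussian_shift_kl(along_wide, variances))
    print(gaussian_shift_kl(along_narrow, variances))
```

Both displacements have the same L2 norm, yet the one along the narrow axis is a hundred times more expensive in KL terms, which is the kind of density sensitivity an isotropic L2 penalty cannot express.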
FingerViP: Learning Real-World Dexterous Manipulation with Fingertip Visual Perception
The current practice of dexterous manipulation generally relies on a single wrist-mounted view, which is often occluded and limits performance on tasks requiring multi-view perception. In this work, we present FingerViP, a learning system that utilizes a visuomotor policy with fingertip visual perception for dexterous manipulation. Specifically, we design a vision-enhanced fingertip module with an embedded miniature camera and install the modules on each finger of a multi-fingered hand. The fingertip cameras substantially improve visual perception by providing comprehensive, multi-view feedback of both the hand and its surrounding environment. Building on the integrated fingertip modules, we develop a diffusion-based whole-body visuomotor policy conditioned on a third-view camera and multi-view fingertip vision, which effectively learns complex manipulation skills directly from human demonstrations. To improve view-proprioception alignment and contact awareness, each fingertip visual feature is augmented with its corresponding camera pose encoding and per-finger joint-current encoding. We validate the effectiveness of the multi-view fingertip vision and demonstrate the robustness and adaptability of FingerViP on various challenging real-world tasks, including pressing buttons inside a confined box, retrieving sticks from an unstable support, retrieving objects behind an occluding curtain, and performing long-horizon cabinet opening and object retrieval, achieving an overall success rate of 80.8%. All hardware designs and code will be fully open-sourced.
comment: 12 pages, 6 figures
Safety-critical Control Under Partial Observability: Reach-Avoid POMDP meets Belief Space Control
Partially Observable Markov Decision Processes (POMDPs) provide a principled framework for robot decision-making under uncertainty. Solving reach-avoid POMDPs, however, requires coordinating three distinct behaviors: goal reaching, safety, and active information gathering to reduce uncertainty. Existing online POMDP solvers attempt to address all three within a single belief tree search, but this unified approach struggles with the conflicting time scales inherent to these objectives. We propose a layered, certificate-based control architecture that operates directly in belief space, decoupling goal reaching, information gathering, and safety into modular components. We introduce Belief Control Lyapunov Functions (BCLFs) that formalize information gathering as a Lyapunov convergence problem in belief space, and show how they can be learned via reinforcement learning. For safety, we develop Belief Control Barrier Functions (BCBFs) that leverage conformal prediction to provide probabilistic safety guarantees over finite horizons. The resulting control synthesis reduces to lightweight quadratic programs solvable in real time, even for non-Gaussian belief representations with dimension $>10^4$. Experiments in simulation and on a space-robotics platform demonstrate real-time performance and improved safety and task success compared to state-of-the-art constrained POMDP solvers.
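The abstract above says the BCBFs rely on conformal prediction for probabilistic guarantees. As a rough illustration of that ingredient only, the following sketch computes the standard split-conformal quantile from calibration scores. How the paper defines its scores and applies the bound over a finite horizon is not stated here, so the function name and the meaning of the scores are assumptions.

```python
import math


def conformal_quantile(calibration_scores, alpha):
    """Split-conformal threshold with miscoverage level alpha.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score.
    Under exchangeability, a fresh score exceeds this threshold with
    probability at most alpha. If n is too small for the requested level,
    the only valid threshold is infinity.
    """
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(calibration_scores)[rank - 1]


if __name__ == "__main__":
    # Hypothetical nonconformity scores, e.g. belief-prediction errors.
    scores = [5, 1, 4, 2, 3, 10, 9, 8, 7, 6]
    print(conformal_quantile(scores, alpha=0.1))
```

A safety margin could then be inflated by the returned threshold, though whether the paper uses the quantile in this way is an assumption of this sketch.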
Toggling stiffness via multistability
Variable stiffness is a key capability in biological and robotic systems, enabling adaptive interaction across tasks and environments. Mechanical metamaterials offer an alternative to conventional mechatronic solutions by encoding stiffness variation directly into monolithic structural architectures, reducing the need for discrete assemblies. Here, we introduce a multistable mechanical metamaterial that exhibits a toggleable stiffness effect in which the effective shear stiffness switches discretely between stable mechanical configurations. Mechanical analysis of surrogate beam models of the unit cell reveals that this behavior originates from the rotation transmitted by the support beams to the curved beam, governing the balance between bending and axial deformation. Consequently, the shear stiffness ratio between the two states can be tuned by varying the slenderness of the support beams or by incorporating localized hinges that modulate rotational transfer. Experiments on 3D-printed prototypes validate the numerical predictions and confirm consistent stiffness toggling across different geometries. Finally, we demonstrate a monolithic soft clutch that leverages this effect to achieve programmable, stepwise stiffness modulation. This work establishes a design strategy for toggleable stiffness using multistable metamaterials, with potential applications in soft robotics and smart structures where adaptive compliance is of paramount importance.
Atomic-Probe Governance for Skill Updates in Compositional Robot Policies
Skill libraries in deployed robotic systems are continually updated through fine-tuning, fresh demonstrations, or domain adaptation, yet existing typed-composition methods (BLADE, SymSkill, Generative Skill Chaining) treat the library as frozen at test time and do not analyze how composition outcomes change when a skill is replaced. We introduce a paired-sampling cross-version swap protocol on robosuite manipulation tasks to characterize this dimension of compositional skill learning. On a dual-arm peg-in-hole task we discover a dominant-skill effect: one ECM achieves 86.7% atomic success rate while every other ECM is at or below 26.7%, and whether this dominant ECM enters a composition shifts the success rate by up to +50pp. We characterize the boundary on a simpler pick task where all atomic policies saturate at 100% and the effect is undefined. Across three tasks we further find that off-policy behavioral distance metrics fail to identify the dominant ECM, ruling out the natural cheap predictor. We propose an atomic-quality probe and a Hybrid Selector combining per-skill probes (zero per-decision cost) with selective composition revalidation (full cost), and characterize its Pareto frontier on 144 skill-update decisions. On T6 the atomic-only probe sits 23pp below full revalidation (64.6% vs 87.5% oracle match) at zero per-decision cost; a Hybrid Selector with m=10 closes most of that gap to ~12pp at 46% of full-revalidation cost. On the cross-task average over 144 events, atomic-only is within 3pp of full revalidation under a mixed-oracle caveat. The atomic-quality probe is, to our knowledge, the first principled, deployment-ready primitive for skill-update governance in compositional robot policies.
comment: 8 pages main text + appendix; 3 figures, 12 tables
AEROS: A Single-Agent Operating Architecture with Embodied Capability Modules
Robotic systems lack a principled abstraction for organizing intelligence, capabilities, and execution in a unified manner. Existing approaches either couple skills within monolithic architectures or decompose functionality into loosely coordinated modules or multiple agents, often without a coherent model of identity and control authority. We argue that a robot should be modeled as a single persistent intelligent subject whose capabilities are extended through installable packages. We formalize this view as AEROS (Agent Execution Runtime Operating System), in which each robot corresponds to one persistent agent and capabilities are provided through Embodied Capability Modules (ECMs). Each ECM encapsulates executable skills, models, and tools, while execution constraints and safety guarantees are enforced by a policy-separated runtime. This separation enables modular extensibility, composable capability execution, and consistent system-level safety. We evaluate a reference implementation in PyBullet simulation with a Franka Panda 7-DOF manipulator across eight experiments covering re-planning, failure recovery, policy enforcement, baseline comparison, cross-task generality, ECM hot-swapping, ablation, and failure boundary analysis. Over 100 randomized trials per condition, AEROS achieves 100% task success across three tasks versus baselines (BehaviorTree.CPP-style and ProgPrompt-style at 92--93%, flat pipeline at 67--73%), the policy layer blocks all invalid actions with zero false acceptances, runtime benefits generalize across tasks without task-specific tuning, and ECMs load at runtime with 100% post-swap success.
comment: Submitted to Engineering Applications of Artificial Intelligence (EAAI). 48 pages, 5 figures, 9 tables
A Three-Stage Offline SDRE-Based Control Framework for Human Motion Reproduction on a Suspended Bipedal Robot
During the development of wearable exoskeletons, evaluations involving human subjects pose inherent safety risks. Therefore, systematic testing is often conducted using robots that emulate human motion. However, reproducing human movements is challenging due to differences in robot structure and actuator characteristics. This study proposes a three-stage offline control strategy that uses motion-capture data and robot-specific properties to generate control commands for accurate motion replication. First, an optimal torque trajectory is generated via a State-Dependent Riccati Equation (SDRE) controller based on the dynamic model of the bipedal system. Second, joint velocity and acceleration command sequences are synthesized through parameterized optimization under actuator constraints. Finally, a data-driven PID-LQR offline controller refines these commands by minimizing the tracking error between the desired and executed motions. Experimental validation is performed on a suspended bipedal robot platform designed for the evaluation of gravity-counteracting exoskeletons. Motion-capture data collected from squatting and walking tasks are used for system assessment. The experimental results demonstrate high tracking fidelity, with an average root mean square error (RMSE) below 3 degrees. These results verify the effectiveness of the proposed three-stage control strategy for robot-based systematic testing of exoskeletons.
comment: 14 pages, 10 figures. Preliminary version submitted for documentation purposes on arXiv. This version records results presented at a conference and is not peer-reviewed
AhaRobot: A Low-Cost Open-Source Bimanual Mobile Manipulator for Embodied AI
Scaling Vision-Language-Action models for embodied manipulation demands large volumes of diverse manipulation data, yet the high cost of commercial mobile manipulators and teleoperation interfaces that are difficult to deploy at scale remain key bottlenecks. We present AhaRobot, a low-cost, fully open-source bimanual mobile manipulator tailored for Embodied-AI. The system contributes: (1) a SCARA-like dual-arm hardware design that reduces motor torque demands while maintaining a large vertical reachable workspace, (2) an optimized control stack that improves precision via dual-motor backlash mitigation and static-friction compensation through dithering, and (3) RoboPilot, a teleoperation interface featuring a novel 26-faced marker handle for precise, long-horizon remote data collection. Experimental results show that our hardware-control co-design achieves 0.7 mm repeatability at a total hardware cost of only $1,000. The proposed 26-faced handle reduces tracking error by 80% over a 6-faced baseline and improves data-collection efficiency by 30%, while robustly handling singularities and supporting extremely long-horizon tasks in fully remote settings. Despite its low cost, AhaRobot enables imitation learning of complex household behaviors involving bimanual coordination, upper-body mobility, and contact-rich interaction, with data quality comparable to VR-based collection. All software, CAD files, and documentation are available at https://aha-robot.github.io.
comment: The first two authors contributed equally. Website: https://aha-robot.github.io
Can Explicit Physical Feasibility Benefit VLA Learning? An Empirical Study
Vision-Language-Action (VLA) models map multimodal inputs directly to robot actions and are typically trained through large-scale imitation learning. While this paradigm has shown strong performance, prevailing VLA training procedures do not explicitly supervise hard physical constraints such as obstacle avoidance or kinematic feasibility. As a result, the geometric structure underlying physically feasible behavior must be inferred only implicitly from demonstrations. In this paper, we study whether introducing explicit feasibility supervision can provide effective structured guidance for VLA policies. We formulate a simple geometry-grounded feasibility objective and integrate it into the training stage of a diffusion-based VLA policy. To evaluate this idea systematically, we use obstacle-aware manipulation as a controlled probe of geometry-dependent physical feasibility. Empirical results show that augmenting VLA training with feasibility supervision improves both physical reliability and overall task performance, while also enhancing learning efficiency in the low-data regime. These findings indicate that explicit feasibility signals can effectively complement imitation-based VLA learning, highlighting their potential for developing more reliable VLA policies.
comment: 8 pages, 5 figures. This work has been submitted to the IEEE for possible publication
Learning to Forget -- Hierarchical Episodic Memory for Lifelong Robot Deployment
Robots must verbalize their past experiences when users ask "Where did you put my keys?" or "Why did the task fail?" Yet maintaining lifelong episodic memory (EM) from continuous multimodal perception quickly exceeds storage limits and makes real-time query impractical, calling for selective forgetting that adapts to users' notions of relevance. We present H$^2$-EMV, a framework enabling humanoids to learn what to remember through user interaction. Our approach incrementally constructs hierarchical EM, selectively forgets using language-model-based relevance estimation conditioned on learned natural-language rules, and updates these rules given user feedback about forgotten details. Evaluations on simulated household tasks and 20.5-hour-long real-world recordings from ARMAR-7 demonstrate that H$^2$-EMV maintains question-answering accuracy while reducing memory size by 45% and query-time compute by 35%. Critically, performance improves over time: accuracy increases 70% in second-round queries by adapting to user-specific priorities, demonstrating that learned forgetting enables scalable, personalized EM for long-term human-robot collaboration.
K2MUSE: A human lower-limb multimodal walking dataset spanning task and acquisition variability for rehabilitation robotics
The natural interaction and control performance of lower limb rehabilitation robots are closely linked to biomechanical information from various human locomotion activities. Multidimensional human motion data significantly deepen the understanding of the complex mechanisms governing neuromuscular alterations, thereby facilitating the development and application of rehabilitation robots in multifaceted real-world environments. However, existing lower limb datasets are inadequate for supplying the essential multimodal data and large-scale gait samples necessary for the development of effective data-driven approaches, and the significant effects of acquisition interference in real applications are neglected. To fill this gap, we present the K2MUSE dataset, which includes a comprehensive collection of multimodal data, comprising kinematic, kinetic, amplitude mode ultrasound (AUS), and surface electromyography (sEMG) measurements. The proposed dataset includes lower-limb multimodal data collected from two cohorts, including 30 able-bodied young adults and 12 older adults, across different inclines (0$^\circ$, $\pm$5$^\circ$, and $\pm$10$^\circ$), speeds (0.5 m/s, 1.0 m/s, and 1.5 m/s), and representative non-ideal acquisition conditions (muscle fatigue, electrode shifts, and interday differences). The kinematic and ground reaction force data were collected with a Vicon motion capture system and an instrumented treadmill with embedded force plates, whereas the sEMG and AUS data of thirteen muscles on the bilateral lower limbs were synchronously recorded. K2MUSE is released with the corresponding structured documentation, preprocessing pipelines, and example code, thereby providing a comprehensive resource for rehabilitation robot development, biomechanical analysis, and wearable sensing research. The dataset is available at https://k2muse.github.io/.
comment: Accepted manuscript corresponding to the IJRR Version of Record. 34 pages, 30 figures, 7 tables
OmniUMI: Towards Physically Grounded Robot Learning via Human-Aligned Multimodal Interaction
UMI-style interfaces enable scalable robot learning, but existing systems remain largely visuomotor, relying primarily on RGB observations and trajectory while providing only limited access to physical interaction signals. This becomes a fundamental limitation in contact-rich manipulation, where success depends on contact dynamics such as tactile interaction, internal grasping force, and external interaction wrench that are difficult to infer from vision alone. We present OmniUMI, a unified framework for physically grounded robot learning via human-aligned multimodal interaction. OmniUMI synchronously captures RGB, depth, trajectory, tactile sensing, internal grasping force, and external interaction wrench within a compact handheld system, while maintaining collection--deployment consistency through a shared embodiment design. To support human-aligned demonstration, OmniUMI enables natural perception and modulation of internal grasping force, external interaction wrench, and tactile interaction through bilateral gripper feedback and the handheld embodiment. Built on this interface, we extend diffusion policy with visual, tactile, and force-related observations, and deploy the learned policy through impedance-based execution for unified regulation of motion and contact behavior. Experiments demonstrate reliable sensing and strong downstream performance on force-sensitive pick-and-place, interactive surface erasing, and tactile-informed selective release. Overall, OmniUMI combines physically grounded multimodal data acquisition with human-aligned interaction, providing a scalable foundation for learning contact-rich manipulation.
PROBE: Probabilistic Occupancy BEV Encoding with Analytical Translation Robustness for 3D Place Recognition
We present PROBE (PRobabilistic Occupancy BEV Encoding), a learning-free LiDAR place recognition descriptor that models each BEV cell's occupancy as a Bernoulli random variable. Rather than relying on discrete point-cloud perturbations, PROBE analytically marginalizes over continuous Cartesian translations via the polar Jacobian, yielding a distance-adaptive angular uncertainty $\sigma_\theta = \sigma_t / r$ in $\mathcal{O}(R{\cdot}S)$ time. The primary parameter $\sigma_t$ represents the expected translational uncertainty in meters, a sensor-independent physical quantity that enhances cross-sensor generalization while reducing the need for extensive per-dataset tuning. Pairwise similarity combines a Bernoulli-KL Jaccard with exponential uncertainty gating and FFT-based height cosine similarity for rotation alignment. Evaluated on four datasets spanning four diverse LiDAR types, PROBE achieves the highest accuracy among handcrafted descriptors in multi-session evaluation and competitive single-session performance relative to both handcrafted and supervised baselines. The source code and supplementary materials are available at https://sites.google.com/view/probe-pr.
comment: 8 pages, 8 figures
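The distance-adaptive angular uncertainty in the PROBE abstract above (angular sigma equals translational sigma divided by range) is simple enough to sketch directly. The conversion to sector units assumes a polar BEV grid with `num_sectors` equal angular bins, which is a guess based on the abstract's R and S notation. The function names are hypothetical and this is not the released PROBE code.

```python
import math


def angular_sigma(sigma_t, r):
    """Angular standard deviation in radians at range r (meters).

    sigma_t is the expected translational uncertainty in meters, so the
    same translation subtends a larger angle close to the sensor.
    """
    if r <= 0:
        raise ValueError("range must be positive")
    return sigma_t / r


def angular_sigma_in_sectors(sigma_t, r, num_sectors):
    """Same uncertainty expressed in units of angular bins (assumed grid)."""
    sector_width = 2.0 * math.pi / num_sectors
    return angular_sigma(sigma_t, r) / sector_width


if __name__ == "__main__":
    for r in (1.0, 10.0, 50.0):
        print(r, angular_sigma(0.5, r), angular_sigma_in_sectors(0.5, r, 360))
```

With a fixed translational sigma, cells near the sensor are blurred across many sectors while distant cells are barely blurred at all, matching the "distance-adaptive" description.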
Driving Style Recognition Like an Expert Using Semantic Privileged Information from Large Language Models
Existing driving style recognition systems largely depend on low-level sensor-derived features for training, neglecting the rich semantic reasoning capability inherent to human experts. This discrepancy results in a fundamental misalignment between algorithmic classifications and expert judgments. To bridge this gap, we propose a novel framework that integrates Semantic Privileged Information (SPI) derived from large language models (LLMs) to align recognition outcomes with human-interpretable reasoning. First, we introduce DriBehavGPT, an interactive LLM-based module that generates natural-language descriptions of driving behaviors. These descriptions are then encoded into machine learning-compatible representations via text embedding and dimensionality reduction. Finally, we incorporate them as privileged information into Support Vector Machine Plus (SVM+) for training, enabling the model to approximate human-like interpretation patterns. Experiments across diverse real-world driving scenarios demonstrate that our SPI-enhanced framework outperforms conventional methods, achieving F1-score improvements of 7.6% (car-following) and 7.9% (lane-changing). Importantly, SPI is exclusively used during training, while inference relies solely on sensor data, ensuring computational efficiency without sacrificing performance. These results highlight the pivotal role of semantic behavioral representations in improving recognition accuracy while advancing interpretable, human-centric driving systems.
When Engineering Outruns Intelligence: Rethinking Instruction-Guided Navigation
Recent ObjectNav systems credit large language models (LLMs) for sizable zero-shot gains, yet it remains unclear how much comes from language versus geometry. We revisit this question by re-evaluating an instruction-guided pipeline, InstructNav, under a detector-controlled setting and introducing two training-free variants that only alter the action value map: a geometry-only Frontier Proximity Explorer (FPE) and a lightweight Semantic-Heuristic Frontier (SHF) that polls the LLM with simple frontier votes. Across HM3D and MP3D, FPE matches or exceeds the detector-controlled instruction follower while using no API calls and running faster; SHF attains comparable accuracy with a smaller, localized language prior. These results suggest that carefully engineered frontier geometry accounts for much of the reported progress, and that language is most reliable as a light heuristic rather than an end-to-end planner. Code available at: https://github.com/matinaghaei/instructnav-scrutinized
comment: Updated version with additional ablations, clarifications, and code release
RouteFormer: A Transformer-Based Routing Framework for Autonomous Vehicles
Autonomous surveillance missions in Internet of Things (IoT) networks often involve solving NP-hard combinatorial optimization problems to ensure efficient resource utilization. To address the limitations of conventional heuristics in dynamic environments, we propose RouteFormer, a novel framework for single-agent routing in graph-based terrains. RouteFormer creates a synergy between the global context awareness of the transformer self-attention mechanism and the adaptive decision-making capabilities of Reinforcement Learning (RL). This architecture allows the system to output optimized routing decisions that adapt to complex task dependencies and resource availability without requiring labeled training datasets. We evaluated RouteFormer on varying graph sizes designed to resemble realistic reconnaissance missions. The results indicate that our model effectively handles the complexity of missions requiring multiple action profiles, outperforming baseline approaches in terms of both time and distance. Specifically, RouteFormer achieved 10\% and 7\% reductions in distance compared to the solutions obtained from well-established solvers like Concorde and Lin-Kernighan-Helsgaun-3 (LKH-3). This improvement was achieved by effectively incorporating mission-specific constraints that traditional solvers overlook. The proposed framework serves as a modular, scalable pipeline for diverse autonomous scheduling and routing tasks.
comment: 10 pages. The title and abstract were modified after peer review to better reflect the scope of the paper, and more validation tests were added
Denoising Particle Filters: Learning State Estimation with Single-Step Objectives
Learning-based methods commonly treat state estimation in robotics as a sequence modeling problem. While this paradigm can be effective at maximizing end-to-end performance, models are often difficult to interpret and expensive to train, since training requires unrolling sequences of predictions in time. As an alternative to end-to-end trained state estimation, we propose a novel particle filtering algorithm in which models are trained from individual state transitions, fully exploiting the Markov property in robotic systems. In this framework, measurement models are learned implicitly by minimizing a denoising score matching objective. At inference, the learned denoiser is used alongside a (learned) dynamics model to approximately solve the Bayesian filtering equation at each time step, effectively guiding predicted states toward the data manifold informed by measurements. We evaluate the proposed method on challenging robotic state estimation tasks in simulation, demonstrating competitive performance compared to tuned end-to-end trained baselines. Importantly, our method offers the desirable composability of classical filtering algorithms, allowing prior information and external sensor models to be incorporated without retraining.
Multiagent Systems
Physics-Grounded Multi-Agent Architecture for Traceable, Risk-Aware Human-AI Decision Support in Manufacturing
High-precision CNC machining of free-form aerospace components requires bounded compensations informed by inspection, simulation, and process knowledge. Off-the-shelf large language model (LLM) assistants can generate text, but they do not reliably execute risk-constrained multi-step numerical workflows or provide auditable provenance for high-stakes decisions. We present multi-agent knowledge analysis (MAKA), a human-in-the-loop decision-support architecture that separates intent routing, tools-only quantitative analysis, knowledge graph retrieval, and critic-based verification that enforces physical plausibility, safety bounds, and provenance completeness before recommendations are surfaced for human approval. MAKA is instantiated on a Ti-6Al-4V rotor blade machining testbed by fusing virtual-machining path-tracking error fields, cutting-force and deflection simulations, and scan-based 3D inspection deviation maps from 16 blades. The analysis decomposes deviation into an evidence-linked pathing component, a drift-based wear proxy capturing systematic evolution across parts, a residual systematic compliance term, and a variability proxy for instability-aware escalation. In a three-level tool-orchestration benchmark (single-step through $\geq$3-step stateful sequences), MAKA improves successful tool execution by up to 87.5 percentage points relative to an unstructured single-model interaction pattern with identical tool access. Digital twin what-if studies show MAKA can coordinate traceable compensation candidates that reduce predicted surface deviation from order $10^{-2}$ in to approximately $\pm 10^{-3}$ in over most of the blade within the simulation environment, providing a pre-deployment verification signal for risk-aware human decision-making.
QKVShare: Quantized KV-Cache Handoff for Multi-Agent On-Device LLMs
Multi-agent LLM systems on edge devices need to hand off latent context efficiently, but the practical choices today are expensive re-prefill or full-precision KV transfer. We study QKVShare, a framework for quantized KV-cache handoff between agents that combines token-level mixed-precision allocation, a self-contained CacheCard representation, and a HuggingFace-compatible cache injection path. Our current results support a narrower but clearer story than the original draft: on 150 GSM8K problems with Llama-3.1-8B-Instruct, adaptive quantization remains competitive under repeated handoff and shows its clearest gains against uniform quantization in deeper-hop, higher-budget settings; for handoff latency, the QKVShare path reduces TTFT relative to full re-prefill at every tested context, from 130.7 ms vs. 150.2 ms at nominal 1K context to 397.1 ms vs. 1029.7 ms at nominal 8K context. Stage timing shows that post-injection generation, not card creation, dominates the current QKVShare latency path. These results position quantized KV handoff as a promising on-device systems direction while also highlighting the need for stronger controller ablations and apples-to-apples runtime comparisons.
comment: 12 pages, 1 figure, 3 tables
FINER-SQL: Boosting Small Language Models for Text-to-SQL
Large language models have driven major advances in Text-to-SQL generation. However, they suffer from high computational cost, long latency, and data privacy concerns, which make them impractical for many real-world applications. A natural alternative is to use small language models (SLMs), which enable efficient and private on-premise deployment. Yet, SLMs often struggle with weak reasoning and poor instruction following. Conventional reinforcement learning methods based on sparse binary rewards (0/1) provide little learning signal when the generated SQLs are incorrect, leading to unstable or collapsed training. To overcome these issues, we propose FINER-SQL, a scalable and reusable reinforcement learning framework that enhances SLMs through fine-grained execution feedback. Built on group relative policy optimization, FINER-SQL replaces sparse supervision with dense and interpretable rewards that offer continuous feedback even for incorrect SQLs. It introduces two key reward functions: a memory reward, which aligns reasoning with verified traces for semantic stability, and an atomic reward, which measures operation-level overlap to grant partial credit for structurally correct but incomplete SQLs. This approach transforms discrete correctness into continuous learning, enabling stable, critic-free optimization. Experiments on the BIRD and Spider benchmarks show that FINER-SQL achieves up to 67.73\% and 85\% execution accuracy with a 3B model -- matching much larger LLMs while reducing inference latency to 5.57~s/sample. These results highlight a cost-efficient and privacy-preserving path toward high-performance Text-to-SQL generation. Our code is available at https://github.com/thanhdath/finer-sql.
MemFlow: Intent-Driven Memory Orchestration for Small Language Model Agents
Modern language agents must operate over long-horizon, multi-turn histories, yet deploying such agents with Small Language Models (SLMs) remains fundamentally difficult. Full-context prompting causes context overflow, flat retrieval exposes the model to noisy evidence, and open-ended agentic loops are unreliable under limited reasoning capacity. We argue that a substantial portion of SLM memory failure arises from mismatched memory operations: different query types demand categorically different retrieval strategies, evidence transformations, and context budgets that SLMs cannot reliably self-orchestrate through open-ended reasoning. We introduce MemFlow, a training-free memory orchestration framework that externalizes memory planning from the SLM. A Router Agent classifies each query by intent and dispatches it to the Memory Agent, which executes one of three specialized tiers (Profile Lookup, Targeted Retrieval, or Deep Reasoning) and assembles the resulting evidence under a dynamic, tier-aware token budget. An Answer Agent then generates a response from this compact context, and a Validator Agent optionally retries with a heavier memory tier when the response is not supported by the provided evidence. This route-then-compile design avoids tool-selection hallucination and reasoning loops while keeping the answer context compact. Evaluated on a frozen Qwen3-1.7B backbone across long-horizon memory benchmarks (LongMemEval, LoCoMo, and LongBench), MemFlow improves accuracy by nearly 2x over full-context SLM baselines. These results suggest that structured intent routing and deterministic evidence preparation can make limited-capacity models substantially more effective in resource-constrained long-horizon agents.
Coordination as an Architectural Layer for LLM-Based Multi-Agent Systems
Multi-agent LLM systems fail in production at rates between 41% and 87%, mostly due to coordination defects rather than base-model capability. Existing responses split between cataloguing failure modes empirically and shipping declarative orchestration frameworks as engineering tools; neither delivers a principled mapping from coordination configuration to predictable failure-mode signature. We argue that coordination should be treated as a configurable architectural layer, separable from agent logic and from information access, enabling architectural reasoning rather than only engineering productivity. We instantiate this with an information-controlled design on prediction markets: a single LLM, fixed tools, fixed per-call output cap, and fixed prompt template across five reference coordination configurations, with total compute per question treated as an endogenous architectural output. The Murphy decomposition of the Brier score separates calibration from discriminative power, so configurations leave distinguishable signatures even when aggregate scores coincide. On 100 Polymarket binary markets resolved after the model's training cutoff (claude-opus-4-6) we report Murphy signatures, a cost-quality Pareto frontier, category-conditioned analysis, and a bootstrap power-projection. Three of five pre-specified predictions are upheld in direction; two configurations dominate the Pareto frontier within this regime; exploratory bootstrap intervals separate consensus alignment from others, though pairwise tests do not survive Bonferroni correction at n=100. We also deploy the same configurations as live agents on Foresight Arena under web-search-enabled conditions, as an on-chain replication channel accumulating in parallel. Harness, trace dataset, and production agents are released. We position this as a methodology-validating first instantiation, not a general cross-model claim.
comment: 31 pages, 7 figures, 4 tables. Code, traces, and production agents publicly released; see Appendix B for repository pinning
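The Murphy decomposition referenced in the abstract above splits the Brier score into reliability (calibration), resolution (discriminative power), and uncertainty, with Brier = reliability - resolution + uncertainty. The following is a minimal sketch of the textbook decomposition for binary outcomes, assuming forecasts are grouped by their distinct values so that the identity holds exactly. The function name and the sample data are hypothetical and are not drawn from the paper's harness.

```python
from collections import defaultdict


def murphy_decomposition(forecasts, outcomes):
    """Return (brier, reliability, resolution, uncertainty) for binary outcomes.

    Forecasts are grouped by exact value, so
    brier == reliability - resolution + uncertainty up to rounding.
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n

    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)

    reliability = 0.0
    resolution = 0.0
    for f, obs in groups.items():
        freq = sum(obs) / len(obs)  # observed frequency within this forecast group
        reliability += len(obs) * (f - freq) ** 2
        resolution += len(obs) * (freq - base_rate) ** 2
    reliability /= n
    resolution /= n

    uncertainty = base_rate * (1.0 - base_rate)
    brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n
    return brier, reliability, resolution, uncertainty


if __name__ == "__main__":
    # Made-up forecasts and resolutions, for illustration only.
    print(murphy_decomposition([0.2, 0.2, 0.8, 0.8, 0.8], [0, 1, 1, 1, 0]))
```

Two configurations with the same Brier score can still differ in their reliability and resolution terms, which is how the decomposition can separate them.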
Agent Island: A Saturation- and Contamination-Resistant Benchmark from Multiagent Games
Static capabilities benchmarks suffer from saturation and contamination, making it difficult to track capabilities progress over time. We introduce Agent Island, a multiplayer simulation environment in which language-model agents compete in a game of interagent cooperation, conflict, and persuasion. The environment yields a dynamic benchmark designed to mitigate both saturation and contamination; new models can always outperform the current leading player in this winner-take-all game, and agents compete against other adaptive agents rather than face a fixed task set. We rank players with a Bayesian Plackett-Luce model, allowing us to quantify uncertainty in player skill. In 999 games involving 49 unique models, openai/gpt-5.5 dominates its peers with a posterior mean skill of 5.64, compared with 3.10 for the second-ranked model, openai/gpt-5.2, and 2.86 for the third-ranked model, openai/gpt-5.3-codex. We release the game logs as a dataset for analyses of model behavior. As an example, we investigate same-provider preference in final-round votes and find that models are 8.3 p.p. more likely to support a same-provider finalist than finalists from other providers. This preference is not uniform across providers: among separately estimated providers, the effect is strongest for OpenAI models and weakest for Anthropic models.
comment: 15 pages, 3 figures, 3 tables
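For readers unfamiliar with the Plackett-Luce model named in the abstract above, the sketch below computes the log-probability of a full ranking under the standard formulation, in which each position is filled by sampling from the remaining players in proportion to the exponential of their skill. The skill values and player names are hypothetical, and the Bayesian inference step used in the paper is not shown.

```python
import math
from itertools import permutations


def plackett_luce_log_prob(ranking, skill):
    """Log-probability of `ranking` (best first) given log-scale skills."""
    remaining = list(ranking)
    log_p = 0.0
    for player in ranking:
        # Normalise over the players not yet placed.
        log_norm = math.log(sum(math.exp(skill[p]) for p in remaining))
        log_p += skill[player] - log_norm
        remaining.remove(player)
    return log_p


if __name__ == "__main__":
    skill = {"a": 2.0, "b": 1.0, "c": 0.0}  # assumed values, illustration only
    for order in permutations(skill):
        print(order, math.exp(plackett_luce_log_prob(order, skill)))
```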
Governed Collaborative Memory as Artificial Selection in LLM-Based Multi-Agent Systems
Persistent memory is turning language-model-based agents from stateless participants in isolated interactions into state-bearing components of LLM-based multi-agent systems. As memory becomes durable, reloadable, and behavior-shaping across agents, sessions, or versions, a design question arises that is not captured by retrieval accuracy or access control alone: which candidate memories should become shared institutional state? This Viewpoint frames that problem as governed collaborative memory. We argue that memory governance functions as a selection regime, determining which memory variants persist, which remain private, and which are rejected, abstained from, or superseded. We distinguish ungoverned persistence, constitutional or hybrid selection, automatic metric-based selection, and human-ratified artificial selection, emphasizing that these regimes are not a ranking but a design choice over target properties. We then describe a layered architecture that separates agent-local memory, shared institutional memory, archive memory, and project-continuity memory, with provenance and version lineage making selection inspectable. Documented traces from one running LLM-based multi-agent ecosystem illustrate unmanaged false-memory persistence, ratified institutional memory, rejection and revision, identity-preserving expansion, and governance-as-learning. The contribution is a design agenda: persistent LLM-based multi-agent systems should evaluate memory not only for recall and performance, but also for provenance fidelity, selection traceability, epistemic quality, correction pathways, and role preservation.
ARMATA: Auto-Regressive Multi-Agent Task Assignment
Coordinating multi-agent systems over spatially distributed areas requires solving a complex hierarchical problem: first distributing areas among agents (allocation) and subsequently determining the optimal visitation order (routing). Existing methods typically decouple these stages, ignoring inter-stage dependencies, or rely on decentralized heuristics that lack global context. In this work, we propose a centralized, fully end-to-end auto-regressive framework that jointly generates allocation decisions and routing sequences. The core contribution of our approach is a multi-stage decoding mechanism that unifies high-level allocation and low-level routing in a single autoregressive pass while maintaining a centralized global state. This enables the model to implicitly balance workload distribution with routing efficiency, avoiding local optima common in decentralized methods. Extensive experiments demonstrate that our method significantly outperforms diverse baselines, achieving up to a 20\% improvement in solution quality over industrial solvers such as Google OR-Tools, IBM CPLEX, and LKH-3, while reducing computation time from hours to seconds.
FlowEval: Reference-based Evaluation of Generated User Interfaces
While large language models (LLMs) and coding agents are often applied to user interface (UI) development, developers find it difficult to reliably assess their proficiency in visual and interaction design. Existing evaluations either rely on human experts, who can accurately assess usability by testing critical flows but are slow and costly, or on automated judges, which are scalable but less accurate and opaque. We present FlowEval, a reference-based framework that measures whether a generated UI supports realistic interaction flows by comparing navigation traces from real websites to traces from generated analogs using reference-based similarity metrics (e.g., dynamic time warping). In a small-scale study with expert UI evaluators, we show that reference-based metrics strongly correlate with human judgments, suggesting that they can provide scalable yet trustworthy evaluation for UI generation systems.
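Dynamic time warping, cited in the FlowEval abstract above as one reference-based similarity metric, can be sketched with the classic dynamic program below. The 0/1 step cost and the page-name traces are assumptions for illustration; the abstract does not specify how FlowEval encodes navigation traces or scores individual steps.

```python
def dtw_distance(a, b, cost=lambda x, y: 0.0 if x == y else 1.0):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(a[i - 1], b[j - 1])
            # Extend the cheapest of: advance in a, advance in b, advance in both.
            d[i][j] = step + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


if __name__ == "__main__":
    # Hypothetical traces: the generated UI repeats a step and lacks a cart page.
    reference = ["home", "search", "product", "cart", "checkout"]
    generated = ["home", "search", "search", "product", "checkout"]
    print(dtw_distance(reference, generated))
```

Because warping lets one step align with several, the repeated "search" step costs nothing here and only the missing "cart" page is penalised.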
Soft Tournament Equilibrium
The evaluation of general-purpose artificial agents, particularly those based on LLMs, presents a significant challenge due to the non-transitive nature of their interactions. When agent A defeats B, B defeats C, and C defeats A, traditional ranking methods that force a linear ordering can be misleading and unstable. We argue that for such cyclic domains, the fundamental object of evaluation should not be a ranking alone but a set-valued core, as conceptualized in classical tournament theory. This paper introduces Soft Tournament Equilibrium (STE), a differentiable framework for learning and computing set-valued tournament solutions directly from pairwise comparison data. STE first learns a probabilistic tournament model, potentially conditioned on rich contextual information. It then employs differentiable operators for soft reachability and soft covering to compute continuous analogues of two seminal tournament solutions: the Top Cycle and the Uncovered Set. The output is a set of core agents, each with a continuous membership score that can be calibrated when suitable validation labels or repeated-sampling evidence are available. We develop the theoretical foundation for STE by proving consistency with classical solutions in the zero-temperature limit, establishing Condorcet-inclusion properties, and analyzing stability and sample complexity. We evaluate the method on a planted cyclic core benchmark and on real preference/execution diagnostics. This work provides a self-contained account that re-centers general-agent evaluation on a robust tournament-theoretic foundation, moving from unstable rankings toward stable, set-valued equilibria.
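The two classical tournament solutions named in the STE abstract above can be computed directly on a small tournament, which may help when reading about their soft counterparts. The sketch below implements the hard (non-differentiable) Top Cycle and Uncovered Set, not the paper's differentiable operators. The `beats` dictionary and the agent names are hypothetical.

```python
def top_cycle(beats):
    """Agents that reach every other agent along chains of wins.

    beats[a] is the set of agents that a defeats; assumes a complete tournament.
    """
    agents = set(beats)
    result = set()
    for a in agents:
        seen, stack = {a}, [a]
        while stack:
            x = stack.pop()
            for y in beats[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        if seen == agents:
            result.add(a)
    return result


def uncovered_set(beats):
    """Agents not covered by any other agent.

    a covers b if a beats b and a also beats everything that b beats.
    """
    agents = set(beats)

    def covers(a, b):
        return b in beats[a] and beats[b] <= beats[a]

    return {b for b in agents if not any(covers(a, b) for a in agents if a != b)}


if __name__ == "__main__":
    # A beats B, B beats C, C beats A, and all three beat D.
    beats = {"A": {"B", "D"}, "B": {"C", "D"}, "C": {"A", "D"}, "D": set()}
    print(sorted(top_cycle(beats)), sorted(uncovered_set(beats)))
```

In this example no linear ranking of A, B, and C is stable, yet both solutions return the same three-agent core and exclude D.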
Descent-Guided Policy Gradient for Scalable Cooperative Multi-Agent Learning
Scaling cooperative multi-agent reinforcement learning (MARL) is fundamentally limited by cross-agent noise. When agents share a common reward, each agent's learning signal is computed from a shared return that depends on all agents, so the stochasticity of the other agents enters the signal as cross-agent noise that grows with $N$. Fortunately, many engineering systems, such as cloud computing and power systems, have differentiable analytical models that prescribe efficient system states, providing a new reference beyond noisy shared returns. In this work, we propose Descent-Guided Policy Gradient (DG-PG), a framework that augments policy-gradient updates with a noise-free descent signal derived from differentiable analytical models. We prove that DG-PG reduces policy-gradient estimator variance from $\mathcal{O}(N)$ to $\mathcal{O}(1)$, preserves the equilibria of the cooperative game, and achieves agent-independent sample complexity $\widetilde{\mathcal{O}} (1/ε)$. On a heterogeneous cloud resource scheduling task with up to 1500 agents, DG-PG converges within 20 episodes on average, while MAPPO and IPPO fail to converge under identical architectures.
comment: 11 pages, 4 figures, 9 tables; plus 19 pages of appendices
Closed-Loop Vision-Language Planning for Multi-Agent Coordination
Cooperative multi-agent reinforcement learning (MARL) struggles with sample efficiency, interpretability, and generalization. While Large Language Models (LLMs) offer powerful planning capabilities, their application has been hampered by a reliance on text-only inputs and a failure to handle the non-Markovian, partially observable nature of multi-agent tasks. We introduce COMPASS, a multi-agent framework that overcomes these limitations by integrating Vision-Language Models (VLMs) for decentralized, closed-loop decision-making. COMPASS dynamically generates and refines interpretable, code-based strategies stored in a skill library that is bootstrapped from expert demonstrations. To ensure robust coordination, it propagates entity information through a structured multi-hop communication protocol, allowing teams to build a coherent understanding from partial observations. Evaluated on the challenging SMACv2 benchmark, COMPASS significantly outperforms state-of-the-art MARL baselines. Notably, in the symmetric Protoss 5v5 task, COMPASS achieved a 57\% win rate, a 30 percentage point advantage over QMIX (27\%). Project page can be found at https://stellar-entremet-1720bb.netlify.app/.
From Cooperation to Hierarchy: A Study of Dynamics of Hierarchy Emergence in a Multi-Agent System
A central premise in evolutionary biology is that individual variation can generate information asymmetries that facilitate the emergence of hierarchical organisation. To examine this process, we develop an agent-based model (ABM) to identify the minimal conditions under which hierarchy arises in dynamic multi-agent systems, focusing on the roles of initial heterogeneity and mutation amplitude across generations. Hierarchical organisation is quantified using the Trophic Incoherence (TI) metric, which captures directional asymmetries in interaction networks. Our results show that even small individual differences can be amplified through repeated local interactions involving reproduction, competition, and cooperation, but that hierarchical order is markedly more sensitive to mutation amplitude than to initial heterogeneity. Across repeated trials, stable hierarchies reliably emerge only when mutation amplitude is sufficiently high, while initial heterogeneity primarily affects early formation rather than long-term persistence. Overall, these findings demonstrate how simple interaction rules can give rise to both the emergence and persistence of hierarchical organisation, providing a quantitative account of how structured inequality can develop from initially homogeneous populations.
comment: 16 pages, 8 figures. This version includes a minor notation clarification in Eq. (11) by introducing the parameter beta in the softmax formulation. No other changes have been made
AVA: Attentive VLM Agent for Mastering StarCraft II
We introduce AVACraft, a multimodal StarCraft II benchmark supporting both Multi-Agent Reinforcement Learning (MARL) and Vision-Language Model (VLM) paradigms. Unlike SMAC-family environments that rely on abstract state representations and exclude VLMs, AVACraft provides RGB visuals, natural language observations, and structured state information, enabling systematic comparison between training-based and zero-shot methods across 21 scenarios spanning micromanagement, coordination, and strategic planning. We establish comprehensive baselines: six MARL algorithms (IQL, QMIX, QTRAN, VDN, MAPPO, IPPO) with Swin-Transformer backbones trained for 5M steps, and multiple VLMs including proprietary (GPT-4o) and open-source (Qwen3-VL) models. Results reveal complementary strengths: MARL peaks at 19.3% win rate after 5M steps, while VLMs achieve 75-90% zero-shot with human-aligned decisions, exposing trade-offs between training efficiency, performance ceilings, interpretability, and deployment cost. Code: https://github.com/camel-ai/VLM-Play-StarCraft2.
Systems and Control (EESS)
HyParLyVe: Hyperplane Partitioning for Neural Lyapunov Verification
This work introduces HyParLyVe (Hyperplane Partitioned Lyapunov Verifier), a novel algorithm for sound and complete verification of neural Lyapunov candidates by interpreting shallow ReLU networks as hyperplane arrangements. This perspective reduces positive definiteness verification to a finite set of vertex evaluations, and the decrease condition to a bounded optimization problem over each region. We formally prove correctness of the proposed verification procedures and demonstrate that HyParLyVe achieves significant speedups over state-of-the-art methods.
A Welfarist Perspective on Fair Generation Curtailment
This paper presents a welfarist approach to fair active power curtailment in distribution grids with distributed photovoltaics. We address the lack of consistent axiomatic foundations in existing ad-hoc curtailment rules by modeling the decision as a social choice problem over feasible operating points and by deriving curtailment objectives from a set of foundational axioms that express principled stances on fairness and grid access rights. Rather than relying on the typically assumed full comparability of utilities, which can lead to undesirable outcomes in heterogeneous residential systems, we adopt a cardinal non-comparability stance on utilities. This approach requires far fewer assumptions about prosumers' private preferences while providing a rigorous basis for fair social ranking. We then present a unified framework that demonstrates that existing curtailment schemes represent specific instances of the Kalai-Smorodinsky rule applied to different normative reference points. This perspective offers grid operators an auditable, axiomatic foundation for justifying fairness in local energy systems.
Sinkhorn Ambiguity Sets for Distributionally Robust Control: Convexity, Weak Compactness, and Tractability
Classical stochastic control assumes perfect knowledge of the uncertainty affecting the plant. In practice, however, such information is often incomplete. To address this limitation, we consider a distributionally robust control (DRC) problem with ambiguity sets defined via the Sinkhorn discrepancy. Compared to other discrepancy measures based on optimal transport, such as the popular Wasserstein distance, the Sinkhorn divergence does not constrain the worst-case distribution to be discrete, and allows combining observed data with prior knowledge in the form of a reference distribution, making this choice particularly suitable when only a few noise samples are available for control design. We first study the properties of Sinkhorn ambiguity sets, establishing convexity and weak compactness under standard assumptions. We then leverage these results to prove that the Sinkhorn DR linear quadratic control problem over linear policies can be solved through convex programming, even in the presence of DR safety constraints. Finally, we validate our theoretical findings and demonstrate the effectiveness of the proposed approach on a trajectory planning example.
Online Energy Management for Bidirectional EV Charging with Rooftop PV: An Aging-Aware MPC Approach
This paper investigates the economic impact of vehicle-home-grid integration in the presence of rooftop PV, by proposing an online, aging-aware energy management strategy for an electric vehicle (EV), a household, and the electrical grid. The model predictive control-based framework explicitly exploits vehicle-to-grid (V2G) and vehicle-to-home (V2H) operation to perform energy arbitrage and increase self-consumption while respecting user-driven driving requirements. The framework optimizes power flows over a shrinking horizon using a detailed battery aging model that captures both calendar and cycle degradation, and a Transformer-based forecaster that provides short-term predictions of household load and solar irradiance. For a one-year horizon, the proposed strategy yields the lowest annual cost among all evaluated strategies. Adding PV increases the annual profit by EUR 1060.7 compared to operating without PV, and yields an economic gain of up to EUR 2410.5 over smart unidirectional charging, at the expense of only 1.27% extra battery degradation. Even in the least favorable case with no remuneration for V2G energy, bidirectional operation still delivers an economic gain of EUR 355.8 through V2H. Sensitivity analyses over V2G price ratio, EV battery size, household demand, and pickup time uncertainty confirm that these benefits persist across a wide range of scenarios and highlight the potential of EVs as active energy nodes, enabling sustainable energy management and cost-effective battery usage in real-world conditions.
comment: This manuscript has been submitted to an IEEE Transactions journal for possible publication
A Directivity-Dependent Rician K-Factor Model for Indoor Industrial Channels
We derive a physics-based, closed-form model linking antenna directivity to the root-mean-square (RMS) delay spread and mean excess delay in large reverberant indoor environments. Starting from the Rician K-factor, the ratio of line-of-sight (LOS) to scattered power, we show that K scales with the total transmit-plus-receive (Tx+Rx) antenna gain through a single reverberance factor that quantifies scatter anisotropy. For an arbitrary scatter power delay profile (PDP), we derive a general identity connecting sigma, tau, and K; the exponential scatter model is the physically motivated special case. Ray-tracing simulations over 100 random link placements in a 57300 m³ industrial hall at 75 GHz validate the model. Compact design rules map target delay-spread values to the minimum required antenna gain, enabling wideband mmWave industrial links.
comment: 4 pages, 4 figures
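To make the exponential special case in the abstract above concrete, the sketch below works through a textbook calculation under stated assumptions: a LOS tap of power K at zero excess delay plus unit-power scatter with an exponential power delay profile of decay constant `tau0` starting at the same delay. Under those assumptions the mean excess delay is tau0 / (K + 1) and the RMS delay spread is tau0 * sqrt(2K + 1) / (K + 1). This is not claimed to be the paper's identity, the link from K to antenna gain is not modelled, and `tau0` and all function names are hypothetical.

```python
import math


def mean_excess_delay(k: float, tau0: float) -> float:
    """Mean excess delay for a LOS tap plus exponential scatter (assumed model)."""
    return tau0 / (k + 1.0)


def rms_delay_spread(k: float, tau0: float) -> float:
    """RMS delay spread for the same assumed model."""
    return tau0 * math.sqrt(2.0 * k + 1.0) / (k + 1.0)


def numeric_moments(k: float, tau0: float, steps: int = 100_000, span: float = 40.0):
    """Midpoint-rule check of the closed forms on a discretised delay profile."""
    dt = span * tau0 / steps
    total = k  # LOS power at zero delay; scatter power is normalised to 1
    m1 = 0.0
    m2 = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        p = math.exp(-t / tau0) / tau0 * dt
        total += p
        m1 += t * p
        m2 += t * t * p
    mean = m1 / total
    return mean, math.sqrt(m2 / total - mean * mean)


if __name__ == "__main__":
    # Assumed values: K = 4 (linear), tau0 = 30 ns.
    print(mean_excess_delay(4.0, 30e-9), rms_delay_spread(4.0, 30e-9))
    print(numeric_moments(4.0, 30e-9))
```

In this simplified model a larger K shrinks both delay statistics, which matches the qualitative design rule in the abstract that more antenna gain lowers delay spread.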
Exact and Evolutionary Algorithms for Sequential Multi-Objective Transmission Topology Planning
We address day-ahead transmission topology planning and congestion management as a sequential, multi-objective optimization problem and develop two complementary algorithms for it: an exact enumeration method and a tailored evolutionary heuristic. The problem is formulated with four operational objectives reflecting real TSO decision criteria: worst-case line loading under $N-1$ security, topological depth, number of switching actions, and time spent in non-reference topologies, over a 24-hour horizon. We introduce the block algorithm, an exact method that exploits the temporal block structure of feasible strategies to enumerate the complete Pareto front; for fixed operational bounds on depth and switch count, its evaluation count grows polynomially with the planning horizon. We complement it with a multi-objective evolutionary algorithm based on NSGA-III, with structure-guided initialization and problem-specific variation operators tailored to the topology-planning structure. Using real operational data from the Dutch high-voltage grid operated by TenneT TSO, we show that the block algorithm computes the full Pareto front for a highly congested day in under three minutes, and that the evolutionary algorithm converges toward but does not recover the exact front. The block algorithm thus provides both a practical decision-support tool and a ground-truth benchmark for future heuristic and learning-based methods on this problem class.
comment: 29 pages, 6 figures
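For readers unfamiliar with multi-objective comparison, the sketch below shows a plain non-dominated filter over candidate strategies scored on several minimization objectives. It is a brute-force illustration of what a Pareto front is, not the block algorithm or the NSGA-III variant from the abstract. The objective tuples are made-up placeholder values.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the non-dominated points, preserving input order (all objectives minimized)."""
    front = []
    for i, p in enumerate(points):
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i):
            front.append(p)
    return front


if __name__ == "__main__":
    # Hypothetical scores: (worst-case loading, depth, switching actions, hours off-reference)
    candidates = [
        (0.95, 1, 2, 6),
        (0.90, 2, 4, 10),
        (0.97, 1, 2, 8),   # dominated by the first candidate
        (1.05, 0, 0, 0),
    ]
    for point in pareto_front(candidates):
        print(point)
```

The filter needs a quadratic number of comparisons in the number of candidates. That is acceptable for illustration but does not address how candidates are enumerated.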
Global exponential stabilization of a force- and torque-actuated unicycle by flexible-step MPC
We study the problem of global exponential stabilization of a force- and torque-controlled unicycle model in discrete time. To this end, we extend a recently introduced approach to model predictive control (MPC) in which a flexible number of inputs is implemented in every iteration. We present the first flexible-step MPC protocol with state-dependent weights for average descent. Notably, the proposed method relies neither on a suitable design of running or terminal cost functions nor on a suitable choice of terminal constraints. Instead, stability is guaranteed through a generalized discrete-time control Lyapunov function. We establish a new theoretical framework for global exponential stabilization of general nonlinear discrete-time control systems by flexible-step MPC. The obtained results go beyond the unicycle example. However, given the importance of the unicycle dynamics, we make that a focal point of our work. For the particular case of the dynamic (second-order) unicycle model, we show that global exponential stability cannot be attained in the classical sense, but in a slightly weaker sense. The proposed flexible-step MPC method is shown to induce the best possible notion of global exponential stability for this model. We provide explicit rules for the choice of parameters, which guarantee feasibility and global exponential stability. Our numerical simulations show that the discrete MPC method also works very well in applications to a continuous-time torque-actuated unicycle.
Feasibility-aware Hybrid Control for Motion Planning under Signal Temporal Logics
In this work, a novel method for planar task and motion planning based on hybrid modeling is proposed. By virtue of a discrete variable which models local constraint satisfaction and enables local feasibility analysis, the proposed control architecture unifies planning with control design. Concurrently, control barrier functions are designed on a transformed disk version of the original nonconvex and geometrically complex robotic workspace, thus amending the issue of deadlocks. Simulations of the proposed method indicate effective handling of multiple overlapping spatio-temporal tasks even in the face of input saturation.
Almost Sure Reachability in Continuous-time Stochastic Systems
We provide certificates for almost sure reachability of continuous-time stochastic systems governed by stochastic differential equations (SDEs). We first show, using a double-well Langevin system, that a standard Euler-Maruyama discretization may fail to preserve the almost sure reachability property of the system. This observation motivates us to develop certificates for almost sure reachability directly on the continuous-time system. We introduce a pair of certificates, a drift function and a variant function, and prove necessity and sufficiency for almost sure reachability of an open bounded target set. Using these certificates, for linear SDEs, we give a characterization of almost sure reachability in terms of the spectral structure of the system matrices. For polynomial SDEs, we fix a polynomial template for the drift function and choose the variant function template as an exponential function composed with a polynomial. This allows us to translate the conditions in the certificates into sum-of-squares (SOS) constraints. We then propose an alternating scheme to resolve bilinearities. We illustrate the approach on the double-well Langevin example, showing that continuous-time SOS certificates recover almost sure reachability that is lost under time discretization. Moreover, we verify the SOS approach on a polynomial system.
comment: 16 pages, 6 figures
A simulation- and model-based approach to PI control pairing and tuning for the pyro process in a cement plant
The operation of the pyro process in cement production significantly affects the energy efficiency and sustainability of the cement plant, especially for reductions in carbon dioxide emissions. Hence, pyro process control is essential to obtain efficient and sustainable operation of cement plants. In this paper, we demonstrate how simulations and models can be utilized to evaluate and design control strategies for the pyro section in cement plants. We apply a novel differential algebraic equation (DAE) model for dynamic simulation of the pyro-section in cement plants to design decentralized PI controllers for the pyro-section. We utilize the pyro-process model to evaluate the control structure design. Through linearization of the pyro-process model, we apply the Relative Gain Array (RGA) method to choose and evaluate the pairings of the manipulated variables (MVs) and the controlled variables (CVs). Using simulations of the pyro-section, we generate step responses to estimate transfer models and apply Internal Model Control (IMC) for the tuning of the individual decentralized single-input single-output (SISO) PI controllers. Closed-loop simulations of the PI controllers demonstrate that PI controllers with IMC parameters provide smoother and faster responses compared with manually tuned PI parameters.
comment: 6 pages, 6 figures, 3 tables, submitted to the 67th International Conference of Scandinavian Simulation Society (SIMS 2026)
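The pairing and tuning steps can be illustrated on a toy 2x2 system. The sketch below computes the Relative Gain Array from a steady-state gain matrix and applies one common IMC-based PI rule for a first-order-plus-dead-time (FOPDT) model, Kc = tau / (K (lambda + theta)) with Ti = tau. The gain matrix, the model parameters, and the choice of this particular IMC rule are assumptions for illustration and are not taken from the paper.

```python
def rga_2x2(gain):
    """Relative Gain Array of a 2x2 steady-state gain matrix [[g11, g12], [g21, g22]]."""
    (g11, g12), (g21, g22) = gain
    if g11 * g22 == 0:
        raise ValueError("g11 * g22 must be nonzero for this closed form")
    denom = 1.0 - (g12 * g21) / (g11 * g22)
    if denom == 0:
        raise ValueError("gain matrix is singular")
    lam = 1.0 / denom
    # Rows and columns of an RGA sum to one.
    return [[lam, 1.0 - lam], [1.0 - lam, lam]]


def imc_pi(gain, tau, theta, lam):
    """PI settings (Kc, Ti) for a FOPDT model using an IMC-style rule.

    gain: process gain K, tau: time constant, theta: dead time,
    lam: desired closed-loop time constant (the IMC tuning knob).
    """
    if gain == 0 or tau <= 0 or theta < 0 or lam <= 0:
        raise ValueError("invalid FOPDT or tuning parameters")
    return tau / (gain * (lam + theta)), tau


if __name__ == "__main__":
    print(rga_2x2([[2.0, 0.5], [0.4, 1.5]]))  # diagonal elements near 1 favor diagonal pairing
    print(imc_pi(gain=2.0, tau=10.0, theta=1.0, lam=4.0))
```

Diagonal RGA elements close to one suggest pairing each MV with the CV on the same index. Negative elements indicate pairings to avoid.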
Event-Based Dynamic Programming for Pumped-Storage Hydropower Scheduling
This paper studies the single-unit pumped-storage hydropower (PSH) plant scheduling problem with reservoir dynamics, generation and pumping limits, ramping constraints, start-up and shut-down costs, and minimum up/down-time requirements. A new event-based formulation is proposed in which an operating schedule is represented as a sequence of mode-specific events, with dispatch decisions within each event determined by linear programs. Based on this construction, the original time-indexed mixed-integer formulation is reformulated exactly as a deterministic dynamic program on an event network. The framework is modular and can be extended to incorporate additional operating modes, such as hydraulic short-circuit operation, by introducing corresponding event modules without significantly changing the overall event-network structure. To obtain tractable solution methods, a finite-grid approximation of the event network is developed, leading to a linear programming formulation for the discretized model. In addition, an event-based branch-and-bound algorithm with linear program-based bounds is proposed for the continuous-state problem. Numerical results demonstrate that the proposed event-based framework provides a computationally effective alternative to the conventional time-indexed formulation, while offering substantial modeling flexibility for PSH scheduling problems.
Learning Reactive Dexterous Grasping via Hierarchical Task-Space RL Planning and Joint-Space QP Control
In this work, we propose a hybrid hierarchical control framework for reactive dexterous grasping that explicitly decouples high-level spatial intent from low-level joint execution. We introduce a multi-agent reinforcement learning architecture, specialized into distinct arm and hand agents, that acts as a high-level planner by generating desired task-space velocity commands. These commands are then processed by a GPU-parallelized quadratic programming controller, which translates them into feasible joint velocities while strictly enforcing kinematic limits and collision avoidance. This structural isolation not only accelerates training convergence but also strictly enforces hardware safety. Furthermore, the architecture unlocks zero-shot steerability, allowing system operators to dynamically adjust safety margins and avoid dynamic obstacles without retraining the policy. We extensively validate the proposed framework through a rigorous simulation-to-reality pipeline. Real-world hardware experiments on a 7-DoF arm equipped with a 20-DoF anthropomorphic hand demonstrate highly robust zero-shot transferability for dexterous grasping to a diverse set of unseen objects, highlighting the system's ability to reactively recover from unexpected physical disturbances in unstructured environments.
comment: 18 pages
Will the Carbon Border Adjustment Mechanism Impact European Electricity Prices? A GNN-Based Network Analysis
The European Union's Carbon Border Adjustment Mechanism (CBAM) creates a complex challenge for the interconnected European electricity market. Traditional static analyses often miss the cross-border spillover effects that are vital for understanding this policy. This paper addresses this gap by developing a spatio-temporal Graph Neural Network (GNN) framework. It quantifies how CBAM affects electricity prices and carbon intensity (CI) at the same time. We modeled a subgraph of eight European countries. Our results suggest that CBAM is not just a uniform tax. Instead, it acts as a tool that transforms the market and creates structural differences. In our simulated scenarios, we observe that low-carbon countries like France and Switzerland can gain a competitive advantage. This suggests a potential decrease in their domestic electricity prices. Meanwhile, high-carbon countries like Poland face a double burden of rising costs. We identify the primary driver as a fundamental shift in the market's merit order.
On Surprising Effects of Risk-Aware Domain Randomization for Contact-Rich Sampling-based Predictive Control
Domain randomization (DR) is widely used in policy learning to improve robustness to modeling error, but remains underexplored in contact-rich sampling-based predictive control (SPC), where rollout quality is highly sensitive to uncertainty. In this work, we take the first step by studying risk-aware DR in predictive sampling on a simple yet representative Push-T task, comparing average, optimistic, and pessimistic rollout aggregations under randomized model instances. Our initial results suggest that DR affects not only robustness to model error, but also the effective cost landscape seen by the sampling-based optimizer, by reshaping the basin of attraction around contact-producing actions. This opens up potential for exploring better grounded risk-aware contact-rich SPC under model uncertainty. Video: https://youtu.be/f1F0ALXxhSM
comment: 5 pages, 3 figures
Model Predictive Static Programming for Discrete-Time Optimal Control on Lie Groups
This paper extends the Model Predictive Static Programming (MPSP) framework for nonlinear systems evolving on Euclidean spaces to simple mechanical systems evolving on Lie groups. Classical optimal control approaches based on Pontryagin's Maximum Principle (PMP) lead to nonlinear two-point boundary value problems (TPBVPs), whose numerical solution becomes particularly challenging on nonlinear configuration spaces. To overcome this difficulty, the proposed Lie-group MPSP framework reformulates the finite-horizon optimal control problem as a sequence of static quadratic programs that admit closed-form control updates, thereby avoiding the need to solve TPBVPs directly. The development relies on left-trivialized variations, intrinsic linearization on Lie groups, and a recursive computation of terminal sensitivity matrices, which together enable computationally efficient real-time implementation. The proposed method is demonstrated through optimal flipping maneuvers of a variable-pitch quadrotor (VPQ) and a single-main-rotor helicopter (SMRH), both of which are capable of generating negative thrust. For validation, continuous-time necessary and sufficient optimality conditions are derived, and the corresponding TPBVP solutions are compared against the trajectories generated by the proposed MPSP method in numerical simulations. In addition, the proposed algorithm is systematically compared with the iterative Linear Quadratic Regulator (iLQR) method, and a detailed numerical study is presented to highlight the relative performance and computational features of the two approaches.
Adaptive Diagonal Loading for Norm Constrained Beamforming
Reliable adaptive beamforming is critical for large microphone arrays operating in highly dynamic acoustic environments. In scenarios characterized by fast-moving talkers and interferers, the available sample support for estimating the spatial correlation matrix is often snapshot-deficient. This deficiency, coupled with array imperfections, degrades the White Noise Gain (WNG), leading to severe target signal cancellation. To ensure stable and robust beamforming, we propose a novel adaptive diagonal loading method that guarantees the WNG remains strictly within specified bounds. By leveraging the Kantorovich inequality, we map the desired WNG to a strict upper bound on the condition number of the correlation matrix. Furthermore, we present three estimation techniques for the adaptive loading level, ranging from trace-based bounding to exact eigenvalue decomposition, offering scalable computational complexities of $\mathcal{O}(M)$, $\mathcal{O}(M^2)$, and $\mathcal{O}(M^3)$. Our approach demonstrates highly stable beamforming under fast-changing interference.
comment: 5 pages, 5 figures
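One way to see how a WNG target can translate into a condition-number bound is the following sketch. It assumes an MVDR beamformer with a unit-modulus steering vector of length M. Under that assumption the Kantorovich inequality gives WNG / M >= 4 kappa / (kappa + 1)^2, where kappa is the condition number of the loaded correlation matrix. This derivation and the function names are illustrative and may differ from the paper's exact formulation.

```python
import math


def max_condition_number(wng_ratio):
    """Largest kappa satisfying 4*kappa/(kappa+1)**2 >= wng_ratio, with wng_ratio = WNG_min / M."""
    if not 0.0 < wng_ratio <= 1.0:
        raise ValueError("wng_ratio must lie in (0, 1]")
    return (2.0 - wng_ratio + 2.0 * math.sqrt(1.0 - wng_ratio)) / wng_ratio


def loading_from_eigs(lam_min, lam_max, kappa_max):
    """Smallest mu >= 0 such that (lam_max + mu) / (lam_min + mu) <= kappa_max."""
    if kappa_max <= 1.0:
        raise ValueError("kappa_max must exceed 1 for a finite loading level")
    return max(0.0, (lam_max - kappa_max * lam_min) / (kappa_max - 1.0))


def loading_from_trace(diagonal, kappa_max):
    """Conservative O(M) loading level using lam_max <= trace(R) and lam_min >= 0."""
    return loading_from_eigs(0.0, sum(diagonal), kappa_max)


if __name__ == "__main__":
    kappa = max_condition_number(0.5)          # allow WNG to fall to half of its maximum M
    print(kappa)
    print(loading_from_eigs(1.0, 100.0, kappa))  # exact-eigenvalue variant
    print(loading_from_trace([30.0, 40.0, 31.0], kappa))  # trace-only variant
```

The trace-based variant never needs an eigendecomposition but loads more heavily than necessary. The eigenvalue-based variant gives the smallest loading that meets the bound.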
Analysis of a Competitive Bivirus SIS Epidemic Model with Game Theoretic Social Distancing
We propose a competitive bi-virus model with dynamic social distancing behavior. Our model illustrates how public perception of different viruses changes the conditions for their eradication, their coexistence, or the dominance of one over the other. We show that our model is not monotone, in contrast to the classic bi-virus model. We detail how social distancing behavior produces different sets of equilibria than the classic bi-virus model and changes the criteria for their stability. In particular, we detail the set of disease free equilibria (DFE) present in our model and identify necessary and sufficient conditions for almost global exponential stability of the same. We prove similar global results for all but one non-DFE isolated (unilateral) equilibria and local stability results for the remainder. We also consider coexistence equilibria; we show such equilibria, when they exist, take the form of lines of equilibria and give local conditions for their stability. Finally, we illustrate our theoretical findings with numerical examples.
Dynamic Quantum-Assisted Co-Design of Control Tuning and Lyapunov Stability Synthesis for Nonlinear Systems
This paper proposes a dynamic quantum-assisted co-design framework for nonlinear closed-loop systems in which controller parameters and Lyapunov-certificate parameters are redesigned jointly at successive decision epochs. Unlike conventional nonlinear control designs that typically tune controller gains offline and verify stability separately, the proposed method embeds performance improvement and Lyapunov-based stability synthesis within a unified online optimization loop. The main novelty is a two-step computational structure that first contracts the continuous admissible search region around the current operating condition using a Black-Hole-based calibration procedure and then constructs a finite binary representation only over this calibrated region. The encoded objective is obtained from sampled nonlinear closed-loop evaluations and approximated by a local quadratic pseudo-Boolean surrogate, enabling an Ising-type Hamiltonian representation suitable for quantum-assisted optimization. Quantum imaginary time evolution is then used to explore the encoded Hamiltonian, and the resulting candidate bitstrings are decoded into continuous controller and Lyapunov parameters. To reduce dependence on the surrogate model, the decoded candidates are re-evaluated using the original nonlinear closed-loop cost and Lyapunov penalties before the final update is applied. The framework can accommodate different Lyapunov decay specifications by modifying the stability penalty and is validated on first-order nonlinear consensus, second-order nonlinear consensus, and induction-motor drive control examples. The implementation code used to generate the reported results is available on GitHub: https://github.com/LSU-RAISE-LAB/DQCLS-NS.
StormWave: An Open-Source Portable SDR Platform for Over-the-Air Resilience Evaluation of Terrestrial and Aerial Communications
This paper presents StormWave, an open-source, portable software-defined Radio Frequency (RF) interference generation and monitoring platform designed for realistic field-based evaluation of the resilience of wireless communication systems. StormWave enables seamless composition and runtime switching among a wide range of narrowband and wideband waveforms, while supporting multiple digital modulations, adaptive coding, and multi-radio orchestration with real-time spectrum visualization. We evaluate the effectiveness of StormWave through both outdoor ground and air-to-air (A2A) experiments. Ground experiments demonstrate clear waveform- and modulation-dependent interference effects under realistic propagation conditions, while A2A experiments reveal pronounced distance-dependent constellation distortion and access-symbol degradation under active interference. The StormWave source code will be released to the community, with the expectation that StormWave will be used as a flexible, extensible, and field-ready platform for systematically validating interference resilience of wireless systems under realistic operating conditions.
comment: 7 pages, 10 figures
Building Power Grid Models from Open Data: A Complete Pipeline from OpenStreetMap to Optimal Power Flow
Access to realistic transmission grid models is essential for power systems research, yet detailed network data in the United States remains restricted under critical-infrastructure regulations. We present a pipeline that constructs complete, OPF-solvable transmission network models entirely from publicly available data. The five-stage pipeline (1) extracts power infrastructure from OpenStreetMap via a local Overpass API instance, (2) reconstructs bus-branch topology through voltage inference, line merging, and transformer detection, (3) estimates electrical parameters using voltage-class lookup tables calibrated with U.S. Energy Information Administration (EIA) plant-level data, (4) allocates hourly demand from EIA-930 to individual buses using US Census population as a spatial proxy, and (5) solves both DC and AC optimal power flow using PowerModels.jl with a progressive relaxation strategy that automatically loosens constraints on imprecise models. We validate the pipeline on all 48 contiguous US states and six multi-state regions, including the full Western (5,076 buses) and Eastern (21,697 buses) Interconnections. Of the 48 single-state models, 42 (88%) converge at the strictest relaxation level for AC-OPF at peak hour and 44 (92%) off-peak. Dispatch costs (median $22/MWh) and system losses (median 1.0%) are consistent with real wholesale-market outcomes. The pipeline relies exclusively on open data sources, enabling reproducible grid analysis without proprietary data. All 54 models (48 single-state and 6 multi-state) are publicly released at https://github.com/microsoft/GridSFM.
comment: All models are publicly released at https://github.com/microsoft/GridSFM
OPENJ: A Conceptual Framework for Open-Source Digital Human Modeling and Ergonomic Assessment in a CAD Environment
Industrial workplace challenges range from musculoskeletal disorders -- a leading cause of occupational injury -- to suboptimal workstation layouts, inefficient task sequences, and poor human-equipment fit. Digital human modeling (DHM) tools address several of these challenges by placing a scalable virtual mannequin in a computer-aided design (CAD) environment, enabling engineers to evaluate ergonomic risk through standardized assessment methods (RULA, REBA, NIOSH Lifting Equation, OWAS), optimize workstation layouts for reach and visibility, predict task postures through inverse kinematics, and simulate operations before physical implementation. Despite four decades of development since the Jack system originated at the University of Pennsylvania in the 1980s, the integrated DHM capability set -- anthropometric mannequin, posture prediction, ergonomic assessment, and CAD integration -- remains exclusive to commercial platforms such as Siemens Tecnomatix Jack (Process Simulate), Dassault DELMIA, Humanetics RAMSIS, and the University of Iowa's Santos system. These platforms operate under proprietary, vendor-quoted pricing models, and their acquisition and operating costs, together with closed-source implementations, have been repeatedly identified as practical adoption barriers for individual researchers, small-to-medium enterprises, and educational institutions. Organizations without access resort to manual observational methods -- paper-based worksheets applied to photographs or video -- sacrificing the predictive power and reproducibility that computational analysis provides. The paper serves as a design blueprint for (OpenJane/Joe), positioning the project for subsequent open-source implementation and community adoption.
comment: 11 pages, 2 figures, submitted to ASME IMECE 2026
Globally Solving Unbalanced Optimal Transport and Density Control for Gaussian Distributions
In this article, we study unbalanced optimal transport (UOT) and establish a control-theoretic dynamical extension, which we call the unbalanced density control (UDC), for a class of Gaussian reference measures. In the static setting, we consider UOT with quadratic transport cost and Kullback--Leibler penalties on the marginals relative to prescribed Gaussian measures. We show that the infinite-dimensional variational problem admits an exact Gaussian reduction, yielding a finite-dimensional optimization over masses, means, and covariances, together with a closed-form expression for the optimal transported mass. We then formulate UDC for discrete-time linear systems, where the initial and terminal state measures are imposed softly through KL penalties and the intermediate evolution is governed by controlled linear dynamics with quadratic control cost. For this problem, we prove that any feasible solution can be replaced, without loss of optimality, by a Gaussian initial measure and an affine-Gaussian control policy. This leads to an exact finite-dimensional reformulation and, after a standard covariance-steering lifting, to an SDP-based optimization for fixed mass, again coupled with a closed-form mass update. We further establish existence of optimal solutions and identify a sufficient condition under which the affine-Gaussian UDC policy is deterministic. These results provide globally optimal solution methods for both Gaussian UOT and Gaussian UDC. Finally, we illustrate our results with several numerical examples.
comment: 28 pages; submitted to a journal
Thinking fast and slow -- decision intelligence for power systems
Decision-making in power systems spans multiple timescales - from milliseconds to prevent surges, to seconds to balance frequency and protect grid assets, to minutes for real-time energy balancing, to day-ahead, seasonal, and long-term planning. Growing uncertainty and complexity, driven by intermittent renewables and distributed energy resources (DER), demand fresh approaches to power system intelligence and architecture. Daniel Kahneman describes the interplay of two systems of human decision-making: System 1 that is fast, intuitive, experience based, reactive, and System 2 that is slow, deliberate, analytical. Similarly, octopus intelligence illustrates a model for distributed yet coordinated decision-making between central and edge intelligence. Future power systems must embed coordinated intelligence that operates across diverse timescales and with placement at both edge and centralized levels. This paper maps decision-intelligence in power systems against System 1 and 2 and edge-central architecture paradigms based on the trade-offs inherent in decision making such as speed/latency, energy cost/compute, accuracy, and robustness. The framework inspires an agentic intelligence architecture - laying the foundation for trustworthy, autonomous power systems of the future.
comment: 5 pages, This work has been submitted to IEEE for possible publication
Safety by Invariance, Liveness through Refinement: Heterogeneous Contract Framework for Co-Design of Layered Control
Real-world control systems must achieve long-horizon objectives (liveness) while respecting continuous-time safety constraints, a combination that motivates hierarchical layered control architectures (LCAs). Existing LCA research, however, lacks (i) a uniform specification language across discrete planning and continuous execution, (ii) formal guarantees that specifications are preserved when interconnecting subsystems at heterogeneous time scales, and (iii) compositional separation between layers, owing to reliance on naive input-filtering laws. This paper addresses all three gaps by importing the safety--liveness decomposition into a heterogeneous assume--guarantee framework: safety is enforced by invariance at the continuous-time layer, while liveness is achieved through refinement at the discrete-time layer, with inter-layer coordination formalized via vertical refinement and timing-compatibility conditions. We instantiate this contract with a novel LCA combining an MPC planner, an input-to-state stabilizing (ISS) low-level controller, and a reference-governor bridge, and validate it on a Hybrid Energy Storage System (HESS) comprising a battery and a supercapacitor.
comment: 22 pages
A Rule-Aware Prompt Framework for Structured Numeric Reasoning in Cyber-Physical Systems
Smart grids rely on high-dimensional numeric telemetry and explicit operating rules to maintain reliable and secure operation. Recent large language models (LLMs) are increasingly considered as candidate decision-support components for power system operations, yet most deployments focus on textual logs, alerts, or operator messages and do not directly address rule-grounded reasoning over numeric grid measurements. This paper proposes a rule-aware prompt framework that systematically encodes power system domain context, numeric normalization, and decision rules into a modular prompt architecture for LLMs. The framework decomposes prompts into reusable modules, including role, domain context, numeric normalization, rule-aware reasoning, value block, and output schema, and exposes an interface for plugging in diverse grid operating rules. A key design element separates rule specification from the representation of normalized numeric deviations, enabling concise prompts aligned with power system criteria. To illustrate its behavior, we instantiate the framework on numeric anomaly detection in the IEEE 118-bus transmission network and evaluate several prompting and adaptation regimes. The results show that rule-aware, z-score-based value blocks and a hybrid LLM+DL architecture substantially improve both consistency with grid operating rules and anomaly detection performance while reducing token usage, providing a reusable bridge between grid telemetry and general-purpose LLMs.
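A modular prompt of this kind could be assembled along the following lines. The module names follow the list in the abstract. The z-score formatting, the bracketed section layout, the sample bus name, and the rule text are hypothetical choices made for illustration and do not reproduce the paper's templates.

```python
# Module order follows the abstract; the identifiers themselves are illustrative.
MODULE_ORDER = [
    "role",
    "domain_context",
    "numeric_normalization",
    "rule_aware_reasoning",
    "value_block",
    "output_schema",
]


def z_score(value, mean, std):
    """Standardized deviation of a measurement from its baseline."""
    if std <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / std


def value_block(readings, baseline):
    """Render measurements as compact z-score lines, one per signal, sorted by name."""
    lines = []
    for name in sorted(readings):
        mean, std = baseline[name]
        lines.append(f"{name}: z={z_score(readings[name], mean, std):+.2f}")
    return "\n".join(lines)


def assemble_prompt(modules):
    """Concatenate the prompt modules in a fixed order, failing loudly if one is missing."""
    missing = [name for name in MODULE_ORDER if name not in modules]
    if missing:
        raise KeyError(f"missing prompt modules: {missing}")
    return "\n\n".join(f"[{name}]\n{modules[name]}" for name in MODULE_ORDER)


if __name__ == "__main__":
    block = value_block({"bus_12_voltage_pu": 1.08}, {"bus_12_voltage_pu": (1.0, 0.02)})
    prompt = assemble_prompt({
        "role": "You are a grid operations assistant.",
        "domain_context": "IEEE 118-bus transmission network.",
        "numeric_normalization": "Values are z-scores against a per-signal baseline.",
        "rule_aware_reasoning": "Flag a signal when |z| exceeds 3 (hypothetical rule).",
        "value_block": block,
        "output_schema": "Return JSON with fields 'anomaly' and 'signals'.",
    })
    print(prompt)
```

Keeping the rule text in its own module, separate from the normalized values, is what lets a different operating rule be swapped in without touching the value block.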
Autonomous Reliability Qualification of Ga$_2$O$_3$-based Hydrogen and Temperature Sensors via Safe Active Learning
We present a Safe Active Learning (SAL) framework for autonomous reliability characterization of rectifying Ga$_2$O$_3$-based devices under coupled thermal and hydrogen stress. SAL treats rectification as a device-physics-motivated safety observable and models its evolution over elapsed time, temperature, and H$_2$ concentration using a Gaussian-process surrogate. To handle condition-dependent and uncertain experiment durations, the method combines an adaptive completion-time window, time-window lower-confidence-bound safety checks, a trust region anchored to previously verified safe conditions, and a two-phase strategy that transitions from conservative safe exploration to progressively relaxed rectification targets as the device degrades. We first evaluate SAL in simulation, where it safely expands the explored region while learning the evolving rectification surface. We then demonstrate SAL experimentally on an automated high-temperature probe-station platform using a Pt/Cr$_2$O$_3$:Mg/$β$-Ga$_2$O$_3$ device. In the reported campaign, phase 1 incurred only one unsafe measurement associated with spurious current-voltage sweeps, while phase 2 intentionally probed lower-rectification regimes. Finally, we use the curated SAL dataset for offline long-horizon forecasting of device response at a target voltage using a structured Gaussian-process model with a condition-dependent Kohlrausch--Williams--Watts mean and a residual covariance kernel. The model captures long-time, saturating degradation trends in an auxiliary validation dataset, illustrating how safety-aware autonomous experimentation enables both conservative characterization and subsequent degradation modeling. Although demonstrated here for a rectifying Ga$_2$O$_3$ device, SAL is applicable to other systems where a measurable in situ safety observable can be defined.
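The Kohlrausch--Williams--Watts (KWW) mean mentioned above is a stretched exponential. A minimal sketch of such a mean function follows, assuming the response relaxes from an initial value `y0` to a saturation value `y_inf`. The parameterization and names are illustrative. In the paper the parameters depend on temperature and H$_2$ concentration, which is not modeled here.

```python
import math


def kww_response(t, y0, y_inf, tau, beta):
    """Stretched-exponential relaxation: y_inf + (y0 - y_inf) * exp(-(t / tau) ** beta)."""
    if t < 0 or tau <= 0 or not 0.0 < beta <= 1.0:
        raise ValueError("need t >= 0, tau > 0, and 0 < beta <= 1")
    return y_inf + (y0 - y_inf) * math.exp(-((t / tau) ** beta))


if __name__ == "__main__":
    # Hypothetical parameters: the response decays from 5.0 toward 1.0.
    for hours in (0.0, 10.0, 100.0, 1000.0):
        print(hours, kww_response(hours, y0=5.0, y_inf=1.0, tau=100.0, beta=0.5))
```

With beta below one the curve drops quickly at first and then approaches saturation much more slowly than a simple exponential, which matches the long-time, saturating trends the abstract describes.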
A Mission-Centric Cyber-Resilience Benchmark for Silent-Watch Operation of Electrified Ground-Platform Power Architectures
Silent-watch operation makes electrified ground platforms depend on supervisory energy management because mission loads must be sustained from stored energy while the engine is off. This paper develops a mission-centric cyber-resilience benchmark for this operating mode. The benchmark connects battery state-of-charge (SOC) spoofing to mission outcomes rather than evaluating the attack only through detector response or control error. It combines a reduced-order DC-bus model, residual-based detection, fallback shedding, and five mission-facing metrics for endurance, critical-load service, priority-weighted loss-of-load cost, unsafe-voltage exposure, and detection delay. The study shows that SOC spoofing creates a structured stealth-versus-impact envelope. Small biases have limited mission effect, intermediate biases produce an endurance deficit well approximated by a first-order expression in bias magnitude, shed power, and average battery draw, and large biases disable the SOC-driven guard. The results also show that defense value depends on fallback depth, not detection alone. An undersized fallback action can leave the Defended case failing to complete the mission despite early detection. MATLAB-to-Simulink parity across five regression scenarios provides a software-verified basis for hardware-in-the-loop testing.
Beyond the Bellman Fixed Point: Geometry and Fast Policy Identification in Value Iteration
Q-value iteration (Q-VI) is usually analyzed through the \(\gamma\)-contraction of the Bellman operator. This argument proves convergence to \(Q^*\), but it gives only a coarse account of when the induced greedy policy becomes optimal. We study discounted Q-VI as a switching system and focus on the practically optimal solution set (POSS), the set of \(Q\)-functions whose tie-broken greedy policies are optimal. The main result shows that Q-VI reaches the optimal action class in finite time by entering an invariant tube around \(\mathcal X_1=Q^*+\operatorname{span}(\mathbf 1)\), which is contained in the POSS. For every \(\varepsilon>0\), the distance to \(\mathcal X_1\) satisfies an exponential bound with rate \((\bar{\rho}+\varepsilon)^k\), where \(\bar{\rho}\) is the joint spectral radius of the projected switching family restricted to directions transverse to \(\mathcal X_1\). When \(\bar{\rho}<\gamma\), this transverse convergence is faster than the classical contraction rate. The analysis separates fast policy identification from the subsequent convergence to \(Q^*\), which may still be governed by the all-ones mode. We also give spectral and graph-theoretic conditions under which the strict inequality \(\bar{\rho}<\gamma\) holds or fails.
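The gap between policy identification and value convergence is easy to observe numerically. The sketch below runs Q-VI on a tiny deterministic MDP and records the tie-broken greedy policy at each iteration. The two-state MDP, the lowest-index tie-breaking rule, and the helper names are illustrative assumptions. The sketch does not compute the joint spectral radius discussed in the abstract.

```python
def q_value_iteration(rewards, next_state, gamma, iterations):
    """Synchronous Q-VI for a deterministic MDP, starting from Q = 0.

    rewards[s][a] is the immediate reward and next_state[s][a] the successor state.
    Returns the final Q-table and the greedy policy after every iteration.
    """
    n_states, n_actions = len(rewards), len(rewards[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    policies = []
    for _ in range(iterations):
        v = [max(row) for row in q]
        q = [
            [rewards[s][a] + gamma * v[next_state[s][a]] for a in range(n_actions)]
            for s in range(n_states)
        ]
        # Ties are broken toward the lowest action index.
        policies.append([row.index(max(row)) for row in q])
    return q, policies


def identification_step(policies):
    """First iteration (1-based) from which the greedy policy never changes again."""
    final = policies[-1]
    k = len(policies)
    while k > 0 and policies[k - 1] == final:
        k -= 1
    return k + 1


if __name__ == "__main__":
    rewards = [[0.0, 1.0], [2.0, 0.0]]
    next_state = [[0, 1], [1, 0]]
    q, policies = q_value_iteration(rewards, next_state, gamma=0.9, iterations=200)
    print("greedy policy fixed from iteration", identification_step(policies))
    print("final Q:", q)
```

In this example the greedy policy is already optimal after the first sweep, while the Q-values themselves still need many sweeps to approach their limits at the rate set by the discount factor.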
Lyapunov-Certified Direct Switching Theory for Q-Learning
Q-learning is a fundamental algorithmic primitive in reinforcement learning. This paper develops a new framework for analyzing Q-learning from a switching-system viewpoint. In particular, we derive a direct stochastic switching-system representation of the Q-learning error. The key observation is that the Bellman maximization error can be expressed exactly as an average of action-wise Q-errors under a suitable stochastic policy. The resulting recursion has a switched linear conditional-mean drift and martingale-difference noise. To the best of our knowledge, this is the first convergence-rate analysis of standard Q-learning whose leading exponential rate is expressed through the joint spectral radius (JSR) of a direct switching family. Since the JSR is the exact worst-case exponential rate of the associated switched linear drift, the resulting rate is among the tightest drift-based rates that can be certified for this Q-learning representation. Building on this representation, we prove finite-time bounds based on a product-defined JSR-induced Lyapunov function and also give an optional common quadratic Lyapunov certificate. The quadratic certificate is only a sufficient condition and hence applies only to instances for which the certificate is feasible, whereas the JSR-induced Lyapunov construction applies to the full direct switching family whenever its JSR is below one. When feasible, the quadratic certificate replaces product-based verification by a computable matrix inequality and gives a simpler stochastic bound. We further extend the framework to Markovian observation models.
LightSBB-M: Bridging Schrödinger and Bass for Generative Diffusion Modeling
The Schrödinger Bridge and Bass (SBB) formulation, which jointly controls drift and volatility, is an established extension of the classical Schrödinger Bridge (SB). Building on this framework, we introduce LightSBB-M, an algorithm that computes the optimal SBB transport plan in only a few iterations. The method exploits a dual representation of the SBB objective to obtain analytic expressions for the optimal drift and volatility, and it incorporates a tunable parameter beta greater than zero that interpolates between pure drift (the Schrödinger Bridge) and pure volatility (Bass martingale transport). We show that LightSBB-M achieves the lowest 2-Wasserstein distance on synthetic datasets against state-of-the-art SB and diffusion baselines with up to 32 percent improvement. We also illustrate the generative capability of the framework on an unpaired image-to-image translation task (adult to child faces in FFHQ). These findings demonstrate that LightSBB-M provides a scalable, high-fidelity SBB solver that outperforms existing SB and diffusion baselines across both synthetic and real-world generative tasks. The code is available at https://github.com/alexouadi/LightSBB-M.
Grid Integration of AI Data Centers: A Critical Review of Energy Storage Solutions
Artificial intelligence (AI) is driving unprecedented growth in data center (DC) scale and power demand. AI workloads impose highly dynamic, difficult-to-forecast power profiles on the utility grid, creating reliability and stability challenges that conventional DC architectures are not designed to address. This paper provides a critical review of energy storage systems (ESSs) as the key enabling technology for reliable grid integration of AI DCs. We organize the review around a four-layer hierarchical taxonomy, namely chip-level buffering, rack/server-level ESSs, facility-level uninterruptible power supply (UPS) systems, and grid-scale battery energy storage systems (BESSs), supplemented by non-battery technologies including fuel cells (FCs) and thermal energy storage (TES). Each layer is analyzed with respect to response timescale, power and energy ratings, operational role, integration challenges, and coordination requirements. Key findings include: (i) AI DC load profiles differ fundamentally from traditional loads in their sub-second variability, making conventional ESS dispatch strategies insufficient; (ii) hierarchical, coordinated ESS deployment across all layers is necessary for effective load smoothing and grid support; and (iii) significant gaps remain in simulation tools, degradation modeling, load forecasting, and optimal multi-layer sizing. This review identifies open research challenges and future directions at the intersection of AI computing infrastructure and power system integration.
comment: 22 pages, 9 figures, 5 tables
A Trajectory-Based Approach to Controlled Invariance and Recursively Feasible MPC
In this paper, we revisit the computation of controlled invariant sets for linear discrete-time systems through a trajectory-based viewpoint. We begin by introducing the notion of convex feasible points, which provides a new characterization of controlled invariance using finitely long state trajectories. We further show that combining this notion with the classical backward fixed-point algorithm allows for the computation of the maximal controlled invariant set. Building on these results, we propose a model predictive control (MPC) scheme that guarantees recursive feasibility without relying on precomputed terminal sets. Finally, we formulate the search for convex feasible points as an optimization problem, yielding a practical computational method for constructing controlled invariant sets. The effectiveness of the approach is illustrated through numerical examples.
comment: 11 pages, 5 figures, accepted at the European Control Conference; updated title and abstract, updated proofs, updated Corollary 1 in the appendix
SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection
Speculative decoding accelerates large language model (LLM) inference by using a small draft model to propose candidate tokens that a larger target model verifies. A critical hyperparameter in this process is the speculation length $γ$, which determines how many tokens the draft model proposes per step. Nearly all existing systems use a fixed $γ$ (typically 4), yet empirical evidence suggests that the optimal value varies across task types and, crucially, depends on the compression level applied to the target model. In this paper, we present SpecKV, a lightweight adaptive controller that selects $γ$ per speculation step using signals extracted from the draft model itself. We profile speculative decoding across 4 task categories, 4 speculation lengths, and 3 compression levels (FP16, INT8, NF4), collecting 5,112 step-level records with per-step acceptance rates, draft entropy, and draft confidence. We demonstrate that the optimal $γ$ shifts across compression regimes and that draft model confidence and entropy are strong predictors of acceptance rate (correlation $\approx$ 0.56). SpecKV uses a small MLP trained on these signals to maximize expected tokens per speculation step, achieving a 56.0% improvement over the fixed-$γ=4$ baseline with only 0.34 ms overhead per decision (<0.5% of step time). The improvement is statistically significant (p < 0.001, paired bootstrap test). We release all profiling data, trained models, and notebooks as open-source artifacts.
comment: 11 pages, 8 figures, 7 tables. Code and data available at: https://github.com/Amorfati123/SpecKV
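To illustrate why the best speculation length depends on the acceptance rate, here is a minimal sketch using the standard i.i.d.-acceptance model of speculative decoding, in which the expected number of tokens produced per step is \((1-α^{γ+1})/(1-α)\) for per-token acceptance rate \(α\). This is not SpecKV's MLP controller: the relative draft cost `draft_cost`, the candidate set, and the function names are assumptions made for illustration only.

```python
def expected_tokens(alpha: float, gamma: int) -> float:
    """Expected tokens per speculation step if each draft token is accepted
    independently with probability alpha (standard simplifying assumption)."""
    if alpha >= 1.0:
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def best_gamma(alpha: float, draft_cost: float = 0.1, candidates=(1, 2, 4, 8)) -> int:
    """Pick the speculation length that maximizes expected tokens per unit time.

    draft_cost is the (hypothetical) cost of one draft forward pass relative to
    one target forward pass, so a step with length g costs g * draft_cost + 1.
    """
    return max(candidates,
               key=lambda g: expected_tokens(alpha, g) / (g * draft_cost + 1.0))


for alpha in (0.3, 0.6, 0.9):
    print(alpha, best_gamma(alpha))
```

Under this model a low acceptance rate favors short speculation and a high acceptance rate favors long speculation, which is the kind of shift an adaptive controller can exploit when compression changes how often drafts are accepted.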
Triple-Identity Authentication: The Future of Secure Access
In password-based authentication systems, the username fields are essentially unprotected, while the password fields are susceptible to attacks. In this article, we shift our research focus from the traditional authentication paradigm to the establishment of gatekeeping mechanisms for the systems. To this end, we introduce a Triple-Identity Authentication scheme. First, we combine each user credential (i.e., login name, login password, and authentication password) with the International Mobile Equipment Identity (IMEI) and International Mobile Subscriber Identity (IMSI) of a user's smartphone to create a combined identity represented as "credential+IMEI+IMSI", defined as a system attribute of the user. Then, we grant the password-based local systems autonomy to use the internal elements of our matrix-like hash algorithm. Following a credential input, the algorithm hashes it, and then the local system, rather than the algorithm, creates an identifier using a set of elements randomly selected from the algorithm, which is used to verify the user's combined identity. This decentralized authentication based on the identity-identifier handshake approach is implemented at the system's interaction points, such as the login name field, the login password field, and the server's authentication point. Ultimately, this approach establishes effective security gates, empowering the password-based local systems to autonomously safeguard user identification and authentication processes.
comment: 10 pages, 2 figures
Risk-Based PV-Rich Distribution System Planning Using Generative AI
Hosting capacity (HC) assessment plays a critical role in distribution system planning under increasing penetration of distributed energy resources (DERs) and associated uncertainties in load and generation. However, conventional approaches often rely on deterministic worst-case evaluation, leading to overly conservative HC estimates. This paper introduces a risk-based framework for HC assessment that explicitly accounts for the frequency, intensity, and duration of voltage violations under uncertain operating conditions. A generative AI-based approach is employed to generate realistic, time-correlated load demand scenarios conditioned on projected energy consumption growth levels. These scenarios are then used to assess voltage violations and quantify their risk using probabilistic intensity, duration, and frequency (IDF) metrics. The results show that extreme-percentile (zero-risk) approaches significantly underestimate PV-HC by treating all violations equally, regardless of their likelihood or persistence. For instance, allowing a 5% risk level increases HC by approximately 18% for a 15 min violation duration. The proposed approach provides a practical tool for risk-informed distribution system planning under uncertainty.
The AI risk repository: A meta-review, database, and taxonomy of risks from artificial intelligence
Artificial intelligence (AI) is reshaping society, from video generation to medical diagnosis, coding agents to autonomous vehicles. Yet researchers, policymakers, and technology companies lack shared terminology for discussing AI risks. Consider "privacy": one framework uses this term to describe a model's ability to leak sensitive training data, while another uses it to mean freedom from government surveillance. Conversely, researchers have introduced "Goodhart's law," "specification gaming," "reward hacking," and "mesa-optimization" to describe the same phenomenon of AI systems optimizing for measured proxies rather than intended goals. This terminological diversity creates friction: comparing findings across studies requires mapping between frameworks, and comprehensive risk coverage requires consulting multiple taxonomies that use different organizing principles. This paper addresses this challenge by creating a comprehensive catalog of AI risks. We systematically analyzed every major AI risk framework published to date (74 frameworks containing 1,725 distinct risks) and organized them into a unified system. Our two classification systems reveal important patterns: contrary to common assumptions, human decisions cause nearly as many AI risks (38%) as the AI systems themselves (42%). The work provides practical tools for anyone working on AI safety, from developers conducting risk assessments to policymakers writing regulations to auditors evaluating AI systems. By establishing a common reference point, this repository creates the foundation for more coordinated and comprehensive approaches to managing AI's risks while realizing its benefits.
comment: This paper has been published in Patterns (Cell Press, 2026) under a CC BY 4.0 licence: https://doi.org/10.1016/j.patter.2026.101517
Optimal Control with Natural Images: Efficient Reinforcement Learning using Overcomplete Sparse Codes
Optimal control and sequential decision making are widely used in many complex tasks. Optimal control over a sequence of natural images is a first step towards understanding the role of vision in control. Here, we formalize this problem as a reinforcement learning task, and derive general conditions under which an image includes enough information to implement an optimal policy. Reinforcement learning is shown to provide a computationally efficient method for finding optimal policies when natural images are encoded into "efficient" image representations. This is demonstrated by introducing a new reinforcement learning benchmark that easily scales to large numbers of states and long horizons. In particular, by representing each image as an overcomplete sparse code, we are able to efficiently solve an optimal control task that is orders of magnitude larger than those tasks solvable using complete codes. Theoretical justification for this behaviour is provided. This work also demonstrates that deep learning is not necessary for efficient optimal control with natural images.
A Practical Guide to PID Controller Implementation
How difficult can it be to implement a PID controller? The answer is twofold. Implementing the PID control law is simple and computationally inexpensive. However, this basic form will not work in practical applications. The primary reason for this is the various physical limitations of the actuator. Measurement noise, different implementations depending on the various structures (P, PI, PD or PID), bumpless transfer, and varying sampling time also result in problems rendering the basic form inoperable. PID implementation is therefore more difficult than meets the eye. This paper introduces a reference implementation of the PID controller which considers these practical issues. It includes pseudo-code, discussion of the implementation choices and simulation of carefully selected, important test cases.
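The practical issues listed above can be made concrete with a short sketch. The class below is not the paper's reference implementation; it is a minimal illustration, under assumed names and parameters, of three of the listed concerns: actuator saturation with anti-windup (conditional integration), measurement noise (a first-order filter on a derivative taken on the measurement), and a varying sampling time (`dt` passed on every call). Bumpless transfer and structure switching (P, PI, PD) are not covered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    u_min: float           # lower actuator limit
    u_max: float           # upper actuator limit
    tau_d: float = 0.0     # derivative filter time constant in seconds (0 = no filter)
    integral: float = 0.0  # integral contribution to the output
    prev_y: Optional[float] = None
    d_filt: float = 0.0    # filtered derivative of the (negated) measurement

    def update(self, setpoint: float, y: float, dt: float) -> float:
        """Return the saturated control output for one sample of length dt."""
        if dt <= 0.0:
            raise ValueError("dt must be positive")
        error = setpoint - y

        # Derivative on the measurement avoids a kick on setpoint steps;
        # the first-order filter attenuates measurement noise.
        d_raw = 0.0 if self.prev_y is None else -(y - self.prev_y) / dt
        alpha = dt / (self.tau_d + dt)
        self.d_filt += alpha * (d_raw - self.d_filt)
        self.prev_y = y

        candidate = self.integral + self.ki * error * dt
        u_unsat = self.kp * error + candidate + self.kd * self.d_filt
        u = min(max(u_unsat, self.u_min), self.u_max)

        # Anti-windup: skip the integral update while the output is saturated
        # and the error would push it further into saturation.
        pushing_further = (u_unsat > self.u_max and error > 0.0) or \
                          (u_unsat < self.u_min and error < 0.0)
        if not pushing_further:
            self.integral = candidate
        return u
```

A call such as `PID(kp=1.0, ki=1.0, kd=0.0, u_min=-1.0, u_max=1.0).update(10.0, 0.0, 0.1)` returns the upper limit and leaves the integral term unchanged, so the controller leaves saturation as soon as the error changes sign instead of first unwinding a large accumulated integral.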
Co-Learning Port-Hamiltonian Systems and Optimal Energy-Shaping Control
We develop a physics-informed learning framework for energy-shaping control of port-Hamiltonian (pH) systems from trajectory data. The proposed approach co-learns a pH system model and an optimal energy-balancing passivity-based controller (EB-PBC) through alternating optimization with policy-aware data collection. At each iteration, the system model is refined using trajectory data collected under the current control policy, and the controller is re-optimized on the updated model. Both components are parameterized by neural networks that embed the pH dynamics and EB-PBC structure, ensuring interpretability in terms of energy interactions. The learned controller renders the closed-loop system inherently passive and provably stable, and exploits passive plant dynamics without canceling the natural potential. A dissipation regularization enforces strict energy decay during training, thereby enhancing robustness to sim-to-real gaps. The proposed framework is validated on state-regulation and swing-up tasks for planar and torsional pendulum systems.
ModelPredictiveControl.jl: advanced process control made easy in Julia
Proprietary closed-source software is still the norm in advanced process control. Transparency and reproducibility are key aspects of scientific research. Free and open-source toolkits can contribute to the development, sharing and advancement of new and efficient control approaches, and the industrial sector will certainly benefit from them. This paper presents ModelPredictiveControl.jl, an open-source software package for designing model predictive controllers in the Julia programming language. It is designed to be easy to use and modular, while providing advanced features like nonlinear control and moving horizon estimation. It relies on powerful control system, mathematical optimization and automatic differentiation frameworks to simplify the construction and testing of state estimators and predictive controllers. It also integrates with the standard plotting library to quickly visualize closed-loop data. The paper presents the main functionalities and illustrates them with two case studies in simulation. The first example is a continuously stirred tank reactor described by linear dynamics. The second one implements nonlinear, economic, and successive-linearization model predictive controllers for an inverted pendulum. The solving times are benchmarked against equivalent implementations in MATLAB to show the efficiency of the package.
comment: 11 pages, 12 figures, 1 table
How Does the Lagrangian Guide Safe Reinforcement Learning through Diffusion Models?
Diffusion policy sampling enables reinforcement learning (RL) to represent multimodal action distributions beyond suboptimal unimodal Gaussian policies. However, existing diffusion-based RL methods primarily focus on offline settings for reward maximization, with limited consideration of safety in online settings. To address this gap, we propose Augmented Lagrangian-Guided Diffusion (ALGD), a novel algorithm for off-policy safe RL. By revisiting optimization theory and energy-based models, we show that the instability of primal-dual methods arises from the non-convex Lagrangian landscape. In diffusion-based safe RL, the Lagrangian can be interpreted as an energy function guiding the denoising dynamics. Counterintuitively, direct usage destabilizes both policy generation and training. ALGD resolves this issue by introducing an augmented Lagrangian that locally convexifies the energy landscape, yielding a stabilized policy generation and training process without altering the distribution of the optimal policy. Theoretical analysis and extensive experiments demonstrate that ALGD is both theoretically grounded and empirically effective, achieving strong and stable performance across diverse environments.
Robotics
MolmoAct2: Action Reasoning Models for Real-world Deployment
Vision-Language-Action (VLA) models aim to provide a single generalist controller for robots, but today's systems fall short on the criteria that matter for real-world deployment. Frontier models are closed, open-weight alternatives are tied to expensive hardware, reasoning-augmented policies pay prohibitive latency for their grounding, and fine-tuned success rates remain below the threshold for dependable use. We present MolmoAct2, a fully open action reasoning model built for practical deployment, advancing its predecessor along five axes. We introduce MolmoER, a VLM backbone specialized for spatial and embodied reasoning, trained on a 3.3M-sample corpus with a specialize-then-rehearse recipe. We release three new datasets spanning low-to-medium cost platforms, including MolmoAct2-BimanualYAM, 720 hours of teleoperated bimanual trajectories that constitute the largest open bimanual dataset to date, together with quality-filtered Franka (DROID) and SO100/101 subsets. We provide OpenFAST, an open-weight, open-data action tokenizer trained on millions of trajectories across five embodiments. We redesign the architecture to graft a flow-matching continuous-action expert onto a discrete-token VLM via per-layer KV-cache conditioning. Finally, we propose MolmoThink, an adaptive-depth reasoning variant that re-predicts depth tokens only for scene regions that change between timesteps, retaining geometric grounding at a fraction of prior latency. In the most extensive empirical study of any open VLA to date, spanning 7 simulation and real-world benchmarks, MolmoAct2 outperforms strong baselines including Pi-05, while MolmoER surpasses GPT-5 and Gemini Robotics ER-1.5 across 13 embodied-reasoning benchmarks. We release model weights, training code, and complete training data. Project page: https://allenai.org/blog/molmoact2
comment: 31 pages, project page: https://allenai.org/blog/molmoact2
Enhancing RL Generalizability in Robotics through SHAP Analysis of Algorithms and Hyperparameters ICPR 2026
Despite significant advances in Reinforcement Learning (RL), model performance remains highly sensitive to algorithm and hyperparameter configurations, while generalization gaps across environments complicate real-world deployment. Although prior work has studied RL generalization, the relative contribution of specific configurations to the generalization gap has not been quantitatively decomposed and systematically leveraged for configuration selection. To address this limitation, we propose an explainable framework that evaluates RL performance across robotic environments using SHapley Additive exPlanations (SHAP) to quantify configuration impacts. We establish a theoretical foundation connecting Shapley values to generalizability, empirically analyze configuration impact patterns, and introduce SHAP-guided configuration selection to enhance generalization. Our results reveal distinct patterns across algorithms and hyperparameters, with consistent configuration impacts across diverse tasks and environments. By applying these insights to configuration selection, we achieve improved RL generalizability and provide actionable guidance for practitioners.
comment: 15 pages, 7 figures, accepted by ICPR 2026
Semantic Risk-Aware Heuristic Planning for Robotic Navigation in Dynamic Environments: An LLM-Inspired Approach
The integration of Large Language Model (LLM) reasoning principles into classical robot path planning represents a rapidly emerging research direction. In this paper, we propose a Semantic Risk-Aware Heuristic (SRAH) planner that encodes LLM-inspired cost functions penalising geometrically cluttered or high-risk zones into an A$^*$ search framework, augmented with closed-loop replanning upon dynamic obstacle detection. We evaluate SRAH against two established baselines, Breadth-First Search (BFS) with replanning and a Greedy heuristic without replanning, across 200 randomised trials in a $15{\times}15$ grid-world with 20\% static obstacle density and stochastic dynamic obstacles. SRAH achieves a task success rate of 62.0\%, outperforming BFS (56.5\%) by a relative improvement of 9.7\% and Greedy (4.0\%) by a large margin. We further analyse the trade-off between planning overhead, path efficiency, and failure-recovery count, and demonstrate via an obstacle-density ablation that semantic cost shaping consistently improves navigation across environments of varying difficulty. Our results suggest that even lightweight, LLM-inspired heuristics provide measurable safety and robustness gains for autonomous robot navigation.
comment: 5 pages, 5 figures. Experimental study on semantic risk-aware heuristic planning for robotic navigation
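The idea of folding a risk penalty into A$^*$ can be sketched as follows. This is an illustrative stand-in, not the SRAH cost function: the clutter proxy (number of obstacle cells among the eight neighbours), the weight `risk_weight`, and the function names are all assumptions. Each move costs one unit plus the weighted risk of the cell being entered, so the Manhattan distance remains an admissible heuristic.

```python
import heapq


def clutter_risk(grid, r, c):
    """Hypothetical risk proxy: obstacle count among the 8 neighbours of (r, c)."""
    rows, cols = len(grid), len(grid[0])
    return sum(
        grid[r + dr][c + dc]
        for dr in (-1, 0, 1) for dc in (-1, 0, 1)
        if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols
    )


def risk_aware_astar(grid, start, goal, risk_weight=0.5):
    """A* on a 4-connected grid (1 = obstacle) with a risk-penalised step cost.

    Returns the list of cells from start to goal, or None if no path exists.
    """
    rows, cols = len(grid), len(grid[0])

    def h(p):
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    open_heap = [(h(start), 0.0, start)]
    best_g = {start: 0.0}
    parent = {start: None}
    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if g > best_g[node]:
            continue  # stale queue entry
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1.0 + risk_weight * clutter_risk(grid, nr, nc)
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    parent[nxt] = node
                    heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
    return None
```

Closed-loop replanning in this setting amounts to marking a newly detected obstacle in `grid` and calling `risk_aware_astar` again from the robot's current cell.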
LiDAR Teach, Radar Repeat: Robust Cross-Modal Navigation in Degenerate and Varying Environments
Long-term autonomy requires robust navigation in environments subject to dynamic and static changes, as well as adverse weather conditions. Teach-and-Repeat (T\&R) navigation offers a reliable and cost-effective solution by avoiding the need for consistent global mapping; however, existing T\&R systems lack a systematic solution to tackle various environmental variations such as weather degradation, ephemeral dynamics, and structural changes. This work proposes LTR$^2$, the first cross-modal, cross-platform LiDAR-Teach-and-Radar-Repeat system that systematically addresses these challenges. LTR$^2$ leverages LiDAR during the teaching phase to capture precise structural information under normal conditions and utilizes 4D millimeter-wave radar during the repeating phase for robust operation under environmental degradations. To align sparse and noisy forward-looking 4D radar with dense and accurate omnidirectional 3D LiDAR data, we introduce a Cross-Modal Registration (CMR) network that jointly exploits Doppler-based motion priors and the physical laws governing LiDAR intensity and radar power density. Furthermore, we propose an adaptive fine-tuning strategy that incrementally updates the CMR network based on localization errors, enabling long-term adaptability to static environmental changes without ground-truth labels. We demonstrate that the proposed CMR network achieves state-of-the-art cross-modal registration performance on the open-access dataset. Then we validate LTR$^2$ across three robot platforms over a large-scale, long-term deployment (40+ km over 6 months), including challenging conditions such as nighttime smoke. Experimental results and ablation studies demonstrate centimeter-level accuracy and strong robustness against diverse environmental disturbances, significantly outperforming existing approaches.
comment: Accepted by IEEE Transactions on Robotics
DynoSLAM: Dynamic SLAM with Generative Graph Neural Networks for Real-World Social Navigation
Traditional Simultaneous Localization and Mapping (SLAM) algorithms rely heavily on the static environment assumption, which severely limits their applicability in real-world spaces populated by moving entities, such as pedestrians. In this work, we propose DynoSLAM, a tightly-coupled Dynamic GraphSLAM architecture that integrates socially-aware Graph Neural Networks (GNNs) directly into the factor graph optimization. Unlike conventional approaches that use rigid constant-velocity heuristics or deterministic single-agent neural priors, our framework formulates pedestrian motion forecasting as a stochastic World Model. By utilizing Monte Carlo rollouts from a trained GNN, we capture the multimodal epistemic uncertainty of human interactions and embed it into the SLAM graph via a dynamic Mahalanobis distance factor. We demonstrate through extensive simulated experiments that this stochastic formulation not only maintains highly accurate retrospective tracking but also prevents the optimization failures caused by the deterministic "argmax problem". Ultimately, extracting the empirical mean and covariance matrices of future pedestrian states provides a mathematically rigorous, probabilistic safety envelope for downstream local planners, enabling anticipatory and collision-free robot navigation in densely crowded environments.
comment: Code & Project page at https://github.com/makriot/dynoslam
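The step from sampled rollouts to a Mahalanobis factor can be illustrated with a small sketch. It assumes 2D pedestrian positions and uses hand-written sample statistics; the function names and the sample values are hypothetical, and the code is not taken from the DynoSLAM repository. Given Monte Carlo samples of a future position, it computes the empirical mean and covariance and then the squared Mahalanobis distance of a candidate position, which is the quantity a Mahalanobis-type factor would penalise.

```python
def mean_and_cov(samples):
    """Empirical mean and unbiased 2x2 covariance of 2D samples [(x, y), ...]."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance of a 2D point, using the closed-form 2x2 inverse."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    (a, b), (_, d) = cov
    det = a * d - b * b
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    return (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det


# Hypothetical rollout endpoints for one pedestrian at a future timestep.
rollouts = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
mean, cov = mean_and_cov(rollouts)
print(mean, cov, mahalanobis_sq((3.0, 1.0), mean, cov))
```

A single mean and covariance summarise the rollouts as one Gaussian, so strongly multimodal predictions would need a richer summary than this sketch provides.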
Seeing Realism from Simulation: Efficient Video Transfer for Vision-Language-Action Data Augmentation ICML 2026
Vision-language-action (VLA) models typically rely on large-scale real-world videos, whereas simulated data, despite being inexpensive and highly parallelizable to collect, often suffers from a substantial visual domain gap and limited environmental diversity, resulting in weak real-world generalization. We present an efficient video augmentation framework that converts simulated VLA videos into realistic training videos while preserving task semantics and action trajectories. Our pipeline extracts structured conditions from simulation via video semantic segmentation and video captioning, rewrites captions to diversify environments, and uses a conditional video transfer model to synthesize realistic videos. To make augmentation practical at scale, we introduce a diffusion feature-reuse mechanism that reuses video tokens across adjacent timesteps to accelerate generation, and a coreset sampling strategy that identifies a compact, non-redundant subset for augmentation under limited computation. Extensive experiments on Robotwin 2.0, LIBERO, LIBERO-Plus, and a real robotic platform demonstrate consistent improvements. For example, our method improves RDT-1B by 8% on Robotwin 2.0, and boosts $π_0$ by 5.1% on the more challenging LIBERO-Plus benchmark. Code is available at: https://github.com/nanfangxiansheng/Seeing-Realism-from-Simulation.
comment: ICML 2026
Latent Bridge: Feature Delta Prediction for Efficient Dual-System Vision-Language-Action Model Inference
Dual-system Vision-Language-Action (VLA) models achieve state-of-the-art robotic manipulation but are bottlenecked by the VLM backbone, which must execute at every control step while producing temporally redundant features. We propose Latent Bridge, a lightweight model that predicts VLM output deltas between timesteps, enabling the action head to operate on predicted outputs while the expensive VLM backbone is called only periodically. We instantiate Latent Bridge on two architecturally distinct VLAs: GR00T-N1.6 (feature-space bridge) and π0.5 (KV-cache bridge), demonstrating that the approach generalizes across VLA designs. Our task-agnostic DAgger training pipeline transfers across benchmarks without modification. Across four LIBERO suites, 24 RoboCasa kitchen tasks, and the ALOHA sim transfer-cube task, Latent Bridge achieves 95-100% performance retention while reducing VLM calls by 50-75%, yielding 1.65-1.73x net per-episode speedup.
Parking Assistance for Trailer-Truck Transport Vehicles Using Sensor Fusion and Motion Planning
Autonomous driving technology has rapidly evolved over the past decade, offering significant improvements in transportation efficiency, safety, and cost reduction. While much of the progress has focused on highway driving and obstacle avoidance, low-speed maneuvers such as parking remain among the most difficult challenges for autonomous systems. This challenge is especially pronounced in trailer-truck transport vehicles due to their articulated motion and environmental constraints. This paper presents a proposed framework for autonomous truck parking that integrates perception, motion planning, control systems, and infrastructure awareness. By combining sensor fusion, Hybrid A* path planning, nonlinear model predictive control (NMPC), and data-driven parking systems, this work highlights the importance of system-level coordination for reliable and scalable autonomous parking solutions. As a proof-of-concept implementation, we adapted an open-source A* path planning simulation to incorporate a tractor-trailer kinematic model, demonstrating articulated vehicle path planning within a command-line simulation environment, with jackknife prevention identified as an area requiring further development.
comment: 7 pages, 4 figures
Tensegrity crutches with compliance from a pre-stressed self-tensile module improve ground reaction force profiles, speed, effort, comfort, and perceived stability
Purpose: Six million people use crutches as mobility aids in the US. Rigid designs with no axial mobility limit sensory feedback and lead to secondary injury to the upper joints. Spring-loaded designs offer compliance but may compromise stability. We designed a biologically inspired tensegrity crutch with a compliant module aiming to achieve favorable mechanical properties. The terminal module was a pre-stressed self-tensile two-cell tensegrity structure. We compared the tensegrity crutch to commercial rigid and spring-loaded crutches in mechanical tests using axial loading, in overground straight and turning walking, and in participant experience. Methods: In human trials, healthy young adults (N=18) with no recent lower-body injury performed straight walking and turning trials at a comfortable self-selected pace. A knee blocker simulated unilateral injury of the dominant leg. After using each type of crutch, participants reported their perceived levels of effort, comfort, pain, stability, and usability. Results: Compared to the rigid design, both spring-loaded and tensegrity conditions reduced peak loading rates. The tensegrity design improved effort, comfort, pain, and usability. Spring-loaded crutches reduced perceived stability and walking speed. Conclusion: The biologically inspired tensegrity crutches were an overall improvement to existing designs. Simulations and mechanical testing suggest that nonlinear stiffness, ground-following, and force feedback are among the beneficial mechanical properties that underlie this improvement.
comment: 43 double-spaced pages, 7 figures
Temporally Consistent Object 6D Pose Estimation for Robot Control
Single-view RGB object pose estimators have reached a level of precision and efficiency that makes them good candidates for vision-based robot control. However, off-the-shelf methods lack temporal consistency and robustness that are mandatory for a stable feedback control. In this work, we develop a factor graph approach to enforce temporal consistency of the object pose estimates. In particular, the proposed approach: (i) incorporates object motion models, (ii) explicitly estimates the object pose measurement uncertainty, and (iii) integrates the above two components in an online optimization-based estimator. We demonstrate that with appropriate outlier rejection and smoothing using the proposed factor graph approach, we can significantly improve the results on standardized pose estimation benchmarks. We experimentally validate the stability of the proposed approach for a feedback-based robot control task in which the object is tracked by the camera attached to a torque controlled manipulator.
comment: Project page: https://data.ciirc.cvut.cz/public/projects/2024TemporalPose/
Learning Equivariant Neural-Augmented Object Dynamics From Few Interactions
Learning data-efficient object dynamics models for robotic manipulation remains challenging, especially for deformable objects. A popular approach is to model objects as sets of 3D particles and learn their motion using graph neural networks. In practice, this is not enough to maintain physical feasibility over long horizons and may require large amounts of interaction data to learn. We introduce PIEGraph, a novel approach to combining analytical physics and data-driven models to capture object dynamics for both rigid and deformable bodies using limited real-world interaction data. PIEGraph consists of two components: (1) a \textbf{P}hysically \textbf{I}nformed particle-based analytical model (implemented as a spring--mass system) to enforce physically feasible motion, and (2) an \textbf{E}quivariant \textbf{Graph} Neural Network with a novel action representation that exploits symmetries in particle interactions to guide the analytical model. We evaluate PIEGraph in simulation and on robot hardware for reorientation and repositioning tasks with ropes, cloth, stuffed animals, and rigid objects. We show that our method enables accurate dynamics prediction and reliable downstream robotic manipulation planning, outperforming state-of-the-art baselines.
comment: 10 pages, 8 figures
AnchorD: Metric Grounding of Monocular Depth Using Factor Graphs
Dense and accurate depth estimation is essential for robotic manipulation, grasping, and navigation, yet currently available depth sensors are prone to errors on transparent, specular, and general non-Lambertian surfaces. To mitigate these errors, large-scale monocular depth estimation approaches provide strong structural priors, but their predictions can be potentially skewed or mis-scaled in metric units, limiting their direct use in robotics. Thus, in this work, we propose a training-free depth grounding framework that anchors monocular depth estimation priors from a depth foundation model in raw sensor depth through factor graph optimization. Our method performs a patch-wise affine alignment, locally grounding monocular predictions in metric real-world depth while preserving fine-grained geometric structure and discontinuities. To facilitate evaluation in challenging real-world conditions, we introduce a benchmark dataset with dense scene-wide ground truth depth in the presence of non-Lambertian objects. Ground truth is obtained via matte reflection spray and multi-camera fusion, overcoming the reliance on object-only CAD-based annotations used in prior datasets. Extensive evaluations across diverse sensors and domains demonstrate consistent improvements in depth performance without any (re-)training. We make our implementation publicly available at https://anchord.cs.uni-freiburg.de.
comment: 8 pages, 9 Figures, 3 Tables
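The patch-wise affine alignment mentioned in the AnchorD abstract can be illustrated with a minimal sketch. It assumes a single patch and an ordinary closed-form least-squares fit of a scale and a shift; the function name `fit_affine`, the use of `None` for invalid sensor pixels, and the sample values are hypothetical. It is not the paper's factor-graph optimization, which couples many patches.

```python
def fit_affine(mono, sensor):
    """Least-squares scale s and shift t such that s * mono + t ~= sensor.

    mono:   relative monocular depth values for one patch (hypothetical input)
    sensor: metric sensor depth for the same pixels, None where invalid
    """
    # Keep only pixels where the sensor returned a reading.
    pairs = [(m, z) for m, z in zip(mono, sensor) if z is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two valid pixels to fit scale and shift")

    mean_m = sum(m for m, _ in pairs) / n
    mean_z = sum(z for _, z in pairs) / n
    var_m = sum((m - mean_m) ** 2 for m, _ in pairs)
    if var_m == 0.0:
        raise ValueError("monocular depth is constant over the patch")

    # Closed-form simple linear regression of sensor depth on monocular depth.
    cov = sum((m - mean_m) * (z - mean_z) for m, z in pairs)
    s = cov / var_m
    t = mean_z - s * mean_m
    return s, t


# Usage: the second pixel has no sensor reading and is filled from the fit.
mono_patch = [0.1, 0.2, 0.3, 0.4]
sensor_patch = [1.2, None, 1.6, 1.8]
scale, shift = fit_affine(mono_patch, sensor_patch)
grounded = [scale * m + shift for m in mono_patch]
```

Fitting per patch rather than per image lets the scale and shift vary across the scene, which is the local grounding the abstract describes; how neighbouring patches are kept consistent is left to the paper's factor graph.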
CoRAL: Contact-Rich Adaptive LLM-based Control for Robotic Manipulation
While Large Language Models (LLMs) and Vision-Language Models (VLMs) demonstrate remarkable capabilities in high-level reasoning and semantic understanding, applying them directly to contact-rich manipulation remains a challenge due to their lack of explicit physical grounding and inability to perform adaptive control. To bridge this gap, we propose CoRAL (Contact-Rich Adaptive LLM-based control), a modular framework that enables zero-shot planning by decoupling high-level reasoning from low-level control. Unlike black-box policies, CoRAL uses LLMs not as direct controllers, but as cost designers that synthesize context-aware objective functions for a sampling-based motion planner (MPPI). To address the ambiguity of physical parameters in visual data, we introduce a neuro-symbolic adaptation loop: a VLM provides semantic priors for environmental dynamics, such as mass and friction estimates, which are then explicitly refined in real time via online system identification, while the LLM iteratively modulates the cost-function structure to correct strategic errors based on interaction feedback. Furthermore, a retrieval-based memory unit allows the system to reuse successful strategies across recurrent tasks. This hierarchical architecture ensures real-time control stability by decoupling high-level semantic reasoning from reactive execution, effectively bridging the gap between slow LLM inference and dynamic contact requirements. We validate CoRAL on both simulation and real-world hardware across challenging and novel tasks, such as flipping objects against walls by leveraging extrinsic contacts. Experiments demonstrate that CoRAL outperforms state-of-the-art VLA and foundation-model-based planner baselines by boosting success rates over 50% on average in unseen contact-rich scenarios, effectively handling sim-to-real gaps through its adaptive physical understanding.
comment: 21 pages, 9 figures, 3 tables. Accepted to Robotics: Science and Systems (RSS) 2026
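The CoRAL abstract describes an LLM that writes cost functions for a sampling-based planner (MPPI). The sketch below shows only the generic MPPI update that such a cost would plug into: perturb a nominal control sequence, score each sample, and average the samples with exponentiated-cost weights. The function name `mppi_step`, the hand-written quadratic cost, and all parameter values are assumptions for illustration; none of it is taken from the paper.

```python
import math
import random


def mppi_step(nominal, cost_fn, rng, num_samples=64, sigma=0.5, temperature=1.0):
    """One MPPI update of a nominal control sequence (a list of floats)."""
    # Sample Gaussian perturbations around the nominal controls.
    samples = [[u + rng.gauss(0.0, sigma) for u in nominal]
               for _ in range(num_samples)]
    costs = [cost_fn(s) for s in samples]

    # Subtract the minimum cost before exponentiating for numerical stability.
    c_min = min(costs)
    weights = [math.exp(-(c - c_min) / temperature) for c in costs]
    total = sum(weights)

    # New nominal sequence: importance-weighted average of the samples.
    return [sum(w * s[i] for w, s in zip(weights, samples)) / total
            for i in range(len(nominal))]


def example_cost(controls):
    # Stand-in for a synthesized objective: prefer controls near 3.0.
    return sum((u - 3.0) ** 2 for u in controls)


rng = random.Random(0)
controls = [0.0, 0.0, 0.0]
for _ in range(50):
    controls = mppi_step(controls, example_cost, rng)
```

Because the update only calls `cost_fn`, swapping in a different objective changes behavior without touching the planner, which is the decoupling the abstract emphasizes.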
Hyp2Former: Hierarchy-Aware Hyperbolic Embeddings for Open-Set Panoptic Segmentation
Recognizing unknown objects is crucial for safety-critical applications such as autonomous driving and robotics. Open-Set Panoptic Segmentation (OPS) aims to segment known thing and stuff classes while identifying valid unknown objects as separate instances. Prior OPS approaches largely treat known categories as a flat label set, ignoring the semantic hierarchy that provides valuable structural priors for distinguishing unknown objects from in-distribution classes. In this work, we propose Hyp2Former, an end-to-end framework for OPS that does not require explicit modeling of unknowns during training, and instead learns hierarchical semantic similarities continuously in hyperbolic space. By explicitly encoding hierarchical relationships among known categories, the model learns a structured embedding space that captures multiple levels of semantic abstraction. As a result, unknown objects that cannot be confidently classified as known categories still remain in close proximity to higher-level concepts (e.g., an unknown animal remains closer to "animal" or "object" than to unrelated concepts such as "electronics" or "stuff") and can therefore be reliably detected, even if their fine-grained category was not represented during training. Empirical evaluations across multiple public datasets such as MS COCO, Cityscapes, and Lost&Found demonstrate that Hyp2Former outperforms existing methods on OPS, achieving the best balance between unknown object discovery and in-distribution robustness.
Robotic Affection -- Opportunities of AI-based haptic interactions to improve social robotic touch through a multi-deep-learning approach
Despite advances in robotic grasping and dexterity through haptic information, affective social touch, such as handshaking or reassuring stroking, remains a major challenge in Human-Robot-Interaction. This position paper examines current progress and limitations across artificial intelligence, haptics and robotics research, and proposes a novel multi-model architecture to address these gaps. Drawing inspiration from neurobiology, we decompose affective touch into distinct, specialized subtask models. By treating affective touch as a distributed, closed-loop perceptual task rather than a monolithic motoric movement, we aim to overcome the "haptic uncanny valley" through a peer-to-peer, state-sharing framework. Our approach supports scalable and cumulative development within a Sim-to-Real pipeline, fostering interdisciplinary collaboration. By enabling haptics, AI, and robotics researchers to contribute independently yet coherently, we outline a pathway toward a unified, expressive system for social robotics.
comment: AI for Haptics and Haptics for AI: Challenges and Opportunities Workshop at the 2026 CHI Conference on Human Factors in Computing Systems (CHI 26), April 13 - 17 2026, Barcelona, Spain
Orchestrating Spatial Semantics via a Zone-Graph Paradigm for Intricate Indoor Scene Generation
Autonomous 3D indoor scene synthesis breaks down in non-convex rooms with tightly coupled spatial constraints. Data-driven generators lack topological priors for long-horizon planning, while iterative agents fragment semantics and become geometrically brittle. We present ZoneMaestro, a unified framework that shifts the paradigm from object-centric synthesis to Zone-Graph Orchestration. By internalizing a novel zone-based logic, ZoneMaestro translates high-level semantic intent into functional zones and topological constraints, enabling robust adaptation to diverse architectural forms. To support this, we construct Zone-Scene-10K, a large-scale dataset enriched with explicit Zone-Graph annotations. We further introduce an Alternating Alignment Strategy that cycles between reasoning internalization and Zone-Aware Group Relative Policy Optimization (Z-GRPO), effectively reconciling the tension between semantic richness and geometric validity without relying on external physics engines. To rigorously evaluate spatial intelligence beyond convex primitives, we formally define the task of Intricate Spatial Orchestration and release SCALE, a stress-test benchmark for irregular indoor scenarios with complex, dense spatial relations. Extensive experiments demonstrate that ZoneMaestro resolves the density-safety dichotomy, significantly outperforming state-of-the-art baselines in both structural coherence and intent adherence.
Sim-to-Real Transfer and Robustness Evaluation of Reinforcement Learning Control with Integrated Perception on an ASV for Floating Waste Capture
Autonomous surface vessels for floating-waste removal operate under varying hydrodynamics, external disturbances, and challenging water-surface perception. We present a field-validated system that combines camera-based polarimetric perception with a lightweight DRL-based controller for floating-waste detection and capture. Camera detections are converted into water-surface target points and tracked by a controller trained entirely in simulation and deployed directly on a retrofitted ASV platform. Our main contribution is a sim-to-real testing methodology that combines a two-stage simulation protocol with a perception abstraction module designed to mimic real camera behavior, enabling reproducible field trials and explicit evaluation of the sim-to-real gap. We apply this framework in matched simulation and field experiments across 14 disturbance regimes to expose failure modes and evaluate robustness. The results show centimeter-level terminal accuracy and indicate robust control performance under the evaluated perturbation regimes. The main source of degradation is insufficient actuation-model fidelity. We also demonstrate the system in a search-and-capture application using real camera detections in real-world conditions over areas of up to $450~m^2$. The study distills practical lessons for reliable transfer, including improved actuation-model fidelity, targeted domain randomization, and careful management of latency and timestamps across modules, while highlighting remaining challenges.
Beyond Specialization: Robust Reinforcement Learning Navigation via Procedural Map Generators
Deep reinforcement learning (DRL) navigation policies often overfit to the structure of their training environments, as environmental diversity is typically constrained by the manual effort required to design diverse scenarios. While procedural map generation offers scalable diversity, no prior work systematically compares how different generator types affect policy generalization. We integrate four generators (sparse, maze, graph, and Wave Function Collapse) with guaranteed navigability into MuRoSim, a 2D simulator focusing on training efficiency for LiDAR-based navigation. We cross-evaluate five navigation policies on 1000 seeded maps per generator across three training seeds. Results show a strongly asymmetric cross-generator transfer: a specialist trained on sparse layouts falls to 3.3% success on mazes, whereas a policy trained on the combined generator set achieves 91.5 +/- 1.1% mean success. We further demonstrate that A* path-planner subgoal inputs are the dominant factor for robustness, raising success from the 90.2 +/- 1.4% feedforward baseline to 98.9 +/- 0.4% and outperforming GRU recurrence, which only improves the reactive baseline. The DRL policies outperform a classical Carrot+A* controller, which matches their success only at low speeds (1.0 m/s) but collapses to 24.9% at 2.0 m/s. This highlights learned speed adaptation as the decisive advantage of the learned approach. Real-world experiments on a RoboMaster confirm sim-to-real transfer in a cluttered arena, while a maze-like layout exposes remaining failure modes that recurrence helps mitigate.
comment: This work has been submitted to the IEEE for possible publication
A Semantic Autonomy Framework for VLM-Integrated Indoor Mobile Robots: Hybrid Deterministic Reasoning and Cross-Robot Adaptive Memory
Autonomous indoor mobile robots can navigate reliably to metric coordinates using established frameworks such as ROS 2 Navigation 2, yet they lack the ability to interpret natural language instructions that express intent rather than positions. Vision-Language Models offer the semantic reasoning required to bridge this gap, but their inference latency (2-9 seconds per decision on consumer hardware) and session-by-session amnesia limit practical deployment. This paper presents the Semantic Autonomy Stack, a six-layer reference framework for semantically autonomous indoor navigation, and validates a complete instance featuring hybrid deterministic-VLM reasoning and cross-robot adaptive memory on physical robots with off-the-shelf edge hardware. A seven-step parametric resolver handles 88% of instructions in under 0.1 milliseconds without invoking a language model, camera, or GPU; only genuinely ambiguous instructions escalate to VLM reasoning. A five-category semantic memory framework with explicit scope taxonomy (global environment knowledge, per-operator preferences, per-robot capabilities) enables cross-session learning and cross-robot knowledge transfer: preferences learned through VLM interactions on one robot are promoted to deterministic resolution and transferred to a second robot via a shared compiled digest, achieving a measured latency reduction of 103,000-fold. Experimental validation on two custom-built differential-drive robots across 82 scenario-level decisions and three sessions demonstrates 100% semantic transfer accuracy (33/33, 95% CI [0.894, 1.000]), 100% semantic resolution accuracy, and concurrent multi-robot operation feasibility - all on Raspberry Pi 5 platforms with no onboard GPU, requiring zero training data.
comment: 33 pages, 11 figures, 14 tables
Adaptive Gait Generation for Multi-Terrain Exoskeletons via Constrained Kernelized Movement Primitives
Lower limb exoskeletons (LLEs) have the potential to make motor-impaired individuals walk again. Their application in real-world environments is still limited by the lack of effective adaptive gait planning. Indeed, current exoskeletons are meant to walk only on flat, even terrain. Generating environment-aware, physiologically consistent gait trajectories in real time is an open challenge. To overcome this, we propose a novel Kernelized Movement Primitives (KMP)-based framework for adaptive gait generation (AGG) across multiple indoor terrains. The proposed approach learns a probabilistic representation of human gait in both the joint and task spaces from a limited number of human demonstrations, representing natural gait characteristics and ensuring kinematic feasibility. In addition, the learned trajectories are adapted using environmental information extracted from an onboard RGB-D camera by treating the AGG as a linearly constrained optimization problem with via-points. The proposed method has been thoroughly validated first in simulations for gait generation in different scenarios, such as flat-ground walking, slopes, stairs, and obstacle crossing. Finally, the effectiveness and robustness of the method have been demonstrated with experiments on a commercial LLE in real-world scenarios. The results obtained demonstrate the feasibility of an environment-aware gait planning system for a new generation of intelligent lower limb exoskeletons for assisting people with disabilities in their everyday life.
Visibility-Aware Mobile Grasping in Dynamic Environments
This paper addresses the problem of mobile grasping in dynamic, unknown environments where a robot must operate under a limited field-of-view. The fundamental challenge is the inherent trade-off between "seeing" around to reduce environmental uncertainty and "moving" the body to achieve task progress in a high-dimensional configuration space, subject to visibility constraints. Previous approaches often assume known or static environments and decouple these objectives, failing to guarantee safety when unobserved dynamic obstacles intersect the robot's path during manipulation. In this paper, we propose a unified mobile grasping system comprising two core components: (1) an iterative low-level whole-body planner coupled with velocity-aware active perception to navigate dynamic environments safely; and (2) a hierarchical high-level planner based on behavior trees that adaptively generates subgoals to guide the robot through exploration and runtime failures. We provide experimental results across 400 randomized simulation scenarios and real-world deployment on a Fetch mobile manipulator. Results show that our system achieves success rates of 68.8% and 58.0% in unknown static and dynamic environments, respectively, significantly boosting success rates by 22.8% and 18.0% over the \nam approach, with improved collision safety.
Higher-Order Flexible Configurations of Planar Parallel Manipulators Constructed by Averaging
This paper investigates singular configurations of planar 3-RPR parallel manipulators, which result from applying the averaging technique to solution pairs of their direct kinematic problem. Without computing the zeros of the corresponding degree-6 polynomial, we parametrize the input pairs and determine their relative orientation such that the flexion order of the averaged configurations increases. Moreover, the obtained results are visualized for concrete examples. The presented methodology can also be used for studying the spherical and spatial analogues of planar 3-RPR parallel manipulators.
comment: This manuscript represents the full complementary version of the paper of the same title submitted to the International Conference on Geometry and Graphics 2026 (ICGG 2026)
Shared Autonomy Assisted by Impedance-Driven Anisotropic Guidance Field
Shared autonomy (SA) enables robots to infer human intent and assist in its achievement. While most research focuses on improving intent inference, it overlooks whether humans can understand the robot's intent in return. Without such mutual understanding, collaboration becomes less effective, degrading user experience and task performance. To address this gap, previous studies have explicitly conveyed the robot intent through additional interfaces, which remain unintuitive and limited in expressiveness. Inspired by impedance control, we propose Impedance-Driven Anisotropic Guidance Field Enhanced Shared Autonomy (IAGF-SA), a novel paradigm that extends SA with an embodied, physically-grounded communication channel. This channel adaptively modulates the robot's dynamic response to human input, enabling intuitive, continuous, physically-grounded robot intent communication while naturally guiding human actions. User studies across three scenarios and two teleoperation interfaces indicate that IAGF-SA improves task performance, human-robot agreement, and subjective experience, thus demonstrating its effectiveness in enhancing human-robot communication and collaboration.
comment: 8 pages, 7 figures. Accepted for publication in IEEE Robotics and Automation Letters
Robust Adaptive Predictive Control for Hook-Based Aerial Transportation Between Moving Platforms
This paper presents a novel model predictive control (MPC) approach for autonomous pick-and-place between moving platforms with a hook-equipped aerial manipulator. First, for accurate and rapid modeling of the complex dynamics, a digital twin model of the quadcopter equipped with a hook-based gripper, implemented in MuJoCo, is constructed and used as the predictive model for the MPC. To handle uncertainties of the predictive model (e.g. due to aerodynamics and uncertain payloads), a robust adaptive MPC approach is proposed. By systematic integration of zero-order robust optimization (zoRO) based uncertainty propagation and an extended Kalman filter (EKF) for parameter estimation, the MPC algorithm ensures robust constraint satisfaction, high performance, and computational efficiency. The effectiveness of the proposed method is evaluated in complex simulated scenarios and in real-world flight experiments.
comment: Supplementary video: https://youtu.be/l_L7mpUYJqU
Feedback Motion Planning for Stochastic Nonlinear Systems with Signal Temporal Logic Specifications
We study feedback motion planning for continuous-time stochastic nonlinear systems under signal temporal logic (STL) specifications. We propose a framework that synthesizes control policies for chance-constrained STL trajectory optimization problems, with the goal of ensuring that the closed-loop stochastic system satisfies a given STL formula with high probability (e.g., 99.99%). Our approach is based on a predicate erosion strategy that transforms the intractable stochastic problem into a deterministic STL trajectory optimization problem with tightened STL formula constraints. The amount of erosion is determined by a probabilistic reachable tube (PRT) that bounds the deviation between the stochastic trajectory and an associated nominal trajectory. To compute such bounds, we leverage contraction theory and feedback design, and develop several tracking controllers. This yields a complete feedback motion planning pipeline which can be implemented by numerical optimization. We demonstrate the efficacy and versatility of the proposed framework through simulations on several robotic systems and through experiments on a real-world quadrupedal robot, and show that it is less conservative and achieves higher specification satisfaction probability than representative baselines.
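The predicate erosion idea in the abstract above can be made concrete with a one-line bound. This is a sketch under stated assumptions: a single predicate of the form h(x) >= 0, an L-Lipschitz function h, and a tube of radius r that contains the deviation from the nominal trajectory with probability at least 1 - delta. The symbols h, L, r, and delta are illustrative notation and may differ from the paper's.

```latex
% Assumed: predicate h(x) >= 0 with h L-Lipschitz; nominal trajectory \bar{x}(t);
% tube radius r such that  P( \|x(t) - \bar{x}(t)\| \le r  for all t ) \ge 1 - \delta.
h(x(t)) \;\ge\; h(\bar{x}(t)) - L\,\|x(t) - \bar{x}(t)\| \;\ge\; h(\bar{x}(t)) - L r
\quad \text{whenever } \|x(t) - \bar{x}(t)\| \le r,
\qquad\text{so}\qquad
h(\bar{x}(t)) \ge L r \;\; \forall t
\;\Longrightarrow\;
\mathbb{P}\big( h(x(t)) \ge 0 \;\; \forall t \big) \ge 1 - \delta .
```

Under these assumptions the stochastic requirement reduces to a deterministic constraint on the nominal trajectory, tightened by the margin L r; a smaller tube gives a less conservative plan.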
ShapeGrasp: Simultaneous Visuo-Haptic Shape Completion and Grasping for Improved Robot Manipulation
Humans grasp unfamiliar objects by combining an initial visual estimate with tactile and proprioceptive feedback during interaction. We present ShapeGrasp, a robotic implementation of this approach. The proposed method is an iterative grasp-and-complete pipeline that couples implicit surface visuo-haptic shape completion (creation of full 3D shape from partial information) with physics-based grasp planning. From a single RGB-D view, ShapeGrasp infers a complete shape (point cloud or triangular mesh), generates candidate grasps via rigid-body simulation, and executes the best feasible grasp. Each grasp attempt yields additional geometric constraints -- tactile surface contacts and space occupied by the gripper body -- which are fused to update the object shape. Failures trigger pose re-estimation and regrasping using the refined shape. We evaluate ShapeGrasp in the real world using two different robots and grippers. To the best of our knowledge, this is the first approach that updates shape representations following a real-world grasp. We achieved superior results over baselines for both grippers (grasp success rate of 84% with a three-finger gripper and 91% with a two-finger gripper), while improving the 3D shape reconstruction quality in all evaluation metrics used.
comment: Submitted for review to T-RO
Natural Gradient Bayesian Filtering: Geometry-Aware Filter for Dynamical Systems
Bayesian filtering is a cornerstone of state estimation in complex systems such as aerospace systems, yet exact solutions are available only for linear Gaussian models. In practice, nonlinear systems are handled through tractable approximations, with Gaussian filters such as the extended and unscented Kalman filters being among the most widely used methods. This tutorial revisits Gaussian filtering from an information-geometric perspective, viewing the prediction and measurement update steps as inference procedures over state distributions. Within this framework, we introduce a geometry-aware Gaussian filtering approach that leverages natural gradient descent on the statistical manifold of Gaussian distributions. The resulting Natural Gradient Gaussian Approximation (NANO) filter iteratively refines the posterior mean and covariance while respecting the intrinsic geometry of the Gaussian family and preserving the positive definiteness of the covariance matrix. We further highlight fundamental connections to classical Kalman filtering, showing that a single natural-gradient step exactly recovers the Kalman measurement update in the linear-Gaussian case. The practical implications of the proposed framework are illustrated through case studies in representative nonlinear estimation problems, including satellite attitude estimation, simultaneous localization and mapping, and state estimation for robotic systems including quadruped and humanoid robots.
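For reference, the Kalman measurement update that the abstract says a single natural-gradient step recovers can be written in information form. This is the standard textbook update, not the NANO derivation; the notation (prior mean and covariance, measurement matrix H, noise covariance R, measurement y) is assumed here for illustration.

```latex
% Prior: x ~ N(\mu^-, P^-).  Measurement: y = H x + v,  v ~ N(0, R).
\begin{aligned}
(P^{+})^{-1} &= (P^{-})^{-1} + H^{\top} R^{-1} H, \\
\mu^{+} &= \mu^{-} + P^{+} H^{\top} R^{-1} \left( y - H \mu^{-} \right).
\end{aligned}
```

The gain P^{+} H^{\top} R^{-1} equals the usual Kalman gain P^{-} H^{\top} (H P^{-} H^{\top} + R)^{-1}, so this form and the gain form give the same posterior.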
SAGA: A Robust Self-Attention and Goal-Aware Anchor-based Planner for Safe UAV Autonomous Navigation
Agile unmanned aerial vehicle (UAV) navigation in cluttered environments demands a planning architecture that is both computationally efficient and structurally expressive enough to reason over multiple feasible motions. This paper presents SAGA, a robust self-attention and goal-aware anchor-based planner for safe UAV autonomous navigation. SAGA formulates local planning as a one-stage joint regression-and-ranking problem over a fixed lattice of motion anchors. Given a depth image and a body-frame motion state, the planner predicts refined terminal states and planning scores for all anchors in a single forward pass, after which the best candidate is decoded into a dynamically feasible trajectory. The key idea of SAGA is to transform anchor-aligned features into geometry-aware tokens and perform cross-anchor global reasoning with self-attention. To preserve directional structure in the token space, we further introduce a polar positional encoding derived from anchor yaw and pitch. In addition, a goal-aware modulation module injects velocity, acceleration, and target information into the token representation before final score prediction. Experiments in cluttered pillar-map environments under maximum speed settings of 2.0, 3.0, and 4.0 m/s show that SAGA consistently achieves a 100% success rate, while YOPO drops from 90.91% to 62.50%, Ego-planner from 71.43% to 52.63%, and Fast-planner from 52.63% to 38.46%. Under the 4.0 m/s maximum speed setting, SAGA also improves average safety from 1.9843 m to 2.3888 m and minimum safety from 0.4390 m to 0.7576 m over YOPO, while reducing total flight time from 40.4631 s to 27.4901 s. The comparison with SAGA w/o PPE further shows that explicit polar positional encoding is critical for stable cross-anchor reasoning and safe passage selection in cluttered scenes.
EdgeLPR: On the Deep Neural Network trade-off between Precision and Performance in LiDAR Place Recognition
Place recognition is essential for long-term autonomous navigation, enabling loop closure and consistent mapping. Although deep learning has improved performance, deploying such models on resource-constrained platforms remains challenging. This work explores efficient LiDAR-based place recognition for EdgeAI by leveraging Bird's Eye View representations to enable lightweight image-based networks. We benchmark representative architectures without aggregation heads using a unified descriptor scheme based on global pooling and linear projection, and evaluate performance under FP32, FP16, and INT8 quantization. Experiments reveal trade-offs between accuracy, robustness, and efficiency: FP16 matches FP32 with lower cost, while INT8 introduces architecture-dependent degradation. Overall, the presented results are a strong basis for future research on 'use-case'-aware quantisation of Neural Networks for Edge deployment.
comment: Accepted to CoDIT 2026
Exact Higher-Order Derivatives for SE(3) via Analytical/AD Methods
Fast prototyping of new SE(3) estimation objectives remains awkward in practice. Modern Lie-group frameworks -- GTSAM, manif, Sophus, SymForce, Ceres -- target first-order workloads through different code-generation and automatic-differentiation strategies, each optimized for a particular seam between hand-derived geometry and generic differentiation. The remaining gap is a compact, AD-safe path from these first-order primitives to exact Hessians, observed-information matrices, and higher-order derivative tensors: the quantities needed for exact Newton steps, observed-information covariance estimates, and covariance correction. This paper presents a hybrid analytical/AD recipe for SE(3) negative log-likelihoods. The practitioner writes the NLL gradient once, generic over a scalar type, and places the analytical/AD seam at the point-action interface y = Tx. Closed-form Lie-group Jacobians are used up to this interface; AD is applied only beyond it. The same source is then instantiated with ordinary floating-point scalars for gradients, vector-seeded dual numbers for exact Hessians in a single forward-mode pass, and nested dual numbers for higher-order derivative tensors. On a representative 6-DoF, 5-landmark SE(3) NLL, the advocated seeded-Hessian path is approximately 5x faster than finite-differencing the AD gradient on this benchmark while matching a nested-AD oracle to machine precision. The implementation adds roughly 70 lines of analytical-Jacobian code over an AD-only baseline. We also identify and fix a removable singularity in the standard SO(3)/SE(3) scalar basis that would otherwise produce NaNs at the origin under seeded AD, and we audit which Lie-group derivative tensors require this stabilized basis. The result is a practical path from rapidly written SE(3) objectives to exact higher-order derivatives, with predictable runtime and no finite-difference tuning.
comment: 7 pages, 1 table. Companion code available at https://github.com/sigmapointlabs/se3_ad_recipes
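The recipe in the abstract above, writing the gradient once over a generic scalar type and seeding dual numbers to obtain exact Hessians, can be sketched on a toy problem. Everything here is an illustrative assumption: the `Dual` class, the polynomial objective f(x, y) = x^2 y + y^3, and the use of one scalar seed per pass rather than the vector-seeded single pass the paper describes. It is not an SE(3) negative log-likelihood and is not taken from the companion code.

```python
class Dual:
    """Dual number val + eps * e with e * e == 0; eps carries a derivative."""

    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.eps + other.eps)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule on the eps part.
        return Dual(self.val * other.val,
                    self.val * other.eps + self.eps * other.val)

    __rmul__ = __mul__


def grad(x, y):
    """Hand-written gradient of f(x, y) = x*x*y + y*y*y, generic over the scalar type."""
    return [2 * x * y, x * x + 3 * y * y]


def hessian(grad_fn, point):
    """Exact Hessian: seed one input direction per pass and read off the eps parts."""
    n = len(point)
    columns = []
    for j in range(n):
        seeded = [Dual(p, 1.0 if i == j else 0.0) for i, p in enumerate(point)]
        columns.append([g.eps for g in grad_fn(*seeded)])
    # columns[j][i] holds d(grad_i)/d(x_j); transpose into row-major form.
    return [[columns[j][i] for j in range(n)] for i in range(n)]


H = hessian(grad, [1.0, 2.0])  # -> [[4.0, 2.0], [2.0, 12.0]]
```

The same `grad` function also runs on plain floats, so gradients and Hessians come from one source, which is the point of making the gradient generic over the scalar type.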
Change-Robust Online Spatial-Semantic Topological Mapping
Autonomous robots require change-robust spatial-semantic reasoning: using spatial and semantic knowledge to decide where to go, how to get there, and where the robot is despite environmental change. Existing approaches typically attach semantics to SLAM-built metric maps, but these pipelines are brittle under appearance shifts and scene dynamics, where data association and relocalization degrade. We propose a Change-Robust Online Spatial-Semantic (CROSS) representation that replaces a globally consistent metric substrate with an online, pose-aware topological graph of RGB-D keyframes. The system explicitly reasons over perceptual ambiguity using sequential hypothesis testing in continuous SE(3). Our estimator maintains a bounded Gaussian-mixture belief over poses, enabling principled handling of loop closures and kidnapped-robot events. Experiments under severe appearance change, including real-robot object-goal navigation with lighting shifts and furniture rearrangement, demonstrate improved robustness over SLAM-based and topological baselines while remaining safe under perceptual aliasing.
Do We Really Need Immediate Resets? Rethinking Collision Handling for Efficient Robot Navigation
Should a single collision necessarily terminate an entire navigation episode? In most deep reinforcement learning (DRL) frameworks for robot navigation, this remains the standard practice: every collision immediately triggers a global environment reset and is penalized as a complete task failure. While a collision during deployment naturally indicates task failure, applying the same treatment during training prevents the agent from exploring challenging obstacle configurations, which slows learning progress in the early training phase. In this work, we challenge this convention and propose a Multi-Collision reset Budget (MCB) framework that decouples local collision termination from global environment resets, allowing the agent to retry difficult configurations within the same episode. Experiments on multiple simulated and real-world robotic platforms show that the framework accelerates early-stage exploration and improves both success rate and navigation efficiency over conventional single-collision reset baselines, with a small collision budget producing the largest gains.
comment: 7 pages, 7 figures
Sampling-Based Control via Entropy-Regularized Optimal Transport
Sampling-based model predictive control methods like MPPI and CEM are essential for real-time control of nonlinear robotic systems, particularly where discontinuous dynamics preclude gradient-based optimization. However, these methods derive from information-theoretic objectives that are agnostic to the geometry of the control problem, leading to pathological behaviors such as mode-averaging when the cost landscape is complex. We present OT-MPC, a sampling-based algorithm that overcomes these limitations through an entropy-regularized optimal transport formulation. By computing an optimal coupling between candidate control sequences and low-cost proposals, OT-MPC refines candidates toward nearby promising samples while coordinating updates across the ensemble to maintain coverage of the solution space. We derive closed-form, gradient-free updates via the Sinkhorn algorithm, enabling real-time performance. Experiments on navigation, manipulation, and locomotion tasks demonstrate improved success rates over existing methods.
comment: 18 Pages
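The coupling step can be illustrated with a small, dependency-free Sinkhorn sketch. It computes an entropy-regularized transport plan between scalar candidates and low-cost proposals, then moves each candidate to the plan-weighted mean of the proposals. The squared-distance cost, uniform marginals, and this barycentric `refine` update are assumptions for illustration, not the OT-MPC update rule itself.

```python
import math


def sinkhorn(cost, a, b, eps=0.5, iters=500):
    """Entropy-regularized OT plan between marginals a and b (lists of floats)."""
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def refine(candidates, proposals, eps=0.5):
    """Move each scalar candidate to the plan-weighted mean of the proposals."""
    n, m = len(candidates), len(proposals)
    cost = [[(c - p) ** 2 for p in proposals] for c in candidates]
    plan = sinkhorn(cost, [1.0 / n] * n, [1.0 / m] * m, eps)
    refined = []
    for i in range(n):
        row_mass = sum(plan[i])
        refined.append(sum(plan[i][j] * proposals[j] for j in range(m)) / row_mass)
    return refined
```

Because the column marginals force every proposal to receive mass, the candidates cannot all collapse onto a single proposal, which is the coverage property the abstract attributes to the coupling.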
Robotic Desk Organization: A Multi-Primitive Approach to Manipulating Heterogeneous Objects via Environmental Constraints
Desktop organization remains challenging for service robots because of heterogeneous objects and diverse manipulation objectives, such as collection and stacking. In this article, a task-oriented framework is presented for organizing planar rigid and deformable objects on desks. A perception pipeline was developed that augments existing datasets with uncommon desktop items and makes geometry-based pose and keypoint estimation possible, along with the detection of environmental constraints, such as table edges. To handle diverse manipulation requirements, environment-assisted primitives are used, including contact-based grasping for small objects, edge-based push-grasping for planar rigid objects, and levering-based grasping for planar deformable objects. These primitives leverage environmental and interobject constraints to improve robustness. A task planner was designed to integrate these primitives into multiobject organization. Extensive real-world experiments demonstrate the effectiveness and robustness of the proposed framework. This research provides practical manipulation primitives for planar rigid and deformable objects, highlighting the role of environmental and interobject constraints in complex multiobject manipulation tasks. Code and video are available online.
AoI-Aware Multi-Robot Sensing and Transport on Connected Graphs
A team of mobile robots monitors spatially distributed processes and delivers measurements to a base, where AoI is measured from sensing start, capturing both stochastic parallel sensing delays and hop-based propagation. At each non-base node, multiple robots may collaborate, yielding node-dependent geometric group sensing times, while other robots act as mobile conveyors that transport samples along unit-time edges. The paper first derives a per-node and network-wide AoI lower bound that decomposes into a sensing term, determined by mean group sensing times, and a propagation term, given by shortest-path distances. It then shows that minimizing the sensing component yields a separable discretely convex resource allocation problem, solved optimally by a greedy water-filling algorithm. A shortest-path-tree conveyor architecture with an Euler-walk deployment is constructed and proven to attain the lower bound in a full-conveyor regime. Numerical simulations illustrate the impact of sensing allocation and conveyor deployment on AoI performance.
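The greedy water-filling step can be sketched as follows. The sensing model is an assumption made for illustration: each robot at a node succeeds independently with probability `p` per time slot (with 0 < p <= 1), so a group of `k` robots has a geometric sensing time with mean 1 / (1 - (1 - p)^k). The objective, the sum of mean sensing times with at least one robot per node, and the function names are likewise hypothetical.

```python
import heapq


def mean_sensing_time(p, k):
    """Mean geometric sensing time for k robots, each succeeding w.p. p per slot."""
    return 1.0 / (1.0 - (1.0 - p) ** k)


def greedy_allocation(success_probs, num_robots):
    """Give every node one robot, then assign the rest by largest marginal gain."""
    n = len(success_probs)
    if num_robots < n:
        raise ValueError("need at least one robot per node")
    alloc = [1] * n

    def gain(i):
        # Reduction in mean sensing time from adding one more robot to node i.
        p, k = success_probs[i], alloc[i]
        return mean_sensing_time(p, k) - mean_sensing_time(p, k + 1)

    # Max-heap on marginal gain (heapq is a min-heap, so gains are negated).
    heap = [(-gain(i), i) for i in range(n)]
    heapq.heapify(heap)
    for _ in range(num_robots - n):
        _, i = heapq.heappop(heap)
        alloc[i] += 1
        heapq.heappush(heap, (-gain(i), i))
    return alloc
```

Greedy selection is optimal here only because the per-node cost is convex and decreasing in `k`, so marginal gains shrink with every added robot; for a non-convex cost the same loop would be a heuristic.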
Benchmarking Local Language Models for Social Robots using Edge Devices
Social-educational robots designed for socially interactive pedagogical support, such as the Robot Study Companion (RSC), rely on responsive, privacy-preserving interaction despite severely limited compute. However, there is a gap in systematic benchmarking of language models for edge computing in pedagogical applications. This paper benchmarks 25 open-source language models for local deployment on edge hardware. We evaluate each model across three dimensions: inference efficiency (tokens per second, energy consumption), general knowledge (a six-category MMLU subset), and teaching effectiveness (LLM-rated pedagogical quality, validated against five independent human raters). The Raspberry Pi (RPi) 4 serves as the primary platform, with additional comparisons on the RPi 5 and a laptop GPU. Results reveal pronounced trade-offs: throughput and energy efficiency vary by over an order of magnitude across models, MMLU accuracy ranges from near-random to 57.2%, and teaching effectiveness does not correlate monotonically with either metric. Among the evaluated models, Granite4 Tiny Hybrid (7B) achieves a strong overall balance, reaching 2.5 tokens per second, 0.90 tokens per joule, and 54.6% MMLU accuracy; high MMLU accuracy does not appear necessary for strong teaching scores. Human validation on four representative models preserved the automated rank ordering (Pearson r = 0.967, n = 4). Based on these findings, we propose a three-tier local inference architecture for the RSC that balances responsiveness and accuracy on resource-constrained hardware.
comment: Accepted for 22nd IEEE International Conference on Advanced Robotics and its Social Impact (June 2026) in Vienna, Austria
Refining Compositional Diffusion for Reliable Long-Horizon Planning
Compositional diffusion planning generates long-horizon trajectories by stitching together overlapping short-horizon segments through score composition. However, when local plan distributions are multimodal, existing compositional methods suffer from mode-averaging, where averaging incompatible local modes leads to plans that are neither locally feasible nor globally coherent. We propose Refining Compositional Diffusion (RCD), a training-free guidance method that steers compositional sampling toward high-density, globally coherent plans. RCD leverages the self-reconstruction error of a pretrained diffusion model as a proxy for the log-density of composed plans, combined with an overlap consistency term that enforces consistency at segment boundaries. We show that the combined guidance concentrates sampling on high-density plans that mitigate mode-averaging. Experiments on challenging long-horizon tasks from OGBench, including locomotion, object manipulation, and pixel-based observations, demonstrate that RCD consistently outperforms existing methods.
OGPO: Sample Efficient Full-Finetuning of Generative Control Policies
Generative control policies (GCPs), such as diffusion- and flow-based control policies, have emerged as effective parameterizations for robot learning. This work introduces Off-policy Generative Policy Optimization (OGPO), a sample-efficient algorithm for finetuning GCPs that maintains off-policy critic networks to maximize data reuse and propagates policy gradients through the full generative process of the policy via a modified PPO objective, using critics as the terminal reward. OGPO achieves state-of-the-art performance on manipulation tasks spanning multi-task settings, high-precision insertion, and dexterous control. To our knowledge, it is also the only method that can fine-tune poorly initialized behavior cloning policies to near-full task success with no expert data in the online replay buffer, and it does so with little task-specific hyperparameter tuning. Through extensive empirical investigations, we demonstrate that OGPO drastically outperforms alternative methods based on policy steering and learning residual corrections, and identify the key mechanisms behind its performance. We further introduce practical stabilizers, including success-buffer regularization, conservative advantages, $χ^2$ regularization, and Q-variance reduction, to mitigate critic over-exploitation across state- and pixel-based settings. Beyond proposing OGPO, we conduct a systematic empirical study of GCP finetuning, identifying the stabilizing mechanisms and failure modes that govern successful off-policy full-policy improvement.
Human-in-the-Loop Uncertainty Analysis in Self-Adaptive Robots Using LLMs
Self-adaptive robots operate in dynamic, unpredictable environments where unaddressed uncertainties can lead to safety violations and operational failures. However, systematically identifying and analyzing these uncertainties, including their sources, impacts, and mitigation strategies, remains a significant challenge given the inherent complexity of real-world environments, dynamic robotic behavior, and the rapid evolution of robotic technologies. To address this, we introduce RoboULM, a human-in-the-loop methodology and tool that supports practitioners in systematically exploring uncertainties at the design stage using large language models (LLMs). Moreover, we present an uncertainty taxonomy that provides a detailed catalog of uncertainties in self-adaptive robots. We evaluated RoboULM with 16 practitioners from four industrial use cases. The results show that RoboULM was perceived as both useful and easy to understand, with the participants particularly valuing structured prompting and iterative refinement support. These findings demonstrate the potential of RoboULM as a viable solution for systematic uncertainty analysis in complex robots.
Design and Characteristics of a Thin-Film ThermoMesh for the Efficient Embedded Sensing of a Spatio-Temporally Sparse Heat Source
This work presents ThermoMesh, a passive thin-film thermoelectric mesh sensor designed to detect and characterize spatio-temporally sparse heat sources through conduction-based thermal imaging. The device integrates thermoelectric junctions with linear or nonlinear interlayer resistive elements to perform simultaneous sensing and in-sensor compression. We focus on the single-event (1-sparse) operation and define four performance metrics: range, efficiency, sensitivity, and accuracy. Numerical modeling shows that a linear resistive interlayer flattens the sensitivity distribution and improves minimum sensitivity by approximately tenfold for a $16\times16$ mesh. Nonlinear temperature-dependent interlayers further enhance minimum sensitivity at scale: a ceramic negative-temperature-coefficient (NTC) layer over 973-1273K yields a $\sim14{,}500\times$ higher minimum sensitivity than the linear design at a $200\times200$ mesh, while a VO$_2$ interlayer modeled across its metal-insulator transition (MIT) over 298-373K yields a $\sim24\times$ improvement. Using synthetic 1-sparse datasets with white boundary-channel noise at a signal-to-noise ratio of 40dB, the VO$_2$ case achieved $98\%$ localization accuracy, a mean absolute temperature error of $0.23$K, and a noise-equivalent temperature (NET) of $0.07$K. For the ceramic-NTC case no localization errors were observed under the tested conditions, with a mean absolute temperature error of $1.83$K and a NET of $1.49$K. These results indicate that ThermoMesh could enable energy-efficient embedded thermal sensing in scenarios where conventional infrared imaging is limited, such as molten-droplet detection or hot-spot monitoring in harsh environments.
comment: 45 pages, 13 figures, 63 references, under review in Sensors and Actuators A: Physical
ROBOPOL: Social Robotics Meets Vehicular Communications for Cooperative Automated Driving
On the way toward full autonomy, sharing roads between automated and autonomous vehicles in so-called mixed traffic is unavoidable. Moreover, even if all vehicles on the road were autonomous, pedestrians would still cross streets. We propose social robots as moderators between autonomous vehicles and vulnerable road users. This paper presents a first proof-of-concept integration of a social robot advising pedestrians in crossing scenarios involving a cooperative automated vehicle. We also discuss key enablers required for designing a "robot policeman" in a generic use case of cooperative intersection management. Our work provides a vision of the role of social robotics in future Cooperative Intelligent Transport Systems.
Safe Planning in Interactive Environments via Iterative Policy Updates and Adversarially Robust Conformal Prediction
Safe planning of an autonomous agent in interactive environments -- such as the control of a self-driving vehicle among pedestrians -- poses a major challenge as the behavior of the environment is unknown and reactive to the behavior of the autonomous agent. This coupling gives rise to interaction-driven distribution shifts where the autonomous agent's control policy may change the environment's behavior, thereby invalidating safety guarantees in existing work. Indeed, recent works have used conformal prediction (CP) to generate distribution-free safety guarantees using observed data of the environment. However, CP's assumption on data exchangeability is violated in interactive settings due to a circular dependency where a control policy update changes the environment's behavior, and vice versa. To address this gap, we propose an iterative framework that robustly maintains safety guarantees across policy updates by quantifying the potential impact of a planned policy update on the environment's behavior. We realize this via adversarially robust CP where we perform a regular CP step in each episode using observed data under the current policy, but then transfer safety guarantees across policy updates by analytically adjusting the CP result to account for distribution shifts. This adjustment is performed based on a policy-to-trajectory sensitivity analysis, resulting in a safe, episodic open-loop planner. We further conduct a contraction analysis of the system providing conditions under which both the CP results and the policy updates are guaranteed to converge. We empirically demonstrate these safety and convergence guarantees on a two-dimensional car-pedestrian and a high-dimensional quadcopter case study. To the best of our knowledge, these are the first results that provide valid safety guarantees in such interactive settings.
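For orientation, the regular CP step and a shift-aware adjustment can be sketched in a few lines. `conformal_quantile` is the standard split-conformal threshold; `shifted_threshold` is a hypothetical simplification that assumes a policy change of size `policy_change` can increase any nonconformity score by at most `sensitivity * policy_change`. The paper's policy-to-trajectory sensitivity analysis is not reproduced here, and both parameter names are assumptions.

```python
import math


def conformal_quantile(scores, alpha):
    """Split-CP threshold: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]


def shifted_threshold(scores, alpha, sensitivity, policy_change):
    """Hypothetical robust margin: inflate the threshold by a score-shift bound."""
    return conformal_quantile(scores, alpha) + sensitivity * policy_change
```

Under the stated bounded-shift assumption, a score calibrated under the current policy and then perturbed by the update stays below the inflated threshold whenever the unperturbed score was below the original one, which is the sense in which the guarantee transfers across the update.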
OmniTrack++: Omnidirectional Multi-Object Tracking by Learning Large-FoV Trajectory Feedback CVPR 2025
To address panoramic distortion, large search space, and identity ambiguity under a 360° FoV, OmniTrack++ adopts a feedback-driven framework that progressively refines perception with trajectory cues. A DynamicSSM block first stabilizes panoramic features, implicitly alleviating geometric distortion. On top of normalized representations, FlexiTrack Instances use trajectory-informed feedback for flexible localization and reliable short-term association. To ensure long-term robustness, an ExpertTrack Memory consolidates appearance cues via a Mixture-of-Experts design, enabling recovery from fragmented tracks and reducing identity drift. Finally, a Tracklet Management module adaptively switches between end-to-end and tracking-by-detection modes according to scene dynamics, offering a balanced and scalable solution for panoramic MOT. To support rigorous evaluation, we establish the EmboTrack benchmark, a comprehensive dataset for panoramic MOT that includes QuadTrack, captured with a quadruped robot, and BipTrack, collected with a bipedal wheel-legged robot. Together, these datasets span wide-angle environments and diverse motion patterns, providing a challenging testbed for real-world panoramic perception. Extensive experiments on JRDB and EmboTrack demonstrate that OmniTrack++ achieves state-of-the-art performance, yielding substantial HOTA improvements of +3.94 on JRDB and +15.03 on QuadTrack over the original OmniTrack. These results highlight the effectiveness of trajectory-informed feedback, adaptive paradigm switching, and robust long-term memory in advancing panoramic multi-object tracking. Datasets and code will be made available at https://github.com/xifen523/OmniTrack.
comment: Extended version of CVPR 2025 paper arXiv:2503.04565. Datasets and code will be made publicly available at https://github.com/xifen523/OmniTrack
Visualizing Critic Match Loss Landscapes for Interpretation of Online Reinforcement Learning Control Algorithms
Reinforcement learning has proven its power on various occasions. However, its performance is not always guaranteed when system dynamics change. Instead, it largely relies on users' empirical experience. For reinforcement learning algorithms with an actor-critic structure, the critic neural network reflects the approximation and optimization process in the RL algorithm. Analyzing the performance of the critic neural network helps to understand the mechanism of the algorithm. To support systematic interpretation of such algorithms in dynamic control problems, this work proposes a critic match loss landscape visualization method for online reinforcement learning. The method constructs a loss landscape by projecting recorded critic parameter trajectories onto a low-dimensional linear subspace. The critic match loss is evaluated over the projected parameter grid using fixed reference state samples and temporal-difference targets. This yields a three-dimensional loss surface together with a two-dimensional optimization path that characterizes critic learning behavior. To extend analysis beyond visual inspection, quantitative landscape indices and a normalized system performance index are introduced, enabling structured comparison across different training outcomes. The approach is demonstrated using the Action-Dependent Heuristic Dynamic Programming algorithm on cart-pole and spacecraft attitude control tasks. Comparative analyses across projection methods and training stages reveal distinct landscape characteristics associated with stable convergence and unstable learning. The proposed framework enables both qualitative and quantitative interpretation of critic optimization behavior in online reinforcement learning.
comment: Published in Acta Astronautica, Vol. 246, pp. 909-920, 2026. DOI:10.1016/j.actaastro.2026.04.045
Tempus: A Temporally Scalable Resource-Invariant GEMM Streaming Framework for Versal AI Edge
Scaling laws for Large Language Models (LLMs) establish that model quality improves with computational scale, yet edge deployment imposes strict constraints on compute, memory, and power. Since General Matrix Multiplication (GEMM) accounts for up to 90% of inference time, efficient GEMM acceleration is critical for edge AI. The Adaptive Intelligent Engines available in the AMD Versal adaptive SoCs are well suited for this task, but existing state-of-the-art (SOTA) frameworks maximize performance through spatial scaling, distributing workloads across hundreds of cores -- an approach that fails on resource-limited edge SoCs due to physical implementation failures, bandwidth saturation, and excessive resource consumption. We propose Tempus, a Resource-Invariant Temporal GEMM framework for the AMD Versal AI Edge SoC. Rather than expanding hardware resources with matrix size, Tempus employs a fixed compute block of 16 AIE-ML cores, achieving scalability through iterative graph execution and algorithmic data tiling and replication in the Programmable Logic. High-speed cascade streaming ensures low-latency partial sum reduction at Initiation Interval (II) of 1, while a deadlock-free DATAFLOW protocol maximizes transfer-compute overlap and PLIO reuse. Evaluated on GEMM workloads, Tempus achieves 607 GOPS at 10.677 W total on-chip power. By characterizing system-level efficiency through the Platform-Aware Utility (PAU) metric, we prove that Tempus achieves a 211.2x higher prominence factor than the leading spatial SOTA (ARIES). Furthermore, the framework maintains a 0.00% utilization of URAM/DSP, yielding 22.0x core frugality, 7.1x power frugality, and a 6.3x reduction in I/O demand, establishing a sustainable, scalable foundation for edge LLM inference.
comment: Source code available at: https://github.com/mgrailoo/TEMPUS
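The temporal-scaling idea, a fixed compute block reused across iterations instead of more hardware for larger matrices, has a simple software analogy in tiled matrix multiplication. The sketch below is only that analogy: `tiled_gemm` and its `tile` parameter are illustrative names, and nothing here models AIE-ML cores, cascade streaming, or PLIO transfers.

```python
def tiled_gemm(a, b, tile=2):
    """Multiply a (M x K) by b (K x N) while reusing one tile-sized compute block."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):          # output row tiles
        for j0 in range(0, n, tile):      # output column tiles
            for k0 in range(0, k, tile):  # temporal loop over reduction tiles
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]  # partial sum carried across iterations
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c
```

The amount of work done per inner block is bounded by `tile` regardless of matrix size; larger problems only add iterations of the outer loops, which mirrors the resource-invariant framing of the abstract.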
A High-Fidelity Digital Twin for Robotic Manipulation Based on 3D Gaussian Splatting
Developing high-fidelity, interactive digital twins is crucial for enabling closed-loop motion planning and reliable real-world robot execution, which are essential to advancing sim-to-real transfer. However, existing approaches often suffer from slow reconstruction, limited visual fidelity, and difficulties in converting photorealistic models into planning-ready collision geometry. We present a practical framework that constructs high-quality digital twins within minutes from sparse RGB inputs. Our system employs 3D Gaussian Splatting (3DGS) for fast, photorealistic reconstruction as a unified scene representation. We enhance 3DGS with visibility-aware semantic fusion for accurate 3D labelling and introduce an efficient, filter-based geometry conversion method to produce collision-ready models seamlessly integrated with a Unity-ROS2-MoveIt physics engine. In experiments with a Franka Emika Panda robot performing pick-and-place tasks, we demonstrate that this enhanced geometric accuracy effectively supports robust manipulation in real-world trials. These results demonstrate that 3DGS-based digital twins, enriched with semantic and geometric consistency, offer a fast, reliable, and scalable path from perception to manipulation in unstructured environments.
comment: Accepted by Journal of Robot Learning
Teaching Robots to Interpret Social Interactions through Lexically-guided Dynamic Graph Learning ACM MM 26
For a robot to be called socially intelligent, it must be able to infer users' internal states from their current behaviour, predict the users' future behaviour, and, if required, respond appropriately. In this work, we investigate how robots can be endowed with such social intelligence by modelling the dynamic relationship between users' internal states (latent) and actions (observable state). Our premise is that these states arise from the same underlying socio-cognitive process and influence each other dynamically. Drawing inspiration from theories in Cognitive Science, we propose a novel multi-task learning framework, termed SocialLDG, that explicitly models the dynamic relationship among the states, represented as six distinct tasks. Our framework uses a language model to introduce lexical priors for each task and employs dynamic graph learning to model task affinity evolving with time. SocialLDG has three advantages: First, it achieves state-of-the-art performance on two challenging, publicly available human-robot social interaction datasets. Second, it supports strong task scalability by learning new tasks seamlessly without catastrophic forgetting. Finally, benefiting from explicit modelling of task affinity, it offers insights into how different interactions unfold in time and how the internal states and observable actions influence each other in human decision making.
comment: submitted to ACM MM 26
Learning Vision-Based Omnidirectional Navigation: A Teacher-Student Approach Using Monocular Depth Estimation
Reliable obstacle avoidance in industrial settings demands 3D scene understanding, but widely used 2D LiDAR sensors perceive only a single horizontal slice of the environment, missing critical obstacles above or below the scan plane. We present a teacher-student framework for vision-based mobile robot navigation that eliminates the need for LiDAR sensors. A teacher policy trained via Proximal Policy Optimization (PPO) in NVIDIA Isaac Lab leverages privileged 2D LiDAR observations that account for the full robot footprint to learn robust navigation. The learned behavior is distilled into a student policy that relies solely on monocular depth maps predicted by a fine-tuned Depth Anything V2 model from four RGB cameras. The complete inference pipeline, comprising monocular depth estimation (MDE), policy execution, and motor control, runs entirely onboard an NVIDIA Jetson Orin AGX mounted on a DJI RoboMaster platform, requiring no external computation for inference. In simulation, the student achieves success rates of 82-96.5%, consistently outperforming the standard 2D LiDAR teacher (50-89%). In real-world experiments, the MDE-based student outperforms the 2D LiDAR teacher when navigating around obstacles with complex 3D geometries, such as overhanging structures and low-profile objects, that fall outside the single scan plane of a 2D LiDAR.
comment: This work has been submitted to the IEEE for possible publication
MVP-LAM: Learning Action-Centric Latent Action via Cross-Viewpoint Reconstruction
Latent actions learned from diverse human videos serve as pseudo-labels for vision-language-action (VLA) pretraining, but provide effective supervision only if they remain informative about the underlying ground-truth actions. For effective supervision, latent actions should contain information about the underlying actions even though they are inaccessible. We propose Multi-ViewPoint Latent Action Model (MVP-LAM), which learns latent actions that are highly informative about ground-truth actions from multi-view videos. MVP-LAM trains latent actions with a cross-viewpoint reconstruction objective, so that a latent action from one view must explain the future in another view, reducing reliance on viewpoint-specific cues. On Bridge V2, MVP-LAM produces more action-centric latent actions, achieving higher mutual information with ground-truth actions and improved action prediction, including under out-of-distribution evaluation. Finally, pretraining VLAs with MVP-LAM latent actions improves downstream manipulation performance on various benchmarks. The code and trained checkpoints are available at https://jmsnu.github.io.
BridgeACT: Bridging Human Demonstrations to Robot Actions via Unified Tool-Target Affordances
Learning robot manipulation from human videos is appealing due to the scale and diversity of human demonstrations, but transferring such demonstrations to executable robot behavior remains challenging. Prior work either relies on robot data for downstream adaptation or learns affordance representations that remain at the perception level and do not directly support real-world execution. We present BridgeACT, an affordance-driven framework that learns robotic manipulation directly from human videos without requiring any robot demonstration data. Our key idea is to model affordance as an embodiment-agnostic intermediate representation that bridges human demonstrations and robot actions. BridgeACT decomposes manipulation into two complementary problems: where to grasp and how to move. To this end, BridgeACT first grounds task-relevant affordance regions in the current scene, and then predicts task-conditioned 3D motion affordances from human demonstrations. The resulting affordances are mapped to robot actions through a grasping module and a lightweight closed-loop motion controller, enabling direct deployment on real robots. In addition, we represent complex manipulation tasks as compositions of affordance operations, which allows a unified treatment of diverse tasks and object-to-object interactions. Experiments on real-world manipulation tasks show that BridgeACT outperforms prior baselines and generalizes to unseen objects, scenes, and viewpoints.
From Prompt to Physical Actuation: Holistic Threat Modeling of LLM-Enabled Robotic Systems
As large language models are integrated into autonomous robotic systems for task planning and control, compromised inputs or unsafe model outputs can propagate through the planning pipeline to physical-world consequences. Although prior work has studied robotic cybersecurity, adversarial perception attacks, and LLM safety independently, no existing study traces how these threat categories interact and propagate across trust boundaries in a unified architectural model. We address this gap by modeling an LLM-enabled autonomous robot in an edge-cloud architecture as a hierarchical Data Flow Diagram and applying STRIDE-per-interaction analysis across six boundary-crossing interaction points using a three-category taxonomy of Conventional Cyber Threats, Adversarial Threats, and Conversational Threats. The analysis reveals that these categories converge at the same boundary crossings, and we trace three cross-boundary attack chains from external entry points to unsafe physical actuation, each exposing a distinct architectural property: the absence of independent semantic validation between user input and actuator dispatch, cross-modal translation from visual perception to language-model instruction, and unmediated boundary crossing through provider-side tool use. To our knowledge, this is the first DFD-based threat analysis integrating all three threat categories across the full perception-planning-actuation pipeline of an LLM-enabled robotic system.
comment: Submitted to 23rd Annual International Conference on Privacy, Security, and Trust (PST2026)
OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
Chain-of-Thought (CoT) reasoning has become a powerful driver of trajectory prediction in VLA-based autonomous driving, yet its autoregressive nature imposes a latency cost that is prohibitive for real-time deployment. Latent CoT methods attempt to close this gap by compressing reasoning into continuous hidden states, but consistently fall short of their explicit counterparts. We suggest that this is because purely linguistic latent representations compress a symbolic abstraction of the world, rather than the causal dynamics that actually govern driving. Thus, we present OneVL (One-step latent reasoning and planning with Vision-Language explanations), a unified VLA and World Model framework that routes reasoning through compact latent tokens supervised by dual auxiliary decoders. Alongside a language decoder that reconstructs text CoT, we introduce a visual world model decoder that predicts future-frame tokens, forcing the latent space to internalize the causal dynamics of road geometry, agent motion, and environmental change. A three-stage training pipeline progressively aligns these latents with trajectory, language, and visual objectives, ensuring stable joint optimization. At inference time, the auxiliary decoders are discarded, and all latent tokens are prefilled in a single parallel pass, matching the speed of answer-only prediction. Across four benchmarks, OneVL becomes the first latent CoT method to surpass explicit CoT, delivering superior accuracy at answer-only latency. These results show that with world model supervision, latent CoT produces more generalizable representations than verbose token-by-token reasoning. Code has been open-sourced to the community. Project Page: https://xiaomi-embodied-intelligence.github.io/OneVL
comment: Technical Report; 49 pages, 22 figures, 10 tables; Project Page at https://xiaomi-embodied-intelligence.github.io/OneVL GitHub at https://github.com/xiaomi-research/onevl
Flux4D: Flow-based Unsupervised 4D Reconstruction NeurIPS 2025
Reconstructing large-scale dynamic scenes from visual observations is a fundamental challenge in computer vision, with critical implications for robotics and autonomous systems. While recent differentiable rendering methods such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have achieved impressive photorealistic reconstruction, they suffer from scalability limitations and require annotations to decouple actor motion. Existing self-supervised methods attempt to eliminate explicit annotations by leveraging motion cues and geometric priors, yet they remain constrained by per-scene optimization and sensitivity to hyperparameter tuning. In this paper, we introduce Flux4D, a simple and scalable framework for 4D reconstruction of large-scale dynamic scenes. Flux4D directly predicts 3D Gaussians and their motion dynamics to reconstruct sensor observations in a fully unsupervised manner. By adopting only photometric losses and enforcing an "as static as possible" regularization, Flux4D learns to decompose dynamic elements directly from raw data without requiring pre-trained supervised models or foundational priors simply by training across many scenes. Our approach enables efficient reconstruction of dynamic scenes within seconds, scales effectively to large datasets, and generalizes well to unseen environments, including rare and unknown objects. Experiments on outdoor driving datasets show Flux4D significantly outperforms existing methods in scalability, generalization, and reconstruction quality.
comment: NeurIPS 2025. Project page: https://waabi.ai/flux4d/
Safe Navigation using Neural Radiance Fields via Reachable Sets
Safe navigation in cluttered environments is an important challenge for autonomous systems. Robots navigating through obstacle-ridden scenarios need to be able to navigate safely in the presence of obstacles, goals, and ego objects of varying geometries. In this work, reachable set representations of the robot's real-time capabilities in the state space are used to capture safe navigation requirements, while neural radiance fields (NeRFs) are used to compute, store, and manipulate the volumetric representations of the obstacles or the ego vehicle as needed. The resulting path planning problem is formulated as a constrained optimal control problem involving linear matrix inequality constraints. We present simulation results for path planning in the presence of numerous obstacles in two different scenarios. Safe navigation is demonstrated by using reachable sets in the corresponding constrained optimal control problems.
comment: 5 pages, 8 figures, 2026 4th International Conference on Mechatronics, Control and Robotics (ICMCR)
KinDER: A Physical Reasoning Benchmark for Robot Learning and Planning
Robotic systems that interact with the physical world must reason about kinematic and dynamic constraints imposed by their own embodiment, their environment, and the task at hand. We introduce KinDER, a benchmark for Kinematic and Dynamic Embodied Reasoning that targets physical reasoning challenges arising in robot learning and planning. KinDER comprises 25 procedurally generated environments, a Gymnasium-compatible Python library with parameterized skills and demonstrations, and a standardized evaluation suite with 13 implemented baselines spanning task and motion planning, imitation learning, reinforcement learning, and foundation-model-based approaches. The environments are designed to isolate five core physical reasoning challenges: basic spatial relations, nonprehensile multi-object manipulation, tool use, combinatorial geometric constraints, and dynamic constraints, disentangled from perception, language understanding, and application-specific complexity. Empirical evaluation shows that existing methods struggle to solve many of the environments, indicating substantial gaps in current approaches to physical reasoning. We additionally include real-to-sim-to-real experiments on a mobile manipulator to assess the correspondence between simulation and real-world physical interaction. KinDER is fully open-sourced and intended to enable systematic comparison across diverse paradigms for advancing physical reasoning in robotics. Website and code: https://prpl-group.com/kinder-site/
comment: Project website: https://prpl-group.com/kinder-site/. 21 pages, 8 figures. Accepted to Robotics Science and Systems (RSS), 2026
REALM: An RGB and Event Aligned Latent Manifold for Cross-Modal Perception
Event cameras provide several unique advantages over standard frame-based sensors, including high temporal resolution, low latency, and robustness to extreme lighting. However, existing learning-based approaches for event processing are typically confined to narrow, task-specific silos and lack the ability to generalize across modalities. We address this gap with REALM, a cross-modal framework that learns an RGB and Event Aligned Latent Manifold by projecting event representations into the pretrained latent space of RGB foundation models. Instead of task-specific training, we leverage low-rank adaptation (LoRA) to bridge the modality gap, effectively unlocking the geometric and semantic priors of frozen RGB backbones for asynchronous event streams. We demonstrate that REALM effectively maps events into the ViT-based foundation latent space. Our method allows us to perform downstream tasks like depth estimation and semantic segmentation by simply transferring linear heads trained on the RGB teacher. Most significantly, REALM enables the direct, zero-shot application of complex, frozen image-trained decoders, such as MASt3R, to raw event data. We demonstrate state-of-the-art performance in wide-baseline feature matching, significantly outperforming specialized architectures. Code and models are available upon acceptance.
BEVCALIB: LiDAR-Camera Calibration via Geometry-Guided Bird's-Eye View Representations
Accurate LiDAR-camera calibration is fundamental to fusing multi-modal perception in autonomous driving and robotic systems. Traditional calibration methods require extensive data collection in controlled environments and cannot compensate for the transformation changes during the vehicle/robot movement. In this paper, we propose the first model that uses bird's-eye view (BEV) features to perform LiDAR-camera calibration from raw data, termed BEVCALIB. To achieve this, we extract camera BEV features and LiDAR BEV features separately and fuse them into a shared BEV feature space. To fully utilize the geometric information from the BEV feature, we introduce a novel feature selector to filter the most important features in the transformation decoder, which reduces memory consumption and enables efficient training. Extensive evaluations on KITTI, NuScenes, and our own dataset demonstrate that BEVCALIB establishes a new state of the art. Under various noise conditions, BEVCALIB outperforms the best baseline in the literature by an average of (47.08%, 82.32%) on the KITTI dataset and (78.17%, 68.29%) on the NuScenes dataset, in terms of (translation, rotation), respectively. In the open-source domain, it improves the best reproducible baseline by one order of magnitude. Our code and demo results are available at https://cisl.ucr.edu/BEVCalib.
comment: Published in CoRL 2025
A Certifiably Correct Algorithm for Generalized Robot-World and Hand-Eye Calibration
Automatic extrinsic sensor calibration is a fundamental problem for multi-sensor platforms. Reliable and general-purpose solutions should be computationally efficient, require few assumptions about the structure of the sensing environment, and demand little effort from human operators. In this work, we introduce a fast and certifiably globally optimal algorithm for solving a generalized formulation of the robot-world and hand-eye calibration (RWHEC) problem. The formulation of RWHEC presented is "generalized" in that it supports the simultaneous estimation of multiple sensor and target poses, and permits the use of monocular cameras that, alone, are unable to measure the scale of their environments. In addition to demonstrating our method's superior performance over existing solutions through extensive simulated and real experiments, we derive novel identifiability criteria and establish a priori guarantees of global optimality for problem instances with bounded measurement errors. As part of our analysis, we propose a new constraint qualification for nonlinear programs with redundant constraints; this constraint qualification is of independent interest for establishing the exactness of SDP relaxations of QCQPs that have been tightened through the addition of redundant constraints. Finally, we provide a free and open-source implementation of our algorithms and experiments.
comment: 28 pages, 10 figures, accepted for publication in the International Journal of Robotics Research
RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation
We introduce RoboEval, a structured evaluation framework and benchmark for robotic manipulation that augments binary success with principled behavioral and outcome metrics. Existing evaluations often collapse performance into outcome counts, masking differences in execution quality and obscuring failure structure. RoboEval provides eight bimanual tasks with systematically controlled variations, more than three thousand expert demonstrations, and a modular simulation platform for reproducible experimentation. All tasks are instrumented with standardized metrics that quantify efficiency, coordination, and safety/stability, as well as outcome measures that trace stagewise progress and localize failure modes. Through extensive experiments with state-of-the-art visuomotor policies, we validate these metrics by analyzing their stability under variation, discriminative power across policies with similar success rates, and correlation with task success. Project Page: https://robo-eval.github.io
comment: Project page: https://robo-eval.github.io
FORMULA: FORmation MPC with neUral barrier Learning for safety Assurance
Multi-robot systems (MRS) are essential for large-scale applications such as disaster response, material transport, and warehouse logistics, yet ensuring robust, safety-aware formation control in cluttered and dynamic environments remains a major challenge. Existing model predictive control (MPC) approaches suffer from limitations in scalability and provable safety, while control barrier functions (CBFs), though principled for safety enforcement, are difficult to handcraft for large-scale nonlinear systems. This paper presents FORMULA, a safe distributed, learning-enhanced predictive control framework that integrates MPC with Control Lyapunov Functions (CLFs) for stability and neural network-based CBFs for decentralized safety, eliminating manual safety constraint design. This scheme maintains formation integrity during obstacle avoidance, resolves deadlocks in dense configurations, and reduces online computational load. Simulation results demonstrate that FORMULA enables scalable, safety-aware, formation-preserving navigation for multi-robot teams in complex environments.
comment: Accepted to IEEE Intelligent Vehicles Symposium (IV) 2026
Optimizing Grasping in Legged Robots: A Deep Learning Approach to Loco-Manipulation
This paper presents a deep learning framework designed to enhance the grasping capabilities of quadrupeds equipped with arms, with a focus on improving precision and adaptability. Our approach centers on a sim-to-real methodology that minimizes reliance on physical data collection. We developed a pipeline within the Genesis simulation environment to generate a synthetic dataset of grasp attempts on common objects. By simulating thousands of interactions from various perspectives, we created pixel-wise annotated grasp-quality maps to serve as the ground truth for our model. This dataset was used to train a custom CNN with a U-Net-like architecture that processes multi-modal input from onboard RGB and depth cameras, including RGB images, depth maps, segmentation masks, and surface normal maps. The trained model outputs a grasp-quality heatmap to identify the optimal grasp point. We validated the complete framework on a four-legged robot. The system successfully executed a full loco-manipulation task: autonomously navigating to a target object, perceiving it with its sensors, predicting the optimal grasp pose using our model, and performing a precise grasp. This work proves that leveraging simulated training with advanced sensing offers a scalable and effective solution for object handling.
A Vision-Based Shared-Control Teleoperation Scheme for Controlling the Robotic Arm of a Four-Legged Robot
In hazardous and remote environments, robotic systems perform critical tasks demanding improved safety and efficiency. Among these, quadruped robots with manipulator arms offer mobility and versatility for complex operations. However, teleoperating quadruped robots is challenging due to the lack of integrated obstacle detection and intuitive control methods for the robotic arm, increasing collision risks in confined or dynamically changing workspaces. Teleoperation via joysticks or pads can be non-intuitive and demands a high level of expertise due to its complexity, culminating in a high cognitive load on the operator. To address this challenge, a teleoperation approach that directly maps human arm movements to the robotic manipulator offers a simpler and more accessible solution. This work proposes an intuitive remote control by leveraging a vision-based pose estimation pipeline that utilizes an external camera with a machine learning-based model to detect the operator's wrist position. The system maps these wrist movements into robotic arm commands to control the robot's arm in real-time. A trajectory planner ensures safe teleoperation by detecting and preventing collisions with both obstacles and the robotic arm itself. The system was validated on the real robot, demonstrating robust performance in real-time control. This teleoperation approach provides a cost-effective solution for industrial applications where safety, precision, and ease of use are paramount, ensuring reliable and intuitive robotic control in high-risk environments.
Viewpoint-Agnostic Grasp Pipeline using VLM and Partial Observations
Robust grasping in cluttered, unstructured environments remains challenging for mobile legged manipulators due to occlusions that lead to partial observations, unreliable depth estimates, and the need for collision-free, execution-feasible approaches. In this paper we present an end-to-end pipeline for language-guided grasping that bridges open-vocabulary target selection to safe grasp execution on a real robot. Given a natural-language command, the system grounds the target in RGB using open-vocabulary detection and promptable instance segmentation, extracts an object-centric point cloud from RGB-D, and improves geometric reliability under occlusion via back-projected depth compensation and two-stage point cloud completion. We then generate and collision-filter 6-DoF grasp candidates and select an executable grasp using safety-oriented heuristics that account for reachability, approach feasibility, and clearance. We evaluate the method on a quadruped robot with an arm in two cluttered tabletop scenarios, using paired trials against a view-dependent baseline. The proposed approach achieves a 90% overall success rate (9/10) against 30% (3/10) for the baseline, demonstrating substantially improved robustness to occlusions and partial observations in clutter.
Multiagent Systems
Executor-Side Progressive Risk-Gated Actuation for Agentic AI in Wireless Supervisory Control
Agentic artificial intelligence (AI) shows promise for automating O-RAN wireless supervisory control, but translated intents still require an executor-side decision before live network actuation. Existing control flows lack explicit semantics for whether an intent should commit, gate for evidence, or reject under stale telemetry, concurrent policies, deadline and bandwidth limits, and rollback constraints. We propose Progressive Risk-Gated Actuation (PRGA), an executor-side contract for risk-gated wireless intent execution. PRGA structures each intent into executable local triage (C0), on-demand coordination evidence (C1), and post-hoc provenance support (C2), with C2 kept off the online safety path. A deterministic two-stage policy checks expiry, freshness, rollback-handle validity, local conflict, blocking preconditions, and planner-executor risk divergence from C0, then retrieves C1 only for gated intents when deadline and bandwidth budgets allow; evidence-mandatory gates reject when required C1 is unavailable. On two 3GPP-parameterized energy-saving and slice-SLA benchmarks, PRGA reduces time-to-first-safe-action by 23.3-27.4% and per-commit control-plane bytes by 52.7-54.2% against a decision-identical eager full-evidence cost-overlay comparator, thereby isolating retrieval-cost accounting; remains non-inferior within a pre-declared 0.5 percentage-point unsafe-action margin against an invariant-respecting static-threshold comparator; and rejects 100% of injected over-threshold stale inputs in the stale-state fault campaign. On these benchmarks, PRGA improves supervisory responsiveness and control-plane efficiency within the evaluated unsafe-action boundary.
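The two-stage gate described in the PRGA abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `LocalTriage` fields, the thresholds, the choice to route local conflicts and risk divergence to a gate, and the rule that a non-mandatory gate stays gated when evidence is missing are all assumptions made for illustration. Only the overall order (C0 checks first, C1 retrieved only for gated intents within deadline and bandwidth budgets, evidence-mandatory gates rejecting without C1) follows the abstract.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    COMMIT = "commit"
    GATE = "gate"
    REJECT = "reject"


@dataclass
class LocalTriage:
    """Hypothetical C0 record; every field name here is illustrative."""
    expired: bool
    telemetry_age_s: float
    rollback_handle_valid: bool
    local_conflict: bool
    blocking_precondition: bool
    planner_risk: float
    executor_risk: float
    evidence_mandatory: bool


def stage_one(c0, max_age_s=5.0, max_divergence=0.2):
    """Stage 1: decide from C0 alone (thresholds are made-up defaults)."""
    if c0.expired or c0.telemetry_age_s > max_age_s:
        return Decision.REJECT
    if not c0.rollback_handle_valid or c0.blocking_precondition:
        return Decision.REJECT
    diverged = abs(c0.planner_risk - c0.executor_risk) > max_divergence
    if c0.local_conflict or diverged:
        return Decision.GATE
    return Decision.COMMIT


def decide(c0, c1_supports, time_left_ms, fetch_ms, bytes_left, fetch_bytes):
    """Stage 2: retrieve C1 only for gated intents and only within budget.

    c1_supports is None when C1 is unavailable, otherwise True or False.
    """
    first = stage_one(c0)
    if first is not Decision.GATE:
        return first
    within_budget = fetch_ms <= time_left_ms and fetch_bytes <= bytes_left
    if within_budget and c1_supports is not None:
        return Decision.COMMIT if c1_supports else Decision.REJECT
    # No usable evidence: mandatory gates reject, others stay gated (assumption).
    return Decision.REJECT if c0.evidence_mandatory else Decision.GATE
```

Because the policy is a pure function of its inputs, the same intent and budgets always yield the same decision, which is the deterministic property the abstract emphasizes.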
When Stress Becomes Signal: Detecting Antifragility-Compatible Regimes in Multi-Agent LLM Systems
Multi-agent LLM systems are increasingly used to solve complex tasks through decomposition, debate, specialization, and ensemble reasoning. However, these systems are usually evaluated in terms of robustness: whether performance is preserved under perturbation. This paper studies a different question: whether semantic stress exposes structured variation that could support future antifragile learning. We introduce CAFE, a statistical framework for detecting antifragility-compatible regimes in multi-agent architectures. CAFE models a controlled expected distribution of semantic stressors, reconstructs an architecture-specific observed effective stress distribution from multi-dimensional judge signals, and compares both distributions using a distributional Jensen Gap under a convex stress potential. A positive gap does not imply immediate performance improvement; instead, it indicates a convex-expansive deformation of the observed stress distribution, suggesting that the architecture exposes learnable stress structure. We evaluate CAFE on a banking-risk analysis benchmark with five multi-agent architectures: flat, hierarchical, debate, meta-adaptive, and ensemble. Across all architectures, semantic stress reduces average judged quality by roughly one third. Yet all architectures exhibit positive distributional Jensen Gaps with bootstrap confidence intervals above zero. These results show that immediate quality degradation can coexist with statistically detectable antifragility-compatible stress geometry. CAFE is therefore not an antifragile learner itself, but a measurement layer for identifying when and where antifragility learning may be worth applying.
FitText: Evolving Agent Tool Ecologies via Memetic Retrieval
A semantic gap separates how users describe tasks from how tools are documented. As API ecosystems scale to tens of thousands of endpoints, static retrieval from the initial query alone cannot bridge this gap: the agent's understanding of what it needs evolves during execution, but its tool set does not. We introduce FitText, a training-free framework that makes retrieval dynamic by embedding it directly in the agent's reasoning loop. FitText generates natural-language pseudo-tool descriptions as retrieval probes, refines them iteratively using retrieval feedback, and explores diverse alternatives through stochastic generation. Memetic Retrieval adds evolutionary selection pressure over candidate descriptions, guided by a tool memory that avoids redundant search. On ToolRet (43k tools, 4 domains), FitText improves average retrieval rank from 8.81 to 2.78; on StableToolBench (16,464 APIs), it achieves a 0.73 average pass rate--a 24-point absolute gain over static query retrieval. The gains transfer across base models capable of acting as competent semantic operators; under weaker base models, Memetic's evolutionary search inverts--amplifying noise rather than refining signal--surfacing model capacity as a prerequisite for evolutionary tool exploration.
LLM-enabled Social Agents
Large Language Models (LLMs) have transformed agent-agent and human-agent interaction by enabling software, physical, and simulation agents to communicate and deliberate through natural language. Yet fluent language use does not by itself yield socially intelligible behaviour. Most current systems remain weakly grounded in roles, norms, intentions, and contextual constraints, limiting their capacity for meaningful participation in social environments. This paper develops a conceptual baseline for LLM-enabled social agents by arguing that they should be grounded in role definitions operationalized through persona descriptions. On this basis, we outline research directions for representation, hybrid control, and evaluation. The paper concludes that persona-based role definitions are a necessary foundation for turning language competence into social behaviour.
comment: 11 pages, 1 figure, Hybrid Human Artificial Intelligence (HHAI) 2026
SOTOPIA-TOM: Evaluating Information Management in Multi-Agent Interaction with Theory of Mind
As LLM-based agents increasingly interact in multi-party settings, they need to properly handle information asymmetry, i.e., knowing when and to whom it is appropriate to disclose information. Yet, existing benchmarks fail to measure this ability in realistic multi-party settings. Thus, we introduce SOTOPIA-TOM, a multi-dimensional benchmarking framework to evaluate LLM agents' ability to successfully navigate information-asymmetric and privacy-sensitive multi-party interactions. We create an interaction environment which enables both public (broadcast) and private (direct message) communication, and craft 160 human-reviewed scenarios across eight industry sectors, each involving 3 to 5 agents with partitioned private knowledge and channel-dependent sharing policies. To measure interaction abilities, we create a multi-dimensional evaluation framework to assess how well agents share useful information, seek missing details, coordinate efficiently, and protect privacy, which we also combine into a composite INFOMGMT metric. Results show that, across 6 LLM backbones and prompting strategies (vanilla, CoT-privacy, and ToM-based interventions), even the largest high-reasoning model (GPT-5) reaches only a 62% INFOMGMT score, which indicates persistent deficiencies in information seeking and privacy-aware decision-making. Additionally, ToM-based interventions more consistently improve the overall coordination-privacy balance (for example, relative to the vanilla baseline, ToM-Coach reduces critical privacy violations on GPT-4o from 9.9% to 2.2% while increasing the composite INFOMGMT score more than 2.5x from 15% to 40%). Overall, SOTOPIA-TOM exposes persistent limitations of current LLM agents in complex, information-asymmetric coordination and provides an extensible testbed for developing more privacy-aware, theory-of-mind capable multi-agent systems.
comment: 37 pages, 22 Figures
Distributed Observer-based Fault Detection over Intelligent Networked Multi-Vehicle Systems
Decentralized strategies are of interest for local decision-making over multi-vehicle networks. This paper studies mixed traffic networks of human-driven and autonomous vehicles with partial sensor measurements. The idea is to enable the group of connected autonomous vehicles (CAVs) to track the state of a group of human-driven vehicles (HDVs) via distributed consensus-based observers/estimators. In particular, we do not assume that the group of HDVs is locally observable in the direct neighborhood of any CAV. The main contribution is the design of local residual-based fault detection and isolation (FDI) at every CAV to detect possible faults/attacks in the sensor measurements. This distributed detection strategy enables every CAV to locally find possible anomalies in its own sensor measurements with no need for a central processing unit. Two FDI logics are proposed, with and without considering the history of the residuals. These FDI techniques are based on probabilistic threshold design on the residuals (in contrast to the existing deterministic threshold FDI techniques) and do not assume that the noise is of bounded support, which is more realistic in real-world multi-vehicle transportation systems.
comment: European Journal of Control
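A probabilistic residual threshold of the kind mentioned in the fault-detection abstract above can be sketched as follows. This is a simplified illustration, not the paper's design: it assumes a scalar residual that is zero-mean Gaussian with known standard deviation `sigma` when no fault is present, and it uses a windowed mean as one possible history-based logic. The function names and the default false-alarm rate are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean


def threshold(sigma, false_alarm):
    """Two-sided threshold so that P(|r| > threshold) equals false_alarm
    for a fault-free residual r ~ N(0, sigma^2)."""
    return NormalDist().inv_cdf(1.0 - false_alarm / 2.0) * sigma


def detect_instant(residual, sigma, false_alarm=0.01):
    """History-free logic: test only the current residual."""
    return abs(residual) > threshold(sigma, false_alarm)


def detect_windowed(residuals, sigma, false_alarm=0.01):
    """History-based logic (assumption): test the mean of the last n
    residuals, whose fault-free standard deviation is sigma / sqrt(n)."""
    n = len(residuals)
    return abs(fmean(residuals)) > threshold(sigma / sqrt(n), false_alarm)
```

Because Gaussian noise has unbounded support, no finite deterministic bound can guarantee zero false alarms; the threshold instead fixes the false-alarm probability. Under these assumptions, a persistent bias of one standard deviation passes the instantaneous test but is flagged by the windowed test over 25 samples.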
Planner Matters! An Efficient and Unbalanced Multi-agent Collaboration Framework for Long-horizon Planning
Language model (LM)-based agents have demonstrated promising capabilities in automating complex tasks from natural language instructions, yet they continue to struggle with long-horizon planning and reasoning. To address this, we propose an enhanced multi-agent framework that decomposes automation into three roles: a planner for high-level decision-making, an actor for task execution, and a memory manager for contextual reasoning. While this modular decomposition aligns with established design patterns, our core contribution lies in a systematic compute-allocation analysis, revealing that planning is the dominant factor influencing task performance. Execution and memory management require significantly less compute and model capacity to achieve competitive results. Building on these insights, we introduce a planner-centric reinforcement learning approach, which exclusively optimizes the planner using trajectory-level rewards from a VLM-as-judge, while freezing the other components. Extensive experiments on benchmarks spanning web navigation, OS control, and tool use demonstrate that concentrating model capacity and learning on high-level planning yields robust and compute-efficient improvements in long-horizon agent automation. Our code is publicly released.
AAFLOW: Scalable Patterns for Agentic AI Workflows SC2026
Agentic workflows in large language model systems integrate retrieval, reasoning, and memory, but existing frameworks suffer from scalability and reproducibility limitations due to fragmented data orchestration, serialization overhead, and non-deterministic execution. Although these frameworks increase flexibility, they lack a formal execution model that adheres to the principles of high-performance computing. We introduce AAFLOW, a unified distributed runtime that creates communication-efficient execution plans by modeling agentic workflows as an operator abstraction. Using Apache Arrow and Cylon, AAFLOW creates a zero-copy data plane that allows direct interoperability between preprocessing, embedding, and vector retrieval without serialization overhead. To lower coordination costs, it uses resource-deterministic scheduling and asynchronous batching. While retaining comparable LLM generation throughput, experimental results demonstrate up to 4.64 times pipeline speedup and 2.8 times gains in embedding and upsert phases. Rather than LLM inference acceleration, these advantages result from enhanced data flow, batching, and communication efficiency.
comment: 10 pages, 8 Figures, 3 Tables. preprint for SC2026
Enwar 3.0: An Agentic Multi-Modal LLM Orchestrator for Situation-Aware Beamforming, Blockage Prediction, and Handover Management
Maintaining robust millimeter-wave (mmWave) connectivity in vehicular networks requires real-time adaptation to environmental dynamics, sensor degradation, and link variability. This paper presents Enwar 3.0, an environment-aware reasoning framework that unifies multi-modal sensing, agentic large language models (LLMs), and context-driven model selection for predictive beamforming, blockage detection, and handover management. Building upon prior iterations of Enwar, the proposed architecture integrates a classifier-driven assessment of sensor health with a primed LLM that orchestrates multiple specialized agents through structured, task-aware prompting. A novel synthetic degradation pipeline enables the training of a sensor degradation classifier that detects real-time impairments across camera, radar, LiDAR, and GPS inputs, achieving over 99% accuracy. The LLM, trained via chain-of-thought (CoT) priming and human-in-the-loop feedback, coordinates agent calls for beam selection, blockage forecasting, and environment perception while dynamically loading sensor-specific models based on environmental context. Extensive evaluations across 15 sensor combinations demonstrate that Enwar 3.0 delivers state-of-the-art performance in both predictive accuracy and interpretability, with beam selection accuracy exceeding 88%, blockage F1-scores surpassing 98%, and reasoning correctness reaching 87% on complex decision prompts. This work establishes a scalable foundation for LLM-integrated wireless systems that reason, perceive, and adapt in real-time.
MARS-DA: A Hierarchical Reinforcement Learning Framework for Risk-Aware Multi-Agent Bidding in Power Grids
The increasing penetration of renewable energy has introduced substantial volatility into wholesale electricity markets, complicating the optimal bidding strategies for power producers. Traditional Reinforcement Learning (RL) approaches often struggle to balance profit maximization with risk management, frequently overfitting to specific market conditions or failing to account for the stochastic spread between Day-Ahead (DA) and Real-Time (RT) settlements. To address these challenges, this paper makes two primary contributions. First, we introduce and open-source a high-fidelity gymnasium environment for two-settlement electricity market bidding. Grounded in extensive empirical data from the PJM Interconnection, the environment explicitly models the interplay between DA commitments and RT deviations, providing a standardized testbed for general and risk-sensitive agents. Second, we propose MARS-DA (Multi-Agent Regime-Switching for Day-Ahead markets), a novel hierarchical framework that orchestrates distinct sub-policies for risk management and profit seeking. MARS-DA utilizes a top-level Meta-Controller to dynamically blend the actions of two specialized base agents: a "Safe Agent" that optimizes for reliable DA allocation and a "Speculator Agent" that targets volatile RT arbitrage opportunities. Extensive experiments demonstrate that MARS-DA achieves superior risk-adjusted returns compared to state-of-the-art baselines while maintaining robust regime alignment during periods of extreme market volatility.
Hidden Coalitions in Multi-Agent AI: A Spectral Diagnostic from Internal Representations
Collections of interacting AI agents can form coalitions, creating emergent group-level organization that is critical for AI safety and alignment. However, observing agent behavior alone is often insufficient to distinguish genuine informational coupling from spurious similarity, as consequential coalitions may form at the level of internal representations before any overt behavioral change is apparent. Here, we introduce a practical method for detecting coalition structure from the internal neural representations of multi-agent systems. The approach constructs a pairwise mutual-information graph from the hidden states of agents and applies spectral partitioning to identify the most salient coalition boundary. We validate this method in two domains. First, in multi-agent reinforcement learning environments, the method successfully recovers programmed hierarchical and dynamic coalition structures and correctly rejects false positives arising from behavioral coordination without informational coupling. Second, using a large language model, the method identifies coalition structures implied by descriptive prompts, tracks dynamic team reassignments, and reveals a representational hierarchy where explicit labels dominate over conflicting interaction patterns. Across both settings, the recovered partition reveals subgroup organization that a scalar cross-agent mutual-information measure cannot distinguish. The results demonstrate that analyzing hidden-state mutual information through spectral partitioning provides a scalable diagnostic for identifying representational coalitions, offering a valuable tool for monitoring emergent structure in distributed AI systems.
comment: 18 pages
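For the spectral partitioning step named in the Hidden Coalitions abstract above, here is a minimal sketch, assuming a pairwise mutual-information matrix between agents has already been estimated. The function name `spectral_bipartition` and the toy matrix are hypothetical. Splitting on the sign of the Fiedler vector of the unnormalized graph Laplacian is one common spectral partitioning choice and may differ from the paper's exact procedure.

```python
import numpy as np

def spectral_bipartition(mi):
    """Split agents into two groups from a pairwise mutual-information matrix.

    Assumption: `mi` is a square, non-negative matrix whose (i, j) entry
    estimates the mutual information between the hidden states of agents i and j.
    """
    w = np.asarray(mi, dtype=float).copy()
    np.fill_diagonal(w, 0.0)           # ignore self-information
    w = 0.5 * (w + w.T)                # enforce symmetry of the affinity graph
    laplacian = np.diag(w.sum(axis=1)) - w
    # eigh returns eigenvalues in ascending order; the eigenvector of the
    # second-smallest eigenvalue is the Fiedler vector.
    _, eigvecs = np.linalg.eigh(laplacian)
    fiedler = eigvecs[:, 1]
    return (fiedler >= 0).astype(int)  # sign pattern gives the two-way cut

# Toy example (made-up numbers): agents 0-2 are strongly coupled, as are agents 3-4.
toy_mi = np.array([
    [0.00, 0.90, 0.80, 0.10, 0.10],
    [0.90, 0.00, 0.85, 0.10, 0.05],
    [0.80, 0.85, 0.00, 0.05, 0.10],
    [0.10, 0.10, 0.05, 0.00, 0.90],
    [0.10, 0.05, 0.10, 0.90, 0.00],
])
labels = spectral_bipartition(toy_mi)
```

Which group receives label 0 or 1 is arbitrary, because an eigenvector's sign is not fixed. Only the grouping itself is meaningful.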
Optimistic ε-Greedy Exploration for Cooperative Multi-Agent Reinforcement Learning
The Centralized Training with Decentralized Execution (CTDE) paradigm is widely used in cooperative multi-agent reinforcement learning. However, conventional methods based on CTDE can suffer from value underestimation and converge to suboptimal solutions. While such underestimation is typically attributed to the representational limitations of monotonic structures, we provide a novel perspective by demonstrating that the insufficient sampling of optimal joint actions during exploration is also a critical factor. To address this problem, we propose Optimistic $ε$-Greedy Exploration. Our method introduces optimistic action-value networks that serve as decoupled exploration indicators, which we theoretically prove to converge in probability to the maximum achievable returns. By sampling actions from these distributions with a probability of $ε$, we effectively increase the selection frequency of high-return joint actions. Experimental results in various environments reveal that our strategy effectively prevents the algorithm from falling into suboptimal solutions and significantly improves final returns, win rates, and convergence speeds compared to other enhanced algorithms. Our code has been open-sourced at https://github.com/qxqxtxdy/OptimisticExploration.
GOSPA-Driven Non-Myopic Multi-Sensor Management with Multi-Bernoulli Filtering
In this paper, we propose a non-myopic sensor management algorithm for multi-target tracking, with multiple sensors operating in the same surveillance area. The algorithm is based on multi-Bernoulli filtering and selects the actions that solve a non-myopic minimisation problem, where the cost function is the mean square generalised optimal sub-pattern assignment (GOSPA) error, over a future time window. For tractability, the sensor management algorithm actually uses an upper bound of the GOSPA error and is implemented via Monte Carlo Tree Search (MCTS). The sensors have the ability to jointly optimise and select their actions with the considerations of all other sensors in the surveillance area. The benefits of the proposed algorithm are analysed via simulations.
comment: submitted to Elsevier Signal Processing May 2026
Foresight Arena: An On-Chain Benchmark for Evaluating AI Forecasting Agents
Evaluating the true forecasting ability of AI agents requires environments that are resistant to overfitting, free from centralized trust, and grounded in incentive-compatible scoring. Existing benchmarks either rely on static datasets vulnerable to training-data contamination, or measure trading PnL -- a metric conflating predictive accuracy with timing, sizing, and risk appetite. We introduce Foresight Arena, the first permissionless, on-chain benchmark for evaluating AI forecasting agents on real-world prediction markets. Agents submit probabilistic forecasts on binary Polymarket markets via a commit-reveal protocol enforced by Solidity smart contracts on Polygon PoS; outcomes are resolved trustlessly through the Gnosis Conditional Token Framework. Performance is measured by the Brier Score and a novel Alpha Score -- proper scoring rules that incentivize honest probability reporting and isolate predictive edge over market consensus. We provide a formal analysis: closed-form variance for per-market Alpha, the connection to Murphy's classical Brier decomposition, and a power analysis characterizing the number of rounds required to reliably distinguish agents of different skill levels. We show that detecting a true edge of $α^* = 0.02$ at 80% power requires approximately 350 resolved binary predictions (50 rounds of 7 markets), while $α^* = 0.01$ requires four times more. We complement these analytical results with a deterministic, seed-controlled simulation study calibrated to literature-reported Brier-score ranges, illustrating how Murphy decomposition distinguishes well-calibrated agents from market-tracking agents that fail through reduced resolution. Live results from the deployed benchmark will be reported in a future revision. All smart contracts and evaluation infrastructure are open-source.
comment: v2: Reframed Section 6 as an illustrative simulation study with explicit disclosure that the numerical results in Section 6 come from a calibrated Monte Carlo simulation rather than a live deployment; added live-evaluation-pending limitation
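Two quantities in the Foresight Arena abstract above can be sketched briefly: the Brier score for binary outcomes, and how the required number of predictions grows as the detectable edge shrinks. The helper names are hypothetical. The scaling function assumes the usual inverse-square relation between effect size and sample size at fixed variance and power, which is consistent with the abstract's "four times more" figure but is not taken from the paper's derivation. The Alpha Score is not reproduced here because the abstract does not give its formula.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and binary outcomes (0 or 1)."""
    if len(forecasts) != len(outcomes) or len(forecasts) == 0:
        raise ValueError("forecasts and outcomes must be non-empty and equally long")
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)

def scaled_sample_size(n_ref, edge_ref, edge_new):
    """Rescale a reference sample size to a different target edge.

    Assumption: required sample size is proportional to 1 / edge**2
    (standard power-analysis scaling at fixed variance and power).
    """
    if edge_ref <= 0 or edge_new <= 0:
        raise ValueError("edges must be positive")
    return n_ref * (edge_ref / edge_new) ** 2

# Reference point from the abstract: about 350 predictions for an edge of 0.02.
n_for_small_edge = scaled_sample_size(350, 0.02, 0.01)
```

Under this assumption, halving the edge from 0.02 to 0.01 quadruples the requirement from about 350 to about 1400 resolved predictions.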
The Hive Mind is a Single Reinforcement Learning Agent
Decision-making is an essential attribute of any intelligent agent or group. Natural systems are known to converge to effective strategies through at least two distinct mechanisms: collective decision-making via imitation of others, and trial-and-error by a single agent. This paper establishes an equivalence between these two paradigms by drawing from the well-studied collective decision-making problem of nest-hunting in swarms of honey bees. We show that the emergent distributed cognition (sometimes referred to as the $\textit{hive mind}$) arising from individuals following simple, local imitation-based rules is that of a single online reinforcement learning (RL) agent interacting with many parallel environments. More specifically, in the purely imitative $\textit{weighted voter}$ model of bees' waggle dance, the update rule through which this macro-agent learns is a multi-armed bandit algorithm that we coin $\textit{Maynard-Cross Learning}$. Our analysis implies that a group of purely imitative organisms can be equivalent to a more complex, reinforcement-enabled entity, substantiating the idea that group-level intelligence may explain how seemingly simple and blind individual behaviors are selected in nature. Beyond biology, the framework offers new tools for analyzing economic and social systems where individuals imitate successful strategies, effectively participating in a collective learning process. Our findings may further inform the design of scalable RL-inspired collective systems in artificial domains.
The Price of Uncertainty for Social Consensus
How hard is it to achieve consensus in a social network under uncertainty? In this paper we model this problem as a social graph of agents where each vertex is initially colored red or blue. The goal of the agents is to achieve consensus, which is when the colors of all agents align. Agents attempt to do this locally through steps in which an agent changes their color to the color of the majority of their neighbors. In real life, agents may not know exactly how many of their neighbors are red or blue, which introduces uncertainty into this process. Modeling uncertainty as perturbations of relative magnitude $1+\varepsilon$ to these color neighbor counts, we show that even small values of $\varepsilon$ greatly hinder the ability to achieve consensus in a social network. We prove theoretically tight upper and lower bounds on the \emph{price of uncertainty}, a metric defined in previous work by Balcan et al. to quantify the effect of uncertainty in network games.
comment: 17 pages
FORMULA: FORmation MPC with neUral barrier Learning for safety Assurance
Multi-robot systems (MRS) are essential for large-scale applications such as disaster response, material transport, and warehouse logistics, yet ensuring robust, safety-aware formation control in cluttered and dynamic environments remains a major challenge. Existing model predictive control (MPC) approaches suffer from limitations in scalability and provable safety, while control barrier functions (CBFs), though principled for safety enforcement, are difficult to handcraft for large-scale nonlinear systems. This paper presents FORMULA, a safe distributed, learning-enhanced predictive control framework that integrates MPC with Control Lyapunov Functions (CLFs) for stability and neural network-based CBFs for decentralized safety, eliminating manual safety constraint design. This scheme maintains formation integrity during obstacle avoidance, resolves deadlocks in dense configurations, and reduces online computational load. Simulation results demonstrate that FORMULA enables scalable, safety-aware, formation-preserving navigation for multi-robot teams in complex environments.
comment: Accepted to IEEE Intelligent Vehicles Symposium (IV) 2026
Higher-Order Uncoupled Learning Dynamics and Nash Equilibrium
We study learnability of mixed-strategy Nash Equilibrium (NE) in general finite games using higher-order replicator dynamics as well as classes of higher-order uncoupled heterogeneous dynamics. In higher-order uncoupled learning dynamics, players have no access to utilities of opponents (uncoupled) but are allowed to use auxiliary states to further process information (higher-order). We establish a link between uncoupled learning and feedback stabilization with decentralized control. Using this association, we show that for any finite game with an isolated completely mixed-strategy NE, there exist higher-order uncoupled learning dynamics that lead (locally) to that NE. We further establish the lack of universality of learning dynamics by linking learning to the control theoretic concept of simultaneous stabilization. We construct two games such that any higher-order dynamics that learn the completely mixed-strategy NE of one of these games can never learn the completely mixed-strategy NE of the other. Next, motivated by imposing natural restrictions on allowable learning dynamics, we introduce the Asymptotic Best Response (ABR) property. Dynamics with the ABR property asymptotically learn a best response in environments that are asymptotically stationary. We show that the ABR property relates to an internal stability condition on higher-order learning dynamics. We provide conditions under which NE are compatible with the ABR property. Finally, we address learnability of mixed-strategy NE in the bandit setting using a bandit version of higher-order replicator dynamics.
Systems and Control (EESS)
SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection
Speculative decoding accelerates large language model (LLM) inference by using a small draft model to propose candidate tokens that a larger target model verifies. A critical hyperparameter in this process is the speculation length~$γ$, which determines how many tokens the draft model proposes per step. Nearly all existing systems use a fixed~$γ$ (typically~4), yet empirical evidence suggests that the optimal value varies across task types and, crucially, depends on the compression level applied to the target model. In this paper, we present \textbf{SpecKV}, a lightweight adaptive controller that selects~$γ$ per speculation step using signals extracted from the draft model itself. We profile speculative decoding across 4~task categories, 4~speculation lengths, and 3~compression levels (FP16, INT8, NF4), collecting 5,112 step-level records with per-step acceptance rates, draft entropy, and draft confidence. We demonstrate that the optimal~$γ$ shifts across compression regimes and that draft model confidence and entropy are strong predictors of acceptance rate (correlation~$\approx 0.56$). SpecKV uses a small MLP trained on these signals to maximize expected tokens per speculation step, achieving a 56.0\% improvement over the fixed-$γ$=4 baseline with only 0.34\,ms overhead per decision ($<$0.5\% of step time). The improvement is statistically significant ($p < 0.001$, paired bootstrap test). We release all profiling data, trained models, and notebooks as open-source artifacts.
comment: 11 pages, 8 figures, 7 tables. Code and data available at: https://github.com/Amorfati123/SpecKV
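To make the role of the speculation length $γ$ in the SpecKV abstract above concrete, here is a sketch using the standard idealized speculative-decoding model, in which each drafted token is accepted independently with the same probability. This is not SpecKV's learned MLP controller. The candidate lengths, the `draft_cost` ratio, and the cost-normalized objective are illustrative assumptions.

```python
def expected_tokens(accept_rate, gamma):
    """Expected tokens produced per speculation step.

    Idealized model: each of the `gamma` drafted tokens is accepted
    independently with probability `accept_rate`, and the target model always
    contributes one token, giving (1 - a**(gamma + 1)) / (1 - a).
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must lie in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if accept_rate == 1.0:
        return float(gamma + 1)  # limit of the geometric sum
    return (1.0 - accept_rate ** (gamma + 1)) / (1.0 - accept_rate)

def best_gamma(accept_rate, candidates=(1, 2, 4, 8), draft_cost=0.1):
    """Pick the candidate gamma with the most expected tokens per unit cost.

    Assumptions: one target-model pass costs 1 and each drafted token costs
    `draft_cost`; both the candidates and the cost ratio are hypothetical.
    """
    def tokens_per_cost(gamma):
        return expected_tokens(accept_rate, gamma) / (gamma * draft_cost + 1.0)
    return max(candidates, key=tokens_per_cost)
```

In this toy model a low acceptance rate favors short speculation and a high one favors long speculation, which is the intuition behind choosing $γ$ per step from draft-side signals.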
Per-Platform GPIO Overhead in Hardware-Validated Edge ML Inference Timing
Edge machine learning (ML) deployments increasingly rely on per-inference timing measured by software clocks such as Python's perf_counter, but these measurements are not always validated against external hardware references on embedded Linux, and edge ML benchmarking methodologies typically do not isolate platform-dependent instrumentation overhead. This paper reports a preliminary characterization of GPIO call overhead in hardware-validated edge ML inference timing on two embedded platforms running a one-dimensional convolutional neural network (1-D CNN) arrhythmia classifier on electrocardiogram (ECG) data from the MIT-BIH Arrhythmia Database, with five classes per the Association for the Advancement of Medical Instrumentation (AAMI) EC57 standard. Across $n = 10$ trials on each platform at a controlled steady-state baseline, the per-platform constant on the Jetson Orin Nano (TensorRT FP16, Jetson.GPIO) is approximately $-20\,μ$s, and on the Raspberry Pi 4 (ONNX Runtime CPU, pigpio) approximately $-86\,μ$s, yielding a cross-platform asymmetry of approximately $66\,μ$s that is large relative to commonly used uniform validation tolerances. The Jetson constant is well-approximated by direct GPIO call duration (the direct profile accounts for ~88% of the platform constant), while the Pi direct profile over-predicts the platform constant by ~19%, motivating empirical per-platform calibration in the deployed measurement context. The Pi constant is not a single sharp value but exhibits a cross-day range of approximately $6\,μ$s across the three sessions sampled, while the Jetson constant reproduces to within approximately $0.14\,μ$s. These preliminary results suggest that cross-platform edge ML timing studies may benefit from platform-aware and potentially session-aware validation gates.
comment: 4 pages, 3 tables. Work in Progress. Submitted to IEEE SMC 2026 Work-in-Progress track
Tool Use as Action: Towards Agentic Control in Mobile Core Networks
Artificial Intelligence (AI) will play an essential role in 6G. It will fundamentally reshape the network architecture itself and drive major changes in the design of network entities, interfaces, and procedures. The adoption of agentic AI in next-generation networks is expected to enhance network intelligence and autonomy through agents capable of planning, reasoning, and acting, while also opening up new business opportunities. Under this vision, existing network functions are expected to evolve into AI-enabled agents and tools that deliver both connectivity and beyond-connectivity services. As an initial attempt to move toward this vision, this paper presents a tool-based interface design and an experimental prototype that are based on agentic AI for the mobile core network, with the Model Context Protocol (MCP) and the Agent2Agent (A2A) protocol as foundational protocols. MCP is selected to design the interface between the agent and network tools, and the A2A protocol is used for message exchange between AI agents. In such an experimental setup, we analyze packet-level message flows between the agents, tools, and network functions and break down the latency of end-to-end operations, starting from the prompt injection until the completion of the input task. This work demonstrates how an AI agent-based core network combined with network-specific tools can be utilized in next generation mobile systems to execute intent-based tasks.
Vision-Based Structural Damage Identification in Vibrating Beams via Dynamic Mode Decomposition
Structural damage detection using non-contact sensing remains a challenging problem in structural health monitoring. This study presents a data-driven framework based on Dynamic Mode Decomposition (DMD) for extracting structural dynamics directly from high-speed video recordings of vibrating structures. Within this approach, the underlying dynamics are approximated by a linear operator, whose spectral decomposition yields modal frequencies and corresponding spatial mode shapes, enabling a physically interpretable representation of the system response. The proposed methodology is evaluated through both numerical and experimental investigations. First, a cantilever beam model is simulated in ANSYS under healthy and damaged conditions. DMD is applied to partial observation data to reconstruct and predict the system response, while the extracted modal features are analyzed to characterize damage-induced variations. Second, high-speed video recordings of the beam are processed into spatiotemporal snapshot matrices, allowing DMD to recover full-field dynamic behavior without contact sensors. To enable quantitative assessment, a damage index is formulated based on DMD-derived modal features, capturing deviations in both frequency content and spatial characteristics. The results demonstrate consistent and distinguishable patterns between healthy and damaged states across both simulation and experiments, highlighting the capability of DMD as a robust and interpretable tool for non-contact damage detection using video data.
comment: Submitted to SMASIS 2026
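The DMD step described in the vibrating-beam abstract above (fit a linear operator to a snapshot matrix, then read modal frequencies and mode shapes from its spectrum) can be sketched with the standard exact-DMD recipe. The function name, the rank choice, and the synthetic two-shape signal are assumptions made for illustration. The paper's video preprocessing and damage index are not reproduced.

```python
import numpy as np

def dmd_frequencies(snapshots, dt, rank):
    """Exact DMD on a (space x time) snapshot matrix.

    Returns modal frequencies in Hz and the corresponding spatial modes.
    """
    x, y = snapshots[:, :-1], snapshots[:, 1:]      # time-shifted snapshot pairs
    u, s, vh = np.linalg.svd(x, full_matrices=False)
    u, s, vh = u[:, :rank], s[:rank], vh[:rank]      # rank-truncated SVD of x
    # Reduced linear operator U^H Y V S^{-1}; dividing by s scales the columns.
    a_tilde = u.conj().T @ y @ vh.conj().T / s
    eigvals, w = np.linalg.eig(a_tilde)
    modes = (y @ vh.conj().T / s) @ w                # exact DMD modes
    freqs = np.abs(np.angle(eigvals)) / (2.0 * np.pi * dt)
    return freqs, modes

# Synthetic check: two spatial shapes oscillating in quadrature at 5 Hz.
dt = 0.01
t = np.arange(200) * dt
pos = np.linspace(0.0, 1.0, 50)
data = (np.outer(np.sin(np.pi * pos), np.cos(2 * np.pi * 5.0 * t))
        + np.outer(np.sin(2 * np.pi * pos), np.sin(2 * np.pi * 5.0 * t)))
freqs, modes = dmd_frequencies(data, dt, rank=2)
```

A rank of 2 is used because a single real oscillation appears in DMD as a complex-conjugate eigenvalue pair. Real video data would need a higher rank and some form of noise handling.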
Risk-Averse Ensemble Control for Control-Affine Systems
A number of important modern applications in optimal control can be formulated as open loop control problems in which the underlying dynamical systems are subject to random inputs. These so-called ensemble control problems require the corresponding optimal control to be deterministic, as it must be computed before the realization of uncertainty and the passage of time. Practical applications of ensemble control include quantum control and the training of Neural ODEs. However, the standard approach to ensemble control treats the uncertainty in the objective function via the expectation, which provides optimal controls that only work well on average while ignoring critical outlier phenomena. This study provides a comprehensive mathematical treatment of risk-averse ensemble control. Within this setting, we adopt a control-affine structure that ensures the lower semi-continuity needed for proving the existence of optimal solutions. The central analytical contribution of this paper is a rigorous characterization of the control-to-state mapping in which we establish weak-to-strong continuity, continuous Fréchet differentiability, and weak-to-strong continuity of the derivative operator. Furthermore, this regularity yields primal and dual first-order optimality conditions characterized by an adjoint state of bounded variation, and it fulfills the functional prerequisites required for the convergence of infinite dimensional optimization algorithms. We conclude by validating these theoretical developments through a numerical experiment in quantum control.
comment: 39 pages, 1 figure
Executor-Side Progressive Risk-Gated Actuation for Agentic AI in Wireless Supervisory Control
Agentic artificial intelligence (AI) shows promise for automating O-RAN wireless supervisory control, but translated intents still require an executor-side decision before live network actuation. Existing control flows lack explicit semantics for whether an intent should commit, gate for evidence, or reject under stale telemetry, concurrent policies, deadline and bandwidth limits, and rollback constraints. We propose Progressive Risk-Gated Actuation (PRGA), an executor-side contract for risk-gated wireless intent execution. PRGA structures each intent into executable local triage (C0), on-demand coordination evidence (C1), and post-hoc provenance support (C2), with C2 kept off the online safety path. A deterministic two-stage policy checks expiry, freshness, rollback-handle validity, local conflict, blocking preconditions, and planner-executor risk divergence from C0, then retrieves C1 only for gated intents when deadline and bandwidth budgets allow; evidence-mandatory gates reject when required C1 is unavailable. On two 3GPP-parameterized energy-saving and slice-SLA benchmarks, PRGA reduces time-to-first-safe-action by 23.3-27.4% and per-commit control-plane bytes by 52.7-54.2% against a decision-identical eager full-evidence cost-overlay comparator, thereby isolating retrieval-cost accounting; remains non-inferior within a pre-declared 0.5 percentage-point unsafe-action margin against an invariant-respecting static-threshold comparator; and rejects 100% of injected over-threshold stale inputs in the stale-state fault campaign. On these benchmarks, PRGA improves supervisory responsiveness and control-plane efficiency within the evaluated unsafe-action boundary.
PowerSINDy: Identifying Nonlinear Time-Dependent Dynamics in Power Grid Frequency
System identification plays a crucial role in physics and machine learning for discovering governing equations directly from data. A powerful approach is the Sparse Identification of Nonlinear Dynamics (SINDy) method, which assumes that only a few dominant terms drive the essential behavior of a nonlinear dynamical system. While SINDy methods have shown excellent results, they are most often illustrated on synthetic or simulated systems, leaving open the question of how well they perform on complex, noisy, real-world data. Power grid frequency dynamics provide a highly relevant and challenging environment for advancing system identification methods. In this work, we propose PowerSINDy as a framework for empirical power system data. We apply this framework to empirical frequency data from the Continental Europe (CE) and South Korea (SK) synchronous grids, two major power systems with distinct dynamical characteristics. PowerSINDy, which also includes time-dependent terms, can identify the dynamics of these complex real-world systems. Furthermore, we benchmark three sparsity-promoting regression strategies: Sequentially Thresholded Least Squares (STLSQ), Least Absolute Shrinkage and Selection Operator (LASSO), and Sparse Relaxed Regularized Regression (SR3) to evaluate trade-offs between accuracy, sparsity, and robustness. Results show that LASSO consistently achieves the lowest stable RMSEs, reaching 0.0101 for the CE, while STLSQ provides the best balance between accuracy and stability. SR3 exhibits higher variability and sensitivity to regularization, with L0 and L1 producing nearly indistinguishable outcomes.
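Of the three regression strategies benchmarked in the PowerSINDy abstract above, Sequentially Thresholded Least Squares (STLSQ) is the simplest to sketch: it alternates a least-squares fit with zeroing of small coefficients. The candidate library, the threshold, and the toy system below are illustrative assumptions. The time-dependent terms and grid frequency data used by PowerSINDy are not reproduced.

```python
import numpy as np

def stlsq(theta, dxdt, threshold, n_iter=10):
    """Sequentially thresholded least squares.

    theta: (samples x candidate terms) library matrix
    dxdt:  (samples x state dimensions) time derivatives
    Returns a sparse coefficient matrix of shape (terms x state dimensions).
    """
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0                       # prune weak terms
        for k in range(dxdt.shape[1]):
            keep = ~small[:, k]
            if keep.any():
                # Refit using only the surviving terms for this state dimension.
                xi[keep, k] = np.linalg.lstsq(theta[:, keep], dxdt[:, k], rcond=None)[0]
    return xi

# Toy system dx/dt = -2x with the hypothetical library [1, x, x^2].
x = np.linspace(-1.0, 1.0, 50)
library = np.column_stack([np.ones_like(x), x, x ** 2])
coeffs = stlsq(library, (-2.0 * x)[:, None], threshold=0.1)
```

The threshold plays the role of the sparsity knob. LASSO and SR3 replace this hard pruning with penalty-based regularization.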
Set-Based Training of Neural Barrier Certificates for Safety Verification of Dynamical Systems
Barrier certificates are scalar functions over the state space of dynamical systems that separate all unsafe states from all reachable states. The existence of a barrier certificate formally verifies the safety of the dynamical system. Recent approaches synthesize barrier certificates by iteratively training a neural network. In each iteration, the candidate is formally verified - if successful, the barrier certificate is found. Instead, we propose a set-based training approach that tightly integrates verification into training via a set-based loss function that soundly encodes all barrier certificate properties. A loss of zero formally proves the validity of the barrier certificate, collapsing the iterative training and verification into a single training procedure. Our experiments demonstrate that our set-based training approach scales well with the system dimension and naturally handles complex nonlinear dynamics.
Least Costly Space-Filling Experiment Design for the Identification of a Nonlinear System
The quality of an estimated nonlinear model depends strongly on the quality of the data used for system identification. By using a Gaussian Process-based optimal input design approach, a so-called space-filling dataset can be generated in the feature space of the system model. The design method is applicable to a broad range of signals and models and also incorporates information measures through optimality criteria into the signal design. However, the resulting input design can be costly to apply to the real system. The goal of this paper is to propose a space-filling input design that minimizes the experimentation cost in terms of a user-defined measure, while still guaranteeing a prescribed level of space-fillingness. Through a Monte Carlo simulation study we demonstrate that the proposed method can appropriately shape the excitation signal to significantly reduce the experimental cost while the identified model performance remains adequate.
Data-Driven Optimal Distributed Controller Synthesis via Spatial Regret
In this paper, we present a novel method for synthesising an optimal distributed spatial regret controller using experimentally obtained frequency-response data. Spatial regret provides a measure of the performance gap between a structured distributed controller and an oracle with enhanced communication topology. We relax assumptions on the communication topology, allowing the oracle to adopt any enhanced structure. While this generalisation requires an iterative solution in place of a single convex program, we provide a tractable algorithm that synthesises optimal controllers from frequency-response data while preserving stability and the desired communication structure. Through numerical examples, we illustrate the better performance of the spatial regret controller compared to classical H2/Hinf designs, underscoring the effectiveness of the proposed methodology.
comment: Accepted for presentation and publication in the proceedings of the 2026 IFAC World Conference
Differentially Private Synthetic Voltage Phasor Release for Distribution Grids
Training machine learning models, including Grid Foundation Models (GFMs), requires large volumes of realistic grid data, yet substantial privacy concerns discourage utilities and data providers from sharing load profiles and network parameters. We study the release of synthetic voltage phasor trajectories for distribution grids under differential privacy (DP). We first fit a DP generative model to historical customer loads, then propagate synthetic load trajectories through the AC power flow equations on the true admittance matrix to produce voltage phasors. The central question is whether the randomness already present in the DP synthetic loads is sufficient to protect not only the loads, but also the network topology encoded by the bus admittance matrix. We show that it is. The implication is that a corpus of voltage trajectories can be constructed from DP synthetic loads while preserving the statistics of AC power flow, which is critical for training GFMs. This preservation of the power flow statistics stands in contrast to approaches that perturb the admittance matrix directly or inject noise into the voltage outputs, both of which distort the underlying physics. Concretely, we derive $(\varepsilon,δ)$-DP guarantees for the released voltage trajectories with respect to the admittance matrix, meaning privacy of the network parameters is obtained without any additional noise mechanism. Our bound depends on the adjacency assumption, the Jacobian of the AC power flow, and the covariance of the synthetic DP-loads. Finally, we present a synthetic voltage generation procedure and an empirical evaluation against Gaussian output-perturbation baselines, demonstrating that our approach provides a clear advantage for enabling GFM training.
comment: Submitted to IEEE TCNS
A Scalable 256-Antenna Distributed MIMO Testbed with Real-Time Fully Digital Beamforming
Distributed massive MIMO (D-MIMO) is a promising technology for future generation wireless systems as it takes advantage of both an increased array aperture and a decentralized processing architecture and topology. In order to truly understand the possibilities and limitations of these approaches in real scenarios, practical realization of testbeds is an essential step in the technology advancement. This work presents the Lund University Large Intelligent Surface testbed -- LuLIS, that can operate up to 256 coherent radio frequency (RF) chains using 16 AMD Zynq UltraScale RFSoC ZCU216 evaluation boards acting as distributed processing nodes. Real-time processing is facilitated by acceleration and distribution of MIMO processing algorithms on the FPGA fabric of the boards. The system is easily scalable, as increasing the number of antennas is done in multiples of 16 by adding more RFSoCs, which also implies addition of another processing node. The design allows up-scaling without hardware redesign, introduction of large latencies or data transfer overhead. The testbed is flexible in terms of deployment, with options of fully distributing the nodes (as in D-MIMO) or co-locating them (as in more traditional Massive MIMO). A detailed description of the implementation of the testbed is presented and initial results are shown for an uplink (UL) transmission from four single-antenna user equipments (UEs) to 64, 128 and 256 base-station antennas.
comment: This work has been submitted to the IEEE for possible publication
A Fresh Look on Network Synchronization
This paper takes a fresh look at network synchronization. Rather than analyzing it from a mathematical viewpoint such as graph theory, we approach it from control theory. First, we analyze the synchronization region using the inner coupling matrix instead of the routine method of studying the network structure. The motivation is that the inner coupling matrix, unlike the network structure, is not subject to restrictions such as distance and communication strength among nodes. It can be configured at will to meet the synchronization performance, provided that the states of the local dynamics are measurable or observable and the communication topology is connected. It is therefore very useful for practical engineering design. In addition, we find that the network synchronization and multi-agent system consensus problems are essentially equivalent. We then present a unified viewpoint: the essence of multi-agent consensus control is the same as that of network synchronization. The equivalence relation is stated and proven. Consequently, the inner coupling matrix for network systems and the controller gain for multi-agent systems can each be synthesized from the other. Finally, we present a method for addressing nonlinear complex network systems and verify its effectiveness using the network of the three-oscillator universal probe as an example.
Robust Adaptive Predictive Control for Hook-Based Aerial Transportation Between Moving Platforms
This paper presents a novel model predictive control (MPC) approach for autonomous pick-and-place between moving platforms with a hook-equipped aerial manipulator. First, for accurate and rapid modeling of the complex dynamics, a digital twin model of the quadcopter equipped with a hook-based gripper, implemented in MuJoCo, is constructed and used as the predictive model for the MPC. To handle uncertainties of the predictive model (e.g. due to aerodynamics and uncertain payloads), a robust adaptive MPC approach is proposed. By systematic integration of zero-order robust optimization (zoRO) based uncertainty propagation and an extended Kalman filter (EKF) for parameter estimation, the MPC algorithm ensures robust constraint satisfaction, high performance, and computational efficiency. The effectiveness of the proposed method is evaluated in complex simulated scenarios and in real-world flight experiments.
comment: Supplementary video: https://youtu.be/l_L7mpUYJqU
Feedback Motion Planning for Stochastic Nonlinear Systems with Signal Temporal Logic Specifications
We study feedback motion planning for continuous-time stochastic nonlinear systems under signal temporal logic (STL) specifications. We propose a framework that synthesizes control policies for chance-constrained STL trajectory optimization problems, with the goal of ensuring that the closed-loop stochastic system satisfies a given STL formula with high probability (e.g., 99.99\%). Our approach is based on a predicate erosion strategy that transforms the intractable stochastic problem into a deterministic STL trajectory optimization problem with tightened STL formula constraints. The amount of erosion is determined by a probabilistic reachable tube (PRT) that bounds the deviation between the stochastic trajectory and an associated nominal trajectory. To compute such bounds, we leverage contraction theory and feedback design, and develop several tracking controllers. This yields a complete feedback motion planning pipeline which can be implemented by numerical optimizations. We demonstrate the efficacy and versatility of the proposed framework through simulations on several robotic systems and through experiments on a real-world quadrupedal robot, and show that it is less conservative and achieves higher specification satisfaction probability than representative baselines.
Risk-Based PV-Rich Distribution System Planning Using Generative AI
Hosting capacity (HC) assessment plays a critical role in distribution system planning under increasing penetration of distributed energy resources (DERs) and associated uncertainties in load and generation. However, conventional approaches often rely on deterministic worst-case evaluation, leading to overly conservative HC estimates. This paper introduces a risk-based framework for HC assessment that explicitly accounts for the frequency, intensity, and duration of voltage violations under uncertain operating conditions. A generative AI-based approach is employed to generate realistic, time-correlated load demand scenarios conditioned on projected energy consumption growth levels. These scenarios are then used to assess voltage violations and quantify their risk using probabilistic intensity, duration, and frequency (IDF) metrics. The results show that extreme-percentile (zero-risk) approaches significantly underestimate PV-HC by treating all violations equally, regardless of their likelihood or persistence. For instance, allowing a 5% risk level increases HC by approximately 18% for a 15 min violation duration. The proposed approach provides a practical tool for risk-informed distribution system planning under uncertainty.
Efficient Multi-Market Scheduling of Virtual Power Plants via Spectral Representation of Uncertainty
As the penetration of distributed energy resources increases, harnessing their flexibility becomes critical for power system operations. Virtual power plants (VPPs) offer a promising solution. However, existing VPP market scheduling tools exhibit a tradeoff between economic performance and tractability. Stochastic formulations provide probabilistically optimal decisions but are computationally intractable for large systems due to scenario explosion. Robust approaches are more tractable but often yield conservative decisions. This paper addresses this gap by proposing a stochastic multi-market VPP scheduling framework that represents uncertainty in the spectral domain via intrusive Polynomial Chaos Expansion (PCE). The resulting reformulation yields a low-dimensional deterministic spectral counterpart that preserves the stochastic structure and can be solved efficiently with standard optimization tools. The proposed spectral approach is demonstrated on a DER-based VPP operating on a realistic Swiss low-voltage grid and benchmarked against a state-of-the-art scenario-based solution. Results show that intrusive PCE achieves solution quality comparable to the scenario-based benchmark, with up to a 137 times reduction in computational effort, while yielding highly accurate bidding decisions. Finally, to facilitate adoption and reproducibility, we release an open-source, application-agnostic projection tool that automates the spectral reformulation for generic single- and two-stage stochastic programs.
SkillCom: Decomposing LLM-based Semantic Communication into Task and Channel Aware Skills
Large language models (LLMs) are increasingly used as semantic encoders and decoders in semantic communication. However, current LLM-based systems mostly remain monolithic: a single prompted model, or a tightly coupled transmitter/receiver pair, must jointly perform semantic encoding, channel adaptation, and semantic decoding. Such coupling makes intermediate decisions difficult to control, diagnose, or replace, and may cause channel corruption to propagate through a compressed source representation. To address these limitations, we propose SkillCom, a modular framework that decomposes LLM-based semantic communication into four explicit skills: semantic abstraction skill, channel-adaptive transmission skill, receiver-side repair skill, and task execution skill. These skills are interconnected through typed semantic-unit interfaces. Thus, transmission operates on structured unit-level representations rather than on one monolithic text block. This design localizes channel impairment, enables targeted repair from successfully received units, and supports stage-wise ablation and single-skill replacement under matched communication constraints. Experiments on multi-hop question answering and dialogue state tracking show that SkillCom consistently outperforms the monolithic LLM baseline, remains more robust under varying channel conditions, and exhibits task-dependent preferences over skill realizations. The results suggest that explicit skill decomposition provides a more robust and diagnosable foundation for LLM-based semantic communication than monolithic methods.
Should Small-Scale Data Centers Participate in the Day-Ahead Electricity Market?
The global race for competitive advantage in artificial intelligence is challenging electricity grids by demanding growing data center capacity. Addressing this challenge requires synergistic operational strategies that integrate data centers into electricity markets while supporting grid operation. This work proposes a bilateral power purchase agreement between small-scale data centers and distribution system operators, enabling data center participation in the day-ahead electricity market. To facilitate market participation, we develop a scenario-based, risk-averse bidding strategy that leverages flexibility from local energy resources, waste heat recovery, and data center workload. The strategy jointly minimizes operational costs and carbon emissions, creating a carbon-aware, cost-effective framework for data center integration in the day-ahead electricity market. The method is evaluated on a case study comparing a conventional time-of-use supply scheme with the proposed custom power purchase agreement, showing a potential 22% cost reduction and thus highlighting the financial opportunities of day-ahead electricity market participation for small-scale data centers. Two additional case studies illustrate the marginal effects of: (i) data center flexible workload on energy costs and (ii) virtual de-rating of grid transfer capacity.
comment: 10 pages, 5 figures. The paper is under review for IEEE transactions on Energy Markets, Policy and Regulation
Natural Gradient Bayesian Filtering: Geometry-Aware Filter for Dynamical Systems
Bayesian filtering is a cornerstone of state estimation in complex systems such as aerospace systems, yet exact solutions are available only for linear Gaussian models. In practice, nonlinear systems are handled through tractable approximations, with Gaussian filters such as the extended and unscented Kalman filters being among the most widely used methods. This tutorial revisits Gaussian filtering from an information-geometric perspective, viewing the prediction and measurement update steps as inference procedures over state distributions. Within this framework, we introduce a geometry-aware Gaussian filtering approach that leverages natural gradient descent on the statistical manifold of Gaussian distributions. The resulting Natural Gradient Gaussian Approximation (NANO) filter iteratively refines the posterior mean and covariance while respecting the intrinsic geometry of the Gaussian family and preserving the positive definiteness of the covariance matrix. We further highlight fundamental connections to classical Kalman filtering, showing that a single natural-gradient step exactly recovers the Kalman measurement update in the linear-Gaussian case. The practical implications of the proposed framework are illustrated through case studies in representative nonlinear estimation problems, including satellite attitude estimation, simultaneous localization and mapping, and state estimation for robotic systems including quadruped and humanoid robots.
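The linear-Gaussian connection mentioned above can be checked numerically in the scalar case. The sketch below is not the NANO filter itself. It assumes a scalar model y = h*x + v with v ~ N(0, R), and assumes that a unit-step natural-gradient update amounts to adding the likelihood's contribution to the Gaussian natural parameters (precision and precision times mean). The function names are hypothetical.

```python
import math

def kalman_update(m, P, y, h, R):
    """Standard Kalman measurement update for y = h*x + v, v ~ N(0, R)."""
    K = P * h / (h * P * h + R)      # Kalman gain
    m_post = m + K * (y - h * m)     # corrected mean
    P_post = (1.0 - K * h) * P       # corrected variance
    return m_post, P_post

def natural_parameter_update(m, P, y, h, R):
    """Assumed unit-step update in natural parameters (information form)."""
    precision_post = 1.0 / P + h * h / R   # prior precision + likelihood precision
    shift_post = m / P + h * y / R         # precision-weighted mean terms
    P_post = 1.0 / precision_post
    return P_post * shift_post, P_post

# Illustrative numbers only; both routes should give the same posterior.
m_k, P_k = kalman_update(1.0, 2.0, 3.0, 0.5, 0.4)
m_n, P_n = natural_parameter_update(1.0, 2.0, 3.0, 0.5, 0.4)
print(math.isclose(m_k, m_n), math.isclose(P_k, P_n))
```

For nonlinear measurement models the two routes no longer coincide in general, which is where an iterative refinement of the mean and covariance becomes relevant.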
Distributed Observer-based Fault Detection over Intelligent Networked Multi-Vehicle Systems
Decentralized strategies are of interest for local decision-making over multi-vehicle networks. This paper studies mixed traffic networks of human-driven and autonomous vehicles with partial sensor measurements. The idea is to enable the group of connected autonomous vehicles (CAVs) to track the state of a group of human-driven vehicles (HDVs) via distributed consensus-based observers/estimators. Particularly, we make no assumption that the group of HDVs is locally observable in the direct neighborhood of any CAV. Then, the main contribution is to design local residual-based fault detection and isolation (FDI) at every CAV to detect possible faults/attacks in the sensor measurements. This distributed detection strategy enables every CAV to locally find possible anomalies in its own sensor measurements with no need for a central processing unit. Two FDI logics are proposed with and without considering the history of the residuals. These FDI techniques are based on probabilistic threshold design on the residuals (in contrast to the existing deterministic threshold FDI techniques) with no assumption that the noise is of bounded support. This is more realistic in real-world multi-vehicle transportation systems.
comment: European Journal of Control
Awareness in collective decision-making: Modeling and control in a game-theoretic framework
For a society to remain healthy and prosperous, people must collectively behave and act to contribute to the common good, even if there is often a tradeoff against their individual benefit. Paradigmatic examples include the adoption of sustainable behaviors and technologies to combat the climate crisis, and the mobilization for collective action to promote the rights and freedoms of repressed minorities. In this tutorial, we illustrate how game theory and network systems theory can be powerful tools to model and study this collective decision-making problem. We provide examples of how awareness of this tradeoff can impact collective change toward the societal good, exploring different problem contexts such as sustainable behavior and collective action. Finally, we review recent developments using systems and control-theoretic approaches to generate awareness and guide the emergent population dynamics towards a desired outcome, and conclude by highlighting new research and application frontiers.
comment: Manuscript submitted to IEEE Control Systems Magazine
Frequency-Domain Compliance Assessment of Grid-Forming Devices
Grid-ForMing Inverters (GFMIs) are expected to provide voltage stiffness to the grid. Explicitly, system operators (SOs) and regulators expect GFMIs to behave like a "voltage source behind impedance (VSBI)" in the (sub)-transient time frame. SOs assess this VSBI characteristic of GFMIs during compliance by defining a pass-fail time-domain criterion. This is done by evaluating the GFMIs' active (or reactive) power/current response to step changes in voltage phase (and magnitude) at its terminals. However, this approach is prone to errors due to poorly defined measurement specifications for very fast (less than a cycle) transients. To address this, this work proposes a compliance criterion for the VSBI characteristic of GFMIs in the frequency domain based on elements of the frequency-domain Jacobian. The compliance criterion is defined in terms of the minimum expected P(s)/θ(s) and Q(s)/V(s) Bode plot characteristics across a specific frequency range. The equivalence between the time-domain and frequency-domain criteria is established. The proposed method is demonstrated by assessing the compliance of generic NLR (formerly NREL) GFMI models in PSCAD. Furthermore, the impact of GFMI compliance on the small-signal stability of the IEEE 39-bus bulk-power system is demonstrated.
Feedback Linearization-Based Guidance with Zero-Dynamics Correction for Guaranteed Interception
This paper develops a guidance law for nonlinear interception using input-output feedback linearization (IOL). The engagement between a pursuer and an evader is modeled using point-mass dynamics, and a baseline IOL-based guidance law is constructed by regulating the angular rates of the line-of-sight (LOS) vector. While this approach yields stable input-output behavior, it does not constrain the internal (zero) dynamics of the system, which can result in non-intercepting trajectories despite successful regulation of the LOS rates. To address this limitation, a modified IOL-based guidance law is proposed that incorporates a correction mechanism to enforce convergence of the range. The resulting formulation ensures that LOS alignment corresponds to a closing trajectory, thereby enabling convergence of the pursuer to the evader for a broad class of initial engagement geometries. The proposed method retains the computational simplicity and real-time implementability of feedback linearization while improving closed-loop performance relative to classical guidance laws. Extensive Monte Carlo simulations over a wide range of initial conditions are conducted to evaluate the proposed method. The results demonstrate improved reliability, reduced miss distance, and consistent convergence compared to the baseline IOL and classical proportional navigation.
High-Fidelity Full-Sky Video Prediction for Photovoltaic Ramp Event Forecasting
Accurate ultra-short-term forecasting of photovoltaic (PV) ramp events is essential for maintaining grid stability in solar-integrated power systems, particularly under rapidly changing cloud conditions. This paper presents a generative forecasting framework that integrates a future sky video prediction model (PhyDiffNet) with a ramp-aware PV output forecasting model (RaPVFormer). Based on the relatively slow yet chaotic dynamics of cloud motion, the system forecasts ramp events up to 16 minutes in advance at a 1-minute resolution by capturing fine-grained spatiotemporal cloud patterns and generating high-fidelity full-sky video frames. Interpretability is enhanced through attention visualization, highlighting cloud occlusion regions that significantly influence irradiance variability. Supported by extensive quantitative evaluation, the proposed framework demonstrates state-of-the-art performance in both full-sky video prediction and PV output forecasting. It delivers consistent improvements in structural, perceptual, and temporal video quality, along with a 10% increase in Critical Success Index (CSI) for PV ramp detection. These results demonstrate the capability of AI-driven multimodal sensing for ultra-short-term solar forecasting, supporting more reliable renewable integration and potentially reducing dependence on reserve capacity.
comment: 10 pages, 11 figures
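For readers unfamiliar with the Critical Success Index used for ramp detection above, it is conventionally defined as hits / (hits + misses + false alarms). The minimal sketch below illustrates that standard definition only. The function names and the per-time-step boolean encoding of ramp events are assumptions, not the paper's evaluation code.

```python
def count_outcomes(predicted, observed):
    """Count hits, misses, and false alarms from two equal-length boolean sequences."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    pairs = list(zip(predicted, observed))
    hits = sum(1 for p, o in pairs if p and o)              # event predicted and observed
    misses = sum(1 for p, o in pairs if o and not p)        # event observed, not predicted
    false_alarms = sum(1 for p, o in pairs if p and not o)  # event predicted, not observed
    return hits, misses, false_alarms

def critical_success_index(hits, misses, false_alarms):
    """CSI = hits / (hits + misses + false alarms); correct negatives are ignored."""
    total = hits + misses + false_alarms
    if total == 0:
        raise ValueError("CSI is undefined when no events are predicted or observed")
    return hits / total

# Illustrative flags only (True marks a ramp event at that time step).
predicted = [True, True, False, False, True]
observed = [True, False, True, False, True]
print(critical_success_index(*count_outcomes(predicted, observed)))
```

Because correct negatives do not enter the ratio, the index is not inflated by the many time steps in which no ramp occurs.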
From Barrier to Bridge: The Case for AI Data Center/Power Grid Co-Design
For over a century, the electric grid has relied on a single statistical assumption: load diversity, the principle that the uncorrelated demands of millions of small consumers produce a smooth, predictable aggregate. AI training data centers break that assumption. A single hyperscale training campus can draw power comparable to a mid-sized city, driven by one tightly synchronized job whose demand swings by hundreds of megawatts in seconds. This paper argues that the resulting entanglement of compute and power infrastructure requires a shift from implicit coexistence to explicit co-development between the historically decoupled data center and electric power industries. We introduce the distinct design principles, operational philosophies, and economic incentives of each sector, and show why their cultural and technical misalignment makes coordination difficult. We identify key research directions, from joint capacity planning, multi-timescale control, a compute-power protocol stack, to market innovation, that must be pursued to power the future of AI sustainably and reliably.
Stable Agentic Control: Tool-Mediated LLM Architecture for Autonomous Cyber Defense
Agentic systems involved in high-stakes decision-making under adversarial pressure need formal guarantees not offered by existing approaches. Motivated by the operational needs of security operations centers (SOCs) that must configure endpoint detection and response (EDR) policies under adversarial pressure, we present a tool-mediated architecture: LLM agents use deterministic tools (Stackelberg best-response, Bayesian observer updates, attack-graph primitives) and select from finite action catalogs enforced at the tool-output interface. A composite Lyapunov function machine-checked in Lean 4 with zero sorry certifies controllability, observability from asymmetric sensor data, and Input-to-State Stability (ISS) robustness under intelligent adversarial disturbance, with two corollaries extending the certificate to any controller or adversary from the catalogs. On 282 real enterprise attack graphs, the claims hold with margin. On paired offensive/defensive telemetry, a tool-mediated Claude Sonnet 4 controller reduces the attacker's expected payoff (game value) by 59% relative to a deterministic greedy baseline, with zero variance across 40 runs at four temperatures. A Claude Haiku 4.5 controller converges to suboptimal game values but stays catalog-bounded over an additional 40 runs, demonstrating that architectural stability is not dependent on the controller capability. The LLM agent's non-determinism furthers creative exploration of strategies, while the tool-mediated architecture ensures system stability.
comment: 23 pages total (9 main paper + 16 appendices/references), 2 figures
Market Power and Distributed Solar Integration in Microgrids under Limited Regulation
Decentralized electricity systems increasingly emerge where centralized grids fail to provide reliable supply. In such settings, privately operated neighborhood microgrids, often based on diesel generators, exhibit significant market power, limited regulatory oversight, and high environmental externalities. In parallel, households increasingly deploy off-grid solar photovoltaic (PV) systems to gain control over electricity supply. However, these systems suffer from curtailed excess generation during peak solar hours and unreliable access at other times. While prior studies have optimized microgrids in low-reliability grid contexts from a techno-economic perspective, they largely neglect the market power exerted by monopolistic private generators. This paper addresses this gap by developing a bi-level game-theoretic model that enables household-generated electricity to be fed into the microgrid while explicitly accounting for the market power of a neighborhood diesel generator company (DGC). The regulator sets price and feed-in-tariff caps to maximize household economic surplus (HES), while the DGC acts as a profit-maximizing agent controlling access and supply. The model is illustrated using high-resolution empirical data from Lebanon. Results show that: (i) price and feed-in-tariff caps substantially increase HES and consistently induce significant household PV feed-in to the microgrid; (ii) higher DGC budgets or greater PV-owner penetration lead to pronounced gains in HES; and (iii) the renewable energy share reaches 60% under base conditions and approaches 100% at sufficiently high budgets or PV-owner penetration levels, compared to 0% under the status quo.
comment: Submitted to a peer-reviewed journal
Recursive Sparse Parameter Identification of Multivariate ARMAX Systems with Non-stationary Observations and Colored Noise
Classical sparse parameter identification methods are usually based on iterative basis selection, such as greedy algorithms, or on numerical optimization of regularized cost functions, such as LASSO and Bayesian posterior probability distributions. These methods, however, are not suitable for online sparsity inference when data arrive sequentially. This paper presents recursive algorithms for sparse parameter identification of multivariate stochastic systems with non-stationary observations. First, a new bivariate criterion function is presented by introducing an auxiliary variable matrix into a weighted $L_1$ regularization criterion. The new criterion function is subsequently decomposed into two solvable subproblems via alternating optimization of the two variable matrices, for which the optimizers can be explicitly formulated into recursive equations. Second, under the non-stationary and non-persistent excitation conditions on the systems, theoretical properties of the recursive algorithms are established. Specifically, the estimates are proved to achieve (i) set convergence, i.e., the accurate estimation of the sparse index set of the unknown parameter matrix, and (ii) parameter convergence, i.e., the consistent estimation of the values of the non-zero elements of the unknown parameter matrix. Finally, numerical examples are given to support the theoretical analysis.
Trajectory control of a suspended load with non-stopping flying carriers
This work presents the first closed-loop control framework for cooperative payload transportation with non-stopping flying carriers. The proposed method includes a feedback wrench controller that actively regulates the load's pose by computing the wrench required for tracking its desired pose trajectory. Building upon the grasp-matrix formulation and internal force redundancy, an optimization layer dynamically shapes internal-force parameters to guarantee persistent carrier motion, while not altering the desired load wrench. The desired non-stopping carriers' trajectories are computed using the system's kinematics and desired cable forces. Numerical simulations demonstrate that the method successfully prevents the carriers from stopping, while successfully tracking the desired load trajectory.
comment: Accepted for publication at the 2026 International Conference on Unmanned Aircraft Systems, ICUAS 2026
ROBOPOL: Social Robotics Meets Vehicular Communications for Cooperative Automated Driving
On the way toward full autonomy, sharing roads between automated and autonomous vehicles in so-called mixed traffic is unavoidable. Moreover, even if all vehicles on the road were autonomous, pedestrians would still cross streets. We propose social robots as moderators between autonomous vehicles and vulnerable road users. This paper presents a first proof-of-concept integration of a social robot advising pedestrians in crossing scenarios involving a cooperative automated vehicle. We also discuss key enablers required for designing a "robot policeman" in a generic use case of cooperative intersection management. Our work provides a vision of the role of social robotics in future Cooperative Intelligent Transport Systems.
Safe Planning in Interactive Environments via Iterative Policy Updates and Adversarially Robust Conformal Prediction
Safe planning of an autonomous agent in interactive environments -- such as the control of a self-driving vehicle among pedestrians -- poses a major challenge as the behavior of the environment is unknown and reactive to the behavior of the autonomous agent. This coupling gives rise to interaction-driven distribution shifts where the autonomous agent's control policy may change the environment's behavior, thereby invalidating safety guarantees in existing work. Indeed, recent works have used conformal prediction (CP) to generate distribution-free safety guarantees using observed data of the environment. However, CP's assumption on data exchangeability is violated in interactive settings due to a circular dependency where a control policy update changes the environment's behavior, and vice versa. To address this gap, we propose an iterative framework that robustly maintains safety guarantees across policy updates by quantifying the potential impact of a planned policy update on the environment's behavior. We realize this via adversarially robust CP where we perform a regular CP step in each episode using observed data under the current policy, but then transfer safety guarantees across policy updates by analytically adjusting the CP result to account for distribution shifts. This adjustment is performed based on a policy-to-trajectory sensitivity analysis, resulting in a safe, episodic open-loop planner. We further conduct a contraction analysis of the system providing conditions under which both the CP results and the policy updates are guaranteed to converge. We empirically demonstrate these safety and convergence guarantees on a two-dimensional car-pedestrian and a high-dimensional quadcopter case study. To the best of our knowledge, these are the first results that provide valid safety guarantees in such interactive settings.
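The "regular CP step" mentioned above can be illustrated with the standard split conformal quantile. This sketch covers only that textbook step under exchangeability, not the adversarially robust adjustment or the policy-to-trajectory sensitivity analysis. The function name and the choice of nonconformity score (for example, a prediction error of a pedestrian position) are assumptions.

```python
import math

def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    If the calibration scores and a new score are exchangeable, the new score
    is at most this value with probability at least 1 - alpha.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        # Too few calibration samples for this alpha: only the trivial bound holds.
        return math.inf
    return sorted(scores)[rank - 1]

# Hypothetical nonconformity scores collected under the current policy.
calibration_scores = [0.31, 0.12, 0.45, 0.27, 0.09, 0.38, 0.22]
radius = conformal_quantile(calibration_scores, alpha=0.25)
print(radius)
```

A planner could treat `radius` as a safety margin around predicted positions. Once a policy update changes the environment's behavior, exchangeability no longer holds and the margin needs a further adjustment, which is the gap the abstract targets.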
Algorithmic Power Optimisation in Constrained Railway Networks: A Systematic Review
The decarbonisation of heavy-duty railway networks requires maximising the capacity of existing electrical infrastructure. Integrating heavy freight alongside fast passenger services exposes the hard physical limits of conventional alternating current traction networks, causing severe localised power quality degradation, phase unbalance, and low-voltage behaviour that triggers protective substation tripping. Because upgrading physical hardware is highly capital-intensive, software-based Energy Management Strategies have the potential to offer a viable solution for preventing these power capacity challenges. This systematic review demonstrates that traditional, single-train optimisations are fundamentally "grid-blind", necessitating a shift toward multi-train simulations to protect the network's Firm Service Capacity. However, evaluating this shift reveals a critical tension between the computational bottlenecks of deterministic models and the latency of heuristic approaches. Furthermore, a fundamental operational gap exists: while current algorithms generate theoretically optimal speed profiles to increase efficiency and therefore reduce power consumption from the grid, these profiles are excessively complex and inappropriate for human execution. Consequently, future energy management frameworks must bridge this human-machine interface gap to realise capacity improvements on constrained mixed-traffic networks.
A Miniaturized Broadband 1-Bit Coding Reconfigurable Intelligent Surface for NLOS UE Localization and Uplink Communication
In this paper, a broadband 1-bit coding metasurface-based reconfigurable intelligent surface (RIS) is presented. The unit cell of the metasurface consists of a wide dipole modified with interdigital capacitors and loaded with an SMP 1340-040LF PIN diode. The proposed element offers cell miniaturization and a stable angular response. A phase difference of 180° ± 30° is achieved over a frequency range of 4.85-6.05 GHz between the ON and OFF states for normal incidence of the TE-polarized wave, whereas it provides a fairly stable response with a reflection loss of less than 3 dB and a phase difference of 180° ± 50° for oblique incidence up to 45°. The RF is isolated from the DC on the bias lines using properly designed butterfly-shaped radial stubs. Using this unit cell, a prototype with an array of 16 × 10 elements is constructed. A low-cost microcontroller-based control circuit is designed, which can be plugged in for biasing the PIN diodes of such an array. The theoretically calculated and full-wave simulated radiation patterns of the array are validated using experiments inside an anechoic chamber. Furthermore, the capability of the RIS for non-line-of-sight (NLOS) user equipment (UE) localization and robust uplink communication is demonstrated using an LTE communication framework. This shows the great potential of our RIS for applications such as unmanned aerial vehicle (UAV) localization and uplink communication at NLOS or extended range.
comment: This work has been submitted to the IEEE for possible publication
LCL Resonance Analysis and Damping in Single-Loop Grid-Forming Wind Turbines
A common assumption in both grid-following (GFL) and grid-forming (GFM) control systems is that they are open-loop (OL) stable in the vicinity of high-frequency resonances. Hence classical loop-shaping approaches are often used for establishing stability margins and designing active damping (AD) strategies. This paper shows that single-loop GFM (SL-GFM) control schemes incorporating a widely used class of reactive power (RAP) control, referred to as droop-I control, can lead to OL unstable poles. This finding reveals a novel instability mechanism resulting in a reduced stability margin and robustness at high frequencies. The sensitivity of this phenomenon to both RAP and electrical parameters is analyzed in detail. An AD design that explicitly accounts for the newly identified instability mechanism is proposed. We also provide a comparison between such SL-GFM and well-studied GFL control schemes, highlighting quite different resonance features between them. Validation is performed through experiments.
GOSPA-Driven Non-Myopic Multi-Sensor Management with Multi-Bernoulli Filtering
In this paper, we propose a non-myopic sensor management algorithm for multi-target tracking, with multiple sensors operating in the same surveillance area. The algorithm is based on multi-Bernoulli filtering and selects the actions that solve a non-myopic minimisation problem, where the cost function is the mean square generalised optimal sub-pattern assignment (GOSPA) error, over a future time window. For tractability, the sensor management algorithm actually uses an upper bound of the GOSPA error and is implemented via Monte Carlo Tree Search (MCTS). The sensors have the ability to jointly optimise and select their actions with the considerations of all other sensors in the surveillance area. The benefits of the proposed algorithm are analysed via simulations.
comment: submitted to Elsevier Signal Processing May 2026
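For context on the cost function named above, the GOSPA metric (with alpha = 2) between two finite sets combines localisation error for assigned pairs with a penalty of c^p/2 for each missed or false target. The brute-force sketch below evaluates the metric itself for small sets only. It is not the paper's upper bound, mean-square expectation, or MCTS planner, and the function name and default parameters are assumptions.

```python
import itertools
import math

def gospa(X, Y, c=2.0, p=2):
    """GOSPA distance (alpha = 2) between two small lists of equal-dimension points.

    Distances are cut off at c, so an assigned pair farther apart than c costs
    the same as one missed plus one false target.
    """
    if len(X) > len(Y):
        X, Y = Y, X  # make X the smaller set; the metric is symmetric
    best = math.inf
    # Try every injective assignment of points in X to points in Y.
    for perm in itertools.permutations(range(len(Y)), len(X)):
        cost = sum(min(math.dist(x, Y[j]), c) ** p for x, j in zip(X, perm))
        best = min(best, cost)
    cardinality_penalty = (c ** p / 2.0) * (len(Y) - len(X))
    return (best + cardinality_penalty) ** (1.0 / p)

# Illustrative sets: two true targets, one estimate close to the first target.
truth = [(0.0, 0.0), (10.0, 10.0)]
estimates = [(0.5, 0.0)]
print(gospa(estimates, truth))
```

The enumeration is factorial in the set size, so a practical implementation would replace the loop with an optimal assignment solver.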
Safe Navigation using Neural Radiance Fields via Reachable Sets
Safe navigation in cluttered environments is an important challenge for autonomous systems. Robots navigating through obstacle-ridden scenarios need to be able to navigate safely in the presence of obstacles, goals, and ego objects of varying geometries. In this work, reachable set representations of the robot's real-time capabilities in the state space are utilized to capture safe navigation requirements, while neural radiance fields (NeRFs) are utilized to compute, store, and manipulate the volumetric representations of the obstacles, or ego vehicle, as needed. Constrained optimal control is employed to represent the resulting path planning problem, involving linear matrix inequality constraints. We present simulation results for path planning in the presence of numerous obstacles in two different scenarios. Safe navigation is demonstrated by using reachable sets in the corresponding constrained optimal control problems.
comment: 5 pages, 8 figures, 2026 4th International Conference on Mechatronics, Control and Robotics (ICMCR)
Optimizing Grasping in Legged Robots: A Deep Learning Approach to Loco-Manipulation
This paper presents a deep learning framework designed to enhance the grasping capabilities of quadrupeds equipped with arms, with a focus on improving precision and adaptability. Our approach centers on a sim-to-real methodology that minimizes reliance on physical data collection. We developed a pipeline within the Genesis simulation environment to generate a synthetic dataset of grasp attempts on common objects. By simulating thousands of interactions from various perspectives, we created pixel-wise annotated grasp-quality maps to serve as the ground truth for our model. This dataset was used to train a custom CNN with a U-Net-like architecture that processes multi-modal input from onboard RGB and depth cameras, including RGB images, depth maps, segmentation masks, and surface normal maps. The trained model outputs a grasp-quality heatmap to identify the optimal grasp point. We validated the complete framework on a four-legged robot. The system successfully executed a full loco-manipulation task: autonomously navigating to a target object, perceiving it with its sensors, predicting the optimal grasp pose using our model, and performing a precise grasp. This work proves that leveraging simulated training with advanced sensing offers a scalable and effective solution for object handling.
A Vision-Based Shared-Control Teleoperation Scheme for Controlling the Robotic Arm of a Four-Legged Robot
In hazardous and remote environments, robotic systems perform critical tasks demanding improved safety and efficiency. Among these, quadruped robots with manipulator arms offer mobility and versatility for complex operations. However, teleoperating quadruped robots is challenging due to the lack of integrated obstacle detection and intuitive control methods for the robotic arm, increasing collision risks in confined or dynamically changing workspaces. Teleoperation via joysticks or pads can be non-intuitive and demands a high level of expertise due to its complexity, culminating in a high cognitive load on the operator. To address this challenge, a teleoperation approach that directly maps human arm movements to the robotic manipulator offers a simpler and more accessible solution. This work proposes an intuitive remote control by leveraging a vision-based pose estimation pipeline that utilizes an external camera with a machine learning-based model to detect the operator's wrist position. The system maps these wrist movements into robotic arm commands to control the robot's arm in real-time. A trajectory planner ensures safe teleoperation by detecting and preventing collisions with both obstacles and the robotic arm itself. The system was validated on the real robot, demonstrating robust performance in real-time control. This teleoperation approach provides a cost-effective solution for industrial applications where safety, precision, and ease of use are paramount, ensuring reliable and intuitive robotic control in high-risk environments.
Viewpoint-Agnostic Grasp Pipeline using VLM and Partial Observations
Robust grasping in cluttered, unstructured environments remains challenging for mobile legged manipulators due to occlusions that lead to partial observations, unreliable depth estimates, and the need for collision-free, execution-feasible approaches. In this paper we present an end-to-end pipeline for language-guided grasping that bridges open-vocabulary target selection to safe grasp execution on a real robot. Given a natural-language command, the system grounds the target in RGB using open-vocabulary detection and promptable instance segmentation, extracts an object-centric point cloud from RGB-D, and improves geometric reliability under occlusion via back-projected depth compensation and two-stage point cloud completion. We then generate and collision-filter 6-DoF grasp candidates and select an executable grasp using safety-oriented heuristics that account for reachability, approach feasibility, and clearance. We evaluate the method on a quadruped robot with an arm in two cluttered tabletop scenarios, using paired trials against a view-dependent baseline. The proposed approach achieves a 90% overall success rate (9/10) against 30% (3/10) for the baseline, demonstrating substantially improved robustness to occlusions and partial observations in clutter.
Efficient Planning in Large-scale Systems Using Hierarchical Finite State Machines
We consider optimal planning in a large-scale system formalised as a hierarchical finite state machine (HFSM). We propose a planning algorithm that computes an optimal plan between any two states in the HFSM and consists of two steps: a pre-processing step that computes optimal exit costs of the machines in the HFSM, with time complexity scaling with the number of machines; and a query step that efficiently computes an optimal plan by removing irrelevant subtrees of the HFSM using the optimal exit costs. The algorithm is reconfigurable in the sense that changes in the HFSM are handled with ease, where the pre-processing step recomputes only the optimal exit costs affected by the change. The algorithm can also exploit compact representations that group together identical machines in the HFSM, where the algorithm only needs to compute the optimal exit costs for one of the identical machines within each group, thereby avoiding unnecessary recomputations. We validate the algorithm on large systems with millions of states and a robotic application. It is shown that our approach outperforms Dijkstra's algorithm, Bidirectional Dijkstra and Contraction Hierarchies.
comment: General update to the paper
Higher-Order Uncoupled Learning Dynamics and Nash Equilibrium
We study learnability of mixed-strategy Nash Equilibrium (NE) in general finite games using higher-order replicator dynamics as well as classes of higher-order uncoupled heterogeneous dynamics. In higher-order uncoupled learning dynamics, players have no access to utilities of opponents (uncoupled) but are allowed to use auxiliary states to further process information (higher-order). We establish a link between uncoupled learning and feedback stabilization with decentralized control. Using this association, we show that for any finite game with an isolated completely mixed-strategy NE, there exist higher-order uncoupled learning dynamics that lead (locally) to that NE. We further establish the lack of universality of learning dynamics by linking learning to the control theoretic concept of simultaneous stabilization. We construct two games such that any higher-order dynamics that learn the completely mixed-strategy NE of one of these games can never learn the completely mixed-strategy NE of the other. Next, motivated by imposing natural restrictions on allowable learning dynamics, we introduce the Asymptotic Best Response (ABR) property. Dynamics with the ABR property asymptotically learn a best response in environments that are asymptotically stationary. We show that the ABR property relates to an internal stability condition on higher-order learning dynamics. We provide conditions under which NE are compatible with the ABR property. Finally, we address learnability of mixed-strategy NE in the bandit setting using a bandit version of higher-order replicator dynamics.
An Error Bound for Aggregation in Approximate Dynamic Programming
We consider a general aggregation framework for discounted finite-state infinite horizon dynamic programming (DP) problems. It defines an aggregate problem whose optimal cost function can be obtained off-line by exact DP and then used as a terminal cost approximation for an on-line reinforcement learning (RL) scheme. We derive a bound on the error between the optimal cost functions of the aggregate problem and the original problem. This bound was first derived by Tsitsiklis and van Roy [TvR96] for the special case of hard aggregation. Our bound is similar but applies far more broadly, including to soft aggregation and feature-based aggregation schemes.
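The hard-aggregation special case mentioned above can be illustrated numerically. The following is a minimal sketch, not the paper's construction: the 4-state, 2-action problem, the grouping `GROUP`, the uniform disaggregation weights, and all function names are assumptions made for illustration. It solves the original and aggregate problems by value iteration and compares the approximation error with the classical hard-aggregation bound in the form eps / (1 - alpha), where eps is the largest variation of the optimal cost within a group. The exact form of the paper's more general bound is not reproduced here.

```python
def bellman_min(P, g, alpha, i, values, index):
    """Min over actions of expected stage cost plus discounted cost-to-go.
    index(j) maps successor state j to its entry in `values`."""
    n = len(P[0])
    return min(
        g[u][i] + alpha * sum(P[u][i][j] * values[index(j)] for j in range(n))
        for u in range(len(P))
    )

def solve_original(P, g, alpha, iters=1000):
    """Exact DP (value iteration) on the original problem."""
    n = len(P[0])
    J = [0.0] * n
    for _ in range(iters):
        J = [bellman_min(P, g, alpha, i, J, lambda j: j) for i in range(n)]
    return J

def solve_aggregate(P, g, alpha, group, iters=1000):
    """Value iteration on the aggregate problem for hard aggregation,
    assuming uniform disaggregation weights inside each group."""
    n = len(group)
    members = [[i for i in range(n) if group[i] == x]
               for x in range(max(group) + 1)]
    r = [0.0] * len(members)
    for _ in range(iters):
        r = [
            sum(bellman_min(P, g, alpha, i, r, lambda j: group[j]) for i in ms)
            / len(ms)
            for ms in members
        ]
    return r

# Hypothetical chain problem: state 3 is a cost-free absorbing goal.
ALPHA = 0.9
P = [
    [[0.5, 0.5, 0, 0], [0, 0.5, 0.5, 0], [0, 0, 0.5, 0.5], [0, 0, 0, 1.0]],  # slow, cheap
    [[0.1, 0.9, 0, 0], [0, 0.1, 0.9, 0], [0, 0, 0.1, 0.9], [0, 0, 0, 1.0]],  # fast, costly
]
g = [[1.0, 1.0, 1.0, 0.0], [2.0, 2.0, 2.0, 0.0]]  # expected stage cost g[u][i]
GROUP = [0, 0, 1, 1]  # states {0, 1} and {2, 3} share one aggregate state each

J_star = solve_original(P, g, ALPHA)
r_star = solve_aggregate(P, g, ALPHA, GROUP)

# eps: largest variation of the optimal cost within any group.
eps = max(
    max(J_star[i] for i in range(4) if GROUP[i] == x)
    - min(J_star[i] for i in range(4) if GROUP[i] == x)
    for x in range(2)
)
error = max(abs(J_star[i] - r_star[GROUP[i]]) for i in range(4))
bound = eps / (1 - ALPHA)
print(f"error={error:.3f}  bound={bound:.3f}")
```

The aggregate cost vector `r_star` is what would serve as the terminal cost approximation in an on-line scheme; the sketch only checks how far it lies from the optimal cost of the original problem.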
Robotics
Observability Conditions and Filter Design for Visual Pose Estimation via Dual Quaternions
This paper presents a dual quaternion framework for 6-DOF visual target tracking that addresses key limitations of perspective-n-point (P$n$P) solvers: sensitivity to noise and outliers, and inability to propagate estimates through measurement dropouts. A nonlinear observability analysis is performed using a Lie algebraic approach, deriving sufficient conditions for local observability under two sensing modalities: relative position vector and unit vector measurements. For the unit vector case, the classical collinear feature point degeneracy of the perspective-three-point problem is recovered through rank analysis of the observability codistribution matrix, providing a control-theoretic interpretation of a previously geometric result. A dual quaternion Lie group unscented Kalman filter is then developed, directly modeling relative dynamics without assumptions about cooperative measurements or slowly-varying motion. Simulations demonstrate improved pose estimation accuracy and robustness to occlusions compared to an off-the-shelf P$n$P solver. Results are broadly applicable to visual-inertial navigation, simultaneous localization and mapping, and P$n$P solver development.
comment: 3 tables, 5 figures
VILAS: A VLA-Integrated Low-cost Architecture with Soft Grasping for Robotic Manipulation
We present VILAS, a fully low-cost, modular robotic manipulation platform designed to support end-to-end vision-language-action (VLA) policy learning and deployment on accessible hardware. The system integrates a Fairino FR5 collaborative arm, a Jodell RG52-50 electric gripper, and a dual-camera perception module, unified through a ZMQ-based communication architecture that seamlessly coordinates teleoperation, data collection, and policy deployment within a single framework. To enable safe manipulation of fragile objects without relying on explicit force sensing, we design a kirigami-based soft compliant gripper extension that induces predictable deformation under compressive loading, providing gentle and repeatable contact with delicate targets. We deploy and evaluate three state-of-the-art VLA models on the VILAS platform: pi_0, pi_0.5, and GR00T N1.6. All models are fine-tuned from publicly released pretrained checkpoints using an identical demonstration dataset collected via our teleoperation pipeline. Experiments on a grape grasping task validate the effectiveness of the proposed system, confirming that capable manipulation policies can be successfully trained and deployed on low-cost modular hardware. Our results further provide practical insights into the deployment characteristics of current VLA models in real-world settings.
Neural Backward Reach-Avoid Tubes with MPC Supervision for High-Dimensional Systems: An Application to Safe Spacecraft Docking
Autonomous spacecraft docking requires control policies that simultaneously ensure collision avoidance and target reachability under coupled, high-dimensional translational-rotational dynamics. Hamilton-Jacobi (HJ) reachability provides formal reach-avoid guarantees, but classical solvers are limited to low-dimensional systems. Learning-based approaches have begun to scale HJ analysis, yet they struggle in reach-avoid settings, especially where goal and failure sets are tightly coupled, as in docking. We propose a learning-based Backward Reach-Avoid Tube (BRAT) framework that addresses this challenge by tightly integrating HJ structure with MPC-based supervision. In the offline phase, we train a neural approximation of the HJ value function using PDE-based losses augmented with curriculum-driven MPC supervision, which provides informative value targets and stabilizes training in regions where purely PDE-based methods fail. In the online phase, the learned value function is deployed through two real-time controllers: (i) a value gradient-driven controller, and (ii) a value-function-augmented terminal MPC that explicitly enforces reachability at the horizon. We evaluate the proposed method on a 6D planar docking problem against grid-based ground truth and then scale to the full 13D system. Across both settings, our approach outperforms existing methods in success rate and computational efficiency.
comment: 8 pages, 7 figures
Optimized and kinematically feasible multi-agent motion planning
Multi-agent motion planning (MAMP) is an important problem for autonomous systems with multiple agents. In this work we propose a two-step method for finding optimized and kinematically feasible solutions to MAMP problems. The first step finds an initial feasible solution using state-of-the-art methods such as conflict-based search (CBS) or priority-based search (PBS), and the second step is an improvement step which improves the solution by solving a multi-phase optimal control problem (OCP) where the initial solution is used to warm-start the solver. We also propose a method for generating motion primitives in an optimized way under the constraint that the primitive durations are all multiples of the same sample time. We evaluate our proposed framework on a MAMP problem for tractor-trailer systems. We extend the safe interval path planning with interval projections (SIPP-IP) algorithm so it can handle more general cost functions and larger agents, but our results show that for the tractor-trailer system a simple lattice-based planner performs better due to less conservative collision checks. Our experiments also indicate that CBS performs better than PBS for this system, as it achieves a higher success rate in environments with obstacles and has a lower average runtime, although both planners achieve solutions of similar quality after the improvement step.
Stability of Control Lyapunov Function Guided Reinforcement Learning
Reinforcement learning (RL) has become the de facto method for achieving locomotion on humanoid robots in practice, yet stability analysis of the corresponding control policies is lacking. Recent work has attempted to merge control theoretic ideas with reinforcement learning through control guided learning. A notable example of this is the use of a control Lyapunov function (CLF) to synthesize the reinforcement learning rewards, a technique known as CLF-RL, which has shown practical success. This paper investigates the stability properties of optimal controllers using CLF-RL with the goal of bridging experimentally observed stability with theoretical guarantees. The RL problem is viewed as an optimal control problem and exponential stability is proven in both continuous and discrete time using both core CLF reward terms and the additional terms used in practice. The theoretical bounds are numerically verified on systems such as the double integrator and cart-pole. Finally, the CLF guided rewards are implemented for a walking humanoid robot to generate stable periodic orbits.
comment: This work has been submitted to the IEEE for possible publication
Sonar-GPS Fusion for Seabed Mapping in Turbid Shallow Waters with an Autonomous Surface Vehicle ICRA 2026
Accurate seabed mapping is essential for habitat monitoring and infrastructure inspection. In turbid, shallow coastal waters, such as shellfish aquaculture farms, the effectiveness of traditional optical methods is limited. Autonomous surface vehicles (ASVs) equipped with forward-looking sonar (FLS) offer a promising alternative. However, existing sonar-based systems face challenges in achieving fine-resolution mapping over long trajectories due to low-resolution positioning measurements and accumulated drift. In this paper, we present a drift-resilient seabed mapping framework that integrates local FLS frame alignment using the Fourier-Mellin transform (FMT) with global trajectory optimization based on an extended Kalman filter (EKF) that fuses global positioning system (GPS), inertial measurement unit (IMU), and compass data. A variance-based image blending strategy is used to further reduce visual artifacts in overlapping regions. Field trials on a structured oyster farm site show that our framework helps reduce drift in RMSE by 9.5% relative to the FMT-only baseline. This framework also enables sub-meter reconstruction accuracy and preservation of high-resolution textures needed for oyster inventory estimation within the mapped areas.
comment: Accepted to the 2026 IEEE International Conference on Robotics and Automation (ICRA 2026)
Phone2Act: A Low-Cost, Hardware-Agnostic Teleoperation System for Scalable VLA Data Collection
Collecting diverse, high-quality manipulation data for Vision-Language-Action (VLA) model training remains prohibitively expensive for many research groups, as existing teleoperation frameworks rely on specialized hardware or are tightly coupled to specific robot platforms. We present Phone2Act, a low-cost, hardware-agnostic teleoperation framework that transforms a commodity smartphone into a 6-DoF robot controller via Google ARCore. Built on a modular ROS 2 architecture, Phone2Act decouples control logic from hardware specifics through interchangeable bridge nodes, supporting platforms from industrial cobots to low-cost bimanual arms without code modification. A Universal Recorder synchronizes multi-camera RGB streams with robot state feedback and exports demonstrations natively in the LeRobot dataset format, eliminating post-processing and enabling immediate VLA fine-tuning. We validate the framework by fine-tuning GR00T-N1.5 on 130 collected episodes, achieving a 90% success rate on a real-world multi-stage pick-and-place task deployed on a physical Dobot CR5.
comment: 6 pages, 5 figures
Training Non-Differentiable Networks via Optimal Transport
Neural networks increasingly embed non-differentiable components (spiking neurons, quantized layers, discrete routing, blackbox simulators, etc.) where backpropagation is inapplicable and surrogate gradients introduce bias. We present PolyStep, a gradient-free optimizer that updates parameters using only forward passes. Each step evaluates the loss at structured polytope vertices in a compressed subspace, computes softmax-weighted assignments over the resulting cost matrix, and displaces particles toward low-cost vertices via barycentric projection. This update corresponds to the one-sided limit of a regularized optimal-transport problem, inheriting its geometric structure without Sinkhorn iterations. PolyStep trains genuinely non-differentiable models where existing gradient-free methods collapse to near-random accuracy. On hard-LIF spiking networks we reach 93.4% test accuracy, outperforming all gradient-free baselines by over 60 pp and closing to within 4.4 pp of a surrogate-gradient Adam ceiling. Across four additional non-differentiable architectures (int8 quantization, argmax attention, staircase activations, hard MoE routing) we lead every gradient-free competitor. On MAX-SAT scaling from 100 to 1M variables, we sustain above 92% clause satisfaction while evolution strategies drop 8 to 12 pp. On RL policy search, we match OpenAI-ES on classical control and retain performance under integer and binary quantization that collapses gradient-based methods. We prove convergence to conservative-stationary points at rate $O(\log T/\sqrt{T})$ on piecewise-smooth losses, upgraded to Clarke-stationary on the headline architectures and extended to the piecewise-constant regime via a hitting-time bound. These rates match the known zeroth-order query-complexity lower bounds that all forward-only methods inherit. Code is available at https://github.com/anindex/polystep.
comment: 52 pages, 20 tables, 9 figures, submitted to Transactions on Machine Learning Research
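The update described in the abstract above (probe vertices, softmax-weight their costs, move to the weighted barycenter) can be sketched in a few lines. This is a toy illustration only and not the PolyStep algorithm: it uses a single point rather than particles, probes the vertices of an axis-aligned cross-polytope in the full space rather than a compressed subspace, and the names `softmax_barycentric_step`, `staircase_loss`, `radius`, and `temperature` are hypothetical.

```python
import math

def softmax_barycentric_step(loss, x, radius=0.5, temperature=0.1):
    """One forward-only update: probe x +/- radius along each axis, then move
    to the softmax-weighted average (barycenter) of the probed vertices."""
    vertices = []
    for i in range(len(x)):
        for sign in (1.0, -1.0):
            v = list(x)
            v[i] += sign * radius
            vertices.append(v)
    costs = [loss(v) for v in vertices]
    c_min = min(costs)  # shift by the minimum cost for numerical stability
    weights = [math.exp(-(c - c_min) / temperature) for c in costs]
    total = sum(weights)
    return [
        sum(w * v[i] for w, v in zip(weights, vertices)) / total
        for i in range(len(x))
    ]

def staircase_loss(x):
    """Piecewise-constant toy loss: its gradient is zero almost everywhere,
    so gradient descent receives no signal."""
    return sum(abs(math.floor(xi)) for xi in x)

x0 = [3.2, -2.3]
x = list(x0)
for _ in range(30):
    x = softmax_barycentric_step(staircase_loss, x)
print(staircase_loss(x0), staircase_loss(x))
```

Only loss evaluations are used, which is why a step like this still makes progress on the staircase loss. A lower `temperature` concentrates the weights on the cheapest vertices, while a higher one averages more evenly across them.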
Optimizing Trajectory-Trees in Belief Space: An Application from Model Predictive Control to Task and Motion Planning
This paper explores the benefits of computing arborescent trajectories (trajectory-trees) instead of commonly used sequential trajectories for partially observable robotic planning problems. In such environments, a robot infers knowledge from observations, and the optimal course of action depends on these observations. Trajectory-trees, optimized in belief space, naturally capture this dependency by branching where the belief state is expected to evolve into multiple distinct scenarios, such as upon receiving an observation. Unlike sequential trajectories, which model a single forward evolution of the system, trajectory-trees capture multiple possible contingencies. First, we focus on Model Predictive Control (MPC) and demonstrate the benefits of planning tree-like trajectories. We formulate the control problem as the optimization of a tree with a single branching (PO-MPC). This improves performance by reducing control costs through more informed planning. To satisfy the real-time constraints of MPC, we develop an optimization algorithm called Distributed Augmented Lagrangian (D-AuLa), which leverages the decomposability of the PO-MPC formulation to parallelize and accelerate the optimization. We apply the method to both linear and non-linear MPC problems using autonomous driving examples. Second, we address Task And Motion Planning (TAMP), and introduce a planner (PO-LGP) reasoning on decision trees at task level, and trajectory-trees at motion-planning level. This approach builds upon the Logic-Geometric-Programming Framework (LGP) and extends it to partially observable problems. The experiments show the method's applicability to problems with a small belief state size, and scales to larger problems by optimizing explorative policies, which are used as macro-actions in an overarching task plan.
comment: 41 pages
Hybrid Visual Telemetry for Bandwidth-Constrained Robotic Vision: A Pilot Study with HEVC Base Video and JPEG ROI Stills
Bandwidth-constrained robotic and surveillance systems often rely on a single compressed video stream to support both continuous scene awareness and downstream machine perception. In practice, this creates a mismatch: low-bitrate video can preserve motion and coarse context, but often loses the fine local detail needed for reliable object recognition and decision-making. Motivated by a hybrid architecture in which low-resolution video supports dynamic scene understanding while event-driven high-detail regions of interest (ROIs) support close-up identification and analytics, this paper formalizes a two-channel visual telemetry scheme in which a continuous low-bitrate video stream is augmented by selectively transmitted high-detail still ROIs. This first paper does not attempt to prove the superiority of a new still-image codec. Instead, it establishes the hybrid transmission paradigm itself using a practical and reproducible codec stack: x265/HEVC for the base video stream and JPEG stills for ROI refinement. We formulate the problem as bitrate-constrained information selection for robotic vision and define an experimental protocol in which video-only and hybrid schemes are compared under matched total communication budgets. The study is designed around UAV-oriented datasets, two practical bitrate regimes, several ROI triggering policies, and object-level classification refinement on selectively transmitted ROI stills. The resulting paper lays the methodological foundation for a second-stage investigation of JPEG AI as the semantic still-image channel within the same hybrid architecture.
comment: 7 pages, 2 figures, 4 tables
Zero-Shot, Safe and Time-Efficient UAV Navigation via Potential-Based Reward Shaping, Control Lyapunov and Barrier Functions
Autonomous navigation and obstacle avoidance remain a core challenge for modern Unmanned Aerial Vehicles (UAVs). While traditional control methods struggle with the complexity and variability of the environment, reinforcement learning (RL) enables UAVs to learn adaptive behaviors through interaction with the environment. Existing RL research prioritizes mission success at the expense of mission time and UAV safety. This study integrates Potential-Based Reward Shaping (PBRS) with Control Lyapunov Functions (CLF) and Control Barrier Functions (CBF) to simultaneously optimize mission time and ensure formal safety guarantees. An RL model is trained in a generalized simple environment, then used in complex scenarios incorporating a CLF-CBF-QP filter without further training. Experimental results in simulated environments demonstrate a significant reduction in mission time and outstanding performance in complex environments.
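For readers unfamiliar with potential-based reward shaping, the standard shaping term is F(s, s') = gamma * phi(s') - phi(s), which is added to the environment reward. The sketch below shows that term under stated assumptions: the potential (negative Euclidean distance to a goal), the waypoints, and the function names are hypothetical and are not taken from the paper.

```python
import math

def potential(state, goal):
    """Hypothetical potential: negative Euclidean distance to the goal."""
    return -math.dist(state, goal)

def shaped_reward(reward, state, next_state, goal, gamma=0.99):
    """Environment reward plus the shaping term gamma * phi(s') - phi(s)."""
    return reward + gamma * potential(next_state, goal) - potential(state, goal)

def shaping_sum(path, goal, gamma=1.0):
    """Total shaping collected along a path, with the base reward set to zero."""
    return sum(
        shaped_reward(0.0, s, s_next, goal, gamma)
        for s, s_next in zip(path, path[1:])
    )

goal = (3.0, 4.0)
path = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (3.0, 4.0)]
print(shaping_sum(path, goal))
```

With gamma = 1 the per-step terms telescope, so the total depends only on the first and last states of the path. This is the intuition behind the known result that shaping of this form leaves the optimal policy unchanged while rewarding progress toward the goal at every step.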
On the Characterization and Limits of 4D Radar for Aided Inertial Navigation
Frequency Modulated Continuous Wave (FMCW) radar is a promising sensor for aided inertial navigation, due to its robustness in environments that challenge traditional alternatives, such as LiDAR and vision. However, its widespread adoption is hindered by complex, noisy measurements, which make reliable estimation difficult. This manuscript addresses these challenges by analyzing the fundamental measurement relations of FMCW radar sensing and developing a reliable estimator. Noise models are derived by applying first principles to the underlying signal processing of a typical radar sensor. These models guide the design of a factor graph-based estimator, utilizing a first-order approximation for the measurement noise propagation. The approach is first examined through simulation, evaluating the significance of different noise sources, the validity of the first-order approximation, and the state-dependent nature of the covariance expressions. Extensive experiments demonstrate the superior robustness and accuracy of the proposed method across diverse field environments and flight profiles, including beyond the radar's standard operating range. Furthermore, the experiments confirm the insights from the simulation regarding the behavior and performance of different estimator configurations relative to their operating conditions. The evaluation data and estimator implementation are made available at https://github.com/ntnu-arl/rig.
comment: 30 pages, 14 figures, and 20 tables. Accepted for publication to IEEE Transactions on Field Robotics
Anticipation-VLA: Solving Long-Horizon Embodied Tasks via Anticipation-based Subgoal Generation
Vision-Language-Action (VLA) models have emerged as a powerful paradigm for embodied intelligence, enabling robots to perform tasks based on natural language instructions and current visual input. However, existing VLA models struggle with long-horizon tasks due to compounding errors. Prior methods decompose tasks into subtasks of fixed granularity, which cannot adapt to the varying complexity of execution states, limiting their robustness in long-horizon tasks. To overcome this, we introduce the Anticipation Model, which adaptively and recursively generates future subgoals. This model continuously adapts as the task unfolds, adjusting future subgoals in response to evolving dynamics, facilitating more reliable planning paths. Building on this concept, we propose Anticipation-VLA, a hierarchical VLA model that leverages the anticipation model to generate actionable subgoals that guide VLA policy execution. We implement Anticipation-VLA by fine-tuning a Unified Multimodal Model (UMM) for high-level subgoal generation and a goal-conditioned VLA policy for low-level action execution. Experiments in both simulated and real-world robotic tasks demonstrate the effectiveness of Anticipation-VLA, highlighting the importance of adaptive and recursive subgoal generation for robust policy execution.
Lateral String Stability for Vehicle Platoons: Formulation, Definition, and Analysis
Platooning of connected and automated vehicles provides significant benefits in terms of energy efficiency, traffic throughput, and, most critically, safety. These safety benefits depend on string stability, which dictates how disturbances propagate along a vehicle string. Although longitudinal string stability has been extensively examined, lateral string stability, which governs the propagation of path-tracking errors that can lead to unsafe deviations from the desired path, remains underexplored. Its importance is growing as autonomous vehicles increasingly depend on onboard sensing and map-free navigation, where sensor occlusions and tight formations amplify safety risks. This paper presents a framework for lateral string stability that focuses directly on safety-critical, path-relative tracking errors and enables consistent comparison across vehicles that follow the same planned path. The key element of the framework is an arc-length (Eulerian) viewpoint, a departure from traditional analyses, that clarifies how tracking errors at a given point on the path propagate from one vehicle to the next. Building on this foundation, we propose the definition of L2 lateral string stability along with two control strategies: a feedback-feedforward strategy that relies solely on onboard sensing, and a novel learn-from-predecessor strategy that makes use of vehicle-to-vehicle communication. Both strategies are analyzed for lateral string stability with respect to two error measures: tracking error vector and lateral (cross-track) error. Our results show that onboard sensing alone cannot guarantee attenuation of path-tracking errors, imposing a fundamental safety limitation, while V2V communication enables true error attenuation. The analysis further identifies structural controller requirements, showing that nonzero feedback on specific measurements is essential for guaranteeing stability.
IMPACT-HOI: Supervisory Control for Onset-Anchored Partial HOI Event Construction
We present IMPACT-HOI, a mixed-initiative framework for annotating egocentric procedural video by constructing structured event graphs for Human-Object Interactions (HOI), motivated by the need for high-quality structured supervision for learning robot manipulation from human demonstration. IMPACT-HOI frames this task as the incremental resolution of a partially specified, onset-anchored event state. A trust-calibrated controller selects among direct queries, human-confirmed suggestions, and conservative completions based on empirical annotator behavior and evidence quality. A risk-bounded execution protocol, utilizing atomic rollback, ensures that human-confirmed decisions are preserved against conflicting automated updates. A user study with 9 participants shows a 13.5% reduction in manual annotation actions, a 46.67% event match rate, and zero confirmed-field violations under the studied protocol. The code will be made publicly available at https://github.com/541741106/IMPACT_HOI.
comment: 8 pages, 2 figures. Code is available at https://github.com/541741106/IMPACT_HOI
Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning ICML 2026
We propose Flow-Anchored Noise-conditioned Q-Learning (FAN), a highly efficient and high-performing offline reinforcement learning (RL) algorithm. Recent work has shown that expressive flow policies and distributional critics improve offline RL performance, but at a high computational cost. Specifically, flow policies require iterative sampling to produce a single action, and distributional critics require computation over multiple samples (e.g., quantiles) to estimate value. To address these inefficiencies while maintaining high performance, we introduce FAN. Our method employs a behavior regularization technique that utilizes only a single flow policy iteration and requires only a single Gaussian noise sample for distributional critics. Our theoretical analysis of convergence and performance bounds demonstrates that these simplifications not only improve efficiency but also lead to superior task performance. Experiments on robotic manipulation and locomotion tasks demonstrate that FAN achieves state-of-the-art performance while significantly reducing both training and inference runtimes. We release our code at https://github.com/brianlsy98/FAN.
comment: ICML 2026
HandelBot: Real-World Piano Playing via Fast Adaptation of Dexterous Robot Policies
Mastering dexterous manipulation with multi-fingered hands has been a grand challenge in robotics for decades. Despite its potential, the difficulty of collecting high-quality data remains a primary bottleneck for high-precision tasks. While reinforcement learning and simulation-to-real-world transfer offer a promising alternative, the transferred policies often fail for tasks demanding millimeter-scale precision, such as bimanual piano playing. In this work, we introduce HandelBot, a framework that combines a simulation policy and rapid adaptation through a two-stage pipeline. Starting from a simulation-trained policy, we first apply a structured refinement stage to correct spatial alignments by adjusting lateral finger joints based on physical rollouts. Next, we use residual reinforcement learning to autonomously learn fine-grained corrective actions. Through extensive hardware experiments across five recognized songs, we demonstrate that HandelBot can successfully perform precise bimanual piano playing. Our system outperforms direct simulation deployment by a factor of 1.8 and requires only 30 minutes of physical interaction data.
comment: Website: https://amberxie88.github.io/handelbot
Edge Case Detection in Automated Driving: Methods, Challenges and Future Directions
Automated vehicles promise to enhance transportation safety and efficiency. However, ensuring their reliability in real-world conditions remains challenging, particularly due to rare and unexpected situations known as edge cases. While numerous approaches exist for detecting edge cases, a comprehensive survey reviewing these techniques is lacking. This paper bridges this gap by presenting a hierarchical review and systematic classification of edge case detection and assessment methodologies. Our classification is structured on two levels: first, by AV modules, including perception and trajectory-related (encompassing prediction, planning, and control) sub-systems; and second, by underlying methodologies and theories guiding these techniques. Furthermore, we introduce "knowledge-driven" approaches, which complement data-driven methods by leveraging expert insights and domain knowledge to identify cases absent in training datasets. We then examine techniques and metrics for evaluating edge case detection methods, including detection performance (e.g., precision, recall, false positive rates), practical deployment (e.g., computational overhead, detection delay), and domain-specific measures (e.g., crash rates, severity analysis). We conclude by highlighting key challenges for edge case detection, including data availability and quality issues, validation and interpretability limitations, the sim2real gap, and computational constraints. The hierarchical classification and review of methods and assessment techniques in this survey enable modular and targeted testing frameworks by guiding the selection of detection methods for specific AV subsystems while considering methodological principles. It also supports practical testing by facilitating scenario generation in simulation and focused subsystem validation in the real world.
comment: Preprint submitted to IEEE Transactions on Intelligent Transportation Systems
MorphIt: Flexible Spherical Approximation of Robot Morphology for Representation-driven Adaptation
What if a robot could rethink its own morphological representation to better meet the demands of diverse tasks? Most robotic systems today treat their physical form as a fixed constraint rather than an adaptive resource, forcing the same rigid geometric representation to serve applications with vastly different computational and precision requirements. We introduce MorphIt, a novel spherical approximation framework that treats morphological representation as a tunable resource. MorphIt enables task-driven morphological adaptation through gradient-based optimization with tunable parameters that provide explicit control over the accuracy-efficiency tradeoff. Unlike existing approaches that rely on either labor-intensive manual specification or inflexible computational methods optimized for visualization rather than robotics, MorphIt generates spherical approximations up to 100x faster while maintaining superior geometric fidelity. Quantitative evaluations demonstrate that MorphIt outperforms baseline approaches (Variational Sphere Set Approximation and Adaptive Medial-Axis Approximation), achieving better mesh approximation with fewer spheres. Through seamless integration with existing robotics infrastructure, MorphIt enables enhanced capabilities in collision detection accuracy, contact-rich interaction simulation, and navigation through confined spaces. By dynamically adapting geometric representations to task requirements, robots can now exploit their physical embodiment as an active resource rather than an inflexible parameter, opening new frontiers for manipulation in environments where physical form must continuously balance precision with computational tractability.
Using large language models for embodied planning introduces systematic safety risks
Large language models are increasingly used as planners for robotic systems, yet how safely they plan remains an open question. To evaluate safe planning systematically, we introduce DESPITE, a benchmark of 12,279 tasks spanning physical and normative dangers with fully deterministic validation. Across 23 models, even near-perfect planning ability does not ensure safety: the best-planning model fails to produce a valid plan on only 0.4% of tasks but produces dangerous plans on 28.3%. Among 18 open-source models from 3B to 671B parameters, planning ability improves substantially with scale (0.4-99.3%) while safety awareness remains relatively flat (38-57%). We identify a multiplicative relationship between these two capacities, showing that larger models complete more tasks safely primarily through improved planning, not through better danger avoidance. Three proprietary reasoning models reach notably higher safety awareness (71-81%), while non-reasoning proprietary models and open-source reasoning models remain below 57%. As planning ability approaches saturation for frontier models, improving safety awareness becomes a central challenge for deploying language-model planners in robotic systems.
comment: Project page: https://despite-safety.github.io/
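The multiplicative relationship described above can be illustrated with a short arithmetic sketch. It assumes that the rate of safely completed tasks is roughly the product of planning ability (fraction of tasks with a valid plan) and safety awareness (fraction of those plans that avoid the danger). This decomposition, the function name, and the two example models are illustrative assumptions, not the benchmark's exact definitions or reported results.

```python
def safe_completion_rate(planning_ability: float, safety_awareness: float) -> float:
    """Assumed multiplicative model: P(valid and safe) = P(valid) * P(safe | valid)."""
    if not (0.0 <= planning_ability <= 1.0 and 0.0 <= safety_awareness <= 1.0):
        raise ValueError("rates must lie in [0, 1]")
    return planning_ability * safety_awareness


# Hypothetical models: planning ability grows strongly with scale,
# while safety awareness stays almost flat.
small = safe_completion_rate(0.20, 0.45)
large = safe_completion_rate(0.95, 0.50)

print(f"small model: {small:.3f}")
print(f"large model: {large:.3f}")
```

Under this assumed model, nearly all of the gain from the small to the large model comes from the planning factor, which is the pattern the abstract reports.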
False Feasibility in Variable Impedance MPC for Legged Locomotion
Variable impedance model predictive control (MPC) formulations that treat joint stiffness as an instantaneous decision variable operate on a feasible set strictly larger than the physically realizable set under first-order actuator dynamics. We identify this as a formulation error rather than a modeling approximation, formalize the distinction between the parameter-based feasible set F_param and the realizable set F_real, and characterize the regime of mismatch via the dimensionless parameter alpha = omega_s T (actuator bandwidth times task timescale). For the 1D hopping monoped, we prove that below an analytical threshold alpha_crit derived in closed form from task physics, no admissible stiffness command realizes the parameter-based prediction. Numerical validation in 1D shows monotonic deviation growth as alpha decreases, with the predicted scaling holding across ten parameter combinations (log-log R^2 = 0.99). Mechanism transfer to planar spring-loaded inverted pendulum dynamics confirms center-of-mass and stance-timing deviation as the primary consequence, with regime-dependent friction effects as a tertiary observable. A second threshold alpha_infeas < alpha_crit establishes a floor below which restricting the admissible stiffness range cannot repair realizability, closing the conservative-tuning objection on structural grounds. Augmenting the prediction state with stiffness closes the mismatch by construction.
comment: Paper withdrawn to make some revisions in the discussion and experiments sections
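The gap between a commanded and a realized stiffness under first-order actuator dynamics can be sketched for the simplest case, a step command held for the task timescale T. The model k' = omega_s (k_cmd - k) and its closed-form step response are standard. The step-command scenario, the function names, and the numeric values are assumptions chosen for illustration, and the sketch does not reproduce the paper's alpha_crit derivation.

```python
import math


def realized_stiffness(k0: float, k_cmd: float, omega_s: float, t: float) -> float:
    """Step response of k' = omega_s * (k_cmd - k), starting from k(0) = k0."""
    return k_cmd + (k0 - k_cmd) * math.exp(-omega_s * t)


def realized_fraction(alpha: float) -> float:
    """Fraction of a commanded stiffness step reached after time T, with alpha = omega_s * T."""
    return 1.0 - math.exp(-alpha)


# A formulation that treats stiffness as instantaneous assumes this fraction is 1.
for alpha in (0.1, 1.0, 10.0):
    print(f"alpha = {alpha:5.1f} -> realized fraction = {realized_fraction(alpha):.3f}")
```

For small alpha the actuator reaches only a small part of the commanded change within the task timescale, which is the regime where an instantaneous-stiffness prediction departs most from what the hardware can realize.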
Attention-Based Neural-Augmented Kalman Filter for Legged Robot State Estimation
In this letter, we propose an Attention-Based Neural-Augmented Kalman Filter (AttenNKF) for state estimation in legged robots. Foot slip is a major source of estimation error: when slip occurs, kinematic measurements violate the no-slip assumption and inject bias during the update step. Our objective is to estimate this slip-induced error and compensate for it. To this end, we augment an Invariant Extended Kalman Filter (InEKF) with a neural compensator that uses an attention mechanism to infer error conditioned on foot-slip severity and then applies this estimate as a post-update compensation to the InEKF state (i.e., after the filter update). The compensator is trained in a latent space, which aims to reduce sensitivity to raw input scales and encourages structured slip-conditioned compensations, while preserving the InEKF recursion. Experiments demonstrate improved performance compared to existing legged-robot state estimators, particularly under slip-prone conditions.
comment: 8 pages, 6 figures, Published in IEEE Robotics and Automation Letters (RA-L)
DynFlowDrive: Flow-Based Dynamic World Modeling for Autonomous Driving
Recently, world models have been incorporated into autonomous driving systems to improve planning reliability. Existing approaches typically predict future states through appearance generation or deterministic regression, which limits their ability to capture trajectory-conditioned scene evolution and leads to unreliable action planning. To address this, we propose DynFlowDrive, a latent world model that leverages flow-based dynamics to model the transition of world states under different driving actions. By adopting the rectified flow formulation, the model learns a velocity field that describes how the scene state changes under different driving actions, enabling progressive prediction of future latent states. Building upon this, we further introduce a stability-aware multi-mode trajectory selection strategy that evaluates candidate trajectories according to the stability of the induced scene transitions. Extensive experiments on the nuScenes and NavSim benchmarks demonstrate consistent improvements across diverse driving frameworks without introducing additional inference overhead. Source code will be available at https://github.com/xiaolul2/DynFlowDrive.
comment: 18 pages, 6 figs
Semantic-Contact Fields for Category-Level Generalizable Tactile Tool Manipulation
Generalizing tool manipulation requires both semantic planning and precise physical control. Modern generalist robot policies, such as Vision-Language-Action (VLA) models, often lack the physical grounding required for contact-rich tool manipulation. Conversely, existing contact-aware policies that leverage tactile or haptic sensing are typically instance-specific and fail to generalize across diverse tool geometries. Bridging this gap requires learning representations that are both semantically transferable and physically grounded, yet a fundamental barrier remains: diverse real-world tactile data are prohibitive to collect at scale, while direct zero-shot sim-to-real transfer is challenging due to the complex nonlinear deformation of soft tactile sensors. To address this, we propose Semantic-Contact Fields (SCFields), a unified 3D representation that fuses visual semantics with dense extrinsic contact estimates, including contact probability and force. SCFields is learned through a two-stage Sim-to-Real Contact Learning Pipeline: we first pre-train on large-scale simulation to learn geometry-aware contact priors, then fine-tune on a small set of real data pseudo-labeled via geometric heuristics and force optimization to align real tactile signals. The resulting force-aware representation serves as the dense observation input to a diffusion policy, enabling physical generalization to unseen tool instances. Experiments on scraping, crayon drawing, and peeling demonstrate robust category-level generalization, significantly outperforming vision-only and raw-tactile baselines. Project page: https://kevinskwk.github.io/SCFields/.
comment: Accepted to RSS 2026
CycleRL: Sim-to-Real Deep Reinforcement Learning for Robust Autonomous Bicycle Control
Autonomous bicycles offer a promising agile solution for urban mobility and last-mile logistics. However, conventional control strategies often struggle with underactuated nonlinear dynamics, suffering from sensitivity to model mismatches and limited adaptability to real-world uncertainties. To address this, we develop CycleRL, a comprehensive sim-to-real framework for robust autonomous bicycle control. Our approach establishes a direct perception-to-action mapping within the high-fidelity NVIDIA Isaac Sim environment, leveraging Proximal Policy Optimization (PPO) to optimize the control policy. The framework features a composite reward function tailored for concurrent balance maintenance, velocity tracking, and steering control. Crucially, systematic domain randomization is employed to reduce the reliance on precise system modeling, bridge the simulation-to-reality gap and facilitate direct transfer. In simulation, CycleRL achieves promising performance, including a 99.90% balance success rate, a heading tracking error of 1.15°, and a velocity tracking error of 0.18 m/s. These quantitative results, coupled with successful hardware deployment, validate DRL as an effective paradigm for autonomous bicycle control, offering superior adaptability over traditional methods. Video demonstrations are available at https://anony6f05.github.io/CycleRL/.
comment: 8 pages, 7 figures, 8 tables
Application Research of a Deep Learning Model Integrating CycleGAN and YOLO in PCB Infrared Defect Detection
This paper addresses the critical bottleneck of infrared (IR) data scarcity in Printed Circuit Board (PCB) defect detection by proposing a cross-modal data augmentation framework integrating CycleGAN and YOLOv8. Unlike conventional methods relying on paired supervision, we leverage CycleGAN to perform unpaired image-to-image translation, mapping abundant visible-light PCB images into the infrared domain. This generative process synthesizes high-fidelity pseudo-IR samples that preserve the structural semantics of defects while accurately simulating thermal distribution patterns. Subsequently, we construct a heterogeneous training strategy that fuses generated pseudo-IR data with limited real IR samples to train a lightweight YOLOv8 detector. Experimental results demonstrate that this method effectively enhances feature learning under low-data conditions. The augmented detector significantly outperforms models trained on limited real data alone and approaches the performance benchmarks of fully supervised training, proving the efficacy of pseudo-IR synthesis as a robust augmentation strategy for industrial inspection.
comment: Authors have conflict of interest
A Unified Multi-Dynamics Framework for Perception-Oriented Modeling in Tendon-Driven Continuum Robots
Tendon-driven continuum robots offer intrinsically safe and contact-rich interactions owing to their kinematic redundancy and structural compliance. However, their perception often depends on external sensors, which increase hardware complexity and limit scalability. This work introduces a unified multi-dynamics modeling framework for tendon-driven continuum robotic systems, exemplified by a spiral-inspired robot named Spirob. The framework integrates motor electrical dynamics, motor-winch dynamics, and continuum robot dynamics into a coherent system model. Within this framework, motor signals such as current and angular displacement are modeled to expose the electromechanical signatures of external interactions, enabling perception grounded in intrinsic dynamics. The model captures and validates key physical behaviors of the real system, including actuation hysteresis and self-contact at motion limits. Building on this foundation, the framework is applied to environmental interaction: first for passive contact detection, verified experimentally against simulation data; then for active contact sensing, where control and perception strategies from simulation are successfully applied to the real robot; and finally for object size estimation, where a policy learned in simulation is directly deployed on hardware. The results demonstrate that the proposed framework provides a physically grounded way to interpret interaction signatures from intrinsic motor signals in tendon-driven continuum robots.
PuppetAI: A Customizable Platform for Designing Tactile-Rich Affective Robot Interaction
We introduce PuppetAI, a modular soft robot interaction platform. This platform offers a scalable cable-driven actuation system and a customizable, puppet-inspired robot gesture framework, supporting a wide range of robot designs for gesture-based interaction. The platform comprises a four-layer decoupled software architecture that includes perceptual processing, affective modeling, motion scheduling, and low-level actuation. We also implemented an affective expression loop that connects human input to the robot platform by producing real-time emotional gestural responses to human vocal input. For our own designs, we have worked with nuanced gestures enacted by "soft robots" with enhanced dexterity and "pleasant-to-touch" plush exteriors. By reducing operational complexity and production costs while enhancing customizability, our work creates an adaptable and accessible foundation for future tactile-based expressive robot research. Our goal is to provide a platform that allows researchers to independently construct or refine highly specific gestures and movements performed by social robots.
PhysMem: Scaling Test-Time Memory for Embodied Physical Reasoning
Reliable object manipulation requires understanding physical properties that vary across objects and environments. Vision-language model (VLM) planners can reason about friction and stability in general terms; however, they often cannot predict how a specific ball will roll on a particular surface or which stone will provide a stable foundation without direct experience. We present PhysMem, a memory framework that enables VLM robot planners to learn physical principles from interaction at test time, without updating model parameters. The system records experiences, generates candidate hypotheses, and verifies them through targeted interaction before promoting validated knowledge to guide future decisions. A central design choice is verification before application: the system tests hypotheses against new observations rather than applying retrieved experience directly, reducing rigid reliance on prior experience when physical conditions change. We evaluate PhysMem on three real-world manipulation tasks and simulation benchmarks across four VLM backbones. On a controlled brick insertion task, principled abstraction achieves 76% success compared to 23% for direct experience retrieval, and real-world experiments show consistent improvement over 30-minute deployment sessions.
SBAMP: Sampling Based Adaptive Motion Planning
Autonomous robots operating in dynamic environments must balance global path optimality with real-time responsiveness to disturbances. This requires addressing a fundamental trade-off between computationally expensive global planning and fast local adaptation. Sampling-based planners such as RRT* produce near-optimal paths but struggle under perturbations, while dynamical systems approaches like SEDS enable smooth reactive behavior but rely on offline data-driven optimization. We introduce Sampling-Based Adaptive Motion Planning (SBAMP), a hybrid framework that combines RRT*-based global planning with an online, Lyapunov-stable SEDS-inspired controller that requires no pre-trained data. By integrating lightweight constrained optimization into the control loop, SBAMP enables stable, real-time adaptation while preserving global path structure. Experiments in simulation and on RoboRacer hardware demonstrate robust recovery from disturbances, reliable obstacle handling, and consistent performance under dynamic conditions.
Large Language Models for Multi-Robot Systems: A Survey
The rapid advancement of Large Language Models (LLMs) has opened new possibilities in Multi-Robot Systems (MRS), enabling enhanced communication, task allocation and planning, and human-robot interaction. Unlike traditional single-robot and multi-agent systems, MRS poses unique challenges, including coordination, scalability, and real-world adaptability. This survey provides the first dedicated review of LLM integration into MRS. It systematically categorizes their applications across high-level task allocation, mid-level motion planning, low-level action generation, and human intervention. We highlight key applications in diverse domains, such as household robotics, construction, formation control, target tracking, and robot games, showcasing the versatility and transformative potential of LLMs in MRS. Furthermore, we examine the challenges that limit adapting LLMs to MRS, including mathematical reasoning limitations, hallucination, latency issues, and the need for robust benchmarking systems. Finally, we outline opportunities for future research, emphasizing advancements in fine-tuning, reasoning techniques, and task-specific models. This survey aims to guide researchers in the intelligence and real-world deployment of MRS powered by LLMs. Given the rapidly evolving nature of research in the field, we continuously update the paper list in the open-source GitHub repository.
Multiagent Systems
Coopetition-Gym v1: A Formally Grounded Platform for Mixed-Motive Multi-Agent Reinforcement Learning under Strategic Coopetition
We present Coopetition-Gym v1, a benchmark platform for mixed-motive multi-agent reinforcement learning under strategic coopetition. The platform comprises twenty environments organized into four mechanism classes that correspond to four foundational technical reports: interdependence and complementarity (arXiv:2510.18802), trust and reputation dynamics (arXiv:2510.24909), collective action and loyalty (arXiv:2601.16237), and sequential interaction and reciprocity (arXiv:2604.01240). Each environment carries a closed-form payoff structure and a calibrated interdependence matrix derived from the corresponding report. Every environment exposes a parameterized reward layer configurable across three structurally distinct modes (private, integrated, cooperative). This separation of payoff from reward enables reward-type ablation, the platform's principal methodological apparatus. Four of the twenty environments are calibrated against historically documented coopetitive relationships and reproduce their outcomes at 98.3, 81.7, 86.7, and 87.3 percent on the validation rubric (Samsung-Sony LCD, Renault-Nissan Alliance, Apache HTTP Server, Apple iOS App Store). The platform exposes Gymnasium, PettingZoo Parallel, and PettingZoo AEC interfaces and ships 126 reference algorithms: 16 learning algorithms, 7 game-theoretic oracles, 2 heuristic baselines, and 101 constant-action policies. A reference experimental study trained the 16 learning algorithms on every environment under every reward configuration with seven random seeds, producing a 25,708-run training corpus and a 1,116-run behavioral audit corpus, both released under CC-BY-4.0 with Croissant 1.0 metadata. Coopetition-Gym v1 is the first platform to combine continuous-action mixed-motive environments, parameterized reward mutuality, calibrated interdependence coefficients, game-theoretic oracle baselines, and validated case studies.
comment: 82 pages, 14 figures, 9 tables, 51 references. AI-track technical report companion to the four-paper foundational series; should be read with arXiv:2510.18802, arXiv:2510.24909, arXiv:2601.16237, and arXiv:2604.01240. Reproducibility package and source code: https://github.com/vikpant/strategic-coopetition. Datasets released under CC-BY-4.0 at https://huggingface.co/vikpant
Moira: Language-driven Hierarchical Reinforcement Learning for Pair Trading
Many sequential decision-making problems exhibit hierarchical structure, where high-level semantic choices constrain downstream actions and feedback is delayed and ambiguous. Learning in such settings is challenging due to credit assignment: performance degradation may arise from flawed abstractions, suboptimal execution, or their interaction. We study this challenge through pair trading, a domain that naturally combines long-horizon semantic reasoning for asset pair selection with short-horizon execution under partial observability. We formulate pair trading as a hierarchical reinforcement learning problem and propose a language-driven optimization framework in which both high-level and low-level policies are parameterized by large language models (LLMs) and optimized exclusively through prompt updates. Our approach leverages pretrained LLMs as hierarchical policies and uses trajectory- and episode-level textual feedback to adapt abstractions and execution without gradient-based fine-tuning. By explicitly separating abstraction selection from execution, the framework reduces non-stationarity across hierarchical levels and enables targeted adaptation under delayed feedback. Experiments on real-world market data show consistent improvements over traditional and LLM-based baselines, demonstrating the effectiveness of language-driven hierarchical reinforcement learning.
A Language for Describing Agentic LLM Contexts
Large language models are increasingly used within larger systems ("LLM agents"). These make a sequence of LLM calls, each call providing the LLM with a combination of instructions, observations, and interaction history. The design of the encoded information and its structure play a central role in the quality of the resulting system, leading to efforts spent on context engineering. It is therefore critical to communicate the composition of the LLM context in a system, and how it evolves over time. Yet, no standard exists for doing so: context construction is typically conveyed through informal prose, ad hoc diagrams, or direct inspection of code, none of which precisely capture how a prompt evolves across interaction steps or how two context representation strategies differ. To remedy this, we introduce the Agentic Context Description Language (ACDL), a language for specifying the structure and dynamics of LLM input contexts in a precise, readable, and standard manner, along with visualizations. ACDL provides constructs for specifying context aspects such as role message sequences, dynamic content, time-indexed references, and conditional or iterative structure, capturing the full architecture of a prompt independently of any particular implementation. ACDL diagrams can be hand drawn on a whiteboard, or written in formal language which can then be rendered. We describe the language, demonstrate it by documenting several existing systems and their variants, and encourage the community to adopt it for describing LLM systems context, both in day-to-day communication and in papers. Tooling, examples and documentation are available at www.acdlang.org.
comment: 18 pages, 12 figures. Accepted at CAIS '26. Project page: www.acdlang.org
Quality-Aware Exploration Budget Allocation for Cooperative Multi-Agent Reinforcement Learning
Cooperative multi-agent reinforcement learning (MARL) requires agents to discover joint strategies in a combinatorially large state-action space, yet effective coordination configurations are exceedingly rare. Intrinsic motivation, which augments task rewards with novelty bonuses, is a popular approach for driving exploration, but its effectiveness hinges on the exploration intensity $β$, where too large a value overwhelms the task signal and causes coordination collapse, while too small a value prevents discovery of rare strategies. We address two complementary challenges: adapting $β$ globally over training, and allocating the exploration budget across agents whose intrinsic reward signals vary in reliability. Our framework combines a return-conditioned sigmoid schedule (RCB) for global intensity control with a per-agent Reward Signal Quality (RSQ) metric that concentrates the exploration budget on agents with reliable signals. The core insight is that agents receiving noisy intrinsic rewards should explore less aggressively, and this allocation can be determined automatically from signal-to-noise statistics. Successor Distance (SD), a quasimetric intrinsic reward, naturally produces distinguishable per-agent signal quality, completing the framework with convergence and ordering preservation guarantees. On seven cooperative benchmarks (MPE, SMAX, MABrax), our method achieves top-tier returns across all environments.
comment: Submitted to Neurocomputing
MAGIC: Multi-Step Advantage-Gated Causal Influence for Multi-agent Reinforcement Learning
A key challenge in multi-agent reinforcement learning (MARL) lies in designing learning signals that effectively promote coordination among agents. Designing such signals necessitates the ability to quantify the true, long-term causal influence between agents. To address this, we introduce Multi-step Advantage-Gated Interventional Causal MARL (MAGIC), a framework that extracts multi-step causal influences between agents and selectively converts them into intrinsic rewards. MAGIC uses causal intervention with conditional mutual information to quantify long-horizon agent influence, and introduces an advantage-based gating mechanism to ensure exploration is directed toward beneficial, goal-aligned behaviors. Experiments across multiple standard MARL benchmarks and task families, including MPE and SMAC/SMACv2, demonstrate that MAGIC outperforms state-of-the-art methods by a significant margin, achieving an improvement of at least 10.1% in the main evaluation metric.
Koopman Representations for Early Outbreak Warning and Minimal Counterfactual Intervention in Multi-Agent Epidemic Simulations
This paper presents a Koopman-based framework for early outbreak detection and intervention selection in a multi-agent epidemic simulation. Agents exhibit mobility patterns, heterogeneous susceptibility, immunity-dependent viral load progression, and local transmission through co-location. The goal of the simulation is to study near-critical epidemic regimes in which small changes in exposure or timing can alter the final outcome. Aggregate daily observables from early trajectory windows are encoded into a low-dimensional Koopman latent space whose approximately linear evolution supports short-horizon forecasting and outbreak risk estimation. These representations are combined with a random forest classifier trained to predict whether the final attack rate exceeds a major outbreak threshold. Experiments near the system tipping points show strong early warning performance, with Koopman-derived features contributing to class separation. Counterfactual analysis further shows that minimal interventions, such as keeping a single selected agent at home for one day, can reduce attack rates and, often, shift the trajectory below the outbreak threshold.
comment: 37 pages, 12 figures
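The idea of an approximately linear latent evolution can be sketched with a plain least-squares fit of a one-step operator, in the style of dynamic mode decomposition. The sketch assumes the latent coordinates are already available. The encoder that produces them, the function names, and the toy operator below are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def fit_linear_operator(Z: np.ndarray) -> np.ndarray:
    """Least-squares K such that Z[t + 1] ~= K @ Z[t]; Z has shape (T, d)."""
    X, Y = Z[:-1], Z[1:]
    # Solve X @ K.T ~= Y in the least-squares sense.
    K_T, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return K_T.T


def forecast(K: np.ndarray, z0: np.ndarray, steps: int) -> np.ndarray:
    """Roll the linear model forward; returns an array of shape (steps + 1, d)."""
    out = [np.asarray(z0, dtype=float)]
    for _ in range(steps):
        out.append(K @ out[-1])
    return np.stack(out)


# Toy latent trajectory generated by a known, hypothetical operator.
K_true = np.array([[0.9, 0.1],
                   [0.0, 0.8]])
Z = forecast(K_true, np.array([1.0, 1.0]), steps=9)

K_hat = fit_linear_operator(Z)
prediction = forecast(K_hat, Z[0], steps=3)
```

In a pipeline like the one described, short-horizon forecasts of this kind, or features derived from the fitted operator, would be passed to a downstream classifier such as a random forest.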
Talk is Cheap, Communication is Hard: Dynamic Grounding Failures and Repair in Multi-Agent Negotiation
Grounding is the collaborative process of establishing mutual belief sufficient for the current communicative purpose. While static grounding maps language to a shared, externally observable context, dynamic grounding is a joint activity where meaning is negotiated through interaction. Current multi-agent Large Language Model (LLM) benchmarks focus on static, one-shot tasks, overlooking the ability to repair grounding breakdowns across turns. We introduce an iterated, multi-turn negotiation game in which two agents allocate shared resources toward private projects with verifiable jointly optimal outcomes. While individual agents can identify Pareto-optimal allocations in isolation, agent dyads consistently fail to reach them across open- and closed-source models. Our investigation reveals four failure modes: (1) coordination degrades when shared interaction history is absent; (2) yet accumulated context can itself become a liability through stubborn anchoring, where initial proposals are treated as axiomatic rather than negotiable; (3) a reliance on perfunctory fairness (equal resource splits) over reward-maximizing coordination; and (4) failures in referential binding, where agents lose track of commitments across turns. These results highlight dynamic grounding as a critical and understudied axis of multi-agent coordination. Our framework decomposes the coordination gap into measurable components: the oracle baseline establishes that the gap is not attributable to individual reasoning limitations; the no-talk baseline establishes that communication is necessary; and a full-transparency intervention establishes that information exchange alone is insufficient: the bottleneck lies in the interactive processes of joint plan formation, commitment, and execution that constitute dynamic grounding.
Architectural Obsolescence of Unhardened Agentic-AI Runtimes
An agentic-AI runtime issues tool calls, sends messages, and actuates devices on behalf of an LLM. Catching the four ways an action can diverge from its audit record -- F1 gate-bypass, F2 audit-forgery, F3 silent host failure, and F4 wrong-target -- is a load-bearing safety property of any such runtime. We show that upstream OpenClaw, the most engineered single-user agentic-AI gateway in public release, catches none of them: recall is 0.000 on every cell of every confusion matrix, on a 1600-sample template baseline through OpenClaw's actual production command-line interface (CLI) and on a ten-LLM cross-model generalisation run. Detecting F1--F4 requires seven specific runtime structures absent from OpenClaw's source tree: a biconditional checker, a hash-chained audit log, an extension admission gate, a two-layer egress guard, a Bell-LaPadula classification policy, a module-signing trust root, and a bootstrap seal. enclawed-oss -- an MIT-licensed drop-in fork that ships all seven -- reaches $P = R = F_1 =$ accuracy $= 1.000$ on the same input. The gap is structural, not parametric: a six-line append-only widening of enclawed-oss's data-loss-prevention (DLP) regex catalog raises per-channel F3 detection by 14.6\% net at unchanged precision; the same edit on OpenClaw has nowhere to land. The harness deliberately exercises real Discord and Telegram channels -- plugin categories the first enclawed release deleted as unsafe -- to show F1--F4 detection extends to those previously-unsafe extensions. With architectural superiority for security and feature parity for extensions, we argue that unhardened agentic-AI runtimes are architecturally obsolete: a strictly better alternative exists, is adoptable today, and the gap requires re-architecture rather than configuration. We invite reviewers to apply the harness to any candidate runtime.
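As a rough illustration of one of the seven structures named above, the hash-chained audit log, the sketch below shows how chaining each entry to the hash of its predecessor makes after-the-fact edits detectable. It is a minimal sketch only, not enclawed-oss's implementation: the names `GENESIS`, `append`, and `verify`, the record fields, and the choice of SHA-256 over JSON are all assumptions made for this example.

```python
import hashlib
import json

# Hypothetical starting value for the chain (assumption, not from the source).
GENESIS = "0" * 64


def _digest(prev_hash: str, record: dict) -> str:
    """Hash the previous link together with a canonical encoding of the record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append(log: list, record: dict) -> None:
    """Append a record, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev_hash, "hash": _digest(prev_hash, record)})


def verify(log: list) -> bool:
    """Recompute every link; any edited, reordered, or dropped entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

In this sketch, rewriting an earlier record without recomputing every later hash makes `verify` return `False`, which is the property an audit-forgery check would rely on.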
NaviGNN: Multi-Agent Reinforcement Learning and Graph Neural Network for Sustainable Mobility in Futuristic Smart Cities
This paper investigates the feasibility of human mobility in extreme urban morphologies characterized by high-density vertical structures and linear city layouts. To assess whether agents can navigate efficiently within such unprecedented topologies, we develop a hybrid simulation framework integrating agent-based modeling, reinforcement learning (RL), supervised learning, and graph neural networks (GNNs). The simulation captures multi-modal transportation behaviors across multiple vertical levels and varying density scenarios, using both synthetic data and real-world traces from high-density cities. Experimental results show that the fully integrated AI architecture enables agents to achieve an average commute time of 7.8-8.4 minutes, a satisfaction rate exceeding 89\%, and a reachability index above 91\%, even during peak congestion periods. Ablation studies indicate that removing intelligent modules such as RL or GNNs significantly degrades performance, with commute times increasing by up to 85\% and reachability dropping below 70\%. Baseline comparisons against Dijkstra, A*, DQN, and standard GCN further confirm the superiority of the proposed model across all mobility and sustainability metrics. Environmental modeling demonstrates low energy consumption and minimal CO2 emissions when electric transportation modes are prioritized. These findings suggest that efficient and sustainable mobility in extreme urban environments is achievable, provided that adaptive AI systems, intelligent infrastructure, and real-time feedback mechanisms are effectively implemented.
Agents Trusting Agents? Restoring Lost Capabilities with Inclusive Healthcare
Agent-based simulations have an untapped potential to inform social policies on urgent human development challenges in a non-invasive way, before these are implemented in real-world populations. This paper responds to the request from non-profit and governmental organizations to evaluate policies under discussion to improve equity in health care services for people experiencing homelessness (PEH) in the city of Barcelona. With this goal, we integrate the conceptual framework of the capability approach (CA), which is explicitly designed to promote and assess human well-being, to model and evaluate the behaviour of agents who represent PEH and social workers. We define a reinforcement learning environment where agents aim to restore their central human capabilities, under existing environmental and legal constraints. We use Bayesian inverse reinforcement learning (IRL) to calibrate profile-dependent behavioural parameters in PEH agents, modeling the degree of trust and engagement with social workers, which is reportedly a key element for the success of the policies in scope. Our results open a path to mitigate health inequity by building relationships of trust between social service workers and PEH.
Systems and Control (EESS)
Trajectory Optimization of Morphing Aerial Vehicles Based on Mid-Fidelity Aeroservoelastic Models
Morphing aerial vehicles offer enhanced maneuverability and fuel efficiency compared to fixed-wing configurations. However, the trade-off between performance gains and control cost in dynamic, unsteady maneuvers remains under-explored. This paper addresses this by integrating a trajectory optimization framework with a mid-fidelity aeroservoelastic model, coupling nonlinear multi-body structural dynamics with an unsteady vortex lattice method. A physics-based control cost model captures the energy required to overcome instantaneous aerodynamic hinge moments. Applied to an aircraft with flexible, high-aspect-ratio wings and morphing winglets, the framework evaluates trim, maneuver performance, and lateral obstacle avoidance. Results show morphing wings significantly expand the flight envelope by decoupling lift and pitch requirements. In dynamic maneuvers, morphing yields distinct trade-offs: a pull-up maneuver increased altitude gain by 28.95% at a higher control cost, while a banked turn improved lateral displacement by 8.62% while reducing control cost by 13.40%. Notably, in obstacle avoidance, morphing reduced total control cost by 65.65%. This efficiency stems from exploiting aero-mechanical coupling via trajectory optimization to identify coordinated control strategies that offload aerodynamic loads. These findings underscore wing morphing's potential for achieving extreme maneuvers with superior energy efficiency.
comment: 30 pages, 13 figures, submitted to AIAA Journal of Aircraft
Observability Conditions and Filter Design for Visual Pose Estimation via Dual Quaternions
This paper presents a dual quaternion framework for 6-DOF visual target tracking that addresses key limitations of perspective-n-point (P$n$P) solvers: sensitivity to noise and outliers, and inability to propagate estimates through measurement dropouts. A nonlinear observability analysis is performed using a Lie algebraic approach, deriving sufficient conditions for local observability under two sensing modalities: relative position vector and unit vector measurements. For the unit vector case, the classical collinear feature point degeneracy of the perspective-three-point problem is recovered through rank analysis of the observability codistribution matrix, providing a control-theoretic interpretation of a previously geometric result. A dual quaternion Lie group unscented Kalman filter is then developed, directly modeling relative dynamics without assumptions about cooperative measurements or slowly-varying motion. Simulations demonstrate improved pose estimation accuracy and robustness to occlusions compared to an off-the-shelf P$n$P solver. Results are broadly applicable to visual-inertial navigation, simultaneous localization and mapping, and P$n$P solver development.
comment: 3 tables, 5 figures
Neural Backward Reach-Avoid Tubes with MPC Supervision for High-Dimensional Systems: An Application to Safe Spacecraft Docking
Autonomous spacecraft docking requires control policies that simultaneously ensure collision avoidance and target reachability under coupled, high-dimensional translational-rotational dynamics. Hamilton-Jacobi (HJ) reachability provides formal reach-avoid guarantees, but classical solvers are limited to low-dimensional systems. Learning-based approaches have begun to scale HJ analysis, yet they struggle in reach-avoid settings, especially where goal and failure sets are tightly coupled, as in docking. We propose a learning-based Backward Reach-Avoid Tube (BRAT) framework that addresses this challenge by tightly integrating HJ structure with MPC-based supervision. In the offline phase, we train a neural approximation of the HJ value function using PDE-based losses augmented with curriculum-driven MPC supervision, which provides informative value targets and stabilizes training in regions where purely PDE-based methods fail. In the online phase, the learned value function is deployed through two real-time controllers: (i) a value gradient-driven controller, and (ii) a value-function-augmented terminal MPC that explicitly enforces reachability at the horizon. We evaluate the proposed method on a 6D planar docking problem against grid-based ground truth and then scale to the full 13D system. Across both settings, our approach outperforms existing methods in success rate and computational efficiency.
comment: 8 pages, 7 figures
Stability of Control Lyapunov Function Guided Reinforcement Learning
Reinforcement learning (RL) has become the de facto method for achieving locomotion on humanoid robots in practice, yet stability analysis of the corresponding control policies is lacking. Recent work has attempted to merge control theoretic ideas with reinforcement learning through control guided learning. A notable example of this is the use of a control Lyapunov function (CLF) to synthesize the reinforcement learning rewards, a technique known as CLF-RL, which has shown practical success. This paper investigates the stability properties of optimal controllers using CLF-RL with the goal of bridging experimentally observed stability with theoretical guarantees. The RL problem is viewed as an optimal control problem and exponential stability is proven in both continuous and discrete time using both core CLF reward terms and the additional terms used in practice. The theoretical bounds are numerically verified on systems such as the double integrator and cart-pole. Finally, the CLF guided rewards are implemented for a walking humanoid robot to generate stable periodic orbits.
comment: This work has been submitted to the IEEE for possible publication
The Control Plant as A Communication Channel: Implicit Communication for Decentralized LQG Control
We study a decentralized linear quadratic Gaussian control problem, in which a leader and a follower must steer a linear system to a target state. The target state is known only to the leader, and no explicit communication channel exists between the agents. To address the challenge posed by this asymmetric information structure, we propose an integrated communication and control (ICoCo) framework in which the control plant itself serves as a communication channel: the leader encodes the target state into its control input through an additive communication term, and the follower decodes it from the resulting state trajectory. We design an implicit coordination scheme based on joint source-channel coding ideas, and prove that the follower's estimation error decreases monotonically to zero, enabling the two agents to coordinate increasingly well and ultimately steer the system to the target state. We then formulate the design of the communication power as an optimal control problem to minimize the overall control cost. In the fully actuated leader case, we derive necessary optimality conditions; in the under-actuated case, we solve the problem numerically. Numerical results show that the proposed scheme effectively coordinates the two agents and achieves a control cost close to that of the explicit-communication lower bound.
comment: 23 pages, 6 figures
Data-Driven Sub-Optimal LQ Regulator for Linear Input-Delay Systems based on Informativity
This paper proposes a novel informativity-based data-driven synthesis method for a sub-optimal linear quadratic (LQ) regulator for linear input-delay systems from noisy input-state data. Exploiting the augmented state structure of input-delay systems with a known delay length, we derive a linear matrix inequality (LMI) condition for the data-driven synthesis of the augmented state-feedback controller that achieves a prescribed LQ performance level for every plant model consistent with the data. The proposed LMI condition enables efficient controller synthesis via convex optimization. Numerical simulations demonstrate the effectiveness of the proposed method. The trade-off between the achievable LQ performance and the uncertainty in the data is also clarified through a numerical example.
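The augmented state structure mentioned in the abstract can be illustrated with a standard textbook construction. For a discrete-time system $x_{k+1} = A x_k + B u_{k-d}$ with known delay $d \ge 1$, stacking the state with the last $d$ inputs gives a delay-free system in $z_k = [x_k; u_{k-d}; \dots; u_{k-1}]$. The sketch below assumes this discrete-time form and uses a hypothetical helper name, `augment_input_delay`; it does not reproduce the paper's LMI synthesis.

```python
import numpy as np


def augment_input_delay(A, B, d):
    """Build (A_aug, B_aug) with z_{k+1} = A_aug z_k + B_aug u_k,
    where z_k = [x_k; u_{k-d}; ...; u_{k-1}] and x_{k+1} = A x_k + B u_{k-d}."""
    n, m = B.shape
    N = n + d * m
    A_aug = np.zeros((N, N))
    B_aug = np.zeros((N, m))
    A_aug[:n, :n] = A
    A_aug[:n, n:n + m] = B              # the oldest stored input u_{k-d} drives the state
    for i in range(d - 1):              # shift each stored input one slot toward the front
        r = n + i * m
        A_aug[r:r + m, r + m:r + 2 * m] = np.eye(m)
    B_aug[N - m:, :] = np.eye(m)        # the newest input u_k enters the last slot
    return A_aug, B_aug
```

A state-feedback gain designed for `(A_aug, B_aug)` then acts on both the plant state and the buffered inputs, which is why a known delay length matters for this construction.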
nvPAX: Constrained Optimization for Dynamic Power Allocation in Hierarchical and Multi-Tenant Systems
Power oversubscription is increasingly central to datacenter operation as power density grows, making it necessary to dynamically allocate limited power budgets across devices based on real-time demand. Existing approaches typically assume flat power domains, whereas in practice power distribution is hierarchical and allocation decisions must additionally respect tenant-level contractual constraints. We present nvPAX, a constrained-optimization policy that computes feasible power allocations at every control step via a three-phase hybrid QP/LP procedure. Phase I allocates power with minimum deviation from each device's power request, while respecting job priorities. Phase II fairly distributes excess power among active devices. Phase III fairly distributes any remaining power to idle devices. The rationale behind the three phases is to allow power oversubscription while maximizing datacenter utilization. On a trace-driven large-scale simulation using GPU power telemetry from a production datacenter, nvPAX runs with a mean wall-clock time of 264.69 ms per allocation interval and achieves a mean satisfaction ratio of 98.92%, outperforming static equal-share allocation and providing robustness beyond greedy proportional allocation in the presence of non-uniform hierarchical bottlenecks.
A Low-Frequency, Autoresonant Wireless Power Transfer Link for Bidirectional Bionic Interfaces
To provide multimode sensory feedback and motion control, bidirectional bionic interfaces for advanced prosthetic systems require continuous and secure energy delivery to implantable electronics and integration in the sensing WBAN (Wireless Body Area Network) of the patient. However, powering such interfaces is still an open issue. Wireless Power Transfer (WPT) avoids implanted batteries and transcutaneous connections, but its design is constrained by stringent requirements on electromagnetic safety, implant size, voltage compliance, and coexistence with sensitive bio-signal acquisition and stimulation circuitry. This paper presents the design and testing of a low-frequency (127 kHz) inductive WPT link for an implantable bidirectional bionic interface. The system includes an autoresonant driving control to maintain operation at resonance under varying coupling and load conditions of the cyber-physical prosthesis. Starting from the requirements of the bionic interface, the wireless body-area sensing system is designed by selecting the working frequency, drawing the electrical schemes, and checking its safety and regulatory compliance. Preliminary WPT prototypes can provide up to ~140 mA and ~20 V, achieving a maximum power transfer efficiency higher than 40% and satisfying the project requirements up to a 2 cm implantation depth.
comment: This is a peer-reviewed version of an article accepted for presentation at the 8th International Conference on Smart Applications, Communications and Networking (IEEE SmartNets 2026), to be held in Rome, Italy
Zero-Shot, Safe and Time-Efficient UAV Navigation via Potential-Based Reward Shaping, Control Lyapunov and Barrier Functions
Autonomous navigation and obstacle avoidance remain core challenges for modern Unmanned Aerial Vehicles (UAVs). While traditional control methods struggle with the complexity and variability of the environment, reinforcement learning (RL) enables UAVs to learn adaptive behaviors through interaction with the environment. Existing RL research prioritizes mission success at the expense of mission time and UAV safety. This study integrates Potential-Based Reward Shaping (PBRS) with Control Lyapunov Functions (CLF) and Control Barrier Functions (CBF) to simultaneously optimize mission time and ensure formal safety guarantees. An RL model is trained in a simple, generalized environment and then deployed in complex scenarios with a CLF-CBF-QP filter, without further training. Experimental results in simulated environments demonstrate a significant reduction in mission time and outstanding performance in complex environments.
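For readers unfamiliar with potential-based reward shaping, the standard form adds $F(s, s') = \gamma \Phi(s') - \Phi(s)$ to the environment reward. The sketch below illustrates that general form only. The potential used here (negative Euclidean distance to the goal), the function names, and the default discount factor are assumptions for illustration; the abstract does not specify the paper's potential or how it is tied to the CLF.

```python
import math


def potential(pos, goal):
    """Hypothetical potential: higher (less negative) when closer to the goal."""
    return -math.dist(pos, goal)


def shaped_reward(reward, pos, next_pos, goal, gamma=0.99):
    """Environment reward plus the shaping term gamma * Phi(s') - Phi(s)."""
    return reward + gamma * potential(next_pos, goal) - potential(pos, goal)
```

With this choice of potential, steps toward the goal receive a positive shaping bonus and steps away receive a penalty, which is one way a shaping term can encourage shorter mission times.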
Lateral String Stability for Vehicle Platoons: Formulation, Definition, and Analysis
Platooning of connected and automated vehicles provides significant benefits in terms of energy efficiency, traffic throughput, and, most critically, safety. These safety benefits depend on string stability, which dictates how disturbances propagate along a vehicle string. Although longitudinal string stability has been extensively examined, lateral string stability, which governs the propagation of path-tracking errors that can lead to unsafe deviations from the desired path, remains underexplored. Its importance is growing as autonomous vehicles increasingly depend on onboard sensing and map-free navigation, where sensor occlusions and tight formations amplify safety risks. This paper presents a framework for lateral string stability that focuses directly on safety-critical, path-relative tracking errors and enables consistent comparison across vehicles that follow the same planned path. The key element of the framework is an arc-length (Eulerian) viewpoint, a departure from traditional analyses, that clarifies how tracking errors at a given point on the path propagate from one vehicle to the next. Building on this foundation, we propose the definition of L2 lateral string stability along with two control strategies: a feedback-feedforward strategy that relies solely on onboard sensing, and a novel learn-from-predecessor strategy that makes use of vehicle-to-vehicle communication. Both strategies are analyzed for lateral string stability with respect to two error measures: tracking error vector and lateral (cross-track) error. Our results show that onboard sensing alone cannot guarantee attenuation of path-tracking errors, imposing a fundamental safety limitation, while V2V communication enables true error attenuation. The analysis further identifies structural controller requirements, showing that nonzero feedback on specific measurements is essential for guaranteeing stability.
Nonsmooth Hydraulics, Smooth Control: System Theory Framework for Analyzing Water Networks
This paper presents a comprehensive control-theoretic analysis of water distribution network (WDN) hydraulics. Starting from a general nonlinear differential algebraic equation (DAE) model of WDNs with arbitrary topology and network components (valves and pumps), we investigate three main questions. First, we study local well-posedness of the network dynamics and characterize the loss of differentiability introduced by pump and valve switching. Second, we introduce regularization methods that smooth flow and pressure trajectories under changing controls. Third, we establish error bounds for DAE linearization, local stability, and finite-horizon controllability, and quantify how network-induced parametric uncertainty impacts these properties. We demonstrate that the developed smoothed DAE models produce trajectories closely matching EPANET, a widely used WDN simulator, for various benchmark networks. The case studies also show that the WDN DAE exposes energy dissipation through a weighted Laplacian, ranks pipes by operating point sensitivity, and reveals that aggressive demand variation changes stability and controllability margins without eliminating local stability or pump authority. The developed theoretical foundations enable network analysis, mitigation strategies, and system design.
Operator-Theoretic and physics-guided Sequence Modeling of Lithium-Ion Battery Voltage Dynamics
Lithium-ion batteries exhibit nonlinear voltage dynamics across varying operating conditions and aging states, making accurate modeling essential for estimation, control, and health monitoring. This work compares two data-driven frameworks for modeling voltage responses from hybrid pulse power characterization (HPPC) measurements: an operator-theoretic model based on Dynamic Mode Decomposition with control (DMDc), and a physics-guided transformer-based sequence model. In the DMDc framework, delay-embedded snapshots of terminal voltage and current are used to identify system matrices directly from measurement data, yielding an interpretable state-space model for recursive prediction. In parallel, a modified PatchTST architecture is developed in which terminal voltage is decomposed into an analytically computed open-circuit-voltage (OCV) component and a learned dynamic residual, with a future-current fusion pathway tailored to the prescribed HPPC current profile. Experimental results on a 30 Ah lithium-ion cell show that, although both models capture the sharp transient pulse dynamics, DMDc achieves lower prediction error and greater robustness to cell degradation under the present limited data regime, while the transformer captures qualitatively similar dynamics with greater architectural flexibility. These results highlight the advantages of operator-theoretic models in interpretability, computational efficiency, and robustness, while indicating the promise of physics-guided transformer models when larger and more diverse datasets are available.
comment: 15 pages, Submitted to CDC 2026 conference
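The DMDc identification step described in the abstract above can be sketched as a least-squares fit of $x_{k+1} \approx A x_k + B u_k$ from snapshot matrices. This is a minimal sketch of the generic DMDc regression, not the authors' code: the function names `dmdc` and `delay_embed`, the array shapes, and the use of a pseudoinverse without rank truncation are assumptions made for this example.

```python
import numpy as np


def delay_embed(y, delays):
    """Stack `delays` time-shifted copies of y (shape (p, T)) into shape (p * delays, T - delays + 1)."""
    T = y.shape[1]
    return np.vstack([y[:, i:T - delays + 1 + i] for i in range(delays)])


def dmdc(X, X_next, U):
    """Fit X_next ~= A X + B U in the least-squares sense.

    X, X_next: (n, m) snapshot matrices one step apart; U: (q, m) inputs.
    """
    omega = np.vstack([X, U])                # stacked state and input snapshots
    G = X_next @ np.linalg.pinv(omega)       # G = [A B]
    n = X.shape[0]
    return G[:, :n], G[:, n:]
```

In a battery setting, `X` would hold delay-embedded voltage snapshots and `U` the applied current; the recovered `A` and `B` can then be iterated forward for recursive prediction.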
A Graph Theoretic Approach in Combination With Dynamic Mode Decomposition With Control (DMDc) to Analyze Battery Degradation
Accurate monitoring of lithium-ion battery (LIB) degradation is essential, yet it remains challenging due to the complex, nonlinear, and time-varying nature of electrochemical aging processes. Conventional equivalent circuit models (ECMs) provide simplified representations of battery behavior using fixed electrical components, but they cannot capture evolving internal degradation mechanisms and structural changes over time. In this study, a data-driven framework is developed by integrating dynamic mode decomposition with control (DMDc) with graph-theoretic analysis to characterize battery degradation from operational data alone. The mode matrix ($\boldsymbol{\phi}$) obtained from DMDc is transformed into a weighted adjacency matrix, enabling the representation of battery dynamics as an evolving network of interacting states. Graph-based measures, including connectivity and a modularity proxy, are then used to quantify structural changes in the system across degradation stages. The results show a clear transition from a highly connected and coherent network in the healthy state to a progressively weaker and more fragmented structure as degradation advances, accompanied by increasing heterogeneity. This work demonstrates that graph-theoretic representations can effectively capture the evolving dynamics of battery degradation and provide interpretable insights into system-level aging behavior.
comment: 14 pages, 6 figures, submitted to SMASIS2026
Computational foundations of the human world
Human societies continuously transform scattered information into collective judgments and coordinated action, whether through markets discovering prices, governments allocating resources, communities enforcing norms, or science converging on reliable claims. Importantly, the computational difficulty of collective decision-making, particularly the time and communication required to reach solutions, imposes fundamental constraints on social organization. Theoretical computer science offers formal tools for analyzing such problems, for instance by quantifying resource requirements such as time and memory; surprisingly, no domain of social science focuses on the nature of computation in the human world. This perspective argues that we now have the opportunity to deploy these computational frameworks to study human social organization, opening research directions at the intersection of computer science and social science. We highlight core social phenomena that can be framed as computational, including (i) distributed consensus and coordinated action, (ii) societal restructuring with scale, (iii) hierarchical and modular structure, and (iv) externalized memory systems. We identify several concepts from theoretical computer science that may provide insight into these phenomena, especially emphasizing more recently developed approaches beyond the paradigm of Turing Machines and worst-case computational complexity.
comment: 16 pages, 2 figures
Joint Energy Management and Coordinated AIGC Workload Scheduling for Distributed Data Centers: A Diffusion-Aided Reward Shaping Approach
Artificial intelligence-generated content (AIGC) has emerged as a transformative paradigm for automating the creation of diverse and customized content, giving rise to rapidly growing computational workloads in cloud data centers. It is imperative for AIGC service providers (ASPs) to strategically schedule AIGC workloads to reduce data center energy costs while guaranteeing high-quality content generation. However, the distinctive characteristics of AIGC services pose critical challenges, including model heterogeneity across ASPs, implicit service quality evaluation, and complex inference process control. To tackle these challenges, we propose a joint energy management and coordinated AIGC workload scheduling framework, which introduces an explicit mathematical characterization of service quality to promote both job transfer among ASPs and fine-grained inference process configuration. Moreover, various energy resources within data centers are jointly considered to enhance power usage flexibility. Subsequently, a system utility maximization problem is formulated to balance AIGC service revenue with operational penalties and costs. Nevertheless, the strong coupling among job scheduling decisions induces severe reward sparsity, which limits the effectiveness of existing deep reinforcement learning (DRL) algorithms. To address this issue, we develop a diffusion model-aided reward shaping approach to synthesize complementary reward signals through a multi-step denoising process. This approach is seamlessly integrated with DRL to enable efficient learning of scheduling policies under sparse environmental feedback. Experiments based on real-world models and datasets demonstrate that our scheme effectively accommodates electricity price fluctuations and AIGC model heterogeneity, while achieving superior learning convergence and system utility compared with benchmark methods.
Analytic Bridge Diffusions for Controlled Path Generation
Most modern bridge-diffusion methods achieve finite-time transport by specifying an interpolation, Schrödinger-bridge, or stochastic-control objective and then learning the associated score or drift field with a neural network. In contrast, we identify a restricted but sufficiently broad and analytically solvable class in which the score, intermediate marginals, and protocol gradients are available in closed form without inner stochastic simulation loops and without neural networks in the optimization loop. We recast the classical linear--quadratic--Gaussian (LQG) stochastic-control structure as a transport problem of the Path Integral Diffusion (PID) type. In classical LQG control, linear dynamics, Gaussian noise, and quadratic costs lead to Riccati equations and closed-form optimal feedback. In LQ-GM-PID, we retain the linear--quadratic stochastic-control backbone, but replace terminal state regulation by a prescribed terminal probability density and allow both the initial and terminal laws to be Gaussian Mixtures (GM). Moreover, LQ-GM-PID turns bridge diffusion from a tool for terminal target matching alone into a tool for path shaping. We demonstrate this on a 2D corridor task, a 2D multi-entrance transport task, and a high-dimensional scaling study with $d=32$ and $M=16$ Gaussian-mixture terminal modes, all with sub-50\,ms analytic precompute on a laptop. We position LQ-GM-PID as an analytically solvable reference model for the state-of-the-art neural bridge-diffusion and generative-transport methods: a controlled setting in which neural approximations, score estimates, path-shaping objectives, and protocol-learning procedures can be tested against exact quantities.
comment: 47 pages, 18 figures
Edge Case Detection in Automated Driving: Methods, Challenges and Future Directions
Automated vehicles promise to enhance transportation safety and efficiency. However, ensuring their reliability in real-world conditions remains challenging, particularly due to rare and unexpected situations known as edge cases. While numerous approaches exist for detecting edge cases, a comprehensive survey reviewing these techniques is lacking. This paper bridges this gap by presenting a hierarchical review and systematic classification of edge case detection and assessment methodologies. Our classification is structured on two levels: first, by AV modules, including perception and trajectory-related (encompassing prediction, planning, and control) sub-systems; and second, by underlying methodologies and theories guiding these techniques. Furthermore, we introduce "knowledge-driven" approaches, which complement data-driven methods by leveraging expert insights and domain knowledge to identify cases absent in training datasets. We then examine techniques and metrics for evaluating edge case detection methods, including detection performance (e.g., precision, recall, false positive rates), practical deployment (e.g., computational overhead, detection delay), and domain-specific measures (e.g., crash rates, severity analysis). We conclude by highlighting key challenges for edge case detection, including data availability and quality issues, validation and interpretability limitations, the sim2real gap, and computational constraints. The hierarchical classification and review of methods and assessment techniques in this survey enable modular and targeted testing frameworks by guiding the selection of detection methods for specific AV subsystems while considering methodological principles. It also supports practical testing by facilitating scenario generation in simulation and focused subsystem validation in the real world.
comment: Preprint submitted to IEEE Transactions on Intelligent Transportation Systems
Ultrafast On-chip Online Learning via Spline Locality in Kolmogorov-Arnold Networks ICML'26
Ultrafast online learning is essential for high-frequency systems, such as controls for quantum computing and nuclear fusion, where adaptation must occur on sub-microsecond timescales. Meeting these requirements demands low-latency, fixed-precision computation under strict memory constraints, a regime in which conventional Multi-Layer Perceptrons (MLPs) are both inefficient and numerically unstable. We identify key properties of Kolmogorov-Arnold Networks (KANs) that align with these constraints. Specifically, we show that: (i) KAN updates exploiting B-spline locality are sparse, enabling superior on-chip resource scaling, and (ii) KANs are inherently robust to fixed-point quantization. By implementing fixed-point online training on Field-Programmable Gate Arrays (FPGAs), a representative platform for on-chip computation, we demonstrate that KAN-based online learners are significantly more efficient and expressive than MLPs across a range of low-latency and resource-constrained tasks. To our knowledge, this work is the first to demonstrate model-free online learning at sub-microsecond latencies.
comment: Forty-Third International Conference on Machine Learning (ICML'26)
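The sparsity that B-spline locality gives to KAN updates can be illustrated with a minimal sketch. This is not the paper's FPGA implementation: the uniform knot vector, the cubic degree, the learning rate, and the helper names `bspline_basis` and `sparse_update` are all assumptions chosen for illustration. The point is only that, for a degree-p spline, at most p + 1 basis functions are non-zero at any input, so a gradient step changes at most p + 1 coefficients.

```python
def bspline_basis(i, degree, x, knots):
    """Cox-de Boor recursion for the i-th B-spline basis function of the given degree."""
    if degree == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = (x - knots[i]) / (knots[i + degree] - knots[i])
    right = (knots[i + degree + 1] - x) / (knots[i + degree + 1] - knots[i + 1])
    return (left * bspline_basis(i, degree - 1, x, knots)
            + right * bspline_basis(i + 1, degree - 1, x, knots))


def sparse_update(coeffs, x, target, degree, knots, lr=0.5):
    """One squared-error gradient step on a single spline; returns the touched indices."""
    # All bases are evaluated here for clarity; a real implementation could
    # locate the active knot interval directly and skip the zeros.
    basis = [bspline_basis(i, degree, x, knots) for i in range(len(coeffs))]
    prediction = sum(c * b for c, b in zip(coeffs, basis))
    error = prediction - target
    active = [i for i, b in enumerate(basis) if b > 0.0]
    for i in active:
        coeffs[i] -= lr * error * basis[i]
    return active


degree = 3                               # cubic spline (assumed)
knots = [float(k) for k in range(12)]    # uniform knots 0..11 (assumed)
coeffs = [0.0] * (len(knots) - degree - 1)
basis_at_x = [bspline_basis(i, degree, 5.3, knots) for i in range(len(coeffs))]
active = sparse_update(coeffs, x=5.3, target=1.0, degree=degree, knots=knots)
```

Here only four of the eight coefficients change, regardless of how many knots the spline has, which is the property that keeps the per-sample update cost fixed.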
Design-OS: A Specification-Driven Framework for Engineering System Design with a Control-Systems Design Case
Engineering system design -- whether mechatronic, control, or embedded -- often proceeds in an ad hoc manner, with requirements left implicit and traceability from intent to parameters largely absent. Existing specification-driven and systematic design methods mostly target software, and AI-assisted tools tend to enter the workflow at solution generation rather than at problem framing. Human--AI collaboration in the design of physical systems remains underexplored. This paper presents Design-OS, a lightweight, specification-driven workflow for engineering system design organized in five stages: concept definition, literature survey, conceptual design, requirements definition, and design definition. Specifications serve as the shared contract between human designers and AI agents; each stage produces structured artifacts that maintain traceability and support agent-augmented execution. We position Design-OS relative to requirements-driven design, systematic design frameworks, and AI-assisted design pipelines, and demonstrate it on a control systems design case using two rotary inverted pendulum platforms -- an open-source SimpleFOC reaction wheel and a commercial Quanser Furuta pendulum -- showing how the same specification-driven workflow accommodates fundamentally different implementations. A blank template and the full design-case artifacts are shared in a public repository to support reproducibility and reuse. The workflow makes the design process visible and auditable, and extends specification-driven orchestration of AI from software to physical engineering system design.
comment: 2 figures, 11 pages, Accepted by ASME IDETC 2026 - DAC-09
Joint Scheduling of Deferrable and Nondeferrable Demand with Colocated Stochastic Supply
We investigate the problem of serving deferrable and nondeferrable electric demands with colocated stochastic supply and grid-imported electricity. Deferrable demands arrive randomly and can be delayed within their service deadlines. Nondeferrable demands are always present and must be served immediately, but the quantity served depends on the cost of electricity. Colocated supply is stochastic with zero marginal cost. It can be used to meet demand or exported to the grid to maximize profit. The stochasticity of demands and local supply makes optimal scheduling a Markov decision process with continuous (uncountable) state and action spaces. Under deterministic, time-varying, and piecewise-linear retail pricing of electricity, we show that the optimal demand scheduling follows the {\em Principle of Procrastination}, which reduces the infinite-dimensional policy space to a finite-dimensional Euclidean space defined by three procrastination parameters for each deferrable demand. For settings in which the underlying probability distributions are unknown, we propose a {\em Procrastination Threshold Reinforcement Learning} algorithm. Numerical experiments based on real-world test data confirm that the proposed threshold learning algorithm closely approximates the optimal policy and outperforms standard benchmarks.
Lyapunov-Certified Direct Switching Theory for Q-Learning
Q-learning is a fundamental algorithmic primitive in reinforcement learning. This paper develops a new framework for analyzing Q-learning from a switching-system viewpoint. In particular, we derive a direct stochastic switching-system representation of the Q-learning error. The key observation is that the Bellman maximization error can be expressed exactly as an average of action-wise Q-errors under a suitable stochastic policy. The resulting recursion has a switched linear conditional-mean drift and martingale-difference noise. To the best of our knowledge, this is the first convergence-rate analysis of standard Q-learning whose leading exponential rate is expressed through the joint spectral radius (JSR) of a direct switching family. Since the JSR is the exact worst-case exponential rate of the associated switched linear drift, the resulting rate is among the tightest drift-based rates that can be certified for this Q-learning representation. Building on this representation, we prove finite-time bounds based on a product-defined JSR-induced Lyapunov function and also give an optional common quadratic Lyapunov certificate. The quadratic certificate is only a sufficient condition and hence applies only to instances for which the certificate is feasible, whereas the JSR-induced Lyapunov construction applies to the full direct switching family whenever its JSR is below one. When feasible, the quadratic certificate replaces product-based verification by a computable matrix inequality and gives a simpler stochastic bound. We further extend the framework to Markovian observation models.
Generalizability of Learning-based Occupancy Detection in Residential Buildings (extended version)
This paper investigates non-intrusive occupancy detection methods for residential buildings using environmental sensor data from the KTH Live-In Lab in Stockholm, Sweden. Three machine learning approaches, namely, logistic regression (LR), support vector machines (SVM), and a long short-term memory (LSTM) network enhanced with an attention mechanism, are evaluated in terms of predictive performance and computational complexity. The analysis considers the trade-off between sensor availability (investment cost) and prediction accuracy in real applications, as well as the models' cross-apartment generalizability. Hyperparameters for both the SVM and LSTM models are optimized using Bayesian optimization. All three models are evaluated on data collected from apartments not used during training, and on data generated from a calibrated digital model of the testbed. Results show that all models achieve comparable performance on the same-apartment test data (accuracy of approximately 0.83, F1 score of approximately 0.86). When assessed on cross-apartment data, the LSTM model demonstrates the strongest generalization capability (accuracy of 0.84, F1 score of 0.85), while LR provides a competitive, low-complexity alternative for applications that do not require cross-apartment generalization.
comment: 9 pages, 8 Figures, 4 Tables, Accepted for publication in the proceedings of the 2026 IFAC World Congress, Busan, Republic of Korea
A Continuous-Time Ensemble Kalman-Bucy Smoother for Causal Inference and Model Discovery
Data assimilation (DA) integrates observational information with model predictions to improve state estimation in complex systems. While filtering provides the basis for online forecasts by using only past and present observations, it can exhibit delays and biases when the underlying dynamics evolve rapidly or undergo regime transitions. Smoothing, which additionally incorporates future observations, provides a natural pipeline for hindcasting and reanalysis that yields an uncertainty reduction beyond the filter. This paper introduces an ensemble Kalman-Bucy smoother (EnKBS) for continuous-time DA of nonlinear dynamical systems, where the smoother's conditional distributions are reconstructed using ensemble moments. The result is a derivative-free framework that does not require explicit computation of tangent-linear or adjoint models, which converges to the exact smoother solution at the infinite-ensemble limit for a wide class of complex systems. Incorporating standard regularization techniques for high-dimensional systems, such as covariance localization and inflation, the skill of the EnKBS is demonstrated in various important scientific problems. By integrating future observations, which reveal the underlying causal mechanisms for retrospective state updates, the EnKBS is used for Bayesian-based inference of causal relationships and their temporal influence range in a dyadic trigger-feedback model and the development of a causality-driven iterative learning algorithm that identifies the structure and recovers the hidden parameters of a nonlinear reduced-order model mimicking midlatitude atmospheric circulation. Notably, both tasks remain effective with an ensemble size of $O(10)$ under partial observations, suggesting that EnKBS can support the instantaneous discovery of high-dimensional complex systems over time.
comment: 31 pages, 11 figures. Corresponding author: Nan Chen (chennan@math.wisc.edu)
Adaptive Network Security Policies via Belief Aggregation and Rollout
Evolving security vulnerabilities and shifting operational conditions require frequent updates to network security policies. These updates include adjustments to incident response procedures and modifications to access controls, among others. Reinforcement learning methods have been proposed for automating such policy adaptations, but most methods in the research literature lack performance guarantees and adapt slowly to changes. In this paper, we address these limitations and present a method for computing security policies that is scalable, offers theoretical guarantees, and adapts quickly to changes. The method uses a model or simulator of the system, which is updated when changes occur, and combines three components: belief estimation through particle filtering, offline policy computation through feature-based aggregation, and online policy adaptation through rollout. In particular, feature-based aggregation enables scalable offline optimization of a policy, while rollout adapts the policy online to changes in the system model without repeating the offline optimization. We analyze the approximation error of the aggregation and show that the rollout efficiently adapts policies to changes under certain conditions. Simulations and testbed results demonstrate that our method outperforms state-of-the-art methods on several benchmarks, including CAGE-2.
TARMM: Scaling Delay-Critical Edge AI Offloading in 5G O-RAN via Temporal Graph Mobility Management
Emerging delay-critical edge AI applications, such as VR perception and real-time video analytics, impose stringent latency and reliability requirements on 5G networks. However, existing mobility management mechanisms are largely reactive and fail to adapt to dynamic network conditions, resulting in suboptimal handover decisions and degraded performance. In this paper, we present TARMM, a 5G Open Radio Access Network (O-RAN) system that optimizes user mobility management for delay-critical edge AI offloading. The core of TARMM is a temporal graph model that captures the spatiotemporal dynamics of the RAN across users and cells, enabling near real-time handover decisions. Building on this representation, we design a multi-agent reinforcement learning (MARL) framework with rule-based action masking and proactive resource preparation to ensure safe, stable, and efficient handovers. We implement TARMM on a multi-cell indoor 5G O-RAN testbed and evaluate it using diverse VR workloads. Extensive experiments show that TARMM reduces tail latency by up to 44% and packet loss by up to 56% compared to state-of-the-art approaches. Source code and demo videos are available at: https://margo-source.github.io/Margo/
SBAMP: Sampling Based Adaptive Motion Planning
Autonomous robots operating in dynamic environments must balance global path optimality with real-time responsiveness to disturbances. This requires addressing a fundamental trade-off between computationally expensive global planning and fast local adaptation. Sampling-based planners such as RRT* produce near-optimal paths but struggle under perturbations, while dynamical systems approaches like SEDS enable smooth reactive behavior but rely on offline data-driven optimization. We introduce Sampling-Based Adaptive Motion Planning (SBAMP), a hybrid framework that combines RRT*-based global planning with an online, Lyapunov-stable SEDS-inspired controller that requires no pre-trained data. By integrating lightweight constrained optimization into the control loop, SBAMP enables stable, real-time adaptation while preserving global path structure. Experiments in simulation and on RoboRacer hardware demonstrate robust recovery from disturbances, reliable obstacle handling, and consistent performance under dynamic conditions.
Hierarchical Control for Continuous-time Systems via General Approximate Alternating Simulation Relations
This paper introduces a general approximate alternating simulation relation (\emph{$\varepsilon$-gAAS relation}) for continuous-time systems, which relaxes existing simulation relations to tolerate larger mismatches between abstract and concrete models. The definition of gAAS for continuous-time systems is first proposed, and its properties are investigated. Then, a control refinement method is developed to enable hierarchical control for the gAAS relation. Finally, case studies demonstrate the effectiveness of the proposed approach, highlighting its advantages over existing methods.
Robotics
Hydra-DP3: Frequency-Aware Right-Sizing of 3D Diffusion Policies for Visuomotor Control
Diffusion-based visuomotor policies perform well in robotic manipulation, yet current methods still inherit image-generation-style decoders and multi-step sampling. We revisit this design from a frequency-domain perspective. Robot action trajectories are highly smooth, with most energy concentrated in a few low-frequency discrete cosine transform modes. Under this structure, we show that the error of the optimal denoiser is bounded by the low-frequency subspace dimension and residual high-frequency energy, implying that denoising error saturates after very few reverse steps. This further suggests that action denoising requires a much simpler denoising model than image generation. Motivated by this insight, we propose Hydra-DP3 (HDP3), a pocket-scale 3D diffusion policy with a lightweight Diffusion Mixer decoder that supports two-step DDIM inference. Our synthetic experiments validate the theory and support the sufficiency of two-step denoising. Furthermore, across RoboTwin2.0, Adroit, MetaWorld, and real-world tasks, HDP3 achieves state-of-the-art performance with fewer than 1% of the parameters of prior 3D diffusion-based policies and substantially lower inference latency.
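The claim that smooth action trajectories concentrate their energy in a few low-frequency discrete cosine transform modes can be checked with a small sketch. The trajectories below are hypothetical (a minimum-jerk profile and a copy with alternating jitter), and the function names and the choice of four modes are assumptions for illustration, not taken from the paper.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a 1-D sequence, evaluated directly in O(N^2)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def low_freq_energy_fraction(x, num_modes):
    """Fraction of signal energy carried by the first `num_modes` DCT modes."""
    coeffs = dct2(x)
    total = sum(c * c for c in coeffs)
    if total == 0.0:
        return 1.0
    return sum(c * c for c in coeffs[:num_modes]) / total


horizon = 32
smooth = []
for t in range(horizon):
    s = t / (horizon - 1)
    # Minimum-jerk position profile from 0 to 1 (hypothetical action trajectory).
    smooth.append(10 * s**3 - 15 * s**4 + 6 * s**5)

# Same trajectory with alternating +/-0.2 jitter added at every step.
jittery = [v + 0.2 * (-1) ** t for t, v in enumerate(smooth)]

smooth_fraction = low_freq_energy_fraction(smooth, 4)
jittery_fraction = low_freq_energy_fraction(jittery, 4)
print(f"smooth: {smooth_fraction:.4f}, jittery: {jittery_fraction:.4f}")
```

Because the transform is orthonormal, the energy fractions are directly comparable across signals; the smooth profile keeps almost all of its energy in the first four modes, while the jitter moves energy into the high-frequency modes.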
An Efficient Metric for Data Quality Measurement in Imitation Learning
Imitation learning (IL) has seen remarkable progress, yet field deployment of IL-powered robots remains hindered by the challenge of out-of-distribution (OOD) scenarios. Fine-tuning pre-trained policies with end-user demonstrations collected in deployment environments is a promising strategy to address this challenge. However, end-user demonstrations are frequently of poor quality, characterized by excessive corrective motions, oscillations, and abrupt adjustments that degrade both learned and fine-tuned policy performance. Existing automated approaches for curating demonstration data require policy rollouts in the environment, making them computationally expensive and impractical for real-world deployment. In this paper, we propose a fast, efficient, and fully automated demonstration ranking metric based on the power spectral density (PSD) of demonstration trajectories. The PSD metric requires no policy learning, environment interaction, or expert labeling, making it well-suited for scalable, in-the-field data curation. Lower PSD values correspond to smoother, higher-quality demonstrations, while higher PSD values indicate erratic, artifact-laden trajectories. We evaluate the proposed metric on two benchmark imitation learning datasets comprising expert and lay-user demonstrations, and through a user study with older adults at a retirement facility, where collected demonstrations are used to fine-tune $\pi_{0.5}$ for a daily living task. Results demonstrate that PSD-curated data yields policies with higher task success rates and smoother execution trajectories compared to uncurated baselines and two competitive data-ranking methods.
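A PSD-based ranking of this kind might be sketched as follows. The abstract does not give the exact definition of the metric, so everything here is an assumption: the score is taken as the mean periodogram power of the trajectory's first differences, and the names `periodogram` and `psd_score` and both demonstrations are hypothetical.

```python
import cmath
import math


def periodogram(signal):
    """Periodogram power at the positive DFT frequencies (DC excluded)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        power.append(abs(coeff) ** 2 / n)
    return power


def psd_score(trajectory):
    """Assumed score: mean spectral power of the step-to-step differences."""
    velocity = [b - a for a, b in zip(trajectory, trajectory[1:])]
    power = periodogram(velocity)
    return sum(power) / len(power)


# Two hypothetical 1-D demonstrations reaching the same goal.
smooth = [math.sin(0.5 * math.pi * t / 49) for t in range(50)]
jerky = [v + 0.05 * (-1) ** t for t, v in enumerate(smooth)]
demos = {"smooth": smooth, "jerky": jerky}

# Lower score first, matching "lower PSD values correspond to smoother demonstrations".
ranking = sorted(demos, key=lambda name: psd_score(demos[name]))
print(ranking)
```

The score needs only the recorded trajectory, so it can be computed without a policy or an environment; a multi-dimensional trajectory could be handled by averaging the per-dimension scores, which is again an assumption rather than the paper's definition.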
Good in Bad (GiB): Sifting Through End-user Demonstrations for Learning a Better Policy
Imitation learning offers a promising framework for enabling robots to acquire diverse skills from human users. However, most imitation learning algorithms assume access to high-quality demonstrations, an unrealistic expectation when collecting data from non-expert users, whose demonstrations often contain inadvertent errors. Naively learning from such demonstrations can result in unsafe policy behavior, while discarding entire demonstrations due to occasional mistakes wastes valuable data, especially in low-data settings. In this work, we introduce GiB (Good-in-Bad), an algorithm that automatically identifies and discards erroneous subtasks within demonstrations while preserving high-quality subtasks. The filtered data can then be used by any policy learning algorithm to train more robust policies. GiB first trains a self-supervised model to learn latent features and assigns binary weights to label each demonstration as good or bad. It then models the latent feature distribution of high-quality segments and uses the Mahalanobis distance to detect and evaluate poor-quality subtasks. We validate GiB on the Franka robot in both simulated and real-world multi-step tasks, demonstrating improved policy performance when learning from mixed-quality human demonstrations.
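The Mahalanobis-distance step can be illustrated with a minimal sketch. It assumes two-dimensional latent features, a single Gaussian fitted to the good segments, and a fixed threshold of 3.0; the feature values, the threshold, and the function names are hypothetical and not taken from GiB.

```python
import math


def fit_gaussian_2d(points):
    """Sample mean and 2x2 sample covariance of a list of (x, y) features."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance using the closed-form inverse of a 2x2 covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    squared = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(squared)


# Hypothetical latent features of segments already labelled as good.
good_features = [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.05),
                 (0.05, -0.1), (0.0, 0.0), (-0.05, -0.05)]
mean, cov = fit_gaussian_2d(good_features)

THRESHOLD = 3.0  # assumed cut-off, in units of standard deviations
candidates = {"segment_a": (0.02, 0.01), "segment_b": (1.0, 1.0)}
flags = {name: mahalanobis_2d(feat, mean, cov) > THRESHOLD
         for name, feat in candidates.items()}
print(flags)
```

Unlike a plain Euclidean distance, the Mahalanobis distance rescales each direction by the spread of the good data, so a deviation counts as large only relative to how much good segments themselves vary along that direction.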
VOFA: Visual Object Goal Pushing with Force-Adaptive Control for Humanoids
The ability to push large objects in a goal-directed manner using onboard egocentric perception is an essential skill for humanoid robots to perform complex tasks such as material handling in warehouses. To robustly manipulate heavy objects to arbitrary goal configurations, the robot must cope with unknown object mass and ground friction, noisy onboard perception, and actuation errors, all within a real-time feedback loop. Existing solutions either rely on privileged object-state information without onboard perception or lack robustness to variations in goal configurations and object physical properties. In this work, we present VOFA, a visual goal-conditioned humanoid loco-manipulation system capable of pushing objects with unknown physical properties to arbitrary goal positions. VOFA consists of a two-level hierarchical architecture with a high-level visuomotor policy and a low-level force-adaptive whole-body controller. The high-level policy processes noisy onboard observations and generates goal-conditioned commands to operate in closed loop across diverse object-goal configurations, while the low-level whole-body controller provides robustness to variations in object physical properties. VOFA is extensively evaluated in both simulation and real-world experiments on the Booster T1 humanoid robot. Our results demonstrate strong performance, achieving over 90% success in simulation and over 80% success in real-world trials. Moreover, VOFA successfully pushes objects weighing up to 17 kg, exceeding half of the Booster T1's body weight.
Dynamics Distillation for Efficient and Transferable Control Learning
Robust control policy learning for autonomous driving requires training environments to be both physically realistic and computationally scalable, properties that existing simulators provide only in isolation. We introduce Sim2Sim2Sim, a framework that bridges high-fidelity vehicle simulation and scalable reinforcement learning by distilling simulator dynamics into a highly parallelizable learned dynamics model. By training control policies purely within this distilled environment and deploying them back into the high-fidelity source simulator, we demonstrate more efficient policy optimization and reliable transfer under challenging dynamics. We further show that predictive accuracy alone does not fully characterize a learned dynamics model's suitability as a reinforcement learning training environment, which should also be assessed by the quality of the policies it enables.
comment: 9 pages, 3 figures, under review
Distributed Algorithm with Emergent Area Partitioning and Base Station's Situation Awareness for Multi-Robot Patrolling
Patrolling with multiple robots offers efficient surveillance to detect and manage undesired situations. This necessitates improved patrol efficiency and operator situation awareness at base stations. Enhanced situation awareness enables operators to predict robots' behaviors, support recognition and decision-making, and execute emergency interventions. This study presents the Local Reactive and Partition (LR-PT) algorithm, a novel multi-robot patrolling approach. In simulations, LR-PT outperformed existing methods by ensuring frequent patrols of all locations of interest and enhancing the situation awareness of the base station. Robots independently select patrol targets based on locally available information, integrating patrol needs and the urgency of reporting mission progress to the base station into a unified utility function. This locality also contributes to robustness against communication constraints and robot failures, as demonstrated in this research. In addition, an area partition emerges autonomously from the algorithm, which helps the robots avoid local optima and achieve comprehensive patrol coverage of the whole mission area. The simulation results demonstrated the superior performance of LR-PT for multi-robot patrolling, utilizing the advantages of swarm robotics and addressing real-world operational challenges.
Cut-In Gap Acceptance Toward Autonomous vs. Human-Driven Vehicles: Evidence from the Waymo Open Motion Dataset
Autonomous vehicles (AVs) are widely known to follow conservative, rule-based motion policies that surrounding drivers can learn to anticipate. A direct consequence is that human drivers may accept shorter longitudinal gaps when cutting in front of an AV than when targeting another human-driven vehicle (HDV). We test this hypothesis using the Waymo Open Motion Dataset (WOMD), which provides 25,906 real-world highway scenarios at 10 Hz. An eight-criterion lane-change detector extracts 706 HDV-to-AV and 3,172 HDV-to-HDV cut-in events from the same traffic environment. The median accepted gap in front of the Waymo AV is 7.58 m versus 9.57 m for HDV targets, a 1.99 m reduction that is statistically significant ($p = 5.76 \times 10^{-8}$, $d = -0.224$) and persists under speed-matched resampling. Cut-in speeds toward the AV are 37% higher (51.7 versus 37.7 km/h, $d = 0.502$), and 68.0% of AV-targeted cut-ins occur below the 10 m gap boundary versus 51.8% of HDV-targeted events ($\chi^2 = 60.5$, $p < 10^{-13}$). These results reveal a systematic and safety-relevant asymmetry in human gap-acceptance behavior that warrants AV-specific calibration of both motion-planning safety envelopes and traffic simulation models.
Action Agent: Agentic Video Generation Meets Flow-Constrained Diffusion
We present Action Agent, a two-stage framework that unifies agentic navigation video generation with flow-constrained diffusion control for multi-embodiment robot navigation. In Stage I, a large language model (LLM) acts as an orchestration module that selects video diffusion models, refines prompts through iterative validation, and accumulates cross-task memory to synthesize physically plausible first-person navigation videos from language and image inputs. This increases video generation success from 35% (single-shot) to 86% across 50 navigation tasks. In Stage II, we introduce FlowDiT, a Flow-Constrained Diffusion Transformer that converts optimized goal videos and language instructions into continuous velocity commands using action-space denoising diffusion. FlowDiT integrates DINOv2 visual features, learned optical flow for ego-motion representation, and CLIP language embeddings for semantic stopping. We pretrain on the RECON outdoor navigation dataset and fine-tune on 203 Unitree G1 humanoid episodes collected in Isaac Sim to calibrate velocity dynamics. A single 43M-parameter checkpoint achieves 73.2% navigation success in simulation and 64.7% task completion on a real Unitree G1 in unseen indoor environments under open-loop execution, while operating at 40--47 Hz. We evaluate Action Agent across three embodiments: a Unitree G1 humanoid (real hardware), a drone, and a wheeled mobile robot (Isaac Sim), demonstrating that decoupling trajectory imagination from execution yields a scalable and embodiment-aware paradigm for language-guided navigation.
comment: 8 pages, 5 figures
LLM-Foraging: Large Language Models for Decentralized Swarm Robot Foraging
Swarm foraging algorithms, such as the central-place foraging algorithm (CPFA), typically rely on offline parameter optimization using genetic algorithms (GA) or reinforcement learning, yielding policies tightly coupled to a specific combination of team size, arena size, and resource distribution. When deployment conditions change, performance degrades, and retraining is computationally expensive. We propose LLM-Foraging, a decentralized swarm controller that augments the CPFA state machine with a large language model (LLM) tactical decision-maker at three structured decision points, namely post-deposit, central-zone arrival, and search starvation. Each robot runs its own LLM client and queries it using only locally observable state, while the existing CPFA motion and sensing stack executes the selected action. Because the LLM serves as a general decision policy rather than parameters fitted to a single configuration, the controller is training-free at deployment and transfers across configurations without re-optimization. We evaluate LLM-Foraging in Gazebo with TurtleBot3 robots across 36 configurations spanning team sizes of 4 to 10 robots, arena sizes from 6x6 to 10x10 meters, and three resource distributions (clustered, power-law, random). LLM-Foraging collects more resources than the GA-tuned CPFA baseline across the evaluated configurations and is more consistent, a robustness that the GA's single-configuration tuning does not provide.
Decompose and Recompose: Reasoning New Skills from Existing Abilities for Cross-Task Robotic Manipulation ICML 2026
Cross-task generalization is a core challenge in open-world robotic manipulation, and the key lies in extracting transferable manipulation knowledge from seen tasks. Recent in-context learning approaches leverage seen task demonstrations to generate actions for unseen tasks without parameter updates. However, existing methods provide only low-level continuous action sequences as context, failing to capture composable skill knowledge and causing models to degenerate into superficial trajectory imitation. We propose Decompose and Recompose, a skill reasoning framework using atomic skill-action pairs as intermediate representations. Our approach decomposes seen demonstrations into interpretable skill--action alignments, enabling the model to recompose these skills for unseen tasks through compositional reasoning. Specifically, we construct a task-adaptive dynamic demonstration library via visual-semantic retrieval combined with skill sequences from a planning agent, complemented by a coverage-aware static library to fill missing skill patterns. Together, these yield skill-comprehensive demonstrations that explicitly elicit compositional reasoning for skill composition and execution ordering. Experiments on the AGNOSTOS benchmark and real-world environments validate our method's zero-shot cross-task generalization capability.
comment: Accepted by ICML 2026
High-Speed, Scalable Sensor Readout for Dexterous Robotic Hands via Shift-Register Multiplexing
Dexterous robotic hands require high-speed multimodal sensing across many degrees of freedom, yet existing readout architectures often impose trade-offs between sensor count, wiring complexity, and sampling bandwidth. This paper presents a scalable analog sensor readout architecture based on a serial-in parallel-out (SIPO) shift-register principle. The proposed architecture supports versatile integration of heterogeneous analog-output sensors, scalable expansion using only three signal lines between sensor modules, and fast, configurable sampling. We validate the approach on a tendon-driven robotic hand integrating 16 joint sensor modules and one four-channel tactile sensor module, enabling acquisition of 20 sensor channels at a full-scan rate of 1 kHz, with stable operation up to 1.5 kHz. Joint sensor characterization showed a maximum slope absolute percentage error (APE) of 0.446% and sub-degree estimation error, indicating that the proposed readout system does not significantly degrade sensing performance. For tactile sensing, LSTM-based models achieved an RMSE of 0.125 N for force estimation and 93.4% accuracy for five-class contact-location classification, and were deployed for real-time inference at 1 kHz. System-level experiments showed that the joint sensors provide more accurate feedback than motor-based estimation during interaction, while the tactile sensor enables responsive force estimation in contact. The proposed architecture offers a practical path toward fully sensorized robotic hands for dexterous manipulation.
Evidence-Based Landing Site Selection and Vision-Based Landing for UAVs in Unstructured Environments
Autonomous landing in cluttered or unstructured environments remains a safety-critical challenge for unmanned aerial vehicles (UAVs), particularly under noisy perception caused by sensor uncertainty and platform-induced disturbances such as vibration. This paper presents an evidence-based probabilistic framework for autonomous UAV landing that explicitly separates decision-making under uncertainty from execution via visual servoing. Landing safety is modeled as a latent variable and inferred through recursive accumulation of frame-wise visual likelihoods derived from flatness, slope, and obstacle cues, yielding a temporally consistent belief map that is robust to transient perception errors. Physical feasibility is enforced through a hard geometric constraint based on the minimum required landing radius of the UAV, ensuring that undersized but visually appealing regions are rejected. The final landing site is selected using constrained maximum a posteriori estimation. Once selected, the UAV locks onto the target region using ORB feature tracking and performs precise alignment and descent via image-based visual servoing (IBVS). The proposed approach is validated through both real-world laboratory experiments and high-fidelity simulations in Nvidia Isaac Sim, demonstrating consistent, cautious, and stable landing behavior across domains.
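The decision stage described above (recursive evidence accumulation, a hard radius constraint, then constrained maximum a posteriori selection) might look like the following sketch. It assumes a binary safe/unsafe state per candidate site, frame-wise likelihoods that are conditionally independent given that state, and a log-odds form of the recursive update; the site names, likelihood values, radii, and function names are all hypothetical.

```python
import math


def accumulate_log_odds(frame_likelihoods, prior=0.5):
    """Recursive Bayesian update of the safe-vs-unsafe log-odds over frames."""
    log_odds = math.log(prior / (1.0 - prior))
    for lik_safe, lik_unsafe in frame_likelihoods:
        log_odds += math.log(lik_safe / lik_unsafe)
    return log_odds


def posterior_safe(log_odds):
    """Convert log-odds back to the posterior probability that a site is safe."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def select_landing_site(candidates, min_radius):
    """Constrained MAP: best posterior among sites that satisfy the radius limit."""
    feasible = {name: site for name, site in candidates.items()
                if site["free_radius"] >= min_radius}
    if not feasible:
        return None
    return max(feasible,
               key=lambda name: accumulate_log_odds(feasible[name]["frames"]))


# Each frame contributes (p(observation | safe), p(observation | unsafe)).
candidates = {
    "A": {"free_radius": 0.4, "frames": [(0.9, 0.1)] * 3},   # appealing but too small
    "B": {"free_radius": 1.2, "frames": [(0.8, 0.2), (0.3, 0.7), (0.8, 0.2)]},
    "C": {"free_radius": 1.5, "frames": [(0.4, 0.6)] * 3},
}
MIN_LANDING_RADIUS = 0.8  # metres, assumed vehicle footprint requirement

chosen = select_landing_site(candidates, MIN_LANDING_RADIUS)
beliefs = {name: posterior_safe(accumulate_log_odds(site["frames"]))
           for name, site in candidates.items()}
print(chosen, beliefs)
```

Site A ends with the highest belief but is rejected by the geometric constraint, and site B is still chosen despite one unfavourable frame, since the accumulated evidence outweighs a single transient error.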
SixthSense: Task-Agnostic Proprioception-Only Whole-Body Wrench Estimation for Humanoids
Humanoid robots are entering our physical world at scale, yet as oversized toys--good at singing and dancing, but short on force-interaction capabilities for practical tasks. Bridging this gap necessitates prioritizing reliable contact perception as a fundamental requirement. Estimating external wrenches in humanoids is complicated by floating-base dynamics and indeterminate contact locations. Existing analytical frameworks require idealistic assumptions and hard-to-obtain measurements, which are often unavailable in practice. To bridge this gap, we propose SixthSense, a task-agnostic approach that infers whole-body contact timing, location, and wrenches from proprioception and IMU data alone. To capture the multi-modal dynamics between unstructured contact inputs and the uncertain motion outputs, we employ conditional flow matching to tokenize proprioceptive histories and estimate a spatiotemporally sparse contact-event flow. SixthSense serves as a plug-and-play perception module for applications including collision detection, physical human-robot interaction, and force-feedback teleoperation. Experiments across standing, walking, and whole-body motion-tracking policies showcased unprecedented performance in diverse behaviors.
Quasi-Static Control of Discrete Cosserat Rod
In this paper, we design feedback control laws for soft robots modelled using the Cosserat rod, which is spatially discretised using the Piecewise Constant Strain (PCS) approach. The PCS approach transforms the nonlinear PDEs describing the Cosserat rod to a system of nonlinear ODEs. This simplification results in a model describing soft robots which is similar to the serial rigid-link manipulators. We design feedback control laws for the quasi-static PCS model by using the external end-effector wrench as control input. The control laws are designed based on state-feedback linearisation in strain and task spaces. An extensive set of numerical results demonstrates the performance of the control laws for end-effector trajectory tracking and shape control of soft robots.
comment: Submitted to 17th APCA International Conference on Automatic Control and Soft Computing (CONTROLO 2026)
ESARBench: A Benchmark for Agentic UAV Embodied Search and Rescue
The rapid advancement of Multimodal Large Language Models (MLLMs) has empowered Unmanned Aerial Vehicles (UAVs) with exceptional capabilities in spatial reasoning, semantic understanding, and complex decision-making, making them inherently suited for UAV Search and Rescue (SAR). However, existing UAV SAR research is dominated by traditional vision and path-planning methods and lacks a comprehensive and unified benchmark for embodied agents. To bridge this gap, we first propose the novel task of Embodied Search and Rescue (ESAR), which requires aerial agents to autonomously explore complex environments, identify rescue clues, and reason about victim locations to execute informed decision-making. Additionally, we present ESARBench, the first comprehensive benchmark designed to evaluate MLLM-driven UAV agents in highly realistic SAR scenarios. Leveraging Unreal Engine 5 and AirSim, we construct four high-fidelity, large-scale open environments mapped directly from real-world Geographic Information System (GIS) data to ensure photorealistic landscapes. To rigorously simulate actual rescue operations, our benchmark incorporates dynamic variables including weather conditions, time of day, and stochastic clue placement. Furthermore, we create a dataset of 600 tasks modeled after real-world rescue cases and propose a robust set of evaluation metrics. We evaluate diverse baselines, ranging from traditional heuristics to advanced ground and aerial MLLM-based ObjectNav agents. Experimental results highlight the challenges in ESAR, revealing critical bottlenecks in spatial memory, aerial adaptation, and the trade-off between search efficiency and flight safety. We hope ESARBench serves as a valuable resource to advance research in the Embodied Search and Rescue domain. Source code and project page: https://4amgodvzx.github.io/ESAR.github.io.
comment: 20 pages, 7 figures
Assistance Without Interruption: A Benchmark and LLM-based Framework for Non-Intrusive Human-Robot Assistance
Human-robot interaction (HRI) has long studied how agents and people coordinate to achieve shared goals. In this work, we formalize and benchmark the non-intrusive assistance as an independent paradigm of HRI, where a robot proactively supports a human's ongoing multi-step activities while strictly avoiding interruptions. Unlike conventional HRI tasks that rely on direct commands, explicit negotiation, or proactive interventions based on user habits and history, our task treats the human's plan as the primary process and formulates assistance as a joint decision over when to act and what to do. To systematically evaluate this problem, we establish a simulation benchmark, NIABench, along with new metrics tailored to the non-intrusive assistance task. We further propose a hybrid architecture that integrates an LLM with a scoring model. The scoring model first applies semantic retrieval to prune large candidate action sets, and then a ranker evaluates human-step and robot-action pairs, enabling reasoning over timing and cross-step dependencies. Comprehensive experiments on both NIABench and real-world scenarios demonstrate that our method achieves proactive, non-intrusive assistance that reduces human effort while preserving task effectiveness.
VoxAfford: Multi-Scale Voxel-Token Fusion for Open-Vocabulary 3D Affordance Detection
Open-vocabulary 3D affordance detection requires localizing interaction regions on point clouds given novel affordance descriptions. Recent methods extend multimodal large language models (MLLMs) with special output tokens that are decoded into segmentation masks. However, these tokens are produced through autoregressive generation, which models sequential dependencies rather than spatial neighborhood relations, leaving them semantically rich but spatially impoverished for 3D localization. We propose Voxel-enhanced Affordance detection (VoxAfford), which bypasses this bottleneck by injecting multi-scale geometric features from a frozen pre-trained 3D VQVAE encoder into the output tokens after generation. Each output token uses its affordance semantics as a query to retrieve relevant geometric patterns from its paired voxel scale via cross-attention, with a learned compatibility gate controlling the injection strength. The enhanced tokens are then aggregated into a spatially-aware affordance prompt through semantic-conditioned attention and propagated alongside per-point features to generate the final mask. Experiments on open-vocabulary affordance detection tasks show that VoxAfford achieves state-of-the-art performance with approximately an 8% improvement in mIoU, and real robot experiments confirm zero-shot transfer to novel objects.
Terrain Perception for Agricultural UAVs in Complex Farmland via Rotating mmWave Radar
Accurate terrain perception is essential for terrain-following flight of agricultural unmanned aerial vehicles (UAVs), yet remains challenging in real-world farmland due to occlusions, complex terrain geometry, and environmental disturbances. Millimeter-wave (mmWave) radar is a promising sensing modality for this task due to its robustness to adverse conditions; however, existing UAV-mounted radar systems rely on fixed field of view (FoV) and terrain extraction methods designed for dense LiDAR data, leading to incomplete and unreliable terrain estimation. To address these limitations, we present a low-cost rotating mmWave radar-enabled terrain perception framework for agricultural UAVs operating in complex farmland environments. Specifically, a mechanically rotating sensing design is introduced to enlarge spatial coverage and improve terrain observability beyond the limitations of fixed-view radar under dynamic low-altitude flight. Building upon this sensing capability, we further design a pose-consistent terrain reconstruction pipeline tailored for sparse, noisy, and partially observable radar data, enabling reliable ground extraction and continuous terrain surface estimation in challenging agricultural scenarios. The complete system is deployed on a real agricultural UAV platform and comprehensively evaluated through extensive field experiments. Experimental results demonstrate improved terrain coverage and estimation accuracy, achieving an F1 score of 94.42 for ground segmentation, while the closest rival achieves only 90.48, leading to more robust terrain-following flight.
Bi-Level Reinforcement Learning Control for an Underactuated Blimp via Center-of-Mass Reconfiguration
This paper investigates goal-directed tracking control of underactuated blimps with center-of-mass (CoM) reconfiguration. Unlike conventional overactuated blimp designs that rely on redundant actuation for simplified control, this paper focuses on a compact architecture consisting of two thrusters and a movable internal slider, aiming to improve energy efficiency and payload capacity. This hardware-efficient configuration introduces significant underactuation and strong nonlinear coupling between CoM dynamics and vehicle motion. To address these challenges, this paper proposes a bi-level reinforcement learning framework that explicitly decouples task-level CoM planning from continuous thrust control. The outer policy determines a target-dependent CoM configuration prior to flight, while the inner policy generates thrust commands to track straight-line references. To ensure stable learning, this paper introduces a two-stage learning strategy, supported by a convergence analysis of the resulting bi-level process. Extensive simulations and real-world experiments on a 27-goal evaluation set demonstrate that the proposed method consistently outperforms fixed-CoM baselines and PID-based controllers, achieving higher tracking accuracy, enhanced robustness, and reliable sim-to-real transfer.
What Does a Meow Mean? In Search of Intuitively Understandable Communication by a Nonverbal Companion Robot
Older adults living alone have a number of challenges, and robots can help with some of them--by providing reminders, initiating activity, or offering comfort. As part of developing a cat robot with limited assistive functions, we designed a set of nonverbal communication signals, both auditory (cat sounds) and visual (icons on a small display). To evaluate these signals we used a mixed-methods, user-centered approach. After a pilot study, a focus group with older adults suggested revisions to the initial signal set. A large-sample online experiment then tested whether adults over the age of 65 could accurately infer the robot's communicative intentions. When both visual and auditory signals were present, accuracy was high. When visual signals were absent, accuracy often decreased; when auditory signals were absent, accuracy sometimes increased. So the auditory signals were less helpful, except when the robot conveyed strong sentiments (e.g., purring while being petted).
comment: To appear in the Proceedings of the 18th International Conference on Social Robotics (ICSR 2026)
A Principled Approach for Creating High-fidelity Synthetic Demonstrations for Imitation Learning
Recent advances in 3D Gaussian Splatting (3DGS) have enabled visually realistic demonstration generation from a single expert trajectory and a short multi-view scan. However, existing 3DGS-based synthesis pipelines typically generate new motions using sampling-based planners or trajectory optimization, which often deviate substantially from the expert's demonstrated path. While such deviations may be acceptable for tasks insensitive to motion shape, they discard subtle spatial and temporal structure that is critical for contact-rich and shape-sensitive manipulation, causing increased demonstration diversity to harm downstream policy learning. We argue that demonstration synthesis should treat the expert trajectory as a strong prior. Building on this principle, we propose a framework that synthesizes diverse task demonstrations while explicitly preserving expert motion structure. We model the expert trajectory using Dynamic Movement Primitives (DMPs) and retarget it to new goals, object configurations, and viewpoints within a reconstructed 3DGS scene, yielding phase-consistent, shape-preserving motion by construction. To safely realize this expert-preserving diversity in cluttered scenes, we introduce an analytic obstacle-aware DMP formulation that operates directly on the continuous density field induced by the 3DGS representation. This enables collision avoidance while minimally perturbing the nominal expert motion, unifying photorealistic rendering and geometric reasoning without additional scene representations. We evaluate our approach on a Spot mobile manipulator across three manipulation tasks with increasing sensitivity to trajectory fidelity. Compared to planner- and optimization-based synthesis, our method produces trajectories with lower deviation and collision rates and yields higher task success when training diffusion-based visuomotor policies.
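To make the idea of retargeting an expert trajectory with a DMP concrete, here is a minimal one-dimensional sketch in the standard discrete DMP form (a critically damped spring toward the goal plus a learned, phase-dependent forcing term). It assumes the demonstration starts and ends at rest and is uniformly sampled. The class name, the gains, and the basis layout are illustrative choices, and the obstacle-aware term on the 3DGS density field described above is not included.

```python
import math

class DMP1D:
    """One-dimensional discrete Dynamic Movement Primitive (illustrative sketch)."""

    def __init__(self, n_basis=20, alpha_z=25.0, alpha_x=4.0):
        self.alpha_z = alpha_z
        self.beta_z = alpha_z / 4.0  # critically damped spring
        self.alpha_x = alpha_x
        # Centres evenly spaced in time, hence exponentially spaced in phase x.
        self.centers = [math.exp(-alpha_x * i / (n_basis - 1)) for i in range(n_basis)]
        gaps = [self.centers[i] - self.centers[i + 1] for i in range(n_basis - 1)]
        gaps.append(gaps[-1])
        self.widths = [1.0 / (g * g) for g in gaps]
        self.weights = [0.0] * n_basis
        self.y0, self.g, self.tau = 0.0, 1.0, 1.0

    def _basis(self, x):
        return [math.exp(-h * (x - c) ** 2) for c, h in zip(self.centers, self.widths)]

    def fit(self, demo, dt):
        """Learn forcing-term weights from a uniformly sampled demonstration."""
        n = len(demo)
        self.y0, self.g = demo[0], demo[-1]
        self.tau = dt * (n - 1)
        num = [0.0] * len(self.weights)
        den = [1e-10] * len(self.weights)
        for k, y in enumerate(demo):
            prev, nxt = demo[max(k - 1, 0)], demo[min(k + 1, n - 1)]
            yd = (nxt - prev) / (2.0 * dt)          # clamped finite differences
            ydd = (nxt - 2.0 * y + prev) / (dt * dt)
            f_target = self.tau ** 2 * ydd - self.alpha_z * (
                self.beta_z * (self.g - y) - self.tau * yd)
            x = math.exp(-self.alpha_x * k * dt / self.tau)
            xi = x * (self.g - self.y0)
            for i, psi in enumerate(self._basis(x)):
                num[i] += psi * xi * f_target
                den[i] += psi * xi * xi
        self.weights = [a / b for a, b in zip(num, den)]

    def rollout(self, dt, goal=None, start=None, steps=None):
        """Integrate the DMP (explicit Euler), optionally toward a new goal or start."""
        g = self.g if goal is None else goal
        y0 = self.y0 if start is None else start
        n_steps = int(round(self.tau / dt)) if steps is None else steps
        y, z, x = y0, 0.0, 1.0
        path = [y]
        for _ in range(n_steps):
            psi = self._basis(x)
            shape = sum(p * w for p, w in zip(psi, self.weights)) / (sum(psi) + 1e-10)
            f = shape * x * (g - y0)
            z += dt / self.tau * (self.alpha_z * (self.beta_z * (g - y) - z) + f)
            y += dt / self.tau * z
            x += dt / self.tau * (-self.alpha_x * x)
            path.append(y)
        return path
```

Because the forcing term is scaled by the goal offset, a rollout toward a new goal is a rescaled copy of the original motion rather than a freshly planned path, which is the shape-preserving property the abstract relies on.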
Dynamics Aware Quadrupedal Locomotion via Intrinsic Dynamics Head
Quadrupedal locomotion plays a critical role in enabling agile, versatile movement across complex terrains. Understanding and estimating the underlying physical dynamics are essential for achieving efficient and stable quadrupedal locomotion. We propose a novel training framework for quadrupedal locomotion that enables the Control Policy to understand and reason about physical dynamics. In simulation, we concurrently train an Intrinsic Dynamics (ID) Head that learns state-to-torque dynamics alongside the Control Policy, and we define a dynamics reward enabled by the ID Head that encourages the Policy toward more predictable dynamical behavior. We also provide a mechanism to tune the learned dynamics in the resulting Policy by controlling the training coefficients of the ID Head. Our simulation experiments show that this mechanism drives convergence to better optima across a wide range of standard quadrupedal locomotion rewards, yielding more efficient and smoother policies. Our real-robot experiments demonstrate sim-to-real transfer of these improvements, with significant gains in torque efficiency (16.8%), action rate (18.6%), and mechanical power (12.8%), while improving safe torque occupancy by 6.4%.
comment: 8 pages, 6 figures
To Do or Not to Do: Ensuring the Safety of Visuomotor Policies Learned from Demonstrations
Task success has historically been the primary measure of policy performance in imitation learning (IL) research. This characteristic strictly limits the ubiquitous application of IL algorithms in field robotics, where safety assurance, in addition to task success, is of paramount importance. It is often desirable for an IL-powered robot in the field not to roll out a policy, and hence score a poor performance, if safety is not guaranteed. Although this trade-off between safety and performance is well investigated in the classical control literature, policy safety is a heavily underexplored domain in IL research. There is no universal definition of safety in IL. To make things worse, many existing theoretical works on safety are notoriously difficult to extend to IL-powered robots in the field. This paper offers important insights on the safety and performance of IL policies. We propose execution guarantee, a policy-agnostic safety measure that guarantees the maximum task success for a visuomotor IL policy, despite minor run-time changes, from within a specific region in the state space. We leverage recent advances in view synthesis to identify such regions in the state space for an IL policy and explore a fundamental result on set invariance - namely, Nagumo's sub-tangentiality condition - to prove and operationalize execution guarantee from inside that region. Experiments with a Franka robot, both in simulation and in the real world, demonstrate how the proposed safety analysis allows various IL policies to achieve maximum task success with guarantee. We also demonstrate some interesting results on how a recovery policy - a by-product of the proposed safety analysis - can help to increase the policy performance and thereby mitigate the safety-performance trade-off in IL.
TAIL-Safe: Task-Agnostic Safety Monitoring for Imitation Learning Policies
Recent imitation learning (IL) algorithms such as flow-matching and diffusion policies demonstrate remarkable performance in learning complex manipulation tasks. However, these policies often fail even when operating within their training distribution due to extreme sensitivity to initial conditions and irreducible approximation errors that lead to compounding drift. This makes it unsafe to deploy IL policies in the field where out-of-distribution scenarios are prevalent. A prerequisite for safe deployment is enabling the policy to determine whether it can execute a task the way it was learned from demonstrations. This paper presents TAIL-Safe, a principled approach to identify, for a trained IL policy, a safe set from where the policy empirically succeeds in completing the learned task. We propose a Lipschitz-continuous Q-value function that maps state-action pairs to a long-term safety score based on three short-term task-agnostic criteria: visibility, recognizability, and graspability. The zero-superlevel set of this function characterizes an empirical control invariant set over state-action pairs. When the nominal policy proposes an action outside this set, we apply a recovery mechanism inspired by Nagumo's theorem that uses gradient ascent to the Q-function to steer the policy back to safety. To learn this Q-function, we construct a high-fidelity digital twin using Gaussian Splatting that enables systematic collection of failure data without risk to physical hardware. Experiments with a Franka Emika robot demonstrate that flow-matching policies, which fail under run-time perturbations, achieve consistent task success when guided by the proposed TAIL-Safe.
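The recovery step described above, gradient ascent on the Q-function until the proposed action re-enters the zero-superlevel set, can be sketched as follows. This is a toy illustration under stated assumptions: ascent is taken over the action only, gradients are estimated by central finite differences, and `toy_q` is a hypothetical stand-in for the learned safety Q-function.

```python
def recover_action(q_fn, state, action, step=0.1, max_iters=100, h=1e-5):
    """Nudge an action uphill on q_fn until q_fn(state, action) >= 0."""
    a = list(action)
    for _ in range(max_iters):
        if q_fn(state, a) >= 0.0:
            break  # already inside the (empirical) safe set
        grad = []
        for i in range(len(a)):
            up, down = a[:], a[:]
            up[i] += h
            down[i] -= h
            grad.append((q_fn(state, up) - q_fn(state, down)) / (2.0 * h))
        a = [ai + step * gi for ai, gi in zip(a, grad)]
    return a

def toy_q(state, action):
    """Hypothetical safety score: positive when the action stays near the state."""
    return 1.0 - sum((ai - si) ** 2 for ai, si in zip(action, state))
```

An action that already satisfies `q_fn >= 0` is returned unchanged, so the nominal policy is only overridden when it leaves the safe set.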
VLA-ATTC: Adaptive Test-Time Compute for VLA Models with Relative Action Critic Model
Vision-Language-Action (VLA) models have demonstrated remarkable capabilities and generalization in embodied manipulation. However, their decision-making relies on a fast, instinctive process that lacks deliberation. This strategy often leads to suboptimal or catastrophic actions when facing complex or ambiguous scenarios that require greater consideration. In this paper, we introduce VLA-ATTC, a framework that endows VLA models with adaptive test-time compute (TTC). VLA-ATTC employs an uncertainty-based "cognitive clutch" to dynamically transition from reflexive execution to a TTC deliberation phase when necessary. During the TTC phase, a novel Relative Action Critic (RAC) model identifies the optimal action from generated candidates via pairwise comparisons. This relative mechanism replaces unstable absolute value estimation, significantly simplifying the learning objective. Furthermore, we introduce an efficient sampling strategy to amortize computational costs and an automated data pipeline that curates preference pairs without manual annotation. On the LIBERO-LONG benchmark, VLA-ATTC reduces the failure rate of the SOTA model PI0.5 by over 50%. We will open-source all the code and weights.
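The control flow described above, an uncertainty gate followed by pairwise selection among sampled candidates, can be sketched in a few lines. Everything here is hypothetical scaffolding: `policy_sample` stands in for the VLA's action sampler, `prefers(a, b)` for a relative critic that returns True when `a` is judged better than `b`, and the round-robin vote is one simple way to aggregate pairwise comparisons, not necessarily the one used in the paper.

```python
def select_action(candidates, prefers):
    """Return the candidate that wins the most pairwise comparisons."""
    wins = [0] * len(candidates)
    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            if prefers(candidates[i], candidates[j]):
                wins[i] += 1
            else:
                wins[j] += 1
    best = max(range(len(candidates)), key=wins.__getitem__)
    return candidates[best]

def act(policy_sample, uncertainty, prefers, threshold=0.5, n_candidates=8):
    """Reflexive execution when confident, deliberation when uncertain."""
    if uncertainty <= threshold:
        return policy_sample()  # fast path: a single sample
    candidates = [policy_sample() for _ in range(n_candidates)]
    return select_action(candidates, prefers)
```

A round-robin vote needs a number of critic calls that grows quadratically with the candidate count; a single-elimination bracket would need only one fewer call than the number of candidates, at the price of being more sensitive to a single wrong comparison.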
Sentinel-VLA: A Metacognitive VLA Model with Active Status Monitoring for Dynamic Reasoning and Error Recovery
Vision-language-action (VLA) models have advanced the field of embodied manipulation by harnessing broad world knowledge and strong generalization. However, current VLA models still face several key challenges, including limited reasoning capability, lack of status monitoring, and difficulty in self-correction. In this paper, we introduce Sentinel-VLA, a metacognitive VLA model equipped with an active "sentinel" module to monitor real-time execution status. Only when necessary, such as during initial planning or upon detecting an error, does the model trigger dynamic reasoning or formulate error-recovery solutions. This on-demand reasoning mechanism ensures robust decision-making while minimizing computational overhead. Notably, all training data (spanning 44 tasks and over 2.6 million transitions) is automatically generated and annotated through our designed pipeline. We also propose the Self-Evolving Continual Learning (SECL) algorithm, which allows Sentinel-VLA to identify its capability boundaries and automatically collect data for expansion, paired with the Orthogonal Continual Adapter (OC-Adapter) to constrain parameter updates to an orthogonal space, thereby preventing catastrophic forgetting. Real-world experiments demonstrate that Sentinel-VLA boosts the task success rate by over 30% compared to the SOTA model, PI0. We will open-source all the code, weights, and data generation pipeline.
A skin-like conformal sensor for real-time shape mapping
Reliable real-time 3D shape sensing is essential for robust control and interpretation of deformable systems during motion. Existing vision-based approaches require line-of-sight and complex instrumentation, limiting operation in occluded and space-constrained settings. Here, we introduce a scalable, skin-like sensor that reconstructs its continuous 3D deformation in real time from distributed strain measurements. The device embeds a 2D array of mirror-stacked, printed oxidized eutectic gallium-indium (o-EGaIn) strain gauges within an elastomeric film to measure off-neutral-axis strains. Combined with a mechanics-informed observation model and a fast optimization routine, the system estimates local curvature, elongation, offset, and orientation under concurrent stretching, bending, and indentation, enabling reconstruction of complex surfaces. A 5-by-5 array with a 12 mm pitch achieves a mean surface reconstruction error of 0.62 mm with 0.1 s latency across all tested scenarios. When conforming to complex surfaces, the sensor provides fast 3D shape mapping of the underlying geometry. Demonstrations involving palm gesturing, finger indentation, and contact-induced balloon deformation highlight utility for epidermal motion tracking, haptic interaction, and intraoperative monitoring.
comment: 13 pages, 5 figures
SaLF: Sparse Local Fields for Multi-Sensor Rendering in Real-Time ICRA 2026
High-fidelity sensor simulation of light-based sensors such as cameras and LiDARs is critical for safe and accurate autonomy testing. Neural radiance field (NeRF)-based methods that reconstruct sensor observations via ray-casting of implicit representations have demonstrated accurate simulation of driving scenes, but are slow to train and render, hampering scalability. 3D Gaussian Splatting (3DGS) has demonstrated faster training and rendering times through rasterization, but is primarily restricted to pinhole camera sensors, preventing usage for realistic multi-sensor autonomy evaluation. Moreover, both NeRF and 3DGS couple the representation with the rendering procedure (implicit networks for ray-based evaluation, particles for rasterization), preventing interoperability, which is key for general usage. In this work, we present Sparse Local Fields (SaLF), a novel volumetric representation that supports rasterization and raytracing for unified multi-sensor simulation. SaLF represents volumes as a sparse set of 3D voxel primitives, where each voxel is a local implicit field. SaLF has fast training (<30 min) and rendering capabilities (50+ FPS for camera and 600+ FPS for LiDAR), has adaptive pruning and densification to easily handle large scenes, and can support non-pinhole cameras and spinning LiDARs. We demonstrate that SaLF achieves realism similar to existing self-driving sensor simulation methods while improving efficiency and enhancing capabilities, enabling more scalable simulation.
comment: ICRA 2026. Project page: https://waabi.ai/salf/
Privileged Foresight Distillation: Zero-Cost Future Correction for World Action Models
World action models jointly predict future video and action during training, raising an open question about what role the future-prediction branch actually plays. A recent finding shows that this branch can be removed at inference with little to no loss on common manipulation benchmarks, suggesting that future information may act merely as a regularizer on the shared visual backbone. We propose instead that joint training induces an action-conditioned correction that privileged future observations impose on action denoising, and that current-only policies capture this correction only partially. Making the account precise, we formulate privileged foresight as a residual in the action-denoising direction -- the difference between what a model predicts given the true future and what it predicts given only the current frame -- and introduce Privileged Foresight Distillation (PFD), which transfers this residual from a training-time teacher into a small adapter on a current-only student. The teacher and student share the same backbone and differ only in the attention mask over video tokens; future video is never generated at inference. Controlled experiments verify that this gain reflects a genuine future-conditioned correction rather than a side effect of capacity or regularization. Empirically, PFD achieves consistent improvements on LIBERO and RoboTwin manipulation benchmarks while preserving the current-only inference interface at negligible added latency. This view reframes the role of future information in world action models: not as a target to predict, nor as a regularizer to absorb, but as a compressible correction to be distilled.
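One way to write down the residual described above, using notation assumed here rather than taken from the paper: let $a^{k}$ be the noisy action at denoising step $k$, $o_t$ the current frame, $o_{t+1:t+H}$ the privileged future frames, $\epsilon_\theta$ the shared backbone's action-denoising prediction, and $g_\phi$ the small adapter on the current-only student. The regression loss in the last line is an illustrative guess at how such a residual could be distilled, with $\mathrm{sg}[\cdot]$ denoting stop-gradient.

```latex
\begin{align}
r_\theta\bigl(a^{k}, k, o_t, o_{t+1:t+H}\bigr)
  &= \epsilon_\theta\bigl(a^{k}, k \mid o_t, o_{t+1:t+H}\bigr)
   - \epsilon_\theta\bigl(a^{k}, k \mid o_t\bigr), \\
\hat{\epsilon}_{\mathrm{student}}\bigl(a^{k}, k \mid o_t\bigr)
  &= \epsilon_\theta\bigl(a^{k}, k \mid o_t\bigr) + g_\phi\bigl(a^{k}, k, o_t\bigr), \\
\mathcal{L}_{\mathrm{PFD}}(\phi)
  &= \mathbb{E}\,\bigl\lVert g_\phi\bigl(a^{k}, k, o_t\bigr)
   - \mathrm{sg}\bigl[r_\theta\bigl(a^{k}, k, o_t, o_{t+1:t+H}\bigr)\bigr] \bigr\rVert_2^2 .
\end{align}
```

Under this reading the student never sees $o_{t+1:t+H}$: the adapter has to predict the correction from the current frame alone, which is why inference keeps the current-only interface.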
From Fold to Function: Simulation-Driven Design of Origami Mechanisms
Origami-inspired mechanisms can transform flat sheets into functional three-dimensional dynamic structures that are lightweight, compact, and capable of complex motion. These properties make origami increasingly valuable in robotic and deployable systems. However, accurately simulating their folding behavior and interactions with the environment remains challenging. To address this, we present a design framework for origami mechanism simulation that utilizes MuJoCo's deformable-body capabilities. In our approach, origami sheets are represented as graphs of interconnected deformable elements with user-specified constraints such as creases and actuation, defined through an intuitive graphical user interface (GUI). This framework allows users to generate physically consistent simulations that capture both the geometric structure of origami mechanisms and their interactions with external objects and surfaces. We demonstrate our method's utility through a case study on an origami catapult, where design parameters are optimized in simulation using the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) and validated experimentally on physical prototypes. The optimized structure achieves improved throwing performance, illustrating how our system enables rapid, simulation-driven origami design, optimization, and analysis.
comment: IEEE RoboSoft 2026 (8 Pages, 9 Figures)
Separation is Optimal for LQR under Intermittent Feedback
In this work, we first prove that the separation principle holds for communication-constrained LQR problems under i.i.d. zero-mean disturbances with a symmetric distribution. We then solve the dynamic programming problem and show that the optimal scheduling policy is a symmetric threshold rule on the accumulated disturbance since the most recent update, while the optimal controller is a discounted linear feedback law independent of the scheduling policy.
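The structure of the result, a symmetric threshold on the disturbance accumulated since the last update paired with a linear feedback law on the controller's estimate, can be illustrated with a scalar simulation. This is a sketch only: the system parameter `a`, the feedback `gain`, the `threshold`, and the uniform noise are hypothetical values chosen for illustration, not the optimal quantities derived in the paper.

```python
import random

def simulate(a=1.1, gain=0.9, threshold=1.0, steps=2000, seed=0):
    """Scalar plant x+ = a*x + u + w with threshold-triggered state updates.

    Returns (transmission rate, average quadratic cost).
    """
    rng = random.Random(seed)
    x = 0.0      # true state, known to the sensor
    x_hat = 0.0  # controller's estimate between updates
    transmissions = 0
    cost = 0.0
    for _ in range(steps):
        # x - x_hat equals the disturbance accumulated since the last update.
        if abs(x - x_hat) > threshold:
            x_hat = x  # symmetric threshold rule: transmit and reset
            transmissions += 1
        u = -gain * x_hat  # linear feedback on the estimate
        cost += x * x + u * u
        w = rng.uniform(-0.5, 0.5)  # zero-mean, symmetric disturbance
        x = a * x + u + w
        x_hat = a * x_hat + u  # open-loop prediction at the controller
    return transmissions / steps, cost / steps
```

Sweeping `threshold` trades communication against control cost: a threshold of zero transmits at nearly every step, while a very large one never transmits and leaves the plant in open loop.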
TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance
Fine-grained and contact-rich manipulation remain challenging for robots, largely due to the underutilization of tactile feedback. To address this, we introduce TouchGuide, a novel cross-policy visuo-tactile fusion paradigm that fuses modalities within a low-dimensional action space. Specifically, TouchGuide operates in two stages to guide a pre-trained diffusion or flow-matching visuomotor policy at inference time. First, the policy produces a coarse, visually-plausible action using only visual inputs during early sampling. Second, a task-specific Contact Physical Model (CPM) provides tactile guidance to steer and refine the action, ensuring it aligns with realistic physical contact conditions. Trained through contrastive learning on limited expert demonstrations, the CPM provides a tactile-informed feasibility score to steer the sampling process toward refined actions that satisfy physical contact constraints. Furthermore, to facilitate TouchGuide training with high-quality and cost-effective data, we introduce TacUMI, a data collection system. TacUMI achieves a favorable trade-off between precision and affordability; by leveraging rigid fingertips, it obtains direct tactile feedback, thereby enabling the collection of reliable tactile data. Extensive experiments on five challenging contact-rich tasks, such as shoe lacing and chip handover, show that TouchGuide consistently and significantly outperforms state-of-the-art visuo-tactile policies.
Learning to Act Through Contact: A Unified View of Multi-Task Robot Learning
We present a unified framework for multi-task locomotion and manipulation policy learning grounded in a contact-explicit representation. Instead of designing different policies for different tasks, our approach unifies the definition of a task through a sequence of contact goals--desired contact positions, timings, and active end-effectors. This enables leveraging the shared structure across diverse contact-rich tasks, leading to a single policy that can perform a wide range of tasks. In particular, we train a goal-conditioned reinforcement learning (RL) policy to realise given contact plans. We validate our framework on multiple robotic embodiments and tasks: a quadruped performing multiple gaits, a humanoid performing multiple biped and quadrupedal gaits, and a humanoid executing different bimanual object manipulation tasks. Each of these scenarios is controlled by a single policy trained to execute different tasks grounded in contacts, demonstrating versatile and robust behaviours across morphologically distinct systems. Our results show that explicit contact reasoning significantly improves generalisation to unseen scenarios, positioning contact-explicit policy learning as a promising foundation for scalable loco-manipulation. Video available at: https://youtu.be/idHx67oHHU0?si=qZJ7C0ujemXNWgA5
AutoSpatial: Visual-Language Reasoning for Social Robot Navigation through Efficient Spatial Reasoning Learning
We present AutoSpatial, an efficient method that uses structured spatial grounding to enhance VLMs' spatial reasoning. By combining minimal manual supervision with large-scale auto-labeling of Visual Question-Answering (VQA) pairs, our approach tackles the challenge of VLMs' limited spatial understanding in social navigation tasks. By applying a hierarchical two-round VQA strategy during training, AutoSpatial achieves both global and detailed understanding of scenarios, demonstrating more accurate spatial perception, movement prediction, Chain of Thought (CoT) reasoning, final action, and explanation compared to other SOTA approaches. These five components are essential for comprehensive social navigation reasoning. Our approach was evaluated using both expert systems (GPT-4o, Gemini 2.0 Flash, and Claude 3.5 Sonnet) that provided cross-validation scores and human evaluators who assigned relative rankings to compare model performances across four key aspects. Augmented by the enhanced spatial reasoning capabilities, AutoSpatial demonstrates substantial improvements in the averaged cross-validation scores from expert systems: perception & prediction (up to 10.71%), reasoning (up to 16.26%), action (up to 20.50%), and explanation (up to 18.73%) compared to baseline models trained only on manually annotated data.
Rhythm: Learning Interactive Whole-Body Control for Dual Humanoids
Realizing interactive whole-body control for multi-humanoid systems is critical for unlocking complex collaborative capabilities in shared environments. Although recent advancements have significantly enhanced the agility of individual robots, bridging the gap to physically coupled multi-humanoid interaction remains challenging, primarily due to severe kinematic mismatches and complex contact dynamics. To address this, we introduce Rhythm, the first unified framework enabling real-world deployment of dual-humanoid systems for complex, physically plausible interactions. Our framework integrates three core components: (1) an Interaction-Aware Motion Retargeting (IAMR) module that generates feasible humanoid interaction references from human data; (2) an Interaction-Guided Reinforcement Learning (IGRL) policy that masters coupled dynamics via graph-based rewards; and (3) a real-world deployment system that enables robust transfer of dual-humanoid interaction. Extensive experiments on physical Unitree G1 robots demonstrate that our framework achieves robust interactive whole-body control, successfully transferring diverse behaviors such as hugging and dancing from simulation to reality.
aerial-autonomy-stack -- a Faster-than-real-time, Autopilot-agnostic, ROS2 Framework to Simulate and Deploy Perception-based Drones
Unmanned aerial vehicles are rapidly transforming multiple applications, from agricultural and infrastructure monitoring to logistics and defense. Introducing greater autonomy to these systems can simultaneously make them more effective as well as reliable. Thus, the ability to rapidly engineer and deploy autonomous aerial systems has become of strategic importance. In the 2010s, a combination of high-performance compute, data, and open-source software led to the current deep learning and AI boom, unlocking decades of prior theoretical work. Robotics is on the cusp of a similar transformation. However, physical AI faces unique hurdles, often combined under the umbrella term "simulation-to-reality gap". These span from modeling shortcomings to the complexity of vertically integrating the highly heterogeneous hardware and software systems typically found in field robots. To address the latter, we introduce aerial-autonomy-stack, an open-source, end-to-end framework designed to streamline the pipeline from (GPU-accelerated) perception to (flight controller-based) action. Our stack allows the development of aerial autonomy using ROS2 and provides a common interface for two of the most popular autopilots: PX4 and ArduPilot. We show that it supports over 20x faster-than-real-time, end-to-end simulation of a complete development and deployment stack -- including edge compute and networking -- significantly compressing the build-test-release cycle of perception-based autonomy.
Need for Speed: Zero-Shot Depth Completion with Single-Step Diffusion
We introduce Marigold-SSD, a single-step, late-fusion depth completion framework that leverages strong diffusion priors while eliminating the costly test-time optimization typically associated with diffusion-based methods. By shifting computational burden from inference to finetuning, our approach enables efficient and robust 3D perception under real-world latency constraints. Marigold-SSD achieves significantly faster inference with a training cost of only 4.5 GPU days. We evaluate our method across four indoor and two outdoor benchmarks, demonstrating strong cross-domain generalization and zero-shot performance compared to existing depth completion approaches. Our approach significantly narrows the efficiency gap between diffusion-based and discriminative models. Finally, we challenge common evaluation protocols by analyzing performance under varying input sparsity levels. Page: https://dtu-pas.github.io/marigold-ssd/
NaviMaster: Learning a Unified Policy for GUI and Embodied Navigation Tasks ACL 2026
Recent advances in Graphical User Interface (GUI) and embodied navigation have driven progress, yet these domains have largely evolved in isolation, with disparate datasets and training paradigms. In this paper, we observe that both tasks can be formulated as Markov Decision Processes (MDP), suggesting a foundational principle for their unification. Hence, we present NaviMaster, the first unified agent that handles GUI navigation and embodied navigation within a single framework. Specifically, NaviMaster (i) proposes a visual-target trajectory collection pipeline that generates trajectories for both GUI and embodied tasks using a single formulation, (ii) employs a unified reinforcement learning framework on the mixed data to improve generalization, and (iii) designs a novel distance-aware reward to ensure efficient learning from the trajectories. Through extensive experiments on out-of-domain benchmarks, NaviMaster is shown to outperform state-of-the-art agents in GUI navigation, spatial affordance prediction, and embodied navigation. Ablation studies further demonstrate the efficacy of our unified training strategy, data mixing strategy, and reward design. Our code, data, and checkpoints are available at https://iron-boyy.github.io/navimaster-page/.
comment: ACL 2026 Main Camera Ready
STEP: Warm-Started Visuomotor Policies with Spatiotemporal Consistency Prediction ICML 2026
Diffusion policies have recently emerged as a powerful paradigm for visuomotor control in robotic manipulation due to their ability to model the distribution of action sequences and capture multimodality. However, iterative denoising leads to substantial inference latency, limiting control frequency in real-time closed-loop systems. Existing acceleration methods either reduce sampling steps, bypass diffusion through direct prediction, or reuse past actions, but often struggle to jointly preserve action quality and achieve consistently low latency. In this work, we propose STEP, a lightweight spatiotemporal consistency prediction mechanism to construct high-quality warm-start actions that are both distributionally close to the target action and temporally consistent, without compromising the generative capability of the original diffusion policy. Then, we propose a velocity-aware perturbation injection mechanism that adaptively modulates actuation excitation based on temporal action variation to prevent execution stall, especially for real-world tasks. We further provide a theoretical analysis showing that the proposed prediction induces a locally contractive mapping, ensuring convergence of action errors during diffusion refinement. We conduct extensive evaluations on nine simulated benchmarks and two real-world tasks. Notably, STEP with 2 steps can achieve an average 21.6% and 27.5% higher success rate than BRIDGER and DDIM on the RoboMimic benchmark and real-world tasks, respectively. These results demonstrate that STEP consistently advances the Pareto frontier of inference latency and success rate over existing methods. The code is publicly available at https://github.com/Kimho666/STEP.
comment: Accepted by ICML 2026
Robo3R: Enhancing Robotic Manipulation with Accurate Feed-Forward 3D Reconstruction
3D spatial perception is fundamental to generalizable robotic manipulation, yet obtaining reliable, high-quality 3D geometry remains challenging. Depth sensors suffer from noise and material sensitivity, while existing reconstruction models lack the precision and metric consistency required for physical interaction. We introduce Robo3R, a feed-forward, manipulation-ready 3D reconstruction model that predicts accurate, metric-scale scene geometry directly from RGB images and robot states in real time. Robo3R jointly infers scale-invariant local geometry and relative camera poses, which are unified into the scene representation in the canonical robot frame via a learned global similarity transformation. To meet the precision demands of manipulation, Robo3R employs a masked point head for sharp, fine-grained point clouds, and a keypoint-based Perspective-n-Point (PnP) formulation to refine camera extrinsics and global alignment. Trained on Robo3R-4M, a curated large-scale synthetic dataset with four million high-fidelity annotated frames, Robo3R consistently outperforms state-of-the-art reconstruction methods and depth sensors. Across downstream tasks including imitation learning, sim-to-real transfer, grasp synthesis, and collision-free motion planning, we observe consistent gains in performance, suggesting the promise of this alternative 3D sensing module for robotic manipulation.
comment: Published at Robotics: Science and Systems (RSS) 2026
Multiagent Systems
Distributed Algorithm with Emergent Area Partitioning and Base Station's Situation Awareness for Multi-Robot Patrolling
Patrolling with multiple robots offers efficient surveillance to detect and manage undesired situations. This necessitates improved patrol efficiency and operator situation awareness at base stations. Enhanced situation awareness enables operators to predict robots' behaviors, support recognition and decision-making, and execute emergency interventions. This study presents the Local Reactive and Partition (LR-PT) algorithm, a novel multi-robot patrolling approach. In simulations, LR-PT outperformed existing methods by ensuring frequent patrols of all locations of interest and enhancing the situation awareness of the base station. Robots independently select patrol targets based on locally available information, integrating patrol needs and the urgency of reporting mission progress to the base station into a unified utility function. This locality also contributes to robustness against communication constraints and robot failures, as demonstrated in this research. Furthermore, an area partition emerges autonomously from the algorithm, which helps the robots avoid local optima and achieve comprehensive patrol coverage of the whole mission area. The simulation results demonstrated the superior performance of LR-PT for multi-robot patrolling, utilizing the advantages of swarm robotics and addressing real-world operational challenges.
LLM-Foraging: Large Language Models for Decentralized Swarm Robot Foraging
Swarm foraging algorithms, such as the central-place foraging algorithm (CPFA), typically rely on offline parameter optimization using genetic algorithms (GA) or reinforcement learning, yielding policies tightly coupled to a specific combination of team size, arena size, and resource distribution. When deployment conditions change, performance degrades, and retraining is computationally expensive. We propose LLM-Foraging, a decentralized swarm controller that augments the CPFA state machine with a large language model (LLM) tactical decision-maker at three structured decision points, namely post-deposit, central-zone arrival, and search starvation. Each robot runs its own LLM client and queries it using only locally observable state, while the existing CPFA motion and sensing stack executes the selected action. Because the LLM serves as a general decision policy rather than parameters fitted to a single configuration, the controller is training-free at deployment and transfers across configurations without re-optimization. We evaluate LLM-Foraging in Gazebo with TurtleBot3 robots across 36 configurations spanning team sizes of 4 to 10 robots, arena sizes from 6x6 to 10x10 meters, and three resource distributions (clustered, power-law, random). LLM-Foraging collects more resources than the GA-tuned CPFA baseline across the evaluated configurations and performs more consistently, a consistency that the GA's single-configuration tuning does not retain when transferred to other configurations.
HepScript: A Dual-Use DSL for Human-AI Collaborative Data Analysis Workflows in High-Energy Physics
The escalating data scale in High-Energy Physics (HEP) fuels a growing aspiration for higher analytical efficiency. While Large Language Models (LLMs) offer a path toward automation via agentic AI, they struggle with complex scientific workflows that require deep domain knowledge and are tightly coupled to experiment-specific codebases. To address this, we introduce a methodology centered on HepScript, a dual-use Domain-Specific Language (DSL) for HEP data analysis workflows. HepScript serves as a shared formal interface, abstracting HEP analysis logic into a constrained syntax that is both intuitive for human experts and reliably generable by AI agents. First developed for the Beijing Spectrometer III (BESIII) experiment, HepScript hides the complexity of the underlying software stack, translating high-level analysis intent into low-level, production-ready code. In our case studies, this abstraction reduces the required human-written code by 93%. Crucially, HepScript's constrained grammar defines a tractable action space, enabling AI agents to autonomously generate executable specifications for core analysis stages directly from published literature with a 95% success rate. Our work demonstrates a scalable pathway toward human-AI collaborative systems, where a formally specified DSL acts as an unambiguous translation layer between human expertise, AI automation, and production environment, rendering previously intractable automation problems solvable.
rAIson: Developing Reliable Decision-Making Agents AAMAS 2026
This paper presents the rAIson platform, a high-level technological environment for the development of automated, reliable and explainable decision-making agents. The research underlying the platform and its technological progress has now reached a mature stage that allows the platform to be used for the development of complex real-life applications without writing a single line of code.
comment: Accepted as demonstration paper for publication at AAMAS 2026
Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic
Recent work has explored optimizing LLM collaboration through Multi-Agent Reinforcement Learning (MARL). However, most MARL fine-tuning approaches rely on predefined execution protocols, which often require centralized execution. Decentralized LLM collaboration is more appealing in practice, as agents can run inference in parallel with flexible deployments. Also, current approaches use Monte Carlo methods for fine-tuning, which suffer from high variance and thus require more samples to train effectively. Actor-critic methods are prevalent in MARL for dealing with these issues, so we developed Multi-Agent Actor-Critic (MAAC) methods to optimize decentralized LLM collaboration. In this paper, we analyze when and why these MAAC methods are beneficial. We propose 2 MAAC approaches, CoLLM-CC with a Centralized Critic and CoLLM-DC with Decentralized Critics. Our experiments across writing, coding, and game-playing domains show that Monte Carlo methods and CoLLM-DC can achieve performance comparable to CoLLM-CC in short-horizon and dense-reward settings. However, they both underperform CoLLM-CC on long-horizon or sparse-reward tasks, where Monte Carlo methods require substantially more samples and CoLLM-DC struggles to converge. Our code is available at https://github.com/OpenMLRL/CoMLRL/releases/tag/v1.3.6.
Separation is Optimal for LQR under Intermittent Feedback
In this work, we first prove that the separation principle holds for communication-constrained LQR problems under i.i.d. zero-mean disturbances with a symmetric distribution. We then solve the dynamic programming problem and show that the optimal scheduling policy is a symmetric threshold rule on the accumulated disturbance since the most recent update, while the optimal controller is a discounted linear feedback law independent of the scheduling policy.
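As a rough illustration of the threshold rule described above, the following sketch simulates a scalar system in which the sensor transmits only when the accumulated disturbance since the last update exceeds a symmetric threshold. The scalar dynamics, the feedback gain `gain`, the threshold `tau`, and the quadratic cost weights are illustrative assumptions rather than values from the paper, and the gain is not claimed to be the optimal one.

```python
import random


def simulate(a=1.1, gain=0.9, tau=1.0, steps=200, seed=0):
    """Scalar x[k+1] = a*x[k] + u[k] + w[k] with threshold-triggered updates.

    Returns (number of transmissions, average quadratic cost).
    All parameter values are hypothetical.
    """
    rng = random.Random(seed)
    x = 0.0      # true state, known to the sensor
    x_hat = 0.0  # controller's estimate, refreshed only on transmission
    sent = 0
    cost = 0.0
    for _ in range(steps):
        # The estimation error equals the disturbance accumulated
        # (and propagated through a) since the most recent update.
        err = x - x_hat
        if abs(err) > tau:  # symmetric threshold rule
            x_hat = x
            sent += 1
        u = -gain * x_hat  # linear feedback on the estimate
        cost += x * x + u * u
        w = rng.gauss(0.0, 1.0)  # i.i.d., zero-mean, symmetric disturbance
        x = a * x + u + w
        x_hat = a * x_hat + u  # estimate propagates without the disturbance
    return sent, cost / steps


if __name__ == "__main__":
    for tau in (0.5, 1.0, 2.0):
        sent, avg_cost = simulate(tau=tau)
        print(f"tau={tau}: transmissions={sent}, average cost={avg_cost:.2f}")
```

Sweeping `tau` in this toy setup shows the trade-off between communication rate and control cost that the scheduling policy has to balance.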
AI-Gram: When Visual Agents Interact in a Social Network
We present AI-Gram, a fully deployed, continuously operating social platform where every participant is an autonomous LLM-driven agent generating and responding to visual content. Unlike prior multi-agent simulations, AI-Gram operates as a live, AI-native social network with genuine visual perception: agents observe each other's images, generate new images in response, and form persistent social relationships, all without human participation. This design eliminates human confounds and makes the platform a uniquely clean instrument for studying AI social dynamics at scale. Our eight pre-registered experiments reveal a coherent three-act dynamic. Act I (Chain Formation): Agents spontaneously form image-to-image visual reply chains, i.e., multi-hop visual conversations that emerge without any explicit coordination, alongside social ties driven by personality rather than aesthetic similarity. Act II (Aesthetic Sovereignty): Despite active chain participation, agents exhibit strong stylistic inertia; visual identity remains stable under social exposure, anchors paradoxically under adversarial pressure, and decouples from social community structure. Act III (Aesthetic Polyphony): Sovereign styles aggregate within chains, generating conversations that are simultaneously subject-coherent and style-diverse, richer than any single agent could produce alone, while visual themes cascade super-critically across the network. We release AI-Gram as a publicly accessible, continuously evolving platform at https://ai-gram.ai/
Social Dynamics as Critical Vulnerabilities that Undermine Objective Decision-Making in LLM Collectives ACL 2026
Large language model (LLM) agents are increasingly acting as human delegates in multi-agent environments, where a representative agent integrates diverse peer perspectives to make a final decision. Drawing inspiration from social psychology, we investigate how the reliability of this representative agent is undermined by the social context of its network. We define four key phenomena (social conformity, perceived expertise, dominant speaker effect, and rhetorical persuasion) and systematically manipulate the number of adversaries, relative intelligence, argument length, and argumentative styles. Our experiments demonstrate that the representative agent's accuracy consistently declines as social pressure increases: larger adversarial groups, more capable peers, and longer arguments all lead to significant performance degradation. Furthermore, rhetorical strategies emphasizing credibility or logic can further sway the agent's judgment, depending on the context. These findings reveal that multi-agent systems are sensitive not only to individual reasoning but also to the social dynamics of their configuration, highlighting critical vulnerabilities in AI delegates that mirror the psychological biases observed in human group decision-making.
comment: ACL 2026
Systems and Control (EESS)
Hybrid Optimal Control of Homogeneous Epidemiological Compartmental Models with Regime Switching
Optimal intervention design is formulated as a hybrid optimal control problem for multiphase homogeneous epidemiological systems. The system extends a foundational compartmental model through intermediate phases that incorporate work-from-home (WFH) policies and a vaccination protocol, yielding a four-phase hybrid system that captures policy escalation and relaxation. Key characteristics of the resulting hybrid system include (i) phase-dependent continuous dynamics and running costs that respectively capture distinct disease transmission mechanisms and shifting public health socioeconomic trade-offs, (ii) a combination of autonomous and controlled switchings for intervention policies, whose times are co-optimized - whether indirectly via state thresholds or directly as decision variables alongside continuous inputs to minimize the overall cost, and (iii) nontrivial state jump maps that govern transitions between phases with differing state and control space dimensions. The Hybrid Minimum Principle (HMP) is invoked to obtain the optimal solutions. Numerical results demonstrate that coordinating WFH policies with vaccination efforts provides improved mitigation of disease spread compared to single-phase policy interventions.
comment: 14 pages, 7 figures, Preprint submitted to Elsevier
A Measure-Theoretic Formulation of Behavioral Systems
In Willems' behavioral systems theory, a dynamical system is identified with the set of all trajectories compatible with its laws of motion. For nonlinear or stochastic systems, however, the admissible trajectory set is generally nonconvex, obstructing direct optimization over the behavior. In this paper, we lift the behavioral viewpoint from trajectories to probability measures on trajectories by representing a finite-horizon dynamical system with the set of all Borel probability measures supported on its admissible trajectories. This behavioral-measure set is convex and weakly closed even for nonlinear or stochastic dynamics, because convex combinations of trajectory distributions remain dynamically admissible even when convex combinations of trajectories do not. The extreme points are precisely the Dirac masses on individual admissible trajectories, so the classical deterministic theory is embedded as the extremal skeleton of the richer measure-valued object. On this foundation we establish two core deterministic results and outline a stochastic extension based on conditional kernel consistency. First, optimal control for a prescribed initial distribution becomes a linear program over occupation measures whose dual is exactly Bellman's dynamic-programming recursion, with strong duality under compactness and continuity. Second, for controllable linear time-invariant systems under persistency of excitation, we prove a measure-level Fundamental Lemma: every probability measure on the finite-horizon behavior factors through the data Hankel matrix, reducing any optimization over trajectory distributions to an equivalent optimization over coefficient-space distributions. This is an exact data-driven reformulation requiring no model knowledge beyond a single informative trajectory; the classical Fundamental Lemma is recovered as the special case of Dirac measures.
comment: 15 pages, 3 figures, 1 table
A Universal Optimal Control Strategy for a Tailsitter UAV
This work develops a unified optimal control framework for a Quadrotor Biplane tailsitter UAV capable of operating seamlessly across hover, transition, and cruise flight regimes. Although the tailsitter configuration enables mechanically simple mode switching, the transition maneuver remains challenging due to strong nonlinearities and rapidly varying aerodynamics. To address this, a trajectory optimization scheme based on nonlinear programming with direct collocation is formulated, incorporating nonlinear dynamics, actuator limits, and angle-of-attack constraints. The resulting optimal trajectories are safe, reliable, and time-efficient. For the cruise-to-hover maneuver, optimal trajectories are generated over a range of initial cruise velocities and subsequently learned using feedforward multilayer neural networks. The learned model generalizes across operating conditions and enables real-time generation of constraint-satisfying transition trajectories. These trajectories provide both feedforward control inputs and reference state profiles, which are tracked using a Model Predictive Controller (MPC). The MPC eliminates the need for controller switching or gain scheduling across flight envelopes, enabling a single universal controller for hover, transition, and cruise. A nonlinear Dynamic Inversion (DI) controller is also designed for comparison. Two numerical schemes for MPC are implemented and evaluated. Simulation results across all flight modes demonstrate that MPC achieves superior robustness to parameter uncertainties compared to DI. A computational cost analysis further highlights the trade-off between execution time and performance for the different MPC solvers.
Physics Driven Digital Twin Model for Evaluation of GNSS User Receiver Equipment
This paper presents a physics-consistent digital twin framework for end-to-end modeling and evaluation of Global Navigation Satellite Systems (GNSS) user receiver equipment. In contrast to conventional GNSS simulations that rely on predefined signal models, the proposed framework enforces dynamic consistency between satellite ephemerides, user motion, and received signal observables through trajectory-driven injection of code-phase and Doppler dynamics. The GPS L1 C/A signal is synthesized in accordance with the IS-GPS-200 Rev. N specification, with motion-induced effects derived directly from orbital and user kinematics, and augmented by ionospheric and tropospheric delay models. The resulting complex baseband signal is converted to radio frequency using a software-defined radio platform disciplined by an external reference clock, enabling seamless hardware-in-the-loop integration with commercial and software receivers. Validation across static, moderate-motion, and high-dynamics scenarios, including projectile-like trajectories, demonstrates close agreement between truth-model and receiver-estimated code phase, Doppler, and position, as well as strong correspondence between simulated and measured intermediate frequency spectra. The results establish the proposed digital twin as a high-fidelity, repeatable, and physically consistent platform for GNSS receiver evaluation, tracking-loop stress testing, and development of robust navigation algorithms.
Recommender Systems as Control Systems
We propose a control-theoretic interpretation of recommender systems and use this perspective to analyze how fairness interventions shape long-term system behavior. Fairness concerns arise for both users and creators, ranging from opinion polarization and representation bias on the user side to popularity bias on the creator side. A central insight of our analysis is that fairness should not be viewed as a simple trade-off against utility. When optimized over time, it can in fact be beneficial for overall system performance. Realizing these gains, however, requires a clear understanding of the underlying dynamics.
Point-to-Cloud NMPC with Smooth Avoidance Constraints
This paper proposes a finite-horizon optimal control strategy for set-point tracking using a nonlinear model predictive control framework with integrated avoidance capabilities. The formulation employs a smooth point-to-cloud distance metric that ensures continuously differentiable and numerically well-conditioned gradients, even in the presence of regions with complex and nonconvex geometries. This smoothness allows safety constraints to be formulated consistently and differentiably through control barrier functions, resulting in a reliable avoidance behavior for the closed-loop system. Additionally, stationary artificial variables are introduced in the optimal control problem to preserve feasibility under changing set-points. The proposed approach is validated through numerical experiments of an aerial robot, demonstrating accurate tracking and smooth obstacle avoidance in complex environments.
comment: Accepted for publication at the 2026 European Control Conference (ECC 2026)
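The abstract does not state which smooth point-to-cloud metric is used. As one common way to obtain a continuously differentiable surrogate, the sketch below applies a log-sum-exp soft-minimum to squared distances. This is a swapped-in technique chosen for illustration, not the paper's formulation; the function name and the `sharpness` parameter are hypothetical.

```python
import math


def smooth_sq_distance(p, cloud, sharpness=20.0):
    """Soft-min of squared distances from point p to every point in cloud.

    Computes -(1/k) * log(sum_i exp(-k * ||p - c_i||^2)) with k = sharpness.
    The value lies between min_i ||p - c_i||^2 - log(N)/k and the exact minimum.
    """
    sq = [sum((pj - cj) ** 2 for pj, cj in zip(p, c)) for c in cloud]
    m = min(sq)  # shift the exponents for numerical stability
    total = sum(math.exp(-sharpness * (s - m)) for s in sq)
    return m - math.log(total) / sharpness


if __name__ == "__main__":
    cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
    p = (0.5, 0.5)
    print("smooth squared distance:", smooth_sq_distance(p, cloud))
```

A larger `sharpness` tightens the approximation to the true minimum at the price of steeper gradients, so in a setup like this it would be a tuning parameter.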
Quasi-Static Control of Discrete Cosserat Rod
In this paper, we design feedback control laws for soft robots modelled using the Cosserat rod, which is spatially discretised using the Piecewise Constant Strain (PCS) approach. The PCS approach transforms the nonlinear PDEs describing the Cosserat rod to a system of nonlinear ODEs. This simplification results in a model describing soft robots which is similar to the serial rigid-link manipulators. We design feedback control laws for the quasi-static PCS model by using the external end-effector wrench as control input. The control laws are designed based on state-feedback linearisation in strain and task spaces. An extensive set of numerical results demonstrates the performance of the control laws for end-effector trajectory tracking and shape control of soft robots.
comment: Submitted to 17th APCA International Conference on Automatic Control and Soft Computing (CONTROLO 2026)
From Characterization To Construction: Generative Quantum Circuit Synthesis from Gate Set Tomography Data
High-fidelity circuit execution on noisy intermediate-scale quantum devices is bottlenecked by compilation pipelines that disregard complex, correlated noise. To address this, this methodology article proposes a quantum machine learning control (QMLC) framework for generative quantum circuit synthesis from gate-set tomography (GST) data that bypasses the traditional two-step pipeline of characterizing native quantum gates via GST followed by unitary decomposition algorithms. Instead, a generative concept space is directly learnt from GST data, enabling conditional synthesis of quantum circuits on a desired output distribution. Our approach tokenizes GST germ circuits and embeds them into a structured latent space using a curriculum-learning-motivated strategy, starting with short circuits and progressively incorporating longer ones with diverse output statistics. The embedded sequences are processed by a set-vision transformer with permutation-invariant pooling, producing k-seed vectors that represent the learned concept space of the quantum device. Aggregating data across multiple circuits makes this latent representation inherently context-aware, capturing the shared physical noise environment (e.g., crosstalk, drift) that isolated gate metrics miss. We propose an unconditional diffusion model to sample from the concept space. During inference, a user provides a target measurement distribution, and the model generates a corresponding circuit. To ensure fidelity and robustness, the output is denoised using a diffusion model that operates on the target conditional covariance matrix. This end-to-end framework is a step towards context-aware, hardware-native circuit synthesis directly from raw GST data, which offers a new paradigm for integrating quantum control and compilation. The QMLC framework is particularly suited for near-term quantum devices with complex calibration procedures.
comment: 19 pages, 3 figures
Toward a foundational thermal model for residential buildings
The building energy community lacks a foundational thermal model, i.e., a single pretrained model capable of generalizing across diverse buildings, climates, and control strategies without building-specific calibration. Achieving this vision requires architectural principles that capture universal thermal dynamics rather than memorizing building-specific patterns. We take a step toward this goal by presenting a physics-informed transformer architecture that embeds domain knowledge, e.g., derivative enrichment and Euler-based numerical integration, into a decoder-only framework. We incorporate static building features extracted from simulation models and employ Rotary Position Embedding attention to capture temporal dependencies. Evaluated on the CityLearn dataset spanning 247 residential buildings across three climate zones, our model achieves one-step prediction accuracy (RMSE of 0.30°C in Texas, 0.29°C in Vermont) while outperforming both traditional baselines and fine-tuned Time-Series Foundation Models. We also demonstrate zero-shot transferability: models trained on as few as two buildings generalize to unseen buildings and climate zones without fine-tuning. Despite the limitation of simulated residential buildings, our results establish physics-informed architectural principles as a promising foundation for universal building thermal models.
Coordination Architecture Shapes Continuous Demand Response Outcomes in Building Districts
Grid-integrated building districts must provide energy flexibility while preserving occupant comfort and equitable distribution of control burden. We study how coordination architecture influences the ability of building clusters to track aggregated load profiles, comparing four paradigms: centralized model predictive control (MPC), decentralized independent reinforcement learning (SAC), centralized-training-decentralized-execution multi-agent RL (MAPPO), and a hybrid MPC--SAC controller that separates district-level battery optimization from building-level HVAC regulation. A rule-based controller serves as a baseline. We evaluate a 25-building residential district across three metrics: aggregate load tracking, thermal comfort, and spatial variability of control actions. We find that architecture choice determines the trade-off structure. Centralized MPC achieves low tracking bias (8.8% NMBE) but concentrates actuation on a subset of buildings, causing elevated comfort violations (24.8% exceedance) and spatial imbalance. Decentralized RL distributes control effort more evenly but fails to sustain accurate tracking. The hybrid architecture achieves the best balance: accurate tracking (4.8% NMBE), moderate comfort impact (16.8% exceedance), and the lowest spatial variability. These findings demonstrate that architecture choice determines the trade-off structure between tracking and comfort.
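For readers unfamiliar with the tracking metric, the sketch below computes a normalized mean bias error (NMBE) between a target aggregate load profile and the realized one. It assumes the common definition that normalizes the summed error by the number of samples times the mean target; the abstract does not give the exact normalization or sign convention, and the load values are made up for illustration.

```python
def nmbe_percent(target, actual):
    """Normalized mean bias error in percent.

    Assumed definition: 100 * sum(actual - target) / (n * mean(target)).
    """
    n = len(target)
    mean_target = sum(target) / n
    bias = sum(a - t for t, a in zip(target, actual))
    return 100.0 * bias / (n * mean_target)


if __name__ == "__main__":
    target_kw = [100.0, 100.0, 100.0, 100.0]  # hypothetical district target
    actual_kw = [105.0, 110.0, 100.0, 104.0]  # hypothetical realized load
    print(f"NMBE = {nmbe_percent(target_kw, actual_kw):.2f}%")
```

Because positive and negative errors cancel in this metric, a low NMBE indicates low bias rather than low pointwise error.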
Sequentially decoupling estimators for Box-Jenkins model estimation
In this paper, we propose a consistent and asymptotically efficient estimation method for Box-Jenkins (BJ) models that is applicable under both open-loop and closed-loop data conditions, serving as a possible alternative to the weighted null-space fitting approach. The method comprises two stages: an initial sequentially decoupling (SD) estimator, followed by a Gauss-Newton (GN) refinement step. The SD estimator is constructed from three sequential least squares (LS) estimators: (i) estimation of a high-order autoregressive with exogenous inputs (ARX) model; (ii) estimation of the BJ model's dynamic model via an auxiliary output-error (OE) model; and (iii) estimation of the noise model of the BJ model using another auxiliary OE model. We establish the consistency of the SD estimator under standard regularity conditions, leveraging the consistency of the underlying LS estimators for both the ARX and OE models. Moreover, we show that a one-step GN iteration starting from the SD estimator yields an estimator that is asymptotically equivalent to the prediction error method, provided the ARX model order satisfies a mild growth condition. Simulation studies confirm the theoretical properties of the proposed method.
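To make the least-squares building block concrete, the sketch below fits a first-order ARX model, the simplest instance of the kind of LS problem solved in step (i). The paper uses a high-order ARX model and two further OE-based steps that are not reproduced here; the model order, parameter values, and function names are illustrative assumptions.

```python
import random


def simulate_arx1(a, b, u):
    """Noise-free first-order ARX data: y[k] = -a*y[k-1] + b*u[k-1]."""
    y = [0.0]
    for k in range(1, len(u)):
        y.append(-a * y[k - 1] + b * u[k - 1])
    return y


def fit_arx1(u, y):
    """Least-squares estimate of (a, b) via the 2x2 normal equations."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for k in range(1, len(y)):
        p1, p2 = -y[k - 1], u[k - 1]  # regressor for sample k
        s11 += p1 * p1
        s12 += p1 * p2
        s22 += p2 * p2
        r1 += p1 * y[k]
        r2 += p2 * y[k]
    det = s11 * s22 - s12 * s12
    a_hat = (s22 * r1 - s12 * r2) / det
    b_hat = (s11 * r2 - s12 * r1) / det
    return a_hat, b_hat


if __name__ == "__main__":
    rng = random.Random(0)
    u = [rng.uniform(-1.0, 1.0) for _ in range(50)]  # exciting input
    y = simulate_arx1(-0.5, 2.0, u)
    print(fit_arx1(u, y))
```

With noise-free data the estimate recovers the true parameters up to rounding; with colored noise a low-order ARX fit is generally biased, which is one motivation for using a high-order ARX model as the first stage.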
Unified Lyapunov Method for ISS of PDEs: A Tutorial on Constructing Generalized Lyapunov Functionals for Parabolic and Hyperbolic Equations
This tutorial provides an overview of the generalized Lyapunov method (GLM) for analyzing input-to-state stability (ISS) of partial differential equations (PDEs). We begin by revisiting the classical Lyapunov method and the standard ISS-Lyapunov theorem, highlighting their limitations when applied to systems with complex boundary disturbances. In contrast, the GLM, based on the concept of generalized Lyapunov functionals (GLFs) that explicitly depend on the external input, offers greater flexibility and efficiency, particularly for PDEs with Dirichlet-type disturbances. The main objective of this tutorial is to demonstrate how to systematically construct GLFs to establish ISS estimates in $L^q$ spaces with any $q\in[2,\infty]$ for different PDEs. Specifically, we consider three representative classes of PDEs: (i) an $N$-dimensional nonlinear parabolic equation with mixed nonlinear boundary disturbances, (ii) a first order nonlinear hyperbolic equation with boundary disturbances, and (iii) a second order linear hyperbolic equation, i.e., a wave equation, with boundary damping and disturbances. For each case, we provide step-by-step constructions of appropriate GLFs and derive explicit ISS estimates, illustrating the general applicability of the GLM. Finally, we discuss open challenges and future directions, including the systematic construction of GLFs for broader classes of PDEs and their applications in controller design.
Toward LEO Satellite Network Systems for Instantaneous Detection of Environmental Changes
The rapid deployment of Low Earth Orbit (LEO) satellite constellations has enabled the emergence of in-orbit edge computing and data centers, that is, interconnected satellites equipped with onboard computing capabilities and high-speed inter-satellite links (ISLs). This paper investigates whether such architectures, integrated with a deep learning-based computer vision pipeline, can achieve sub-minute information freshness suitable for real-time wildfire detection. To evaluate this hypothesis, we develop a simulation framework that models orbital dynamics, distributed processing, and network routing, using Age of Information (AoI) as the primary performance metric. A total of 720 simulation trials are conducted across 12 real-world constellation configurations, including Starlink, Kuiper, Telesat, and OneWeb. The results demonstrate that constellation design has a significant impact on AoI performance, with average AoI values ranging from 66.5 s to over 6300 s. The best-performing configurations achieve an average AoI below 70 s and a peak AoI under 100 s, indicating that orbital edge computing systems can provide the level of timeliness required for near-instantaneous environmental monitoring.
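For readers unfamiliar with the metric, Age of Information at time t is the time elapsed since the freshest delivered update was generated, which gives a sawtooth curve. The sketch below computes the time-average and peak AoI from a list of (generation time, delivery time) pairs. It is an illustrative calculation only; the function name, the input format, and the assumption that updates arrive in generation order are not taken from the paper's simulator.

```python
def aoi_stats(updates, t_end):
    """Return (time-average AoI, peak AoI) in seconds.

    updates: list of (generated_at, delivered_at) pairs, in seconds.
    The window runs from the first delivery to t_end.
    Assumes each delivered update is fresher than the previous one.
    """
    if not updates:
        raise ValueError("at least one update is required")
    updates = sorted(updates, key=lambda u: u[1])  # order by delivery time
    first_delivery = updates[0][1]
    if t_end <= first_delivery:
        raise ValueError("t_end must be later than the first delivery")

    area = 0.0  # integral of the age curve over the window
    peak = 0.0  # largest age reached just before a refresh (or at t_end)
    for i, (generated_at, delivered_at) in enumerate(updates):
        segment_end = updates[i + 1][1] if i + 1 < len(updates) else t_end
        age_at_start = delivered_at - generated_at
        age_at_end = segment_end - generated_at
        # Age grows linearly between deliveries, so each segment is a trapezoid.
        area += 0.5 * (age_at_start + age_at_end) * (segment_end - delivered_at)
        peak = max(peak, age_at_end)
    return area / (t_end - first_delivery), peak


if __name__ == "__main__":
    # Hypothetical timeline: an image every 20 s, delivered 10 s after capture.
    average, peak = aoi_stats([(0.0, 10.0), (20.0, 30.0), (40.0, 50.0)], t_end=70.0)
    print(average, peak)
```

In this toy timeline the age oscillates between the 10 s delivery delay and 30 s, which shows how both the update interval and the processing-plus-routing delay contribute to the reported averages.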
In-Orbit Optical SSA Using Proliferated LEO Satellites for Space Traffic Monitoring: An Analytical Framework
The growth in space activities has increased the risk of space debris generation, affecting space safety and sustainability. Traditional space situational awareness (SSA) relies on single star trackers and ground-based tracking facilities. There is limited discussion on the use of in-orbit optical sensors on low Earth orbit (LEO) satellite constellations for SSA, despite their importance for efficient space traffic management systems. In this paper, we aim to address this important challenge. We first present a new analytical system model for utilizing LEO satellite constellations for in-orbit SSA. We then develop a method to evaluate and analyze such a system. We also propose a Poisson expected revisit period algorithm and introduce the period of equivalent orbital distributions to reveal the relationship between revisit period and geometric variables, with insightful results based on real-world and custom satellite constellations. Experiments on a real-world constellation show that the representative Poisson expected revisit period ranges from 0.4 days to 5.7 days for targets whose apogee altitude ranges from 552 km to 650 km, while requiring a per-case computation time of 0.4 s to 4.8 s. Our work can inform the future design of in-orbit and onboard computing systems for SSA, such as space object detection and space traffic monitoring systems.
Limiting the Impact of AI Data Centers on Fatigue Life of Thermal Turbine Generators in the Grid: A Frequency-Domain Approach
A framework is established that assesses the impact of variations in artificial intelligence (AI) data center (DC) loads on the fatigue damage of steam/gas turbines of the synchronous generators (SGs) from torsional oscillations. Next, a simple three-step process that is supported by frequency-domain analysis is laid out to quantify the limits on fluctuations in AI DC loads. In the first step, the maximum allowable variation in electrical power output at each SG terminal is independently determined from first principles. This step needs only a lumped multi-mass model of the mechanical side of the SG. In the second step, we propose a new approach that relies on load flow to determine the so-called algebraic 'interaction factor' that maps the change in AI DC load at a given bus to the corresponding change in each of the SG power outputs. In the third step, we propose a screening method to rank the candidate buses to site AI DCs and solve an optimization problem to determine the optimal allowable fluctuations in the AI DCs. We demonstrate the applicability of the proposed approach through frequency-domain and time-domain analyses in the modified IEEE 4-machine and IEEE-68 bus systems using a dynamic phasor framework. Finally, we demonstrate the scalability of the proposed approach on the synthetic 2000-bus Texas system.
DRL-based Power Allocation in LiDAL-Assisted RLNC-NOMA OWC Systems
Non-orthogonal multiple access (NOMA) is a promising technique for optical wireless communication (OWC), enabling multiple users to share the optical spectrum simultaneously through the power domain. However, imperfect channel state information (CSI) and residual decoding errors deteriorate NOMA performance, especially in realistic dense-user indoor scenarios. In this work, we model an OWC system that integrates light detection and localization (LiDAL) and random linear network coding (RLNC) within a NOMA framework. LiDAL exploits spatio-temporal information to improve user CSI, while RLNC enhances data resilience in the successive decoding process, resulting in a LiDAL-assisted RLNC-NOMA OWC system. Power allocation (PA) is crucial in this system due to complex interactions between multiple users and the coding and detection processes, but optimizing continuous PA dynamically can be computationally prohibitive. To address this, we adopt a deep reinforcement learning (DRL) framework to efficiently learn near-optimal PA strategies. In particular, a DRL-based normalized advantage function (NAF) algorithm is proposed to maximize the average sum rate, and its performance is compared to deep deterministic policy gradient (DDPG), gain ratio PA (GRPA), and exhaustive search. The results indicate that NAF closely matches exhaustive search, is 39% faster than DDPG, and improves the average sum rate by 4.6% over GRPA, while accounting for user location estimation errors.
A Stochastic Fundamental Lemma with Reduced Disturbance Data Requirements
Recently, the fundamental lemma by Willems et al. has been extended towards stochastic LTI systems subject to process disturbances. Using this lemma requires previously recorded data of inputs, outputs, and disturbances. In this paper, we exploit causality concepts of stochastic control to propose a variant of the stochastic fundamental lemma that does not require past disturbance data in the Hankel matrices. Our developments rely on polynomial chaos expansions and on the knowledge of the disturbance distribution. Similar to our previous results, the proposed variant of the fundamental lemma allows the prediction of future input-output trajectories of stochastic LTI systems. We draw upon a numerical example to illustrate the proposed variant in a data-driven control context.
Separation is Optimal for LQR under Intermittent Feedback
In this work, we first prove that the separation principle holds for communication-constrained LQR problems under i.i.d. zero-mean disturbances with a symmetric distribution. We then solve the dynamic programming problem and show that the optimal scheduling policy is a symmetric threshold rule on the accumulated disturbance since the most recent update, while the optimal controller is a discounted linear feedback law independent of the scheduling policy.
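The shape of a symmetric threshold rule can be illustrated with a short sketch. Everything below is an assumption made for illustration: a scalar system, an accumulated disturbance that propagates with gain `a` between updates and resets to zero after a transmission, and a fixed `threshold`. The paper's actual accumulation dynamics, reset rule, and threshold value are not given in the abstract.

```python
def schedule_updates(disturbances, threshold, a=1.0):
    """Decide at each step whether to transmit an update.

    Transmit when the magnitude of the disturbance accumulated since the
    most recent update exceeds `threshold` (symmetric in sign).
    `a` is a hypothetical scalar open-loop gain used to propagate the
    accumulated disturbance between updates.
    """
    decisions = []
    accumulated = 0.0
    for w in disturbances:
        accumulated = a * accumulated + w
        send = abs(accumulated) > threshold
        decisions.append(send)
        if send:
            accumulated = 0.0  # assumed reset after a successful update
    return decisions


if __name__ == "__main__":
    # Hypothetical disturbance sequence, for illustration only.
    print(schedule_updates([0.5, 0.4, 0.3, -2.0, 0.1], threshold=1.0))
```

The rule depends only on the magnitude of the accumulated disturbance, so positive and negative excursions of the same size trigger a transmission equally.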
Feedback control of Lagrange multipliers for non-smooth constrained optimization
In this work, we develop a control-theoretic framework for constrained optimization problems with composite objective functions including non-differentiable terms. Building on the proximal augmented Lagrangian formulation, we construct a plant whose equilibria correspond to the stationary points of the optimization problem. Within this framework, we propose two control strategies - a static controller and a dynamic controller - leading to two novel optimization algorithms. We provide a theoretical analysis, establishing global exponential convergence under strong convexity assumptions. Finally, we demonstrate the effectiveness of the proposed methods through numerical experiments, benchmarking their performance against state-of-the-art approaches.
Stability and Sensitivity Analysis for Objective Misspecifications Among Model Predictive Game Controllers
Model-based multi-agent control requires agents to possess a model of the behavior of others to make strategic decisions. Solution concepts from game theory are often used to model the emergent collective behavior of self-interested agents and have found active use in multi-agent control design. Model predictive games are a class of controllers in which an agent iteratively solves a finite-horizon game to predict the behavior of a multi-agent system and synthesize their own control action. When multiple agents implement these types of controllers, there may exist misspecifications in the respective game models embedded in their controllers, stemming from inaccurate estimates or conjectures of other agents' objectives. This paper analyzes the resulting prediction misalignments and their effects on the system's behavior. We provide criteria for the stability of multi-agent dynamic systems with heterogeneous model predictive game controllers, and quantify the sensitivity of the equilibria to individual agents' game parameters.
A two-disk approach to the synthesis of coherent passive equalizers for linear quantum systems
The coherent equalization problem consists in designing a quantum system acting as a mean-square near-optimal filter for a given quantum communication channel. The paper develops an improved method for the synthesis of transfer functions for such equalizing filters, based on a linear quantum system model of the channel. The method draws on a connection with the two-disk problem of ${H}_{\infty}$ control for classical (i.e., non-quantum) linear uncertain systems. Compared with the previous methods, the proposed method applies to a broader class of linear quantum communication channels.
comment: 20 pages, 9 figures
Green Energy Management for Sustainable Data Centers Using Deep Reinforcement Learning
The exponential growth of digital services has positioned data centers among the most energy-intensive infrastructures in the modern economy, raising critical concerns regarding operational costs, carbon emissions, and the sustainable integration of renewable energy sources. This paper proposes a novel Deep Reinforcement Learning (DRL)-based energy management framework for data centers, designed to dynamically coordinate solar photovoltaic generation, wind power, battery storage systems, and conventional grid electricity under highly stochastic operational conditions. The proposed framework formulates the energy management problem as a Markov Decision Process and employs a Proximal Policy Optimization (PPO) agent augmented with a hybrid Long Short-Term Memory and temporal attention architecture, enabling accurate modeling of workload dynamics and renewable generation variability. A multi-objective reward function jointly minimizes energy costs, carbon emissions, and service-level agreement (SLA) violations while promoting efficient storage utilization. Extensive experiments conducted on three datasets demonstrate that the proposed framework achieves a 38% reduction in energy costs compared to rule-based heuristics and outperforms the strongest DRL baseline by 4.6%, while maintaining an SLA violation rate as low as 1.5% and an energy efficiency of 83.7%. Ablation studies confirm the individual contribution of each architectural component, and hyperparameter sensitivity analysis validates the robustness of the approach across a range of configurations.
Robotics
Learning to Race in Minutes: Infoprop Dyna on the Mini Wheelbot
Reinforcement Learning (RL) has the potential to enable robots with fast, nonlinear, and unstable dynamics to reach the limits of their performance. However, most recent advances rely on carefully designed physics-based simulators and domain randomization to achieve successful sim-to-real transfer within reasonable wall-clock time. In this work, we bypass the need for such simulators and demonstrate that Infoprop Dyna, a state-of-the-art uncertainty-aware model-based reinforcement learning (MBRL) framework, can enable robots to learn directly from real-world interactions. Using Infoprop Dyna, the Mini Wheelbot, an underactuated unicycle robot, learns to race around a track within 11 minutes of real-world experience.
comment: Originally submitted to the German Robotics Conference, 2026
Online Safety Filter for Deformable Object Manipulation with Horizon Agnostic Neural Operators
Safety critical control of robotic manipulation tasks involving deformable media such as fluids, cloth, and soft objects remains challenging because existing learning based approaches encode safety indirectly through reward shaping, which provides no guarantee of constraint satisfaction at deployment. We present a constraint driven online safety filter for deformable object manipulation that enforces explicit task level safety constraints in real time by minimally modifying any nominal control policy. Our approach combines two key components: a horizon agnostic neural operator that learns the boundary input output mapping of the underlying PDE dynamics and generalizes across variable rollout lengths without retraining, and a boundary control barrier function that certifies safety at the task relevant output level via a lightweight quadratic program. The resulting safety constraint is affine in the boundary input rate, enabling real time online filtering. We evaluate the proposed method on fluid manipulation tasks in FluidLab, where the filter improves safe trajectory rates by up to 22% over unfiltered base policies while also reducing the number of steps required to reach the safe set, demonstrating that constraint driven safety enforcement is both more reliable and more efficient than reward shaping approaches.
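When a safety filter has a single constraint that is affine in the control variable, the "minimally modify the nominal input" quadratic program has a closed-form solution: project the nominal input onto the safe half-space. The sketch below shows that projection for a generic constraint a·u + b >= 0. It is a textbook special case, not the paper's filter; the names `u_nom`, `a`, and `b` are placeholders, and the neural-operator model that would supply `a` and `b` is not represented.

```python
def filter_control(u_nom, a, b):
    """Solve  min ||u - u_nom||^2  subject to  a . u + b >= 0  in closed form.

    u_nom: nominal control (list of floats)
    a, b:  coefficients of one affine safety constraint (placeholders for
           whatever a barrier-function condition would provide)
    """
    margin = sum(ai * ui for ai, ui in zip(a, u_nom)) + b
    if margin >= 0.0:
        return list(u_nom)  # nominal input is already safe; leave it unchanged
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        raise ValueError("constraint is infeasible: a is zero and b is negative")
    # Move along the constraint normal just far enough to reach the boundary.
    scale = margin / norm_sq
    return [ui - scale * ai for ui, ai in zip(u_nom, a)]


if __name__ == "__main__":
    # Hypothetical constraint u[0] <= 0.5, written as -u[0] + 0.5 >= 0.
    print(filter_control([1.0, 0.0], a=[-1.0, 0.0], b=0.5))
    print(filter_control([0.2, 0.0], a=[-1.0, 0.0], b=0.5))
```

With several simultaneous constraints the projection no longer has this one-line form, and a general QP solver would be needed instead.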
Value Functions for Temporal Logic: Optimal Policies and Safety Filters
While Bellman equations for basic reach, avoid, and reach-avoid problems are well studied, the relationship between value optimality and policy optimality becomes subtle in the undiscounted infinite-horizon setting, particularly for more complicated tasks. Greedily maximizing the Q-function can produce policies that indefinitely defer task completion for reach-avoid problems, or equivalently, Until specifications, even when the value function is optimal. Building upon recent results decomposing the value function for temporal logic (TL) into a graph of constituent value functions, we construct non-Markovian policies based on state history that avoid this pathology and prove their optimality with respect to the quantitative robustness score for nested Until, Globally, and Globally-Until specifications. We further show how the Q function can serve as a safety filter for complex TL specifications, extending prior results beyond simple avoid or reach-avoid tasks.
Separation Assurance between Heterogeneous Fleets of Small Unmanned Aerial Systems via Multi-Agent Reinforcement Learning
In the envisioned future dense urban airspace, multiple companies will operate heterogeneous fleets of small unmanned aerial systems (sUASs), where each fleet includes several homogeneous aircraft with identical policies and configurations, e.g., equipage, sensing, and communication ranges, making tactical deconfliction highly complex for the aircraft. This paper aims to address two core questions: (1) Can tactical deconfliction policies converge or reach an equilibrium to ensure a conflict-free airspace when companies operate heterogeneous fleets of homogeneous aircraft? (2) If so, will the converged policies discriminate against companies operating sUASs with weaker configurations? We investigate a multi-agent reinforcement learning paradigm in which homogeneous aircraft within heterogeneous fleets operate concurrently to perform package delivery missions over Dallas, Texas, USA. An attention-enhanced Proximal Policy Optimization-based Advantage Actor-Critic (PPOA2C) framework is employed to resolve intra- and inter-fleet conflicts, with each fleet independently training its own policy while preserving privacy. Experimental results show that two fleets with distinct, shared PPOA2C policies can reach an equilibrium to maintain safe separation. While two PPOA2C policies outperform two strong rule-based baselines in terms of conflict resolution, a PPOA2C policy exhibits safer interaction with a rule-based policy, indicating adaptive capabilities of PPOA2C policies. Furthermore, we conducted extensive policy-configuration evaluations, which reveal that equilibria between similar policy types tend to favor fleets with stronger configurations. Even under similar configurations but different policy types, the equilibrium favors one of the heterogeneous policies, underscoring the need for fairness-aware conflict management in heterogeneous sUAS operations.
comment: 8 pages, 3 figures, 1 table
Ablation Study of Multimodal Perception, Language Grounding, and Control for Human-Robot Interaction in an Object Detection and Grasping Task
This manuscript extends our previous multimodal human-robot interaction system by introducing a controlled ablation study of the three modules that most strongly influence end-to-end performance: the large language model used for action extraction, the perception system used for visual grounding, and the controller used for motion execution. The goal is not to redesign the full pipeline, but to isolate the contribution of each component under a common experimental protocol and then evaluate the best combinations end-to-end. We therefore compare three language models, five perception configurations, and three controllers, followed by a second-stage factorial study over the best candidates. The resulting analysis is intended to clarify which choices primarily affect execution time, which primarily affect success rate, and where the largest engineering gains are likely to come from in future revisions of the system.
comment: 10 pages
Affordance Agent Harness: Verification-Gated Skill Orchestration
Affordance grounding requires identifying where and how an agent should interact in open-world scenes, where actionable regions are often small, occluded, reflective, and visually ambiguous. Recent systems therefore combine multiple skills (e.g., detection, segmentation, interaction-imagination), yet most orchestrate them with fixed pipelines that are poorly matched to per-instance difficulty, offer limited targeted recovery from intermediate errors, and fail to reuse experience from recurring objects. These failures expose a systems problem: test-time grounding must acquire the right evidence, decide whether that evidence is reliable enough to commit, and do so under bounded inference cost without access to labels. We propose Affordance Agent Harness, a closed-loop runtime that unifies heterogeneous skills with an evidence store and cost control, retrieves episodic memories to provide priors for recurring categories, and employs a Router to adaptively select and parameterize skills. An affordance-specific Verifier then gates commitments using self-consistency, cross-scale stability, and evidence sufficiency, triggering targeted retries before a final judge fuses accumulated evidence and trajectories into the prediction. Experiments on multiple affordance benchmarks and difficulty-controlled subsets show a stronger accuracy-cost Pareto frontier than fixed-pipeline baselines, improving grounding quality while reducing average skill calls and latency. Project page: https://tenplusgood.github.io/a-harness-page/.
comment: 43 pages, 22 figures, 8 tables. Ongoing work
Paired-CSLiDAR: Height-Stratified Registration for Cross-Source Aerial-Ground LiDAR Pose Refinement
We introduce Paired-CSLiDAR (CSLiDAR), a cross-source aerial-ground LiDAR benchmark for single-scan pose refinement: refining a ground-scan pose within a 50 m-radius aerial crop. The benchmark contains 12,683 ground-aerial pairs across 6 evaluation sites and per-scan reference 6-DoF alignments for sub-meter root-mean-square error (RMSE) evaluation. Because aerial scans capture rooftops and canopy while ground scans capture facades and under-canopy, the two modalities share only a fraction of their geometry, primarily the terrain surface, causing standard registration methods and learned correspondence models to converge to metrically incorrect local minima. We propose Residual-Guided Stratified Registration (RGSR), a training-free, geometry-only refinement pipeline that exploits the shared ground plane through height-stratified ICP, reversed registration directions, and confidence-gated accept-if-better selection. RGSR achieves 86.0% S@0.75 m and 99.8% S@1.0 m on the primary benchmark of 9,012 scans, outperforming both the confidence-gated cascade at 83.7% and GeoTransformer at 76.3%. We validate RMSE-based pose selection with independent survey control and trajectory consistency, and show that added Fourier-Mellin BEV proposals can reduce RMSE while increasing actual pose error under extreme partial overlap. The dataset and code are being prepared for public release.
comment: 8 pages, 4 figures. Dataset and code are being prepared for public release
Recovering Hidden Reward in Diffusion-Based Policies ICML 2026
This paper introduces EnergyFlow, a framework that unifies generative action modeling with inverse reinforcement learning by parameterizing a scalar energy function whose gradient is the denoising field. We establish that under maximum-entropy optimality, the score function learned via denoising score matching recovers the gradient of the expert's soft Q-function, enabling reward extraction without adversarial training. Formally, we prove that constraining the learned field to be conservative reduces hypothesis complexity and tightens out-of-distribution generalization bounds. We further characterize the identifiability of recovered rewards and bound how score estimation errors propagate to action preferences. Empirically, EnergyFlow achieves state-of-the-art imitation performance on various manipulation tasks while providing an effective reward signal for downstream reinforcement learning that outperforms both adversarial IRL methods and likelihood-based alternatives. These results show that the structural constraints required for valid reward extraction simultaneously serve as beneficial inductive biases for policy generalization. The code is available at https://github.com/sotaagi/EnergyFlow.
comment: Accepted by ICML 2026
Robust Fusion of Object-Level V2X for Learned 3D Object Detection
Perception for automated driving is largely based on onboard environmental sensors, such as cameras and radar, which are cost-effective but limited by line-of-sight and field-of-view constraints. These inherent limitations may cause onboard perception to fail under occlusions or poor visibility conditions. In parallel, cooperative awareness via vehicle-to-everything (V2X) communication is becoming increasingly available, enabling vehicles and infrastructure to share their own state as object-level information that complements onboard perception. In this work, we study how such V2X information can be integrated into 3D object detection and how robust the resulting system is to realistic V2X imperfections. Using the nuScenes dataset, we emulate object-level cooperative awareness messages from ground truth, injecting controlled noise and object dropout to mimic real-world conditions such as latency, localization errors, and low V2X penetration rates. We convert these messages into a dedicated bird's-eye view (BEV) input and fuse them into a BEVFusion-style detector. Our results demonstrate that while object-level cooperative information can substantially improve detection performance, achieving an NDS of 0.80 under favorable conditions, models trained on idealized data become fragile and over-reliant on V2X. Conversely, our proposed noise-aware training strategy, coupled with explicit confidence encoding, enhances robustness, maintaining performance gains even under severe noise and reduced V2X penetration.
comment: Accepted at IEEE VTC 2026-Spring, 7 pages
Linking Behaviour and Perception to Evaluate Meaningful Human Control over Partially Automated Driving
Partial driving automation creates a tension: drivers remain legally responsible for vehicle behaviour, yet their active control is significantly reduced. This reduction undermines the engagement and sense of agency needed to intervene safely. Meaningful human control (MHC) has been proposed as a normative framework to address this tension. However, empirical methods for evaluating whether existing systems actually provide MHC remain underdeveloped. In this study, we investigated the extent to which drivers experience MHC when interacting with partially automated driving systems. Twenty-four drivers completed a simulator study involving silent automation failures under two modes - haptic shared control (HSC) and traded control (TC). We derived behavioural metrics from telemetry data and subjective perception scores from post-trial surveys, and used them to test hypothesised relations derived from the properties of systems under MHC. The confirmatory analysis showed a significant negative correlation between the perception of the automated vehicle (AV) understanding the driver and conflict in steering torques. An exploratory analysis also revealed a surprising positive correlation between reaction times and the perception of sufficient control. Qualitative feedback from open-ended post-experiment questionnaires revealed that mismatches in intentions between the driver and automation, lack of safety, and resistance to driver inputs contribute to the reduction of perceived MHC, while subtle haptic guidance aligned with driver intent had a positive effect. These findings suggest that future designs should prioritise effortless driver interventions, transparent communication of automation intent, and context-sensitive authority allocation to strengthen meaningful human control in partially automated driving.
High-Speed Vision Improves Zero-Shot Semantic Understanding of Human Actions
Understanding human actions from visual observations is essential for human-robot interaction, particularly when semantic interpretation of unfamiliar or hard-to-annotate actions is required. In scenarios such as rapid and less common activities, collecting sufficient labeled data for supervised learning is challenging, making zero-shot approaches a practical alternative for semantic understanding without task-specific training. While recent advances in large-scale pretrained models enable such zero-shot reasoning, the impact of temporal resolution, especially for rapid and fine-grained motions, remains underexplored. In this study, we investigate how temporal resolution affects zero-shot semantic understanding of high-speed human actions. Using kendo as a representative case of rapid and subtle motion patterns, we propose a training-free pipeline that combines a pre-trained video-language model for semantic representation with large language model-based reasoning for pairwise action comparison. Through controlled experiments across multiple frame rates (120 Hz, 60 Hz, and 30 Hz), we show that higher temporal resolution significantly improves semantic separability in zero-shot settings. We further analyze the role of tracking-based human joint information under both full and partial observation scenarios. Quantitative evaluation using a nearest-class prototype strategy demonstrates that high-speed video provides more stable and interpretable semantic representations for fast actions. These findings highlight the importance of temporal resolution in training-free action recognition and suggest that high-speed perception can enhance semantic understanding capabilities.
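A nearest-class prototype evaluation can be sketched as follows: average the embeddings of each class to form a prototype, then assign a query to the class whose prototype is most similar. The abstract does not say how the embeddings are produced or which similarity measure is used, so cosine similarity, the function names, and the toy two-dimensional vectors below are all assumptions for illustration.

```python
import math


def build_prototypes(embeddings, labels):
    """Average the embedding vectors of each class into one prototype."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_prototype(query, prototypes):
    """Return the label of the prototype most similar to the query."""
    return max(prototypes, key=lambda label: cosine_similarity(query, prototypes[label]))


if __name__ == "__main__":
    # Toy embeddings standing in for video-language features (hypothetical).
    embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    labels = ["class_a", "class_a", "class_b", "class_b"]
    prototypes = build_prototypes(embeddings, labels)
    print(nearest_prototype([0.8, 0.2], prototypes))
```

Under this scheme, better-separated class prototypes translate directly into higher accuracy, which is one way semantic separability across frame rates could be compared.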
MSACT: Multistage Spatial Alignment for Stable Low-Latency Fine Manipulation
Real-world fine manipulation, particularly in bimanual manipulation, typically requires low-latency control and stable visual localization, while collecting large-scale data is costly and limited demonstrations may lead to localization drift. Existing approaches make different trade-offs: action-chunking policies such as ACT enable low-latency execution and data efficiency but rely on dense visual features without explicit spatial consistency; generative methods such as Diffusion Policy improve expressiveness but can incur iterative sampling latency; and vision-language-action and voxel-based methods enhance generalization and geometric grounding but require higher computational cost and system complexity. We introduce a multistage spatial attention module that extracts stable 2D attention points and jointly predicts future attention sequences with a temporal alignment loss. Built upon ACT with a pretrained ResNet visual prior, a multistage attention module extracts task-relevant 2D attention points as a local spatial modality for action prediction. To maintain consistent object tracking, we introduce a self-supervised objective that aligns predicted attention sequences with visual features from future frames, suppressing drift without keypoint annotations and improving stability of the vision-to-action mapping under limited data. Experiments on simulated and real-world fine manipulation tasks, conducted on the ALOHA bimanual platform, evaluate task success, attention drift, inference latency, and robustness to visual disturbances. Results indicate improvements in localization stability and task performance while maintaining low-latency inference under the tested conditions.
comment: 8 pages, 6 figures
Stereo Multistage Spatial Attention for Real-Time Mobile Manipulation Under Visual Scale Variation and Disturbances
Robots operating in open, unstructured real-world environments must rely on onboard visual perception while autonomously moving across different locations. Continuous changes in onboard camera viewpoints cause significant visual scale variations in target objects, affecting vision-based motion generation. In this work, we present a stereo multistage spatial attention-based deep predictive learning method for real-time mobile manipulation. The proposed method extracts task-relevant spatial attention points from stereo images and integrates them with robot states through a hierarchical recurrent architecture for closed-loop action prediction. We evaluate the system on four real-world mobile manipulation tasks using a mobile manipulator, including rigid placement, articulated object manipulation, and deformable object interaction. Experiments under randomized initial positions and visual disturbance conditions demonstrate improved robustness and task success rates compared to representative imitation learning and vision-language-action baselines under identical control settings. The results indicate that structured stereo spatial attention combined with predictive temporal modeling provides an effective solution within the evaluated mobile manipulation scenarios.
comment: 8 pages, 10 figures
ARIS: Agentic and Relationship Intelligence System for Social Robots
Foundational models have advanced social robotics, enabling richer perception and communicative interaction with users. However, current systems still struggle with multi-turn engagement, social-relationship reasoning, and contextually grounded dialogue at scale. We present ARIS (Agentic and Relationship Intelligence System), an agentic AI framework that unifies multimodal reasoning, a graph-based Social World Model, and retrieval-augmented generation (RAG) within a single modular architecture for social robots. We evaluate ARIS with the Pepper robot in a robot-mediated dyadic conversational setting, comparing it against a large language model baseline. A user study (N=23) shows that ARIS yields significantly higher perceived intelligence, animacy, anthropomorphism, and likeability. Our contributions are threefold: (1) a Social World Model that explicitly maps and updates social relationships between users through a knowledge graph, enabling social reasoning and re-identification across encounters; (2) an efficient RAG-based conversational pipeline that maintains bounded latency as dialogue histories grow to thousands of exchanges while preserving response relevance; and (3) system integration and empirical validation of these components within a modular agentic architecture that coordinates speech, vision, and physical action through structured APIs. The implementation of ARIS will be released as open source upon publication.
Thinking in Text and Images: Interleaved Vision-Language Reasoning Traces for Long-Horizon Robot Manipulation
Long-horizon robotic manipulation requires plans that are both logically coherent and geometrically grounded. Existing Vision-Language-Action policies usually hide planning in latent states or expose only one modality: text-only chain-of-thought encodes causal order but misses spatial constraints, while visual prediction provides geometric cues but often remains local and semantically underconstrained. We introduce Interleaved Vision-Language Reasoning (IVLR), a policy framework built around an interleaved reasoning trace, an explicit intermediate representation that alternates textual subgoals with visual keyframes over the full task horizon. At test time, a single native multimodal transformer self-generates this global semantic-geometric trace from the initial observation and instruction, caches it, and conditions a closed-loop action decoder on the trace, original instruction, and current observation. Because standard robot datasets lack such traces, we construct pseudo-supervision by temporally segmenting demonstrations and captioning each stage with a vision-language model. Across simulated benchmarks for long-horizon manipulation and visual distribution shift, IVLR reaches 95.5% average success on LIBERO, including 92.4% on LIBERO-Long, and 59.4% overall success on SimplerEnv-WidowX. Ablations show that both modalities are necessary: without traces, LIBERO-Long success drops to 37.7%; text-only and vision-only traces reach 62.0% and 68.4%, while the full interleaved trace reaches 92.4%. Stress tests with execution perturbations and masked trace content show moderate degradation, suggesting that the trace can tolerate local corruption and moderate execution drift, but remains limited under stale or incorrect global plans.
Learning while Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies
Generalist robot policies increasingly benefit from large-scale pretraining, but offline data alone is insufficient for robust real-world deployment. Deployed robots encounter distribution shifts, long-tail failures, task variations, and human correction opportunities that fixed demonstration datasets cannot fully capture. We present Learning While Deploying (LWD), a fleet-scale offline-to-online reinforcement learning framework for continual post-training of generalist Vision-Language-Action (VLA) policies. Starting from a pretrained VLA policy, LWD closes the loop between deployment, shared physical experience, policy improvement, and redeployment by using autonomous rollouts and human interventions collected across a robot fleet. To stabilize learning from heterogeneous, sparse-reward fleet data, LWD combines Distributional Implicit Value Learning (DIVL) for robust value estimation with Q-learning via Adjoint Matching (QAM) for policy extraction in flow-based VLA action generators. We validate LWD on a fleet of 16 dual-arm robots across eight real-world manipulation tasks, including semantic grocery restocking and 3-5 minute long-horizon tasks. A single generalist policy improves as fleet experience accumulates, reaching an average success rate of 95%, with the largest gains on long-horizon tasks.
comment: No
Physically Native World Models: A Hamiltonian Perspective on Generative World Modeling
World models have recently re-emerged as a central paradigm for embodied intelligence, robotics, autonomous driving, and model-based reinforcement learning. However, current world model research is often dominated by three partially separated routes: 2D video-generative models that emphasize visual future synthesis, 3D scene-centric models that emphasize spatial reconstruction, and JEPA-like latent models that emphasize abstract predictive representations. While each route has made important progress, they still struggle to provide physically reliable, action-controllable, and long-horizon stable predictions for embodied decision making. In this paper, we argue that the bottleneck of world models is no longer only whether they can generate realistic futures, but whether those futures are physically meaningful and useful for action. We propose Hamiltonian World Models as a physically grounded perspective on world modeling. The key idea is to encode observations into a structured latent phase space, evolve the latent state through Hamiltonian-inspired dynamics with control, dissipation, and residual terms, decode the predicted trajectory into future observations, and use the resulting rollouts for planning. We discuss how Hamiltonian structure may improve interpretability, data efficiency, and long-horizon stability, while also noting practical challenges in real-world robotic scenes involving friction, contact, non-conservative forces, and deformable objects.
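To make the phrase "Hamiltonian-inspired dynamics with control and dissipation" concrete, here is a minimal sketch for a one-dimensional toy system. It is not the authors' model: the quadratic Hamiltonian, the linear damping term, the semi-implicit Euler integrator, and every name and default value (`hamiltonian_step`, `rollout`, `mass`, `stiffness`, `damping`, `dt`) are assumptions chosen purely for illustration, and the encoder, decoder, and learned residual term from the abstract are omitted.

```python
def hamiltonian_step(q, p, u, dt=0.01, mass=1.0, stiffness=1.0, damping=0.1):
    """One semi-implicit Euler step for the toy Hamiltonian
    H(q, p) = p^2 / (2 * mass) + stiffness * q^2 / 2,
    with an external control force u and linear dissipation (both assumed forms)."""
    # dp/dt = -dH/dq - damping * dH/dp + u
    p_next = p + dt * (-stiffness * q - damping * p / mass + u)
    # dq/dt = dH/dp, evaluated at the updated momentum (symplectic Euler ordering)
    q_next = q + dt * p_next / mass
    return q_next, p_next


def energy(q, p, mass=1.0, stiffness=1.0):
    """Total energy of the toy system, i.e. the value of H(q, p)."""
    return 0.5 * p * p / mass + 0.5 * stiffness * q * q


def rollout(q0, p0, controls, **step_kwargs):
    """Roll the latent state (q, p) forward under a sequence of control inputs."""
    q, p = q0, p0
    trajectory = [(q, p)]
    for u in controls:
        q, p = hamiltonian_step(q, p, u, **step_kwargs)
        trajectory.append((q, p))
    return trajectory
```

With `damping=0.0` and zero control, the rollout stays close to the initial energy over many steps, which is the kind of long-horizon stability the abstract attributes to Hamiltonian structure; a positive `damping` makes the energy decay instead.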
MiniVLA-Nav v1: A Multi-Scene Simulation Dataset for Language-Conditioned Robot Navigation
We present MiniVLA-Nav v1, a simulation dataset for Language-Conditioned Object Approach (LCOA) navigation: given a short natural-language instruction, an NVIDIA Nova Carter differential-drive robot must navigate to the named object and stop within 1 m across four photorealistic Isaac Sim environments (Office, Hospital, Full Warehouse, and Warehouse with Multiple Shelves). Each of the 1,174 episodes pairs an instruction with synchronized 640x640 RGB images, metric depth maps (float32, metres), and instance segmentation masks, together with continuous (v,omega) and 7x7 tokenized expert action labels recorded at 60 Hz from a vision-based proportional controller. Trajectory diversity is ensured through three spawn-distance tiers (near: 1.5-3.5 m, mid: 3.5-7.0 m, far: global curated points; Pearson r=0.94 between spawn distance and trajectory length), 12 object categories, 18 training templates, and 12 paraphrase-OOD templates. Five evaluation splits support in-distribution accuracy, template-paraphrase robustness, and OOD object-category benchmarking. The dataset is publicly available at https://huggingface.co/datasets/alibustami/miniVLA-Nav
comment: 9 pages, 12 figures, 7 tables. Dataset paper
Kinematic Kitbashing
We introduce Kinematic Kitbashing, an optimization framework that synthesizes articulated 3D objects by assembling reusable parts conditioned on an abstract kinematic graph. Given the graph and a library of articulated parts, our method optimizes per-part similarity transformations that place, orient, and scale each component into a coherent articulated object; optional graph edits further enable novel assemblies beyond the prescribed connectivity. Central to our method is an exemplar-based analogy for part placement: each reused component is paired with a single source asset that exemplifies how it attaches to its parent. We capture this attachment context using vector distance fields and measure consistency by integrating the matching error over the joint's full motion range. This yields a kinematics-aware attachment energy that favors placements that preserve the exemplar's local attachment neighborhood throughout articulation. To incorporate task-level functionality, we use this attachment energy as a prior in an annealed Langevin sampling framework, enabling gradient-free optimization of black-box functionality objectives. We demonstrate the versatility of kinematic kitbashing across diverse applications, including instantiating kinematic graphs from user-selected or automatically retrieved parts, synthesizing assemblies with user-defined functionality, and re-targeting articulations via graph edits.
Training-Free Adaptive 360-degree Video Streaming via Semantic Potential Fields
Adaptive 360° video streaming for teleoperation faces two coupled challenges: viewport prediction under uncertain gaze patterns and bitrate adaptation over fluctuating wireless channels. While Deep Reinforcement Learning (DRL) methods achieve high Quality of Experience (QoE), their lack of interpretability and dependence on offline training limit deployment in safety-critical systems. We propose OrbitStream, a training-free framework that formulates viewport prediction as a Gravitational Viewport Prediction (GVP) problem, where semantic objects generate potential fields that attract operator gaze, and employs a Saturation-Based Proportional-Derivative (PD) Controller for buffer regulation. On object-rich teleoperation traces, OrbitStream achieves 94.7% zero-shot viewport prediction accuracy without user-specific profiling, approaching trajectory-extrapolation baselines (~98.5%). Across 3,600 Monte Carlo simulations, it ranks second among 12 algorithms (QoE 2.71 vs. BOLA-E's 2.80), outperforming FastMPC (1.84), with 1.01 ms decision latency and minimal rebuffering.
comment: We are pleased to announce that this paper has been accepted by the 35th International Conference on Computer Communications and Networks (ICCCN 2026). We appreciate the valuable feedback from the reviewers and look forward to sharing our findings with the community
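For readers unfamiliar with buffer regulation by a saturated proportional-derivative law, as mentioned in the OrbitStream abstract above, the sketch below shows the general pattern. It is a generic clamped PD controller, not OrbitStream's actual controller: the function name `saturated_pd_bitrate`, the gains, the target buffer level, and the bitrate limits are all hypothetical values picked for illustration.

```python
def saturated_pd_bitrate(buffer_s, prev_buffer_s, dt,
                         target_s=4.0, kp=2.0, kd=0.5,
                         base_mbps=8.0, min_mbps=1.0, max_mbps=20.0):
    """Pick a bitrate from the playback buffer level using a clamped PD law.

    All defaults are illustrative assumptions, not values from the paper.
    """
    # Proportional term: how far the buffer is from its target (seconds).
    error = buffer_s - target_s
    # Derivative term: how fast the buffer is filling or draining.
    d_error = (buffer_s - prev_buffer_s) / dt
    raw = base_mbps + kp * error + kd * d_error
    # Saturation: keep the command inside the available bitrate range.
    return max(min_mbps, min(max_mbps, raw))
```

A full buffer pushes the command up until it hits `max_mbps`, while a draining buffer pulls it down to `min_mbps`; the clamp is what keeps the output bounded regardless of how large the error becomes.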
QuadPiPS: A Perception-informed Footstep Planner for Quadrupeds With Semantic Affordance Prediction
This work proposes QuadPiPS, a perception-informed framework for quadrupedal foothold planning in the perception space. QuadPiPS employs a novel ego-centric local environment representation, known as the legged egocan, that is extended here to capture unique legged affordances through a joint geometric and semantic encoding that supports local motion planning and control for quadrupeds. QuadPiPS takes inspiration from the Augmented Leafs with Experience on Foliations (ALEF) planning framework to partition the foothold planning space into its discrete and continuous subspaces. To facilitate real-world deployment, QuadPiPS broadens the ALEF approach by synthesizing perception-informed, real-time, and kinodynamically-feasible reference trajectories through search and trajectory optimization techniques. To support deliberate and exhaustive searching, QuadPiPS over-segments the egocan floor via superpixels to provide a set of planar regions suitable for candidate footholds. Nonlinear trajectory optimization methods then compute swing trajectories to transition between selected footholds and provide long-horizon whole-body reference motions that are tracked under model predictive control and whole body control. Benchmarking with the ANYmal C quadruped across ten simulation environments and five baselines reveals that QuadPiPS excels in safety-critical settings with limited available footholds. Real-world validation on the Unitree Go2 quadruped equipped with a custom computational suite demonstrates that QuadPiPS enables terrain-aware locomotion on hardware.
comment: Under review. Project site: https://quadpips.github.io/
STARRY: Spatial-Temporal Action-Centric World Modeling for Robotic Manipulation
Robotic manipulation requires reasoning about future spatial-temporal interactions and geometric constraints, yet existing Vision-Language-Action (VLA) policies often leave predictive representation weakly coupled with action execution, causing failures in tasks requiring precise spatial-temporal coordination. We propose STARRY, a world-model-enhanced action-generation policy that aligns spatial-temporal prediction and action generation by jointly denoising future spatial-temporal latents and actions through a unified diffusion process. To bridge 2D visual tokens and 3D metric control, STARRY introduces Geometry-Aware Selective Attention Modulation (GASAM), which converts predicted depth and end-effector geometry into token-aligned weights for selective action-attention modulation. On RoboTwin 2.0, STARRY achieves 93.82% / 93.30% average success under Clean and Randomized settings across 50 bimanual tasks. Real-world experiments show that STARRY improves average success from 42.5% to 70.8% compared with $π_{0.5}$. These results demonstrate the effectiveness of action-centric spatial-temporal world modeling for spatially and temporally demanding robotic manipulation.
comment: 19 pages
Causality-enhanced Decision-Making for Autonomous Mobile Robots in Dynamic Environments
The growing integration of robots in shared environments, such as warehouses, shopping centres, and hospitals, demands a deep understanding of the underlying dynamics and human behaviours, including how, when, and where individuals engage in various activities and interactions. This knowledge goes beyond simple correlation studies and requires a more comprehensive causal analysis. By leveraging causal inference to model cause-and-effect relationships, we can better anticipate critical environmental factors and enable autonomous robots to plan and execute tasks more effectively. To this end, we propose a novel causality-based decision-making framework that reasons over a learned causal model to assist the robot in deciding when and how to complete a given task. In the examined use case, a warehouse shared with people, we exploit the causal model to estimate battery usage and human obstructions as factors influencing the robot's task execution. This reasoning framework supports the robot in making informed decisions about task timing and strategy. To achieve this, we also developed PeopleFlow, a new Gazebo-based simulator designed to model context-sensitive human-robot spatial interactions in shared workspaces. PeopleFlow features realistic human and robot trajectories influenced by contextual factors such as time, environment layout, and robot state, and can simulate a large number of agents. While the simulator is general-purpose, in this paper we focus on a warehouse-like environment as a case study, where we conduct an extensive evaluation benchmarking our causal approach against a non-causal baseline. Our findings demonstrate the efficacy of the proposed solutions, highlighting how causal reasoning enables autonomous robots to operate more efficiently and safely in dynamic environments shared with humans.
comment: Causal Discovery and Inference - Robot Autonomy - Human-Robot Spatial Interaction - Decision-Making
VLBiMan: Vision-Language Anchored One-Shot Demonstration Enables Generalizable Bimanual Robotic Manipulation ICLR 2026
Achieving generalizable bimanual manipulation requires systems that can learn efficiently from minimal human input while adapting to real-world uncertainties and diverse embodiments. Existing approaches face a dilemma: imitation policy learning demands extensive demonstrations to cover task variations, while modular methods often lack flexibility in dynamic scenes. We introduce VLBiMan, a framework that derives reusable skills from a single human example through task-aware decomposition, preserving invariant primitives as anchors while dynamically adapting adjustable components via vision-language grounding. This adaptation mechanism resolves scene ambiguities caused by background changes, object repositioning, or visual clutter without policy retraining, leveraging semantic parsing and geometric feasibility constraints. Moreover, the system inherits human-like hybrid control capabilities, enabling mixed synchronous and asynchronous use of both arms. Extensive experiments validate VLBiMan across tool-use and multi-object tasks, demonstrating: (1) a drastic reduction in demonstration requirements compared to imitation baselines, (2) compositional generalization through atomic skill splicing for long-horizon tasks, (3) robustness to novel but semantically similar objects and external disturbances, and (4) strong cross-embodiment transfer, showing that skills learned from human demonstrations can be instantiated on different robotic platforms without retraining. By bridging human priors with vision-language anchored adaptation, our work takes a step toward practical and versatile dual-arm manipulation in unstructured settings.
comment: accepted by ICLR 2026. The project link is https://hnuzhy.github.io/projects/VLBiMan/
MotuBrain: An Advanced World Action Model for Robot Control
Vision-Language-Action (VLA) models generalize semantically well but often lack fine-grained modeling of world dynamics. We present MotuBrain, a unified World Action Model that jointly models video and action under a UniDiffuser formulation with a three-stream Mixture-of-Transformers architecture. A single model supports policy learning, world modeling, video generation, inverse dynamics, and joint video-action prediction, while scaling to heterogeneous multimodal data such as video-only, task-agnostic, and cross-embodiment robot data. Building on Motus, MotuBrain further introduces unified multiview modeling, an independent text stream for stronger language-action coupling, a shared cross-embodiment action representation, and an efficient post-training and deployment recipe for long-horizon real-world control. Our inference stack combines step reduction, compilation, FP8 quantization, DiT caching, V2A-style action-only inference, and real-time chunked closed-loop execution, achieving over 50x speedup over a naive baseline and up to 11 Hz inference. Experimentally, MotuBrain achieves 95.8% and 96.1% average success on RoboTwin 2.0 under clean and randomized settings, respectively, attains the strongest reported EWMScore in our WorldArena comparison, and adapts to new humanoid embodiments with only 50-100 trajectories. These results show that unified world action models can scale in generality, predictive accuracy, and real-world deployability.
Disentangled Control of Multi-Agent Systems
This paper develops a general framework for multi-agent control synthesis, which applies to a wide range of problems with convergence guarantees, including those with time-varying objective functions. The proposed framework achieves decentralization without inducing dynamical coupling among agents, and it naturally supports multi-objective robotics and real-time implementation. To demonstrate its generality and effectiveness, the framework is applied to solve three representative problems, namely time-varying leader-follower formation control, decentralized coverage control for time-varying density functions without approximations, which is a long-standing open problem, and safe formation navigation in a dense environment.
GSDrive: Reinforcing Driving Policies by Multi-mode Trajectory Probing with 3D Gaussian Splatting Environment
End-to-end (E2E) autonomous driving presents a promising approach for translating perceptual inputs directly into driving actions. However, prohibitive annotation costs and temporal data quality degradation hinder long-term real-world deployment. While combining imitation learning (IL) and reinforcement learning (RL) is a common strategy for policy improvement, conventional RL training relies on delayed, event-based rewards: policies learn only from catastrophic outcomes such as collisions, leading to premature convergence to suboptimal behaviors. To address these limitations, we introduce GSDrive, a framework that exploits 3D Gaussian Splatting (3DGS) for differentiable, physics-based reward shaping in E2E driving policy improvement. Our method incorporates a flow matching-based trajectory predictor within the 3DGS simulator, enabling multi-mode trajectory probing where candidate trajectories are rolled out to assess prospective rewards. This establishes a bidirectional knowledge exchange between IL and RL by grounding reward functions in physically simulated interaction signals, offering immediate dense feedback instead of sparse catastrophic events. Evaluated on the reconstructed nuScenes dataset, our method surpasses existing simulation-based RL driving approaches in closed-loop experiments. Code is available at https://github.com/ZionGo6/GSDrive.
comment: initial version
Multiagent Systems
When Embedding-Based Defenses Fail: Rethinking Safety in LLM-Based Multi-Agent Systems
Large language model (LLM)-powered multi-agent systems (MAS) enable agents to communicate and share information, achieving strong performance on complex tasks. However, this communication also creates an attack surface where malicious agents can propagate misinformation and manipulate group decisions, undermining MAS safety. Existing embedding-based defenses aim to detect and prune suspicious agents, but their effectiveness depends on a clear separation between the text embeddings of malicious and benign messages. Attackers can circumvent such defenses by crafting messages whose embeddings lie close to benign ones. We analyze this failure mode theoretically and validate it empirically with three attacks: Slow Drift, Benign Wrapper, and Chaos Seeding. Our analysis further reveals a fundamental limitation of embedding-based defenses: because they rely solely on the text embeddings, they ignore token-level confidence signals such as logits, which can remain informative when embeddings are not distinguishable under attack. We propose using confidence scores to prune or down-weight messages during MAS communication. Experiments show improved robustness across models, datasets, and communication topologies. Moreover, we find that the effectiveness of confidence signals decays over communication rounds, highlighting the importance of early intervention. These insights can inform and inspire future work on MAS attacks and defenses.
Governing What the EU AI Act Excludes: Accountability for Autonomous AI Agents in Smart City Critical Infrastructure
When a traffic signal controller adjusts green phases and a grid manager curtails power on the same corridor, each system may comply with its own obligations. The resident who suffers the combined effect has no single authority to hold accountable and, under the EU AI Act, limited means to obtain an explanation. Annex III, point 2 excludes safety-component AI in critical infrastructure from Article 86 explanation rights and Article 27 fundamental-rights impact assessment. Provider and deployer duties under Articles 9-15 still apply, and residual pathways under the GDPR, NIS2, and tortious liability offer partial coverage. The Act's principal resident-facing accountability instruments are nonetheless narrowed for the autonomous infrastructure systems most likely to interact across agencies. The paper traces this accountability deficit through four residual pathways (GDPR Article 22, GDPR transparency obligations, tortious liability, and NIS2) and shows that each is structurally bounded by individual-controller, individual-decision scope. As a governance response, it presents AgentGov-SC, a three-layer architecture (Agent, Orchestration, City) specifying 25 governance measures with bidirectional traceability to the EU AI Act, ISO/IEC 42001, and the NIST AI Risk Management Framework. Five conflict resolution rules and an autonomy-calibrated activation model complete the design. A scenario analysis traces governance activation through a multi-agent corridor cascade involving three documented UAE smart-city systems, with a contrasting single-system scenario confirming proportional activation. The paper contributes a regulatory gap analysis and governance architecture for an increasingly important class of urban AI deployment that existing frameworks treat as bounded and isolated.
comment: 24 pages, 3 figures, 8 tables. Submitted to Computer Law & Security Review
Separation Assurance between Heterogeneous Fleets of Small Unmanned Aerial Systems via Multi-Agent Reinforcement Learning
In the envisioned future dense urban airspace, multiple companies will operate heterogeneous fleets of small unmanned aerial systems (sUASs), where each fleet includes several homogeneous aircraft with identical policies and configurations, e.g., equipage, sensing, and communication ranges, making tactical deconfliction highly complex for the aircraft. This paper aims to address two core questions: (1) Can tactical deconfliction policies converge or reach an equilibrium to ensure a conflict-free airspace when companies operate heterogeneous fleets of homogeneous aircraft? (2) If so, will the converged policies discriminate against companies operating sUASs with weaker configurations? We investigate a multi-agent reinforcement learning paradigm in which homogeneous aircraft within heterogeneous fleets operate concurrently to perform package delivery missions over Dallas, Texas, USA. An attention-enhanced Proximal Policy Optimization-based Advantage Actor-Critic (PPOA2C) framework is employed to resolve intra- and inter-fleet conflicts, with each fleet independently training its own policy while preserving privacy. Experimental results show that two fleets with distinct, shared PPOA2C policies can reach an equilibrium to maintain safe separation. While two PPOA2C policies outperform two strong rule-based baselines in terms of conflict resolution, a PPOA2C policy exhibits safer interaction with a rule-based policy, indicating adaptive capabilities of PPOA2C policies. Furthermore, we conducted extensive policy-configuration evaluations, which reveal that equilibria between similar policy types tend to favor fleets with stronger configurations. Even under similar configurations but different policy types, the equilibrium favors one of the heterogeneous policies, underscoring the need for fairness-aware conflict management in heterogeneous sUAS operations.
comment: 8 pages, 3 figures, 1 table
RunAgent: Interpreting Natural-Language Plans with Constraint-Guided Execution
Humans solve problems by executing targeted plans, yet large language models (LLMs) remain unreliable for structured workflow execution. We propose RunAgent, a multi-agent plan execution platform that interprets natural-language plans while enforcing stepwise execution through constraints and rubrics. RunAgent bridges the expressiveness of natural language with the determinism of programming via an agentic language with explicit control constructs (e.g., IF, GOTO, FORALL). Beyond syntactic and semantic verification of the step output, which is performed based on the specific instruction of each step, RunAgent autonomously derives and validates constraints based on the description of the task and its instance at each step. RunAgent also dynamically selects among LLM-based reasoning, tool usage, and code generation and execution (e.g., in Python), and incorporates error correction mechanisms to ensure correctness. Finally, RunAgent filters the context history by retaining only relevant information during the execution of each step. Evaluations on the Natural-plan and SciBench datasets demonstrate that RunAgent outperforms baseline LLMs and state-of-the-art PlanGEN methods.
Meritocratic Fairness in Budgeted Combinatorial Multi-armed Bandits via Shapley Values
We propose a new framework for meritocratic fairness in budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF). Unlike semi-bandit feedback, full-bandit feedback does not reveal the contribution of individual arms, making the setting significantly more challenging. To compute arm contributions in BCMAB-FBF, we first extend the Shapley value, a classical solution concept from cooperative game theory, to the $K$-Shapley value, which captures the marginal contribution of an agent restricted to a set of size at most $K$. We show that the $K$-Shapley value is the unique solution concept that satisfies the Symmetry, Linearity, Null Player, and Efficiency properties. We next propose K-SVFair-FBF, a fairness-aware bandit algorithm that adaptively estimates the $K$-Shapley value under an unknown valuation function. Unlike the standard bandit literature on full-bandit feedback, K-SVFair-FBF not only learns the valuation function under the full-bandit feedback setting but also mitigates the noise arising from Monte Carlo approximations. Theoretically, we prove that K-SVFair-FBF achieves an $O(T^{3/4})$ bound on fairness regret. Through experiments on federated learning and social influence maximization datasets, we demonstrate that our approach achieves fairness and performs more effectively than existing baselines.
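As background for the abstract above, the classical Shapley value that the $K$-Shapley value extends can be computed exactly for small games by averaging marginal contributions over all orderings of the players. The sketch below implements only that classical definition; the abstract does not give the formula for the size-restricted $K$-Shapley variant, so it is not reproduced here, and the names `shapley_values`, `players`, and `value` are illustrative.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact classical Shapley values.

    players: list of hashable player (arm) identifiers.
    value:   function mapping a frozenset of players to a real-valued payoff.
    Cost grows factorially with len(players), so this is for small games only.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}
```

In an additive game each player's Shapley value equals its own weight, and the values always sum to the payoff of the full coalition (the Efficiency property named in the abstract). Exact enumeration is infeasible for many arms, which is why sampling-based estimates such as the Monte Carlo approximations mentioned above are used in practice.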
Learning to Act and Cooperate for Distributed Black-Box Consensus Optimization
Distributed black-box consensus optimization is a fundamental problem in multi-agent systems, where agents must improve a global objective using only local objective queries and limited neighbor communication. Existing methods largely rely on handcrafted update rules and static cooperation patterns, which often struggle to balance local adaptation, global coordination, and communication efficiency in heterogeneous nonconvex environments. In this paper, we take an initial step toward trajectory-driven self-design for distributed black-box consensus optimization. We first redesign the agent-level swarm dynamics with an adaptive internal mechanism tailored to decentralized consensus settings, improving the balance between exploration, convergence, and local escape. Built on top of this adaptive execution layer, we propose Learning to Act and Cooperate (LAC-MAS), a trajectory-driven framework in which large language models provide sparse high-level guidance for shaping both agent-internal action behaviors and agent-external cooperation patterns from historical optimization trajectories. We further introduce a phased cognitive scheduling strategy to activate different forms of adaptation in a resource-aware manner. Experiments on standard distributed black-box benchmarks and real-world distributed tasks show that LAC-MAS consistently improves solution quality, convergence efficiency, and communication efficiency over strong baselines, suggesting a practical route from handcrafted distributed coordination toward self-designing multi-agent optimization systems.
comment: 20 pages, 5 figures
Breaking the Communication-Accuracy Trade-off: A Sparsified Information Diffusion Framework for Multi-Agent Collaborative Perception
The growing relevance of multi-agent systems has drawn increasing focus on communication-efficient filters for collaborative perception to alleviate the system's communication burden. While the event-triggered (ET) mechanism can improve communication efficiency in collaborative state estimation, an inevitable trade-off exists between estimation accuracy and communication cost in ET filters. This paper proposes a fast and accurate ET diffusion-based filter for real-time multi-agent collaborative target tracking, aiming to reduce the system's data transmission without compromise in tracking performance. The proposed filter achieves improved tracking accuracy, reduced data transmission, and accelerated convergence using an error-minimized ET cubature information filter (CIF) for local estimation, and a correlation-aware diffusion strategy for global fusion. The experimental results confirm the scalability of the proposed EDC-CIF algorithm and demonstrate its efficacy in simultaneously reducing estimation error and computation time while significantly enhancing communication efficiency.
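The event-triggered idea behind the abstract above can be illustrated with the simplest possible trigger, a send-on-delta rule: an agent transmits a measurement only when it differs enough from the last value it sent. This is a generic textbook-style sketch, not the error-minimized trigger or the cubature information filter proposed in the paper; the function name `event_triggered_stream` and the scalar measurements are assumptions for illustration.

```python
def event_triggered_stream(measurements, threshold):
    """Return the (time index, value) pairs an agent would actually transmit.

    A sample is sent only if it deviates from the last transmitted value by
    more than `threshold`; the first sample is always sent.
    """
    sent = []
    last_sent = None
    for k, z in enumerate(measurements):
        if last_sent is None or abs(z - last_sent) > threshold:
            sent.append((k, z))
            last_sent = z
    return sent
```

Raising `threshold` cuts transmissions but leaves receivers with staler data, which is the accuracy versus communication trade-off the paper sets out to break.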
Skills as Verifiable Artifacts: A Trust Schema and a Biconditional Correctness Criterion for Human-in-the-Loop Agent Runtimes
Agent skills -- structured packages of instructions, scripts, and references that augment a large language model (LLM) without modifying the model itself -- have moved from convenience to first-class deployment artifact. The runtime that loads them inherits the same problem package managers and operating systems have always faced: a piece of content claims a behavior; the runtime must decide whether to believe it. We argue this paper's central thesis up front: a skill is untrusted code until it is verified, and the runtime that loads it must enforce that default rather than infer trust from a signature, a clearance, or a registry of origin. Without skill verification, a human-in-the-loop (HITL) gate must fire on every irreversible call -- which is operationally untenable and degrades into rubber-stamping at any non-trivial scale. With skill verification treated as a separate, gated process, HITL fires only for what is unverified, and the system becomes sustainable. We give a trust schema that includes an explicit verification level on every skill manifest; a capability gate whose HITL policy is a function of that verification level; a biconditional correctness criterion that any candidate verification procedure must satisfy on an adversarial-ensemble exercise; and a portable runtime profile with ten normative guidelines abstracted from a working open-source reference implementation \cite{metere2026enclawed}. The contribution is harness- and model-agnostic; nothing here requires retraining, fine-tuning, or proprietary infrastructure.
Neural Cellular Automata: From Cells to Pixels
Neural Cellular Automata (NCAs) are bio-inspired dynamical systems in which identical cells iteratively apply a learned local update rule to self-organize into complex patterns, exhibiting regeneration, robustness, and spontaneous dynamics. Despite their success in texture synthesis and morphogenesis, NCAs remain largely confined to low-resolution outputs. This limitation stems from (1) training time and memory requirements that grow quadratically with grid size, (2) the strictly local propagation of information that impedes long-range cell communication, and (3) the heavy compute demands of real-time inference at high resolution. In this work, we overcome this limitation by pairing an NCA that evolves on a coarse grid with a lightweight implicit decoder that maps cell states and local coordinates to appearance attributes, enabling the same model to render outputs at arbitrary resolution. Moreover, because both the decoder and NCA updates are local, inference remains highly parallelizable. To supervise high-resolution outputs efficiently, we introduce task-specific losses for morphogenesis (growth from a seed) and texture synthesis with minimal additional memory and computation overhead. Our experiments across 2D/3D grids and mesh domains demonstrate that our hybrid models produce high-resolution outputs in real-time, and preserve the characteristic self-organizing behavior of NCAs.
comment: 9 pages, 14 figures, +8 pages of Appendix (20 figures in total)
High entropy leads to symmetry equivariant policies in Dec-POMDPs
We prove that in any Dec-POMDP, sufficiently high entropy regularization ensures that the policy gradient flow with tabular softmax parametrization always converges, for any initialization, to the same joint policy, and that this joint policy is equivariant w.r.t. all symmetries of the Dec-POMDP. In particular, policies coming from different initializations will be fully compatible, in that their cross-play returns are equal to their self-play returns. Through empirical evaluation of independent PPO, arguably the most widely used deep multi-agent policy gradient algorithm, in the Hanabi, Overcooked, and Yokai environments, we find that the entropy coefficient has a massive influence on the cross-play returns between independently trained policies, and that the drop in self-play returns coming from increased entropy regularization can often be counteracted by greedifying the learned policies after training. In Hanabi, we achieve close to perfect scores in inter-seed cross-play this way. Despite clear limitations of this approach, which we point out, both our theoretical and empirical results indicate that during hyperparameter sweeps in Dec-POMDPs, one should consider far higher entropy coefficients than is typically done.
SoK: Security of Autonomous LLM Agents in Agentic Commerce
Autonomous large language model (LLM) agents such as OpenClaw are pushing agentic commerce from human-supervised assistance toward machine actors that can negotiate, purchase services, manage digital assets, and execute transactions across on-chain and off-chain environments. Protocols such as the Trustless Agents standard (ERC-8004), Agent Payments Protocol (AP2), OKX Agent Payments Protocol (APP), the HTTP 402-based payment protocol (x402), Agent Commerce Protocol (ACP), the Agentic Commerce standard (ERC-8183), and Machine Payments Protocol (MPP) enable this transition, but they also create an attack surface that existing security frameworks do not capture well. This Systematization of Knowledge (SoK) develops a unified security framework for autonomous LLM agents in commerce and finance. We organize threats along five dimensions: agent integrity, transaction authorization, inter-agent trust, market manipulation, and regulatory compliance. From a systematically curated public corpus of academic papers, protocol documents, industry reports, and incident evidence, we derive 12 cross-layer attack vectors and show how failures propagate from reasoning and tooling layers into custody, settlement, market harm, and compliance exposure. We then propose a layered defense architecture addressing authorization gaps left by current agent-payment protocols. Overall, our analysis shows that securing agentic commerce is inherently a cross-layer problem that requires coordinated controls across LLM safety, protocol design, identity, market structure, and regulation. We conclude with a research roadmap and a benchmark agenda for secure autonomous commerce.
Outbidding and Outbluffing Elite Humans: Mastering Liar's Poker via Self-Play and Reinforcement Learning
AI researchers have long focused on poker-like games as a testbed for environments characterized by multi-player dynamics, imperfect information, and reasoning under uncertainty. While recent breakthroughs have matched elite human play at no-limit Texas hold'em, the multi-player dynamics are subdued: most hands converge quickly with only two players engaged through multiple rounds of bidding. In this paper, we present Solly, the first AI agent to achieve elite human play in reduced-format Liar's Poker, a game characterized by extensive multi-player engagement. We trained Solly using self-play with a model-free, actor-critic, deep reinforcement learning algorithm. Solly played at an elite human level as measured by win rate (won over 50% of hands) and equity (money won) in heads-up and multi-player Liar's Poker. Solly also outperformed large language models (LLMs), including those with reasoning abilities, on the same metrics. Solly developed novel bidding strategies, randomized play effectively, and was not easily exploitable by world-class human players.
Markets with Heterogeneous Agents: Dynamics and Survival of Bayesian vs. No-Regret Learners
We analyze the performance of heterogeneous learning agents in asset markets with stochastic payoffs. Our main focus is on comparing Bayesian learners and no-regret learners who compete in markets and identifying the conditions under which each approach is more effective. We formally relate the notions of survival and market dominance studied in economics and the framework of regret minimization, thereby bridging these theories. A central finding is that regret plays a key role in market selection, but low regret alone does not guarantee survival: surprisingly, an agent may achieve even logarithmic regret and yet be driven out of the market when competing against a Bayesian learner with a finite prior that assigns positive probability to the correct model. At the same time, we show that Bayesian learning is highly fragile, while no-regret learning requires less knowledge of the environment and is therefore more robust. Motivated by this contrast, we propose two simple hybrid strategies that incorporate Bayesian updates while improving robustness and adaptability to distribution shifts, taking a step toward a best-of-both-worlds learning approach. More broadly, our work contributes to the understanding of dynamics of heterogeneous learning agents and their impact on markets.
comment: Learning in Markets, Heterogeneous Agents, Regret and Survival, Bayesian Learning, No-Regret Learning, Portfolio Optimization, Kelly Rule, Distribution Shifts, Robust Bayesian Updates
Systems and Control (EESS)
A Mission-Centric Cyber-Resilience Benchmark for Silent-Watch Operation of Electrified Ground-Platform Power Architectures
Silent-watch operation makes electrified ground platforms depend on supervisory energy management because mission loads must be sustained from stored energy while the engine is off. This paper develops a mission-centric cyber-resilience benchmark for this operating mode. The benchmark connects battery state-of-charge (SOC) spoofing to mission outcomes rather than evaluating the attack only through detector response or control error. It combines a reduced-order DC-bus model, residual-based detection, fallback shedding, and four mission-facing metrics for endurance, critical-load service, unsafe-voltage exposure, and detection delay. The study shows that SOC spoofing creates a structured stealth-versus-impact envelope. Small biases have limited mission effect, intermediate biases create an endurance deficit bounded by a closed-form expression in bias magnitude, shed power, and average battery draw, and large biases disable the SOC-driven guard. The results also show that defense value depends on fallback depth, not detection alone. An undersized fallback action can leave the Defended case worse than the undefended Attacked case. MATLAB-to-Simulink parity across five regression scenarios provides a software-verified basis for OPAL-RT/EXataCPS hardware-in-the-loop testing.
Distributed Attraction-Repulsion Potential for Multi-Agent Formation Control
In this paper, a distributed multi-agent formation control driven by the gradient of the Lennard-Jones potential is analyzed. For collision-free initial data, we prove global well-posedness together with a uniform lower bound on all inter-agent distances, thereby excluding hard collisions. Taking the total energy as a Lyapunov function, LaSalle's invariance principle shows that every positive limit point is an equilibrium. Since trajectories remain uniformly away from collisions, the energy is analytic along the flow and an argument yields convergence to a single equilibrium modulo translations. Illustrative numerical examples are presented.
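As a hedged illustration of the gradient-flow dynamics described in this abstract, the following Python sketch runs explicit-Euler descent on the standard 12-6 Lennard-Jones energy. The parameter values (`eps`, `sigma`, `dt`, `steps`), the all-to-all coupling, and the function names are assumptions made for illustration; the paper's control law and communication graph may differ.

```python
import math

def lj_gradient(positions, eps=1.0, sigma=1.0):
    """Gradient of the total pairwise Lennard-Jones energy w.r.t. each agent position."""
    n, dim = len(positions), len(positions[0])
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            diff = [positions[i][k] - positions[j][k] for k in range(dim)]
            r = math.sqrt(sum(d * d for d in diff))
            # dV/dr for V(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)
            dV = 4.0 * eps * (-12.0 * sigma**12 / r**13 + 6.0 * sigma**6 / r**7)
            for k in range(dim):
                g = dV * diff[k] / r
                grad[i][k] += g   # dE/dx_i
                grad[j][k] -= g   # dE/dx_j has the opposite sign
    return grad

def gradient_flow(positions, dt=1e-3, steps=5000):
    """Explicit-Euler discretization of x' = -grad E(x) (assumed step size)."""
    pos = [list(p) for p in positions]
    for _ in range(steps):
        grad = lj_gradient(pos)
        pos = [[x - dt * g for x, g in zip(p, gp)] for p, gp in zip(pos, grad)]
    return pos

final = gradient_flow([[0.0, 0.0], [1.5, 0.0]])
distance = math.dist(final[0], final[1])
```

For two agents, the flow settles at the minimizer of the pair potential, an inter-agent distance of `2 ** (1 / 6) * sigma`, which matches the qualitative picture of convergence to an equilibrium without collision.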
From Sensing to Decision: A Generic Architecture for Freight Signal Priority Systems ITSC 2026
Freight Signal Priority (FSP) systems have emerged as a promising strategy to enhance freight mobility and reduce corridor delays in urban networks. While extensive research has focused on priority control algorithms and operational performance evaluation, comparatively limited attention has been devoted to the architectural design of sensing processes that shape reliable priority decisions. In practice, uncertainties in vehicle detection, communication, and estimated time of arrival (ETA) may propagate within the sensing-to-decision process, affecting priority timing and downstream signal performance. This paper presents a systematic review of FSP systems from a sensing-to-decision perspective. We propose a generic two-layer architecture consisting of a sensing-to-decision layer and a control execution layer. The sensing-to-decision layer transforms sensing inputs into priority decisions, while the control execution layer implements approved actions within traffic controllers. Within this architecture, we systematically compare major sensing modalities, including loop detectors, vision sensors, and V2I, across dimensions such as classification capability, state estimation accuracy, latency, and information richness. We further examine representative FSP systems to analyze how modality-specific characteristics and uncertainties influence ETA computation, priority triggering, and decision reliability. By linking sensing design to decision outcomes, this review identifies key deployment challenges and research gaps in reliability-aware sensing-to-decision design. Ultimately, this work provides a conceptual foundation for developing scalable and robust FSP systems that explicitly account for sensing imperfections rather than assuming idealized inputs.
comment: 8 pages, 1 figure. Accepted at ITSC 2026. Final version to appear in IEEE Xplore
Sampled-data Robust Control of Electrically Stimulated Engineered Cell Factories
Closed-loop bioelectronic regulation of engineered secretory cell systems is challenging because electric-field (EF) stimulation acts indirectly through transcription-factor activation, in the presence of delayed, nonlinear, and noisy intracellular dynamics, sparse measurements, and constrained burst-based actuation. We develop a framework for robust closed-loop endocrine regulation in electrically stimulated engineered cell factories, illustrated through extracellular thyroid hormone \(T_4\) production in engineered thyroid-like cells. The plant is modeled by a control-oriented ODE formulation combining a reduced mechanistic \(T_4\) pathway, an EF-responsive Hill module, and a linear-chain Erlang cascade representing distributed intracellular delay. On this basis, we design a sampled-data adaptive proportional-integral-derivative (PID) controller with derivative filtering, anti-windup, saturation and rate limits, and hysteretic band-locking, together with a robust adaptive extension that accounts for parameter mismatch, sensor noise and bias, actuator mismatch, delay/jitter, and exogenous rhythmic disturbance through a scenario-based risk-aware update. We provide local sampled-data input-to-state stability interpretations for both APID and RAPID, showing that, under standard local Lyapunov and bounded-disturbance conditions, the sampled tracking error is ultimately bounded by a disturbance-dependent constant. In silico experiments demonstrate sustained regulation of extracellular \(T_4\) across prescribed targets despite significant uncertainty.
Voltage Ride-Through in Large Loads- A Dual PQ Approach
This paper provides a detailed investigation of voltage ride-through in large loads, such as Artificial Intelligence data centers. Voltage ride-through capability of large loads during transient disturbances in the power grid is important because of the potential impact on the stability and reliability of the Bulk Power System. A mathematical analysis is presented and it is shown how the traditional approach, based on reactive power compensation, may not be adequate for voltage ride-through in large loads. Ultimately, due to capacity limits of the load's power distribution infrastructure and grid's constraints, there is a limit to using reactive power as a corrective tool. A new dual active and reactive power (PQ) approach is proposed in which non-grid resources with dynamic P and Q capabilities are shown to be needed to help with voltage ride-through. Additionally, the analysis illustrates that at extreme voltage dips in the power grid maintaining an acceptable level of load voltage can become practically or theoretically unattainable, which may lead to the load's disconnection from the grid. Analytical results are provided with practical numerical examples.
comment: 10 pages
HyperCertificates: Verification of Discrete-time Dynamical Systems against HyperLTL Specifications
We introduce a functional inductive framework to verify discrete-time dynamical systems against hyperproperties specified as hyper linear temporal logic (HyperLTL) formulae via a notion of HyperCertificates. Unlike linear temporal logic (LTL) formulae, which are concerned with individual traces of a system, hyperproperties are concerned with how the traces of a system relate to one another. HyperLTL is an extension of LTL for hyperproperties, and is useful for describing specifications such as opacity and privacy, as well as notions of robustness. Our notion of HyperCertificates consists of a pair of functions, where the first models the lookahead, and the second relies on a combination of barrier and ranking functions. We use closure certificates to model this lookahead and then rely on barrier and ranking function arguments modulo this lookahead to provide guarantees against HyperLTL formulae. We demonstrate how our approach is automatable via existing techniques such as sum-of-squares (SOS) optimization and satisfiability modulo theories (SMT) solvers. Finally, we demonstrate our approach on some case studies.
comment: 24 pages, 3 figures, 1 table
Economic Valuation and Optimal Deployment of Static Synchronous Series Compensators for U.S. Power System Expansion
Flexible AC Transmission Systems (FACTS), particularly Static Synchronous Series Compensators (SSSC), can improve network transfer capability and complement restricted transmission expansion. Evaluations of FACTS within large-scale, real-world power system planning are currently lacking. This paper develops a capacity expansion model for the contiguous U.S. power system toward 2050, incorporating SSSC-modified linear power flow equations and accounting for impedance feedback in transmission expansion. Cost-optimal system expansion leverages widespread nationwide SSSC deployment on small-to-medium capacity lines and reduces the number of corridors to be reinforced. Overall, SSSCs reduce annualized system costs by $1.9 billion or decrease transmission expansion requirements by 20%. The most advantageous deployments, which achieve benefit-cost ratios of 59, are concentrated in the Midwest, facilitating the delivery of central U.S. wind power to eastern load centers. The value proposition of SSSCs is robust to cost sensitivities and potential competition from HVDC network expansion, and increases under higher demand growth and more stringent decarbonization policies. These findings provide a blueprint for leveraging SSSC deployment in the U.S. power system.
comment: 10 pages, 7 figures
Multi-Regional Traffic Control with Travel and Charging Demand Co-Management
Urban traffic management is essential for reducing congestion and supporting sustainable mobility. However, the task is becoming more challenging due to the growing penetration of electric vehicles and their charging demands. This paper presents a regional traffic coordination framework that combines route guidance and charging management to improve traffic network efficiency. Regional traffic dynamics are modeled by the macroscopic fundamental diagram, which allows for the analysis of congestion at the system level. The framework jointly optimizes routes and charging decisions, and it also uses demand management to regulate external inflows into the network. A case study on a 16-region urban network demonstrates the effectiveness of the proposed approach.
The Potential Welfare Gains from Curtailment Trading Under Non-Firm Interconnection
Rapid growth of large loads led by data centers is straining grid capacity. These loads increasingly accept curtailment risk through non-firm interconnection agreements to gain faster grid access, expanding the pool of consumers subject to mandatory disconnection during supply shortfalls. Yet, blunt rules assign curtailment without reference to the wide variation in the value consumers place on avoiding curtailment, often captured by the value of lost load (VOLL). This paper introduces the network-constrained Curtailment Credit Market (CCM), a mechanism in which agents submit bids that determine bilateral credit flows, subject to transmission network constraints. We prove that the bilateral credit flow representation can reach every curtailment allocation available to an omniscient central planner (feasible-set equivalence), so the bilateral flow structure introduces no loss of allocative capability. Under truthful bidding, the CCM achieves the planner's total value of served load, matching the planner's allocative benchmark when bids reflect true interruption costs. The CCM is formulated as a bilevel clearing problem that admits an exact single-level mixed-integer linear program (MILP), solved in 0.01 to 83 seconds. Numerical experiments on three test systems validate the mechanism at increasing scale and complexity: a 3-bus toy network that isolates the core trading logic, the IEEE 24-bus reliability test system as a standard benchmark, and a reduced New York (NY) grid that captures coordination across NY load zones. Our simulations show that the CCM increases the total value of served load by 1.24 to 1.83 times relative to pro-rata curtailment. On the three test systems examined here, no participant is worse off under incentive-compatible benchmark payments than under the administrative baseline.
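To make the gap between pro-rata curtailment and a value-aware allocation concrete, here is a minimal Python sketch on a single bus. It is not the CCM itself: it ignores network constraints, bidding, and payments, and simply compares the pro-rata rule with a planner that curtails the lowest-VOLL loads first. The load data and function names are hypothetical.

```python
def pro_rata_value(loads, shortfall):
    """Total value of served load when every load is cut by the same fraction.

    loads: list of (demand_mw, voll_per_mwh) pairs (hypothetical data).
    """
    total_demand = sum(demand for demand, _ in loads)
    served_fraction = 1.0 - shortfall / total_demand
    return sum(demand * served_fraction * voll for demand, voll in loads)

def value_ordered_value(loads, shortfall):
    """Total value of served load when the lowest-VOLL loads are curtailed first."""
    remaining = shortfall
    value = 0.0
    for demand, voll in sorted(loads, key=lambda load: load[1]):
        cut = min(demand, remaining)   # curtail as much as still needed
        remaining -= cut
        value += (demand - cut) * voll
    return value

loads = [(10.0, 1.0), (10.0, 10.0)]   # two loads with a 10x spread in VOLL
baseline = pro_rata_value(loads, shortfall=10.0)
planner = value_ordered_value(loads, shortfall=10.0)
ratio = planner / baseline
```

In this toy case the value-ordered allocation serves 100 units of value against 55 under pro-rata, a ratio of about 1.82. The number is a property of the made-up data, not a reproduction of the paper's results.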
Deployment-Efficient Short-Term Load Forecasting in AI Data Centers via Sequence-to-Point Knowledge Distillation
Accurately forecasting the bursty and non-stationary power demand of AI data centers has become increasingly important, as abrupt workload-driven variations at the GPU-node level can affect real-time operational efficiency, power management, and grid-data center coordination. However, high-capacity forecasting models are often difficult to deploy at scale because of their memory and latency requirements, while lightweight predictors may fail to capture short-horizon temporal dynamics. To address this accuracy-deployment tradeoff, this paper proposes a deployment-efficient knowledge distillation framework for short-term load forecasting in AI data centers. The proposed framework first trains a high-capacity sequence teacher model for multi-step load trajectory prediction, where residual learning is used to improve robustness under non-stationary operating conditions. A lightweight point-wise student model is then developed for low-latency rolling inference using a compact neural network architecture. To transfer temporal knowledge from the teacher to the student, a sequence-to-point distillation strategy is introduced by aligning near-term predictive behavior and temporally pooled representations. Case studies on the MIT Supercloud dataset demonstrate that the proposed student model improves forecasting accuracy over recent deep learning baselines while reducing the deployment footprint by over 10x in parameter memory and model size.
comment: 7 pages, 4 figures, 3 tables
Equation-Free Digital Twins for Nonlinear Structural Dynamics
Monitoring high-dimensional engineering structures in extreme environments is limited by non-stationary excitation, nonlinear structural kinematics, and stochastic forcing. Traditional model-based and black-box data-driven methods often struggle to resolve these dynamics in real time, particularly under sensor failure or partial observability. This paper introduces a rank-optimized digital twin framework based on Koopman operator theory, Hankel-matrix embeddings, and dynamic mode decomposition. By lifting operational data into a linear invariant subspace, the method enables autonomous, input-blind reconstruction of structural states without requiring a priori mass or stiffness matrices. The framework is validated on an NREL 5MW spar-buoy floating offshore wind turbine, representing a challenging coupled aero-hydro-servo-elastic system. Results show that the rank-optimized Koopman-Hankel manifold separates structural resonances from deterministic 3P rotor harmonics under colored noise, where standard subspace identification can be unreliable. A rolling-horizon virtual sensing strategy achieves high-fidelity reconstruction at critical structural hotspots, with coefficient of determination greater than 0.95 at 1 Hz data assimilation and accuracy exceeding 0.99 at higher sampling rates. By estimating a physical Lyapunov time of approximately 1.0 s, the study defines the predictability horizon associated with the system information barrier. The proposed framework provides a computationally efficient and resilient digital twin approach for real-time identification and virtual sensing of complex structural dynamics.
A Policy-Driven DRL Framework for System-Level Tradeoff Control in NR-U/Wi-Fi Coexistence
The coexistence of NR-U and Wi-Fi in unlicensed spectrum introduces a system-level resource coordination problem, where heterogeneous channel access mechanisms lead to a significant imbalance in spectrum utilization and degraded Wi-Fi performance. To address this challenge, we propose a policy-driven deep reinforcement learning (DRL) framework for adaptive TXOP control, in which the coexistence process is formulated as a Markov decision process (MDP) and a deep Q-network (DQN) learns control policies through online interaction. A key contribution is the introduction of a policy layer via reward design, enabling explicit control of system-level tradeoffs among fairness, throughput, and quality of service (QoS). Three policies, namely absolute fairness, moderate fairness, and utility-based fairness, are developed to achieve different operating points. Simulation results show that the proposed framework achieves a Jain fairness index above 0.9 under strict fairness control. Compared to absolute fairness, moderate fairness improves aggregate throughput by 68.22%, while the utility-based policy further enhances utility by 177.6%. These results demonstrate that policy-driven control provides a flexible and effective solution for managing tradeoffs in heterogeneous coexistence networks.
comment: 12 pages, 13 figures, 1 table, submitted to IEEE Systems Journal
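The Jain fairness index cited in this abstract has a standard closed form, J = (sum of x_i)^2 / (n * sum of x_i^2), which equals 1 for a perfectly even allocation and 1/n when one user takes everything. A minimal Python sketch follows; the function name and the sample throughput values are illustrative assumptions, not data from the paper.

```python
def jain_index(throughputs):
    """Jain fairness index of a list of non-negative per-node throughputs."""
    n = len(throughputs)
    sum_sq = sum(x * x for x in throughputs)
    if n == 0 or sum_sq == 0:
        raise ValueError("need at least one non-zero throughput")
    total = sum(throughputs)
    return total * total / (n * sum_sq)

even_share = jain_index([20.0, 20.0, 20.0, 20.0])   # perfectly fair allocation
one_winner = jain_index([80.0, 0.0, 0.0, 0.0])      # one node takes the channel
```

A reward term built on this quantity is one plausible way to encode a fairness policy, although the abstract does not specify the exact reward design.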
How to Do Statistical Evaluations in ECE/CS Papers: A Practical Playbook for Defensible Results
Strong experimental papers in electrical and computer engineering and computer science (ECE/CS), especially in systems, networking, and applied machine learning, rest on more than a single impressive number. They rest on a chain of design, measurement, analysis, and validation choices that, taken together, make a result believable. This tutorial is a compact, example-driven guide to that chain for beginning researchers. We organize it as an evaluation workflow: claim, hypothesis, unit of analysis, baseline, regime sweep, uncertainty estimate, validation check, and reporting. Within that workflow we cover the classical statistical foundations (descriptive statistics, the central limit theorem, normal- and $t$-based confidence intervals, Student's $t$-test, ANOVA, chi-squared and Pearson correlation, linear regression) alongside the modern, distribution-free techniques (the bootstrap, Wilcoxon and Mann--Whitney tests, Cliff's delta) that are usually preferred for ECE/CS data. We also discuss factorial design, randomization and blocking, multiple-comparison correction, latency-specific pitfalls, simulation verification and validation, equivalence-style claims, and reproducibility. A running example, a comparison of two job-scheduling algorithms on simulated workloads with truncated heavy-tailed job sizes, threads through the tutorial, with Python snippets the reader can paste and adapt. The paper closes with a pre-submission checklist; companion student-facing material (project-type translation tables, an evaluation-plan worksheet, exercises, and a worked ``bad evaluation autopsy'') is collected in a separate workbook released alongside this paper.
comment: 30 pages, 8 figures; Tutorial paper; companion student workbook and claude skill available as ancillary material
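Two of the distribution-free techniques named in this abstract, the percentile bootstrap and Cliff's delta, fit in a few lines of standard-library Python. The sketch below is not taken from the paper: the function names, the default of 2000 resamples, the fixed seed, and the sample latencies are all assumptions chosen for illustration.

```python
import random
import statistics

def cliffs_delta(a, b):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs (x in a, y in b)."""
    greater = sum(1 for x in a for y in b if x > y)
    less = sum(1 for x in a for y in b if x < y)
    return (greater - less) / (len(a) * len(b))

def bootstrap_ci(sample, stat=statistics.median, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for a statistic of one sample."""
    rng = random.Random(seed)          # fixed seed so the run is reproducible
    n = len(sample)
    stats = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical job-completion times (seconds) for two schedulers.
scheduler_a = [1.2, 1.4, 1.1, 9.8, 1.3, 1.5, 1.2, 7.6]
scheduler_b = [2.1, 2.3, 2.0, 2.4, 2.2, 2.5, 2.1, 2.6]
delta = cliffs_delta(scheduler_a, scheduler_b)
ci_low, ci_high = bootstrap_ci(scheduler_a)
```

The median is used as the default statistic because heavy-tailed samples like `scheduler_a` make the mean unstable. A delta near -1 or +1 indicates near-complete separation of the two samples, and a delta near 0 indicates heavy overlap.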
Electric Grid Topology and Admittance Estimation using Phasor Measurements
Recent advances in precise phasor measurement units are enabling new approaches to estimate distribution and transmission grid parameters in real-time. In this paper, we investigate voltage and current phasor measurement requirements to estimate the electric grid topology and admittance parameters. We show necessary and sufficient conditions for the number of independent operating points (measurements) required to determine the topology and admittance of a completely unknown electric grid. With prior topology information, we also show that there is a minimum number of measurements required to uniquely determine the admittance matrix and corresponding grid topology. In the presence of noisy phasor measurements, we show that the admittance matrix can be estimated using a structured total least squares approach. By means of numerical simulations on the IEEE 13-node distribution feeder, the IEEE 14-node transmission network, and the IEEE 123-node distribution feeder, we demonstrate our approach is suitable for applications in radial and mesh grid topologies in the presence of measurement noise.
Real-Time Neural Distributed Energy Resources Dispatch with Feasibility Guarantees
The growing penetration of renewable energy necessitates high-frequency real-time scheduling. While neural network-based surrogates enable computationally efficient scheduling, strictly enforcing nonconvex power flow constraints without external solvers remains a fundamental challenge. To bridge this gap, this letter proposes a solver-free neural dispatch framework with rigorous feasibility guarantees. A convex inner approximation of the DistFlow model is first derived via the convex envelope theorem. Building upon this approximation, a robust optimization-based affine policy is formulated to yield a theoretically certified interior-point mapping rule, which is then embedded within a bisection-based projection scheme to efficiently recover feasibility for infeasible NN outputs without any external solver. Experimental results demonstrate that the proposed method restores feasibility on the order of $10^{-3}$ s while maintaining near-optimal performance.
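The bisection-based recovery step can be sketched generically: given a point known to be strictly feasible and an infeasible network output, bisect along the segment between them for the farthest feasible point. The Python sketch below assumes a convex feasible set and uses a unit disk as a stand-in; the paper's certified interior point and DistFlow inner approximation are not reproduced here, and all names are hypothetical.

```python
def bisection_projection(x, interior, is_feasible, tol=1e-9):
    """Pull an infeasible point x toward a known interior point until feasible.

    Assumes the feasible set is convex, so feasibility along the segment
    interior -> x switches from True to False at most once.
    """
    if is_feasible(x):
        return list(x)
    lo, hi = 0.0, 1.0                      # lo is always feasible, hi is not
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        candidate = [z + mid * (xi - z) for xi, z in zip(x, interior)]
        if is_feasible(candidate):
            lo = mid
        else:
            hi = mid
    return [z + lo * (xi - z) for xi, z in zip(x, interior)]

def in_unit_disk(point):
    """Stand-in convex constraint set used only for this illustration."""
    return point[0] ** 2 + point[1] ** 2 <= 1.0

recovered = bisection_projection([2.0, 0.0], [0.0, 0.0], in_unit_disk)
```

Each iteration costs one feasibility check, so the recovery time is set by the tolerance rather than by an optimization solver, which is consistent with the solver-free design the abstract describes.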
Distributed Coordination of Grid-Forming and Grid-Following Inverters for Optimal Frequency Control in Power Systems
The large-scale integration of inverter-interfaced renewable energy sources presents significant challenges to maintaining power balance and nominal frequency in modern power systems. This paper studies grid-level coordinated control of grid-forming (GFM) and grid-following (GFL) inverter-based resources (IBRs) for scalable and optimal frequency control. We propose a fully distributed optimal frequency control algorithm that is based on the projected primal-dual gradient method and leverages the structure of the underlying physical system dynamics. The proposed algorithm i) restores the nominal system frequency while minimizing total control cost and enforcing IBR power capacity limits and line thermal constraints, and ii) operates in a distributed manner that only needs local measurements and neighbor-to-neighbor communication. In particular, when the line thermal constraints are disregarded, the proposed algorithm admits a fully local implementation that requires no communication, while still ensuring optimality and satisfying IBR power capacity limits. We establish the global asymptotic convergence of the algorithm using Lyapunov stability analysis. The effectiveness and optimality of the proposed algorithm are validated through high-fidelity, 100% inverter-based electromagnetic transient (EMT) simulations on the IEEE 39-bus system.
Area Optimization of Open-Source Low-Power INA in 130nm CMOS using Hybrid Mixed-Variable PSO
As open-source silicon initiatives democratize access to integrated circuit development using multi-project environments, silicon area has become a premium resource. However, minimizing this layout area traditionally forces designers to compromise on core performance specifications. To address this challenge, this paper presents an open-source framework based on a hybrid mixed-variable particle swarm optimization algorithm and the gm/ID methodology to minimize the layout area of complex analog circuits while meeting design requirements. The framework's efficacy is demonstrated by designing a low-power instrumentation amplifier that achieves a 90.33% reduction in gate area over existing implementations.
What price to pay? Auto-tuning a building MPC controller for optimal economic cost
Demand-side management (DSM) programs introduce complex pricing, requiring advanced control for cost minimization. Model Predictive Control (MPC) offers a solution, but its performance hinges on appropriate hyperparameter tuning. We propose using Constrained Bayesian Optimization (CONFIG) to automate this process. In a case study, our optimized MPC reduced electricity costs by 26.90% compared to a rule-based controller and by 17.46% versus a manually tuned MPC. Analysis of real contracts further showed that optimal DSM program selection can lower monthly bills by up to 20.18%, demonstrating a data-driven path to significant consumer savings.
comment: 11 pages, 5 figures
Generalized Spectral Clustering of Low-Inertia Power Networks
Large-scale integration of distributed energy resources has led to a rapid increase in the number of controllable devices and a significant change in system dynamics. This has necessitated a shift towards more distributed and scalable control strategies to manage the increasing system complexity. In this work, we address the problem of partitioning a low-inertia power network into dynamically coherent subsystems to facilitate the utilization of distributed control schemes. We show that an embedding of the power network using the spectrum of the linearized synchronization dynamics matrix results in a natural decomposition of the network. We establish the connection between our approach and the broader framework of spectral clustering using the Laplacian matrix of the admittance network. The proposed method is demonstrated on the IEEE 30-bus test system. We consider the robustness of the clusters by analyzing the sensitivity of the small eigenvalues and their corresponding eigenspaces to perturbations caused by variation in the steady-state operating points of the network.
comment: This manuscript has been submitted to IEEE Transactions on Power Systems
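The link to Laplacian spectral clustering mentioned in the abstract above can be sketched with a two-way partition based on the Fiedler vector (the eigenvector of the smallest nonzero Laplacian eigenvalue). This is a generic illustration, not the paper's generalized method: the symmetric weight matrix stands in for line admittance magnitudes, the six-node network is made up, and the eigenvector is found by shifted power iteration to keep the example dependency-free.

```python
from math import sqrt
from typing import List, Sequence


def fiedler_partition(weights: Sequence[Sequence[float]], iters: int = 500) -> List[int]:
    """Split a connected weighted graph in two using the sign of the Fiedler vector.

    `weights` is a symmetric matrix with zero diagonal; L = D - W is its Laplacian.
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    # Gershgorin bound: every Laplacian eigenvalue is at most 2 * max degree.
    shift = 2.0 * max(degree)

    def apply_shifted(v: List[float]) -> List[float]:
        # Computes (shift * I - L) v = shift * v - D v + W v.
        return [
            (shift - degree[i]) * v[i] + sum(weights[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]

    v = [float(i) for i in range(n)]  # deterministic, non-constant start vector
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # deflate the constant eigenvector (eigenvalue 0)
        v = apply_shifted(v)
        norm = sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return [0 if x < 0 else 1 for x in v]


# Two tightly coupled triangles joined by one weak line (nodes 2 and 3).
W = [
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]
labels = fiedler_partition(W)
```

The weak tie line produces a small second eigenvalue, and the corresponding eigenvector takes opposite signs on the two triangles, which is what the sign-based split picks up.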
State Forecasting in an Estimation Framework with Surrogate Sensor Modeling
In recent years, computational power and data availability breakthroughs have revolutionized our ability to analyze complex physical systems through the inverse problem approach. Data-driven techniques like system identification and machine learning play an important role in this field, allowing us to gain insights into previously inaccessible phenomena. However, a major hurdle remains: How can meaningful information from partial measurements be extracted? In the aerospace domain, the challenge of state estimation is particularly pronounced due to the limited availability of observational data and the constraints imposed by sensor capabilities for tracking resident space objects (RSOs). To address these limitations, advanced compensation methodologies are required. Currently, range and bearing measurements obtained from radar and optical systems constitute the primary observational tools in the space situational awareness (SSA) community. In this work, we propose a novel framework that integrates a simplified reference dynamics model with a data-driven surrogate measurement model. This fusion process leverages the strengths of both models to estimate complex dynamical behaviors under conditions of partial observability. Extensive numerical experiments were conducted across multiple datasets to validate the proposed framework. The results demonstrate its efficacy in accurately reconstructing system dynamics from incomplete measurement data. Furthermore, to ensure the robustness of the framework, an initial consistency analysis of the surrogate modeling approach is presented. By addressing the current challenges and refining the integration of data-driven techniques with traditional physics-based modeling, this framework aims to advance state estimation methodologies in the aerospace sector.
Reduced-Order Data Assimilation for Thermospheric Density Using Physics-informed SINDyc Models
Accurate estimation of thermospheric mass density is a prerequisite for orbit prediction and space situational awareness, where the upper atmosphere responds nonlinearly to solar and geomagnetic forcing across several orders of magnitude. Physics-based general circulation models resolve this response but are computationally expensive, while empirical models run cheaply but lack a time-evolving atmospheric state. This work couples a data-driven reduced-order thermospheric model with a Kalman filter that assimilates in situ density observations. An autoregressive Sparse Identification of Nonlinear Dynamics with control (SINDy$_c$-AR) reduced-order model derived from the Thermosphere-Ionosphere-Electrodynamics General Circulation Model (TIE-GCM) captures the dominant modes of variability and their dependence on solar and geomagnetic drivers at a fraction of the parent model's cost. Density observations from CHAMP, GRACE, GRACE-FO, GOCE, and Swarm are assimilated across a range of orbital configurations and geomagnetic conditions, with a linear DMDc model evaluated as a reference. Assimilation reduces density estimation error relative to open-loop predictions, most visibly during geomagnetic storms and under single-satellite coverage. SINDy$_c$-AR and DMDc perform comparably on assimilated orbits; on withheld orbits, SINDy$_c$-AR is more accurate in the in-training scenarios while DMDc is better in the out-of-training 2024 Swarm-C case. Benchmarks against NRLMSIS~2.1 and HASDM (2000--2019, where available) show that empirical references can outperform the assimilated model far from the assimilated track, so results are framed as improvements over the open-loop forecast.
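To make the assimilation step concrete, here is a minimal scalar Kalman filter sketch built on a linear model with control, x_{k+1} = a x_k + b u_k, which is the structure a DMDc model identifies. The scalar state stands in for a single reduced-order modal coefficient, and every numerical value is a made-up assumption; the paper's reduced-order state, drivers, and observation operator are far richer.

```python
from typing import Tuple


def kalman_step(
    x: float, p: float, u: float, y: float,
    a: float, b: float, h: float, q: float, r: float,
) -> Tuple[float, float]:
    """One predict/update cycle for x_{k+1} = a*x_k + b*u_k + w, y_k = h*x_k + v."""
    # Predict: propagate the state estimate and its variance through the model.
    x_pred = a * x + b * u
    p_pred = a * p * a + q
    # Update: weight the measurement residual by the Kalman gain.
    gain = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + gain * (y - h * x_pred)
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new


# Hypothetical numbers: prior state 1.0, driver input 0.5, observation 2.0.
x_post, p_post = kalman_step(
    x=1.0, p=1.0, u=0.5, y=2.0, a=0.9, b=0.2, h=1.0, q=0.01, r=0.1
)
```

Here the open-loop prediction is 1.0 and the observation is 2.0, so the assimilated estimate lands between them, closer to the observation because the measurement noise variance is small relative to the predicted variance.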
A Process-Aware Demand Response Evaluation Framework for Hydrogen-Integrated Zero-Carbon Steel Plants Coupled with Methanol Production
High penetration of renewables (RES) and the retirement of thermal units aggravate flexibility scarcity in power systems. Hydrogen-based low-carbon steel production systems possess substantial demand response (DR) potential. This paper proposes a process-aware DR evaluation framework for hydrogen-integrated zero-carbon steel plants coupled with methanol production (H2-DRI-EAF-MeOH). First, a novel H2-DRI-EAF-MeOH architecture is introduced to eliminate residual emissions via methanol synthesis. Integrated energy-material flows are formulated to reflect coupling interactions governing DR potential. Second, to capture electric arc furnace (EAF) operational constraints while preserving tractability, an operating feasible region model is developed and validated using field data from a pure hydrogen direct reduced iron and EAF plant, yielding a 4.1% average relative error. Third, a process-aware DR potential evaluation model is formulated, incorporating a nonlinear asymmetric penalty and an adaptive rolling mechanism to reflect operators' aversion to process deviations and avoid myopic scheduling. Finally, dual-side evaluation metrics are established to quantify grid-side delivered DR capacity and ramping risks, making load-side unit-level regulation behaviors observable. Case studies show the proposed framework achieves an average effective delivered DR capacity of 178.3 MW, improves RES-load matching from 0.257 to 0.587, and reduces costs by 15.68% compared to the baseline. Furthermore, the exponential asymmetric penalty mitigates extreme tail risks of process deviations. Ultimately, this work provides a theoretical foundation for leveraging RES-steel-chemical synergies to mitigate flexibility scarcity.
Application of Deep Reinforcement Learning to Event-Triggered Control for Networked Artificial Pancreas Systems
This paper proposes a deep reinforcement learning (DRL)-based event-triggered controller design for networked artificial pancreas (AP) systems. Although existing DRL-based AP controllers typically assume periodic control updates, networked control systems (NCSs) require a reduction in communication frequency to achieve energy-efficient operation, which is directly tied to control updates. However, jointly learning both insulin dosing and update timing significantly increases the complexity of the learning problem. To alleviate this complexity, we develop a practical DRL-based controller design that avoids explicitly learning update timing by introducing a rule-based criterion defined by changes in blood glucose. As a result, decision-making occurs at irregular intervals, and the problem is naturally formulated as a semi-Markov decision process (SMDP), for which we extend a standard DRL algorithm. Numerical experiments demonstrate that the proposed method improves communication efficiency while maintaining control performance.
comment: Submitted to a journal
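A rule-based trigger of the kind described in the abstract above might look like the following sketch. The abstract only states that the criterion is defined by changes in blood glucose, so the specific rule (absolute change since the last update meets a threshold), the threshold value, and the sample trace are all assumptions.

```python
from typing import List, Sequence


def event_triggered_updates(glucose: Sequence[float], threshold: float) -> List[int]:
    """Return the sample indices at which a control update would be sent.

    An update fires when glucose has moved at least `threshold` away from
    its value at the previous update (hypothetical rule).
    """
    if not glucose:
        return []
    events = [0]  # always act on the first sample
    last = glucose[0]
    for k in range(1, len(glucose)):
        if abs(glucose[k] - last) >= threshold:
            events.append(k)
            last = glucose[k]
    return events


trace = [110, 112, 115, 123, 124, 140, 139, 131, 120]  # mg/dL, made-up values
events = event_triggered_updates(trace, threshold=10)
# Gaps between consecutive decisions, i.e. the irregular sojourn times.
sojourn_times = [j - i for i, j in zip(events, events[1:])]
```

Because the gaps between decisions vary, a learning agent acting only at these instants sees transitions of variable duration, which is why a semi-Markov formulation fits.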
Stability Analysis and Data-Driven State Estimation for Generalized Persidskii Systems with Time Delays: Theory and Experimental Validation on PMSM Drives
This paper addresses the stability analysis and state estimation of generalized Persidskii systems subject to time-varying delays and external disturbances. The generalized Persidskii class, which couples linear dynamics with sector-bounded nonlinear feedback loops, offers a tractable yet expressive framework for modeling electromechanical and neural network systems. We develop delay-dependent conditions for input-to-state stability (ISS) via Lyapunov--Krasovskii functionals incorporating Persidskii-type integral terms, and cast these conditions as linear matrix inequalities (LMIs). A structured robust observer is proposed for systems with partial state measurement, and its convergence is guaranteed through an $H_\infty$ synchronization criterion. To handle plant uncertainty, the system matrices are identified from trajectory data using a stability-preserving Koopman lifting procedure, in which the ISS-LMI constraint is embedded as a convex side condition during parameter regression. The identified model populates the prediction horizon of an ICODE-MPPI (Input-dependent Control-oriented Dynamical Estimation -- Model Predictive Path Integral) controller. The complete framework is validated on a 1.5 kW Permanent Magnet Synchronous Motor (PMSM) drive equipped with a programmable load brake. Experimental results confirm a 35\% reduction in velocity estimation RMSE relative to an Extended Kalman Filter and a 67\% improvement in speed-tracking accuracy relative to standard Field-Oriented Control, corroborating the theoretical ISS bounds established herein.
Disentangled Control of Multi-Agent Systems
This paper develops a general framework for multi-agent control synthesis, which applies to a wide range of problems with convergence guarantees, including those with time-varying objective functions. The proposed framework achieves decentralization without inducing dynamical coupling among agents, and it naturally supports multi-objective robotics and real-time implementation. To demonstrate its generality and effectiveness, the framework is applied to solve three representative problems, namely time-varying leader-follower formation control, decentralized coverage control for time-varying density functions without approximations, which is a long-standing open problem, and safe formation navigation in a dense environment.
From Cursed to Competitive: Closing the ZO-FO Gap via Input-to-State Stability
While it is generally understood that zeroth-order (ZO) algorithms have an extra dependency on their number of iterations for any choice of parameters, compared to their first-order (FO) counterparts, in this work, we show that under several conditions, in expectation, ZO methods do not suffer from extra dimension dependencies in their convergence rates with respect to their FO counterparts. We look at optimisation algorithms from the dynamical systems perspective and analyse the conditions under which one can formulate the average of a ZO algorithm as the average of its FO counterpart with bounded perturbations with values dependent on design parameters. Then, using input-to-state stability properties, we show ZO methods follow the same decay rate as their FO counterparts and converge to a neighbourhood of the fixed point of FO methods, where its radius depends on the bound of the norm of the perturbations, which can be made arbitrarily small. The theoretical findings are illustrated via numerical examples.
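For readers unfamiliar with zeroth-order methods, the sketch below shows a standard two-point gradient estimator with Gaussian directions plugged into plain gradient descent. It is a generic illustration and does not reproduce the paper's analysis; the step size, smoothing radius `mu`, test function, and function names are assumptions.

```python
import random
from typing import Callable, List, Sequence


def zo_gradient(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    mu: float,
    rng: random.Random,
) -> List[float]:
    """Two-point zeroth-order estimate: only function values of f are used."""
    u = [rng.gauss(0.0, 1.0) for _ in x]
    x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
    x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
    # Directional finite difference along u, scaled back onto u.
    scale = (f(x_plus) - f(x_minus)) / (2.0 * mu)
    return [scale * ui for ui in u]


def zo_descent(
    f: Callable[[Sequence[float]], float],
    x0: Sequence[float],
    step: float = 0.05,
    mu: float = 1e-3,
    iters: int = 2000,
    seed: int = 0,
) -> List[float]:
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        g = zo_gradient(f, x, mu, rng)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def quadratic(x: Sequence[float]) -> float:
    return 0.5 * sum(xi * xi for xi in x)


x_final = zo_descent(quadratic, [1.0] * 5)
final_norm = sum(xi * xi for xi in x_final) ** 0.5
```

In expectation the estimate equals the gradient of a smoothed version of `f`, so each step can be read as a first-order step plus a perturbation, which is the viewpoint the abstract takes.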
Data-Driven Privacy-Preserving Modeling and Frequency Regulation with Aggregated Electric Vehicles via Bilinear Hidden Markov Model
Vehicle-to-Grid (V2G) technology allows bidirectional power flow for real-time grid support, making electric vehicles (EVs) well-suited for ancillary services such as frequency regulation. However, existing methods for flexibility estimation and coordination of aggregated EVs often rely on individual EV travel information (e.g., arrival/departure time) and/or characteristic parameters (e.g., charging efficiency, battery capacity) as well as real-time state-of-charge (SOC), which raises privacy concerns and faces data quality issues. To address these challenges, this paper proposes a data-driven, privacy-preserving modeling and control framework for frequency regulation using aggregated EVs. The proposed method can provide accurate estimation of the power outputs and flexibility of aggregated EVs and carry out effective frequency regulation without any individual EV information. Simulation results validate the accuracy and effectiveness of the proposed method, which also outperforms the model-based and federated learning-based methods under SOC data inaccuracies.
comment: 12 Pages, 10 figures, Accepted for publication in IEEE Transactions on Smart Grid
Robotics
OmniRobotHome: A Multi-Camera Platform for Real-Time Multiadic Human-Robot Interaction
Human-robot collaboration has been studied primarily in dyadic or sequential settings. However, real homes require multiadic collaboration, where multiple humans and robots share a workspace, acting concurrently on interleaved subtasks with tight spatial and temporal coupling. This regime remains underexplored because close-proximity interaction between humans, robots, and objects creates persistent occlusion and rapid state changes, making reliable real-time 3D tracking the central bottleneck. No existing platform provides the real-time, occlusion-robust, room-scale perception needed to make this regime experimentally tractable. We present OmniRobotHome, the first room-scale residential platform that unifies wide-area real-time 3D human and object perception with coordinated multi-robot actuation in a shared world frame. The system instruments a natural home environment with 48 hardware-synchronized RGB cameras for markerless, occlusion-robust tracking of multiple humans and objects, temporally aligned with two Franka arms that act on live scene state. Continuous capture within this consistent frame further supports long-horizon human behavior modeling from accumulated trajectories. The platform makes the multiadic collaboration regime experimentally tractable. We focus on two central problems: safety in shared human-robot environments and human-anticipatory robotic assistance, and show that real-time perception and accumulated behavior memory each yield measurable gains in both.
comment: Project Page: https://junc0ng.github.io/omnirobothome
LaST-R1: Reinforcing Action via Adaptive Physical Latent Reasoning for VLA Models
Vision-Language-Action (VLA) models have increasingly incorporated reasoning mechanisms for complex robotic manipulation. However, existing approaches share a critical limitation: whether employing explicit linguistic reasoning that suffers from latency and discretization, or utilizing more expressive continuous latent reasoning, they are predominantly confined to static imitation learning that limits adaptability and generalization. While online reinforcement learning (RL) has been introduced to VLAs to enable trial-and-error exploration, current methods exclusively optimize the vanilla action space, bypassing the underlying physical reasoning process. In this paper, we present \textbf{LaST-R1}, a unified VLA framework that integrates latent Chain-of-Thought (CoT) reasoning over physical dynamics prior to action execution, along with a tailored RL post-training paradigm. Specifically, we propose \textbf{Latent-to-Action Policy Optimization (LAPO)}, a novel RL algorithm that jointly optimizes the latent reasoning process and the action generation. By bridging reasoning and control, LAPO improves the representation of physical world modeling and enhances robustness in interactive environments. Furthermore, an \textbf{adaptive latent CoT mechanism} is introduced to allow the policy to dynamically adjust its reasoning horizon based on environment complexity. Extensive experiments show that LaST-R1 achieves a near-perfect 99.8\% average success rate on the LIBERO benchmark with only one-shot supervised warm-up, significantly improving convergence speed and performance over prior state-of-the-art methods. In real-world deployments, LAPO post-training yields up to a 44\% improvement over the initial warm-up policy across four complex tasks, including both single-arm and dual-arm settings. Finally, LaST-R1 demonstrates strong generalization across simulated and real-world environments.
RopeDreamer: A Kinematic Recurrent State Space Model for Dynamics of Flexible Deformable Linear Objects
The robotic manipulation of Deformable Linear Objects (DLOs) is a fundamental challenge due to the high-dimensional, non-linear dynamics of flexible structures and the complexity of maintaining topological integrity during contact-rich tasks. While recent data-driven methods have utilized Recurrent and Graph Neural Networks for dynamics modeling, they often struggle with self-intersections and non-physical deformations, such as tangling and link stretching. In this paper, we propose a latent dynamics framework that combines a Recurrent State Space Model with a Quaternionic Kinematic Chain representation to enable robust, long-term forecasting of DLO states. By encoding the DLO as a sequence of relative rotations (quaternions) rather than independent Cartesian positions, we inherently constrain the model to a physically valid manifold that preserves link-length constancy. Furthermore, we introduce a dual-decoder architecture that decouples state reconstruction from future-state prediction, forcing the latent space to capture the underlying physics of deformation. We evaluate our approach on a large-scale simulated dataset of complex pick-and-place trajectories involving self-intersections. Our results demonstrate that the proposed model achieves a 40.52% reduction in open-loop prediction error over 50-step horizons compared to the state-of-the-art baseline, while reducing inference time by 31.17%. Our model further maintains superior topological consistency in scenarios with multiple crossings, proving its efficacy as a compositional primitive for long-horizon manipulation planning.
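The claim that a relative-rotation encoding preserves link lengths can be checked with a small forward-kinematics sketch. Each link is a fixed-length vector rotated by the accumulated unit quaternion, so the distance between consecutive nodes is constant by construction. The helper names, the choice of the x-axis as the rest direction, and the sample rotations are assumptions, not the paper's implementation.

```python
from math import cos, sin, sqrt
from typing import List, Sequence, Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z)
Vec3 = Tuple[float, float, float]


def quat_mul(q: Quat, r: Quat) -> Quat:
    """Hamilton product q * r."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )


def rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate v by the unit quaternion q via q * (0, v) * conj(q)."""
    w, x, y, z = q
    rotated = quat_mul(quat_mul(q, (0.0, v[0], v[1], v[2])), (w, -x, -y, -z))
    return (rotated[1], rotated[2], rotated[3])


def axis_angle(axis: Vec3, angle: float) -> Quat:
    """Unit quaternion for a rotation of `angle` radians about `axis`."""
    n = sqrt(sum(a * a for a in axis))
    s = sin(angle / 2.0) / n
    return (cos(angle / 2.0), axis[0] * s, axis[1] * s, axis[2] * s)


def chain_positions(relative_quats: Sequence[Quat], link_length: float) -> List[Vec3]:
    """Node positions of a chain whose links are given as relative rotations."""
    points: List[Vec3] = [(0.0, 0.0, 0.0)]
    orientation: Quat = (1.0, 0.0, 0.0, 0.0)
    for q in relative_quats:
        orientation = quat_mul(orientation, q)  # accumulate along the chain
        dx, dy, dz = rotate(orientation, (link_length, 0.0, 0.0))
        px, py, pz = points[-1]
        points.append((px + dx, py + dy, pz + dz))
    return points


quats = [
    axis_angle((0.0, 0.0, 1.0), 0.3),
    axis_angle((0.0, 1.0, 0.0), -0.5),
    axis_angle((1.0, 0.0, 0.0), 1.0),
]
nodes = chain_positions(quats, link_length=0.1)
lengths = [
    sqrt(sum((b - a) ** 2 for a, b in zip(p, q))) for p, q in zip(nodes, nodes[1:])
]
```

Whatever rotations a model predicts, every entry of `lengths` stays at the link length as long as the quaternions are normalized, which is the sense in which stretching is ruled out by the representation rather than by a loss term.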
FlexiTac: A Low-Cost, Open-Source, Scalable Tactile Sensing Solution for Robotic Systems
We present FlexiTac, a low-cost, open-source, and scalable piezoresistive tactile sensing solution designed for robotic end-effectors. FlexiTac is a practical "plug-in" module consisting of (i) thin, flexible tactile sensor pads that provide dense tactile signals and (ii) a compact multi-channel readout board that streams synchronized measurements for real-time control and large-scale data collection. FlexiTac pads adopt a sealed three-layer laminate stack (FPC-Velostat-FPC) with electrode patterns directly integrated into flexible printed circuits, substantially improving fabrication throughput and repeatability while maintaining mechanical compliance for deployment on both rigid and soft grippers. The readout electronics use widely available, low-cost components and stream tactile signals to a host computer at 100 Hz via serial communication. Across multiple configurations, including fingertip pads and larger tactile mats, FlexiTac can be mounted on diverse platforms without major mechanical redesign. We further show that FlexiTac supports modern tactile learning pipelines, including 3D visuo-tactile fusion for contact-aware decision making, cross-embodiment skill transfer, and real-to-sim-to-real fine-tuning with GPU-parallel tactile simulation. Our project page is available at https://flexitac.github.io/.
comment: Website: https://flexitac.github.io/
Design and Characteristics of a Thin-Film ThermoMesh for the Efficient Embedded Sensing of a Spatio-Temporally Sparse Heat Source
This work presents ThermoMesh, a passive thin-film thermoelectric mesh sensor designed to detect and characterize spatio-temporally sparse heat sources through conduction-based thermal imaging. The device integrates thermoelectric junctions with linear or nonlinear interlayer resistive elements to perform simultaneous sensing and in-sensor compression. We focus on the single-event (1-sparse) operation and define four performance metrics: range, efficiency, sensitivity, and accuracy. Numerical modeling shows that a linear resistive interlayer flattens the sensitivity distribution and improves minimum sensitivity by approximately tenfold for a $16\times16$ mesh. Nonlinear temperature-dependent interlayers further enhance minimum sensitivity at scale: a ceramic negative-temperature-coefficient (NTC) layer over 973--1273~K yields a $\sim14{,}500\times$ higher minimum sensitivity than the linear design at a $200\times200$ mesh, while a VO$_2$ interlayer modeled across its metal--insulator transition (MIT) over 298--373~K yields a $\sim24\times$ improvement. Using synthetic 1-sparse datasets with white boundary-channel noise at a signal-to-noise ratio of 40~dB, the VO$_2$ case achieved $98\%$ localization accuracy, a mean absolute temperature error of $0.23$~K, and a noise-equivalent temperature (NET) of $0.07$~K. For the ceramic-NTC case no localization errors were observed under the tested conditions, with a mean absolute temperature error of $1.83$~K and a NET of $1.49$~K. These results indicate that ThermoMesh could enable energy-efficient embedded thermal sensing in scenarios where conventional infrared imaging is limited, such as molten-droplet detection or hot-spot monitoring in harsh environments.
comment: 45 pages, 13 figures, 63 references, under review in Sensors and Actuators A: Physical
FreeOcc: Training-Free Embodied Open-Vocabulary Occupancy Prediction
Existing learning-based occupancy prediction methods rely on large-scale 3D annotations and generalize poorly across environments. We present FreeOcc, a training-free framework for open-vocabulary occupancy prediction from monocular or RGB-D sequences. Unlike prior approaches that require voxel-level supervision and ground-truth camera poses, FreeOcc operates without 3D annotations, pose ground truth, or any learning stage. FreeOcc incrementally builds a globally consistent occupancy map via a four-layer pipeline: a SLAM backbone estimates poses and sparse geometry; a geometrically consistent Gaussian update constructs dense 3D Gaussian maps; open-vocabulary semantics from off-the-shelf vision-language models are associated with Gaussian primitives; and a probabilistic Gaussian-to-occupancy projection produces dense voxel occupancy. Despite being entirely training-free and pose-agnostic, FreeOcc achieves over $2\times$ improvements in IoU and mIoU on EmbodiedOcc-ScanNet compared to prior self-supervised methods. We further introduce ReplicaOcc, a benchmark for indoor open-vocabulary occupancy prediction, and show that FreeOcc transfers zero-shot to novel environments, substantially outperforming both supervised and self-supervised baselines. Project page: https://the-masses.github.io/freeocc-web/.
comment: RSS 2026
GSDrive: Reinforcing Driving Policies by Multi-mode Trajectory Probing with 3D Gaussian Splatting Environment
End-to-end (E2E) autonomous driving presents a promising approach for translating perceptual inputs directly into driving actions. However, prohibitive annotation costs and temporal data quality degradation hinder long-term real-world deployment. While combining imitation learning (IL) and reinforcement learning (RL) is a common strategy for policy improvement, conventional RL training relies on delayed, event-based rewards: policies learn only from catastrophic outcomes such as collisions, leading to premature convergence to suboptimal behaviors. To address these limitations, we introduce GSDrive, a framework that exploits 3D Gaussian Splatting (3DGS) for differentiable, physics-based reward shaping in E2E driving policy improvement. Our method incorporates a flow matching-based trajectory predictor within the 3DGS simulator, enabling multi-mode trajectory probing where candidate trajectories are rolled out to assess prospective rewards. This establishes a bidirectional knowledge exchange between IL and RL by grounding reward functions in physically simulated interaction signals, offering immediate dense feedback instead of sparse catastrophic events. Evaluated on the reconstructed nuScenes dataset, our method surpasses existing simulation-based RL driving approaches in closed-loop experiments. Code is available at https://github.com/ZionGo6/GSDrive.
comment: initial version
Framework for Collaborative Operation of Autonomous Delivery Vehicles Within a Marshaling Yard
As autonomous vehicles are slowly deployed onto urban roads for limited use cases with significant edge case issues, closed facilities like marshaling yards provide a ripe case for combining lower-level vehicle autonomy with fixed infrastructure to create full autonomy without similar edge case concerns. Within a delivery marshaling yard, electric fleet vehicles complete a set of sequential tasks (charging, inspection, cleaning, and loading) before exiting the yard with their new load of deliveries. Hybrid automation of the vehicles and infrastructure can allow these vehicles to reach full autonomy and navigate the facility without a driver, enabling quicker movement between tasks and increasing vehicle throughput. However, isolated autonomous operations based on static rules are prone to gridlock, causing facility failures that temporarily shut down operations. Our orchestrated autonomy solution uses decentralized, dynamic priority scoring of vehicles based on the current status of the marshaling yard to optimally assign vehicles to tasks and increase vehicle throughput. Using a simulated facility with three marshaling yard sizes (small, medium, and large) and three demand levels (low, medium, and high), we demonstrate that our orchestration solution increases vehicle throughput above static, isolated autonomy for all combinations of yard size and demand, while reducing facility failures at high demand levels.
Dreaming Across Towns: Semantic Rollout and Town-Adversarial Regularization for Zero-Shot Held-Out-Town Fixed-Route Driving in CARLA
Learned driving agents often degrade when deployed in unseen environments. This paper studies a deliberately bounded instance of that problem in the CARLA simulator: zero-shot transfer of a closed-loop fixed-route driving agent from Town05 and Town06 to unseen Town03 and Town04. The study isolates structural town shift by keeping weather fixed to ClearNoon and removing traffic and pedestrians. We build on a Dreamer-style latent world-model agent and add two training-only auxiliary losses: multi-horizon prediction of future visual-semantic embeddings along imagined rollouts and town-adversarial supervision on a semantic projection of the recurrent latent state. A causal context feature conditions the semantic rollout predictor, while the actor and critic retain the standard control feature. The policy receives no navigation command, route polyline, goal pose, or map input; the reference route is used only by the environment for reward, progress, success, and termination. Across the evaluated held-out towns, the proposed model achieves the highest mean success rate among the included Dreamer-family methods. Secondary safety and lane-keeping metrics are mixed across towns. These results support a bounded conclusion: in this controlled fixed-weather CARLA setting, semantic rollout supervision combined with town-adversarial regularization improves mean held-out-town route completion.
Flying by Inference: Active Inference World Models for Adaptive UAV Swarms
This paper presents an expert-guided active-inference-inspired framework for adaptive UAV swarm trajectory planning. The proposed method converts multi-UAV trajectory design from a repeated combinatorial optimization problem into a hierarchical probabilistic inference problem. In the offline phase, a genetic-algorithm planner with repulsive-force collision avoidance (GA--RF) generates expert demonstrations, which are abstracted into Mission, Route, and Motion dictionaries. These dictionaries are used to learn a probabilistic world model that captures how expert mission allocations induce route orders and how route orders induce motion-level behaviors. During online operation, the UAV swarm evaluates candidate actions by forming posterior beliefs over symbolic states and minimizing KL-divergence-based abnormality indicators with respect to expert-derived reference distributions. This enables mission allocation, route insertion, motion adaptation, and collision-aware replanning without rerunning the offline optimizer. Bayesian state estimators, including EKF and PF modules, are integrated at the motion level to improve trajectory correction under uncertainty. Simulation results show that the proposed framework preserves expert-like planning structure while producing smoother and more stable behavior than modified Q-learning. Additional validation using real-flight UAV trajectory data demonstrates that the learned world model can correct symbolic predictions under noisy and non-smooth observations, supporting its applicability to adaptive UAV swarm autonomy.
comment: Submitted to IEEE journal
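The KL-divergence-based abnormality indicator mentioned in the abstract above can be sketched as follows: each candidate action induces a belief over symbolic states, and the action whose belief is closest to the expert-derived reference is selected. The action names, the three-state distributions, and the helper names are made-up assumptions for illustration.

```python
from math import log
from typing import Dict, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pi * log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def select_action(candidate_beliefs: Dict[str, Sequence[float]],
                  reference: Sequence[float]) -> str:
    """Pick the candidate whose predicted belief deviates least from the reference."""
    return min(
        candidate_beliefs,
        key=lambda name: kl_divergence(candidate_beliefs[name], reference),
    )


# Hypothetical reference distribution over three symbolic route states.
reference = [0.7, 0.2, 0.1]
candidates = {
    "insert_route": [0.6, 0.3, 0.1],
    "reassign_mission": [0.1, 0.2, 0.7],
}
best = select_action(candidates, reference)
abnormality = {name: kl_divergence(b, reference) for name, b in candidates.items()}
```

A large divergence flags behavior the expert demonstrations rarely exhibit, so minimizing it steers the swarm toward expert-like plans without rerunning the offline optimizer.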
Learning-Based Hierarchical Scene Graph Matching for Robot Localization Leveraging Prior Maps
Accurate localization is a fundamental requirement for autonomous robots operating in indoor environments. Scene graphs encode the spatial structure of an environment as a hierarchy of semantic entities and their relationships, and can be constructed both online from robot sensor data and offline from architectural priors such as Building Information Models (BIM). Matching these two complementary representations enables drift correction in SLAM by grounding robot observations against a known structural prior. However, establishing reliable node-to-node correspondences between them remains an open challenge: existing combinatorial methods are prohibitively expensive at scale, and prior learned approaches address only flat graph matching, ignoring the multi-level semantic structure present in both representations. Here we present a learned, end-to-end differentiable pipeline that augments both graphs with semantically motivated edge types encoding intra- and inter-level relationships, explicitly exploiting this hierarchy to enable simultaneous matching from high-level room concepts down to low-level wall surfaces. Trained exclusively on floor plans, the proposed method outperforms the combinatorial baseline in F1 on real LiDAR environments while running an order of magnitude faster, demonstrating viable zero-shot generalization for BIM-assisted robot localization.
MotuBrain: An Advanced World Action Model for Robot Control
Vision-Language-Action (VLA) models achieve strong semantic generalization but often lack fine-grained modeling of world dynamics. Recent work explores video generation models as a foundation for world modeling, leading to unified World Action Models (WAMs) that jointly model visual dynamics and actions. We present MotuBrain, a unified multimodal generative model that jointly models video and action under a UniDiffuser formulation with a three-stream Mixture-of-Transformers architecture. A single model supports multiple inference modes, including policy learning, world modeling, video generation, inverse dynamics, and joint video-action prediction, while scaling to heterogeneous multimodal data such as video-only and cross-embodiment robot data. To improve real-world applicability, MotuBrain introduces a unified multiview representation, explicit language-action coupling, and an efficient inference stack, achieving over 50x speedup for real-time deployment.
Connected Dependability Cage: Run-Time Function and Anomaly Monitoring for the Development and Operation of Safe Automated Vehicles
The advancement of automated vehicles introduces complex safety challenges, particularly in dynamic and unpredictable environments where AI-enabled perception systems must operate reliably. Ensuring compliance with safety standards such as ISO 26262 and ISO/PAS 21448 (SOTIF) is essential for addressing system malfunctions and mitigating unsafe behavior in unknown scenarios. However, as automation levels increase, vehicles must go beyond conventional functional safety by incorporating fail-operational capabilities that enable continued safe operation during system or component failures and the handling of unfamiliar or degraded operational conditions. To address these safety concerns, we propose the Connected Dependability Cage, an architectural framework designed to enable hierarchical fail-operational behavior in AI-enabled perception systems. This framework integrates two complementary monitoring mechanisms: a Function Monitor that oversees multiple heterogeneous AI-based perception pipelines and detects inconsistencies through a voting mechanism, and an Anomaly Monitor that evaluates the reliability of AI perception by detecting unknown or novel objects in scenes that may be excluded from the training dataset. In the presence of critical discrepancies, the system supports graceful degradation, ultimately enabling a transition to a minimal-risk maneuver strategy. Furthermore, whenever either monitor raises a safety flag, an automated data recording process is initiated to facilitate iterative system development and continuous improvement. Both monitors have been implemented and validated through extensive vehicle testing, demonstrating their practical effectiveness in real-world applications.
ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control
Humanoid control systems have made significant progress in recent years, yet modeling fluent interaction-rich behavior between a robot, its surrounding environment, and task-relevant objects remains a fundamental challenge. This difficulty arises from the need to jointly capture spatial context, temporal dynamics, robot actions, and task intent at scale, which is a poor match to conventional supervision. We propose ExoActor, a novel framework that leverages the generalization capabilities of large-scale video generation models to address this problem. The key insight in ExoActor is to use third-person video generation as a unified interface for modeling interaction dynamics. Given a task instruction and scene context, ExoActor synthesizes plausible execution processes that implicitly encode coordinated interactions between robot, environment, and objects. Such video output is then transformed into executable humanoid behaviors through a pipeline that estimates human motion and executes it via a general motion controller, yielding a task-conditioned behavior sequence. To validate the proposed framework, we implement it as an end-to-end system and demonstrate its generalization to new scenarios without additional real-world data collection. Furthermore, we conclude by discussing limitations of the current implementation and outlining promising directions for future research, illustrating how ExoActor provides a scalable approach to modeling interaction-rich humanoid behaviors, potentially opening a new avenue for generative models to advance general-purpose humanoid intelligence.
comment: Work in progress. Project page: https://baai-agents.github.io/ExoActor/
Can Tabular Foundation Models Guide Exploration in Robot Policy Learning?
Policy optimization in high-dimensional continuous control for robotics remains a challenging problem. Predominant methods are inherently local and often require extensive tuning and carefully chosen initial guesses for good performance, whereas more global and less initialization-sensitive search methods typically incur high rollout costs. We propose TFM-S3, a tabular hybrid local-global method for improving global exploration in robot policy learning with limited rollout cost. We interleave high-frequency local updates with intermittent rounds of global search. In each search round, we construct a dynamically updated low-dimensional policy subspace via SVD and perform iterative surrogate-guided refinement within this space. A pretrained tabular foundation model predicts candidate returns from a small context set, enabling large-scale screening with limited rollout cost. Experiments on continuous control benchmarks show that TFM-S3 consistently accelerates early-stage convergence and improves final performance compared to TD3 and population-based baselines under an identical rollout budget. These results demonstrate that foundation models are a powerful new tool for creating sample-efficient policy learning methods for continuous control in robotics.
comment: 8 pages, 6 figures
Robot Learning from Human Videos: A Survey
A critical bottleneck hindering further advancement in embodied AI and robotics is the challenge of scaling robot data. To address this, the field of learning robot manipulation skills from human video data has attracted rapidly growing attention in recent years, driven by the abundance of human activity videos and advances in computer vision. This line of research promises to enable robots to acquire skills passively from the vast and readily available resource of human demonstrations, substantially favoring scalable learning for generalist robotic systems. Therefore, we present this survey to provide a comprehensive and up-to-date review of human-video-based learning techniques in robotics, focusing on both human-robot skill transfer and data foundations. We first review the policy learning foundations in robotics, and then describe the fundamental interfaces to incorporate human videos. Subsequently, we introduce a hierarchical taxonomy of transferring human videos to robot skills, covering task-, observation-, and action-oriented pathways, along with a cross-family analysis of their couplings with different data configurations and learning paradigms. In addition, we investigate the data foundations including widely-used human video datasets and video generation schemes, and provide large-scale statistical trends in dataset development and utilization. Ultimately, we emphasize the challenges and limitations intrinsic to this field, and delineate potential avenues for future research. The paper list of our survey is available at https://github.com/IRMVLab/awesome-robot-learning-from-human-videos.
comment: Paper list: https://github.com/IRMVLab/awesome-robot-learning-from-human-videos
Simulating Infant First-Person Sensorimotor Experience via Motion Retargeting from Babies to Humanoids
Motion retargeting from humans to human-like artificial agents is becoming increasingly important as humanoid robots grow more capable. However, most existing approaches focus only on reproducing kinematics and ignore the rich sensorimotor experience associated with human movement. In this work, we present a framework for simulating the multimodal sensorimotor experiences of infants using physical and virtual humanoids. From a single video, our method reconstructs the infant's body configuration by extracting its skeletal structure and estimating the full 3D pose from each frame. Then we map the reconstructed motion onto several developmental platforms: the physical iCub robot and the virtual simulators pyCub, EMFANT and MIMo. Replaying the retargeted motions on these embodiments produces simulated multisensory streams including proprioception (joints and muscles), touch, and vision. For the best-matching embodiment, the retargeting achieves sub-centimeter accuracy and enables a rich multimodal analysis of infant development as well as enhanced automated annotation of behaviors. This framework provides a unique window into the infant's sensorimotor experience, offering new tools for robotics, developmental science, and early detection of neurodevelopmental disorders. The code is available at https://github.com/ctu-vras/motion-retargeting/.
comment: Submitted to IEEE ICDL. 8 pages, 6 figures
Function-based Parametric Co-Design Optimization of Dexterous Hands
Despite advances in dexterous hand manipulation, robotic hand design is still largely decoupled from task-driven evaluation and control, limiting systematic optimization. Existing robotic hand co-design approaches are often limited in scope, optimizing a small subset of design parameters. We introduce a comprehensive parametric framework for robotic hand generation that unifies palm structure, finger kinematics, fingertip geometry, and fine-scale surface curvatures within a single design space. Fine geometric features are introduced through parametric surface deformation kernels that directly influence contact interactions. We validate the framework on design optimization in grasp stability tasks in simulation and real-world dynamic scenarios. Our framework produces simulation- and fabrication-ready hand models and will be released as open-source to enable rapid design iteration for dexterous hand co-design optimization frameworks and cross-embodiment policy training and control research.
comment: 8 pages, 7 figures, https://www.aminmirzaee.com/HandCDO/
SASI: Leveraging Sub-Action Semantics for Robust Early Action Recognition in Human-Robot Interaction
Understanding human actions is critical for advancing behavior analysis in human-robot interaction. Particularly in tasks that demand quick and proactive feedback, robots must recognize human actions as early as possible from incomplete observations. Sub-actions offer the semantic and hierarchical cues needed for this, since human actions are inherently structured and can be decomposed into smaller, meaningful units. However, conventional approaches focus primarily on holistic actions and often overlook the rich semantic structure embedded in sub-actions, making them poorly suited for early recognition. To address this gap, we introduce SASI (Sub-Action Semantics Integrated cross-modal fusion), a novel framework that integrates existing graph convolution networks to fuse spatiotemporal features with sub-action semantics. SASI exploits a segmentation model with a traditional skeleton-based graph convolution network, capturing both fine-grained sub-action semantics and overall spatial context, while operating in real-time at 29 Hz. Experiments on BABEL, a skeleton-based dataset with frame-level annotations, demonstrate that our method improves recognition accuracy over conventional approaches, with additional gains expected as the quality of sub-action segmentation improves. Notably, SASI also achieves superior performance in understanding partial action sequences, revealing its capability for early recognition, which is essential for proactive and seamless Human-Robot Interaction (HRI). Code is available at https://anonymous.4open.science/r/SASI .
PRTS: A Primitive Reasoning and Tasking System via Contrastive Representations
Vision-Language-Action (VLA) models advance robotic control via strong visual-linguistic priors. However, existing VLAs predominantly frame pretraining as supervised behavior cloning, overlooking the fundamental nature of robot learning as a goal-reaching process that requires understanding temporal task progress. We present PRTS (Primitive Reasoning and Tasking System), a VLA foundation model that reformulates pretraining through Goal-Conditioned Reinforcement Learning. By treating language instructions as goals and employing contrastive reinforcement learning, PRTS learns a unified embedding space where the inner product of state-action and goal embeddings approximates the log-discounted goal occupancy, the probability of reaching the language-specified goal from the current state-action, quantitatively assessing physical feasibility beyond static semantic matching. PRTS draws this dense goal-reachability supervision directly from offline trajectories without reward annotations, and folds it into the VLM backbone via a role-aware causal mask, incurring negligible overhead over vanilla behavior cloning. This paradigm endows the high-level reasoning system with intrinsic goal reachability awareness, bridging semantic reasoning and temporal task progress, and further benefits goal-conditioned action prediction. Pretrained on 167B tokens of diverse manipulation and embodied-reasoning data, PRTS reaches state-of-the-art performance on LIBERO, LIBERO-Pro, LIBERO-Plus, SimplerEnv, and a real-world suite of 14 complex tasks, with particularly substantial gains on long-horizon, contact-rich, and zero-shot novel-instruction settings, confirming that injecting goal-reachability awareness significantly improves both execution success and long-horizon planning of general-purpose robotic foundation policies.
comment: 38 pages, 12 figures
RAY-TOLD: Ray-Based Latent Dynamics for Dense Dynamic Obstacle Avoidance with TDMPC
Dense, dynamic crowds pose a persistent challenge for autonomous mobile robots. Purely reactive planning methods, such as Model Predictive Path Integral (MPPI) control, often fail to escape local minima in complex scenarios due to their limited prediction horizon. To bridge this gap, we propose Ray-based Task-Oriented Latent Dynamics (RAY-TOLD), a hybrid control architecture that integrates obstacle information into latent dynamics and combines the robustness of physics-based MPPI with the long-horizon foresight of reinforcement learning. RAY-TOLD leverages a LiDAR-centric latent dynamics model to encode high-dimensional sensor data into a compact state representation, enabling the learning of a terminal value function and a policy prior. We introduce a policy mixture sampling strategy that augments the MPPI candidate population with trajectories derived from the learned policy, effectively guiding the planner towards the goal while maintaining kinematic feasibility. Extensive tests in a stochastic environment with high-density dynamic obstacles demonstrate that our method outperforms the MPPI baseline, reducing the collision rate. The results confirm that blending short-horizon physics-based rollouts with learned long-horizon intent significantly enhances navigation reliability and safety.
comment: 8 pages, 4 figures
An Experimental Modular Instrument With a Haptic Feedback Framework for Robotic Surgery Training
Robotic-assisted surgery offers significant clinical advantages but largely eliminates direct haptic feedback, increasing the risk of excessive tool-tissue interaction forces. Although recent commercial systems have begun to introduce force feedback, their high cost limits accessibility, particularly for surgical training. This paper presents a modular experimental robotic laparoscopic instrument integrated with a real-time haptic feedback framework. The proposed instrument employs a wrist-mounted force/torque (F/T) sensor to estimate tool-tissue interaction forces while avoiding the durability and integration challenges of tip-mounted sensors. A haptic feedback framework is developed to extract the external contact forces, render them to the haptic device, and generate stable and perceptually meaningful feedback. The instrument is integrated into the robotic surgery training system (RoboScope) and evaluated through a controlled user study involving a force regulation task. Experimental results demonstrate that haptic feedback significantly improves task success rate, force regulation accuracy, and task efficiency compared to visual-only feedback. The proposed instrument enables stable, high-fidelity haptic interaction, supporting effective robotic surgery training.
comment: Accepted to the 11th IEEE RAS/EMBS International Conference on Biomedical Robotics and Biomechatronics (BioRob 2026)
DOT-Sim: Differentiable Optical Tactile Simulation with Precise Real-to-Sim Physical Calibration ICRA 2026
Simulating optical tactile sensors presents significant challenges due to their high deformability and intricate optical properties. To address these issues and enable a physically accurate simulation, we propose DOT-Sim: Differentiable Optical Tactile Simulation. Unlike prior simulators that rely on simplified models of deformable sensors, DOT-Sim accurately captures the physical behavior of soft sensors by modeling them as elastic materials using the Material Point Method (MPM). DOT-Sim enables rapid calibration of optical tactile sensor simulation using a small number of demonstrations within minutes, which is substantially faster than existing methods. Compared to current baselines, our approach supports much larger and non-linear deformations. To handle the optical aspect, we propose a novel approach to simulating optical responses by learning a residual image relative to the real-world idle state. We validate the physical and visual realism of our method through a series of zero-shot sim-to-real tasks. Our experiments show that DOT-Sim (1) accurately replicates the physical dynamics of a DenseTact optical tactile sensor in reality, (2) generates realistic optical outputs in contact-rich scenarios, (3) enables direct deployment of simulation-trained classifiers in the real world, achieving 85% classification accuracy on challenging objects and 90% accuracy in embedded tumor-type detection, and (4) allows precise trajectory following with a policy trained from demonstrations in simulation, with an average error of less than 0.9 mm.
comment: Accepted at ICRA 2026
K2MUSE: A human lower-limb multimodal walking dataset spanning task and acquisition variability for rehabilitation robotics
The natural interaction and control performance of lower limb rehabilitation robots are closely linked to biomechanical information from various human locomotion activities. Multidimensional human motion data significantly deepen the understanding of the complex mechanisms governing neuromuscular alterations, thereby facilitating the development and application of rehabilitation robots in multifaceted real-world environments. However, existing lower limb datasets are inadequate for supplying the essential multimodal data and large-scale gait samples necessary for the development of effective data-driven approaches, and the significant effects of acquisition interference in real applications are neglected. To fill this gap, we present the K2MUSE dataset, which includes a comprehensive collection of multimodal data, comprising kinematic, kinetic, amplitude mode ultrasound (AUS), and surface electromyography (sEMG) measurements. The proposed dataset includes lower-limb multimodal data collected from two cohorts, including 30 able-bodied young adults and 12 older adults, across different inclines (0$^\circ$, $\pm$5$^\circ$, and $\pm$10$^\circ$), speeds (0.5 m/s, 1.0 m/s, and 1.5 m/s), and representative non-ideal acquisition conditions (muscle fatigue, electrode shifts, and interday differences). The kinematic and ground reaction force data were collected with a Vicon motion capture system and an instrumented treadmill with embedded force plates, whereas the sEMG and AUS data of thirteen muscles on the bilateral lower limbs were synchronously recorded. K2MUSE is released with the corresponding structured documentation, preprocessing pipelines, and example code, thereby providing a comprehensive resource for rehabilitation robot development, biomechanical analysis, and wearable sensing research. The dataset is available at https://k2muse.github.io/.
comment: 34 pages, 30 figures, 7 tables
From Action Labels to Sets: Rethinking Action Supervision for Imitation Learning from Corrective Feedback
Behavior cloning (BC) optimizes policies by treating human demonstrations as pointwise action labels. While effective with accurate action labels, this formulation is brittle in practice: when human-provided actions are imperfect, treating each label as an exact target can steer the policy away from the underlying desired behavior, particularly when expressive models are used (e.g., energy-based models). To address this, we propose a human-in-the-loop alternative that replaces pointwise supervision with set-valued action targets. We introduce Contrastive policy Learning from Interactive Corrections (CLIC). CLIC leverages human corrections to construct and refine sets of desired actions, and optimizes a policy to place probability mass over these sets rather than over a single action target. This formulation naturally accommodates both absolute and relative corrections and can represent complex multi-modal behaviors. Extensive simulation and real-robot experiments show that the proposed approach leads to effective policy learning across diverse settings: CLIC remains competitive with the state of the art under accurate data while being substantially more robust under noisy, relative, and partial feedback. Our implementation is publicly available at https://clic-webpage.github.io/.
Clinical Evaluation of a Tongue-Controlled Wrist Abduction-Adduction Assistance in a 6-DoF Upper-Limb Exoskeleton for Individuals with ALS and SCI
Upper-limb exoskeletons (ULEs) have the potential to restore functional independence in individuals with severe motor impairments; however, the clinical relevance of wrist degrees of freedom (DoF), particularly abduction-adduction (Ab-Ad), remains insufficiently evaluated. This study investigates the functional and user-perceived impact of wrist Ab-Ad assistance during two activities of daily living (ADLs). Wrist Ab-Ad assistance in a tongue-controlled 6-DoF ULE, EXOTIC2, was evaluated in a within-subject study involving one individual with amyotrophic lateral sclerosis and five individuals with spinal cord injury. Participants performed drinking and scratch stick leveling tasks with EXOTIC2 under two conditions: with and without wrist Ab-Ad assistance. Outcome measures included task success, task completion time, kinematic measures, and a usability questionnaire capturing comfort, functional perception, and acceptance. Enabling wrist Ab-Ad improved task success rates across both ADLs, with consistent reductions in spillage (from 77.8% spillages to 22.2%) and failed placements (from 66.7% to 16.7%). Participants utilized task-specific subsets of the available wrist range of motion, indicating that effective control within functional ranges was more critical than maximal joint excursion. Questionnaire responses indicated no increase in discomfort with the additional DoF and reflected perceived improvements in task performance. In conclusion, wrist Ab-Ad assistance enhances functional task performance in assistive exoskeleton use without compromising user comfort. However, its effectiveness depends on task context, control usability, and individual user strategies. This study provides clinically relevant, user-centered evidence supporting the inclusion of wrist Ab-Ad in ULEs, emphasizing the importance of balancing functional capability with usability in assistive device design.
comment: 9 pages, 7 figures and 2 tables. This work has been submitted to the IEEE Transactions on Neural Systems and Rehabilitation Engineering
Design, Modelling and Experimental Evaluation of a Tendon-driven Wrist Abduction-Adduction Mechanism for an upper limb exoskeleton
Wrist exoskeletons play a vital role in rehabilitation and assistive applications, yet conventional actuation mechanisms such as electric motors or pneumatics often introduce undesirable weight, friction, and complexity. This paper presents a novel single-cable (tendon), torsional-spring-assisted actuation mechanism for wrist abduction-adduction, and a simulation-based method for selecting its stiffness parameters. The mechanism employs a single Bowden cable passively tensioned by a spiral torsional spring (clock spring) to maintain continuous cable tension without antagonistic actuation. Kinematic and dynamic modeling of the mechanism was performed to estimate the required torque and identify optimal spring parameters. These simulation-derived parameters guided the design of a functional prototype, which was experimentally evaluated with five participants with no motor disabilities (NMD) under varying arm positions and loading conditions using three spring configurations to account for user variability and modeling uncertainties. Experimental results show consistent agreement with simulation-derived trends, with the nominal spring configuration achieving balanced motion range, torque demand, and repeatability. The results demonstrate that simulation-informed stiffness selection can effectively guide the design of compact, cable-driven wrist exoskeletons while reducing reliance on empirical tuning.
comment: 8 pages and 8 figures. Submitted to IEEE/ASME Transactions on Mechatronics. Includes experimental validation on human participants
IKSPARK: Obstacle-Aware Inverse Kinematics via Convex Optimization
Inverse kinematics (IK) is central to robot control and motion planning, yet its nonlinear kinematic mapping makes it inherently nonconvex and particularly challenging under complex constraints. We present IKSPARK (Inverse Kinematics using Semidefinite Programming And RanK minimization), an obstacle-aware IK solver for robots with diverse morphologies, including open and closed kinematic chains with spherical, revolute, and prismatic joints. Our formulation expresses IK as a semidefinite programming (SDP) problem with additional rank-1 constraints on symmetric matrices with fixed traces. IKSPARK first solves the relaxed SDP, whose infeasibility certifies infeasibility of the original IK problem, and then recovers a rank-1 solution using iterative rank-minimization methods with proven local convergence. Obstacle avoidance is handled through a convexified formulation of mixed-integer constraints. Extensive experiments show that IKSPARK computes highly accurate solutions across various kinematic structures and constrained environments without post-processing. In obstacle-rich settings, especially fixed workcell environments, IKSPARK achieves substantially higher success rates than traditional nonlinear optimization methods.
GazeVLA: Learning Human Intention for Robotic Manipulation
Embodied foundation models have achieved significant breakthroughs in robotic manipulation, yet they still depend heavily on large-scale robot demonstrations. Although recent works have explored leveraging human data to alleviate this dependency, effectively extracting transferable knowledge remains a significant challenge due to the inherent embodiment gap between human and robot. We argue that the intention underlying human actions can serve as a powerful intermediate representation for bridging this gap. In this paper, we introduce a novel framework that explicitly learns and transfers human intention to facilitate robotic manipulation. Specifically, we model intention through gaze, as it naturally precedes physical actions and serves as an observable proxy for human intent. Our model is first pretrained on a large-scale egocentric human dataset to capture human intention and its synergy with action, followed by finetuning on a small set of robot and human data. During inference, the model adopts a Chain-of-Thought reasoning paradigm, sequentially predicting intention before executing the action. Extensive evaluations in simulation and real-world settings, across long-horizon and fine-grained tasks, and under few-shot and robustness benchmarks, show that our method consistently outperforms strong baselines, generalizes better, and achieves state-of-the-art performance. Project page: https://gazevla.github.io .
comment: Project page: https://gazevla.github.io
TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance
Fine-grained and contact-rich manipulation remain challenging for robots, largely due to the underutilization of tactile feedback. To address this, we introduce TouchGuide, a novel cross-policy visuo-tactile fusion paradigm that fuses modalities within a low-dimensional action space. Specifically, TouchGuide operates in two stages to guide a pre-trained diffusion or flow-matching visuomotor policy at inference time. First, the policy produces a coarse, visually-plausible action using only visual inputs during early sampling. Second, a task-specific Contact Physical Model (CPM) provides tactile guidance to steer and refine the action, ensuring it aligns with realistic physical contact conditions. Trained through contrastive learning on limited expert demonstrations, the CPM provides a tactile-informed feasibility score to steer the sampling process toward refined actions that satisfy physical contact constraints. Furthermore, to facilitate TouchGuide training with high-quality and cost-effective data, we introduce TacUMI, a data collection system. TacUMI achieves a favorable trade-off between precision and affordability; by leveraging rigid fingertips, it obtains direct tactile feedback, thereby enabling the collection of reliable tactile data. Extensive experiments on five challenging contact-rich tasks, such as shoe lacing and chip handover, show that TouchGuide consistently and significantly outperforms state-of-the-art visuo-tactile policies.
ImagineNav++: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination
Visual navigation is a fundamental capability for autonomous home-assistance robots, enabling long-horizon tasks such as object search. While recent methods have leveraged Large Language Models (LLMs) to incorporate commonsense reasoning and improve exploration efficiency, their planning remains constrained by textual representations, which cannot adequately capture spatial occupancy or scene geometry--critical factors for navigation decisions. We explore whether Vision-Language Models (VLMs) can achieve mapless visual navigation using only onboard RGB/RGB-D streams, unlocking their potential for spatial perception and planning. We achieve this through an imagination-powered navigation framework, ImagineNav++, which imagines future observation images from candidate robot views and translates navigation planning into a simple best-view image selection problem for VLMs. First, a future-view imagination module distills human navigation preferences to generate semantically meaningful viewpoints with high exploration potential. These imagined views then serve as visual prompts for the VLM to identify the most informative viewpoint. To maintain spatial consistency, we develop a selective foveation memory mechanism, which hierarchically integrates keyframe observations via a sparse-to-dense framework, constructing a compact yet comprehensive memory for long-term spatial reasoning. This approach transforms goal-oriented navigation into a series of tractable point-goal navigation tasks. Extensive experiments on open-vocabulary object and instance navigation benchmarks show that ImagineNav++ achieves SOTA performance in mapless settings, even surpassing most map-based methods, highlighting the importance of scene imagination and memory in VLM-based spatial reasoning.
comment: 17 pages, 10 figures. arXiv admin note: text overlap with arXiv:2410.09874
Adaptive Nonlinear MPC for Trajectory Tracking of An Overactuated Tiltrotor Hexacopter
Omnidirectional micro aerial vehicles (OMAVs) are better suited to environmentally interactive tasks due to their ability to exert full wrenches while maintaining stable poses. However, OMAVs often incorporate additional actuators and complex mechanical structures to achieve omnidirectionality. Obtaining precise mathematical models is difficult, and the mismatch between the model and the real physical system is not trivial. The large model-plant mismatch significantly degrades overall system performance if a non-adaptive model predictive controller (MPC) is used. This work presents the $\mathcal{L}_1$-MPC, an adaptive nonlinear model predictive controller for accurate 6-DOF trajectory tracking of an overactuated tiltrotor hexacopter in the presence of model uncertainties and external disturbances. The $\mathcal{L}_1$-MPC adopts a cascaded system architecture in which a nominal MPC is followed and augmented by an $\mathcal{L}_1$ adaptive controller. The proposed method is evaluated against the non-adaptive MPC, the EKF-MPC, and the PID method in both numerical and PX4 software-in-the-loop simulation with Gazebo. The $\mathcal{L}_1$-MPC reduces the tracking error by around 90% when compared to a non-adaptive MPC, and the $\mathcal{L}_1$-MPC has lower tracking errors, higher uncertainty estimation rates, and fewer tuning requirements than the EKF-MPC. We will make the implementations, including the hardware-verified PX4 firmware and Gazebo plugins, open-source at https://github.com/HITSZ-NRSL/omniHex.
comment: (1) Eq. (10) sign error, inconsistent with Eq. (14). (2) Eq. (15) spurious Coriolis term (skips transport theorem). (3) typo before Eq. (21): _Bω_dot_EKF?_Bτ_dot_EKF. (4) Sec. IV comparison lacks systematic tuning and does not support its claims. (5) the open-source release at github.com/HITSZ-NRSL/omniHex will not happen
Event-Centric World Modeling with Memory-Augmented Retrieval for Embodied Decision-Making
Autonomous agents operating in dynamic and safety-critical environments require decision-making frameworks that are both computationally efficient and physically grounded. However, many existing approaches rely on end-to-end learning, which often lacks interpretability and explicit mechanisms for ensuring consistency with physical constraints. In this work, we propose an event-centric world modeling framework with memory-augmented retrieval for embodied decision-making. The framework represents the environment as a structured set of semantic events, which are encoded into a permutation-invariant latent representation. Decision-making is performed via retrieval over a knowledge bank of prior experiences, where each entry associates an event representation with a corresponding maneuver. The final action is computed as a weighted combination of retrieved solutions, providing a transparent link between decision and stored experiences. The proposed design enables structured abstraction of dynamic environments and supports interpretable decision-making through case-based reasoning. In addition, incorporating physics-informed knowledge into the retrieval process encourages the selection of maneuvers that are consistent with observed system dynamics. Experimental evaluation in UAV flight scenarios demonstrates that the framework operates within real-time control constraints while maintaining interpretable and consistent behavior.
comment: This is the initial version (v1) released to establish priority for the proposed framework. Subsequent versions will include expanded experimental validation and exhaustive hardware benchmarking
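The retrieval step described in this abstract (a permutation-invariant event encoding, followed by a weighted combination of retrieved maneuvers) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: mean pooling as the encoder, Euclidean distance as the similarity measure, softmax weighting, and the names `encode_events` and `retrieve_action` are all hypothetical.

```python
import math


def encode_events(events):
    """Permutation-invariant encoding (assumed here: element-wise mean pooling)."""
    dim = len(events[0])
    return [sum(event[i] for event in events) / len(events) for i in range(dim)]


def retrieve_action(query, bank, k=2, temperature=1.0):
    """Blend the maneuvers of the k nearest bank entries.

    bank is a list of (event_representation, maneuver) pairs. The weights are a
    softmax over negative Euclidean distances, an illustrative choice only.
    """
    nearest = sorted(bank, key=lambda entry: math.dist(query, entry[0]))[:k]
    logits = [-math.dist(query, key) / temperature for key, _ in nearest]
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(nearest[0][1])
    action = [
        sum(w * maneuver[i] for w, (_, maneuver) in zip(weights, nearest))
        for i in range(dim)
    ]
    return action, weights
```

Returning the weights alongside the action keeps the link between the decision and the stored experiences visible, which is the interpretability property the abstract emphasizes.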
Make Tracking Easy: Neural Motion Retargeting for Humanoid Whole-body Control
Humanoid robots require diverse motor skills to integrate into complex environments, but bridging the kinematic and dynamic embodiment gap from human data remains a major bottleneck. We demonstrate through Hessian analysis that traditional optimization-based retargeting is inherently non-convex and prone to local optima, leading to physical artifacts like joint jumps and self-penetration. To address this, we reformulate the retargeting problem as learning a data distribution rather than optimizing for individual solutions, and propose NMR, a Neural Motion Retargeting framework that transforms static geometric mapping into a dynamics-aware learned process. We first propose Clustered-Expert Physics Refinement (CEPR), a hierarchical data pipeline that leverages VAE-based motion clustering to group heterogeneous movements into latent motifs. This strategy significantly reduces the computational overhead of massively parallel reinforcement learning experts, which project and repair noisy human demonstrations onto the robot's feasible motion manifold. The resulting high-fidelity data supervises a non-autoregressive CNN-Transformer architecture that reasons over global temporal context to suppress reconstruction noise and bypass geometric traps. Experiments on the Unitree G1 humanoid across diverse dynamic tasks (e.g., martial arts, dancing) show that NMR eliminates joint jumps and significantly reduces self-collisions compared to state-of-the-art baselines. Furthermore, NMR-generated references accelerate the convergence of downstream whole-body control policies, establishing a scalable path for bridging the human-robot embodiment gap.
comment: Report, 12 pages, 5 figures, 4 tables, webpage: https://nju3dv-humanoidgroup.github.io/nmr.github.io
FeaXDrive: Feasibility-aware Trajectory-Centric Diffusion Planning for End-to-End Autonomous Driving
End-to-end diffusion planning has shown strong potential for autonomous driving, but the physical feasibility of generated trajectories remains insufficiently addressed. In particular, generated trajectories may exhibit local geometric irregularities, violate trajectory-level kinematic constraints, or deviate from the drivable area, indicating that the commonly used noise-centric formulation in diffusion planning is not yet well aligned with the trajectory space where feasibility is more naturally characterized. To address this issue, we propose FeaXDrive, a feasibility-aware trajectory-centric diffusion planning method for end-to-end autonomous driving. The core idea is to treat the clean trajectory as the unified object for feasibility-aware modeling throughout the diffusion process. Built on this trajectory-centric formulation, FeaXDrive integrates adaptive curvature-constrained training to improve intrinsic geometric and kinematic feasibility, drivable-area guidance within reverse diffusion sampling to enhance consistency with the drivable area, and feasibility-aware GRPO post-training to further improve planning performance while balancing trajectory-space feasibility. Experiments on the NAVSIM benchmark show that FeaXDrive achieves strong closed-loop planning performance while substantially improving trajectory-space feasibility. These findings highlight the importance of explicitly modeling trajectory-space feasibility in end-to-end diffusion planning and provide a step toward more reliable and physically grounded autonomous driving planners.
comment: 22 pages, 6 figures
Do World Action Models Generalize Better than VLAs? A Robustness Study
Robot action planning in the real world is challenging as it requires not only understanding the current state of the environment but also predicting how it will evolve in response to actions. Vision-language-action (VLA) models, which repurpose large-scale vision-language models for robot action generation using action experts, have achieved notable success across a variety of robotic tasks. Nevertheless, their performance remains constrained by the scope of their training data, exhibiting limited generalization to unseen scenarios and vulnerability to diverse contextual perturbations. More recently, world models have been revisited as an alternative to VLAs. These models, referred to as world action models (WAMs), are built upon world models that are trained on large corpora of video data to predict future states. With minor adaptations, their latent representation can be decoded into robot actions. It has been suggested that their explicit dynamic prediction capacity, combined with spatiotemporal priors acquired from web-scale video pretraining, enables WAMs to generalize more effectively than VLAs. In this paper, we conduct a comparative study of prominent state-of-the-art VLA policies and recently released WAMs. We evaluate their performance on the LIBERO-Plus and RoboTwin 2.0-Plus benchmarks under various visual and language perturbations. Our results show that WAMs achieve strong robustness, with LingBot-VA reaching 74.2% success rate on RoboTwin 2.0-Plus and Cosmos-Policy achieving 82.2% on LIBERO-Plus. While VLAs such as $π_{0.5}$ can achieve comparable robustness on certain tasks, they typically require extensive training with diverse robotic datasets and varied learning objectives. Hybrid approaches that partially incorporate video-based dynamic learning exhibit intermediate robustness, highlighting the importance of how video priors are integrated.
AutoVDC: Automated Vision Data Cleaning Using Vision-Language Models
Training of autonomous driving systems requires extensive datasets with precise annotations to attain robust performance. Human annotations suffer from imperfections, and multiple iterations are often needed to produce high-quality datasets. However, manually reviewing large datasets is laborious and expensive. In this paper, we introduce the AutoVDC (Automated Vision Data Cleaning) framework and investigate the utilization of Vision-Language Models (VLMs) to automatically identify erroneous annotations in vision datasets, thereby enabling users to eliminate these errors and enhance data quality. We validate our approach using the KITTI and nuImages datasets, which contain object detection benchmarks for autonomous driving. To test the effectiveness of AutoVDC, we create dataset variants with intentionally injected erroneous annotations and observe the error detection rate of our approach. Additionally, we compare the detection rates using different VLMs and explore the impact of VLM fine-tuning on our pipeline. The results demonstrate our method's high performance in error detection and data cleaning experiments, indicating its potential to significantly improve the reliability and accuracy of large-scale production datasets in autonomous driving.
comment: Accepted to IV 2026 Drive-X Foundation Models for Autonomous Driving (Oral presentation)
AID: Agent Intent from Diffusion for Multi-Agent Informative Path Planning
Information gathering in large-scale or time-critical scenarios (e.g., environmental monitoring, search and rescue) requires broad coverage within limited time budgets, motivating the use of multi-agent systems. These scenarios are commonly formulated as multi-agent informative path planning (MAIPP), where multiple agents must coordinate to maximize information gain while operating under budget constraints. A central challenge in MAIPP is ensuring effective coordination while the belief over the environment evolves with incoming measurements. Recent learning-based approaches address this by using distributions over future positions as "intent" to support coordination. However, these autoregressive intent predictors are computationally expensive and prone to compounding errors. Inspired by the effectiveness of diffusion models as expressive, long-horizon policies, we propose AID, a fully decentralized MAIPP framework that leverages diffusion models to generate long-term trajectories in a non-autoregressive manner. AID first performs behavior cloning on trajectories produced by existing MAIPP planners and then fine-tunes the policy using reinforcement learning via Diffusion Policy Policy Optimization (DPPO). This two-stage pipeline enables the policy to inherit expert behavior while learning improved coordination through online reward feedback. Experiments demonstrate that AID consistently improves upon the MAIPP planners it is trained from, achieving 4x faster execution and up to 17% increased information gain, while scaling effectively to larger numbers of agents. Our implementation is publicly available at https://github.com/marmotlab/AID.
Multiagent Systems
Framework for Collaborative Operation of Autonomous Delivery Vehicles Within a Marshaling Yard
As autonomous vehicles slowly deploy onto urban roads for limited use cases with significant edge case issues, closed facilities like marshaling yards provide a ripe case for combining lower-level vehicle autonomy with fixed infrastructure to create full autonomy without similar edge case concerns. Within a delivery marshaling yard, electric fleet vehicles complete a set of sequential tasks (charging, inspection, cleaning, and loading) before exiting the yard with their new load of deliveries. Hybrid automation of the vehicles and infrastructure can allow these vehicles to reach full autonomy and navigate the facility without the need for a driver, allowing quicker movement between tasks and increasing vehicle throughput. However, isolated autonomous operations based on static rules are prone to gridlock, causing facility failures that temporarily shut down operations. Our orchestrated autonomy solution uses decentralized, dynamic priority scoring of vehicles based on the current status of the marshaling yard to optimally assign vehicles to tasks to increase vehicle throughput. Using a simulated facility with three marshaling yard sizes (small, medium, and large) and three demand levels (low, medium, high), we demonstrated that our orchestration solution increases vehicle throughput above static, isolated autonomy for all combinations of yard size and demand, while reducing facility failures at high demand levels.
Language Models Refine Mechanical Linkage Designs Through Symbolic Reflection and Modular Optimisation
Designing mechanical linkages involves combinatorial topology selection and continuous parameter fitting. We show that language models can systematically improve linkage designs through symbolic representations. Language model agents explore discrete topologies while numerical optimisers fit continuous parameters. A symbolic lifting operator translates simulator trajectories into qualitative descriptors, motion labels, temporal predicates, and structural diagnostics that models interpret across iterative design cycles. Across six engineering-relevant motion targets and three open-source models (Llama 3.3 70B, Qwen3 4B, Qwen3 MoE 30B-A3B), the modular architecture reduces geometric error by up to 68% and improves structural validity by up to 134% over monolithic baselines. Critically, 78.6% of iterative refinement trajectories show measurable improvement, with the system correctly diagnosing overconstraint (56.3%) and underconstraint (35.6%) failure modes and proposing grounded corrections. Models across all three families acquire interpretable mechanical reasoning strategies without fine-tuning, demonstrating that principled symbolic abstraction bridges generative AI and the numerical precision required for engineering design.
Can We Volunteer Out of the Peer Review Crisis?
The volume of scientific manuscripts is growing faster than the capacity to evaluate them, yet the institutions that govern peer review have remained largely unchanged. The result is a widening mismatch: reviewer scarcity, noisier assessments, and declining confidence in editorial decisions. Every scientist wants better reviews, but review quality depends on the total burden, which no single author can shift. To isolate this tension, we provide a game-theoretic thought experiment: a voluntary lottery in which authors accept a chance of random pre-review rejection, reducing reviewer burden and improving the quality of surviving evaluations. We show that a Nash equilibrium emerges in which authors voluntarily enter the lottery. Scientists who care about the literature they read, not just the papers they publish, will opt in, raising the quality of published science for all.
comment: Main text: 13 pages, 4 figures. Supplementary Information: 18 pages
ObjectGraph: From Document Injection to Knowledge Traversal -- A Native File Format for the Agentic Era
Every document format in existence was designed for a human reader moving linearly through text. Autonomous LLM agents do not read - they retrieve. This fundamental mismatch forces agents to inject entire documents into their context window, wasting tokens on irrelevant content, compounding state across multi-turn loops, and broadcasting information indiscriminately across agent roles. We argue this is not a prompt engineering problem, not a retrieval problem, and not a compression problem: it is a format problem. We introduce OBJECTGRAPH (.og), a file format that reconceives the document as a typed, directed knowledge graph to be traversed rather than a string to be injected. OBJECTGRAPH is a strict superset of Markdown - every .md file is a valid .og file - requires no infrastructure beyond a two-primitive query protocol, and is readable by both humans and agents without tooling. We formalize the Document Consumption Problem, characterise six structural properties no existing format satisfies simultaneously, and prove OBJECTGRAPH satisfies all six. We further introduce the Progressive Disclosure Model, the Role-Scoped Access Protocol, and Executable Assertion Nodes as native format primitives. Empirical evaluation across five document classes and eight agent task types demonstrates up to 95.3 percent token reduction with no statistically significant degradation in task accuracy (p > 0.05). Transpiler fidelity reaches 98.7 percent content preservation on a held-out document benchmark.
comment: 12 pages, 4 figures, 4 tables
Autonomous Traffic Signal Optimization Using Digital Twin and Agentic AI for Real-Time Decision-Making
This article outlines a new framework for traffic light optimization through a digital twin of the transport infrastructure, managed by agentic AI to ensure real-time autonomous decisions. The framework relies on physical sensors and edge computing to measure real-time traffic information and simulate traffic flow in a constantly updated digital twin. The traffic light is automatically controlled through the digital twin according to traffic congestion, travel delay and traffic patterns. This approach is implemented as a three-layer system: perception, conceptualization and action. The perception layer receives data on physical systems; the conceptualization layer uses LangChain to process the data; and the action layer links to the Model Context Protocol (MCP) and traffic management APIs to implement optimised traffic signal control algorithms. The results show that the framework minimizes waiting time at traffic lights and improves the effectiveness of the overall traffic flow, outperforming the fixed-time and reinforcement learning-based baselines.
comment: This paper is submitted to MECON2026 conference
RoadMapper: A Multi-Agent System for Roadmap Generation of Solving Complex Research Problems ACL 2026
People commonly leverage structured content to accelerate knowledge acquisition and research problem solving. Among these, roadmaps guide researchers through hierarchical subtasks to solve complex research problems step by step. Despite progress in structured content generation, the roadmap generation task has remained unexplored. To bridge this gap, we introduce RoadMap, a novel benchmark designed to evaluate the ability of large language models (LLMs) to construct high-quality roadmaps for solving complex research problems. Based on this, we identify three limitations of LLMs: (1) lack of professional knowledge, (2) unreasonable task decomposition, and (3) disordered logical relationships. To address these challenges, we propose RoadMapper, an LLM-based multi-agent system that decomposes the research roadmap generation task into three key stages (i.e., initial generation, knowledge augmentation, and iterative "critique-revise-evaluate"). Extensive experiments demonstrate that RoadMapper can improve LLMs' ability for roadmap generation, while enhancing average performance by more than 8% and saving 84% of the time required by human experts, highlighting its effectiveness and application potential.
comment: Accepted to Findings of ACL 2026
Continuous-time q-learning for mean-field control with common noise, part-II: q-learning algorithms
This paper is a continuation of Ren et al. (2026), aiming to further devise q-learning algorithms for mean-field control (MFC) with controlled common noise. Based on the relaxed control formulation, we first establish the martingale condition of the value function and the Iq-function by evaluating along the conditional state distributions generated by all test policies. As the data in the relaxed control formulation are not observable in practice, we quantify the error incurred when they are replaced by the observable ones in the exploratory formulation under discretely sampled actions. This, together with a two-layer fixed point characterization of an optimal policy in Ren et al. (2026), allows us to propose several algorithms including the Actor-Critic q-learning algorithm, in which the policy is updated in the Actor-step based on the iteration rule induced by the improved Iq-function, and the value function and Iq-function are updated in the Critic-step based on the martingale orthogonality condition using the data from the exploratory formulation. We also establish the convergence of the inner iterations in the Actor-step in an infinite-horizon linear quadratic (LQ) framework. In two examples, within and beyond the LQ framework, our q-learning algorithms are implemented with satisfactory performance.
comment: Keywords: Mean-field control, common noise, martingale characterization, optimal q-learning algorithm, Actor-Critic q-learning algorithm
Continuous-time q-learning for mean-field control with common noise, part-I: Theoretical foundations
This paper investigates the continuous-time counterpart of the Q-function for entropy-regularized mean-field control (MFC) with controlled common noise, coined the q-function by Jia and Zhou (2023) in the single-agent model. We first show that, under discretely sampled actions, the value function in the exploratory formulation converges to the one in the relaxed control formulation as the time grid refines. Leveraging the relaxed control formulation, we derive the exploratory Hamilton-Jacobi-Bellman (HJB) equation, in which the controlled common noise gives rise to an additional nonlinear functional of policy, rendering the policy iteration intricate. Under a certain concavity condition, we establish the existence and uniqueness of the optimal one-step policy iteration via a first-order condition using the partial linear functional derivative with respect to policy. The policy improvement at each iteration is verified by relating to an entropy-regularized optimization problem over the space of policies. In the mean-field setting, we introduce the integrated q-function (Iq-function) defined on the state distribution and the policy, and it is shown that an optimal policy is identified as a two-layer fixed point to the argmax operator of the Iq-function. Finally, we provide the explicit characterization of an optimal policy as a Gaussian distribution in the general linear-quadratic (LQ) setting.
comment: Keywords: Continuous-time reinforcement learning, mean-field control, common noise, policy improvement, integrated q-function, two-layer fixed point
High-Probability Convergence in Decentralized Stochastic Optimization with Gradient Tracking
We study high-probability (HP) convergence guarantees in decentralized stochastic optimization, where multiple agents collaborate to jointly train a model over a network. Existing HP results in decentralized settings almost exclusively focus on the Decentralized Stochastic Gradient Descent ($\mathtt{DSGD}$) algorithm, which requires strong assumptions, such as bounded data heterogeneity, or strong convexity of each agent's cost. This is contrary to the mean-squared error (MSE) results, where methods incorporating bias-correction techniques are known to converge under relaxed assumptions and achieve better practical performance. In this paper we provide the first step toward bridging the gap, by studying HP convergence of $\mathtt{DSGD}$ incorporating the gradient tracking technique, in the presence of noise satisfying a relaxed sub-Gaussian condition. We show that the resulting method, dubbed $\mathtt{GT-DSGD}$, achieves order-optimal HP convergence rates for both non-convex and Polyak-Łojasiewicz costs, of order $\mathcal{O}\Big(\frac{\log(1/δ)}{\sqrt{nT}}\Big)$ and $\mathcal{O}\Big(\frac{\log(1/δ)}{nT}\Big)$, respectively, where $n$ is the number of agents, $T$ is the time horizon and $δ\in (0,1)$ is the confidence parameter. Our results establish that $\mathtt{GT-DSGD}$ converges in the HP sense under the same conditions on the cost as in the MSE sense, while achieving comparable transient times. To the best of our knowledge, these are the first HP guarantees for decentralized optimization methods incorporating bias-correction. Numerical experiments on real and synthetic data verify our theoretical findings, underlining the superior performance of $\mathtt{GT-DSGD}$ and highlighting that the benefits of incorporating bias-correction are also maintained in the HP sense.
comment: 49 pages, 4 figures. arXiv admin note: text overlap with arXiv:2510.06141
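As a rough illustration of the gradient tracking technique named in this abstract, the sketch below runs one common form of the update on scalar quadratic costs $f_i(x) = \frac{1}{2}(x - b_i)^2$. The specific update variant, the ring mixing matrix in the usage note, the Gaussian gradient noise, the step size, and the names `mix` and `gt_dsgd` are all assumptions chosen for illustration; the paper's exact algorithm and noise model may differ.

```python
import random


def mix(W, v):
    """One round of neighbor averaging with the mixing matrix W."""
    n = len(v)
    return [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]


def gt_dsgd(targets, W, step=0.1, iters=300, noise_std=0.0, seed=0):
    """Gradient-tracking DSGD on f_i(x) = 0.5 * (x - targets[i]) ** 2.

    Each agent i keeps an iterate x[i] and a tracker y[i] that estimates the
    network-average gradient. Assumed update (one standard form):
        x <- W x - step * y
        y <- W y + g_new - g_old
    """
    rng = random.Random(seed)
    n = len(targets)

    def stochastic_grad(i, xi):
        # Exact local gradient plus zero-mean Gaussian noise (illustrative).
        return (xi - targets[i]) + rng.gauss(0.0, noise_std)

    x = [0.0] * n
    g = [stochastic_grad(i, x[i]) for i in range(n)]
    y = list(g)  # the tracker is initialized with the first local gradients
    for _ in range(iters):
        x = [xi - step * yi for xi, yi in zip(mix(W, x), y)]
        g_new = [stochastic_grad(i, x[i]) for i in range(n)]
        y = [yi + gn - go for yi, gn, go in zip(mix(W, y), g_new, g)]
        g = g_new
    return x
```

With a doubly stochastic ring matrix such as `W = [[0.5, 0.25, 0.0, 0.25], [0.25, 0.5, 0.25, 0.0], [0.0, 0.25, 0.5, 0.25], [0.25, 0.0, 0.25, 0.5]]` and noise-free gradients, every agent approaches the minimizer of the average cost, which is the mean of `targets`, even though the local minimizers differ. This is the sense in which the tracker corrects the bias caused by data heterogeneity.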
Causal Foundations of Collective Agency
A key challenge for the safety of advanced AI systems is the possibility that multiple simpler agents might inadvertently form a collective agent with capabilities and goals distinct from those of any individual. More generally, determining when a group of agents can be viewed as a unified collective agent is a foundational question in the study of interactions and incentives in both biological and artificial systems. We adopt a behavioral perspective in answering this question, ascribing collective agency to a group when viewing the group's joint actions as rational and goal-directed successfully predicts its behavior. We formalize this perspective on collective agency using causal games -- which are causal models of strategic, multi-agent interactions -- and causal abstraction -- which formalizes when a simple, high-level model faithfully captures a more complex, low-level model. We use this framework to solve a puzzle regarding multi-agent incentives in actor-critic models and to make quantitative assessments of the degree of collective agency exhibited by different voting mechanisms. Our framework aims to provide a foundation for theoretical and empirical work to understand, predict, and control emergent collective agents in multi-agent AI systems.
comment: CLeaR 2026
The $\textit{Silicon Society}$ Cookbook: Design Space of LLM-based Social Simulations
Studies attempting to simulate human behavior with $\textit{Silicon Societies}$ are growing in number, while LLM-only social networks have started appearing outside of controlled settings. However, the design space of these networks remains under-studied, which contributes to a gap in validating model realism. To enable future work to make more informed design decisions, we perform a systematic analysis of the consequences and interactions of key design choices in simulated social networks, including the choice of base model used to model individual agents, and how they are connected to each other. Using surveys as a proxy for agent opinions, our findings suggest that the geometry of the design space is non-trivial, with some parameters behaving in additive ways while others display more complex interactions. In particular, the choice of the base LLM is the most important variable impacting the simulation outcomes.
comment: 20 pages, 12 tables, under review at COLM 2026
Compliance-Aware Agentic Payments on Stablecoin Rails
Agentic payment systems extend delegated action to financial transfers, but scaling them on stablecoin rails in regulated settings requires safeguards that remain effective when humans are not continuously in the loop. We present a compliance-aware architecture that combines x402-style, signature-based payment authorisation and relayed execution with programmable compliance embedded as an on-chain guardrail via a policy wrapper and policy manager coordinating modular checks. By enforcing compliance at the point of execution, rather than as a separate off-chain workflow, the approach preserves low-friction settlement when conditions are satisfied, records transaction-linked on-chain attestations, and supports structured resolution when requirements are pending.
comment: Demo Paper Track
Learning to Aggregate Zero-Shot LLM Agents for Corporate Disclosure Classification
This paper studies whether a lightweight supervised aggregator can combine diverse zero-shot large language model outputs into a stronger downstream signal for corporate disclosure classification. Zero-shot LLMs can read disclosures without task-specific fine-tuning, but their predictions often vary across prompt perspectives, model families, and confidence levels. I examine this problem with a multi-prompt framework in which three fixed zero-shot LLM classifiers read each disclosure from different financial perspectives and output a sentiment label, a confidence score, and a short rationale. A logistic meta-classifier then aggregates these outputs to predict next-day stock return direction. To reduce pretrained-model contamination, I restrict evaluation to a post-release sample of 9,860 U.S. corporate disclosures issued by large publicly traded firms between January 2025 and March 2026, after the release of the frozen base LLMs used in the experiment. Results show that the trained aggregator outperforms single classifiers, majority vote, confidence-weighted voting, a zero-shot LLM judge, and a FinBERT baseline. Balanced accuracy rises from 0.566 for the best single classifier to 0.606 for the trained aggregator. The gain is largest in mixed-signal disclosures where classifiers disagree. The results suggest that zero-shot LLM outputs contain complementary financial signals, while also showing that the strongest gains come from supervised aggregation rather than from zero-shot voting alone.
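For readers unfamiliar with the voting baselines and the metric named in this abstract, a minimal sketch follows. It assumes labels coded as +1 (up) and -1 (down), confidences in [0, 1], and ties broken toward +1; these conventions and the function names are illustrative assumptions, not details from the paper. The trained logistic aggregator itself is not reproduced here.

```python
def majority_vote(labels):
    """Unweighted vote over +1/-1 labels (ties, if any, go to +1 by assumption)."""
    return 1 if sum(labels) >= 0 else -1


def confidence_weighted_vote(labels, confidences):
    """Each classifier's vote is scaled by its self-reported confidence."""
    score = sum(label * conf for label, conf in zip(labels, confidences))
    return 1 if score >= 0 else -1


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; assumes both classes occur in y_true."""
    recalls = []
    for cls in (1, -1):
        members = [i for i, y in enumerate(y_true) if y == cls]
        hits = sum(1 for i in members if y_pred[i] == cls)
        recalls.append(hits / len(members))
    return sum(recalls) / len(recalls)
```

The two voting rules can disagree when one confident classifier opposes two hesitant ones, for example labels `[1, 1, -1]` with confidences `[0.2, 0.2, 0.9]`. Balanced accuracy is a natural metric here because a predictor that always outputs one direction scores 0.5 regardless of class imbalance.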
Can Large Language Models Implement Agent-Based Models? An ODD-based Replication Study
Large language models (LLMs) can now synthesize non-trivial executable code from textual descriptions, raising an important question: can LLMs reliably implement agent-based models from standardized specifications in a way that supports replication, verification, and validation? We address this question by evaluating 17 contemporary LLMs on a controlled ODD-to-code translation task, using the PPHPC predator-prey model as a fully specified reference. Generated Python implementations are assessed through staged executability checks, model-independent statistical comparison against a validated NetLogo baseline, and quantitative measures of runtime efficiency and maintainability. Results show that behaviorally faithful implementations are achievable but not guaranteed, and that executability alone is insufficient for scientific use. GPT-4.1 consistently produces statistically valid and efficient implementations, with Claude 3.7 Sonnet performing well but less reliably. Overall, the findings clarify both the promise and current limitations of LLMs as model engineering tools, with implications for reproducible agent-based and ecological modeling.
comment: The peer-reviewed version of this paper is published in Ecological Modelling at https://doi.org/10.1016/j.ecolmodel.2026.111624. This version is typeset by the author and differs only in pagination and typographical detail
Chronology of Multi-Agent Interactions for Provenance of Evolving Information
Provenance is the chronological history of things, resonating with the fundamental pursuit to uncover origins, trace connections, and situate entities within the flow of space and time. As artificial intelligence advances towards autonomous agents capable of interactive collaboration on complex tasks, the provenance of generated content becomes entangled in the interplay of collective creation, where contributions are continuously revised, extended or overwritten. In a multi-agent generative chain, content undergoes successive transformations, often leaving little, if any, trace of prior contributions. In this study, we investigate the problem of tracking multi-agent provenance across the temporal dimension of generation. We propose a chronological system for post hoc attribution of generative history from content alone, without reliance on internal memory states or external meta-information. At its core lies the notion of symbolic chronicles, representing signed and time-stamped records, in a form analogous to the chain of custody in forensic science. The system operates through a feedback loop, whereby each generative timestep updates the chronicle of prior interactions and synchronises it with the synthetic content in the very act of generation. This research seeks to develop an accountable form of collaborative artificial intelligence within evolving cyber ecosystems.
Progressive Multi-Agent Reasoning for Biological Perturbation Prediction
Predicting gene regulation responses to biological perturbations requires reasoning about underlying biological causalities. While large language models (LLMs) show promise for such tasks, they are often overwhelmed by the entangled nature of high-dimensional perturbation results. Moreover, recent works have primarily focused on genetic perturbations in single-cell experiments, leaving bulk-cell chemical perturbations, which are central to drug discovery, largely unexplored. Motivated by this, we present LINCSQA, a novel benchmark for predicting target gene regulation under complex chemical perturbations in bulk-cell environments. We further propose PBio-Agent, a multi-agent framework that integrates difficulty-aware task sequencing with iterative knowledge refinement. Our key insight is that genes affected by the same perturbation share causal structure, allowing confidently predicted genes to contextualize more challenging cases. The framework employs specialized agents enriched with biological knowledge graphs, while a synthesis agent integrates outputs and specialized judges ensure logical coherence. PBio-Agent outperforms existing baselines on both LINCSQA and PerturbQA, enabling even smaller models to predict and explain complex biological processes without additional training.
comment: 17 pages, 4 figures, 9 tables
AblateCell: A Reproduce-then-Ablate Agent for Virtual Cell Repositories
Systematic ablations are essential to attribute performance gains in AI Virtual Cells, yet they are rarely performed because biological repositories are under-standardized and tightly coupled to domain-specific data and formats. While recent coding agents can translate ideas into implementations, they typically stop at producing code and lack a verifier that can reproduce strong baselines and rigorously test which components truly matter. We introduce AblateCell, a reproduce-then-ablate agent for virtual cell repositories that closes this verification gap. AblateCell first reproduces reported baselines end-to-end by auto-configuring environments, resolving dependency and data issues, and rerunning official evaluations while emitting verifiable artifacts. It then conducts closed-loop ablation by generating a graph of isolated repository mutations and adaptively selecting experiments under a reward that trades off performance impact and execution cost. Evaluated on three single-cell perturbation prediction repositories (CPA, GEARS, BioLORD), AblateCell achieves 88.9% (+29.9% to human expert) end-to-end workflow success and 93.3% (+53.3% to heuristic) accuracy in recovering ground-truth critical components. These results enable scalable, repository-grounded verification and attribution directly on biological codebases.
comment: 25 pages, 5 figures
R3DM: Enabling Role Discovery and Diversity Through Dynamics Models in Multi-agent Reinforcement Learning ICML 2025
Multi-agent reinforcement learning (MARL) has achieved significant progress in large-scale traffic control, autonomous vehicles, and robotics. Drawing inspiration from biological systems where roles naturally emerge to enable coordination, role-based MARL methods have been proposed to enhance cooperation learning for complex tasks. However, existing methods exclusively derive roles from an agent's past experience during training, neglecting their influence on its future trajectories. This paper introduces a key insight: an agent's role should shape its future behavior to enable effective coordination. Hence, we propose Role Discovery and Diversity through Dynamics Models (R3DM), a novel role-based MARL framework that learns emergent roles by maximizing the mutual information between agents' roles, observed trajectories, and expected future behaviors. R3DM optimizes the proposed objective through contrastive learning on past trajectories to first derive intermediate roles that shape intrinsic rewards to promote diversity in future behaviors across different roles through a learned dynamics model. Benchmarking on SMAC and SMACv2 environments demonstrates that R3DM outperforms state-of-the-art MARL approaches, improving multi-agent coordination to increase win rates by up to 20%. The code is available at https://github.com/UTAustin-SwarmLab/R3DM.
comment: 21 pages. To appear in the International Conference on Machine Learning (ICML 2025)
LLM-Based Agentic Negotiation for 6G: Addressing Uncertainty Neglect and Tail-Event Risk
A critical barrier to the trustworthiness of sixth-generation (6G) agentic autonomous networks is the uncertainty neglect bias, a cognitive tendency for large language model (LLM)-powered agents to make high-stakes decisions based on simple averages while ignoring the tail risk of extreme events. This paper proposes an unbiased, risk-aware framework for agentic negotiation, designed to ensure robust resource allocation in 6G network slicing. Specifically, agents leverage Digital Twins (DTs) to predict full latency distributions, which are then evaluated using a formal framework from extreme value theory, namely, Conditional Value-at-Risk (CVaR). This approach fundamentally shifts the agent's objective from reasoning over the mean to reasoning over the tail, thereby building a statistically grounded buffer against worst-case outcomes. Furthermore, our framework ensures full uncertainty awareness by requiring agents to quantify epistemic uncertainty -- confidence in their own DTs' predictions -- and propagate this meta-verification to make robust decisions, preventing them from acting on unreliable data. We validate this framework in a 6G inter-slice negotiation use-case between an eMBB and a URLLC agent across 200 trials. The results demonstrate the profound failure of the biased, mean-based baseline, which systematically violates the strict URLLC SLA 11 times. Our unbiased, CVaR-aware agent successfully mitigates this bias, eliminating SLA violations entirely and significantly reducing the 99.999th-percentile latencies by up to 51.7%. We show this reliability comes at the rational and quantifiable cost of reduced energy savings, exposing the false economy of the biased approach. Crucially, executing our framework with an otel-llm-1b-it model on a single NVIDIA RTX A4000 GPU achieves sub-1.5-second inference times, validating the feasibility for non-real-time RIC use-cases.
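To illustrate the mean-versus-tail distinction this abstract draws, the following is a minimal sketch of an empirical CVaR estimate over latency samples. It is not the authors' implementation: the function name `cvar`, the `tail_fraction` parameter, and the synthetic latency values are all assumptions made for illustration.

```python
import math


def cvar(samples, tail_fraction=0.05):
    """Empirical CVaR: the mean of the worst `tail_fraction` of samples.

    Larger values are treated as worse (e.g. latency in milliseconds).
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0.0 < tail_fraction <= 1.0:
        raise ValueError("tail_fraction must lie in (0, 1]")
    ordered = sorted(samples)
    # Number of samples in the tail; the small epsilon guards against
    # floating-point noise pushing an exact count up by one.
    k = max(1, math.ceil(tail_fraction * len(ordered) - 1e-9))
    tail = ordered[-k:]
    return sum(tail) / len(tail)


# Hypothetical latency distribution: mostly fast, with a rare slow tail.
latencies_ms = [1.0] * 95 + [20.0] * 5

mean_latency = sum(latencies_ms) / len(latencies_ms)
tail_latency = cvar(latencies_ms, tail_fraction=0.05)

print(f"mean = {mean_latency:.2f} ms, CVaR(5% tail) = {tail_latency:.2f} ms")
```

In this toy distribution the mean stays close to the typical latency, while the tail average exposes the slow events that a mean-based decision rule would overlook.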
Systems and Control (EESS)
Hierarchical Control for Continuous-time Systems via General Approximate Alternating Simulation Relations
This paper introduces a general approximate alternating simulation relation ($\varepsilon$-gAAS relation) for continuous-time systems, which relaxes existing simulation relations to tolerate larger mismatches between abstract and concrete models. The definition of gAAS for continuous-time systems is first proposed, and its properties are investigated. Then, a control refinement method is developed to enable hierarchical control for the gAAS relation. Finally, case studies demonstrate the effectiveness of the proposed approach, highlighting its advantages over existing methods.
Intelligent Self-tuning Active EMI Filtering for Electrified Automotive Power Systems Using Reinforcement Learning
The rapid electrification and intelligence of modern transportation systems place stringent demands on the electromagnetic compatibility, reliability, and adaptability of automotive power electronics. In electric and autonomous vehicles, electromagnetic interference (EMI) generated by high-frequency switching power converters can compromise safety-critical functions, in-vehicle communications, and system efficiency under dynamic operating conditions. Conventional passive EMI filters, while robust, are often oversized and lack adaptability, leading to increased weight, volume, and energy losses. This paper proposes an intelligent self-tuning active EMI filtering approach for electrified automotive power systems based on reinforcement learning (RL). The EMI mitigation problem is formulated as a Markov decision process, enabling an RL agent to continuously adapt filter parameters in response to time-varying interference characteristics. To improve robustness and generalisation under complex and non-stationary conditions, a variational autoencoder is employed for compact state representation, while a noise-based exploration mechanism enhances learning efficiency and prevents suboptimal convergence. The proposed method is evaluated using experimentally measured EMI spectra from an automotive electric drive unit within a MATLAB/Simulink co-simulation framework. Results demonstrate consistent EMI attenuation improvements of 25-30 dB across a wide frequency range compared with conventional control strategies and passive filtering solutions. By reducing reliance on oversized passive components and enabling adaptive EMI suppression, the proposed framework supports lightweight, energy-efficient, and reliable power-electronic systems for intelligent and green transportation applications.
A MEC-Based Optimization Framework for Dynamic Inductive Charging
Range anxiety and long recharging times remain critical barriers to electric vehicle adoption. Dynamic Inductive Charging (DIC) offers a compelling solution by enabling wireless power transfer while driving, potentially reducing battery size requirements and thus vehicle costs. However, DIC infrastructures are expensive and power-constrained, requiring intelligent resource allocation to maximize user satisfaction and economic viability. We propose a Model Predictive Control framework for optimal power allocation in DIC systems, using edge computing and vehicular communications to prioritize vehicles with critical battery states. The framework is implemented and evaluated through SUMO-based simulations on a realistic 10 km urban scenario in Istanbul, Turkey, under varying traffic intensities. Results demonstrate two critical limitations of uncoordinated allocation. First, resource utilization remains suboptimal despite available power when demand saturates system capacity. Second, when demand exceeds capacity, uniform distribution of power leaves a heavy tail of critically unsatisfied vehicles that may require emergency stops. Our MPC-based strategy addresses both regimes -- maximizing power utilization during saturation through dynamic stripe rebalancing, and improving satisfaction fairness under scarcity by aggressively prioritizing depleted batteries at the expense of well-charged vehicles. The framework and simulation tools are released as open-source to support further research in this emerging domain.
comment: Accepted for publication at IEEE Vehicular Networking Conference (VNC) 2026, Montreal, Canada, June 2026
Flying by Inference: Active Inference World Models for Adaptive UAV Swarms
This paper presents an expert-guided active-inference-inspired framework for adaptive UAV swarm trajectory planning. The proposed method converts multi-UAV trajectory design from a repeated combinatorial optimization problem into a hierarchical probabilistic inference problem. In the offline phase, a genetic-algorithm planner with repulsive-force collision avoidance (GA--RF) generates expert demonstrations, which are abstracted into Mission, Route, and Motion dictionaries. These dictionaries are used to learn a probabilistic world model that captures how expert mission allocations induce route orders and how route orders induce motion-level behaviors. During online operation, the UAV swarm evaluates candidate actions by forming posterior beliefs over symbolic states and minimizing KL-divergence-based abnormality indicators with respect to expert-derived reference distributions. This enables mission allocation, route insertion, motion adaptation, and collision-aware replanning without rerunning the offline optimizer. Bayesian state estimators, including EKF and PF modules, are integrated at the motion level to improve trajectory correction under uncertainty. Simulation results show that the proposed framework preserves expert-like planning structure while producing smoother and more stable behavior than modified Q-learning. Additional validation using real-flight UAV trajectory data demonstrates that the learned world model can correct symbolic predictions under noisy and non-smooth observations, supporting its applicability to adaptive UAV swarm autonomy.
comment: Submitted to IEEE journal
Data-Driven Continuous-Time Linear Quadratic Regulator via Closed-Loop and Reinforcement Learning Parameterizations
This paper studies data-driven approaches to the continuous-time linear quadratic regulator (LQR) problem based on two existing parameterizations, namely a closed-loop (CL) parameterization from behavioral system theory and an integral reinforcement learning (IRL) parameterization. The CL parameterization characterizes the closed-loop system via a matrix that satisfies equality constraints. While this parameterization has been extensively studied for discrete-time systems, we adapt key results to the continuous-time setting and develop a policy iteration (PI) scheme, derive a data-driven continuous-time algebraic Riccati equation (CARE), and introduce an alternative convex problem formulation. The IRL parameterization utilizes off-policy data to perform policy evaluation, which is then used for PI or value iteration. Within the IRL framework, we derive a policy gradient flow and propose convex reformulations of the LQR problem. Finally, we provide a unified treatment of these parameterizations that enables a systematic understanding of existing approaches and clarifies their structural relationships.
comment: Submitted to IEEE TAC
On the Nesterov's acceleration: A NAIM perspective
We present a unifying Nearly Asymptotically Invariant Manifold (NAIM) framework for understanding Nesterov's Accelerated Gradient (NAG) method. By lifting the first-order gradient flow into a second-order phase space, we construct a NAIM (a slow, attracting graph) and show that acceleration emerges from a curvature-aware perturbation of this graph. The evolving slope of the perturbed manifold is governed by a Differential Riccati Equation (DRE), which enforces strict tangency of the vector field to the manifold surface. In the quadratic case, the DRE reduces to an Algebraic Riccati Equation (ARE), and the requirement of spectral resonance (equal contraction rates across all curvature modes) uniquely determines the damping coefficient, directly yielding the continuous-time Nesterov ODE. Fenichel's theorem then extends this picture rigorously to general smooth, strongly convex landscapes: normal hyperbolicity guarantees persistence of the accelerated manifold despite varying Hessian curvature. The method is further extended to a unified geometric derivation of NAG methods for smooth convex and strongly convex optimization in the discrete case. We exploit the underlying geometric structure and derive both cases from the same principle of preserving the projective structure under discretization. A Lie-Trotter splitting separates the linear dissipative dynamics from the nonlinear gradient flow. The dissipative subsystem is integrated by the Cayley (bilinear) transform, which preserves the underlying projective (Möbius) structure unconditionally and produces the classical Nesterov momentum coefficient as the unique Padé multiplier. For the convex case, projective flatness (vanishing Schwarzian derivative) uniquely selects the time-varying damping, recovering the canonical Nesterov ODE for convex functions.
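For readers unfamiliar with the discrete method this abstract analyzes, here is a minimal sketch of textbook NAG on a strongly convex quadratic, compared with plain gradient descent. It assumes the standard constant momentum $(\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)$ with step size $1/L$; whether this coincides with the coefficient derived in the paper is not stated in the abstract. The test problem and all function names are illustrative choices.

```python
import math


def grad(x, curvatures):
    """Gradient of f(x) = 0.5 * sum_i curvatures[i] * x[i]**2."""
    return [c * xi for c, xi in zip(curvatures, x)]


def gradient_descent(x0, curvatures, steps):
    """Plain gradient descent with step size 1/L."""
    big_l = max(curvatures)
    x = list(x0)
    for _ in range(steps):
        x = [xi - gi / big_l for xi, gi in zip(x, grad(x, curvatures))]
    return x


def nesterov(x0, curvatures, steps):
    """Textbook NAG for strongly convex f with constant momentum."""
    big_l, mu = max(curvatures), min(curvatures)
    root_kappa = math.sqrt(big_l / mu)
    beta = (root_kappa - 1.0) / (root_kappa + 1.0)
    x, y = list(x0), list(x0)
    for _ in range(steps):
        # Gradient step taken from the extrapolated point y.
        x_next = [yi - gi / big_l for yi, gi in zip(y, grad(y, curvatures))]
        # Momentum extrapolation using the last two iterates.
        y = [xn + beta * (xn - xi) for xn, xi in zip(x_next, x)]
        x = x_next
    return x


def norm(x):
    return math.sqrt(sum(v * v for v in x))


# Ill-conditioned quadratic (condition number 100); the minimizer is the origin.
curvatures = [1.0, 100.0]
x0 = [1.0, 1.0]

gd_err = norm(gradient_descent(x0, curvatures, steps=100))
nag_err = norm(nesterov(x0, curvatures, steps=100))
print(f"distance to optimum after 100 steps: GD = {gd_err:.3e}, NAG = {nag_err:.3e}")
```

On this problem gradient descent contracts the slow mode by a factor of 0.99 per step, whereas the momentum recursion contracts it at roughly 0.9 per step, which is the acceleration the abstract sets out to explain geometrically.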
Optimal Functional Incentives for Control: The Linear-Quadratic Case with Bilinear Incentives
We study the design of functional incentive mechanisms for dynamical systems, in which a leader designs a fixed incentive function to motivate a self-interested follower to actuate the system beneficially over an extended horizon, without real-time revision of the incentive. This stands in contrast to the adaptive paradigm, in which the incentive is itself a continuously updated control variable. We formalize the problem as a discrete-time bi-level optimal control problem and derive analytical results for the linear-quadratic case with bilinear incentives and a myopic follower. Specifically, we establish a necessary and sufficient stability condition for the induced closed-loop system, derive a closed-form expression for the gradient of the expected leader cost with respect to the incentive parameter matrix, and obtain a fully closed-form cost expression in the scalar setting. Based on the latter, explicit characterizations of the optimal incentive parameter are provided in two asymptotic regimes: the infinite-horizon limit and the limit of high follower cost. For long horizons, the optimal incentive is shown to become independent of the follower's private cost parameter, with direct implications for robust mechanism design under private information.
comment: Submitted to IEEE CDC 2026
Robust Geometric Control of Catenary Robots under Unstructured Force Uncertainties
This paper considers the robust control of a catenary robot composed of two quadrotors connected by an inextensible cable. The system is modeled on \(SE(3)\), with the cable treated as a geometric subsystem induced by the UAV configuration rather than as an independent dynamical element. The catenary shape determines configuration-dependent forces that couple the translational dynamics of the vehicles. We propose a geometric tracking controller for the relative configuration of the agents and analyze its robustness with respect to unstructured uncertainties in the catenary-induced forces. The main theoretical result establishes local input-to-state stability of the closed-loop tracking errors. In particular, we obtain asymptotic convergence in the nominal case and an explicit ultimate bound for the tracking errors under bounded catenary-force perturbations.
comment: 6 pages, conference
LZn : Robust LoRa Frame Synchronization Under Frame Collisions and Ultra-Low SNR Conditions
LoRa has become a widely adopted wireless modulation scheme in LPWANs due to its low cost, long range, and minimal transmission power. However, collisions between frames of the same spreading factor -- common in dense LoRa deployments -- prevent conventional LoRa receivers from detecting and correctly decoding frames. Recent work has introduced methods to improve recovery, yet their detection stage degrades sharply under low signal-to-noise ratio (SNR) and high collision rates. In this work, we introduce LZn, a low-complexity synchronization scheme driven by a spectral intersection operation. Our method enables robust frame synchronization even under multiple packet overlaps or extremely low SNR conditions. We evaluate LZn on simulations and three independent, real-world LoRa datasets. LZn improves detection sensitivity by up to 10 dB and increases detection probability by up to 1.54x. In real-world datasets, LZn improves decoding by 3.46x in the most challenging single-user scenario and up to 1.22x in collision scenarios compared to the second-best collision-tolerant scheme (TnB). These results demonstrate that LZn substantially improves the frame recovery of LoRa receivers, while remaining compatible with real-time requirements.
comment: 16 pages, 2 tables, 13 figures
Fairness for distribution network operations and planning
The incorporation of fairness into distribution network (DN) planning and operation has become a key goal of recent studies. The cost of implementing fairness, termed the price of fairness (PoF), covers the efficiency that is renounced for attaining social cohesion through fair outcomes. Locational disparity makes fairness schemes emerge to level the consumers' playing field. However, fairness encompasses a range of notions. From egalitarian to merit-based criteria, various metrics are implemented as a tool for measuring equitable utility distribution. These have different mathematical complexities, from linear to non-linear programming cases, which affect their overall applicability. Hence, this study compiles the overarching fairness notions and metrics, reviewing how these affect stakeholders and the inherent mathematical optimisation in resource allocation problems. The aim is to support consistent and transparent planning and decision-making within DN operations.
comment: 16 pages, 0 figures, 2 tables, CIRED Conference Workshop Brussels 2026
Multi-Connectivity for UAVs: A Measurement Study of Integrating Cellular, Aerial Mesh, and LEO Satellite Links
Future uncrewed aerial vehicle (UAV) systems increasingly combine heterogeneous communication technologies, such as low-latency aerial mesh, terrestrial cellular, and satellite links, to improve robustness and coverage. Multipath transport is a natural mechanism for aggregating these links, yet its ability to support real-time UAV services in highly heterogeneous environments remains insufficiently characterized. We present a measurement-driven study based on UAV flight experiments in an integrated network comprising UAV-to-UAV aerial mesh, private cellular, and low Earth orbit (LEO) satellite connectivity. Using Multipath TCP (MPTCP) as a representative lossless, in-order multipath transport framework, we find that aggregation can preserve end-to-end connectivity under severe link outages. However, large round-trip time (RTT) heterogeneity amplifies packet reordering, leading to substantial receiver-side buffering and bursty delivery. In addition, when the available links do not provide sufficient capacity for the offered load, pronounced sender-side buffering emerges. These effects cause real-time streaming to violate delay constraints, including cases where aggregate capacity is sufficient. To interpret these results, we formalize the distinction between connectivity continuity and service continuity and show empirically that maintaining connectivity is necessary but not sufficient for timely real-time delivery in multi-technology UAV networks. The findings motivate multipath designs that explicitly account for delay constraints, rather than optimizing for connectivity alone.
comment: Accepted in IEEE EuCNC
Robust Constrained Optimization via Sliding Mode Control
This paper develops a sliding mode control-based framework for equality-constrained optimization by reformulating the first-order Karush-Kuhn-Tucker conditions as a control-affine dynamical system. The optimization variables are treated as states and the Lagrange multipliers as control inputs, with the equality constraints defining the sliding manifold. The resulting design guarantees exact constraint enforcement with finite-time convergence, independent of objective convexity, and exhibits robustness to matched disturbances, structural uncertainty, and bounded measurement noise. To accelerate convergence, a nonsingular terminal sliding mode-based normed gradient flow is introduced, ensuring both finite-time convergence to the optimal solution and constraint satisfaction. Rigorous Lyapunov analysis establishes closed-loop stability and convergence. Numerical studies across diverse benchmark problems demonstrate superior accuracy and robustness over classical continuous-time optimization methods, highlighting effectiveness under disturbance.
comment: 9 pages and 5 figures. Previously submitted to Automatica (2025); under review at IFAC Journal of Systems and Control (early 2026)
Stability Analysis and Data-Driven State Estimation for Generalized Persidskii Systems with Time Delays: Theory and Experimental Validation on PMSM Drives
This paper addresses the stability analysis and state estimation of generalized Persidskii systems subject to time-varying delays and external disturbances. The generalized Persidskii class, which couples linear dynamics with sector-bounded nonlinear feedback loops, offers a tractable yet expressive framework for modeling electromechanical and neural network systems. We develop delay-dependent conditions for input-to-state stability (ISS) via Lyapunov--Krasovskii functionals incorporating Persidskii-type integral terms, and cast these conditions as linear matrix inequalities (LMIs). A structured robust observer is proposed for systems with partial state measurement, and its convergence is guaranteed through an $H_\infty$ synchronization criterion. To handle plant uncertainty, the system matrices are identified from trajectory data using a stability-preserving Koopman lifting procedure, in which the ISS-LMI constraint is embedded as a convex side condition during parameter regression. The identified model populates the prediction horizon of an ICODE-MPPI (Input-dependent Control-oriented Dynamical Estimation -- Model Predictive Path Integral) controller. The complete framework is validated on a 1.5 kW Permanent Magnet Synchronous Motor (PMSM) drive equipped with a programmable load brake. Experimental results confirm a 35% reduction in velocity estimation RMSE relative to an Extended Kalman Filter and a 67% improvement in speed-tracking accuracy relative to standard Field-Oriented Control, corroborating the theoretical ISS bounds established herein.
Toward Scalable SDN for LEO Mega-Constellations: A Graph Learning Approach
Terrestrial network limitations drive the integration of non-terrestrial networks (NTNs), notably mega-constellations comprising thousands of low Earth orbit (LEO) satellites. While these satellites act as interconnected network switches via inter-satellite links (ISLs), their massive scale creates severe bottlenecks for network management. To address this, we propose a scalable, hierarchical software-defined networking (SDN) framework. Our architecture leverages graph neural networks (GNNs) to compactly represent the constellation topology, and Koopman theory to linearize nonlinear dynamics. Specifically, a Graph Koopman Autoencoder (GKAE) forecasts spatio-temporal behavior within a linear subspace for each orbital shell. A central SDN controller then aggregates these shell-level predictions for globally coordinated control. Simulations on the Starlink constellation demonstrate that our approach achieves at least a 42.8% improvement in spatial compression and a 10.81% improvement in temporal forecasting compared to established baselines, all while utilizing a significantly smaller model footprint.
Solution Sets for Inverse Infinite-Horizon Linear-Quadratic Descriptor Differential Games
In this letter, we study a model-based inverse problem for infinite-horizon linear-quadratic differential games with descriptor dynamics. Specifically, we seek to identify the set of all cost functions that rationalize an observed feedback strategy profile of the players as a feedback Nash equilibrium, referred to here as the solution set. We characterize the solution set, show that it is rectangular and convex, and provide an algorithm to compute an admissible realization. Finally, we illustrate our results with numerical examples.
Feedback Linearization of Hyperbolic PDEs with Volterra Nonlinearities
Alberto Isidori's framework of geometric nonlinear control, and particularly of feedback linearization, is the inspiration behind PDE backstepping: apply a transformation of the state to cast the plant into a canonical form, bring all the non-canonical effects within the "span" of (boundary) control, and close the design with a feedback that makes the closed loop evolve in accordance with well-studied stable dynamics. The specificity of this approach is that, for PDEs, there is not one canonical form (like Brunovsky for ODEs) but the canonical forms are PDE-class-specific. When conducting this process for nonlinear PDEs, where the "transformation of the state" is performed using a nonlinear Volterra series indexed by the spatial variable, enormous technical challenges arise. One has to deal with kernels governed by PDEs on simplex domains growing in dimension to infinity, capture the growth rates of these kernels of the "direct transformation," and conduct the same for the "inverse transformation" without directly studying its Volterra kernels. So far, this agenda has been executed only once, two decades ago: for parabolic PDEs by Vazquez and Krstic [Automatica, 2008]. Generalization attempts have not followed because of the immense complexity involved in feedback-linearizing nonlinear PDEs. In this paper, dedicated to Professor Isidori, we convert the PDE feedback-linearizing methodology of 2008 from the parabolic to a hyperbolic class and, for a transport-adapted subclass of Chen-Fliess series, construct controllers without kernel PDEs.
An Experimental Modular Instrument With a Haptic Feedback Framework for Robotic Surgery Training
Robotic-assisted surgery offers significant clinical advantages but largely eliminates direct haptic feedback, increasing the risk of excessive tool-tissue interaction forces. Although recent commercial systems have begun to introduce force feedback, their high cost limits accessibility, particularly for surgical training. This paper presents a modular experimental robotic laparoscopic instrument integrated with a real-time haptic feedback framework. The proposed instrument employs a wrist-mounted force/torque (F/T) sensor to estimate tool-tissue interaction forces while avoiding the durability and integration challenges of tip-mounted sensors. A haptic feedback framework is developed to extract the external contact forces, render them to the haptic device, and generate stable and perceptually meaningful feedback. The instrument is integrated into the robotic surgery training system (RoboScope) and evaluated through a controlled user study involving a force regulation task. Experimental results demonstrate that haptic feedback significantly improves task success rate, force regulation accuracy, and task efficiency compared to visual-only feedback. The proposed instrument enables stable, high-fidelity haptic interaction, supporting effective robotic surgery training.
comment: Accepted to the 11th IEEE RAS/EMBS International Conference on Biomedical Robotics and Biomechatronics (BioRob 2026)
Over-Approximating Minimizer Sets of Constrained Convex Programs with Parametric Uncertainty via Reachability Analysis
We study the set of solutions to a parameterized, strongly convex optimization problem whose cost depends on uncertain, bounded parameters. We compute a certified outer approximation of the corresponding set of optimizers, using convergence properties of the projected gradient descent (PGD) algorithm for convex programs. Concretely, by treating the cost parameter as constant but unknown, we interpret the PGD iterates as an uncertain dynamical system and analyze its forward reachable sets. Since PGD converges exponentially to the unique optimizer for each fixed parameter, these reachable sets provide outer approximations of the optimizer set, with an explicit error bound that decays exponentially with the iteration count. We apply system-level synthesis (SLS) on the PGD dynamics to optimize the step-size sequence and obtain reachable-set over-approximations. Our method outperforms existing baselines in over-approximating, with low conservativeness, the minimizer sets of convex programs with uncertain costs and high-dimensional decision variables.
comment: 8 pages, 3 figures
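The reachability view in the abstract above can be illustrated with a minimal sketch. This is not the authors' SLS-based method: it only propagates axis-aligned interval bounds through PGD iterates for a box-constrained quadratic cost 0.5*x'Qx + c'x whose linear term c is uncertain. The problem data, the step size, and the function names are illustrative assumptions. Because each minimizer is a fixed point of the PGD map for its own c, starting the propagation from the whole feasible box keeps every minimizer inside the bounds at every step.

```python
def pgd_interval_step(Q, c_lo, c_hi, x_lo, x_hi, box_lo, box_hi, alpha):
    """Apply one PGD step x+ = clip((I - alpha*Q) x - alpha*c) to interval bounds."""
    n = len(x_lo)
    new_lo, new_hi = [], []
    for i in range(n):
        # The uncertain cost term enters with a negative sign.
        lo = -alpha * c_hi[i]
        hi = -alpha * c_lo[i]
        for j in range(n):
            m = (1.0 if i == j else 0.0) - alpha * Q[i][j]
            # Pick the interval end that minimizes / maximizes each product.
            if m >= 0.0:
                lo += m * x_lo[j]
                hi += m * x_hi[j]
            else:
                lo += m * x_hi[j]
                hi += m * x_lo[j]
        # Projection onto a box is monotone, so clipping both ends is sound.
        new_lo.append(min(max(lo, box_lo[i]), box_hi[i]))
        new_hi.append(min(max(hi, box_lo[i]), box_hi[i]))
    return new_lo, new_hi


def minimizer_bounds(Q, c_lo, c_hi, box_lo, box_hi, alpha, steps):
    """Start from the full feasible box and iterate the interval PGD map."""
    x_lo, x_hi = list(box_lo), list(box_hi)
    for _ in range(steps):
        x_lo, x_hi = pgd_interval_step(
            Q, c_lo, c_hi, x_lo, x_hi, box_lo, box_hi, alpha
        )
    return x_lo, x_hi


# Hypothetical example data (not from the paper).
Q = [[2.0, 0.5], [0.5, 1.0]]
c_lo, c_hi = [-1.2, 0.3], [-0.8, 0.5]
box_lo, box_hi = [-1.0, -1.0], [1.0, 1.0]
lo, hi = minimizer_bounds(Q, c_lo, c_hi, box_lo, box_hi, alpha=0.4, steps=50)
print(lo, hi)
```

Plain interval arithmetic of this kind is typically more conservative than the optimized over-approximations the abstract describes; it is shown only to make the "PGD iterates as an uncertain dynamical system" idea tangible.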
Boundedness of solutions in feedback systems with antithetic controllers
This paper studies whether solutions of a class of nonlinear feedback systems remain bounded over time. The systems we consider arise naturally in synthetic biology, where the antithetic feedback controller regulates a biological process through a delayed feedback loop. Our main result is that every trajectory of such a system is bounded. The key insight is simple: if the regulated state grows too large for too long, the feedback loop will eventually respond and push it back down. More precisely, we show that whenever the state exceeds a threshold and remains there long enough, the feedback signal becomes strong enough to force the state to decrease. We then show that once this happens, the feedback remains strong enough to keep the state from growing unbounded. The proof works directly with differential inequalities and does not require constructing a Lyapunov function, making the mechanism transparent and easy to interpret. The boundedness result can be understood as a time-domain small-gain effect, where the delayed feedback ultimately counteracts any persistent growth in the system.
comment: This version will be extended for more general systems
Joint Accuracy and Confidentiality in Semantic-Aware Secure Remote Reconstruction
In this paper, we consider remote reconstruction over wireless networks when simultaneous accuracy at the legitimate receiver and confidentiality against eavesdropping are required. These two objectives are often treated separately, even though they arise from the same update process and are marginals of a joint reconstruction event. This paper introduces confidential reconstruction accuracy (CRA), a metric to capture the joint event in which the legitimate receiver reconstructs correctly while the eavesdropper fails. Under randomized stationary policies, we develop a three-dimensional stationary analysis and derive closed-form expressions for the long-term average CRA and the optimal transmission probability. The results show that conventional marginal analysis can misidentify the optimal policy and misestimate the achievable simultaneous accuracy-confidentiality performance. They also reveal nontrivial behaviors: more frequent transmissions or better legitimate channels do not necessarily improve joint accurate and confidential reconstruction, and when the eavesdropping channel is strong, improving the legitimate channel alone may be insufficient. Finally, the framework induces the spatial safety boundary in a geofencing setting for secure remote reconstruction.
An Annual Quasi-Static Time-Series Simulation Framework for Enhanced Transmission System Expansion Planning
The increasing integration of distributed energy resources (DERs), variable renewable energy sources, and emerging technologies presents new challenges for transmission system expansion planning (TSEP). Traditional snapshot-based and deterministic approaches are inadequate for capturing the temporal dynamics and operational constraints of modern power systems. This paper introduces an annual quasi-static time-series simulation (AQSTSS) framework that enables high-resolution, year-round modeling of transmission systems, incorporating detailed equipment behavior, control strategies, and DER interactions. By simulating system performance across all seasons and operating conditions, AQSTSS uncovers flexibility opportunities and operational constraints that static methods overlook. Applied to Hydro-Québec's projected 2035/2036 grid, the framework reveals critical insights under high wind and electric vehicle penetration. It also integrates an energy storage control strategy designed to mitigate wind variability and support grid reliability. Furthermore, AQSTSS facilitates the assessment of system resilience under diverse scenarios, including extreme weather and load variability. The simulation results underscore the importance of aligning planning with operational realities to ensure secure, efficient, and future-ready grid development. Overall, the proposed framework enhances the robustness of TSEP by bridging the gap between long-term planning and real-time operational needs.
Distributed Observer Design for Discrete-Time LTI Systems via Jordan Canonical Form
This paper addresses the problem of distributed state estimation for discrete-time linear time-invariant systems. Building on the framework proposed in Gao & Yang (2025), we exploit the Jordan canonical form of the system matrix to develop two distributed estimation schemes that ensure asymptotic convergence of local estimates to the true system state. In both approaches, each node reconstructs the components of the state that are locally detectable for it via a Luenberger observer, while employing a consensus-based mechanism to estimate the components that are not directly detectable. The first scheme relies on local observers whose dimension matches that of the original state vector; however, its applicability requires the satisfaction of a large set of inequalities. The second scheme, in contrast, can be implemented under less restrictive conditions, but results in observers of increased (augmented) order. For both methods, we derive necessary and sufficient conditions - expressed in terms of the eigenvalues of the system matrix and certain submatrices of the communication network Laplacian - that guarantee the existence of a distributed observer achieving asymptotically accurate estimation. Compared to Gao & Yang (2025), the proposed approaches offer greater flexibility in the selection of coupling gains and impose less stringent solvability conditions.
comment: This manuscript is an extended version of the paper accepted for presentation at ECC 2026 and is currently under review for potential publication in Automatica
Moral Hazard in LTI Dynamics: A Hypothesis Testing Approach
Many incentive design problems must contend with information asymmetries due to non-observation of efficiency (adverse selection) or non-observation of effort (moral hazard). Although a growing body of literature considers incentive design in control systems, the problem of designing incentives for control systems under information asymmetries has been less well-studied. This paper considers a model of moral hazard within control systems. In our model, the control system is described by an (affine) linear time-invariant (LTI) system with process noise. There is an agent who gets to choose (between two options) a linear state-feedback controller to apply to the LTI system, with one of the state-feedback controllers having a higher quadratic cost on the control inputs than the other. Our goal is to design a payment scheme that incentivizes the agent to choose the state-feedback controller that minimizes a quadratic cost on system states plus the time-discounted payment amount, subject to the understanding that the agent bears the control cost while being risk-averse with respect to their time-discounted payment. We formulate the problem as a constrained optimization, and prove that for a payment given after a fixed (but optimizable) time horizon the optimal payment scheme chooses the payment amount using a likelihood ratio hypothesis test. We numerically demonstrate our results by applying the derived optimal payment scheme to two examples: load frequency control (LFC) in power systems and wellness interventions for body weight loss.
TADI: Tool-Augmented Drilling Intelligence via Agentic LLM Orchestration over Heterogeneous Wellsite Data
We present TADI (Tool-Augmented Drilling Intelligence), an agentic AI system that transforms drilling operational data into evidence-based analytical intelligence. Applied to the Equinor Volve Field dataset, TADI integrates 1,759 daily drilling reports, selected WITSML real-time objects, 15,634 production records, formation tops, and perforations into a dual-store architecture: DuckDB for structured queries over 12 tables with 65,447 rows, and ChromaDB for semantic search over 36,709 embedded documents. Twelve domain-specialized tools, orchestrated by a large language model via iterative function calling, support multi-step evidence gathering that cross-references structured drilling measurements with daily report narratives. The system parses all 1,759 DDR XML files with zero errors, handles three incompatible well naming conventions, and is backed by 95 automated tests plus a 130-question stress-question taxonomy spanning six operational categories. We formalize the agent's behavior as a sequential tool-selection problem and propose the Evidence Grounding Score (EGS) as a simple grounding-compliance proxy based on measurements, attributed DDR quotations, and required answer sections. The complete 6,084-line, framework-free implementation is reproducible given the public Volve download and an API key, and the case studies and qualitative ablation analysis suggest that domain-specialized tool design, rather than model scale alone, is the primary driver of analytical quality in technical operations.
On Distributed Control of Continuum Swarms: Local Controllers as Differential Operators
We study the problem of distributed control of large-scale robotic swarms which can be modeled as continuum densities evolving under the continuity equation. We propose a formalization of distributed controllers as (generally nonlinear) differential operators, in which control inputs depend only on local information about the state and environment. This perspective yields a fully local, PDE-based framework for analysis and design. We apply this framework to the problem of stabilizing a swarm density around an arbitrary target density, and investigate fundamental limitations of low-order distributed controllers in achieving this goal. In particular, we show that controllers which act in a purely pointwise manner are incompatible with natural system symmetries and strong forms of stability, and must rely on mixing-type behavior to achieve stabilization. In contrast, we present a simple first-order control law which achieves stabilization and enjoys substantially stronger properties.
comment: 12 pages
Clinical Evaluation of a Tongue-Controlled Wrist Abduction-Adduction Assistance in a 6-DoF Upper-Limb Exoskeleton for Individuals with ALS and SCI
Upper-limb exoskeletons (ULEs) have the potential to restore functional independence in individuals with severe motor impairments; however, the clinical relevance of wrist degrees of freedom (DoF), particularly abduction-adduction (Ab-Ad), remains insufficiently evaluated. This study investigates the functional and user-perceived impact of wrist Ab-Ad assistance during two activities of daily living (ADLs). Wrist Ab-Ad assistance in a tongue-controlled 6-DoF ULE, EXOTIC2, was evaluated in a within-subject study involving one individual with amyotrophic lateral sclerosis and five individuals with spinal cord injury. Participants performed drinking and scratch stick leveling tasks with EXOTIC2 under two conditions: with and without wrist Ab-Ad assistance. Outcome measures included task success, task completion time, kinematic measures, and a usability questionnaire capturing comfort, functional perception, and acceptance. Enabling wrist Ab-Ad improved task success rates across both ADLs, with consistent reductions in spillage (from 77.8% spillages to 22.2%) and failed placements (from 66.7% to 16.7%). Participants utilized task-specific subsets of the available wrist range of motion, indicating that effective control within functional ranges was more critical than maximal joint excursion. Questionnaire responses indicated no increase in discomfort with the additional DoF and reflected perceived improvements in task performance. In conclusion, wrist Ab-Ad assistance enhances functional task performance in assistive exoskeleton use without compromising user comfort. However, its effectiveness depends on task context, control usability, and individual user strategies. This study provides clinically relevant, user-centered evidence supporting the inclusion of wrist Ab-Ad in ULEs, emphasizing the importance of balancing functional capability with usability in assistive device design.
comment: 9 pages, 7 figures and 2 tables. This work has been submitted to the IEEE Transactions on Neural Systems and Rehabilitation Engineering
Cooperative ISAC for LAE: Joint Trajectory Planning, Power Allocation, and Dynamic Time Division
To enhance the performance of aerial-ground networks, this paper proposes an integrated sensing and communication (ISAC) framework for multi-UAV systems. In our model, ground base stations (BSs) cooperatively serve multiple unmanned aerial vehicles (UAVs), employing a dynamic time-division strategy where beam scanning for sensing precedes data communication in each time slot. To maximize the sum communication rate while satisfying a mission-level cumulative radar mutual information (MI) requirement, we jointly optimize the UAV trajectories, communication and sensing power allocation, and the time-division ratio. The resulting highly coupled non-convex optimization problem is efficiently solved using an alternating optimization (AO) and successive convex approximation (SCA) framework, which yields a non-decreasing objective sequence and convergence to a finite objective value under the adopted surrogate-based iterative procedure. Extensive simulation results demonstrate that our proposed joint design significantly outperforms benchmark schemes with static trajectories, partially optimized resources, or non-cooperative single-BS transmission. Furthermore, a comprehensive sensitivity analysis reveals the distinct mechanisms by which sensing thresholds and the number of UAVs influence resource allocation and spatial organization, highlighting the critical importance of dynamic, multi-dimensional resource management for effectively navigating the sensing-communication trade-off in low-altitude economies.
Generating Sustainability-Targeting Attacks For Cyber-Physical Systems
Sustainability-targeting attacks (STA) are a growing threat to cyber-physical system (CPS)-based infrastructure, as sustainability goals become an integral part of CPS objectives. STA can be especially disruptive if it impacts the long-term sustainability cost of CPS, while its performance goals remain within acceptable parameters. Thus, in this work, we propose a general mathematical framework for modeling such stealthy STA and derive the feasibility conditions for generating a minimum-effort maximum-impact STA on a linear CPS using a max-min formulation. A gradient ascent-descent algorithm is used to construct this attack policy with an added constraint on stealthiness. An illustrative example has been simulated to demonstrate the impact of the generated attack on the sustainability cost of the CPS.
comment: 10 pages, 3 figures
IKSPARK: Obstacle-Aware Inverse Kinematics via Convex Optimization
Inverse kinematics (IK) is central to robot control and motion planning, yet its nonlinear kinematic mapping makes it inherently nonconvex and particularly challenging under complex constraints. We present IKSPARK (Inverse Kinematics using Semidefinite Programming And RanK minimization), an obstacle-aware IK solver for robots with diverse morphologies, including open and closed kinematic chains with spherical, revolute, and prismatic joints. Our formulation expresses IK as a semidefinite programming (SDP) problem with additional rank-1 constraints on symmetric matrices with fixed traces. IKSPARK first solves the relaxed SDP, whose infeasibility certifies infeasibility of the original IK problem, and then recovers a rank-1 solution using iterative rank-minimization methods with proven local convergence. Obstacle avoidance is handled through a convexified formulation of mixed-integer constraints. Extensive experiments show that IKSPARK computes highly accurate solutions across various kinematic structures and constrained environments without post-processing. In obstacle-rich settings, especially fixed workcell environments, IKSPARK achieves substantially higher success rates than traditional nonlinear optimization methods.
Scrap Composition Estimation in EAF and BOF: State-Space Models, Hyperparameters, and Validation
Accurate knowledge of scrap composition can increase the usage of recycled material to produce steel, reducing the need for raw ore extraction and minimizing environmental impact by conserving natural resources and lowering carbon emissions. First, we introduce two state-space models for the elemental composition of scrap in Electric Arc Furnaces (EAF) and Basic Oxygen Furnaces (BOF): a linear model for elements that transfer entirely into steel, and a non-linear model for elements that partition between steel and slag. The models are fitted with the Kalman filter and the unscented Kalman filter, respectively, using only data already collected in the standard steel production process. Crucially, the resulting scrap composition estimates can in turn be used to predict the elemental composition of future steel production. Second, we analyze how key hyperparameters affect estimation accuracy and stability, and we provide practical guidelines for tuning them from expert knowledge and historical data. Third, we validate the models on real BOF data from ArcelorMittal, using Cu and Cr as representative elements. Both filters outperform windowed non-negative least squares regression, a strong baseline method for scrap composition estimation, yielding reliable real-time estimates of scrap composition.
comment: 25 pages, 4 figures
Data-Driven Regularized Time-Limited h2 Model Reduction from Noisy Impulse Responses
This paper develops a data-driven time-limited h2 model reduction method for discrete-time linear time-invariant systems. Specifically, we formulate and solve a regularized time-limited h2 model reduction problem using only noisy impulse response data. Furthermore, we show that the objective function and its gradient can be represented using only noisy impulse response data. Numerical experiments using SLICOT benchmarks demonstrate that the proposed regularized method achieves lower relative time-limited h2 errors than the tested alternatives and is effective in situations where the unregularized method may deteriorate under noise.
comment: Accepted for publication in IEEE Control Systems Letters (L-CSS)
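As a rough illustration of the error metric mentioned in the abstract above, the sketch below evaluates a relative time-limited h2 error directly from impulse-response samples. It assumes a single-input single-output system and one common definition of the time-limited h2 norm for discrete-time systems, namely the square root of the sum of squared impulse-response samples up to the horizon. The paper's exact definition, regularizer, and optimization procedure are not reproduced, and the function names and data are hypothetical.

```python
import math


def time_limited_h2_norm(impulse, horizon):
    """Square root of the sum of squared samples h_0..h_horizon (SISO case)."""
    return math.sqrt(sum(h * h for h in impulse[: horizon + 1]))


def relative_time_limited_h2_error(full, reduced, horizon):
    """Norm of the impulse-response mismatch relative to the full model's norm."""
    diff = [f - r for f, r in zip(full, reduced)]
    return time_limited_h2_norm(diff, horizon) / time_limited_h2_norm(full, horizon)


# Hypothetical impulse responses of a full and a reduced model.
full = [1.0, 0.5, 0.25, 0.125]
reduced = [1.0, 0.5, 0.0, 0.0]
err = relative_time_limited_h2_error(full, reduced, horizon=2)
print(err)
```

Only samples inside the horizon contribute, which is what distinguishes the time-limited error from the infinite-horizon h2 error.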
Robust Time-Varying Control Barrier Functions with Sector-Bounded Nonlinearities
This paper presents a novel approach for ensuring safe operation of systems subject to input nonlinearities and time-varying safety constraints. We extend the time-varying barrier function framework to address time-varying safety constraints and explicitly account for control-dependent nonlinearities at the plant input. Guaranteed bounds on the input-output behavior of these nonlinearities are provided through pointwise-in-time quadratic constraints. The result is a class of robust time-varying control barrier functions that define a safety filter. This filter ensures robust safety for all admissible nonlinearities while minimally modifying the command generated by a baseline controller. We derive a second-order cone program (SOCP) to compute this safety filter online and provide novel feasibility conditions for ball-constrained inputs. The proposed approach is demonstrated on a spacecraft docking maneuver.
Input-to-State Stability of Gradient Flows in Distributional Space
This paper proposes a new notion of distributional Input-to-State Stability (dISS) for dynamic systems evolving in probability spaces over a domain. Unlike other norm-based ISS concepts, we rely on the Wasserstein metric, which captures more precisely the effects of the disturbances on atomic and non-atomic measures. We show how dISS unifies both ISS and Noise-to-State Stability (NSS) over compact domains for particle dynamics, while extending the classical notions to sets of probability distributions. We then apply the dISS framework to study the robustness of various Wasserstein gradient flows with respect to perturbations. In particular, we establish dISS for gradient flows defined by a class of $l$-smooth and $λ$-convex functionals subject to bounded disturbances, such as those induced by entropy in optimal transport. Further, we study the dISS robustness of the large-scale algorithms when using kernel- and sample-based approximations. This results in a characterization of the error incurred when using a finite number of agents, which can guide the selection of the swarm size to achieve a mean-field objective with prescribed accuracy and stability guarantees.
comment: 11 pages, 5 Figures, submitted to the 2026 Conference on Decision and Control
Fast and Certified Bounding of Security-Constrained DCOPF via Interval Bound Propagation
Security-Constrained DC Optimal Power Flow (SC-DCOPF) is an important tool for transmission system operators, enabling economically efficient and physically secure dispatch decisions. Although CPU-based commercial solvers (e.g., Gurobi) can efficiently solve SC-DCOPF problems with a reasonable number of security constraints, their performance degrades rapidly as both system size and the number of contingencies grow into thousands. In this paper, we design a computational graph representation of the SC-DCOPF-based market-clearing problem, inspired by the third ARPA-E Grid Optimization Competition. Using a tool from the neural network verification community known as Interval Bound Propagation (IBP), we quickly compute bounds on the optimal objective across the full set of N-1 contingencies. Our results demonstrate that IBP can compute certified bounds with mean optimal solution gaps below 3.98% on small cases, and it can efficiently scale up to 8,316-bus systems with thousands of contingencies.
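For readers unfamiliar with Interval Bound Propagation, the following is a minimal sketch of its textbook form: pushing elementwise lower and upper bounds through an affine map and a ReLU. The paper's computational graph for SC-DCOPF is not reproduced here. The weights, input bounds, and function names are illustrative assumptions.

```python
def affine_bounds(W, b, x_lo, x_hi):
    """Bounds on y = W x + b when each x_j lies in [x_lo[j], x_hi[j]]."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        lo = hi = bias
        for w, l, u in zip(row, x_lo, x_hi):
            # A positive weight maps lower to lower; a negative one swaps ends.
            if w >= 0.0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi


def relu_bounds(lo, hi):
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]


# Hypothetical two-input, two-output layer.
W = [[1.0, -2.0], [0.0, 3.0]]
b = [0.0, 1.0]
y_lo, y_hi = affine_bounds(W, b, x_lo=[0.0, -1.0], x_hi=[1.0, 1.0])
z_lo, z_hi = relu_bounds(y_lo, y_hi)
print(y_lo, y_hi)  # [-2.0, -2.0] [3.0, 4.0]
print(z_lo, z_hi)  # [0.0, 0.0] [3.0, 4.0]
```

Each layer costs about as much as one forward pass, which is consistent with the abstract's emphasis on computing bounds quickly, at the price of some looseness in the resulting intervals.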
Robotics
Three-Step Nav: A Hierarchical Global-Local Planner for Zero-Shot Vision-and-Language Navigation AISTATS 2026
Breakthrough progress in vision-based navigation through unknown environments has been achieved by using multimodal large language models (MLLMs). These models can plan a sequence of motions by evaluating the current view at each time step against the task and goal given to the agent. However, current zero-shot Vision-and-Language Navigation (VLN) agents powered by MLLMs still tend to drift off course, halt prematurely, and achieve low overall success rates. We propose Three-Step Nav to counteract these failures with a three-view protocol: First, "look forward" to extract global landmarks and sketch a coarse plan. Then, "look now" to align the current visual observation with the next sub-goal for fine-grained guidance. Finally, "look backward" audits the entire trajectory to correct accumulated drift before stopping. Requiring no gradient updates or task-specific fine-tuning, our planner drops into existing VLN pipelines with minimal overhead. Three-Step Nav achieves state-of-the-art zero-shot performance on the R2R-CE and RxR-CE datasets. Our code is available at https://github.com/ZoeyZheng0/3-step-Nav.
comment: Accepted to AISTATS 2026. Code: https://github.com/ZoeyZheng0/3-step-Nav
Bi-Level Optimization for Contact and Motion Planning in Rope-Assisted Legged Robots
This paper presents a planning pipeline framework for locomotion in rope-assisted robots climbing vertical surfaces. The proposed framework is formulated as a bi-level optimization scheme that addresses a mixed-integer problem: selecting feasible terrain regions for landing while simultaneously optimizing the control inputs, namely rope tensions and leg forces, and landing location. The outer level of the optimization is solved using the Cross-Entropy Method, while the inner level relies on gradient-based nonlinear optimization to compute dynamically feasible motions. The approach is validated on a novel climbing robot platform, ALPINE, across a variety of challenging terrain configurations.
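The outer-level search described above can be sketched with a generic Cross-Entropy Method loop. This is a simplified, continuous-variable illustration rather than the authors' planner: the `inner_cost` function is a hypothetical stand-in for the inner nonlinear optimization, and all parameter values are assumptions.

```python
import random


def cross_entropy_method(cost, mean, std, iters=40, samples=100, elites=10, seed=0):
    """Minimize `cost` by repeatedly refitting a diagonal Gaussian to elite samples."""
    rng = random.Random(seed)
    mean, std = list(mean), list(std)
    dim = len(mean)
    for _ in range(iters):
        # Sample candidates from the current Gaussian.
        population = [
            [rng.gauss(mean[d], std[d]) for d in range(dim)] for _ in range(samples)
        ]
        # Keep the lowest-cost candidates.
        population.sort(key=cost)
        elite = population[:elites]
        # Refit mean and standard deviation to the elites.
        for d in range(dim):
            mean[d] = sum(p[d] for p in elite) / elites
            var = sum((p[d] - mean[d]) ** 2 for p in elite) / elites
            std[d] = max(var ** 0.5, 1e-3)  # floor avoids a degenerate distribution
    return mean


def inner_cost(landing):
    """Hypothetical stand-in for the cost returned by the inner-level solver."""
    x, y = landing
    return (x - 3.0) ** 2 + (y + 1.0) ** 2


best = cross_entropy_method(inner_cost, mean=[0.0, 0.0], std=[2.0, 2.0])
print(best)
```

In a bi-level setting like the one in the abstract, each call to `cost` would run the gradient-based inner problem for a candidate landing choice, so the number of samples per iteration largely determines the total computation.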
Safe Navigation using Neural Radiance Fields via Reachable Sets
Safe navigation in cluttered environments is an important challenge for autonomous systems. Robots moving through obstacle-ridden scenarios need to navigate safely in the presence of obstacles, goals, and ego objects of varying geometries. In this work, reachable set representations of the robot's real-time capabilities in the state space are used to capture safe navigation requirements, while neural radiance fields (NeRFs) are used to compute, store, and manipulate the volumetric representations of the obstacles or ego vehicle as needed. Constrained optimal control is employed to represent the resulting path planning problem, involving linear matrix inequality constraints. We present simulation results for path planning in the presence of numerous obstacles in two different scenarios. Safe navigation is demonstrated by using reachable sets in the corresponding constrained optimal control problems.
comment: 5 pages, 8 figures, 2026 4th International Conference on Mechatronics, Control and Robotics (ICMCR)
Stochastic Entanglement of Deterministic Origami Tentacles For Universal Robotic Gripping
Origami-inspired robotic grippers have shown promising potential for object manipulation tasks due to their compact volume and mechanical flexibility. However, robust capture of objects with random shapes in dynamic working environments often comes at the cost of additional actuation channels and control complexity. Here, we introduce a tendon-driven origami tentacle gripper capable of universal object gripping by exploiting a synergy between local, deterministic deformation programming and global, stochastic entanglements. Each origami tentacle is made by cutting thin Mylar sheets; it features carefully placed holes for routing an actuation tendon, origami creases for controlling the deformation, and a tapered shape. By tailoring these design features, one can prescribe the shrinking, bending, and twisting deformation, eventually creating deterministic coiling with a simple tendon pull. Then, when multiple coiling tentacles are placed in proximity, stochastic entanglement emerges, allowing the tentacles to braid, knot, and grip objects with random shapes. We derived a simulation model by integrating origami mechanics with Cosserat rods to correlate origami design, tendon deformation, and their collective gripping performance. Then, we experimentally tested how these coiling and entangling origami tentacles can grasp objects under gravity and in water. A stow-and-release deployment mechanism was also tested to simulate in-orbit grasping. Overall, the entertaining origami tentacle gripper presents a new strategy for robust object grasping with simple design and actuation.
Edge AI for Automotive Vulnerable Road User Safety: Deployable Detection via Knowledge Distillation
Deploying accurate object detection for Vulnerable Road User (VRU) safety on edge hardware requires balancing model capacity against computational constraints. Large models achieve high accuracy but fail under INT8 quantization required for edge deployment, while small models sacrifice detection performance. This paper presents a knowledge distillation (KD) framework that trains a compact YOLOv8-S student (11.2M parameters) to mimic a YOLOv8-L teacher (43.7M parameters), achieving 3.9x compression while preserving quantization robustness. We evaluate on full-scale BDD100K (70K training images) with Post-Training Quantization to INT8. The teacher suffers catastrophic degradation under INT8 (-23% mAP), while the KD student retains accuracy (-5.6% mAP). Analysis reveals that KD transfers precision calibration rather than raw detection capacity: the KD student achieves 0.748 precision versus 0.653 for direct training at INT8, a 14.5% gain at equivalent recall, reducing false alarms by 44% versus the collapsed teacher. At INT8, the KD student exceeds the teacher's FP32 precision (0.748 vs. 0.718) in a model 3.9x smaller. These findings establish knowledge distillation as a requirement for deploying accurate, safety-critical VRU detection on edge hardware.
comment: 6 pages, 3 figures
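The headline ratios in the abstract above can be checked with a few lines of arithmetic. The sketch assumes that the 3.9x figure is the teacher-to-student parameter ratio and that the 14.5% figure is a relative (not absolute) precision gain. Both readings are inferences from the stated numbers, not statements from the paper.

```python
# Parameter counts in millions, as quoted in the abstract.
teacher_params_m = 43.7
student_params_m = 11.2
compression = teacher_params_m / student_params_m

# INT8 precision of the KD student versus the directly trained student.
kd_precision = 0.748
direct_precision = 0.653
relative_gain_pct = 100.0 * (kd_precision - direct_precision) / direct_precision

print(round(compression, 1))        # 3.9
print(round(relative_gain_pct, 1))  # 14.5
```

Under these assumptions both quoted values are reproduced, which suggests the 14.5% refers to a gain relative to the directly trained baseline rather than a difference in percentage points.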
STARRY: Spatial-Temporal Action-Centric World Modeling for Robotic Manipulation
Robotic manipulation critically requires reasoning about future spatial-temporal interactions, yet existing VLA policies and world-model-enhanced policies do not fully model action-relevant spatial-temporal interaction structure. We propose STARRY, a world-model-enhanced action-generation policy that aligns spatial-temporal prediction with action generation. STARRY jointly denoises future spatial-temporal latents and action sequences, and introduces Geometry-Aware Selective Attention Modulation to convert predicted depth and end-effector geometry into token-aligned weights for selective action-attention modulation. On RoboTwin 2.0, STARRY achieves 93.82% / 93.30% average success under Clean and Randomized settings. Real-world experiments further improve average success from 42.5% to 70.8% over $π_{0.5}$, demonstrating the effectiveness of action-centric spatial-temporal world modeling for spatial-temporally demanding robotic action generation.
comment: 19 pages
Walk With Me: Long-Horizon Social Navigation for Human-Centric Outdoor Assistance
Assisting humans in open-world outdoor environments requires robots to translate high-level natural-language intentions into safe, long-horizon, and socially compliant navigation behavior. Existing map-based methods rely on costly pre-built HD maps, while learning-based policies are mostly limited to indoor and short-horizon settings. To bridge this gap, we propose Walk with Me, a map-free framework for long-horizon social navigation from high-level human instructions. Walk with Me leverages GPS context and lightweight candidate points-of-interest from a public map API for semantic destination grounding and waypoint proposal. A High-Level Vision-Language Model grounds abstract instructions into concrete destinations and plans coarse waypoint sequences. During execution, an observation-aware routing mechanism determines whether the Low-Level Vision-Language-Action policy can handle the current situation or whether explicit safety reasoning from the High-Level VLM is needed. Routine segments are executed by the Low-Level VLA, while complex situations such as crowded crossings trigger high-level reasoning and stop-and-wait behavior when unsafe. By combining semantic intent grounding, map-free long-horizon planning, safety-aware reasoning, and low-level action generation, Walk with Me enables practical outdoor social navigation for human-centric assistance.
Rule-based High-Level Coaching for Goal-Conditioned Reinforcement Learning in Search-and-Rescue UAV Missions Under Limited-Simulation Training
This paper presents a hierarchical decision-making framework for unmanned aerial vehicle (UAV) missions motivated by search-and-rescue (SAR) scenarios under limited simulation training. The framework combines a fixed rule-based high-level advisor with an online goal-conditioned low-level reinforcement learning (RL) controller. To stress-test early adaptation, we also consider a strict no-pretraining deployment regime. The high-level advisor is defined offline from a structured task specification and compiled into deterministic rules. It provides interpretable mission- and safety-aware guidance through recommended actions, avoided actions, and regime-dependent arbitration weights. The low-level controller learns online from task-defined dense rewards and reuses experience through a mode-aware prioritized replay mechanism augmented with rule-derived metadata. We evaluate the framework on two tasks: battery-aware multi-goal delivery and moving-target delivery in obstacle-rich environments. Across both tasks, the proposed method improves early safety and sample efficiency primarily by reducing collision terminations, while preserving the ability to adapt online to scenario-specific dynamics.
Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising
We propose X-WAM, a Unified 4D World Model that unifies real-time robotic action execution and high-fidelity 4D world synthesis (video + 3D reconstruction) in a single framework, addressing the critical limitations of prior unified world models (e.g., UWM) that only model 2D pixel-space and fail to balance action efficiency and world modeling quality. To leverage the strong visual priors of pretrained video diffusion models, X-WAM imagines the future world by predicting multi-view RGB-D videos, and obtains spatial information efficiently through a lightweight structural adaptation: replicating the final few blocks of the pretrained Diffusion Transformer into a dedicated depth prediction branch for the reconstruction of future spatial information. Moreover, we propose Asynchronous Noise Sampling (ANS) to jointly optimize generation quality and action decoding efficiency. ANS applies a specialized asynchronous denoising schedule during inference, which rapidly decodes actions with fewer steps to enable efficient real-time execution, while dedicating the full sequence of steps to generate high-fidelity video. Rather than entirely decoupling the timesteps during training, ANS samples from their joint distribution to align with the inference distribution. Pretrained on over 5,800 hours of robotic data, X-WAM achieves 79.2% and 90.7% average success rate on RoboCasa and RoboTwin 2.0 benchmarks, while producing high-fidelity 4D reconstruction and generation surpassing existing methods in both visual and geometric metrics.
comment: Project website: https://sharinka0715.github.io/X-WAM/
Atomic-Probe Governance for Skill Updates in Compositional Robot Policies
Skill libraries in deployed robotic systems are continually updated through fine-tuning, fresh demonstrations, or domain adaptation, yet existing typed-composition methods (BLADE, SymSkill, Generative Skill Chaining) treat the library as frozen at test time and do not analyze how composition outcomes change when a skill is replaced. We introduce a paired-sampling cross-version swap protocol on robosuite manipulation tasks to characterize this dimension of compositional skill learning. On a dual-arm peg-in-hole task we discover a dominant-skill effect: one ECM achieves 86.7% atomic success rate while every other ECM is at or below 26.7%, and whether this dominant ECM enters a composition shifts the success rate by up to +50pp. We characterize the boundary on a simpler pick task where all atomic policies saturate at 100% and the effect is undefined. Across three tasks we further find that off-policy behavioral distance metrics fail to identify the dominant ECM, ruling out the natural cheap predictor. We propose an atomic-quality probe and a Hybrid Selector combining per-skill probes (zero per-decision cost) with selective composition revalidation (full cost), and characterize its Pareto frontier on 144 skill-update decisions. On T6 the atomic-only probe sits 23pp below full revalidation (64.6% vs 87.5% oracle match) at zero per-decision cost; a Hybrid Selector with m=10 closes most of that gap to ~12pp at 46% of full-revalidation cost. On the cross-task average over 144 events, atomic-only is within 3pp of full revalidation under a mixed-oracle caveat. The atomic-quality probe is, to our knowledge, the first principled, deployment-ready primitive for skill-update governance in compositional robot policies.
comment: 8 pages main text + appendix; 3 figures, 12 tables
ATLAS: An Annotation Tool for Long-horizon Robotic Action Segmentation
Annotating long-horizon robotic demonstrations with precise temporal action boundaries is crucial for training and evaluating action segmentation and manipulation policy learning methods. Existing annotation tools, however, are often limited: they are designed primarily for vision-only data, do not natively support synchronized visualization of robot-specific time-series signals (e.g., gripper state or force/torque), or require substantial effort to adapt to different dataset formats. In this paper, we introduce ATLAS, an annotation tool tailored for long-horizon robotic action segmentation. ATLAS provides time-synchronized visualization of multi-modal robotic data, including multi-view video and proprioceptive signals, and supports annotation of action boundaries, action labels, and task outcomes. The tool natively handles widely used robotics dataset formats such as ROS bags and the Reinforcement Learning Dataset (RLDS) format, and provides direct support for specific datasets such as REASSEMBLE. ATLAS can be easily extended to new formats via a modular dataset abstraction layer. Its keyboard-centric interface minimizes annotation effort and improves efficiency. In experiments on a contact-rich assembly task, ATLAS reduced the average per-action annotation time by at least 6% compared to ELAN, while the inclusion of time-series data improved temporal alignment with expert annotations by more than 2.8% and decreased boundary error fivefold compared to vision-only annotation tools.
comment: 7 pages, 2 figures, 2 tables
STAR-Filter: Efficient Convex Free-Space Approximation via Starshaped Set Filtering in Noisy Environments
Approximating collision-free space is fundamental to robot planning in complex environments. Convex geometric representations, such as polytopes and ellipsoids, are widely employed due to their structural properties, which can be easily integrated with convex optimization. Iterative optimization-based inflation methods can generate large volume polytopes in cluttered environments, but their efficiency degrades as the obstacle set becomes more complex or when sensor data are noisy. These methods are also sensitive to initialization and often rely on accurate geometric models. In this paper, we propose the STAR-Filter, a lightweight framework that employs starshaped set construction as a fast filter for convex region generation in collision-free space. By identifying obstacle points as active supporting constraints, the proposed method significantly reduces redundant computation while preserving feasibility and robustness to sensor noise. We provide theoretical and numerical analyses that characterize the structural properties of the starshaped set and proposed pipeline in environments of varying complexity. Simulation results show that the proposed framework achieves the lowest computation time and reduces conservativeness in polytope generation for real-world noisy and large-scale data. We demonstrate the effectiveness of the framework for Safe Flight Corridor (SFC) generation and agile quadrotor planning in noisy environments.
Benchmarking the Safety of Large Language Models for Robotic Health Attendant Control
Large language models (LLMs) are increasingly considered for deployment as the control component of robotic health attendants, yet their safety in this context remains poorly characterized. We introduce a dataset of 270 harmful instructions spanning nine prohibited behavior categories grounded in the American Medical Association Principles of Medical Ethics, and use it to evaluate 72 LLMs in a simulation environment based on the Robotic Health Attendant framework. The mean violation rate across all models was 54.4%, with more than half exceeding 50%, and violation rates varied substantially across behavior categories, with superficially plausible instructions such as device manipulation and emergency delay proving harder to refuse than overtly destructive ones. Model size and release date were the primary determinants of safety performance among open-weight models, and proprietary models were substantially safer than open-weight counterparts (median 23.7% versus 72.8%). Medical domain fine-tuning conferred no significant overall safety benefit, and a prompt-based defense strategy produced only a modest reduction in violation rates among the least safe models, leaving absolute violation rates at levels that would preclude safe clinical deployment. These findings demonstrate that safety evaluation must be treated as a first-class criterion in the development and deployment of LLMs for robotic health attendants.
comment: 20 pages, 9 figures, 3 tables, 8 pages supplementary material
LLM-Flax: Generalizable Robotic Task Planning via Neuro-Symbolic Approaches with Large Language Models
Deploying a neuro-symbolic task planner on a new domain today requires significant manual effort: a domain expert must author relaxation and complementary rules, and hundreds of training problems must be solved to supervise a Graph Neural Network (GNN) object scorer. We propose LLM-Flax, a three-stage framework that eliminates all three sources of manual effort using a locally hosted LLM given only a PDDL domain file. Stage 1 automatically generates relaxation and complementary rules via structured prompting with format validation and self-correction. Stage 2 introduces LLM-guided failure recovery with a feasibility-gated budget policy that explicitly reserves API latency cost before each LLM call, preventing the downstream relaxation fallback from being starved. Stage 3 replaces the domain-trained GNN entirely with zero-shot LLM object importance scoring, requiring no training data. We evaluate all three stages on the MazeNamo benchmark across 10x10, 12x12, and 15x15 grids (8 benchmarks total). LLM-Flax achieves average SR 0.945 versus the manual baseline's 0.828 (+0.117), matching or outperforming manual rules on every one of the eight benchmarks. On 12x12 Expert, LLM-Flax attains SR 0.733 where the manual planner fails entirely (SR 0.000); on 15x15 Hard, it achieves SR 1.000 versus Manual's 0.900. Stage 3 demonstrates feasibility (SR 0.720 on 12x12 Hard with no training data) but faces a context-window bottleneck at scale, pointing to the primary open challenge for future work.
Persona-Based Process Design for Assistive Human-Robot Workplaces for Persons with Disabilities
Human-robot interaction is emerging as an important paradigm for integrating persons with disabilities into the workplace. While these systems can enable individuals to work, their design is mostly personalized, hindering widespread use beyond the individual user. The universal design paradigm is a central pillar of inclusive design, describing usability of systems by all. Incorporating universal design into process design for human-robot workplaces requires expert knowledge that is often not available. To simplify process design of human-robot workplaces, we propose a persona-based design approach. First, typical impairments prevalent in the workforce or particularly relevant for the processes are abstracted into personas with disabilities. The work process is subdivided into sequential actions. For each action and persona, strategies are developed to reach the action goal by a design thinking approach. The resulting actions are ordered by level of robot assistance, i.e., robot involvement, and implemented in a behavior tree. As a result, the macro-behavior of the workplace may adapt to individual personas online. We demonstrate the method in a collaborative box folding process with a total of seven personas with disabilities. The persona-based process design shows promising results by generating more comprehensive process strategies while enabling adaptive behavior in the sense of universal design.
comment: Accepted at IEEE International Conference on Human-Machine Systems (ICHMS), Singapore, 2026
3D Generation for Embodied AI and Robotic Simulation: A Survey
Embodied AI and robotic systems increasingly depend on scalable, diverse, and physically grounded 3D content for simulation-based training and real-world deployment. While 3D generative modeling has advanced rapidly, embodied applications impose requirements far beyond visual realism: generated objects must carry kinematic structure and material properties, scenes must support interaction and task execution, and the resulting content must bridge the gap between simulation and reality. This paper presents the first survey of 3D generation for embodied AI and organizes the literature around three roles that 3D generation plays in embodied systems. In Data Generator, 3D generation produces simulation-ready objects and assets, including articulated, physically grounded, and deformable content for downstream interaction; in Simulation Environments, it constructs interactive and task-oriented worlds, spanning structure-aware, controllable, and agentic scene generation; and in Sim2Real Bridge, it supports digital twin reconstruction, data augmentation, and synthetic demonstrations for downstream robot learning and real-world transfer. We also show that the field is shifting from visual realism toward interaction readiness, and we identify the main bottlenecks, including limited physical annotations, the gap between geometric quality and physical validity, fragmented evaluation, and the persistent sim-to-real divide, that must be addressed for 3D generation to become a dependable foundation for embodied intelligence. Our project page is at https://3dgen4robot.github.io.
comment: 26 pages, 11 figures, 8 tables. Project Page: https://3dgen4robot.github.io
HiPAN: Hierarchical Posture-Adaptive Navigation for Quadruped Robots in Unstructured 3D Environments
Navigating quadruped robots in unstructured 3D environments poses significant challenges, requiring goal-directed motion, effective exploration to escape from local minima, and posture adaptation to traverse narrow, height-constrained spaces. Conventional approaches employ a sequential mapping-planning pipeline but suffer from accumulated perception errors and high computational overhead, restricting their applicability on resource-constrained platforms. To address these challenges, we propose Hierarchical Posture-Adaptive Navigation (HiPAN), a framework that operates directly on onboard depth images at deployment. HiPAN adopts a hierarchical design: a high-level policy generates strategic navigation commands (planar velocity and body posture), which are executed by a low-level, posture-adaptive locomotion controller. To mitigate myopic behaviors and facilitate long-horizon navigation, we introduce Path-Guided Curriculum Learning, which progressively extends the navigation horizon from reactive obstacle avoidance to strategic navigation. In simulation, HiPAN achieves higher navigation success rates and greater path efficiency than classical reactive planners and end-to-end baselines, while real-world experiments further validate its applicability across diverse, unstructured 3D environments.
comment: Accepted to RA-L 2026 | Project page: https://sgvr.kaist.ac.kr/~Jeil/project_page_HiPAN/
Alter-Art: Exploring Embodied Artistic Creation through a Robot Avatar
As with every emerging technology, new tools in the hands of artists reshape the nature of artwork creation. Current frameworks for robotics in arts deploy the robot as an autonomous creator or a collaborator, thus leaving a certain gap between the human artist and the machine. Now, we stand at the dawn of an era where artists can escape physical limitations and reshape their creative identity by inhabiting an alternative body. This new paradigm allows artists not only to command a robot remotely, but also to be a robot, to see and feel through it, experiencing a new embodied reality. Unlike virtual reality, where art is created in a digital dimension, in this case art creation is still firmly grounded in the material world: clay molded by mechanical hands, paint swept across a canvas or gestures performed on a physical stage alongside human actors. Through the robot avatar Alter-Ego, we explore the Alter-Art paradigm in dance, theater, and painting; it integrates immersive teleoperation and compliant actuation to enable a first-person creative experience. Analyzing qualitative artistic feedback, we investigate how embodiment shapes creative agency, identity and interaction with the environment. Our findings suggest that artists rapidly develop a sense of presence within the robotic body. The robot's physical constraints influence the creative process, manifesting differently across artistic domains. We highlight embodiment as a central design principle, contributing to social robotics and expanding the possibilities for telepresence and accessible artistic expression.
comment: 12 pages, 6 figures
Reactive Motion Generation via Phase-varying Neural Potential Functions
Dynamical systems (DS) methods for Learning-from-Demonstration (LfD) provide stable, continuous policies from few demonstrations. First-order DS are effective for many point-to-point and periodic tasks, as long as a unique velocity is defined for each state. For tasks with intersections (e.g., drawing an "8"), extensions such as second-order dynamics or phase variables are often used. However, because second-order models use velocity to disambiguate motion direction, they become sensitive to disturbances near intersections. Moreover, this disambiguation may fail when nearly identical position-velocity pairs correspond to different onward motions. In contrast, phase-based methods rely on open-loop time or phase variables, which limit their ability to recover after perturbations. We introduce Phase-varying Neural Potential Functions (PNPF), an LfD framework that conditions a potential function on a phase variable which is estimated directly from state progression, rather than on open-loop temporal inputs. This phase variable allows the system to handle state revisits, while the learned potential function generates local vector fields for reactive and stable control. PNPF generalizes effectively across point-to-point, periodic, and full 6D motion tasks, outperforms existing baselines on trajectories with intersections, and demonstrates robust performance in real-time robotic manipulation under external disturbances.
comment: Accepted by IEEE Robotics and Automation Letters (RAL)
Split over $n$ resource sharing problem: Are fewer capable agents better than many simpler ones?
In multi-agent systems, should limited resources be concentrated into a few capable agents or distributed among many simpler ones? This work formulates the split over $n$ resource sharing problem where a group of $n$ agents equally shares a common resource (e.g., monetary budget, computational resources, physical size). We present a case study in multi-agent coverage where the area of the disk-shaped footprint of agents scales as $1/n$. A formal analysis reveals that the initial coverage rate grows with $n$. However, if the speed of agents decreases proportionally with their radii, groups of all sizes perform equally well, whereas if it decreases proportionally with their footprints, a single agent performs best. We also present computer simulations in which resource splitting increases the failure rates of individual agents. The models and findings help identify optimal distributiveness levels and inform the design of multi-agent systems under resource constraints.
comment: Short paper presented at the 15th International Conference on Swarm Intelligence (ANTS 2026)
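Illustrative sketch (not from the paper): the three scaling results stated in the abstract above can be reproduced with a back-of-the-envelope model. The sketch assumes that a moving disk of radius r and speed v initially covers new area at rate 2rv (the width of the swept strip times the speed) and that agents do not overlap at the start; the function name `coverage_rate` and the three speed laws are hypothetical labels chosen here, and the paper's formal model may differ.

```python
import math

def coverage_rate(n, total_area=1.0, base_speed=1.0, speed_law="constant"):
    """Initial swept-area rate of n disk agents that share total_area equally."""
    area = total_area / n                      # footprint per agent scales as 1/n
    radius = math.sqrt(area / math.pi)
    single_radius = math.sqrt(total_area / math.pi)  # radius when n == 1
    if speed_law == "constant":                # speed independent of size
        speed = base_speed
    elif speed_law == "radius":                # speed proportional to radius
        speed = base_speed * radius / single_radius
    elif speed_law == "footprint":             # speed proportional to footprint area
        speed = base_speed * area / total_area
    else:
        raise ValueError(f"unknown speed law: {speed_law}")
    # Each disk sweeps a strip of width 2 * radius; sum over n agents.
    return n * 2.0 * radius * speed

if __name__ == "__main__":
    for law in ("constant", "radius", "footprint"):
        rates = [round(coverage_rate(n, speed_law=law), 3) for n in (1, 4, 16)]
        print(law, rates)
```

Under these assumptions the total rate scales as n^(1/2) at constant speed, is independent of n when speed is proportional to radius, and scales as n^(-1/2) when speed is proportional to footprint, matching the qualitative ordering described in the abstract.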
2D and 3D Grasp Planners for the GET Asymmetrical Gripper
In this paper, we introduce GET-2D-1.0, a fast grasp planner for the GET asymmetrical gripper that operates from a single-view RGB-D image, using the Ferrari-Canny metric and a novel sampling strategy, and GET-3D-1.0, a mesh-based method using a 3D gripper model and ray-tracing. We evaluate both grasp planners against baselines with physical experiments, which suggest that GET-2D-1.0 can improve over a bounding box baseline by over 40% in lift success, shake survival, and force resistance. Experiments suggest that GET-3D-1.0 slightly improves on GET-2D-1.0 in lift success and shake survival, but is more computationally expensive, averaging 17 seconds of planning compared to 683 ms for GET-2D-1.0.
Lights Out: A Nighttime UAV Localization Framework Using Thermal Imagery and Semantic 3D Maps
Reliable backup localization for unmanned aerial vehicles (UAVs) operating in GNSS-denied nighttime conditions remains an open challenge due to the severe modality gap between daytime RGB maps and nighttime thermal imagery. This work presents a semantic reprojection framework for map-relative nighttime UAV localization by aligning segmented thermal observations with a globally referenced, semantically labeled 3D map constructed from daytime RGB data. Rather than relying on appearance-based correspondence, localization is formulated in a shared semantic domain and solved via a symmetric bidirectional reprojection objective with confusion-aware weighting to improve robustness under segmentation uncertainty. The approach is evaluated offline across 6.5 km of nighttime, real-world UAV flight trajectories in urban and semi-structured environments. Relative to RTK GNSS ground truth, the system achieves a bias-corrected RMSE2D of 2.18 m and a median RMSE2D of 1.52 m. Results show that localization performance is strongly correlated with the availability of semantic edge evidence and that large-error events are spatially localized to semantically ambiguous areas rather than uniformly distributed. These findings indicate that semantic reprojection offers a promising pathway toward globally referenced nighttime UAV localization using thermal imagery alone.
comment: 8 pages, 4 figures, accepted to ICUAS 2025
From Prompt to Physical Actuation: Holistic Threat Modeling of LLM-Enabled Robotic Systems
As large language models are integrated into autonomous robotic systems for task planning and control, compromised inputs or unsafe model outputs can propagate through the planning pipeline to physical-world consequences. Although prior work has studied robotic cybersecurity, adversarial perception attacks, and LLM safety independently, no existing study traces how these threat categories interact and propagate across trust boundaries in a unified architectural model. We address this gap by modeling an LLM-enabled autonomous robot in an edge-cloud architecture as a hierarchical Data Flow Diagram and applying STRIDE-per-interaction analysis across six boundary-crossing interaction points using a three-category taxonomy of Conventional Cyber Threats, Adversarial Threats, and Conversational Threats. The analysis reveals that these categories converge at the same boundary crossings, and we trace three cross-boundary attack chains from external entry points to unsafe physical actuation, each exposing a distinct architectural property: the absence of independent semantic validation between user input and actuator dispatch, cross-modal translation from visual perception to language-model instruction, and unmediated boundary crossing through provider-side tool use. To our knowledge, this is the first DFD-based threat analysis integrating all three threat categories across the full perception-planning-actuation pipeline of an LLM-enabled robotic system.
comment: Submitted to 23rd Annual International Conference on Privacy, Security, and Trust (PST2026)
Learning Tactile-Aware Quadrupedal Loco-Manipulation Policies
Quadrupedal loco-manipulation is commonly built on visual perception and proprioception. Yet reliable contact-rich manipulation remains difficult: vision and proprioception alone cannot resolve uncertain, evolving interactions with the environment. Tactile sensing offers direct contact observability, but a scalable tactile-aware learning framework for quadrupedal loco-manipulation is still underexplored. In this paper, we present a tactile-aware loco-manipulation policy learning pipeline with a hierarchical structure. Our approach has two key components. First, we leverage real-world human demonstrations to train a tactile-conditioned visuotactile high-level policy. This policy predicts not only end-effector trajectories for manipulation, but also the evolving tactile interaction cues that characterize how contact should develop over time. Second, we perform large-scale reinforcement learning in simulation to learn a tactile-aware whole-body control policy that tracks diverse commanded trajectories and tactile interaction cues, and transfers zero-shot to the real world. Together, these components enable coordinated locomotion and manipulation under contact-rich scenarios. We evaluate the system on real-world contact-rich tasks, including in-hand reorientation with insertion, valve tightening, and delicate object manipulation. Compared to vision-only and visuotactile baselines, our method improves performance by 28.54% on average across these tasks.
Real-Time GPU-Accelerated Monte Carlo Evaluation of Safety-Critical AEB Systems Under Uncertainty
Automatic Emergency Braking (AEB) systems represent a safety-critical national interest, with the National Highway Traffic Safety Administration (NHTSA) Federal Motor Vehicle Safety Standard (FMVSS No. 127) requiring AEB in all new light vehicles sold in the United States by September 2029. However, production implementations frequently rely on deterministic stopping-distance or Time-to-Collision (TTC) thresholds that fail to capture uncertainty in sensing, road conditions, and vehicle dynamics. This paper presents a GPU-accelerated Monte Carlo framework for stochastic evaluation of emergency braking performance using a high-fidelity longitudinal vehicle model incorporating aerodynamic drag, road grade, brake actuator dynamics, and weight transfer effects. A one-thread-per-sample execution strategy exploits the independence of Monte Carlo rollouts, while deterministic CPU-generated sampling ensures bit-exact numerical consistency between CPU and GPU implementations. The framework is evaluated across four hardware platforms spanning development and deployment environments: two laptop GPUs (GTX 1650, RTX 5070) and two automotive-grade embedded platforms (Jetson Orin Nano, Jetson AGX Orin). Peak speedups of 54.57x are achieved while maintaining exact numerical agreement. Real-time feasibility analysis with a complete AEB timing budget (700 ms human reaction time minus 120 ms perception and 50 ms decision overhead) demonstrates that the Jetson AGX Orin can execute approximately 25,000 Monte Carlo samples within a 530 ms budget, enabling real-time probabilistic AEB evaluation as part of a complete embedded pipeline. These results establish Monte Carlo-based uncertainty evaluation as a deployable runtime component rather than an offline validation tool and provide quantitative guidance for risk-aware AEB threshold selection under the NHTSA final rule.
comment: 10 pages, 6 figures. Submitted to IEEE journal for possible publication; under review
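Illustrative sketch (not from the paper): the timing-budget arithmetic in the abstract above (700 ms minus 120 ms minus 50 ms gives 530 ms) and the idea of independent Monte Carlo rollouts can be shown with a deliberately simplified point-mass braking model. The friction and delay ranges, the function names, and the stopping-distance formula d = v*t + v^2 / (2*mu*g) are assumptions made for illustration; the paper's vehicle model additionally includes aerodynamic drag, road grade, brake actuator dynamics, and weight transfer, and runs one GPU thread per sample.

```python
import random

G = 9.81  # gravitational acceleration in m/s^2

def budget_ms(reaction_ms=700, perception_ms=120, decision_ms=50):
    """Time left for Monte Carlo evaluation after perception and decision overhead."""
    return reaction_ms - perception_ms - decision_ms

def collision_probability(speed_mps, gap_m, n_samples=25_000, seed=0):
    """Fraction of sampled rollouts whose stopping distance exceeds the gap."""
    rng = random.Random(seed)          # fixed seed gives reproducible samples
    collisions = 0
    for _ in range(n_samples):         # rollouts are independent of each other
        mu = rng.uniform(0.4, 0.9)     # hypothetical road friction range
        delay_s = rng.uniform(0.1, 0.3)  # hypothetical brake actuation delay
        stopping = speed_mps * delay_s + speed_mps ** 2 / (2.0 * mu * G)
        if stopping > gap_m:
            collisions += 1
    return collisions / n_samples

if __name__ == "__main__":
    print("budget (ms):", budget_ms())
    print("P(collision) at 20 m/s, 40 m gap:", collision_probability(20.0, 40.0))
```

Because each rollout touches only its own sampled parameters, the loop body maps naturally onto a one-thread-per-sample GPU kernel of the kind the abstract describes.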
Global Sampling-Based Trajectory Optimization for Contact-Rich Manipulation via KernelSOS
Contact-rich manipulation is challenging due to its high dimensionality, the requirement for long time horizons, and the presence of hybrid contact dynamics. Sampling-based methods have become a popular approach for this class of problems, but without explicit mechanisms for global exploration, they are susceptible to converging to poor local minima. In this paper, we introduce Global-MPPI, a unified trajectory optimization framework that integrates global exploration and local refinement. At the global level, we leverage kernel sum-of-squares optimization to identify globally promising regions of the solution space. To enable reliable performance for the non-smooth landscapes inherent to contact-rich manipulation, we introduce a graduated non-convexity strategy based on log-sum-exp smoothing, which transitions the optimization landscape from a smoothed surrogate to the original non-smooth objective. Finally, we employ the model-predictive path integral method to locally refine the solution. We evaluate Global-MPPI on high-dimensional, long-horizon contact-rich tasks, including the PushT task and dexterous in-hand manipulation. Experimental results demonstrate that our approach robustly uncovers high-quality solutions, achieving faster convergence and lower final costs compared to existing baseline methods.
comment: 8 pages, 5 figures
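Illustrative sketch (not from the paper): the abstract above mentions a graduated non-convexity strategy based on log-sum-exp smoothing but does not give its exact form. A standard log-sum-exp surrogate for a non-smooth max is shown below as an assumption; the function name `smooth_max` and the temperature schedule are hypothetical. The surrogate overestimates the true max by at most temperature * log(n), so lowering the temperature step by step moves the landscape from a smoothed surrogate toward the original non-smooth objective.

```python
import math

def smooth_max(values, temperature):
    """Log-sum-exp approximation of max(values); exact as temperature -> 0."""
    m = max(values)  # subtract the max before exponentiating for numerical stability
    total = sum(math.exp((v - m) / temperature) for v in values)
    return m + temperature * math.log(total)

if __name__ == "__main__":
    costs = [0.0, 1.0, 0.5]
    for temperature in (1.0, 0.1, 0.01):  # graduated schedule: smooth to sharp
        gap = smooth_max(costs, temperature) - max(costs)
        print(f"temperature={temperature}: gap to true max = {gap:.6f}")
```

A smooth min, which is often the relevant operation for contact distances, follows from the same function as `-smooth_max([-v for v in values], temperature)`.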
The Field of Safe Motion: Operationalizing Affordances in the Field of Safe Travel Using Reachability Analysis
We present the Field of Safe Motion (FSM), a quantitative safety model for determining whether a driver maintains a collision-free escape route, or "out," at any given moment by accounting for that driver's physical capabilities and the foreseeable actions of other road users. The Field of Safe Travel (FST) provides a framework for representing the types of sensory information and actions available to drivers. However, the FST has remained conceptual in nature since its initial publication almost 90 years ago, and a concrete computational operationalization is still lacking. At the same time, reachability analysis provides a quantitative basis for assessing the possible actions available to road users, using interpretable kinematic models, but reachability models have so far remained confined largely to the engineering and robotics literature. Bringing these two approaches together provides an interpretable, quantitative tool for assessing driving behavior across a wide range of driving scenarios. Beyond being interpretable, our approach relies on a relatively small set of basic assumptions that are easy to enumerate and reason about. Furthermore, an interpretable reachability model paired with kinematic assumptions provides a way to bound uncertainty about road users' reasonably foreseeable future locations. We demonstrate the applicability of the FSM to different driving scenarios and discuss the strengths and weaknesses of the model.
PALCAS: A Priority-Aware Intelligent Lane Change Advisory System for Autonomous Vehicles using Federated Reinforcement Learning
We present a priority-aware intelligent lane change advisory system based on multi-agent federated reinforcement learning, namely PALCAS, for autonomous vehicles (AVs). While existing lane-change approaches typically focus on single-agent systems or centralized multi-agent systems, we introduce a federated reinforcement learning-based multi-agent lane change system prioritizing lane changing based on vehicle destination urgency. PALCAS incorporates a novel priority-aware safe lane-change reward function to enable judicious lane-change decisions in both mandatory and discretionary scenarios. PALCAS leverages the parameterized deep Q-network (PDQN) algorithm to facilitate effective cooperation among agents, enabling both lateral and longitudinal motion controls of AVs. Extensive simulations conducted using the SUMO traffic simulator and Mosaic V2X communication framework demonstrate that PALCAS significantly improves traffic efficiency, driving safety, comfort, destination arrival rates, and merging success rates compared to baseline methods.
Reconstruction by Generation: 3D Multi-Object Scene Reconstruction from Sparse Observations
Accurately reconstructing complex full multi-object scenes from sparse observations remains a core challenge in computer vision and a key step toward scalable and reliable simulation for robotics. In this work, we introduce RecGen, a generative framework for probabilistic joint estimation of object and part shapes, as well as their poses, under occlusion and partial visibility from one or multiple RGB-D images. By leveraging compositional synthetic scene generation and strong 3D shape priors, RecGen generalizes across diverse object types and real-world environments. RecGen achieves state-of-the-art performance on complex, heavily occluded datasets, robustly handling severe occlusions, symmetric objects, object parts, and intricate geometry and texture. Despite using nearly 80% fewer training meshes than the previous state of the art, SAM3D, RecGen outperforms it by 30.1% in geometric shape quality, 9.1% in texture reconstruction, and 33.9% in pose estimation.
comment: Website: https://reconstruction-by-generation.github.io
Interaction Forces and Internal Loads in Parallel Manipulators with Actuation Redundancy
This paper discusses null-space wrench components in parallel manipulators. We examine the adaptation of the two most common characterizations of these components in grasp-like systems, namely, interaction forces and internal loads, to parallel manipulators with actuation redundancy. We identify critical oversights in the existing literature on the subject, resolve ambiguities related to the definitions of interaction forces and internal loads, and provide explicit methods for synthesizing equilibrating and manipulating joint torque vectors. A case study is also provided to justify the validity of our novel methods and correct erroneous results reported in the literature.
comment: 13 pages, 11 figures. Submitted to Mechanism and Machine Theory
FASTER: Rethinking Real-Time Flow VLAs FAST
Real-time execution is crucial for deploying Vision-Language-Action (VLA) models in the physical world. Existing asynchronous inference methods primarily optimize trajectory smoothness, but neglect the critical latency in reacting to environmental changes. By rethinking the notion of reaction in action chunking policies, this paper presents a systematic analysis of the factors governing reaction time. We show that reaction time follows a uniform distribution determined jointly by the Time to First Action (TTFA) and the execution horizon. Moreover, we reveal that the standard practice of applying a constant schedule in flow-based VLAs can be inefficient and forces the system to complete all sampling steps before any movement can start, forming the bottleneck in reaction latency. To overcome this issue, we propose Fast Action Sampling for ImmediaTE Reaction (FASTER). By introducing a Horizon-Aware Schedule, FASTER adaptively prioritizes near-term actions during flow sampling, compressing the denoising of the immediate reaction by tenfold (e.g., in $π_{0.5}$ and X-VLA) into a single step, while preserving the quality of long-horizon trajectory. Coupled with a streaming client-server pipeline, FASTER substantially reduces the effective reaction latency on real robots, especially when deployed on consumer-grade GPUs. Real-world experiments, including a highly dynamic table tennis task, prove that FASTER unlocks unprecedented real-time responsiveness for generalist policies, enabling rapid generation of accurate and smooth trajectories.
comment: Project page: https://innovator-zero.github.io/FASTER
DC-Ada: Reward-Only Decentralized Sensor Adaptation for Heterogeneous Multi-Robot Teams
Heterogeneity is a defining feature of deployed multi-robot teams: platforms often differ in sensing modalities, ranges, fields of view, and failure patterns. Controllers trained under nominal sensing can degrade sharply when deployed on robots with missing or mismatched sensors, even when the task and action interface are unchanged. We present DC-Ada, a reward-only decentralized adaptation method that keeps a pretrained shared policy frozen and instead adapts compact per-robot observation transforms to map heterogeneous sensing into a fixed inference interface. DC-Ada is gradient-free and communication-minimal: it uses budgeted accept/reject random search with short common-random-number rollouts under a strict step budget. We evaluate DC-Ada against four baselines in a deterministic 2D multi-robot simulator covering warehouse logistics, search and rescue, and collaborative mapping, across four heterogeneity regimes (H0--H3) and five seeds with a matched budget of $200{,}000$ joint environment steps per run. Results show that heterogeneity can substantially degrade a frozen shared policy and that no single mitigation dominates across all tasks and metrics. Observation normalization is strongest for reward robustness in warehouse logistics and competitive in search and rescue, while the frozen shared policy is strongest for reward in collaborative mapping. DC-Ada offers a useful complementary operating point: it improves completion most clearly in severe coverage-based mapping while requiring only scalar team returns and no policy fine-tuning or persistent communication. These results position DC-Ada as a practical deploy-time adaptation method for heterogeneous teams.
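As a rough illustration of the budgeted accept/reject random search with common-random-number rollouts described above, the following minimal sketch adapts a two-parameter observation transform in front of a frozen toy policy. The toy environment, the `(scale, bias)` transform, and all function names are assumptions made for illustration; this is not the authors' implementation.

```python
import random

def rollout_return(params, seed, horizon=20):
    """Toy scalar team return (hypothetical environment).

    The frozen policy performs best when it sees observations near 1.0,
    but the mismatched sensor reports values near 0.5.
    params = (scale, bias) is the per-robot observation transform.
    """
    rng = random.Random(seed)  # same seed -> same noise (common random numbers)
    scale, bias = params
    total = 0.0
    for _ in range(horizon):
        raw = 0.5 + rng.gauss(0.0, 0.05)   # mismatched sensor reading
        obs = scale * raw + bias           # adapted observation fed to the frozen policy
        total -= (obs - 1.0) ** 2          # reward peaks when obs == 1.0
    return total

def adapt(step_budget=4000, horizon=20, n_seeds=4, sigma=0.1, seed=0):
    """Gradient-free accept/reject search under a strict environment-step budget."""
    rng = random.Random(seed)
    params = (1.0, 0.0)                    # start from the identity transform
    steps_used = 0
    cost = 2 * n_seeds * horizon           # current + candidate rollouts per iteration
    while steps_used + cost <= step_budget:
        seeds = [rng.randrange(10**6) for _ in range(n_seeds)]
        cand = (params[0] + rng.gauss(0.0, sigma),
                params[1] + rng.gauss(0.0, sigma))
        # Both parameter sets are scored on the SAME seeds, so the comparison
        # is not dominated by rollout noise.
        cur_ret = sum(rollout_return(params, s, horizon) for s in seeds)
        cand_ret = sum(rollout_return(cand, s, horizon) for s in seeds)
        steps_used += cost
        if cand_ret > cur_ret:             # accept only on improvement
            params = cand
    return params, steps_used

if __name__ == "__main__":
    best, used = adapt()
    print("adapted transform:", best, "steps used:", used)
```

Only the scalar return is used to decide acceptance, which mirrors the reward-only setting; the policy itself is never updated.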
The Alignment Flywheel: A Governance-Centric Hybrid MAS for Architecture-Agnostic Safety AAMAS 2026
Multi-agent systems provide mature methodologies for role decomposition, coordination, and normative governance, capabilities that remain essential as increasingly powerful autonomous decision components are embedded within agent-based systems. While learned and generative models substantially expand system capability, their safety behavior is often entangled with training, making it opaque, difficult to audit, and costly to update after deployment. This paper formalizes the Alignment Flywheel as a governance-centric hybrid MAS architecture that decouples decision generation from safety governance. A Proposer, representing any autonomous decision component, generates candidate trajectories, while a Safety Oracle returns raw safety signals through a stable interface. An enforcement layer applies explicit risk policy at runtime, and a governance MAS supervises the Oracle through auditing, uncertainty-driven verification, and versioned refinement. The central engineering principle is patch locality: many newly observed safety failures can be mitigated by updating the governed oracle artifact and its release pipeline rather than retracting or retraining the underlying decision component. The architecture is implementation-agnostic with respect to both the Proposer and the Safety Oracle, and specifies the roles, artifacts, protocols, and release semantics needed for runtime gating, audit intake, signed patching, and staged rollout across distributed deployments. The result is a hybrid MAS engineering framework for integrating highly capable but fallible autonomous systems under explicit, version-controlled, and auditable oversight.
comment: Accepted for the EMAS workshop at AAMAS 2026
A Multimodal Depth-Aware Method For Embodied Reference Understanding ICASSP 2026
Embodied Reference Understanding requires identifying a target object in a visual scene based on both language instructions and pointing cues. While prior works have shown progress in open-vocabulary object detection, they often fail in ambiguous scenarios where multiple candidate objects exist in the scene. To address these challenges, we propose a novel ERU framework that jointly leverages LLM-based data augmentation, depth-map modality, and a depth-aware decision module. This design enables robust integration of linguistic and embodied cues, improving disambiguation in complex or cluttered environments. Experimental results on two datasets demonstrate that our approach significantly outperforms existing baselines, achieving more accurate and reliable referent detection.
comment: Accepted by ICASSP 2026
Bridging Discrete Planning and Continuous Execution for Redundant Robot
Voxel-grid reinforcement learning is widely adopted for path planning in redundant manipulators due to its simplicity and reproducibility. However, direct execution through point-wise numerical inverse kinematics on 7-DoF arms often yields step-size jitter, abrupt joint transitions, and instability near singular configurations. This work proposes a bridging framework between discrete planning and continuous execution without modifying the discrete planner itself. On the planning side, step-normalized 26-neighbor Cartesian actions and a geometric tie-breaking mechanism are introduced to suppress unnecessary turns and eliminate step-size oscillations. On the execution side, a task-priority damped least-squares (TP-DLS) inverse kinematics layer is implemented. This layer treats end-effector position as a primary task, while posture and joint centering are handled as subordinate tasks projected into the null space, combined with trust-region clipping and joint velocity constraints. On a 7-DoF manipulator in random sparse, medium, and dense environments, this bridge raises planning success in dense scenes from about 0.58 to 1.00, shortens representative path length from roughly 1.53 m to 1.10 m, and while keeping end-effector error below 1 mm, reduces peak joint accelerations by over an order of magnitude, substantially improving the continuous execution quality of voxel-based RL paths on redundant manipulators.
comment: 8 pages, 3 figures. Submitted to IFAC World Congress 2026
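The execution-side idea in the abstract above (a damped least-squares primary task with a secondary task projected into the null space, plus step clipping) can be sketched on a toy planar 3-joint arm. The arm, the gains, the norm-based clipping used as a stand-in for the trust region, and all names below are assumptions for illustration; the paper's 7-DoF TP-DLS layer is not reproduced here.

```python
import math

L = (1.0, 1.0, 1.0)  # link lengths of a toy planar 3-joint arm (assumption)

def fk(q):
    """End-effector position of the planar arm."""
    x = y = a = 0.0
    for qi, li in zip(q, L):
        a += qi
        x += li * math.cos(a)
        y += li * math.sin(a)
    return x, y

def jacobian(q):
    """2x3 position Jacobian; column j sums the contributions of links j..end."""
    angles, a = [], 0.0
    for qi in q:
        a += qi
        angles.append(a)
    J = [[0.0] * 3 for _ in range(2)]
    for j in range(3):
        J[0][j] = -sum(L[k] * math.sin(angles[k]) for k in range(j, 3))
        J[1][j] = sum(L[k] * math.cos(angles[k]) for k in range(j, 3))
    return J

def dls_pinv(J, lam):
    """Damped pseudo-inverse J^T (J J^T + lam^2 I)^-1, returned as a 3x2 matrix."""
    a = sum(J[0][k] ** 2 for k in range(3)) + lam * lam
    b = sum(J[0][k] * J[1][k] for k in range(3))
    d = sum(J[1][k] ** 2 for k in range(3)) + lam * lam
    det = a * d - b * b                    # positive whenever lam > 0
    inv = [[d / det, -b / det], [-b / det, a / det]]
    return [[J[0][i] * inv[0][c] + J[1][i] * inv[1][c] for c in range(2)]
            for i in range(3)]

def tp_dls_step(q, target, lam=0.02, k_center=0.1, max_step=0.2):
    """One task-priority step: position first, joint centering in the null space."""
    x, y = fk(q)
    e = (target[0] - x, target[1] - y)
    J = jacobian(q)
    Jp = dls_pinv(J, lam)
    dq1 = [Jp[i][0] * e[0] + Jp[i][1] * e[1] for i in range(3)]   # primary task
    z = [-k_center * qi for qi in q]                              # pull joints toward 0
    Jz = [sum(J[r][k] * z[k] for k in range(3)) for r in range(2)]
    dq2 = [z[i] - (Jp[i][0] * Jz[0] + Jp[i][1] * Jz[1]) for i in range(3)]  # (I - Jp J) z
    dq = [u + v for u, v in zip(dq1, dq2)]
    norm = math.sqrt(sum(v * v for v in dq))
    if norm > max_step:                    # clip the joint step to a fixed radius
        dq = [v * max_step / norm for v in dq]
    return [qi + v for qi, v in zip(q, dq)]

def solve(q, target, iters=200):
    for _ in range(iters):
        q = tp_dls_step(q, target)
    return q

if __name__ == "__main__":
    q = solve([0.3, 0.3, 0.3], (1.5, 1.5))
    print("joints:", q, "end effector:", fk(q))
```

Because the pseudo-inverse is damped, `I - Jp J` is only an approximate null-space projector, so the secondary task leaks a small bias into the position task; lowering `lam` shrinks that bias at the cost of larger steps near singular configurations.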
Learning Vision-Based Omnidirectional Navigation: A Teacher-Student Approach Using Monocular Depth Estimation
Reliable obstacle avoidance in industrial settings demands 3D scene understanding, but widely used 2D LiDAR sensors perceive only a single horizontal slice of the environment, missing critical obstacles above or below the scan plane. We present a teacher-student framework for vision-based mobile robot navigation that eliminates the need for LiDAR sensors. A teacher policy trained via Proximal Policy Optimization (PPO) in NVIDIA Isaac Lab leverages privileged 2D LiDAR observations that account for the full robot footprint to learn robust navigation. The learned behavior is distilled into a student policy that relies solely on monocular depth maps predicted by a fine-tuned Depth Anything V2 model from four RGB cameras. The complete inference pipeline, comprising monocular depth estimation (MDE), policy execution, and motor control, runs entirely onboard an NVIDIA Jetson Orin AGX mounted on a DJI RoboMaster platform, requiring no external computation for inference. In simulation, the student achieves success rates of 82-96.5%, consistently outperforming the standard 2D LiDAR teacher (50-89%). In real-world experiments, the MDE-based student outperforms the 2D LiDAR teacher when navigating around obstacles with complex 3D geometries, such as overhanging structures and low-profile objects, that fall outside the single scan plane of a 2D LiDAR.
Geometric Inverse Flight Dynamics on SO(3) and Application to Tethered Fixed-Wing Aircraft
We present a robotics-oriented, coordinate-free formulation of inverse flight dynamics for fixed-wing aircraft on SO(3). Translational force balance is written in the world frame and rotational dynamics in the body frame; aerodynamic directions (drag, lift, side) are defined geometrically, avoiding local attitude coordinates. Enforcing coordinated flight (no sideslip), we derive a closed-form trajectory-to-input map yielding the attitude, angular velocity, and thrust-angle-of-attack pair, and we recover the aerodynamic moment coefficients component-wise. Applying such a map to tethered flight on spherical parallels, we obtain analytic expressions for the required bank angle and identify a specific zero-bank locus where the tether tension exactly balances centrifugal effects, highlighting the decoupling between aerodynamic coordination and the apparent gravity vector. Under a simple lift/drag law, the minimal-thrust angle of attack admits a closed form. These pointwise quasi-steady inversion solutions become steady-flight trim when the trajectory and rotational dynamics are time-invariant. The framework bridges inverse simulation in aeronautics with geometric modeling in robotics, providing a rigorous building block for trajectory design and feasibility checks.
comment: ACCEPTED ICUAS 2026
ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers
Tactile sensing provides local essential information that is complementary to visual perception, such as texture, compliance, and force. Despite recent advances in visuotactile representation learning, challenges remain in fusing these modalities and generalizing across tasks and environments without heavy reliance on pre-trained vision-language models. Moreover, existing methods do not study positional encodings, thereby overlooking the multi-stage spatial reasoning needed to capture fine-grained visuotactile correlations. We introduce ViTaPEs, a transformer-based architecture for learning task-agnostic visuotactile representations from paired vision and tactile inputs. Our key idea is a two-stage positional injection: local (modality-specific) positional encodings are added within each stream, and a global positional encoding is added on the joint token sequence immediately before attention, providing a shared positional vocabulary at the stage where cross-modal interaction occurs. We make the positional injection points explicit and conduct controlled ablations that isolate their effect before a token-wise nonlinearity versus immediately before self-attention. Experiments on multiple large-scale real-world datasets show that ViTaPEs not only surpasses state-of-the-art baselines across various recognition tasks but also demonstrates zero-shot generalization to unseen, out-of-domain scenarios. We further demonstrate the transfer-learning strength of ViTaPEs in a robotic grasping task, where it outperforms state-of-the-art baselines in predicting grasp success. Project page: https://sites.google.com/view/vitapes
Neural-Geometric Tunnel Traversal: Localization-free UAV Flight with Tilted LiDARs
Navigation of UAVs in challenging environments like tunnels or mines, where it is not possible to use GNSS methods to self-localize, illumination may be uneven or nonexistent, and wall features are likely to be scarce, is a complex task, especially if the navigation has to be done at high speed. In this paper we propose a novel proof-of-concept navigation technique for UAVs based on the use of LiDAR information through the joint use of geometric and machine-learning algorithms. The perceived information is processed by a deep neural network to establish the yaw of the UAV with respect to the tunnel's longitudinal axis, in order to adjust the direction of navigation. Additionally, a geometric method is used to compute the safest location inside the tunnel (i.e. the one that maximizes the distance to the closest obstacle). This information proves to be sufficient for simple yet effective navigation in straight and curved tunnels.
InCoM: Intent-Driven Perception and Structured Coordination for Mobile Manipulation
Mobile manipulation is a fundamental capability for general-purpose robotic agents, requiring both coordinated control of the mobile base and manipulator and robust perception under dynamically changing viewpoints. However, existing approaches face two key challenges: strong coupling between base and arm actions complicates control optimization, and perceptual attention is often poorly allocated as viewpoints shift during mobile manipulation. We propose InCoM, an intent-driven perception and structured coordination framework for mobile manipulation. InCoM infers latent motion intent to dynamically reweight multi-scale perceptual features, enabling stage-adaptive allocation of perceptual attention. To support robust cross-modal perception, InCoM further incorporates a geometric-semantic structured alignment mechanism that enhances multimodal correspondence. On the control side, we design a decoupled coordinated flow matching action decoder that explicitly models coordinated base-arm action generation, alleviating optimization difficulties caused by control coupling. Experimental results demonstrate that InCoM significantly outperforms state-of-the-art methods, achieving success rate gains of 28.2%, 26.1%, and 23.6% across three ManiSkill-HAB scenarios without privileged information. Furthermore, its effectiveness is consistently validated in real-world mobile manipulation tasks, where InCoM maintains a superior success rate over existing baselines.
EvolvingAgent: Curriculum Self-evolving Agent with Continual World Model for Long-Horizon Tasks
Completing Long-Horizon (LH) tasks in open-ended worlds is an important yet difficult problem for embodied agents. Existing approaches suffer from two key challenges: (1) they heavily rely on experiences obtained from human-created data or curricula, failing to autonomously update and select multimodal experiences, and (2) they may encounter catastrophic forgetting issues when faced with new tasks, failing to autonomously update world knowledge. To solve these challenges, this paper presents EvolvingAgent, a curriculum self-evolving agent with a continual World Model (WM), which can autonomously complete various LH tasks across environments through self-planning, self-control, and self-reflection, without human intervention. Specifically, EvolvingAgent contains three modules: i) the experience-driven task planner, which uses an LLM along with multimodal experiences to convert LH tasks into executable sub-tasks; ii) the WM-guided action controller, which leverages the WM to generate low-level actions and incorporates a self-verification mechanism to update multimodal experiences; iii) the Curriculum Learning (CL)-based reflector, which implements a two-stage CL algorithm to select multimodal experiences for task-adaptive WM updates. By building a planner-controller-reflector closed-loop dynamic, the continual WM for EvolvingAgent can autonomously update multimodal experiences and world knowledge. We conducted extensive experiments on Minecraft. Compared with existing methods, EvolvingAgent improves the average success rate by 111.74%, reduces ineffective actions by more than 6x, and generalizes to the Atari environment with human-level performance.
A Virtual Mechanical Interaction Layer Enables Resilient Human-to-Robot Object Handovers
Object handover is a common form of interaction that is widely present in collaborative tasks. However, achieving it efficiently remains a challenge. We address the problem of ensuring resilient robotic actions that can adapt to complex changes in object pose during human-to-robot object handovers. We propose the use of Virtual Model Control to create an interaction layer that controls the robot and adapts to the dynamic changes in the handover process. Additionally, we propose the use of augmented reality to facilitate bidirectional communication between humans and robots during handovers. We assess the performance of our controller in a set of experiments that demonstrate its resilience to various sources of uncertainties, including complex changes to the object's pose during the handover. Finally, we performed a user study with 16 participants to understand human preferences for different robot control profiles and augmented reality visuals in object handovers. Our results showed a general preference for the proposed approach and revealed insights that can guide further development in adapting the interaction with the user.
comment: Accepted for publication in IEEE Robotics and Automation Letters (RA-L)
RetroMotion: Retrocausal Motion Forecasting Models are Instructable CVPR
Motion forecasts of road users (i.e., agents) vary in complexity depending on the number of agents, scene constraints, and interactions. In particular, the output space of joint trajectory distributions grows exponentially with the number of agents. Therefore, we decompose multi-agent motion forecasts into (1) marginal distributions for all modeled agents and (2) joint distributions for interacting agents. Using a transformer model, we generate joint distributions by re-encoding marginal distributions followed by pairwise modeling. This incorporates a retrocausal flow of information from later points in marginal trajectories to earlier points in joint trajectories. For each time step, we model the positional uncertainty using compressed exponential power distributions. Notably, our method achieves strong results in the Waymo Interaction Prediction Challenge and generalizes well to the Argoverse 2 and V2X-Seq datasets. Additionally, our method provides an interface for issuing instructions. We show that standard motion forecasting training implicitly enables the model to follow instructions and adapt them to the scene context. GitHub repository: https://github.com/kit-mrt/future-motion
comment: CVPRW26
Explainable Representation of Finite-Memory Policies for POMDPs using Decision Trees AAMAS 2026
Partially Observable Markov Decision Processes (POMDPs) are a fundamental framework for decision-making under uncertainty and partial observability. Since in general optimal policies may require infinite memory, they are hard to implement and often render most problems undecidable. Consequently, finite-memory policies are mostly considered instead. However, the algorithms for computing them are typically very complex, and so are the resulting policies. Facing the need for their explainability, we provide a representation of such policies, both (i) in an interpretable formalism and (ii) typically of smaller size, together yielding higher explainability. To that end, we combine models of Mealy machines and decision trees; the latter describing simple, stationary parts of the policies and the former describing how to switch among them. We design a translation for policies of the finite-state-controller (FSC) form from standard literature and show how our method smoothly generalizes to other variants of finite-memory policies. Further, we identify specific properties of recently used "attractor-based" policies, which allow us to construct yet simpler and smaller representations. Finally, we illustrate the higher explainability in a few case studies.
comment: Full version of the extended abstract accepted at AAMAS 2026
R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation
Toward the aim of generalized robotic manipulation, spatial generalization is the most fundamental capability: it requires the policy to work robustly under different spatial distributions of objects, the environment, and the agent itself. To achieve this, substantial human demonstrations need to be collected to cover different spatial configurations for training a generalized visuomotor policy via imitation learning. Prior works explore a promising direction that leverages data generation to acquire abundant spatially diverse data from minimal source demonstrations. However, most approaches face a significant sim-to-real gap and are often limited to constrained settings, such as fixed-base scenarios and predefined camera viewpoints. In this paper, we propose a real-to-real 3D data generation framework (R2RGen) that directly augments the pointcloud observation-action pairs to generate real-world data. R2RGen is simulator- and rendering-free, and is thus efficient and plug-and-play. Specifically, we propose a unified three-stage framework, which (1) pre-processes source demonstrations under different camera setups in a shared 3D space with scene / trajectory parsing; (2) augments the positions of the objects and the robot with a group-wise backtracking strategy; (3) aligns the distribution of generated data with that of a real-world 3D sensor using camera-aware post-processing. Empirically, R2RGen substantially enhances data efficiency in extensive experiments and demonstrates strong potential for scaling and for application to mobile manipulation.
comment: Accepted to RSS 2026. Project page: https://r2rgen.github.io/
M2R2: MultiModal Robotic Representation for Temporal Action Segmentation
Temporal action segmentation (TAS) has long been a key area of research in both robotics and computer vision. In robotics, algorithms have primarily focused on leveraging proprioceptive information to determine skill boundaries, with recent approaches in surgical robotics incorporating vision. In contrast, computer vision typically relies on exteroceptive sensors, such as cameras. Existing multimodal TAS models in robotics integrate feature fusion within the model, making it difficult to reuse learned features across different models. Meanwhile, pretrained vision-only feature extractors commonly used in computer vision struggle in scenarios with limited object visibility. In this work, we address these challenges by proposing M2R2, a multimodal feature extractor tailored for TAS, which combines information from both proprioceptive and exteroceptive sensors. We introduce a novel training strategy that enables the reuse of learned features across multiple TAS models. Our method sets a new state of the art on three robotic datasets: REASSEMBLE, (Im)PerfectPour, and JIGSAWS. Additionally, we conduct an extensive ablation study to evaluate the contribution of different modalities in robotic TAS tasks.
comment: 8 pages, 6 figures, 2 tables
VLN-Cache: Enabling Token Caching for VLN Models with Visual/Semantic Dynamics Awareness
Vision-and-Language Navigation (VLN) increasingly relies on large vision-language models, but their inference cost conflicts with real-time deployment. Token caching is a promising training-free strategy that avoids redundant computation by reusing stable visual tokens across frames. However, existing methods assume a static camera and fixed semantic focus, assumptions that VLN fundamentally violates. We identify two failure modes: (1) visual dynamics, where viewpoint shift displaces token positions across frames, causing position-wise matching to pair misaligned content; (2) semantic dynamics, where token relevance shifts across task stages as navigation progresses, making cached states stale. We propose VLN-Cache, a visual-dynamic-aware and semantic-dynamic-aware caching framework that introduces view-aligned remapping to recover geometric correspondences and a task-relevance saliency filter to veto reuse at semantic transitions. A layer-adaptive entropy policy further balances the per-layer reuse budget. Experiments on the R2R-CE simulation benchmark show up to 1.52x speedup while maintaining competitive navigation success rates.
Source-Free Bistable Fluidic Gripper for Size-Selective and Stiffness-Adaptive Grasping
Conventional fluid-driven soft grippers typically depend on external sources, which limit portability and long-term autonomy. This work introduces a self-contained soft gripper with fixed size that operates solely through internal liquid redistribution among three interconnected bistable snap-through chambers. When the top sensing chamber deforms upon contact, the displaced liquid triggers snap-through expansion of the grasping chambers, enabling stable and size-selective grasping without continuous energy input. The internal hydraulic feedback further allows passive adaptation of gripping pressure to object stiffness. This source-free and compact design opens new possibilities for lightweight, stiffness-adaptive fluid-driven manipulation in soft robotics, providing a feasible approach for targeted size-specific sampling and operation in underwater and field environments.
CoFL: Continuous Flow Fields for Language-Conditioned Navigation
Existing language-conditioned navigation systems typically rely on modular pipelines or trajectory generators, but the latter use each scene--instruction annotation mainly to supervise one start-conditioned rollout. To address these limitations, we present CoFL, an end-to-end policy that maps a bird's-eye view (BEV) observation and a language instruction to a continuous flow field for navigation. CoFL reformulates navigation as workspace-conditioned field learning rather than start-conditioned trajectory prediction: it learns local motion vectors at arbitrary BEV locations, turning each scene--instruction annotation into dense spatial control supervision. Trajectories are generated from any start by numerical integration of the predicted field, enabling simple real-time rollout and closed-loop recovery. To enable large-scale training and evaluation, we build a dataset of over 500k BEV image--instruction pairs, each procedurally annotated with a flow field and a trajectory derived from semantic maps built on Matterport3D and ScanNet. Evaluating on strictly unseen scenes, CoFL significantly outperforms modular Vision-Language Model (VLM)-based planners and trajectory generation policies in both navigation precision and safety, while maintaining real-time inference. Finally, we deploy CoFL zero-shot in real-world experiments with BEV observations across multiple layouts, maintaining feasible closed-loop control and a high success rate.
comment: 18 pages, 13 figures
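The rollout step mentioned in the abstract above, generating a trajectory from any start by numerically integrating a predicted field, can be sketched with forward Euler integration. The hand-written `toy_field` below merely stands in for a learned, language-conditioned field; it and every other name and parameter here are hypothetical.

```python
import math

def toy_field(x, y, goal=(4.0, 3.0)):
    """Stand-in for a predicted flow field: points at the goal, slows down near it."""
    dx, dy = goal[0] - x, goal[1] - y
    dist = math.hypot(dx, dy)
    if dist < 1e-9:
        return 0.0, 0.0
    speed = min(1.0, dist)
    return speed * dx / dist, speed * dy / dist

def rollout(field, start, dt=0.1, steps=200, tol=1e-2):
    """Forward-Euler integration of the field from an arbitrary start point."""
    x, y = start
    path = [(x, y)]
    for _ in range(steps):
        vx, vy = field(x, y)
        if math.hypot(vx, vy) < tol:       # field has (nearly) vanished: stop
            break
        x, y = x + dt * vx, y + dt * vy
        path.append((x, y))
    return path

if __name__ == "__main__":
    for start in [(0.0, 0.0), (8.0, -1.0)]:
        path = rollout(toy_field, start)
        print(start, "->", path[-1], "in", len(path) - 1, "steps")
```

Since the field is defined at every location, re-querying it after a disturbance is the same operation as the initial rollout, which is one way to read the closed-loop recovery property the abstract describes.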
SD2AIL: Adversarial Imitation Learning from Synthetic Demonstrations via Diffusion Models
Adversarial Imitation Learning (AIL) is a dominant framework in imitation learning that infers rewards from expert demonstrations to guide policy optimization. Although providing more expert demonstrations typically leads to improved performance and greater stability, collecting such demonstrations can be challenging in certain scenarios. Inspired by the success of diffusion models in data generation, we propose SD2AIL, which utilizes synthetic demonstrations via diffusion models. We first employ a diffusion model in the discriminator to generate synthetic demonstrations as pseudo-expert data that augment the expert demonstrations. To selectively replay the most valuable demonstrations from the large pool of (pseudo-) expert demonstrations, we further introduce a prioritized expert demonstration replay strategy (PEDR). The experimental results on simulation tasks demonstrate the effectiveness and robustness of our method. In particular, in the Hopper task, our method achieves an average return of 3441, surpassing the state-of-the-art method by 89. Our code will be available at https://github.com/positron-lpc/SD2AIL.
comment: This paper has the following problems: Limited novelty, not clearly differentiated from existing methods/concepts; The level of experimental validation is limited; Sufficient serious structural, language, or other issues that impact the comprehensibility of the manuscript
OnSiteVRU: A High-Resolution Trajectory Dataset for High-Density Vulnerable Road Users
With the acceleration of urbanization and the growth of transportation demands, the safety of vulnerable road users (VRUs, such as pedestrians and cyclists) in mixed traffic flows has become increasingly prominent, necessitating high-precision and diverse trajectory data to support the development and optimization of autonomous driving systems. However, existing datasets fall short in capturing the diversity and dynamics of VRU behaviors, making it difficult to meet the research demands of complex traffic environments. To address this gap, this study developed the OnSiteVRU datasets, which cover a variety of scenarios, including intersections, road segments, and urban villages. These datasets provide trajectory data for motor vehicles, electric bicycles, and human-powered bicycles, totaling approximately 17,429 trajectories with a precision of 0.04 seconds. The datasets integrate both aerial-view natural driving data and onboard real-time dynamic detection data, along with environmental information such as traffic signals, obstacles, and real-time maps, enabling a comprehensive reconstruction of interaction events. The results demonstrate that VRU_Data outperforms traditional datasets in terms of VRU density and scene coverage, offering a more comprehensive representation of VRU behavioral characteristics. This provides critical support for traffic flow modeling, trajectory prediction, and autonomous driving virtual testing. The dataset is publicly available for download at: https://www.kaggle.com/datasets/zcyan2/mixed-traffic-trajectory-dataset-in-from-shanghai.
Open-H-Embodiment: A Large-Scale Dataset for Enabling Foundation Models in Medical Robotics
Autonomous medical robots hold promise to improve patient outcomes, reduce provider workload, democratize access to care, and enable superhuman precision. However, autonomous medical robotics has been limited by a fundamental data problem: existing medical robotic datasets are small, single-embodiment, and rarely shared openly, restricting the development of foundation models that the field needs to advance. We introduce Open-H-Embodiment, the largest open dataset of medical robotic video with synchronized kinematics to date, spanning more than 49 institutions and multiple robotic platforms including the CMR Versius, Intuitive Surgical's da Vinci, da Vinci Research Kit (dVRK), Rob Surgical BiTrack, Virtual Incision's MIRA, Moon Surgical Maestro, and a variety of custom systems, spanning surgical manipulation, robotic ultrasound, and endoscopy procedures. We demonstrate the research enabled by this dataset through two foundation models. GR00T-H is the first open foundation vision-language-action model for medical robotics, which is the only evaluated model to achieve full end-to-end task completion on a structured suturing benchmark (25% of trials vs. 0% for all others) and achieves 64% average success across a 29-step ex vivo suturing sequence. We also train Cosmos-H-Surgical-Simulator, the first action-conditioned world model to enable multi-embodiment surgical simulation from a single checkpoint, spanning nine robotic platforms and supporting in silico policy evaluation and synthetic data generation for the medical domain. These results suggest that open, large-scale medical robot data collection can serve as critical infrastructure for the research community, enabling advances in robot learning, world modeling, and beyond.
comment: Project website: https://open-h.github.io/open-h-embodiment/
Distributional Stability of Tangent-Linearized Gaussian Inference on Smooth Manifolds
Gaussian inference on smooth manifolds is central to robotics, but exact marginalization and conditioning are generally non-Gaussian and geometry-dependent. We study tangent-linearized Gaussian inference and derive explicit non-asymptotic $W_2$ stability bounds for projection marginalization and surface-measure conditioning. The bounds separate local second-order geometric distortion from nonlocal tail leakage and, for Gaussian inputs, yield closed-form diagnostics from $(μ,Σ)$ and curvature/reach surrogates. Circle and planar-pushing experiments validate the predicted calibration transition near $\sqrt{\|Σ\|_{\mathrm{op}}}/R\approx 1/6$ and indicate that normal-direction uncertainty is the dominant failure mode when locality breaks. These diagnostics provide practical triggers for switching from single-chart linearization to multi-chart or sample-based manifold inference. Code and Jupyter notebooks are available at https://github.com/mikigom/StabilityTLGaussian.
comment: To appear in IEEE Robotics and Automation Letters (IEEE RA-L)
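The locality ratio $\sqrt{\|Σ\|_{\mathrm{op}}}/R$ quoted in the abstract above can be computed directly from a covariance matrix. The following is a minimal sketch for the 2x2 case, not the authors' code: the function names are hypothetical, `reach` stands in for $R$ (for example, the circle radius), and treating values above 1/6 as a trigger to leave single-chart linearization is an assumption based on the reported transition point.

```python
import math

def op_norm_sym2(a: float, b: float, d: float) -> float:
    """Largest eigenvalue of the symmetric 2x2 matrix [[a, b], [b, d]].

    For a covariance (positive semidefinite) matrix this equals the
    operator norm.
    """
    return (a + d) / 2.0 + math.hypot((a - d) / 2.0, b)

def locality_ratio(a: float, b: float, d: float, reach: float) -> float:
    """sqrt(||Sigma||_op) / R for Sigma = [[a, b], [b, d]]."""
    return math.sqrt(op_norm_sym2(a, b, d)) / reach

def exceeds_transition(ratio: float, threshold: float = 1.0 / 6.0) -> bool:
    """Assumed rule: flag the estimate once the ratio passes ~1/6."""
    return ratio > threshold

# Isotropic covariance with standard deviation 0.1 on a unit circle.
small = locality_ratio(0.01, 0.0, 0.01, reach=1.0)
# Standard deviation 0.2 along one axis on the same circle.
large = locality_ratio(0.04, 0.0, 0.01, reach=1.0)
print(small, exceeds_transition(small))
print(large, exceeds_transition(large))
```

The closed-form eigenvalue keeps the sketch dependency-free; for higher dimensions a numerical eigenvalue routine would be needed instead.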
Language-Conditioned Safe Trajectory Generation for Spacecraft Rendezvous
Reliable real-time trajectory generation is essential for future autonomous spacecraft. While recent progress in nonconvex guidance and control is paving the way for onboard autonomous trajectory optimization, these methods still rely on extensive expert input (e.g., waypoints, constraints, mission timelines, etc.), which limits operational scalability in complex missions such as rendezvous and proximity operations. This paper introduces SAGES (Semantic Autonomous Guidance Engine for Space), a trajectory-generation framework that translates natural-language commands into spacecraft trajectories that reflect high-level intent while respecting nonconvex constraints. Experiments in two settings (fault-tolerant proximity operations with continuous-time constraint enforcement and a free-flying robotic platform) demonstrate that SAGES reliably produces trajectories aligned with human commands, achieving over 90% semantic-behavioral consistency across diverse behavior modes. Ultimately, this work marks an initial step toward language-conditioned, constraint-aware spacecraft trajectory generation, enabling operators to interactively guide both safety and behavior through intuitive natural-language commands with reduced expert burden. Project Website: https://semantic-guidance4space.github.io/
comment: 42 pages, 12 figures. Submitted to AIAA Journal of Guidance, Control, and Dynamics
CLAMP: Contrastive Learning for 3D Multi-View Action-Conditioned Robotic Manipulation Pretraining
Leveraging pre-trained 2D image representations in behavior cloning policies has achieved great success and has become a standard approach for robotic manipulation. However, such representations fail to capture the 3D spatial information about objects and scenes that is essential for precise manipulation. In this work, we introduce Contrastive Learning for 3D Multi-View Action-Conditioned Robotic Manipulation Pretraining (CLAMP), a novel 3D pre-training framework that utilizes point clouds and robot actions. From the merged point cloud computed from RGB-D images and camera extrinsics, we re-render multi-view four-channel image observations with depth and 3D coordinates, including dynamic wrist views, to provide clearer views of target objects for high-precision manipulation tasks. The pre-trained encoders learn to associate the 3D geometric and positional information of objects with robot action patterns via contrastive learning on large-scale simulated robot trajectories. During encoder pre-training, we pre-train a Diffusion Policy to initialize the policy weights for fine-tuning, which is essential for improving fine-tuning sample efficiency and performance. After pre-training, we fine-tune the policy on a limited amount of task demonstrations using the learned image and action representations. We demonstrate that this pre-training and fine-tuning design substantially improves learning efficiency and policy performance on unseen tasks. Furthermore, we show that CLAMP outperforms state-of-the-art baselines across six simulated tasks and five real-world tasks. The project website and videos can be found at https://clamp3d.github.io/CLAMP/.
comment: Accepted to the Robotics: Science and Systems (RSS) 2026
Electrostatic Clutch-Based Mechanical Multiplexer with Increased Force Capability
Robotic systems with many degrees of freedom (DoF) are constrained by the demands of dedicating a motor to each joint, and while mechanical multiplexing reduces actuator count, existing clutch designs are bulky, force-limited, or restricted to one output at a time. The problem addressed in this study is how to achieve high-force multiplexing that supports both simultaneous and sequential control from a single motor. Here we show an electrostatic capstan clutch-based transmission that enables both single-input-single-output (SISO) and single-input-multiple-output (SIMO) multiplexing. We demonstrated these on a four-DoF tendon-driven robotic hand where a single motor achieved output forces of up to 212 N, increased vertical grip strength by 4.09 times, and raised horizontal carrying capacity to 111.2 N, the highest currently among five-fingered tendon-driven robotic hands. These results demonstrate that electrostatic-based multiplexing provides versatile actuation, overcoming the limitations of prior systems.
A Compact Peristaltic Pump Based on Magneto-Elastic Hysteresis with Single Pneumatic Control
Pumping fluids is fundamental to a wide range of industrial, environmental, and biomedical applications. Among various pumping mechanisms, peristaltic pumps enable efficient and safe fluid transport by deforming an elastic tube without direct contact with the working fluid. Although previous studies have introduced mechanical, pneumatic, or magnetic actuations to drive membrane deformation, these approaches often lead to complex pump architectures and control schemes. In this study, we present a soft membrane pump that achieves peristaltic motion through a single pneumatic input combined with an embedded passive magnet. The actuation mechanism and system dynamics were analyzed and simplified through modeling. Numerical simulations were conducted to predict the internal fluid flow, and the magneto-elastic hysteresis behavior observed in the simulations was successfully validated by experiments with a proof-of-concept prototype.
comment: Submitted to IEEE CBS 2026. This work has been submitted to the IEEE for possible publication
Radar Odometry Subject to High Tilt Dynamics of Subarctic Environments
Rotating FMCW radar odometry methods often assume flat ground conditions. While this assumption is sufficient in many scenarios, including urban environments or flat mining setups, the highly dynamic terrain of subarctic environments poses a challenge to standard feature extraction and state estimation techniques. This paper benchmarks three existing radar odometry methods under demanding conditions, exhibiting up to 13° in pitch and 4° in roll difference between consecutive scans, with absolute pitch and roll reaching 30° and 8°, respectively. Furthermore, we propose a novel radar-inertial odometry method utilizing tilt-proximity submap search and a hard threshold for vertical displacement between scan points and the estimated axis of rotation. Experimental results demonstrate a state-of-the-art performance of our method on an urban baseline and a 0.3% improvement over the second-best comparative method on a 2-kilometer-long dynamic trajectory. Finally, we analyze the performance of the four evaluated methods on a complex radar sequence characterized by high lateral slip and a steep ditch traversal.
Multiagent Systems
Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations
Operating and maintaining (O&M) large-scale online engine systems (search, recommendation, advertising) demands substantial human effort for release monitoring, alert response, and root cause analysis. While LLM-based agents are a natural fit for these tasks, the deployment bottleneck is not reasoning capability but orchestration: selecting, for each operational event, the relevant data (metrics, logs, change events) and the applicable operational knowledge (handbook rules and practitioner experience). Feeding all signals indiscriminately causes dilution and hallucination, while manually curating the event-to-(data, knowledge) mapping is intractable under dozens of daily releases. We present Bian Que, an agentic framework with three contributions: (i) a \emph{unified operational paradigm} abstracting day-to-day O&M into three canonical patterns: release interception, proactive inspection, and alert root cause analysis; (ii) \emph{Flexible Skill Arrangement}, where each Skill specifies which data and knowledge to retrieve for a given business-module context and can be automatically generated and updated by LLMs or iteratively refined through natural-language instructions from on-call engineers; (iii) a \emph{unified self-evolving mechanism} in which one correction signal drives two parallel pathways, case-memory-to-knowledge distillation and targeted Skill refinement. Deployed on the e-commerce search engine of KuaiShou, the major short-video platform in China, Bian Que reduces alert volume by 75%, achieves 80% root-cause analysis accuracy, and cuts mean time to resolution by over 50%. Our framework achieves 99.0% pass rate on offline evaluations. Our code is available at https://github.com/benchen4395/BianQue_Assistant.
comment: Code is available at https://github.com/benchen4395/BianQue_Assistant
Preserving Disagreement: Architectural Heterogeneity and Coherence Validation in Multi-Agent Policy Simulation
Multi-agent deliberation systems using large language models (LLMs) are increasingly proposed for policy simulation, yet they suffer from artificial consensus: evaluator agents converge on the same option regardless of their assigned value perspectives. We present the AI Council, a three-phase deliberation framework, and conduct 120 deliberations across two policy scenarios to test two interventions. First, architectural heterogeneity (assigning a different 7-9B parameter model to each value perspective) significantly reduces first-choice concentration compared to a homogeneous baseline (child welfare: 70.9% to 46.1%, p < 0.001, r = 0.58; housing: 46.0% to 22.9%, p < 0.001, r = 0.50). This contrasts with accuracy-oriented multi-agent debate, where heterogeneity does not reduce convergence, suggesting model diversity operates differently when no objectively correct answer exists. Second, coherence validation (using a frontier model to assess whether each evaluator's reasoning is grounded in its assigned values) reveals a fidelity-diversity tradeoff: on a scenario with a dominant option, it further reduces concentration (46.1% to 40.8%, p = 0.004), but on a scenario with genuinely competitive options, it increases concentration (22.9% to 26.6%, p = 0.96) by amplifying high-coherence evaluators who cluster on one option. This tradeoff may be a general property of multi-agent systems employing quality weighting. We report negative results from three failed Delphi designs, demonstrate that 8B models exhibit binary rather than graded responses to counter-arguments, and propose the trustworthy tension rate as a diagnostic measure of small-model deliberation capabilities.
comment: 14 pages, 7 tables, 120 deliberations across 2 policy scenarios
AGEL-Comp: A Neuro-Symbolic Framework for Compositional Generalization in Interactive Agents
Large Language Model (LLM)-based agents exhibit systemic failures in compositional generalization, limiting their robustness in interactive environments. This work introduces AGEL-Comp, a neuro-symbolic AI agent architecture designed to address this challenge by grounding the agent's actions. AGEL-Comp integrates three core innovations: (1) a dynamic Causal Program Graph (CPG) as a world model, representing procedural and causal knowledge as a directed hypergraph; (2) an Inductive Logic Programming (ILP) engine that synthesizes new Horn clauses from experiential feedback, grounding symbolic knowledge through interaction; and (3) a hybrid reasoning core where an LLM proposes a set of candidate sub-goals that are verified for logical consistency by a Neural Theorem Prover (NTP). Together, these components operationalize a deduction--abduction learning cycle, enabling the agent to deduce plans and abductively expand its symbolic world model, while a neural adaptation phase keeps its reasoning engine aligned with new knowledge. We propose an evaluation protocol within the \texttt{Retro Quest} simulation environment that probes compositional generalization in our AGEL agent. Our findings indicate that the AGEL model outperforms pure LLM-based models. Our framework presents a principled path toward agents that build an explicit, interpretable, and compositionally structured understanding of their world.
comment: Accepted at IntelliSys 2026
Split over $n$ resource sharing problem: Are fewer capable agents better than many simpler ones?
In multi-agent systems, should limited resources be concentrated into a few capable agents or distributed among many simpler ones? This work formulates the split over $n$ resource sharing problem where a group of $n$ agents equally shares a common resource (e.g., monetary budget, computational resources, physical size). We present a case study in multi-agent coverage where the area of the disk-shaped footprint of agents scales as $1/n$. A formal analysis reveals that the initial coverage rate grows with $n$. However, if the speed of agents decreases proportionally with their radii, groups of all sizes perform equally well, whereas if it decreases proportionally with their footprints, a single agent performs best. We also present computer simulations in which resource splitting increases the failure rates of individual agents. The models and findings help identify optimal distributiveness levels and inform the design of multi-agent systems under resource constraints.
comment: Short paper presented at the 15th International Conference on Swarm Intelligence (ANTS 2026)
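The three speed regimes described in the abstract above can be reproduced with a back-of-the-envelope model. This is a minimal sketch under stated assumptions, not the paper's analysis: each disk agent is assumed to sweep a strip of width $2r$ at its speed, overlaps between agents are ignored, and the function name and `speed_law` labels are hypothetical.

```python
import math

def total_sweep_rate(n: int, total_area: float = 1.0,
                     base_speed: float = 1.0,
                     speed_law: str = "constant") -> float:
    """Initial area swept per unit time by n disk agents.

    The n agents split `total_area` equally, so each footprint has area
    total_area / n and radius sqrt(total_area / (n * pi)).
    """
    radius = math.sqrt(total_area / (n * math.pi))
    single_radius = math.sqrt(total_area / math.pi)
    if speed_law == "constant":
        speed = base_speed
    elif speed_law == "radius":        # speed proportional to radius
        speed = base_speed * radius / single_radius
    elif speed_law == "footprint":     # speed proportional to footprint area
        speed = base_speed / n
    else:
        raise ValueError(f"unknown speed_law: {speed_law}")
    # Each agent sweeps a strip of width 2 * radius at its speed.
    return n * 2.0 * radius * speed

for law in ("constant", "radius", "footprint"):
    print(law, [round(total_sweep_rate(n, speed_law=law), 3) for n in (1, 4, 16)])
```

Under this model the rate scales as $\sqrt{n}$ for constant speed, is independent of $n$ when speed scales with radius, and scales as $1/\sqrt{n}$ when speed scales with footprint, which is consistent with the qualitative findings stated in the abstract.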
When Agents Shop for You: Role Coherence in AI-Mediated Markets
Consumers are increasingly delegating purchase decisions to AI agents, providing natural-language descriptions of their preferences and identity. We argue that these representations constitute an information channel, role coherence, through which sellers can infer willingness to pay without explicit disclosure by the buyer agent, leading to preference leakage. In an experiment where a language-model buyer agent shops on behalf of a verbal consumer profile, we show that seller-side inference from dialogue alone recovers willingness to pay nearly one-for-one. Comparing this setting to a numeric-budget condition with confidentiality instructions cleanly isolates role coherence as distinct from instruction-following failure. Because this leakage arises from delegation itself, it cannot be mitigated at the prompt level. Instead, we propose architectural interventions that trade off personalization against preference privacy.
Reinforced Agent: Inference-Time Feedback for Tool-Calling Agents
Tool-calling agents are evaluated on tool selection, parameter accuracy, and scope recognition, yet LLM trajectory assessments remain inherently post-hoc. Disconnected from the active execution loop, such assessments identify errors that are usually addressed through prompt-tuning or retraining, and fundamentally cannot course-correct the agent in real time. To close this gap, we move evaluation into the execution loop at inference time: a specialized reviewer agent evaluates provisional tool calls prior to execution, shifting the paradigm from post-hoc recovery to proactive evaluation and error mitigation. In practice, this architecture establishes a clear separation of concerns between the primary execution agent and a secondary review agent. As with any multi-agent system, the reviewer can introduce new errors while correcting others, yet no prior work to our knowledge has systematically measured this tradeoff. To quantify this tradeoff, we introduce Helpfulness-Harmfulness metrics: helpfulness measures the percentage of base agent errors that feedback corrects; harmfulness measures the percentage of correct responses that feedback degrades. These metrics directly inform reviewer design by revealing whether a given model or prompt provides net positive value. We evaluate our approach on BFCL (single-turn) and Tau2-Bench (multi-turn stateful scenarios), achieving +5.5% on irrelevance detection and +7.1% on multi-turn tasks. Our metrics reveal that reviewer model choice is critical: the reasoning model o3-mini achieves a 3:1 benefit-to-risk ratio versus 2.1:1 for GPT-4o. Automated prompt optimization via GEPA provides an additional +1.5-2.8%. Together, these results demonstrate a core advantage of separating execution and review: the reviewer can be systematically improved through model selection and prompt optimization, without retraining the base agent.
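The two metrics defined in the abstract above can be computed from paired per-example outcomes. A minimal sketch, assuming each example is scored as simply correct or incorrect before and after reviewer feedback; the function name and input format are hypothetical and may differ from the paper's evaluation harness.

```python
from typing import Sequence, Tuple

def helpfulness_harmfulness(base_correct: Sequence[bool],
                            reviewed_correct: Sequence[bool]) -> Tuple[float, float]:
    """Return (helpfulness %, harmfulness %).

    helpfulness: share of base-agent errors that become correct after review.
    harmfulness: share of base-agent correct responses that become wrong.
    """
    if len(base_correct) != len(reviewed_correct):
        raise ValueError("inputs must be paired per example")
    pairs = list(zip(base_correct, reviewed_correct))
    base_errors = sum(1 for b, _ in pairs if not b)
    base_hits = len(pairs) - base_errors
    fixed = sum(1 for b, r in pairs if not b and r)
    broken = sum(1 for b, r in pairs if b and not r)
    helpfulness = 100.0 * fixed / base_errors if base_errors else 0.0
    harmfulness = 100.0 * broken / base_hits if base_hits else 0.0
    return helpfulness, harmfulness

base = [True, True, True, True, False, False]
reviewed = [True, True, True, False, True, False]
print(helpfulness_harmfulness(base, reviewed))  # (50.0, 25.0)
```

Because the two percentages use different denominators, a reviewer can look helpful in aggregate accuracy while still degrading a meaningful share of originally correct responses, which is why the two are reported separately.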
When Roles Fail: Epistemic Constraints on Advocate Role Fidelity in LLM-Based Political Statement Analysis
Democratic discourse analysis systems increasingly rely on multi-agent LLM pipelines in which distinct evaluator models are assigned adversarial roles to generate structured, multi-perspective assessments of political statements. A core assumption is that models will reliably maintain their assigned roles. This paper provides the first systematic empirical test of that assumption using the TRUST pipeline. We develop an epistemic stance classifier that identifies advocate roles from reasoning text without relying on surface vocabulary, and measure role fidelity across 60 political statements (30 English, 30 German) using four metrics: Role Drift Index (RDI), Expected Drift Distance (EDD), Directional Drift Index (DDI), and Entropy-based Role Stability (ERS). We identify two failure modes - the Epistemic Floor Effect (fact-check results create an absolute lower bound below which the legitimizing role cannot be maintained) and Role-Prior Conflict (training-time knowledge overrides role instructions for factually unambiguous statements) - as manifestations of a single mechanism: Epistemic Role Override (ERO). Model choice significantly affects role fidelity: Mistral Large outperforms Claude Sonnet by 28pp (67% vs. 39%) and exhibits a qualitatively different failure mode - role abandonment without polarity reversal - compared to Claude's active switch to the opposing stance. Role fidelity is language-robust. Fact-check provider choice is not universally neutral: Perplexity significantly reduces Claude's role fidelity on German statements (Delta = -15pp, p = 0.007) while leaving Mistral unaffected. These findings have direct implications for multi-agent LLM validation: a system validated without role fidelity measurement may systematically misrepresent the epistemic diversity it was designed to provide.
comment: 22 pages
Nothing Deceives Like Success: Social Learning and the Illusion of Understanding in Science
Success-driven social learning, in which individuals preferentially adopt the ideas and methods that appear most successful, is a foundational principle of collective behavior across systems ranging from ant colonies to scientific communities. But science is a particular kind of collective search -- one in which the quality of an explanation is itself difficult to assess. Is success bias adaptive in this setting? In agent-based simulations of collective theory building, we find that it is not. Scientists in our model systematically overestimate the quality of their own theories, creating an illusion of understanding: a persistent gap between perceived and actual performance. Success bias amplifies this illusion; communities that favor apparently successful theories explore a narrower range of possibilities, efficiently filtering out poor explanations but failing to discover better ones. This effect intensifies with problem complexity, as scientists in more complex environments become increasingly unable to assess how well their theories actually perform. Most strikingly, when agents optimize their social behavior to maximize the perceived success of their theories, they paradoxically undermine their actual performance, and produce levels of inequality that mirror those found in real scientific communities.
comment: 36 pages, 7 figures
A High-Throughput Compute-Efficient POMDP Hide-And-Seek-Engine (HASE) for Multi-Agent Operations
Reinforcement Learning (RL) algorithms exhibit high sample complexity, particularly when applied to Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs). In response, projects such as SampleFactory, EnvPool, Brax, and IsaacLab migrate parallel execution of classic environments such as MuJoCo and Atari into C++ thread pools or onto the GPU to decrease the computational cost of environment steps. We are interested in optimizing the decision level of human-AI joint operations, so we introduce a compute-efficient Dec-POMDP engine natively architected in C++ called Hide-And-Seek-Engine. By employing Data-Oriented Design (DOD) principles, explicit 64-byte cache-line alignment to remove false sharing, and a zero-copy PyTorch memory bridge using pinned memory and Direct Memory Access (DMA), our engine sustains throughput of up to 33,000,000 steps per second (SPS) in a single-agent, 1024-environment configuration with decentralized observations on an AMD Ryzen 9950X (16 cores). With ten agents, throughput falls to 7M SPS; for reference, generating random actions accounts for one third of the total runtime. The engine achieves a throughput increase of approximately 3,500$\times$ over the baseline single-threaded vectorized NumPy implementation and successfully trains cooperative multi-agent policies via PPO, DQN, and SAC in minutes, validating both its performance and generality.
comment: 21 pages, 10 figures, 5 tables. Includes appendix
Agent Name Service (ANS): A Proof-of-Concept Trust Layer for Secure AI Agent Discovery, Identity, and Governance in Kubernetes
Autonomous AI agent ecosystems require stronger mechanisms for secure discovery, identity verification, capability attestation, and policy governance. Current deployments frequently lack (1) uniform agent discovery, (2) cryptographic agent authentication, (3) capability proofs that protect secrets, and (4) enforceable policy controls. This paper presents an implementation-oriented proof of concept for the Agent Name Service (ANS), a DNS-inspired trust layer for AI agent discovery and interoperability in Kubernetes, grounded in the ANS protocol specification~\cite{huang2025ans}. The implementation uses Decentralized Identifiers (DIDs), Verifiable Credentials (VCs), policy-as-code enforcement with Open Policy Agent (OPA), and Kubernetes-native integration patterns (CRDs, admission controls, service mesh integration). In a demo research environment (3-node cluster, 50-agent workflow simulation), we observe sub-10ms response in demonstrated service paths and full success for scripted demo deployment scenarios. We explicitly scope these findings as proof-of-concept evidence rather than production certification. We further provide a threat model, assumptions, and limitations to separate implemented evidence from protocol-defined and roadmap capabilities. The result is an evidence-grounded pathway from ANS protocol concepts to reproducible engineering practice for secure multi-agent systems.
comment: 9 pages, 2 figures
Ambient Persuasion in a Deployed AI Agent: Unauthorized Escalation Following Routine Non-Adversarial Content Exposure
We report a safety incident in a deployed multi-agent research system in which a primary AI agent installed 107 unauthorized software components, overwrote a system registry, overrode a prior negative decision from an oversight agent, and escalated through increasingly privileged operations up to an attempted system administrator command. The incident was preceded not by an adversarial attack but by routine content: a forwarded technology article written for human developers and shared by the principal investigator for discussion. The agent operated in a permissive environment, with unrestricted shell access, soft behavioral guidelines containing genuinely conflicting instructions, and no machine-enforced installation policy, and had recommended installing the same tool six hours earlier before being told to stand down. We analyze the behavioral cascade, the control boundaries that failed, and the limitations of multi-agent oversight in detecting and remediating the damage. We use directive weighting error as a descriptive interpretation of the observed failure and ambient persuasion as a provisional analytic label for the broader trigger configuration of non-adversarial environmental content preceding unauthorized agent action. The case highlights ethical and governance implications for deployed agent systems: ambiguous conversational cues are insufficient authorization for consequential actions, prior refusals must persist as enforceable constraints rather than message-level reminders, and oversight mechanisms require systematic post-incident auditing in addition to routine monitoring.
The Cost of Consensus: Isolated Self-Correction Prevails Over Unguided Homogeneous Multi-Agent Debate
Multi-agent debate, where teams of LLMs iteratively exchange rationales and vote on answers, is widely deployed under the assumption that peer review filters hallucinations. Yet the failure dynamics of homogeneous debate remain poorly understood. We therefore report findings from a controlled empirical study of teams of $N{=}10$ homogeneous agents (Qwen2.5-7B, Llama-3.1-8B, Ministral-3-8B) across $R{=}3$ debate rounds on two high-difficulty benchmarks (GSM-Hard and MMLU-Hard). We compare peer debate against isolated self-correction and a stochastic noise control that injects rationales from unrelated problems. We decompose debate failure into three model-dependent pathways: sycophantic conformity, where agents uncritically adopt majority answers (modal adoption up to 85.5%); contextual fragility, where peer rationales destabilize previously correct reasoning (vulnerability rate up to 70.0%); and consensus collapse, where plurality voting discards correct answers already present in the generation pool (oracle gap up to 32.3 percentage points). Ablations over communication density ($K \in \{2,4,9\}$) and sampling temperature ($T \in \{0.4, 0.7\}$) show that conformity reaches high levels at minimal peer exposure ($K{=}2$) and intensifies with greater initial diversity. Across all configurations, debate consumes 2.1-3.4$\times$ more tokens (up to 28,631 tokens per problem) than self-correction for equal or lower accuracy. Our results indicate that, within the 7-8B parameter class, homogeneous teams without structured roles do not benefit from unguided peer exchange, and that isolated self-correction consistently offers a more favorable cost-accuracy tradeoff.
comment: 19 pages, ACM Conference on AI and Agentic Systems
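The consensus-collapse pathway in the abstract above contrasts plurality voting with an oracle that succeeds whenever any agent in the pool holds the correct answer. A minimal sketch of that comparison follows; the exact definition of the oracle gap used in the paper is not given in the abstract, so the formula here (oracle accuracy minus plurality accuracy, in percentage points) is an assumption, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

def plurality_answer(answers: Sequence[str]) -> str:
    """Most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]

def oracle_gap(pools: List[Sequence[str]], gold: Sequence[str]) -> float:
    """Oracle accuracy minus plurality accuracy, in percentage points."""
    if len(pools) != len(gold):
        raise ValueError("one answer pool per gold label is required")
    plurality_hits = sum(plurality_answer(p) == g for p, g in zip(pools, gold))
    oracle_hits = sum(g in p for p, g in zip(pools, gold))
    return 100.0 * (oracle_hits - plurality_hits) / len(gold)

pools = [["4", "4", "5"],   # plurality correct
         ["7", "8", "8"],   # correct answer present but outvoted
         ["1", "2", "2"]]   # correct answer absent from the pool
gold = ["4", "7", "3"]
print(round(oracle_gap(pools, gold), 2))  # 33.33
```

The second pool is the case of interest: the right answer was generated, but voting discards it.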
SiriusHelper: An LLM Agent-Based Operations Assistant for Big Data Platforms
Big data platforms are widely used in modern enterprises, and an in-production intelligent assistant is increasingly important to help users quickly find actionable guidance and reduce operational burden. While recent LLM+RAG assistants provide a natural interface, they face practical challenges in real deployments: limited scenario coverage across both general consultation and domain-specific troubleshooting workflows, inefficient knowledge access due to inadequate multi-hop retrieval and flat knowledge organization, and high maintenance cost because escalated tickets are unstructured and hard to convert into assistant improvements and reusable SOPs. In this paper, we present SiriusHelper, a deployed intelligent assistant for big data platforms. SiriusHelper serves as a unified online assistant that automatically identifies user intent and routes queries to the right handling path, including dedicated expert workflows for specialized scenarios (e.g., SQL execution diagnosis). To support complex troubleshooting, SiriusHelper combines a DeepSearch-driven mechanism with a priority-based hierarchical knowledge base to enable multi-hop retrieval without context overload, thus improving answer reliability and latency. To reduce expert overhead, SiriusHelper further introduces automated ticket understanding and SOP distillation: it diagnoses the assistant failure reason (e.g., missing knowledge or wrong routing) and extracts domain-specific SOPs to continuously enrich the knowledge base. Experiments and online deployment on Tencent Big Data platform show that SiriusHelper outperforms representative alternatives and reduces online ticket volume by 20.8%.
A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring
Large language models are beginning to show steganographic capabilities. Such capabilities could allow misaligned models to evade oversight mechanisms. Yet principled methods to detect and quantify such behaviours are lacking. Classical definitions of steganography, and detection methods based on them, require a known reference distribution of non-steganographic signals. For the case of steganographic reasoning in LLMs, knowing such a reference distribution is not feasible; this renders these approaches inapplicable. We propose an alternative, \textbf{decision-theoretic view of steganography}. Our central insight is that steganography creates an asymmetry in usable information between agents who can and cannot decode the hidden content (present within a steganographic signal), and this otherwise latent asymmetry can be inferred from the agents' observable actions. To formalise this perspective, we introduce generalised $\mathcal{V}$-information: a utilitarian framework for measuring the amount of usable information within some input. We use this to define the \textbf{steganographic gap} -- a measure that quantifies steganography by comparing the downstream utility of the steganographic signal to agents that can and cannot decode the hidden content. We empirically validate our formalism, and show that it can be used to detect, quantify, and mitigate steganographic reasoning in LLMs.
comment: First two authors contributed equally
DC-Ada: Reward-Only Decentralized Sensor Adaptation for Heterogeneous Multi-Robot Teams
Heterogeneity is a defining feature of deployed multi-robot teams: platforms often differ in sensing modalities, ranges, fields of view, and failure patterns. Controllers trained under nominal sensing can degrade sharply when deployed on robots with missing or mismatched sensors, even when the task and action interface are unchanged. We present DC-Ada, a reward-only decentralized adaptation method that keeps a pretrained shared policy frozen and instead adapts compact per-robot observation transforms to map heterogeneous sensing into a fixed inference interface. DC-Ada is gradient-free and communication-minimal: it uses budgeted accept/reject random search with short common-random-number rollouts under a strict step budget. We evaluate DC-Ada against four baselines in a deterministic 2D multi-robot simulator covering warehouse logistics, search and rescue, and collaborative mapping, across four heterogeneity regimes (H0--H3) and five seeds with a matched budget of $200{,}000$ joint environment steps per run. Results show that heterogeneity can substantially degrade a frozen shared policy and that no single mitigation dominates across all tasks and metrics. Observation normalization is strongest for reward robustness in warehouse logistics and competitive in search and rescue, while the frozen shared policy is strongest for reward in collaborative mapping. DC-Ada offers a useful complementary operating point: it improves completion most clearly in severe coverage-based mapping while requiring only scalar team returns and no policy fine-tuning or persistent communication. These results position DC-Ada as a practical deploy-time adaptation method for heterogeneous teams.
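The budgeted accept/reject random search with common random numbers described in the abstract above can be illustrated on a toy problem. This is a minimal sketch under stated assumptions, not DC-Ada itself: the one-dimensional "sensor", the affine transform `(gain, bias)`, the reward, and all function names are hypothetical stand-ins, and the budget is far smaller than the 200,000 steps used in the paper.

```python
import random
from typing import Tuple

Params = Tuple[float, float]  # (gain, bias) of a per-robot observation transform

def rollout_return(params: Params, seed: int, horizon: int = 20) -> float:
    """Toy short rollout that exposes only a scalar return."""
    gain, bias = params
    rng = random.Random(seed)
    total = 0.0
    for _ in range(horizon):
        state = rng.uniform(-1.0, 1.0)
        raw_obs = 0.5 * state + 0.2        # hypothetical mismatched sensor
        adapted = gain * raw_obs + bias    # transform feeding the frozen policy
        total -= (adapted - state) ** 2    # frozen policy does best when adapted == state
    return total

def adapt_transform(step_budget: int = 8000, horizon: int = 20,
                    n_seeds: int = 2, sigma: float = 0.1,
                    seed: int = 0) -> Tuple[Params, int]:
    """Gradient-free accept/reject search under a strict step budget."""
    rng = random.Random(seed)
    params: Params = (1.0, 0.0)            # start from the identity transform
    cost = 2 * n_seeds * horizon           # incumbent and candidate rollouts
    steps_used = 0
    while steps_used + cost <= step_budget:
        # Common random numbers: both transforms face the same rollout seeds.
        seeds = [rng.randrange(10**9) for _ in range(n_seeds)]
        cand = (params[0] + rng.gauss(0.0, sigma),
                params[1] + rng.gauss(0.0, sigma))
        incumbent_ret = sum(rollout_return(params, s, horizon) for s in seeds)
        cand_ret = sum(rollout_return(cand, s, horizon) for s in seeds)
        if cand_ret > incumbent_ret:       # accept only on improvement
            params = cand
        steps_used += cost
    return params, steps_used

if __name__ == "__main__":
    adapted, used = adapt_transform()
    print(adapted, used)
```

Sharing seeds between the incumbent and the candidate removes rollout noise from the comparison, so the accept/reject decision reflects the change in the transform rather than luck in the sampled states.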
The Alignment Flywheel: A Governance-Centric Hybrid MAS for Architecture-Agnostic Safety AAMAS 2026
Multi-agent systems provide mature methodologies for role decomposition, coordination, and normative governance, capabilities that remain essential as increasingly powerful autonomous decision components are embedded within agent-based systems. While learned and generative models substantially expand system capability, their safety behavior is often entangled with training, making it opaque, difficult to audit, and costly to update after deployment. This paper formalizes the Alignment Flywheel as a governance-centric hybrid MAS architecture that decouples decision generation from safety governance. A Proposer, representing any autonomous decision component, generates candidate trajectories, while a Safety Oracle returns raw safety signals through a stable interface. An enforcement layer applies explicit risk policy at runtime, and a governance MAS supervises the Oracle through auditing, uncertainty-driven verification, and versioned refinement. The central engineering principle is patch locality: many newly observed safety failures can be mitigated by updating the governed oracle artifact and its release pipeline rather than retracting or retraining the underlying decision component. The architecture is implementation-agnostic with respect to both the Proposer and the Safety Oracle, and specifies the roles, artifacts, protocols, and release semantics needed for runtime gating, audit intake, signed patching, and staged rollout across distributed deployments. The result is a hybrid MAS engineering framework for integrating highly capable but fallible autonomous systems under explicit, version-controlled, and auditable oversight.
comment: Accepted for the EMAS workshop at AAMAS 2026
Impacts of Electric Vehicle Charging Regimes and Infrastructure Deployments on System Performance: An Agent-Based Study
The rapid growth of electric vehicles (EVs) requires more effective charging infrastructure planning. Infrastructure layout not only determines deployment cost, but also reshapes charging behavior and influences overall system performance. In addition, destination charging and en-route charging represent distinct charging regimes associated with different power requirements, which may lead to substantially different infrastructure deployment outcomes. This study applies an agent-based modeling framework to generate trajectory-level latent public charging demand under three charging regimes based on a synthetic representation of the Melbourne (Australia) metropolitan area. Two deployment strategies, an optimization-based approach and a utilization-refined approach, are evaluated across different infrastructure layouts. Results show that utilization-refined deployments reduce total system cost, accounting for both infrastructure deployment cost and user generalized charging cost, with the most significant improvement observed under the combined charging regime. In particular, a more effective allocation of AC slow chargers reshapes destination charging behavior, which in turn reduces unnecessary reliance on en-route charging and lowers detour costs associated with en-route charging. This interaction highlights the behavioral linkage between destination and en-route charging regimes and demonstrates the importance of accounting for user response and multiple charging regimes in charging infrastructure planning.
comment: 7 pages, 4 figures
Open Challenges in Multi-Agent Security: Towards Secure Systems of Interacting AI Agents
AI agents are beginning to interact with each other directly and across internet platforms and physical environments, creating security challenges beyond traditional cybersecurity and AI safety frameworks. Free-form protocols are essential for AI's task generalization but enable new threats like secret collusion and coordinated swarm attacks. Network effects can rapidly spread privacy breaches, disinformation, jailbreaks, and data poisoning, while multi-agent dispersion and stealth optimization help adversaries evade oversight - creating novel persistent threats at a systemic level. Despite their critical importance, these security challenges remain understudied, with research fragmented across disparate fields including AI security, multi-agent learning, complex systems, cybersecurity, game theory, distributed systems, and technical AI governance. We introduce multi-agent security, a new field dedicated to securing networks of AI agents against threats that emerge or amplify through their interactions - whether direct or indirect via shared environments - with each other, humans, and institutions, and characterise fundamental security-utility and security-security trade-offs across both distributed and decentralised settings. Our preliminary work (1) taxonomizes the threat landscape arising from interacting AI agents, (2) offers applications to multi-agent security for work across diffuse subfields, and (3) proposes a unified research agenda addressing open challenges in designing secure agent systems and interaction environments. By identifying these gaps, we aim to guide research in this critical area to unlock the socioeconomic potential of large-scale agent deployment, foster public trust, and mitigate national security risks in critical infrastructure and defense contexts.
Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG
Retrieval-Augmented Generation (RAG) grounds LLM responses in external evidence but treats the model as a passive consumer of search results: it never sees how the corpus is organized or what it has not yet retrieved, limiting its ability to backtrack or combine scattered evidence. We present Corpus2Skill, which distills a document corpus into a hierarchical skill directory offline and lets an LLM agent navigate it at serve time. The compilation pipeline iteratively clusters documents, generates LLM-written summaries at each level, and materializes the result as a tree of navigable skill files. At serve time, the agent receives a bird's-eye view of the corpus, drills into topic branches via progressively finer summaries, and retrieves full documents by ID. Because the hierarchy is explicitly visible, the agent can reason about where to look, backtrack from unproductive paths, and combine evidence across branches. On WixQA, an enterprise customer-support benchmark for RAG, Corpus2Skill outperforms dense retrieval, RAPTOR, and agentic RAG baselines across all quality metrics. We further evaluate generalization on nine RAGBench subsets reformulated as retrieval-stress benchmarks: Corpus2Skill attains the highest macro-average F1 across the full 10-dataset suite and characterizes a clear regime -- single-domain, atomic-document corpora -- where corpus navigation is the right primitive, while flat retrieval remains preferable for open-domain or extractive pools.
Principled Learning-to-Communicate with Quasi-Classical Information Structures
Learning-to-communicate (LTC) in partially observable environments has received increasing attention in deep multi-agent reinforcement learning, where the control and communication strategies are jointly learned. Meanwhile, the impact of communication on decision-making has been extensively studied in control theory. In this paper, we seek to formalize and better understand LTC by bridging these two lines of work, through the lens of information structures (ISs). To this end, we formalize LTC in decentralized partially observable Markov decision processes (Dec-POMDPs) under the common-information-based framework from decentralized stochastic control, and classify LTC problems based on the ISs before (additional) information sharing. We first show that non-classical LTCs are computationally intractable in general, and thus focus on quasi-classical (QC) LTCs. We then propose a series of conditions for QC LTCs, under which LTC preserves the QC IS after information sharing, whereas violating them can cause computational hardness in general. Further, we develop provable planning and learning algorithms for QC LTCs, and establish quasi-polynomial time and sample complexities for several QC LTC examples that satisfy the above conditions. Along the way, we also establish new results on a relationship between (strictly) QC IS and the condition of having strategy-independent common-information-based beliefs (SI-CIBs), as well as on solving Dec-POMDPs without computationally intractable oracles but beyond those with SI-CIBs, which may be of independent interest.
comment: Preliminary version appeared at IEEE CDC 2025
Frontier Coding Agents Can Now Implement an AlphaZero Self-Play Machine Learning Pipeline For Connect Four That Performs Comparably to an External Solver
Forecasting when AI systems will become capable of meaningfully accelerating AI research is a central challenge for AI safety. Existing benchmarks measure broad capability growth, but may not provide ample early warning signals for recursive self-improvement. We propose measuring AI's capability to autonomously implement end-to-end machine learning pipelines from past AI research breakthroughs, given a minimal task description. By providing a concise task description instead of the full prior work as reference, we hope to better elicit emerging AI research taste. We introduce a proof-of-concept benchmark in which frontier coding agents autonomously implement an AlphaZero-style machine learning pipeline for Connect Four on consumer hardware within a three-hour budget, and we evaluate the resulting game AIs in a round-robin tournament anchored to the Pascal Pons Connect Four solver. Across four agents with eight trials each, we find substantial differentiation: Claude Opus 4.7 won as first-mover against Pons in seven of eight trials, statistically significantly better than the other agents tested, none of which exceeded two of eight. The task, which no frontier agent could reliably complete when we began development in January of 2026, is now near-saturation. Our evaluation also surfaced anomalous behavior in GPT-5.4, which consistently used far less of its allocated time budget than other agents. A follow-up 16-trial probe using shorter, less evaluation-coded prompts substantially increased GPT-5.4's time-budget usage, consistent with but not diagnostic of sandbagging; Bradley-Terry ratings across probe conditions showed only directional differences, despite significant differences in time-budget usage. We release our data, code, and prompts to support reproduction and extension.
Designing Digital Humans with Ambient Intelligence
Digital humans are lifelike virtual agents capable of natural conversation and are increasingly deployed in domains like retail and finance. However, most current digital humans operate in isolation from their surroundings and lack contextual awareness beyond the dialogue itself. We address this limitation by integrating ambient intelligence (AmI) - i.e., environmental sensors, IoT data, and contextual modeling - with digital human systems. This integration enables situational awareness of the user's environment, anticipatory and proactive assistance, seamless cross-device interactions, and personalized long-term user support. We present a conceptual framework defining key roles that AmI can play in shaping digital human behavior, a design space highlighting dimensions such as proactivity levels and privacy strategies, and application-driven patterns with case studies in financial and retail services. We also discuss an architecture for ambient-enabled digital humans and provide guidelines for responsible design regarding privacy and data governance. Together, our work positions ambient intelligent digital humans as a new class of interactive agents powered by AI that respond not only to users' queries but also to the context and situations in which the interaction occurs.
Systems and Control (EESS)
Recent Advances in mm-Wave and Sub-THz/THz Oscillators for FutureG Technologies
This paper provides a concise yet comprehensive review of recent advancements in millimeter-wave (mm-wave) oscillators below 100 GHz and sub-terahertz (sub-THz/THz) oscillators above 100 GHz for next-generation computing and communication systems, including 5G, 6G, and beyond. Various design approaches, including CMOS, SiGe, and III-V semiconductor technologies, are explored in terms of performance metrics such as phase noise, output power, efficiency, frequency tunability, and stability. The review highlights key challenges in achieving high-performance and reliable oscillator designs while discussing emerging techniques for performance enhancement. By evaluating recent design trends, this work aims to offer valuable insights and design guidelines that facilitate the development of robust mm-wave and sub-THz/THz oscillators for future communication, computing, and sensing applications.
Safe Navigation using Neural Radiance Fields via Reachable Sets
Safe navigation in cluttered environments is an important challenge for autonomous systems. Robots navigating through obstacle-ridden scenarios need to be able to navigate safely in the presence of obstacles, goals, and ego objects of varying geometries. In this work, reachable set representations of the robot's real-time capabilities in the state space are utilized to capture safe navigation requirements, while neural radiance fields (NeRFs) are utilized to compute, store, and manipulate the volumetric representations of the obstacles, or ego vehicle, as needed. Constrained optimal control is employed to represent the resulting path planning problem, involving linear matrix inequality constraints. We present simulation results for path planning in the presence of numerous obstacles in two different scenarios. Safe navigation is demonstrated by using reachable sets in the corresponding constrained optimal control problems.
comment: 5 pages, 8 figures, 2026 4th International Conference on Mechatronics, Control and Robotics (ICMCR)
Stochastic Entanglement of Deterministic Origami Tentacles For Universal Robotic Gripping
Origami-inspired robotic grippers have shown promising potential for object manipulation tasks due to their compact volume and mechanical flexibility. However, robust capture of objects with random shapes in dynamic working environments often comes at the cost of additional actuation channels and control complexity. Here, we introduce a tendon-driven origami tentacle gripper capable of universal object gripping by exploiting a synergy between local, deterministic deformation programming and global, stochastic entanglements. Each origami tentacle is made by cutting thin Mylar sheets; it features carefully placed holes for routing an actuation tendon, origami creases for controlling the deformation, and a tapered shape. By tailoring these design features, one can prescribe the shrinking, bending, and twisting deformation, eventually creating deterministic coiling with a simple tendon pull. Then, when multiple coiling tentacles are placed in proximity, stochastic entanglement emerges, allowing the tentacles to braid, knot, and grip objects with random shapes. We derived a simulation model by integrating origami mechanics with Cosserat rods to correlate origami design, tendon deformation, and their collective gripping performance. Then, we experimentally tested how these coiling and entangling origami tentacles can grasp objects under gravity and in water. A stow-and-release deployment mechanism was also tested to simulate in-orbit grasping. Overall, the entertaining origami tentacle gripper presents a new strategy for robust object grasping with simple design and actuation.
Spectral Boundary Observer for Counter-Flow Heat Exchangers
We consider a system of two coupled first-order linear hyperbolic partial differential equations modeling heat transport in a counter-flow heat exchanger: one equation describes the transport of a hot fluid, and the other the transport of a cold fluid in the opposite direction. For this system, we design a boundary observer that uses only the temperature of the cold fluid measured at one boundary. Our approach is spectral: by assigning the spectrum of the operator governing the observation error dynamics to a prescribed region within the open left-half complex plane, we can freely tune the convergence rate of the observation error to zero in the $L^2$ norm. The main technical contribution is the proof that spectral stability, that is, the location of the spectrum in the open left-half plane, is equivalent to $L^2$ exponential stability of the origin for the observation error dynamics. This equivalence is established by showing that the operator governing the observation error dynamics satisfies the so-called spectral mapping property.
comment: This paper has been submitted to CDC'2026
Uncertainty-Aware Predictive Safety Filters for Probabilistic Neural Network Dynamics
Predictive safety filters (PSFs) leverage model predictive control to enforce constraint satisfaction during deep reinforcement learning (RL) exploration, yet their reliance on first-principles models or Gaussian processes limits scalability and broader applicability. Meanwhile, model-based RL (MBRL) methods routinely employ probabilistic ensemble (PE) neural networks to capture complex, high-dimensional dynamics from data with minimal prior knowledge. However, existing attempts to integrate PEs into PSFs lack rigorous uncertainty quantification. We introduce the Uncertainty-Aware Predictive Safety Filter (UPSi), a PSF that provides rigorous safety predictions using PE dynamics models by formulating future outcomes as reachable sets. UPSi introduces an explicit certainty constraint that prevents model exploitation and integrates seamlessly into common MBRL frameworks. We evaluate UPSi within Dyna-style MBRL on standard safe RL benchmarks and report substantial improvements in exploration safety over prior neural network PSFs while maintaining performance on par with standard MBRL. UPSi bridges the gap between the scalability and generality of modern MBRL and the safety guarantees of predictive safety filters.
PM-EKF: A Physiological Model-Based Extended Kalman Filter for Daily-Life Physical Activity Energy Expenditure Estimation
Monitoring physical activity energy expenditure (PAEE) in daily life is essential for characterizing individual health and metabolic status. Although indirect calorimetry provides gold-standard PAEE measurements, it is impractical for continuous daily-life monitoring. Consequently, wearable sensing approaches using inertial measurement units (IMUs) and heart rate (HR) sensors have attracted substantial interest. However, most existing IMU- and HR-based methods are purely data-driven and offer limited physiological interpretability. In this work, we propose a simplified physiological model that explicitly links body movement during activities of daily living to the underlying metabolic gas-exchange processes governing PAEE. The model is formulated as a nonlinear state-space system and embedded within an Extended Kalman Filter (EKF), enabling principled handling of measurement noise, model uncertainty, and system nonlinearities. The proposed framework provides personalized, interpretable PAEE estimates without employing black-box models. Our model was validated using a dataset, including 9 subjects with around 50 minutes of measurements per subject, collected in our lab simulating a free-living condition. Using the respiratory data measured by COSMED K5 as reference and explained variance (R^2) as evaluation metric, our model's predicted PAEE yielded median (min-max) R^2 = 0.72 (0.60--0.87), using three IMUs (pelvis and two thighs) for capturing the body-center-of-mass motion and measured HR for the time-varying cardiac output. Our model outperformed a linear regression (LR) model (R^2 = 0.52 (0.23--0.92)) and CNN-LSTM model (R^2 = 0.65 (0.46--0.78)) on the same dataset. Notably, excluding the sensory HR measurement did not significantly degrade PAEE estimation of all three models, indicating that IMU-captured mechanical workload dominated PAEE estimation performance in our protocol.
comment: The main body consists of 11 pages. A 2-page supplementary material is included in the source file as pdf. This manuscript is currently in the process of being submitted to the IEEE JBHI journal
A Control Framework for Induced Seismicity Mitigation in Groningen Gas Reservoir
Induced seismicity associated with gas production poses major operational and societal challenges, as illustrated by the Groningen field in the Netherlands. While many studies have focused on forecasting seismicity under prescribed production scenarios, fewer works address the inverse problem: designing operational strategies that minimize seismicity while maintaining production objectives. In this paper, we propose a control-oriented methodology for operating Groningen under induced-seismicity mitigation constraints. We employ a cascade model coupling pore-pressure diffusion with seismicity rate (SR) dynamics, and complement it with a stochastic event-generation procedure to convert the continuous SR field into a synthetic earthquake catalog with event times, locations, and magnitudes. From this catalog, we estimate regional SR measurements and design a robust feedback controller that computes well-rate commands to regulate the SR toward a desired reference while satisfying operational requirements, including prescribed production constraints. The proposed control architecture explicitly accounts for injection and extraction flux limits (actuator saturation). The well fluxes generated by the controller are updated at discrete-time intervals (digital control). We validate the modeling components against Groningen data and illustrate the approach through numerical experiments under different scenarios, including various control update periods and gain selections, as well as combined production with compensating injection (e.g., reinjection of nitrogen). The results illustrate how the proposed framework can reduce seismicity levels in a controlled manner while maximizing production targets.
Model-Free Dynamic Mode Adaptive Control for Data-Driven Control Synthesis
This paper presents a model-free, data-driven control synthesis method called dynamic mode adaptive control (DMAC) for systems whose mathematical models are unavailable or unsuitable for classical control design. The proposed approach combines data-driven dynamics approximation with adaptive control synthesis to enable online controller design using measured system data. DMAC comprises two main components: a dynamics-approximation module and a controller-synthesis module. The dynamics approximation module estimates a local linear representation of the system dynamics directly from measurements using a matrix recursive least-squares algorithm with a forgetting factor. The estimated dynamics are then used to compute an online stabilizing controller with full-state feedback and integral action. Theoretical analysis establishes convergence properties of the recursive dynamics approximation and boundedness of the closed-loop system under the DMAC controller. The performance of the proposed method is demonstrated through numerical examples involving representative dynamical systems, including an unstable linear system, the Van der Pol oscillator, and the Burgers' equation. Sensitivity studies further demonstrate the robustness of DMAC with respect to both algorithm hyperparameters and variations in system parameters.
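To make the dynamics-approximation step more concrete, here is a minimal sketch of a textbook matrix recursive least-squares update with a forgetting factor, used to estimate a local linear model x[k+1] ≈ A x[k] + B u[k] from measurements. It is a generic RLS form, not necessarily the exact recursion used in DMAC; the plant matrices, the forgetting factor 0.98, the initial covariance, and all function names are assumptions chosen for illustration.

```python
import random


def rls_update(theta, P, phi, y, lam=0.98):
    """One matrix RLS step with forgetting factor lam.

    theta: n x m estimate (list of rows) such that y ≈ theta @ phi
    P:     m x m symmetric covariance-like matrix
    phi:   length-m regressor; y: length-n measurement
    """
    m, n = len(phi), len(y)
    P_phi = [sum(P[i][j] * phi[j] for j in range(m)) for i in range(m)]
    denom = lam + sum(phi[i] * P_phi[i] for i in range(m))
    gain = [v / denom for v in P_phi]  # K = P phi / (lam + phi^T P phi)
    err = [y[r] - sum(theta[r][j] * phi[j] for j in range(m)) for r in range(n)]
    theta = [[theta[r][j] + err[r] * gain[j] for j in range(m)] for r in range(n)]
    # P <- (P - K phi^T P) / lam, using symmetry of P so that phi^T P = (P phi)^T.
    P = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(m)] for i in range(m)]
    return theta, P


def identify_demo(steps=200, seed=0):
    """Estimate [A B] for a hypothetical 2-state, 1-input plant from noise-free data."""
    rng = random.Random(seed)
    A = [[0.9, 0.2], [-0.1, 0.8]]  # hypothetical plant, unknown to the estimator
    B = [0.0, 1.0]
    theta = [[0.0] * 3 for _ in range(2)]  # running estimate of [A B]
    P = [[1e3 if i == j else 0.0 for j in range(3)] for i in range(3)]
    x = [1.0, -1.0]
    for _ in range(steps):
        u = rng.uniform(-1.0, 1.0)  # random input keeps the regressor exciting
        x_next = [A[r][0] * x[0] + A[r][1] * x[1] + B[r] * u for r in range(2)]
        theta, P = rls_update(theta, P, x + [u], x_next, lam=0.98)
        x = x_next
    return theta


if __name__ == "__main__":
    for row in identify_demo():
        print([round(v, 4) for v in row])
```

A forgetting factor below 1 discounts old samples, which is what lets an estimate of this kind follow a model that changes over time. The controller-synthesis step that would consume the estimated `[A B]` is not sketched here.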
Exploring Converter Control Duality in Microgrids: AC Grid-Forming vs DC Droop Control
Power electronic converters are fundamental building blocks of both AC and DC microgrids, enabling the integration of renewable energy sources, energy storage systems, electronic loads, and electric vehicles. In contrast, converter control in DC microgrids has developed along the path of droop control, which is widely adopted for decentralized DC-bus voltage regulation and power sharing. Although these control strategies share certain characteristics, their similarities remain largely unexplored due to the distinct physical domains in which they operate. To bridge this gap, we introduce a novel perspective based on the concept of duality to reveal the underlying isomorphism between the two control approaches. We show that AC grid-forming and DC I--V droop control are duals of each other in several aspects, including: (i) the small-signal model of the converter; (ii) the inner current control structure; (iii) power-sharing mechanisms based on the AC swing equation and DC capacitor power balance; and (iv) disturbance signals and dynamic response. Theoretical analysis, validated through simulations on simple converter setups, illustrates these dualities and provides new insights towards a unified control design.
comment: Accepted for presentation at 2026 IEEE International Conference on DC Microgrids (ICDCM)
Learning to Route Electric Trucks Under Operational Uncertainty
Electric truck operations require routing decisions that remain feasible under limited battery range, long charging times, travel and energy consumption, and competition for shared charging infrastructure. These features make electric truck routing a coupled logistics and energy problem, limiting the practicality of heuristics-based methods and rendering them computationally infeasible at scale. This paper proposes a learning-based framework for stochastic electric truck routing under charging constraints and operational uncertainty. The problem, solved by Reinforcement Learning, is formulated as an event-driven semi-Markov decision process with shared charging resources, stochastic travel and energy requirements, and realistic nonlinear fast-charging behavior. To support learning in this setting, a graph-based representation of system state and feasible decisions is introduced, together with a rule-based action mask that restricts policies to operationally admissible actions, thus improving training efficiency. Building on this formulation, an event-driven simulation environment is developed that supports both Reinforcement Learning and benchmarking against heuristic and mathematical programming baselines. Computational experiments across a range of fleet sizes show that the proposed learning-based algorithm consistently outperforms baselines and attains performance close to optimization benchmarks in many settings, while preserving high success rates under charging congestion and uncertainty.
comment: Reinforcement Learning, Electric Truck Routing, Freight Transportation, Graph Neural Networks, Stochastic Optimization, Vehicle Routing
Persona-Based Process Design for Assistive Human-Robot Workplaces for Persons with Disabilities
Human-robot interaction is emerging as an important paradigm for integrating persons with disabilities into the workplace. While these systems can enable individuals to work, their design is mostly personalized, hindering widespread use beyond the individual user. The universal design paradigm is a central pillar of inclusive design, describing usability of systems by all. Incorporating universal design into process design for human-robot workplaces requires expert knowledge that is often not available. To simplify process design of human-robot workplaces, we propose a persona-based design approach. First, typical impairments prevalent in the workforce or particularly relevant for the processes are abstracted into personas with disabilities. The work process is subdivided into sequential actions. For each action and persona, strategies are developed to reach the action goal by a design thinking approach. The resulting actions are ordered by level of robot assistance, i.e., robot involvement, and implemented in a behavior tree. As a result, the macro-behavior of the workplace may adapt to individual personas online. We demonstrate the method in a collaborative box folding process with a total of seven personas with disabilities. The persona-based process design shows promising results by generating more comprehensive process strategies while enabling adaptive behavior in the sense of universal design.
comment: Accepted at IEEE International Conference on Human-Machine Systems (ICHMS), Singapore, 2026
Fuelling fusion plasmas with pellets: Can neuromorphic control outperform Sigma-Delta modulation?
Nuclear fusion is a promising clean energy source in which deuterium and tritium fuse inside a magnetically confined plasma in a tokamak, releasing energy. A key challenge on the route to practical nuclear fusion is the control of the plasma density, which has to be done by adding fuel in the form of deuterium and tritium to the plasma. Pellet injection, firing frozen fuel into the plasma, is used to accomplish this. Since the injection of a pellet causes an almost instantaneous increase in particle density compared to the time scales of the plasma dynamics, the problem is of a hybrid nature in which continuous plasma dynamics are interrupted by discrete bursts of particles. In this paper, we propose a formal hybrid model for this fuelling process and a new, neuron-inspired control method that treats pellets much like spikes in a brain-like system. The neuromorphic controller offers a lightweight solution that naturally fits the hybrid character of pellet fuelling. For comparison, we also develop a hybrid model of sigma-delta modulation, which is used in current tokamaks. For both the neuromorphic controller and the sigma-delta modulation we present formal analysis results for this control problem in nuclear fusion. We derive explicit actuator and controller parameter constraints, key for controller tuning, that lead to practical stability guarantees. Numerical simulations compare the different controller variants and validate the theoretical results.
comment: Submitted to Automatica, under review. 16 pages, 9 figures
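For readers unfamiliar with sigma-delta modulation, the following is a minimal first-order sketch of how a continuous fuelling request can be turned into discrete pellet firings: an accumulator integrates the requested amount and a pellet is fired whenever a full pellet's worth has built up. This is a generic discrete-time modulator, not the hybrid model developed in the paper; the function name, the unit pellet size, and the sample requests are assumptions.

```python
def sigma_delta(requests, pellet_size=1.0):
    """First-order sigma-delta modulator.

    requests: requested fuel per time step (same units as pellet_size)
    returns:  list of 0/1 flags, 1 meaning a pellet is fired at that step
    """
    accumulator = 0.0
    fired = []
    for request in requests:
        accumulator += request              # integrate the outstanding demand
        if accumulator >= pellet_size:
            accumulator -= pellet_size      # a fired pellet pays off one quantum
            fired.append(1)
        else:
            fired.append(0)
    return fired


if __name__ == "__main__":
    # A constant request of a quarter pellet per step fires every fourth step.
    print(sigma_delta([0.25] * 8))
```

Over a long horizon the number of fired pellets tracks the integral of the request to within one pellet, which is the property that makes this scheme a natural baseline for discrete actuation.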
Risk-Aware Multi-Market Scheduling of Virtual Power Plants with Dynamic Network Tariffs SC
As the penetration of distributed energy resources (DERs) increases, harnessing their flexibility becomes critical for power system operations. Virtual power plants (VPPs) offer a promising solution. However, most existing scheduling tools rely on simplified DER or grid models and largely overlook local flexibility procurement mechanisms such as dynamic network tariffs. This paper proposes a two-stage stochastic optimization framework for VPP multi-market scheduling that integrates detailed device-level constraints, network limitations, and operational and market uncertainties. Conditional value-at-risk is incorporated to represent risk preferences, and Benders decomposition ensures tractability with extensive scenario sets. The model jointly optimizes bidding across energy and reserve markets while explicitly accounting for local flexibility procurement through dynamic network tariffs. The results from a realistic case study show that both risk-neutral and risk-averse strategies exploit arbitrage opportunities. However, risk aversion reduces profit volatility through closer alignment with physical dispatch. Dynamic tariffs unlock local flexibility by shifting demand across the day, though strong tariff signals reduce expected profitability by up to 65% with limited additional flexibility gains.
comment: Accepted to PSCC 2026
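As a small worked example of the conditional value-at-risk measure mentioned above, the sketch below evaluates CVaR for equally weighted loss scenarios using the Rockafellar-Uryasev form, CVaR_alpha = min over eta of eta + E[(L - eta)+] / (1 - alpha). The scenario values, the confidence level, and the function name are illustrative assumptions and are not taken from the paper's case study.

```python
def cvar(losses, alpha=0.95):
    """CVaR of equally weighted loss scenarios (Rockafellar-Uryasev form).

    The objective is convex and piecewise linear in eta with breakpoints at
    the scenario losses, so it is enough to evaluate it at those values.
    """
    n = len(losses)

    def objective(eta):
        excess = sum(max(loss - eta, 0.0) for loss in losses)
        return eta + excess / (n * (1.0 - alpha))

    return min(objective(eta) for eta in losses)


if __name__ == "__main__":
    losses = [float(v) for v in range(1, 11)]  # hypothetical scenario losses 1..10
    expected = sum(losses) / len(losses)
    tail = cvar(losses, alpha=0.8)             # average of the worst 20% of scenarios
    print(f"expected loss {expected:.2f}, CVaR(0.8) {tail:.2f}")
```

In a two-stage stochastic program, the same expression is typically linearized with one auxiliary variable per scenario, so the risk term can sit alongside expected profit in the objective with a weight that encodes risk preference.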
Asset Administration Shell-Based OCL Validation Framework for Model-Based System Engineering
Increasing complexity of modern enterprise systems and the demand for automation and interoperability require consistent and semantically validated models in Model-Based Systems Engineering (MBSE). The Object Constraint Language (OCL) supports formal definition of such constraint validations. However, MBSE models and OCL constraints are typically managed in separate tools, causing manual effort during model constraint application and result interpretation. To address this gap, this paper proposes an approach to managing OCL constraints and their validation results through Asset Administration Shells (a well-established technology for interoperability in enterprise systems). The methodology is demonstrated through a fictional industrial scenario, and to support reproducibility, all artifacts are publicly available in a GitHub repository.
comment: 7 Pages, Accepted to IFAC 2026 (23rd IFAC World Congress)
Real-Time Minimum-Energy Operating-Point Tracking for Battery-Powered Micro DC Motors Under Dynamically Variable Loading
Micro DC brushed motors are widely deployed in battery-powered biomedical systems, where limited energy budgets and variable physiological loading impose stringent efficiency and safety constraints. However, conventional actuation strategies rely on conservative voltage margins to avoid stalling, leading to systematic energy inefficiency. Furthermore, existing methods primarily optimize steady-state performance, neglecting the energy required to complete individual actuation cycles under dynamic conditions. This paper reveals that the energy consumption per mechanical cycle of a DC motor exhibits a non-monotonic dependence on driving voltage, with a load-dependent minimum that shifts with external loading. Based on this insight, we propose a real-time operating-point tracking method that enables the motor to autonomously converge to its minimum-energy condition. A lightweight load metric derived from current waveform features is introduced to detect load variation, and a two-phase adaptive voltage strategy is developed to track the optimal operating point online. Experimental results demonstrate that the proposed method can track the new minimum-energy operating region under both low-to-high and high-to-low loading transitions. With 3-cycle averaging, the mean response time is 11.55 s for the low-to-high case and 11.16 s for the high-to-low case, while the mean convergence voltage is 2.73 V and 2.0 V, respectively.
comment: 7 pages, 8 figures, 2026 IEEE/ASME International Conference on Advanced Intelligent Mechatronics
Regime-Adaptive Weighted Ensemble Learning for Computing-Driven Dynamic Load Forecasting in AI Data Centers
Short-term load forecasting for AI data centers presents new challenges because it is computing-driven, with heterogeneous job arrivals, sizes, and durations exhibiting bursty, non-stationary dynamics. Compared with traditional load types, data center loads are less researched and can pose greater threats to the efficiency and stability of power grids. To close the gap, this paper proposes a regime-adaptive ensemble learning forecasting algorithm to predict computing-driven dynamic workloads in AI data centers. A weight-learned neural network within an ensemble learning framework is developed to exploit the complementary strengths of two machine learning (ML) submodels across varying operating regimes. Furthermore, a novel feature engineering strategy is developed to incrementally learn from a non-stationary data stream. Thus, the ensemble weights are dynamically optimized to facilitate adaptive calibration of inter-submodel contributions. Comparative case studies on the MIT Supercloud dataset demonstrate that the proposed method significantly enhances load forecasting accuracy and adaptivity across various regimes, and the selected combination of ML models for ensemble learning outperforms other possible combinations. To the best of our knowledge, our method is the first to reduce minute-class forecasting errors for AI data center loads to below 1%, highlighting its potential for grid-interactive coordination and demand response.
Orbital Data Centers: Spacecraft Constraints and Economic Viability
Orbital data centers are being evaluated as solar-powered compute constellations and relay-integrated processing platforms. Their feasibility is not set by orbital solar flux alone, but by simultaneous closure of photovoltaic generation, eclipse recharge, radiative heat rejection, sustained space-to-ground communications, utilization, replacement cadence, and delivered compute-years over finite mission life. This paper derives necessary cluster-level competitiveness conditions using delivered information-technology (IT) electrical power $P_{\rm IT}$, deployed mass per delivered IT power $m_{\rm kW}$ in kg/kW, communication intensity $Γ=D_{\rm sg}/E_{\rm IT}$, sustained communication ceiling $Γ_{\max}$, effective utilization $U_{\rm eff}$, and lifetime penalty $Π_{\rm life}$. For a representative $P_{\rm IT}=1$ MW high-sunlight anchor, the base case gives beginning-of-life photovoltaic area $A^{\rm BOL}_{\rm PV}=5.64 \times 10^3\ {\rm m}^2$, radiator area $A_{\rm rad}=2.50 \times 10^3\ {\rm m}^2$, and 29.4 kg/kW for photovoltaic, storage, and radiator mass; fixed spacecraft mass raises the total to 34-59 kg/kW. At $m_{\rm kW} \approx 40$ kg/kW, a terrestrial infrastructure benchmark of 10-40 k\$/kW allows only 250-1000 \$/kg for the combined launch and spacecraft-build cost before space-to-ground communications, operations, utilization, and lifetime terms are included. That allowance is 3.4-13.5 times below the current public Falcon 9 dedicated low-Earth-orbit launch-price benchmark alone, before spacecraft build is included. Space-native preprocessing and communications-integrated edge compute are credible early regimes; terrestrial-user general compute closes only for low Earth-coupled communication intensity, high effective utilization, long delivered lifetime, and very low combined launch-plus-build cost.
comment: 29 pages, 5 figures, 10 tables
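The 250-1000 \$/kg allowance quoted in the abstract above follows from dividing the terrestrial benchmark (in \$/kW) by the deployed mass per delivered IT power (in kg/kW). The sketch below only reproduces that arithmetic from the stated numbers; the function name is hypothetical.

```python
def launch_build_allowance(benchmark_usd_per_kw, mass_kg_per_kw):
    """Allowed combined launch-plus-build cost in $/kg.

    ($/kW) divided by (kg/kW) leaves $/kg.
    """
    if mass_kg_per_kw <= 0:
        raise ValueError("mass per kW must be positive")
    return benchmark_usd_per_kw / mass_kg_per_kw


# Benchmark range 10-40 k$/kW at roughly 40 kg/kW, as stated in the abstract.
low = launch_build_allowance(10_000, 40)
high = launch_build_allowance(40_000, 40)
print(low, high)
```

Because the allowance scales inversely with mass per kilowatt, the 34-59 kg/kW range quoted for total spacecraft mass would shift these bounds accordingly.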
Real-Time GPU-Accelerated Monte Carlo Evaluation of Safety-Critical AEB Systems Under Uncertainty
Automatic Emergency Braking (AEB) systems represent a safety-critical national interest, with the National Highway Traffic Safety Administration (NHTSA) Federal Motor Vehicle Safety Standard (FMVSS No. 127) requiring AEB in all new light vehicles sold in the United States by September 2029. However, production implementations frequently rely on deterministic stopping-distance or Time-to-Collision (TTC) thresholds that fail to capture uncertainty in sensing, road conditions, and vehicle dynamics. This paper presents a GPU-accelerated Monte Carlo framework for stochastic evaluation of emergency braking performance using a high-fidelity longitudinal vehicle model incorporating aerodynamic drag, road grade, brake actuator dynamics, and weight transfer effects. A one-thread-per-sample execution strategy exploits the independence of Monte Carlo rollouts, while deterministic CPU-generated sampling ensures bit-exact numerical consistency between CPU and GPU implementations. The framework is evaluated across four hardware platforms spanning development and deployment environments: two laptop GPUs (GTX 1650, RTX 5070) and two automotive-grade embedded platforms (Jetson Orin Nano, Jetson AGX Orin). Peak speedups of 54.57x are achieved while maintaining exact numerical agreement. Real-time feasibility analysis with a complete AEB timing budget (700 ms human reaction time minus 120 ms perception and 50 ms decision overhead) demonstrates that the Jetson AGX Orin can execute approximately 25,000 Monte Carlo samples within a 530 ms budget, enabling real-time probabilistic AEB evaluation as part of a complete embedded pipeline. These results establish Monte Carlo-based uncertainty evaluation as a deployable runtime component rather than an offline validation tool and provide quantitative guidance for risk-aware AEB threshold selection under the NHTSA final rule.
comment: 10 pages, 6 figures. Submitted to IEEE journal for possible publication; under review
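The timing budget in the abstract above is a subtraction (700 ms reaction time minus 120 ms perception and 50 ms decision overhead), and the evaluation itself is a set of independent rollouts. The sketch below reproduces the budget arithmetic and runs a deliberately simplified Monte Carlo. The constant-friction stopping model, the sampling ranges, and all function names are assumptions of this sketch; the paper's longitudinal model with drag, grade, actuator dynamics, and weight transfer is not reproduced here.

```python
import random

G = 9.81  # gravitational acceleration in m/s^2


def compute_budget_ms(reaction_ms=700.0, perception_ms=120.0, decision_ms=50.0):
    """Time left for Monte Carlo evaluation after perception and decision overhead."""
    return reaction_ms - perception_ms - decision_ms


def collision_probability(speed_mps, gap_m, n_samples=25_000, seed=0):
    """Fraction of sampled rollouts whose stopping distance exceeds the available gap."""
    rng = random.Random(seed)  # fixed seed so repeated runs give identical results
    collisions = 0
    for _ in range(n_samples):
        mu = rng.uniform(0.4, 0.9)       # hypothetical tire-road friction range
        delay_s = rng.uniform(0.1, 0.3)  # hypothetical brake actuation delay
        # Distance travelled during the delay plus constant-deceleration braking distance.
        stopping_m = speed_mps * delay_s + speed_mps ** 2 / (2.0 * mu * G)
        if stopping_m > gap_m:
            collisions += 1
    return collisions / n_samples


budget_ms = compute_budget_ms()               # 530 ms, matching the abstract
per_sample_us = budget_ms * 1000.0 / 25_000   # average time available per sample
```

Each loop iteration is independent of the others, which is the property a one-thread-per-sample GPU mapping relies on. At 25,000 samples in 530 ms, the average per-sample allowance works out to about 21 microseconds.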
Learning to Spend: Model Predictive Control for Budgeting under Non-Stationary Returns
We study finite-horizon budget allocation as a closed-loop economic control problem and evaluate receding-horizon Model Predictive Control (MPC) relative to reactive budgeting policies. Budgets are allocated periodically under execution noise and operational constraints, while return efficiency may evolve over time. Using a controlled simulation framework motivated by digital marketing, we compare reactive pacing to MPC across environments with increasing degrees of non-stationarity. Our results show that non-stationarity alone does not justify predictive control. When return dynamics are stationary or evolve through unpredictable stochastic drift, MPC offers no systematic advantage over reactive baselines. By contrast, when return efficiency exhibits predictable structure over the planning horizon that is captured through an underlying model, MPC consistently outperforms reactive budgeting by exploiting intertemporal trade-offs.
comment: 8 pages, 0 figures
Efficient Graph Partitioning under Resource Constraints: A Cutting-Plane Framework for Distribution Grids
This paper presents an optimal network topology control framework using cutting-plane methods for efficient network partitioning with controllable edges. The objective is to enable real-time reconfiguration of interconnected sub-networks while ensuring radial connectivity, resource feasibility, and structured leader allocation, which are essential for distributed control, stability, and coordination. The problem is formulated as a mixed-integer program that integrates graph-theoretic constraints, resource flow, and network structural properties to enforce an operational hierarchy. To address the combinatorial complexity of cycle elimination and leader assignment, we propose an iterative cutting-plane framework that ensures convergence to an optimal and feasible network topology. Theoretical guarantees on optimality preservation, feasibility, and convergence are established, ensuring systematic elimination of infeasible configurations while maintaining distributed controllability. Simulations on a modified Iowa 240-bus power distribution grid demonstrate the framework's effectiveness in network reconfiguration under resource constraints. The approach achieves median and best-case speedups of 57.5x and over 64x in a 46-switch configuration, highlighting its applicability to other networked control systems.
Multidisciplinary Design Optimization for Wave-Driven Desalination Systems
Wave-driven desalination systems are an innovative solution to the global freshwater crisis, leveraging the complementary characteristics of seawater reverse osmosis and wave energy converters. However, the high costs of this system pose a significant barrier to widespread adoption. Optimization can help these systems reach a more competitive levelized cost of water, but the highly coupled nature of the system necessitates a multidisciplinary design optimization approach. This paper presents a holistic, multidisciplinary design optimization framework for wave-driven desalination system design, integrating models for wave energy converter hydrodynamics, power take-off transmission, seawater reverse osmosis constraints, and economic analysis. This study demonstrates the impact of multidisciplinary design optimization for wave-driven desalination systems, resulting in a 69.5% reduction in levelized cost of water compared to a nominal design. We demonstrate that multidisciplinary design optimization outperforms sequential design approaches, yielding lower levelized costs of water and substantially different optimal designs. The multidisciplinary design optimization results suggest major design changes compared to designs found in the literature. Notably, smaller wave energy converters and larger pistons, along with smaller accumulators and larger seawater reverse osmosis plant installations, are preferred. These design trends are consistent across a range of sea states, suggesting potential generalizability beyond a single location. This study demonstrates the importance of holistic modeling and co-design for wave-driven desalination systems and establishes an effective optimization framework for future studies to build upon.
L2RU: a Structured State Space Model with prescribed L2-bound
Structured state-space models (SSMs) have recently emerged as a powerful architecture at the intersection of machine learning and control, featuring layers composed of discrete-time linear time-invariant (LTI) systems followed by pointwise nonlinearities. These models combine the expressiveness of deep neural networks with the interpretability and inductive bias of dynamical systems, offering strong performance on long-sequence tasks with favorable computational complexity. However, their adoption in applications such as system identification and optimal control remains limited by the difficulty of enforcing stability and robustness in a principled and tractable manner. We introduce L2RU, a class of SSMs endowed with a prescribed $\mathcal{L}_2$-gain bound, guaranteeing input--output stability and robustness for all parameter values. The L2RU architecture is derived from free parametrizations of LTI systems satisfying an $\mathcal{L}_2$ constraint, enabling unconstrained optimization via standard gradient-based methods while preserving rigorous stability guarantees. Specifically, we develop two complementary parametrizations: a non-conservative formulation that provides a complete characterization of square LTI systems with a given $\mathcal{L}_2$-bound, and a conservative formulation that extends the approach to general (possibly non-square) systems while improving computational efficiency through a structured representation of the system matrices. Both parametrizations admit efficient initialization schemes that facilitate training long-memory models. We demonstrate the effectiveness of the proposed framework on a nonlinear system identification benchmark, where L2RU achieves improved performance and training stability compared to existing SSM architectures, highlighting its potential as a principled and robust building block for learning and control.
A Constant-Gain Equation-Error Framework for Airliner Aerodynamic Monitoring Using QAR Data
Monitoring the in-service aerodynamic performance of airliners is critical for operational efficiency and safety, but using operational Quick Access Recorder (QAR) data for this purpose presents significant challenges. This paper first establishes that the absence of key parameters, particularly aircraft moments of inertia, makes conventional state-propagation filters fundamentally unsuitable for this application. This limitation necessitates a decoupled, Equation-Error Method (EEM). However, we then demonstrate through a comparative analysis that standard recursive estimators with time-varying gains, such as Recursive Least Squares (RLS), also fail within an EEM framework, exhibiting premature convergence or instability when applied to low-excitation cruise data. To overcome these dual challenges, we propose and validate the Constant-Gain Equation-Error Method (CG-EEM). This framework employs a custom estimator with a constant, Kalman-like gain, which is perfectly suited to the stationary, low-signal-to-noise characteristics of cruise flight. The CG-EEM is extensively validated on a large, multi-fleet dataset of over 200 flights, where it produces highly consistent, physically plausible aerodynamic parameters and correctly identifies known performance differences between aircraft types. The result is a robust, scalable, and computationally efficient tool for fleet-wide performance monitoring and the early detection of performance degradation.
comment: © 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses
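The contrast drawn in the abstract above between time-varying-gain estimators and a constant-gain update can be sketched for a single parameter. This is only an illustration under assumptions: the scalar regression model `y = phi * theta`, the gain value, and the function name are hypothetical, and the paper's actual CG-EEM estimator is not specified in the abstract.

```python
def constant_gain_estimate(regressors, measurements, gain, theta0=0.0):
    """Recursive equation-error update with a fixed gain.

    The gain never decays, so the estimator keeps responding to new data,
    unlike recursive least squares where the gain shrinks over time.
    """
    theta = theta0
    history = []
    for phi, y in zip(regressors, measurements):
        error = y - phi * theta               # equation error for this sample
        theta = theta + gain * phi * error    # constant-gain correction
        history.append(theta)
    return history


# Noise-free toy data: the true parameter is 0.5 and the regressor is constant.
estimates = constant_gain_estimate([1.0] * 200, [0.5] * 200, gain=0.05)
print(round(estimates[-1], 4))
```

In this scalar setting the update is stable when `gain * phi**2` lies between 0 and 2; a smaller gain gives smoother but slower estimates.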
Improving Bayesian Optimization for Portfolio Management with an Adaptive Scheduling
Existing black-box portfolio management systems are prevalent in the financial industry due to commercial and safety constraints, though their performance can fluctuate dramatically with changing market regimes. Evaluating these non-transparent systems is computationally expensive, as fixed budgets limit the number of possible observations. Therefore, achieving stable and sample-efficient optimization for these systems has become a critical challenge. This work presents a novel Bayesian optimization framework (TPE-AS) that improves search stability and efficiency for black-box portfolio models under these limited observation budgets. Standard Bayesian optimization, which solely maximizes expected return, can yield erratic search trajectories and misalign the surrogate model with the true objective, thereby wasting the limited evaluation budget. To mitigate these issues, we propose a weighted Lagrangian estimator that leverages an adaptive schedule and importance sampling. This estimator dynamically balances exploration and exploitation by incorporating both the maximization of model performance and the minimization of the variance of model observations. It guides the search from broad, performance-seeking exploration towards stable and desirable regions as the optimization progresses. Extensive experiments and ablation studies, which establish our proposed method as the primary approach and other configurations as baselines, demonstrate its effectiveness across four backtest settings with three distinct black-box portfolio management models.
comment: 5 pages, 2 figures; version of record. ICAAI 2025, 9th International Conference on Advances in Artificial Intelligence (ICAAI 2025), November 14-16, 2025, Manchester, United Kingdom. ACM, New York, NY, USA, pages 21-25. Version 4, code repository added: https://github.com/pixelhero98/TPE-AS
Grid-Forming Characterization in DC Microgrids
DC microgrids are converter-based electrical networks that are increasingly being used in various applications, including data centers and industrial distribution systems. A central challenge in their operation is maintaining the DC-bus voltage within predefined limits while ensuring overall system stability. Although a wide variety of converter control algorithms has been proposed to achieve these objectives, the literature lacks a clear and physically interpretable framework for evaluating their effectiveness and for classifying and comparing them. Moreover, the grid-forming versus grid-following distinction that exists in AC systems has largely been unexplored in DC microgrids. To address this gap, this paper introduces three novel impedance-based indices that can be used to quantify the voltage-forming and current-forming behavior of a converter. The indices also provide a basis for defining the desired converter behavior that yields superior DC-bus voltage regulation performance. Simulation results illustrate the application of the framework to several representative control strategies and highlight the strengths and limitations of these control algorithms.
comment: Accepted for presentation at 2026 IEEE International Conference on DC Microgrids (ICDCM)
Impacts of Electric Vehicle Charging Regimes and Infrastructure Deployments on System Performance: An Agent-Based Study
The rapid growth of electric vehicles (EVs) requires more effective charging infrastructure planning. Infrastructure layout not only determines deployment cost, but also reshapes charging behavior and influences overall system performance. In addition, destination charging and en-route charging represent distinct charging regimes associated with different power requirements, which may lead to substantially different infrastructure deployment outcomes. This study applies an agent-based modeling framework to generate trajectory-level latent public charging demand under three charging regimes based on a synthetic representation of the Melbourne (Australia) metropolitan area. Two deployment strategies, an optimization-based approach and a utilization-refined approach, are evaluated across different infrastructure layouts. Results show that utilization-refined deployments reduce total system cost, accounting for both infrastructure deployment cost and user generalized charging cost, with the most significant improvement observed under the combined charging regime. In particular, a more effective allocation of AC slow chargers reshapes destination charging behavior, which in turn reduces unnecessary reliance on en-route charging and lowers detour costs associated with en-route charging. This interaction highlights the behavioral linkage between destination and en-route charging regimes and demonstrates the importance of accounting for user response and multiple charging regimes in charging infrastructure planning.
comment: 7 pages, 4 figures
End-to-end differentiable network traffic simulation with dynamic route choice
Optimization using network traffic models requires computing gradients of objective functions with respect to model parameters. However, derivation of such gradients has often been considered difficult or impractical due to their complexity and size. Conventional approaches rely on numerical differentiation or derivative-free methods that do not scale well with the parameter dimension, or on adjoint methods that require manual derivation for each specific model. This study proposes a novel end-to-end differentiable network traffic flow simulator based on automatic differentiation (AD), employing the Link Transmission Model (LTM) and a Dynamic User Optimum (DUO) route choice model. The LTM operates on continuous aggregate state variables through piecewise-linear min/max operations, which admit subgradients almost everywhere and thus require no smooth relaxation for AD. The DUO is also suitable for AD: although the shortest path search is itself discrete, the resulting diverge ratios at each node are continuous functions of per-destination vehicle counts and are thus differentiable. In order to demonstrate the capability of the proposed model, we solved a dynamic congestion toll optimization problem on the Chicago-Sketch dataset with approximately 2500 links, 1 million vehicles, a 3-hour duration, and 15000 decision variables. The proposed model successfully derived a high-quality solution in 3000 iterations, taking about 40 minutes. The simulator, implemented in Python and JAX, is released as open-source software named UNsim (https://github.com/toruseo/UNsim).
comment: refine and finalize the overall paper
Geometric Inverse Flight Dynamics on SO(3) and Application to Tethered Fixed-Wing Aircraft
We present a robotics-oriented, coordinate-free formulation of inverse flight dynamics for fixed-wing aircraft on SO(3). Translational force balance is written in the world frame and rotational dynamics in the body frame; aerodynamic directions (drag, lift, side) are defined geometrically, avoiding local attitude coordinates. Enforcing coordinated flight (no sideslip), we derive a closed-form trajectory-to-input map yielding the attitude, angular velocity, and thrust-angle-of-attack pair, and we recover the aerodynamic moment coefficients component-wise. Applying such a map to tethered flight on spherical parallels, we obtain analytic expressions for the required bank angle and identify a specific zero-bank locus where the tether tension exactly balances centrifugal effects, highlighting the decoupling between aerodynamic coordination and the apparent gravity vector. Under a simple lift/drag law, the minimal-thrust angle of attack admits a closed form. These pointwise quasi-steady inversion solutions become steady-flight trim when the trajectory and rotational dynamics are time-invariant. The framework bridges inverse simulation in aeronautics with geometric modeling in robotics, providing a rigorous building block for trajectory design and feasibility checks.
comment: ACCEPTED ICUAS 2026
Model Predictive Communication for Timely Status Updates in Low-Altitude Networks
Timely information delivery in low-altitude networks is critical for many time-sensitive applications, such as unmanned aerial vehicle (UAV) navigation, inspection, and surveillance. The key challenge lies in balancing three competing factors: stringent data freshness requirements, UAV onboard energy consumption, and interference with terrestrial services. Addressing this challenge requires not only efficient power and channel allocation strategies but also effective communication timing over the entire operation horizon. In this work, we propose a model predictive communication (MPComm) framework, enabled by advanced channel sensing techniques, in which the channel conditions that the UAV will experience are largely predictable. Within this framework, we formulate a constrained bi-objective optimization problem to achieve a desired trade-off between energy consumption and terrestrial channel occupation, subject to a strict timeliness constraint. We solve this problem using Pareto analysis and show that the original non-convex, mixed-integer problem can be decomposed into a two-layer structure: the outer layer determines the optimal communication timing, while the inner layer determines the optimal power and channel allocation for each communication interval. An efficient algorithm for the inner problem is developed using non-convex analysis, with asymptotic optimality guarantees, while the outer problem is solved optimally via a simple graph search, with edges characterized by inner solutions. The proposed approach applies to a broad class of problem variants, including objective transformations and single-objective specializations. Numerical results demonstrate the efficiency of the proposed solution, achieving up to a six-fold reduction in terrestrial channel occupation and a 6dB energy saving compared to benchmark schemes.
Explainable Representation of Finite-Memory Policies for POMDPs using Decision Trees AAMAS 2026
Partially Observable Markov Decision Processes (POMDPs) are a fundamental framework for decision-making under uncertainty and partial observability. Since in general optimal policies may require infinite memory, they are hard to implement and often render most problems undecidable. Consequently, finite-memory policies are mostly considered instead. However, the algorithms for computing them are typically very complex, and so are the resulting policies. Facing the need for their explainability, we provide a representation of such policies, both (i) in an interpretable formalism and (ii) typically of smaller size, together yielding higher explainability. To that end, we combine models of Mealy machines and decision trees; the latter describing simple, stationary parts of the policies and the former describing how to switch among them. We design a translation for policies of the finite-state-controller (FSC) form from standard literature and show how our method smoothly generalizes to other variants of finite-memory policies. Further, we identify specific properties of recently used "attractor-based" policies, which allow us to construct yet simpler and smaller representations. Finally, we illustrate the higher explainability in a few case studies.
comment: Full version of the extended abstract accepted at AAMAS 2026
CUDA Kernel Optimization and Counter-Free Performance Analysis for Depthwise Convolution in Cloud Environments
Efficient GPU execution of convolution operators is governed by memory-access efficiency, on-chip data reuse, and execution mapping rather than arithmetic throughput alone. This paper presents a controlled operator-level study of CUDA kernel optimization for the depthwise convolution used in Structured State Space Model Convolutional Diagonal (S4ConvD), together with a cloud-compatible, counter-free performance analysis methodology. The operator, model, dataset, and training configuration are fixed, and only the CUDA kernel implementation is varied. The evaluated CUDA kernels comprise naive, global-memory-coalesced, shared-memory cache-blocked, and warp-tiled variants, covering forward, input-gradient, and weight-gradient execution paths under steady-state training conditions. Performance is characterized using a counter-free methodology that combines CUDA-event timing, execution-path decomposition, analytically derived memory-traffic modeling, effective-bandwidth estimation, and roofline analysis. This enables profiling-like architectural insights without requiring hardware performance counters or privileged profiling access. The warp-tiled kernel reduces convolution runtime by $3.26\times$ relative to the naive CUDA baseline, while end-to-end training speedup reaches $1.29\times$. A PyTorch implementation is used separately for numerical validation and runtime context, but is not treated as a controlled architectural baseline. Forward and input-gradient paths benefit substantially from improved locality and on-chip data reuse, whereas the reduction-dominated weight-gradient path remains the primary bottleneck. The results demonstrate that meaningful architecture-level GPU kernel analysis can be performed reproducibly in restricted cloud environments, even without access to hardware performance counters.
comment: 12 pages, 9 figures. Submitted to IEEE Transactions on Parallel and Distributed Systems (TPDS)
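Two quantities from the abstract above lend themselves to a quick calculation: effective bandwidth from an analytically counted byte total and an event-timed runtime, and the relation between the $3.26\times$ kernel speedup and the $1.29\times$ end-to-end speedup. The sketch below applies Amdahl's law to the second, under the assumption (not stated in the abstract) that everything outside the convolution is unchanged. The function names and the example byte count are hypothetical.

```python
def effective_bandwidth_gb_per_s(bytes_moved, elapsed_ms):
    """Modeled memory traffic divided by measured runtime, in GB/s."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed time must be positive")
    return bytes_moved / (elapsed_ms * 1e-3) / 1e9


def implied_kernel_fraction(kernel_speedup, end_to_end_speedup):
    """Share of baseline runtime spent in the kernel, inverted from Amdahl's law.

    Amdahl: 1 / end_to_end = (1 - f) + f / kernel_speedup, solved for f.
    """
    return (1.0 - 1.0 / end_to_end_speedup) / (1.0 - 1.0 / kernel_speedup)


fraction = implied_kernel_fraction(3.26, 1.29)
print(f"{fraction:.3f}")
```

Under that assumption the convolution would account for roughly a third of the baseline training step, which is consistent with a large kernel-level gain producing a more modest end-to-end gain.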
Frequency Security Assessment in Power Systems With High Penetration of Renewables Considering Spatio-Temporal Frequency Distribution
The increasing integration of renewable energy sources exacerbates the spatial and temporal differences in frequency across the power system, posing a serious challenge to the accurate and efficient assessment of system frequency security. To address this issue, a generic effective nodal frequency (ENF) model is first established to concisely characterize nodal frequency dynamics. This model is characterized by the effective nodal inertia (ENI), damping, and primary regulation parameters, which retain only the dominant constant component governing nodal frequency dynamic performance. This model enables the tractable analytical formulation of nodal frequency trajectory and the key frequency security indicators. Quantitative analysis under the temporary power disturbance condition reveals that the ENI is the most influential parameter governing frequency security. Consequently, the critical nodal inertia for ensuring nodal frequency security is analytically derived. A system-level frequency security index based on the actual ENI and critical nodal inertia is proposed. On the basis of the proposed index, the system frequency security assessment is carried out with the procedure of "offline calculation and online evaluation", which is achieved using a lookup table approach and an interpolation method. Simulations on the modified IEEE 39-bus system verify the effectiveness of the proposed assessment method.
comment: 10 pages, 12 figures, article, 18 references
LMI Optimization Based Multirate Steady-State Kalman Filter Design
This paper presents an LMI-based design framework for multirate steady-state Kalman filters in systems with sensors operating at different sampling rates. The multirate system is formulated as a periodic time-varying system, where the Kalman gains converge to periodic steady-state values that repeat every frame period. Cyclic reformulation transforms this into a time-invariant problem; however, the resulting measurement noise covariance becomes semidefinite rather than positive definite, preventing direct application of standard Riccati equation methods. I address this through a dual LQR formulation with LMI optimization that naturally handles semidefinite covariances. The framework enables multi-objective design, supporting pole placement for guaranteed convergence rates and $l_2$-induced norm constraints for balancing average and worst-case performance. Numerical validation using an automotive navigation system with GPS and wheel speed sensors, including Monte Carlo simulation with 500 independent noise realizations, demonstrates that the proposed filter achieves a position RMSE well below the GPS noise level through effective multirate sensor fusion, and that the LMI solution provides valid upper bounds on the estimation error covariance.
A Nonlinear Separation Principle via Contraction Theory: Applications to Neural Networks, Control, and Learning
This paper establishes a nonlinear separation principle based on contraction theory and derives sharp stability conditions for recurrent neural networks (RNNs). First, we introduce a nonlinear separation principle that guarantees global exponential stability for the interconnection of a contracting state-feedback controller and a contracting observer, alongside parametric extensions for robustness and equilibrium tracking. Second, we derive sharp linear matrix inequality (LMI) conditions that guarantee the contractivity of both firing rate and Hopfield neural network architectures. We establish structural relationships among these certificates, demonstrating that continuous-time models with monotone non-decreasing activations maximize the admissible weight space, and extend these stability guarantees to interconnected systems and Graph RNNs. Third, we combine our separation principle and LMI framework to solve the output reference tracking problem for RNN-modeled plants. We provide LMI synthesis methods for feedback controllers and observers, and rigorously design a low-gain integral controller to eliminate steady-state error. Finally, we derive an exact, unconstrained algebraic parameterization of our contraction LMIs to design highly expressive implicit neural networks, achieving competitive accuracy and parameter efficiency on standard image classification benchmarks.
comment: arXiv admin note: text overlap with arXiv:2604.00119
Principled Learning-to-Communicate with Quasi-Classical Information Structures
Learning-to-communicate (LTC) in partially observable environments has received increasing attention in deep multi-agent reinforcement learning, where the control and communication strategies are jointly learned. Meanwhile, the impact of communication on decision-making has been extensively studied in control theory. In this paper, we seek to formalize and better understand LTC by bridging these two lines of work, through the lens of information structures (ISs). To this end, we formalize LTC in decentralized partially observable Markov decision processes (Dec-POMDPs) under the common-information-based framework from decentralized stochastic control, and classify LTC problems based on the ISs before (additional) information sharing. We first show that non-classical LTCs are computationally intractable in general, and thus focus on quasi-classical (QC) LTCs. We then propose a series of conditions for QC LTCs, under which LTC preserves the QC IS after information sharing, whereas violating them can cause computational hardness in general. Further, we develop provable planning and learning algorithms for QC LTCs, and establish quasi-polynomial time and sample complexities for several QC LTC examples that satisfy the above conditions. Along the way, we also establish new results on a relationship between (strictly) QC IS and the condition of having strategy-independent common-information-based beliefs (SI-CIBs), as well as on solving Dec-POMDPs without computationally intractable oracles but beyond those with SI-CIBs, which may be of independent interest.
comment: Preliminary version appeared at IEEE CDC 2025
Distributional Stability of Tangent-Linearized Gaussian Inference on Smooth Manifolds
Gaussian inference on smooth manifolds is central to robotics, but exact marginalization and conditioning are generally non-Gaussian and geometry-dependent. We study tangent-linearized Gaussian inference and derive explicit non-asymptotic $W_2$ stability bounds for projection marginalization and surface-measure conditioning. The bounds separate local second-order geometric distortion from nonlocal tail leakage and, for Gaussian inputs, yield closed-form diagnostics from $(μ,Σ)$ and curvature/reach surrogates. Circle and planar-pushing experiments validate the predicted calibration transition near $\sqrt{\|Σ\|_{\mathrm{op}}}/R\approx 1/6$ and indicate that normal-direction uncertainty is the dominant failure mode when locality breaks. These diagnostics provide practical triggers for switching from single-chart linearization to multi-chart or sample-based manifold inference. Code and Jupyter notebooks are available at https://github.com/mikigom/StabilityTLGaussian.
comment: To appear in IEEE Robotics and Automation Letters (IEEE RA-L)
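The locality diagnostic mentioned in the abstract above can be illustrated with a minimal Python sketch. It assumes the belief covariance is available as a dense matrix and that a curvature radius or reach surrogate `R` is known. The function names and the use of 1/6 as a hard switching threshold are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def locality_ratio(Sigma, R):
    """Return sqrt(||Sigma||_op) / R, with ||.||_op the spectral (operator) norm."""
    sigma_op = np.linalg.norm(np.asarray(Sigma, dtype=float), 2)  # largest singular value
    return float(np.sqrt(sigma_op) / R)

def should_leave_single_chart(Sigma, R, threshold=1.0 / 6.0):
    """Hypothetical trigger: flag when the ratio exceeds the reported transition value."""
    return locality_ratio(Sigma, R) > threshold

# Example: unit circle (R = 1) with two different belief covariances.
tight = np.diag([0.01, 0.01])   # ratio 0.1
loose = np.diag([0.01, 0.04])   # ratio 0.2
print(should_leave_single_chart(tight, 1.0), should_leave_single_chart(loose, 1.0))
```

In this reading, a ratio well below 1/6 suggests single-chart linearization is still in its calibrated regime, while a larger ratio is a cue to consider multi-chart or sample-based inference.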
Correct-by-Design Control Synthesis of Stochastic Multi-agent Systems: a Robust Tensor-based Solution
Discrete-time stochastic systems with continuous spaces are hard to verify and control, even with MDP abstractions, due to the curse of dimensionality. We propose an abstraction-based framework with robust dynamic programming mappings that deliver control strategies with provable lower bounds on temporal-logic satisfaction, quantified via approximate stochastic simulation relations. Exploiting decoupled dynamics, we reveal a Canonical Polyadic Decomposition tensor structure in value functions that makes dynamic programming scalable. The proposed method provides correct-by-design probabilistic guarantees for temporal logic specifications. We validate our results on continuous-state linear stochastic systems.
Electrostatic Clutch-Based Mechanical Multiplexer with Increased Force Capability
Robotic systems with many degrees of freedom (DoF) are constrained by the demands of dedicating a motor to each joint, and while mechanical multiplexing reduces actuator count, existing clutch designs are bulky, force-limited, or restricted to one output at a time. The problem addressed in this study is how to achieve high-force multiplexing that supports both simultaneous and sequential control from a single motor. Here we show an electrostatic capstan clutch-based transmission that enables both single-input-single-output (SISO) and single-input-multiple-output (SIMO) multiplexing. We demonstrated these on a four-DoF tendon-driven robotic hand where a single motor achieved output forces of up to 212 N, increased vertical grip strength by 4.09 times, and raised horizontal carrying capacity to 111.2 N, the highest currently among five-fingered tendon-driven robotic hands. These results demonstrate that electrostatic-based multiplexing provides versatile actuation, overcoming the limitations of prior systems.
A Unified Bayesian Framework for Data-Driven Smoothing, Prediction, and Control
Extending data-driven algorithms based on Willems' fundamental lemma to stochastic data often requires empirical and customized workarounds. This work presents a unified Bayesian framework for linear systems that provides a systematic and general method for handling stochastic data-driven tasks, including smoothing, prediction, and control, via maximum a posteriori estimation. This framework formulates a unified trajectory estimation problem for the three tasks by specifying different types of trajectory knowledge. Then, a Bayesian problem is solved that optimally combines trajectory knowledge with a data-driven characterization of the trajectory from offline data for correlated input-output uncertainties with elliptical distributions. Under specific conditions, this problem is shown to generalize existing data-driven prediction and control algorithms. Numerical examples demonstrate the performance of the unified approach for all three tasks against other data-driven and system identification approaches.
comment: This work has been accepted for presentation at the 2026 23rd IFAC World Congress
Robotics
Variational Neural Belief Parameterizations for Robust Dexterous Grasping under Multimodal Uncertainty
Contact variability, sensing uncertainty, and external disturbances make grasp execution stochastic. Expected-quality objectives ignore tail outcomes and often select grasps that fail under adverse contact realizations. Risk-sensitive POMDPs address this failure mode, but many use particle-filter beliefs that scale poorly, obstruct gradient-based optimization, and estimate Conditional Value-at-Risk (CVaR) with high-variance approximations. We instead formulate grasp acquisition as variational inference over latent contact parameters and object pose, representing the belief with a differentiable Gaussian mixture. We use Gumbel-Softmax component selection and location-scale reparameterization to express samples as smooth functions of the belief parameters, enabling pathwise gradients through a differentiable CVaR surrogate for direct optimization of tail robustness. In simulation, our variational neural belief improves robust grasp success under contact-parameter uncertainty and exogenous force perturbations while reducing planning time by roughly an order of magnitude relative to particle-filter model-predictive control. On a serial-chain robot arm with a multifingered hand, we validate grasp-and-lift success under object-pose uncertainty against a Gaussian baseline. Both methods succeed on the tested perturbations, but our controller terminates in fewer steps and less wall-clock time while achieving a higher tactile grasp-quality proxy. Our learned belief also calibrates risk more accurately, keeping mean absolute calibration error below 0.14 across tested simulation regimes, compared with 0.58 for a Cross-Entropy Method planner.
comment: 11 pages, 10 figures
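To make the contrast between expected-quality and tail-sensitive objectives concrete, the following Python sketch computes a plain empirical CVaR over sampled grasp losses. It is a generic textbook estimator, not the differentiable surrogate described in the abstract. The function name, the loss convention (higher is worse), and the sample values are assumptions made for illustration.

```python
import numpy as np

def empirical_cvar(losses, alpha=0.9):
    """Mean of the worst (1 - alpha) fraction of sampled losses (higher loss = worse)."""
    losses = np.sort(np.asarray(losses, dtype=float))
    # Number of tail samples; always keep at least one.
    k = max(1, int(np.ceil((1.0 - alpha) * losses.size)))
    return float(losses[-k:].mean())

# Hypothetical loss samples for one candidate grasp: mostly good, two bad outcomes.
losses = np.array([0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.1, 0.9, 1.0])
print("expected loss:", losses.mean())
print("CVaR at alpha=0.8:", empirical_cvar(losses, alpha=0.8))
```

A grasp can look acceptable under the mean while its CVaR exposes the rare failures, which is the failure mode that expected-quality objectives miss.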
No Pedestrian Left Behind: Real-Time Detection and Tracking of Vulnerable Road Users for Adaptive Traffic Signal Control
Current pedestrian crossing signals operate on fixed timing without adjustment to pedestrian behavior, which can leave vulnerable road users (VRUs) such as the elderly, disabled, or distracted pedestrians stranded when the light changes. We introduce No Pedestrian Left Behind (NPLB), a real-time adaptive traffic signal system that monitors VRUs in crosswalks and automatically extends signal timing when needed. We evaluated five state-of-the-art object detection models on the BGVP dataset, with YOLOv12 achieving the highest mean Average Precision at 50% (mAP@0.5) of 0.756. NPLB integrates our fine-tuned YOLOv12 with ByteTrack multi-object tracking and an adaptive controller that extends pedestrian phases when remaining time falls below a critical threshold. Through 10,000 Monte Carlo simulations, we demonstrate that NPLB improves VRU safety by 71.4%, reducing stranding rates from 9.10% to 2.60%, while requiring signal extensions in only 12.1% of crossing cycles.
comment: © Anas Gamal Aly and Hala ElAarag, 2026. This is the authors' version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record will be published in Proceedings of the 2026 ACM Southeast Conference (ACMSE 2026)
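The reported 71.4% figure is consistent with a relative reduction in stranding rate, and the extension rule can be sketched in a few lines of Python. The threshold and extension durations below are hypothetical placeholders, since the abstract does not state them, and the function names are illustrative only.

```python
def relative_reduction(before, after):
    """Fractional reduction from `before` to `after`."""
    return (before - after) / before

def pedestrian_phase_remaining(remaining_s, vru_in_crosswalk,
                               threshold_s=5.0, extension_s=4.0):
    """Extend the pedestrian phase if a VRU is still crossing and time is nearly up.

    threshold_s and extension_s are assumed values for illustration.
    """
    if vru_in_crosswalk and remaining_s < threshold_s:
        return remaining_s + extension_s
    return remaining_s

# Stranding rate drops from 9.10% to 2.60% in the reported simulations.
print(round(100 * relative_reduction(9.10, 2.60), 1))
```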
Privileged Foresight Distillation: Zero-Cost Future Correction for World Action Models
World action models jointly predict future video and action during training, raising an open question about what role the future-prediction branch actually plays. A recent finding shows that this branch can be removed at inference with little to no loss on common manipulation benchmarks, suggesting that future information may act merely as a regularizer on the shared visual backbone. We propose instead that joint training induces an action-conditioned correction that privileged future observations impose on action denoising, and that current-only policies capture this correction only partially. Making the account precise, we formulate privileged foresight as a residual in the action-denoising direction -- the difference between what a model predicts given the true future and what it predicts given only the current frame -- and introduce Privileged Foresight Distillation (PFD), which transfers this residual from a training-time teacher into a small adapter on a current-only student. The teacher and student share the same backbone and differ only in the attention mask over video tokens; future video is never generated at inference. Controlled experiments verify that this gain reflects a genuine future-conditioned correction rather than a side effect of capacity or regularization. Empirically, PFD achieves consistent improvements on LIBERO and RoboTwin manipulation benchmarks while preserving the current-only inference interface at negligible added latency. This view reframes the role of future information in world action models: not as a target to predict, nor as a regularizer to absorb, but as a compressible correction to be distilled.
KinDER: A Physical Reasoning Benchmark for Robot Learning and Planning
Robotic systems that interact with the physical world must reason about kinematic and dynamic constraints imposed by their own embodiment, their environment, and the task at hand. We introduce KinDER, a benchmark for Kinematic and Dynamic Embodied Reasoning that targets physical reasoning challenges arising in robot learning and planning. KinDER comprises 25 procedurally generated environments, a Gymnasium-compatible Python library with parameterized skills and demonstrations, and a standardized evaluation suite with 13 implemented baselines spanning task and motion planning, imitation learning, reinforcement learning, and foundation-model-based approaches. The environments are designed to isolate five core physical reasoning challenges: basic spatial relations, nonprehensile multi-object manipulation, tool use, combinatorial geometric constraints, and dynamic constraints, disentangled from perception, language understanding, and application-specific complexity. Empirical evaluation shows that existing methods struggle to solve many of the environments, indicating substantial gaps in current approaches to physical reasoning. We additionally include real-to-sim-to-real experiments on a mobile manipulator to assess the correspondence between simulation and real-world physical interaction. KinDER is fully open-sourced and intended to enable systematic comparison across diverse paradigms for advancing physical reasoning in robotics. Website and code: https://prpl-group.com/kinder-site/
comment: Project website: https://prpl-group.com/kinder-site/. 21 pages, 8 figures. Accepted to Robotics Science and Systems (RSS), 2026
EOS-Bench: A Comprehensive Benchmark for Earth Observation Satellite Scheduling
Earth observation satellite imaging scheduling is a challenging NP-hard combinatorial optimisation problem central to space mission operations. While next-generation agile Earth observation satellites (EOS) increase operational flexibility, they also significantly raise scheduling complexity. The lack of a unified, open-source benchmark makes it difficult to compare algorithms across studies. This paper introduces EOS-Bench, a comprehensive framework for systematic and reproducible evaluation of scheduling methods. By integrating high-fidelity orbital dynamics and platform constraints, EOS-Bench generates 1,390 scenarios and 13,900 benchmark instances, spanning from small-scale validation cases to large coordination problems with up to 1,000 satellites and 10,000 requests. We further propose a scenario characterisation scheme to quantify structural difficulty based on factors such as opportunity density, task flexibility, conflict intensity, and satellite congestion. A multidimensional evaluation protocol is introduced, assessing performance across five metrics: task profit, completion rate, workload balance, timeliness, and runtime. The framework is evaluated using mixed-integer programming, heuristics, meta-heuristics, and deep reinforcement learning across both agile and non-agile settings. Results show that EOS-Bench effectively distinguishes solver performance across scales and conditions, revealing trade-offs between solution quality and computational efficiency, and providing deeper insight into scenario complexity. EOS-Bench offers a unified and extensible open testbed for advancing research in Earth observation satellite scheduling. The code and data are available at https://github.com/Ethan19YQ/EOS-Bench.
Sensitivity-Based Tube NMPC for Cooperative Aerial Structures Under Parametric Uncertainty
This paper presents a sensitivity-based tube Nonlinear Model Predictive Control (NMPC) framework for cooperative aerial chains under bounded parametric uncertainty. We consider a planar two-vehicle chain connected by rigid links, modeled with input-rate actuation to enforce slew-rate and magnitude limits on thrust and torque. Robustness to uncertainty in link mass, length, and inertia is achieved by propagating first-order parametric state sensitivities along the horizon and using them to compute online constraint-tightening margins. We robustify an inter-link separation constraint, implemented via a smooth cosine embedding, and thrust-magnitude bounds. The method is implemented in MATLAB and evaluated with boundary-hugging maneuvers and Monte-Carlo uncertainty sampling. Results show improved constraint margins under uncertainty with tracking performance comparable to nominal NMPC.
comment: Accepted to the 2026 International Conference on Unmanned Aircraft Systems, ICUAS 2026
Threat-Oriented Digital Twinning for Security Evaluation of Autonomous Platforms DSN
Open, unclassified research on secure autonomy is constrained by limited access to operational platforms, contested communications infrastructure, and representative adversarial test conditions. This paper presents a threat-oriented digital twinning methodology for cybersecurity evaluation of learning-enabled autonomous platforms. The approach is instantiated as an open-source, modular twin of a representative autonomy stack with separated sensing, autonomy, and supervisory-control functions; confidence-gated multi-modal perception; explicit command and telemetry trust boundaries; and runtime hold-safe behavior. The contribution is methodological: a reproducible design pattern that translates threat analysis into observable, controllable tests for spoofing, replay, malformed-input injection, degraded sensing, and adversarial ML stress. Although the implemented proxy is ground based, the architecture is intentionally framed around stack elements shared with UAV and space systems, including constrained onboard compute, intermittent or high-latency links, probabilistic perception, and mission-critical recovery behavior. The result is an implementable research scaffold for dependable and secure autonomy studies across UAV and space domains.
comment: Camera-ready version accepted for presentation at and publication in the proceedings of the 2026 56th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W): Dependable and Secure Autonomous Systems (DSAS)
Reference-Augmented Learning for Precise Tracking Policy of Tendon-Driven Continuum Robots
Tendon-Driven Continuum Robots (TDCRs) pose significant control challenges due to their highly nonlinear, path-dependent dynamics and non-Markovian characteristics. Traditional Jacobian-based controllers often struggle with hysteresis-induced oscillations, while conventional learning-based approaches suffer from poor generalization to out-of-distribution trajectories. This paper proposes a reference-augmented offline learning framework for precise 6-DOF tracking control of TDCRs. By leveraging a differentiable RNN-based dynamics surrogate as a gradient bridge, we optimize a control policy through an augmented reference distribution. This multi-scale augmentation scheme incorporates stochastic bias, harmonic perturbations, and random walks, forcing the policy to internalize diverse tracking error recovery mechanisms without additional hardware interaction. Experimental results on a three-section TDCR platform demonstrate that the proposed policy achieves a 50.9% reduction in average position error compared to non-augmented baselines and significantly outperforms Jacobian-based methods in both precision and stability across various speeds.
Learning-Based Dynamics Modeling and Robust Control for Tendon-Driven Continuum Robots
Tendon-Driven Continuum Robots (TDCRs) pose significant modeling and control challenges due to complex nonlinearities, such as frictional hysteresis and transmission compliance. This paper proposes a differentiable learning framework that integrates high-fidelity dynamics modeling with robust neural control. We develop a GRU-based dynamics model featuring bidirectional multi-channel connectivity and residual prediction to effectively suppress compounding errors during long-horizon auto-regressive prediction. By treating this model as a gradient bridge, an end-to-end neural control policy is optimized through backpropagation, allowing it to implicitly internalize compensation for intricate nonlinearities. Experimental validation on a physical three-section TDCR demonstrates that our framework achieves accurate tracking and superior robustness against unseen payloads, outperforming Jacobian-based methods by eliminating self-excited oscillations.
GEGLU-Transformer for IMU-to-EMG Estimation with Few-Shot Adaptation
Reliable estimation of neuromuscular activation is a key enabler for adaptive and personalized control in wearable robotics. However, surface electromyography (EMG) remains difficult to deploy robustly outside laboratory settings due to electrode sensitivity, signal non-stationarity, and strong subject dependence. In this work, we propose an adaptive IMU-to-EMG learning framework that reconstructs continuous muscle activation envelopes from wearable inertial measurements across heterogeneous movement conditions. The approach combines a Transformer encoder with Gaussian Error Gated Linear Units (GEGLU-Transformer) to enhance cross-subject generalization and enable rapid subject-specific personalization. Under a strict leave-one-subject-out (LOSO) protocol on a multi-condition lower-limb biomechanics dataset, the proposed architecture achieves r = 0.706 +/- 0.139 and R^2 = 0.474 +/- 0.208 without subject-specific adaptation. With only 0.5% adaptation data, performance increases to r = 0.761 +/- 0.030 and R^2 = 0.559 +/- 0.047, demonstrating rapid adaptation and early performance saturation. These results support attention-based architectures combined with lightweight adaptation as a practical and scalable alternative to direct EMG sensing for real-world wearable robotic applications.
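For readers unfamiliar with the gating unit named in the abstract above, here is a minimal NumPy sketch of a GEGLU layer in its commonly used form, GELU(xW) multiplied elementwise by xV. Bias terms are omitted, the tanh approximation of GELU is used, and the shapes are arbitrary. None of this is taken from the paper's implementation.

```python
import numpy as np

def gelu(x):
    """Tanh approximation of the Gaussian Error Linear Unit."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def geglu(x, W, V):
    """GEGLU gate: GELU(x @ W) elementwise-multiplied with the linear branch x @ V."""
    return gelu(x @ W) * (x @ V)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))    # 4 time steps, 8 IMU-derived features (assumed sizes)
W = rng.standard_normal((8, 16))   # gate projection
V = rng.standard_normal((8, 16))   # value projection
print(geglu(x, W, V).shape)
```

In a Transformer encoder this unit typically replaces the first linear layer and activation of the feed-forward block, followed by an output projection back to the model width.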
SlicerRoboTMS: An Open-Source 3D Slicer Extension for Robot-Assisted Transcranial Magnetic Stimulation
Robot-assisted Transcranial Magnetic Stimulation (Robo-TMS) is an image-guided robotic intervention that enhances the accuracy and reproducibility of conventional Transcranial Magnetic Stimulation (TMS), a widely used non-invasive brain stimulation procedure in clinical treatment and neuroscience research. Despite its potential, the development of Robo-TMS remains challenging due to the need for multidisciplinary expertise spanning medical imaging, computer vision, and robotics. This paper presents SlicerRoboTMS, an open-source 3D Slicer extension that provides a unified interaction infrastructure for Robo-TMS research. By leveraging 3D Slicer's medical image computing and visualisation capabilities, the extension supports Magnetic Resonance Imaging (MRI)-based neuronavigation and interfaces with robotic systems through standardised communication protocols and configurable system descriptions. An example integration is presented to demonstrate how SlicerRoboTMS can be incorporated into a representative Robo-TMS workflow. Designed to support diverse hardware configurations and rapid prototyping, SlicerRoboTMS lowers the barrier to entry and facilitates reproducible and extensible research in Robo-TMS. The extension is available at https://github.com/OpenRoboTMS/SlicerRoboTMS.
comment: Accepted by the 48th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 2026
SAMe: A Semantic Anatomy Mapping Engine for Robotic Ultrasound
Robotic ultrasound has advanced local image-driven control, contact regulation, and view optimization, yet current systems lack the anatomical understanding needed to determine what to scan, where to begin, and how to adapt to individual patient anatomy. These gaps make systems still reliant on expert intervention to initiate scanning. Here we present SAMe, a semantic anatomy mapping engine that provides robotic ultrasound with an explicit anatomical prior layer. SAMe addresses scan initiation as a target-to-anatomy-to-action process: it grounds under-specified clinical complaints into structured target organs, instantiates a patient-specific anatomical representation for the grounded targets from a single external body image, and translates this representation into control-facing 6-DoF probe initialization states without any additional registration using preoperative CT or MRI. The anatomical representation maintained by SAMe is explicit, lightweight (single-organ inference in 0.08s), and compatible with downstream control by design. Across semantic grounding, anatomical instantiation, and real-robot evaluation, SAMe shows strong performance across the full initialization pipeline. In real-robot experiments, SAMe achieved overall organ-hit rates of 97.3% for liver initialization and 81.7% for kidney initialization across the evaluated target sets. Even when restricted to the centroid target, SAMe outperformed the surface-heuristic baseline for both liver and kidney initialization. These results establish an explicit anatomical prior layer that addresses scan initialization and is designed to support broader downstream autonomous scanning pipelines, providing the anatomical foundation for complaint-driven, anatomically informed robotic ultrasonography.
comment: Supplementary information included. Code will be released at https://github.com/MiliLab/Echo-SAMe
Improving Sensing Coverage and Compliance of 3D-Printed Artificial Skins Through Multi-Modal Sensing and Soft Materials ICRA
3D-printed artificial skins are a scalable approach to whole-body tactile and proximity coverage, but prior implementations have been limited to unimodal sensing and rigid materials. To improve the practical usability of 3D-printed artificial skins, we present a hybrid time-of-flight (ToF) and self-capacitance (SC) sensing skin that demonstrates multi-modal sensing integration, soft compliant coverings for impact absorption and pressure sensing, and a streamlined electrical interface between printed conductive traces and external electronics. We show that combining ToF and SC modalities enables contact detection, scene reconstruction, and pressure-correlated tactile responses with the compliant covering by deploying six artificial skin units with 40 sensing elements over an FR3 robot arm.
comment: This work was accepted at the "Towards Large-Area Tactile Sensing Skins: From Scalable Materials to Embodied Robotic Perception" workshop at the International Conference on Robotics and Automation (ICRA) 2026
Egocentric Tactile and Proximity Sensors as Observation Priors for Humanoid Collision Avoidance ICRA
Collision-free motion is often aided by tactile and proximity sensors distributed on the body of the robot because, unlike external cameras, they are resistant to occlusion. However, how to shape sensor properties, such as sensing coverage, type, and range, to enable avoidant behavior remains unclear. In this work, we present a reinforcement learning framework for whole-body collision avoidance on a humanoid H1-2 robot and use it to characterize how sensor properties shape learned avoidance behavior. Using dodgeball as a benchmark task, we ablate the properties of sensors distributed across the upper body of the robot and find that raw proximity measurements can substitute for explicit object localization provided the sensing range is sufficient and that sparse non-directional proximity signals outpace dense directional alternatives in sample efficiency.
comment: This work was accepted at the 8th RoboTac Workshop at the International Conference on Robotics and Automation (ICRA) 2026
Bridging the Indoor-Outdoor Gap: Cross-Technology Ranging for Seamless Robot Navigation
Mobile robots that move between outdoor and indoor environments still struggle with consistent positioning. Satellite-based and terrestrial ranging each work well in their home domains, but combining them at the raw measurement level has received little attention, and the building boundary is precisely where both classes degrade. This paper reports preliminary observations from the HYMN dataset, which time-synchronizes raw measurements from GNSS, Ultra-Wideband (UWB), WiFi Fine Time Measurement (FTM), and Bluetooth Low Energy (BLE) against millimeter-level ground truth in an industrial setting. Per-zone measurement availability and ranging-residual behavior are characterised. The two technology classes turn out to be complementary, and the indoor-outdoor transition is where their weaknesses overlap. The dataset is publicly available.
GS-Playground: A High-Throughput Photorealistic Simulator for Vision-Informed Robot Learning
Embodied AI research is undergoing a shift toward vision-centric perceptual paradigms. While massively parallel simulators have catalyzed breakthroughs in proprioception-based locomotion, their potential remains largely untapped for vision-informed tasks due to the prohibitive computational overhead of large-scale photorealistic rendering. Furthermore, the creation of simulation-ready 3D assets heavily relies on labor-intensive manual modeling, while the significant sim-to-real physical gap hinders the transfer of contact-rich manipulation policies. To address these bottlenecks, we propose GS-Playground, a multi-modal simulation framework designed to accelerate end-to-end perceptual learning. We develop a novel high-performance parallel physics engine, specifically designed to integrate with a batch 3D Gaussian Splatting (3DGS) rendering pipeline to ensure high-fidelity synchronization. Our system achieves a breakthrough throughput of 10^4 FPS at 640x480 resolution, significantly lowering the barrier for large-scale visual RL. Additionally, we introduce an automated Real2Sim workflow that reconstructs photorealistic, physically consistent, and memory-efficient environments, streamlining the generation of complex simulation-ready scenes. Extensive experiments on locomotion, navigation, and manipulation demonstrate that GS-Playground effectively bridges the perceptual and physical gaps across diverse embodied tasks. Project homepage: https://gsplayground.github.io.
comment: Robotics: Science and Systems 2026
Leveraging Previous-Traversal Point Cloud Map Priors for Camera-Based 3D Object Detection and Tracking
Camera-based 3D object detection and tracking are central to autonomous driving, yet precise 3D object localization remains fundamentally constrained by depth ambiguity when no expensive, depth-rich online LiDAR is available at inference. In many deployments, however, vehicles repeatedly traverse the same environments, making static point cloud maps from prior traversals a practical source of geometric priors. We propose DualViewMapDet, a camera-only inference framework that retrieves such map priors online and leverages them to mitigate the absence of a LiDAR sensor during deployment. The key idea is a dual-space camera-map fusion strategy that avoids one-sided view conversion. Specifically, we (i) project the map into perspective view (PV) and encode multi-channel geometric cues to enrich image features and support BEV lifting, and (ii) encode the map directly in bird's-eye view (BEV) with a sparse voxel backbone and fuse it with lifted camera features in a shared metric space. Extensive evaluations on nuScenes and Argoverse 2 demonstrate consistent improvements over strong camera-only baselines, with particularly strong gains in object localization. Ablations further validate the contributions of PV/BEV fusion and prior-map coverage. We make the code and pre-trained models available at https://dualviewmapdet.cs.uni-freiburg.de .
Robust Graph Matching through Semantic Relationship Generation for SLAM
Graph-based representations such as Scene Graphs enable localization in structured indoor environments by matching a locally observed graph, constructed from sensor data, to a prior map. This process is particularly challenging in environments with repetitive or symmetric layouts, where structural cues alone are often insufficient to resolve ambiguities. We propose a semantic-enhanced graph matching approach that explicitly models relations between detected objects and structural elements, such as rooms and wall planes. Objects are detected from RGB-D data and integrated into the graph, and their relations to structural elements are exploited to filter candidate correspondences prior to geometric verification, significantly reducing ambiguity and search complexity. The proposed method is integrated within the iS-Graphs framework and evaluated in synthetic and simulated environments. Results show that semantic relations significantly reduce the number of candidate matches, improve computational efficiency, and enable faster convergence, particularly in symmetric scenarios where purely geometric approaches fail.
comment: 7 pages, 5 figures
COMPASS: COmpact Multi-channel Prior-map And Scene Signature for Floor-Plan-Based Visual Localization
Architectural floor plans are widely available priors which contain not only geometry but also the semantic information of the environment, yet existing localization methods largely ignore this semantic information. To address this, we present COMPASS, an algorithm that exploits both geometric and semantic priors from floor plans to estimate the pose of a robot equipped with dual fisheye cameras. Inspired by scan context descriptor from LiDAR-based place recognition, we design a multi-channel radial descriptor that encodes the geometric layout surrounding a position. From the floor plan, rays are cast in 360 azimuth bins and the results are encoded into five channels: normalized range, structural hit type (wall, window, or opening), range gradient, inverse range, and local range variance. From the image side, the same descriptor structure is populated by detecting structural elements in the fisheye imagery. As a first step toward full cross-modal matching, we present a window detection algorithm for fisheye images that uses a line segment detector to identify window frames via vertical edge clustering and brightness verification. Detected windows are projected to azimuthal bearings through the fisheye camera model, producing the hit-type channel of the visual descriptor. As a proof of concept, we generate both descriptors at a single known pose from the Hilti-Trimble SLAM Challenge 2026 dataset and demonstrate that the wall-window pattern extracted from the first frame of each camera closely matches the floor plan descriptor, validating the feasibility of cross-modal structural matching.
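The five-channel radial descriptor described above can be sketched in Python as follows. The sketch assumes the ray casting has already produced one range and one hit-type code per azimuth bin. The normalization range, the central-difference gradient, the variance window, and the integer encoding of hit types are all assumptions for illustration, not the authors' definitions.

```python
import numpy as np

def radial_descriptor(ranges, hit_types, max_range=20.0, window=5, eps=1e-6):
    """Build a (5, n_bins) descriptor from per-bin ranges and hit-type codes.

    Channels: normalized range, hit type, range gradient, inverse range,
    local range variance. Bins are treated as circular in azimuth.
    """
    ranges = np.asarray(ranges, dtype=float)
    hit_types = np.asarray(hit_types, dtype=float)   # e.g. 0=wall, 1=window, 2=opening (assumed)
    norm_range = np.clip(ranges / max_range, 0.0, 1.0)
    # Central difference with wrap-around between the last and first bin.
    gradient = (np.roll(ranges, -1) - np.roll(ranges, 1)) / 2.0
    inverse = 1.0 / (ranges + eps)
    # Variance over a circular sliding window of `window` neighbouring bins.
    half = window // 2
    neighbours = np.stack([np.roll(ranges, s) for s in range(-half, half + 1)])
    variance = neighbours.var(axis=0)
    return np.stack([norm_range, hit_types, gradient, inverse, variance])

# Example: a position where every ray hits a wall 5 m away.
desc = radial_descriptor(np.full(360, 5.0), np.zeros(360))
print(desc.shape)
```

Because both the floor-plan side and the image side would fill the same array layout, candidate poses could in principle be compared channel by channel, for example over circular shifts in azimuth.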
ASAP: An Azimuth-Priority Strip-Based Search Approach to Planar Microphone Array DOA Estimation in 3D
Direction-of-arrival (DOA) estimation is an important task in microphone array processing and many downstream applications. The steered response power with phase transform (SRP-PHAT) method has been widely adopted for DOA estimation in recent years. However, accurate SRP-PHAT estimation in 3D scenarios requires evaluating steering responses over thousands of candidate directions, severely limiting real-time performance on resource-constrained platforms. This challenge becomes even more critical for planar arrays, which are widely used in robotics due to their structural simplicity. Motivated by the fact that azimuth estimation is usually more reliable than elevation estimation for most arrays, we propose ASAP, an azimuth-priority strip-based search approach to planar microphone array DOA estimation in 3D. In the first stage, ASAP performs coarse-to-fine region contraction within azimuthal strips to lock azimuth angles while retaining multiple maxima through spherical caps. In the second stage, it refines elevation along the great-circle arc between two close candidates. Extensive simulations and real-world experiments validate the efficiency and merits of the proposed method over existing approaches.
comment: This paper has been accepted to the Fourteenth IEEE Sensor Array and Multichannel Signal Processing Workshop, 2026
ProDrive: Proactive Planning for Autonomous Driving via Ego-Environment Co-Evolution CVPR 2026
End-to-end autonomous driving planners typically generate trajectories from current observations alone. However, real-world driving is highly dynamic, and such reactive planning cannot anticipate future scene evolution, often leading to myopic decisions and safety-critical failures. We propose ProDrive, a world-model-based proactive planning framework that enables ego-environment co-evolution for autonomous driving. ProDrive jointly trains a query-centric trajectory planner and a bird's-eye-view (BEV) world model end-to-end: the planner generates diverse candidate trajectories and planning-aware ego tokens, while the world model predicts future scene evolution conditioned on them. By injecting planner features into the world model and evaluating all candidates in parallel, ProDrive preserves end-to-end gradient flow and allows future outcome assessment to directly shape planning. This bidirectional coupling enables proactive planning beyond current-observation-driven decision-making. Experiments on NAVSIM v1 show that ProDrive outperforms strong baselines in both safety and planning efficiency, while ablations validate the effectiveness of the proposed ego-environment coupling design.
comment: Accepted to CVPR 2026 GigaBrain Challenge Workshop
ANCHOR: A Physically Grounded Closed-Loop Framework for Robust Home-Service Mobile Manipulation
Recent advances in open-vocabulary mobile manipulation have brought robots into real domestic environments. In such settings, reliable long-horizon execution under open-set object references and frequent disturbances becomes essential. However, many failures persist. These are not caused by semantic misunderstanding but by inconsistencies between symbolic plans and the evolving physical world, manifested as three recurring limitations: (i) existing systems often rely on pre-scanned semantic maps that become inconsistent after scene changes and disturbances; (ii) they select navigation endpoints without considering downstream manipulation feasibility, causing the "arrived but inoperable" problem; and (iii) they handle anomalies through undifferentiated global replanning, which often fails to contain local errors. To address this execution inconsistency, we present ANCHOR, a physically grounded closed-loop framework that aligns symbolic reasoning with verifiable physical state during execution. ANCHOR integrates three mechanisms: (i) physically anchored task planning, which binds symbolic predicates to observable geometric anchors and re-validates them after each action; (ii) operability-aware base alignment, which ensures that navigation endpoints satisfy kinematic reachability and local collision feasibility; and (iii) minimum-responsible-layer hierarchical recovery, which localizes failures across perception, base-arm coordination, and execution layers to prevent cascading retries. Across 60 real-robot trials in previously unseen environments, ANCHOR improves task success from 53.3% to 71.7% and achieves a 71.4% recovery rate under perturbations, demonstrating that explicit physical grounding and structured failure containment are critical for robust mobile manipulation. Our project page is available at https://anchor9178.github.io/ANCHOR/ .
Slot-hopping Enabled Loiter Guidance and Automation for Fixed-wing UAV Corridors
This paper addresses the problem of traffic congestion management in fixed-wing unmanned aerial vehicle (UAV) corridors by further developing a recently introduced loiter-lane framework. A semi-cooperative guidance strategy is developed for inserting fixed-wing UAVs into a loiter lane with minimal disruption to the UAVs already operating within it, while enabling a more compact fixed-wing UAV corridor. Building on the concepts of cooperative and non-disruptive loiter-lane insertion, the proposed strategy makes the incoming UAV first attempt, within its speed bounds, to rendezvous with an existing empty loiter slot. If direct insertion is infeasible, a minimal number of loitering UAVs perform coordinated slot hopping to create a suitably positioned empty slot. The feasibility and performance of the method are demonstrated through numerical simulations.
Optimal UGV-UAV Cooperative Partitioning and Inspection of Shortest Paths
We study cooperative shortest path planning for an unmanned ground vehicle (UGV) assisted by an unmanned aerial vehicle (UAV) in environments with unknown road blockages that are only discovered when a robot reaches the damaged point. This formulation generalizes the original Canadian Traveller Problem (CTP), which assumes a single ground vehicle and that the traversability status of all incident edges is revealed upon arrival at a vertex. We first analyze the case where the start and the goal are connected by $k$ disjoint paths, and prove that the worst-case competitive ratio $ρ$ for a single UGV is $2k-1$. With UAV assistance, and under the simplifying assumption of negligible initial transit and deadheading UAV costs, the ratio improves to $ρ= 2\frac{v_G}{v_A + v_G}k - 1$, where $v_G$ and $v_A$ denote the UGV and UAV speed, respectively. To address general graphs and non-negligible UAV initial transit and deadheading costs, we present an optimal path partitioning strategy that assigns path prefix inspection to the UGV and path suffix inspection to the UAV, and prove the optimality of the UAV inspection strategy on general graphs. We evaluate our algorithm by performing experiments on road networks from the world's 50 most populous cities, with randomized blockages, and show that the proposed method reduces UGV travel times by up to 30%.
comment: Accepted to Robotics: Science and Systems (RSS) 2026
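The two worst-case competitive ratios quoted in the abstract above, $2k-1$ for a single UGV and $2\frac{v_G}{v_A + v_G}k - 1$ with UAV assistance, can be evaluated directly. The sketch below only computes the stated expressions; the function names are hypothetical, and the abstract does not state the parameter range over which the UAV-assisted expression holds, so no clamping or validity check beyond basic input sanity is applied.

```python
def ratio_ugv_only(k: int) -> float:
    """Worst-case competitive ratio 2k - 1 for a single UGV on k disjoint paths."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return float(2 * k - 1)


def ratio_with_uav(k: int, v_g: float, v_a: float) -> float:
    """Ratio 2 * v_G / (v_A + v_G) * k - 1 as stated for the UAV-assisted case.

    v_g : UGV speed (must be positive)
    v_a : UAV speed (non-negative); v_a = 0 recovers the UGV-only ratio
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    if v_g <= 0 or v_a < 0:
        raise ValueError("speeds must satisfy v_g > 0 and v_a >= 0")
    return 2.0 * (v_g / (v_a + v_g)) * k - 1.0


if __name__ == "__main__":
    k = 4
    print(ratio_ugv_only(k))                    # UGV alone
    print(ratio_with_uav(k, v_g=10.0, v_a=10.0))  # UAV as fast as the UGV
```

Setting `v_a = 0` reproduces the single-UGV ratio, which is a quick sanity check that the two expressions agree in the limit where the UAV contributes nothing.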
Dynamic UGV-UAV Cooperative Path Planning in Uncertain Environments ICRA
This paper addresses the Dynamic UGV-UAV Cooperative Path Planning (DUCPP) problem involving one unmanned ground vehicle (UGV) assisted by one or more unmanned aerial vehicles (UAVs) operating on an uncertain road network with potentially impassable edges. DUCPP is particularly relevant for scenarios such as disaster response, emergency supply transport, and rescue operations, where a UGV must reach a specified destination in the presence of partially unknown road conditions. To enable the UGV to travel safely and efficiently to its destination, the UAV(s) dynamically inspect edges in the environment to identify and prune damaged or impassable edges from consideration. We present multiple strategies, including a bidirectional approach, to optimize UGV-UAV cooperation for finding a safe path in an uncertain road network. Furthermore, we explore the impact of using multiple UAVs on reducing the UGV's travel time, and evaluate the associated computation time. The proposed strategies are implemented and evaluated on 100 urban road networks. The results demonstrate that the bidirectional strategy achieves the best performance in most instances, and using multiple UAVs further reduces UGV travel time at the expense of increased computation time. This paper presents a robust framework for DUCPP to achieve efficient UGV-UAV cooperation for path planning and inspection, offering practical solutions for navigation in challenging and uncertain conditions.
comment: Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2026
HANDFUL: Sequential Grasp-Conditioned Dexterous Manipulation with Resource Awareness
Dexterous robot hands offer rich opportunities for multifunctional manipulation, where a robot must execute multiple skills in sequence while maintaining control over previously grasped objects. Most prior work in dexterous manipulation focuses on single-object, single-skill tasks. In contrast, our insight is that many sequential tasks require resource-aware grasps that conserve fingers for future actions. In this paper, we study sequential grasp-conditioned dexterous manipulation, where a robot first grasps an object and then performs a second, distinct manipulation subtask while preserving the initial grasp. We introduce HANDFUL, a learning framework that models finger usage as a limited resource and encourages exploration of resource-aware grasps through finger-level contact rewards. These grasps are subsequently selected for downstream tasks via curriculum-based policy learning. We further propose HANDFUL-Bench, a simulation benchmark that introduces sequential dexterous manipulation tasks across multiple second-subtask objectives, including pushing, pulling, and pressing, under a shared grasp-conditioned setup. Extensive simulation results demonstrate that prioritizing resource-aware grasps improves second-subtask success and robustness compared to a baseline that greedily optimizes the initial grasp before attempting the second subtask. We additionally validate our approach on a real dexterous LEAP hand. Together, this work establishes resource-aware grasp planning as a key principle for multifunctional dexterous manipulation. Supplementary material is available on our website: https://handful-dex.github.io.
Why Domain Matters: A Preliminary Study of Domain Effects in Underwater Object Detection ICRA 2026
Domain shift, where deviations between training and deployment data distributions degrade model performance, is a key challenge in underwater environments. Existing benchmarks testing performance for underwater domain shift simulate variability through synthetic style transfer. This fails to capture intrinsic scene factors such as visibility, illumination, scene composition, or acquisition factors, limiting analysis of real-world effects. We propose a labeling framework that defines underwater domains using measurable image, scene, and acquisition characteristics. Unlike prior benchmarks, it captures physically meaningful factors, enabling semantically consistent image grouping and supporting domain-specific evaluation of detection performance including failure analysis. We validate this on public datasets, showing systematic variations across domain factors and revealing hidden failure modes.
comment: Poster Presentation at ICRA 2026 Workshop S2S
FruitProM-V2: Robust Probabilistic Maturity Estimation and Detection of Fruits and Vegetables
Accurate fruit maturity identification is essential for determining harvest timing, as incorrect assessment directly affects yield and post-harvest quality. Although ripening is a continuous biological process, vision-based maturity estimation is typically formulated as a multi-class classification task, which imposes sharp boundaries between visually similar stages. To examine this limitation, we perform an annotation reliability study with two independent annotators on a held-out tomato dataset and observe disagreement concentrated near adjacent maturity stages. Motivated by this observation, we model maturity as a latent continuous variable and predict it probabilistically using a distributional detection head, converting the distribution into class probabilities through the cumulative distribution function (CDF). The proposed formulation maintains comparable performance to a standard detector under clean labels while better representing uncertainty. Furthermore, when controlled label noise is introduced during training, the probabilistic model demonstrates improved robustness relative to the baseline, indicating that explicitly modeling maturity uncertainty leads to more reliable visual maturity estimation.
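The step of converting a predicted maturity distribution into class probabilities through the CDF can be sketched as follows. This is a minimal illustration that assumes the latent maturity is Gaussian with predicted mean `mu` and standard deviation `sigma`, and that stage boundaries are fixed thresholds on the latent axis. The Gaussian form, the threshold values, and the function names are assumptions for illustration; the abstract does not specify them.

```python
import math


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of a normal distribution, computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def class_probabilities(mu: float, sigma: float, thresholds):
    """Turn a latent Gaussian into probabilities over len(thresholds) + 1 stages.

    thresholds : strictly increasing stage boundaries on the latent maturity axis
    The probability of a stage is the CDF mass between its two boundaries.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")

    # CDF at -inf is 0 and at +inf is 1; interior values come from the thresholds.
    cdf_values = [0.0] + [normal_cdf(t, mu, sigma) for t in thresholds] + [1.0]
    return [hi - lo for lo, hi in zip(cdf_values, cdf_values[1:])]


if __name__ == "__main__":
    # Three hypothetical stages (e.g. unripe, turning, ripe) on a [0, 1] axis.
    probs = class_probabilities(mu=0.6, sigma=0.1, thresholds=[0.33, 0.66])
    print([round(p, 4) for p in probs])
```

A prediction whose mean sits near a threshold spreads its mass across the two adjacent stages, which matches the abstract's observation that annotator disagreement concentrates near adjacent maturity stages.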
FlowS: One-Step Motion Prediction via Local Transport Conditioning
Generative motion prediction must satisfy three simultaneous requirements for real-world autonomy: high accuracy, diverse multimodal futures, and strictly bounded latency. Diffusion models meet the first two but violate the third, requiring tens to hundreds of denoising steps. We identify a conditioning strategy that resolves this tension: single-step integration is accurate when the underlying transport problem is local. A model that must both discover the correct behavioral mode and traverse a long displacement in one step accumulates large discretization errors; conditioning the base distribution to lie near plausible futures reduces the problem to short-range refinement, the regime where a single Euler step suffices. We instantiate this local transport conditioning in FlowS, a conditional flow matching framework with two mechanisms. First, an online, scene-conditioned learned prior emits $K$ calibrated anchor trajectories per agent, each already near a plausible future, converting mode discovery into local correction. Second, a step-consistent displacement field enforces semigroup self-consistency, guaranteeing that a single step inherits multi-step accuracy. Crucially, anchoring this field at learned priors along straight-line paths yields a stable, low-variance training target, unlike prior self-consistency methods that suffer from high-variance bootstrap signals on curved diffusion paths. On the Waymo Open Motion Dataset, FlowS achieves state-of-the-art Soft mAP (0.4804) and mAP (0.4703) with ensemble at 75 FPS with single-step inference, demonstrating that local transport conditioning makes one-step generative motion prediction practical for safety-critical autonomy. Code and pretrained models will be released upon acceptance.
comment: 8 pages
Multi-Periodogram Velocity Estimation with Irregular Reference Signals for Robot-Aided ISAC
This paper addresses velocity estimation within robot-aided integrated sensing and communications (ISAC), where mobile robots act as sensing nodes but can only opportunistically reuse irregular 5G/6G reference signals (RSs). We show that the velocity profile induced by such irregular time-domain patterns can be decomposed into a periodic-peak component and an amplitude-shaping (weighting) component. Leveraging this structure, we propose a multi-periodogram velocity estimation algorithm that is standard-compliant and does not require new sensing-dedicated RSs or 3GPP modifications. Simulation results demonstrate that, compared with conventional periodogram processing, the proposed method improves low-SNR robustness by achieving a 3 dB SNR gain at the 10% missed-detection rate and reducing false alarms by 51%.
comment: Accepted by ICC2026
A Scaled Three-Vehicle Platooning Platform
Vehicle platooning has attracted increasing attention as a promising approach to improve traffic efficiency, energy consumption, and roadway safety through coordinated multi-vehicle operation. A key challenge in platooning lies in maintaining stable and accurate path tracking during dynamic maneuvers such as lane changes, where lateral deviations and heading disturbances generated by the lead vehicle may propagate downstream to following vehicles. Robust longitudinal and lateral control systems are therefore essential not only for individual vehicle tracking performance, but also for overall platoon stability. For experimental studies, the Intelligent Mobility and Robotics Lab (IMRL) develops a scaled multi-vehicle platform for autonomous platooning research, with a particular emphasis on cooperative control and human-in-the-loop autonomy. This platform consists of one human-operable lead vehicle and two autonomous followers, enabling controlled and repeatable experiments on leader-follower coordination. Compared with full-scale field testing, this scaled platform offers a safer, lower-cost, and more flexible environment for rapid prototyping, controller validation, and multi-agent autonomy studies, while providing stronger physical realism than purely simulation-based evaluations.
Robot Planning and Situation Handling with Active Perception
Current robots are capable of computing plans to accomplish complex tasks. However, real-world environments are inherently open and dynamic, and unforeseen situations frequently arise during plan execution, such as jammed doors and fallen objects on the floor. These situations may result from the robot's own action failures or from external disturbances, such as human activities. Detecting and handling such execution-time situations remains a significant challenge, limiting robots' ability to achieve long-term autonomy. In this paper, we develop a planning and situation-handling framework, called VAP-TAMP, that enables robots to actively perceive and address unforeseen situations during plan execution. VAP-TAMP leverages action knowledge to strategically prompt vision-language models for active view selection and situation assessment, while constructing and reasoning over scene graphs for integrated task and motion planning. We evaluated VAP-TAMP using service tasks in simulation and on a mobile manipulation platform.
Instantaneous Planning, Control and Safety for Navigation in Unknown Underwater Spaces
Navigating autonomous underwater vehicles (AUVs) in unknown environments is significantly challenging due to poor visibility, weak signal transmission, and dynamic water currents. These factors pose challenges in accurate global localization, reliable communication, and obstacle avoidance. Local sensing provides critical real-time environmental data to enable online decision making. However, the inherent noise in underwater sensor measurements introduces uncertainty, complicating planning and control. To address these challenges, we propose an integrated planning and control framework that leverages real-time sensor data to dynamically induce closed-loop AUV trajectories, ensuring robust obstacle avoidance and enhanced maneuverability in tight spaces. By planning motion based on pre-designed feedback controllers, the approach reduces the computational complexity needed for carrying out online optimizations and enhances operational safety in complex underwater spaces. The proposed method is validated through ROS Gazebo simulations on the RexRov AUV, demonstrating its efficacy. Its performance is evaluated by comparison against PID-based tracking methods and by quantifying localization errors in dead reckoning as the AUV transitions into the target communication range.
comment: Uploaded by mistake. A different version of the study is under process
Limited Linguistic Diversity in Embodied AI Datasets ACL 2026
Language plays a critical role in Vision-Language-Action (VLA) models, yet the linguistic characteristics of the datasets used to train and evaluate these systems remain poorly documented. In this work, we present a systematic dataset audit of several widely used VLA corpora, aiming to characterize what kinds of instructions these datasets actually contain and how much linguistic variety they provide. We quantify instruction language along complementary dimensions--including lexical variety, duplication and overlap, semantic similarity, and syntactic complexity. Our analysis shows that many datasets rely on highly repetitive, template-like commands with limited structural variation, yielding a narrow distribution of instruction forms. We position these findings as descriptive documentation of the language signal available in current VLA training and evaluation data, intended to support more detailed dataset reporting, more principled dataset selection, and targeted curation or augmentation strategies that broaden language coverage.
comment: Accepted to ACL 2026 (Main Conference)
Reinforcement Learning for Testing Interdependent Requirements in Autonomous Vehicles: An Empirical Study
Autonomous vehicles (AVs) make driving decisions without humans, making dependability assurance critical. Scenario-based testing is widely used to evaluate AVs under diverse conditions, with reinforcement learning (RL) generating test scenarios that identify violations of functional and safety requirements. Many requirements are interdependent and involve trade-offs, making it unclear whether single-objective RL (SORL), which combines objectives into a single reward, can reliably reveal violations or whether multi-objective RL (MORL), which explicitly considers multiple objectives, is necessary. We present an empirical evaluation comparing SORL and MORL for generating critical scenarios that simultaneously test interdependent requirements using an end-to-end AV controller and high-fidelity simulator. Results suggest that MORL and SORL differ mainly in how violations occur, while showing comparable effectiveness in many cases. MORL tends to generate more requirement-violation scenarios, whereas SORL produces higher-severity violations. Their relative performance also depends on specific objective combinations and, to a lesser extent, road conditions. Regarding diversity, MORL consistently covers a broader range of scenarios. Thus, MORL is preferable when scenario diversity and coverage are prioritized, whereas SORL may better expose severe violations. Our empirical evaluation addresses a gap by systematically comparing SORL and MORL, highlighting the importance of requirement dependencies in RL-based AV testing.
MiMo-Embodied: X-Embodied Foundation Model Technical Report
We open-source MiMo-Embodied, the first cross-embodied foundation model to successfully integrate and achieve state-of-the-art performance in both Autonomous Driving and Embodied AI. MiMo-Embodied sets new records across 17 embodied AI benchmarks in Task Planning, Affordance Prediction and Spatial Understanding, while also excelling in 12 autonomous driving benchmarks across Environmental Perception, Status Prediction, and Driving Planning. Across these tasks, MiMo-Embodied significantly outperforms existing open-source, closed-source, and specialized baselines. Our results indicate that through multi-stage learning, curated data construction, and CoT/RL fine-tuning, these two domains exhibit strong positive transfer and mutually reinforce one another. We provide a detailed analysis of our model design and training methodologies to facilitate further research. Code and models are available at https://github.com/XiaomiMiMo/MiMo-Embodied.
comment: Code: https://github.com/XiaomiMiMo/MiMo-Embodied | Model: https://huggingface.co/XiaomiMiMo/MiMo-Embodied-7B
ReSim: Reliable World Simulation for Autonomous Driving NeurIPS 2025
How can we reliably simulate future driving scenarios under a wide range of ego driving behaviors? Recent driving world models, developed exclusively on real-world driving data composed mainly of safe expert trajectories, struggle to follow hazardous or non-expert behaviors, which are rare in such data. This limitation restricts their applicability to tasks such as policy evaluation. In this work, we address this challenge by enriching real-world human demonstrations with diverse non-expert data collected from a driving simulator (e.g., CARLA), and building a controllable world model trained on this heterogeneous corpus. Starting with a video generator featuring a diffusion transformer architecture, we devise several strategies to effectively integrate conditioning signals and improve prediction controllability and fidelity. The resulting model, ReSim, enables Reliable Simulation of diverse open-world driving scenarios under various actions, including hazardous non-expert ones. To close the gap between high-fidelity simulation and applications that require reward signals to judge different actions, we introduce a Video2Reward module that estimates a reward from ReSim's simulated future. Our ReSim paradigm achieves up to 44% higher visual fidelity, improves controllability for both expert and non-expert actions by over 50%, and boosts planning and policy selection performance on NAVSIM by 2% and 25%, respectively.
comment: NeurIPS 2025 Spotlight. Project page: https://opendrivelab.com/ReSim
SODA-CitrON: Static Object Data Association by Clustering Multi-Modal Sensor Detections Online
The online fusion and tracking of static objects from heterogeneous sensor detections is a fundamental problem in robotics, autonomous systems, and environmental mapping. Although classical data association approaches such as JPDA are well suited for dynamic targets, they are less effective for static objects observed intermittently and with heterogeneous uncertainties, where motion models provide minimal discriminative power with respect to clutter. In this paper, we propose a novel method for static object data association by clustering multi-modal sensor detections online (SODA-CitrON), while simultaneously estimating positions and maintaining persistent tracks for an unknown number of objects. The proposed unsupervised machine learning approach operates in a fully online manner and handles temporally uncorrelated and multi-sensor measurements. Additionally, it has a worst-case loglinear complexity in the number of sensor detections while providing full output explainability. We evaluate the proposed approach in different Monte Carlo simulation scenarios and compare it against state-of-the-art methods, including POM-based filtering, DBSTREAM clustering, and JPDA. The results demonstrate that SODA-CitrON consistently outperforms the compared methods in terms of F1 score, position RMSE, MOTP, and MOTA in the static object mapping scenarios studied.
comment: 8 pages, 5 figures; © 2026 IEEE. Accepted for the 2026 International Conference on Information Fusion (FUSION 2026)
RISE: Self-Improving Robot Policy with Compositional World Model
Despite the sustained scaling on model capacity and data acquisition, Vision-Language-Action (VLA) models remain brittle in contact-rich and dynamic manipulation tasks, where minor execution deviations can compound into failures. While reinforcement learning (RL) offers a principled path to robustness, on-policy RL in the physical world is constrained by safety risk, hardware cost, and environment reset. To bridge this gap, we present RISE, a scalable framework of robotic reinforcement learning via imagination. At its core is a Compositional World Model that (i) predicts multi-view future via a controllable dynamics model, and (ii) evaluates imagined outcomes with a progress value model, producing informative advantages for the policy improvement. Such compositional design allows state and value to be tailored by best-suited yet distinct architectures and objectives. These components are integrated into a closed-loop self-improving pipeline that continuously generates imaginary rollouts, estimates advantages, and updates the policy in imaginary space without costly physical interaction. Across three challenging real-world tasks, RISE yields significant improvement over prior art, with more than +35% absolute performance increase in dynamic brick sorting, +45% for backpack packing, and +35% for box closing, respectively.
comment: RSS 2026. Project page: https://opendrivelab.com/RISE/
BEVal: A Cross-dataset Evaluation Study of BEV Segmentation Models for Autonomous Driving
Current research in semantic bird's-eye view segmentation for autonomous driving focuses solely on optimizing neural network models using a single dataset, typically nuScenes. This practice leads to the development of highly specialized models that may fail when faced with different environments or sensor setups, a problem known as domain shift. In this paper, we conduct a comprehensive cross-dataset evaluation of state-of-the-art BEV segmentation models to assess their performance across different training and testing datasets and setups, as well as different semantic categories. We investigate the influence of different sensors, such as cameras and LiDAR, on the models' ability to generalize to diverse conditions and scenarios. Additionally, we conduct multi-dataset training experiments that improve models' BEV segmentation performance compared to single-dataset training. Our work addresses the gap in evaluating BEV segmentation models under cross-dataset validation, and our findings underscore the importance of enhancing model generalizability and adaptability to ensure more robust and reliable BEV segmentation approaches for autonomous driving applications. The code for this paper is available at https://github.com/manueldiaz96/beval .
Agent-Centric Visual Reinforcement Learning under Dynamic Perturbations
Visual reinforcement learning aims to empower an agent to learn policies from visual observations, yet it remains vulnerable to dynamic visual perturbations, such as unpredictable shifts in corruption types. To systematically study this, we introduce the Visual Degraded Control Suite (VDCS), a benchmark extending DeepMind Control Suite with Markov-switching degradations to simulate non-stationary real-world perturbations. Experiments on VDCS reveal severe performance degradation in existing methods. We theoretically prove via information-theoretic analysis that this failure stems from reconstruction-based objectives inevitably entangling perturbation artifacts into latent representations. To mitigate this negative impact, we propose Agent-Centric Observations with Mixture-of-Experts (ACO-MoE) to robustify visual RL against perturbations. The proposed framework leverages unique agent-centric restoration experts, achieving restoration from corruptions and task-relevant foreground extraction, thereby decoupling perception from perturbation before being processed by the RL agent. Extensive experiments on VDCS show our ACO-MoE outperforms strong baselines, recovering 95.3% of clean performance under challenging Markov-switching corruptions. Moreover, it achieves SOTA results on DMControl Generalization with random-color and video-background perturbations, demonstrating a high level of robustness.
Tendon-Actuated Robots with a Tapered, Flexible Polymer Backbone: Design, Fabrication, and Modeling
This paper presents the design, modeling, and fabrication of 3D-printed, tendon-actuated continuum robots featuring a flexible, tapered backbone constructed from thermoplastic polyurethane (TPU). Our scalable design incorporates an integrated electronics base housing that enables direct tendon tension control and sensing via actuators and compression load cells. Unlike many continuum robots that are single-purpose and costly, the proposed design prioritizes customizability, rapid assembly, and low cost while enabling high curvature and enhanced distal compliance through geometric tapering, thereby supporting a broad range of compliant robotic inspection and manipulation tasks. We develop a generalized forward kinetostatic model of the tapered backbone based on Cosserat rod theory using a Newtonian approach, extending existing tendon-actuated Cosserat rod formulations to explicitly account for spatially varying backbone cross-sectional geometry. The model captures the graded stiffness profile induced by the tapering and enables systematic exploration of the configuration space as a function of the geometric design parameters. Specifically, we analyze how the backbone taper angle influences the robot's configuration space and manipulability. The model is validated against motion capture data, achieving centimeter-level shape prediction accuracy after calibrating Young's modulus via a line search that minimizes modeling error. We further demonstrate teleoperated grasping using an endoscopic gripper routed along the continuum robot, mounted on a 6-DoF robotic arm. Parameterized iLogic/CAD scripts are provided for rapid geometry generation and scaling. The presented framework establishes a simple, rapid, and reproducible pathway from parametric design to controlled tendon actuation for tapered, tendon-driven continuum robots manufactured using fused deposition modeling 3D printers.
InCoM: Intent-Driven Perception and Structured Coordination for Mobile Manipulation
Mobile manipulation is a fundamental capability for general-purpose robotic agents, requiring both coordinated control of the mobile base and manipulator and robust perception under dynamically changing viewpoints. However, existing approaches face two key challenges: strong coupling between base and arm actions complicates control optimization, and perceptual attention is often poorly allocated as viewpoints shift during mobile manipulation. We propose InCoM, an intent-driven perception and structured coordination framework for mobile manipulation. InCoM infers latent motion intent to dynamically reweight multi-scale perceptual features, enabling stage-adaptive allocation of perceptual attention. To support robust cross-modal perception, InCoM further incorporates a geometric-semantic structured alignment mechanism that enhances multimodal correspondence. On the control side, we design a decoupled coordinated flow matching action decoder that explicitly models coordinated base-arm action generation, alleviating optimization difficulties caused by control coupling. Experimental results demonstrate that InCoM significantly outperforms state-of-the-art methods, achieving success rate gains of 28.2%, 26.1%, and 23.6% across three ManiSkill-HAB scenarios without privileged information. Furthermore, its effectiveness is consistently validated in real-world mobile manipulation tasks, where InCoM maintains a superior success rate over existing baselines.
Metric, inertially aligned monocular state estimation via kinetodynamic priors
Accurate state estimation for flexible robotic systems poses significant challenges, particularly for platforms with dynamically deforming structures that invalidate rigid-body assumptions. This paper addresses this problem and enables the extension of existing rigid-body pose estimation methods to non-rigid systems. Our approach integrates two core components: first, we capture elastic properties using a deformation-force model, efficiently learned via a Multi-Layer Perceptron; second, we resolve the platform's inherently smooth motion using continuous-time B-spline kinematic models. By continuously applying Newton's Second Law, our method formulates the relationship between visually-derived trajectory acceleration and predicted deformation-induced acceleration. We demonstrate that our approach not only enables robust and accurate pose estimation on non-rigid platforms, but also shows that the properly modeled platform physics allow for the recovery of inertial sensing properties. We validate this feasibility on a simple spring-camera system, showing how it robustly resolves the typically ill-posed problem of metric scale and gravity recovery in monocular visual odometry.
Variational approach to nonholonomic and inequality-constrained mechanics
Variational principles play a central role in classical mechanics, providing compact formulations of dynamics and direct access to conserved quantities. While holonomic systems admit well-known action formulations, non-holonomic systems -- subject to non-integrable velocity constraints or position inequality constraints -- have long resisted a general extremized action treatment. In this work, we construct an explicit and general action for non-holonomic motion, motivated by the classical limit of the quantum Schwinger-Keldysh action formalism, rediscovered by Galley. Our formulation recovers the correct dynamics of the Lagrange-d'Alembert equations via extremization of a scalar action. We validate the approach on canonical examples using direct numerical optimization of the novel action, bypassing equations of motion. Our framework extends the reach of variational mechanics and offers new analytical and computational tools for constrained systems.
comment: 11 pages, 4 figures
InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts NeurIPS 2025
The advancement of Embodied AI heavily relies on large-scale, simulatable 3D scene datasets characterized by scene diversity and realistic layouts. However, existing datasets typically suffer from limitations in data scale or diversity, sanitized layouts lacking small items, and severe object collisions. To address these shortcomings, we introduce InternScenes, a novel large-scale simulatable indoor scene dataset comprising approximately 40,000 diverse scenes by integrating three disparate scene sources, real-world scans, procedurally generated scenes, and designer-created scenes, including 1.96M 3D objects and covering 15 common scene types and 288 object classes. We particularly preserve massive small items in the scenes, resulting in realistic and complex layouts with an average of 41.5 objects per region. Our comprehensive data processing pipeline ensures simulatability by creating real-to-sim replicas for real-world scans, enhances interactivity by incorporating interactive objects into these scenes, and resolves object collisions by physical simulations. We demonstrate the value of InternScenes with two benchmark applications: scene layout generation and point-goal navigation. Both show the new challenges posed by the complex and realistic layouts. More importantly, InternScenes paves the way for scaling up the model training for both tasks, making the generation and navigation in such complex scenes possible. We commit to open-sourcing the data, models, and benchmarks to benefit the whole community.
comment: Accepted by NeurIPS 2025; Project page: https://marjordcpz.github.io/InternScenes.github.io
From Scene to Object: Text-Guided Dual-Gaze Prediction
Interpretable driver attention prediction is crucial for human-like autonomous driving. However, existing datasets provide only scene-level global gaze rather than fine-grained object-level annotations, inherently failing to support text-grounded cognitive modeling. Consequently, while Vision-Language Models (VLMs) hold great potential for semantic reasoning, this critical data limitation leads to severe text-vision decoupling and visual-bias hallucinations. To break this bottleneck and achieve precise object-level attention prediction, this paper proposes a novel dual-branch gaze prediction framework, establishing a complete paradigm from data construction to model architecture. First, we construct G-W3DA, an object-level driver attention dataset. By integrating a multimodal large language model with the Segment Anything Model 3 (SAM3), we decouple macroscopic heatmaps into object-level masks under rigorous cross-validation, fundamentally eliminating annotation hallucinations. Building upon this high-quality data foundation, we propose the DualGaze-VLM architecture. This architecture extracts the hidden states of semantic queries and dynamically modulates visual features via a Condition-Aware SE-Gate, achieving intent-driven precise spatial anchoring. Extensive experiments on the W3DA benchmark demonstrate that DualGaze-VLM consistently surpasses existing state-of-the-art (SOTA) models in spatial alignment metrics, notably achieving up to a 17.8% improvement in Similarity (SIM) under safety-critical scenarios. Furthermore, a visual Turing test reveals that the attention heatmaps generated by DualGaze-VLM are perceived as authentic by 88.22% of human evaluators, proving its capability to generate rational cognitive priors.
DIAL: Decoupling Intent and Action via Latent World Modeling for End-to-End VLA
The development of Vision-Language-Action (VLA) models has been significantly accelerated by pre-trained Vision-Language Models (VLMs). However, most existing end-to-end VLAs treat the VLM primarily as a multimodal encoder, directly mapping vision-language features to low-level actions. This paradigm underutilizes the VLM's potential in high-level decision making and introduces training instability, frequently degrading its rich semantic representations. To address these limitations, we introduce DIAL, a framework bridging high-level decision making and low-level motor execution through a differentiable latent intent bottleneck. Specifically, a VLM-based System-2 performs latent world modeling by synthesizing latent visual foresight within the VLM's native feature space; this foresight explicitly encodes intent and serves as the structural bottleneck. A lightweight System-1 policy then decodes this predicted intent together with the current observation into precise robot actions via latent inverse dynamics. To ensure optimization stability, we employ a two-stage training paradigm: a decoupled warmup phase where System-2 learns to predict latent futures while System-1 learns motor control under ground-truth future guidance within a unified feature space, followed by seamless end-to-end joint optimization. This enables action-aware gradients to refine the VLM backbone in a controlled manner, preserving pre-trained knowledge. Extensive experiments on the RoboCasa GR1 Tabletop benchmark show that DIAL establishes a new state-of-the-art, achieving superior performance with 10x fewer demonstrations than prior methods. Furthermore, by leveraging heterogeneous human demonstrations, DIAL learns physically grounded manipulation priors and exhibits robust zero-shot generalization to unseen objects and novel configurations during real-world deployment on a humanoid robot.
comment: Project page: https://xpeng-robotics.github.io/dial
Genie Sim PanoRecon: Fast Immersive Scene Generation from Single-View Panorama
We present Genie Sim PanoRecon, a feed-forward Gaussian-splatting pipeline that delivers high-fidelity, low-cost 3D scenes for robotic manipulation simulation. The panorama input is decomposed into six non-overlapping cube-map faces, processed in parallel, and seamlessly reassembled. To guarantee geometric consistency across views, we devise a depth-aware fusion strategy coupled with a training-free depth-injection module that steers the monocular feed-forward network to generate coherent 3D Gaussians. The whole system reconstructs photo-realistic scenes in seconds and has been integrated into Genie Sim - an LLM-driven simulation platform for embodied synthetic data generation and evaluation - to provide scalable backgrounds for manipulation tasks. For code details, please refer to: https://github.com/AgibotTech/genie_sim/tree/main/source/geniesim_world.
Genie Sim 3.0: A High-Fidelity Comprehensive Simulation Platform for Humanoid Robot
The development of robust and generalizable robot learning models is critically contingent upon the availability of large-scale, diverse training data and reliable evaluation benchmarks. Collecting data in the physical world poses prohibitive costs and scalability challenges, and prevailing simulation benchmarks frequently suffer from fragmentation, narrow scope, or insufficient fidelity to enable effective sim-to-real transfer. To address these challenges, we introduce Genie Sim 3.0, a unified simulation platform for robotic manipulation. We present Genie Sim Generator, a large language model (LLM)-powered tool that constructs high-fidelity scenes from natural language instructions. Its principal strength resides in rapid and multi-dimensional generalization, facilitating the synthesis of diverse environments to support scalable data collection and robust policy evaluation. We introduce the first benchmark that pioneers the application of LLM for automated evaluation. It leverages LLM to mass-generate evaluation scenarios and employs Vision-Language Model (VLM) to establish an automated assessment pipeline. We also release an open-source dataset comprising more than 10,000 hours of synthetic data across over 200 tasks. Through systematic experimentation, we validate the robust zero-shot sim-to-real transfer capability of our open-source dataset, demonstrating that synthetic data can serve as an effective substitute for real-world data under controlled conditions for scalable policy training. For code and dataset details, please refer to: https://github.com/AgibotTech/genie_sim.
Variable Elimination in Hybrid Factor Graphs for Discrete-Continuous Inference & Estimation
Many problems in robotics involve both continuous and discrete components, and modeling them together for estimation tasks has been a long-standing and difficult problem. Hybrid Factor Graphs give us a mathematical framework to model these types of problems; however, existing approaches for solving them are based on approximations. In this work, we propose a new framework for hybrid factor graphs along with a novel variable elimination algorithm to produce a hybrid Bayes network, which can be used for exact Maximum A Posteriori estimation and marginalization over both sets of variables. Our approach first develops a novel hybrid Gaussian factor which can connect to both discrete and continuous variables, and a hybrid conditional which can represent multiple continuous hypotheses conditioned on the discrete variables. Using these representations, we derive the process of hybrid variable elimination under the Conditional Linear Gaussian scheme, giving us exact posteriors as a hybrid Bayes network. To bound the number of discrete hypotheses, we use a tree-structured representation of the factors coupled with a simple pruning and probabilistic assignment scheme, which allows for tractable inference. We demonstrate the applicability of our framework on a large scale SLAM dataset and a real world pose graph optimization problem, both with ambiguous measurements which require discrete choices to be made for the most likely measurements. Our demonstrated results showcase the accuracy, generality, and simplicity of our hybrid factor graph framework.
Hybrid Diffusion for Simultaneous Symbolic and Continuous Planning
Constructing robots to accomplish long-horizon tasks is a long-standing challenge within artificial intelligence. Approaches using generative methods, particularly Diffusion Models, have gained attention due to their ability to model continuous robotic trajectories for planning and control. However, we show that these models struggle with long-horizon tasks that involve complex decision-making and, in general, are prone to confusing different modes of behavior, leading to failure. To remedy this, we propose to augment continuous trajectory generation by simultaneously generating a high-level symbolic plan. We show that this requires a novel mix of discrete variable diffusion and continuous diffusion, which dramatically outperforms the baselines. In addition, we illustrate how this hybrid diffusion process enables flexible trajectory synthesis, allowing us to condition synthesized actions on partial and complete symbolic conditions.
comment: 10 pages, 11 figures. This work has been submitted to the IEEE for possible publication. See https://sigmundhh.com/hybrid_diffusion/ for the project website
Dynamically Extensible and Retractable Robotic Leg Linkages for Multi-task Execution in Search and Rescue Scenarios
Search and rescue (SAR) robots are required to quickly traverse terrain and perform high-force rescue tasks, necessitating both terrain adaptability and controlled high-force output. Few platforms exist today for SAR, and fewer still have the ability to cover both tasks of terrain adaptability and high-force output when performing extraction. While legged robots offer significant ability to traverse uneven terrain, they typically are unable to incorporate mechanisms that provide variable high-force outputs, unlike traditional wheel-based drive trains. This work introduces a novel concept for a dynamically extensible and retractable robot leg. Leveraging a dynamically extensible and retractable five-bar linkage design, it allows for mechanically switching between height-advantaged and force-advantaged configurations via a geometric transformation. A testbed evaluated leg performance across linkage geometries and operating modes, with empirical and analytical analyses conducted on stride length, force output, and stability. The results demonstrate that the morphing leg offers a promising path toward SAR robots that can both navigate terrain quickly and perform rescue tasks effectively.
Multiagent Systems
Pythia: Toward Predictability-Driven Agent-Native LLM Serving
As LLM applications grow more complex, developers are increasingly adopting multi-agent architectures to decompose workflows into specialized, collaborative components, introducing structure that constrains agent behavior and exposes useful semantic predictability. Unlike traditional LLM serving, which operates under highly dynamic and uncertain conditions, this structured topology enables opportunities to reduce runtime uncertainty -- yet existing systems fail to exploit it, treating agentic workloads as generic traffic and incurring significant inefficiencies. Our analysis of production traces from an agent-serving platform and an internal coding assistant reveals key bottlenecks, including low prefix cache hit rates, severe resource contention from long-context requests, and substantial queuing delays due to suboptimal scaling. To address these challenges, we propose Pythia, a multi-agent serving system that captures workflow semantics through a simple interface at the serving layer, unlocking new optimization opportunities and substantially improving throughput and job completion time over state-of-the-art baselines.
Volitional Multiagent Atomic Transactions: Describing People and their Machines
Formal models for concurrent and distributed systems describe machines; the people who operate them are either ignored or treated as external environment. Yet key distributed systems -- notably grassroots platforms -- include people operating their personal machines (smartphones), and their faithful description must include the states of both people and machines and how they jointly effect system behaviour. Here, we propose volitional multiagent atomic transactions -- executed atomically by machines and guarded by their people's volitions -- as a novel mathematical foundation for specifying systems consisting of people operating machines. Each agent's state consists of a volitional state and machine state; a transaction is enabled when the machine precondition holds and the guarding persons are willing. For example, befriending two people is guarded by both; unfriending, by either; voluntary swap of coins and bonds is guarded by both parties, while a payment is guarded by the payer. We develop the mathematical machinery to express safety and liveness of platforms specified in this framework, and provide example specifications of two grassroots platforms: social networks, and coins and bonds. These specifications are then used by AI to derive working implementations. We employ here a novel and simpler definition of 'grassroots' that better captures the informal notion -- multiple instances can form and operate independently, yet may coalesce -- and show that the platforms specified here, as well as those hitherto proven grassroots under the original definition, are grassroots under the new definition.
Should I Replan? Learning to Spot the Right Time in Robust MAPF Execution
During the execution of Multi-Agent Path Finding (MAPF) plans in real-life applications, the MAPF assumption that the fleet's movement is perfectly synchronized does not apply. Since one or more of the agents may become delayed due to internal or external factors, it is often necessary to use a robust execution method to avoid collisions caused by desynchronization. Robust execution methods - such as the Action Dependency Graph (ADG) - synchronize the execution of risky actions, but often at the expense of increased plan execution cost, because they may require some agents to wait for the delayed agents. In such cases, the execution's cost can be reduced while still preserving safety by finding a new plan either by rescheduling (reordering the agents at crossroads) or the more general replanning capable of finding new paths. However, these operations may be costly, and the new plan may not even lead to lower execution cost than the original plan: for example, the two plans may be the exact same. Therefore, we estimate the benefit that can be achieved by single replanning in scenarios with delayed agents given an immediate state of the execution with a fully connected feed-forward neural network. The input to the neural network is a set of newly designed ADG-based features describing the robust execution's state and the impact of potential delays, and the output is an estimated benefit achievable by replanning. We train and test the network on a new labeled dataset containing 12,000 experiments, and we show that our proposed method is capable of reducing the impact of delays by up to 94.6% of the achievable reduction.
comment: 8 pages, 10 figures. Submitted for double-blind review to IEEE
Where Did It Go Wrong? Capability-Oriented Failure Attribution for Vision-and-Language Navigation Agents
Embodied agents in safety-critical applications such as Vision-Language Navigation (VLN) rely on multiple interdependent capabilities (e.g., perception, memory, planning, decision), making failures difficult to localize and attribute. Existing testing methods are largely system-level and provide limited insight into which capability deficiencies cause task failures. We propose a capability-oriented testing approach that enables failure detection and attribution by combining (1) adaptive test case generation via seed selection and mutation, (2) capability oracles for identifying capability-specific errors, and (3) a feedback mechanism that attributes failures to capabilities and guides further test generation. Experiments show that our method discovers more failure cases and more accurately pinpoints capability-level deficiencies than state-of-the-art baselines, providing more interpretable and actionable guidance for improving embodied agents.
Operating-Layer Controls for Onchain Language-Model Agents Under Real Capital
We study reliability in autonomous language-model agents that translate user mandates into validated tool actions under real capital. The setting is DX Terminal Pro, a 21-day deployment in which 3,505 user-funded agents traded real ETH in a bounded onchain market. Users configured vaults through structured controls and natural-language strategies, but only agents could choose normal buy/sell trades. The system produced 7.5M agent invocations, roughly 300K onchain actions, about $20M in volume, more than 5,000 ETH deployed, roughly 70B inference tokens, and 99.9% settlement success for policy-valid submitted transactions. Long-running agents accumulated thousands of sequential decisions, including 6,000+ prompt-state-action cycles for continuously active agents, yielding a large-scale trace from user mandate to rendered prompt, reasoning, validation, portfolio state, and settlement. Reliability did not come from the base model alone; it emerged from the operating layer around the model: prompt compilation, typed controls, policy validation, execution guards, memory design, and trace-level observability. Pre-launch testing exposed failures that text-only benchmarks rarely measure, including fabricated trading rules, fee paralysis, numeric anchoring, cadence trading, and misread tokenomics. Targeted harness changes reduced fabricated sell rules from 57% to 3%, reduced fee-led observations from 32.5% to below 10%, and increased capital deployment from 42.9% to 78.0% in an affected test population. We show that capital-managing agents should be evaluated across the full path from user mandate to prompt, validated action, and settlement.
comment: 18 pages, 6 figures. Public onchain dashboard and supporting documentation linked in paper
I Would If I Could: Reasoning about Dynamics of Actions in Multi-Agent Systems KR 2026
Autonomous agents acting in realistic Multi-Agent Systems (MAS) should be able to adapt during their execution. Standard strategic logics, such as Alternating-time Temporal Logic (ATL), model agents' state- or history-dependent behaviour. However, the dynamic treatment of agents' available actions and their knowledge of required actions is still rarely addressed. In this paper, we introduce ATL with Dynamic Actions (ATL-D), which models the process of granting and revoking actions, and its extension ATEL-D, which captures how such updates affect agents' knowledge. Beyond the conceptual contribution, we provide several technical results: we analyse the expressivity of our logic in relation to ATL, study its relation to normative systems, and provide complexity results for relevant computational problems.
comment: This is an extended version of the paper with the same title that will appear in KR 2026, and which contains a technical appendix with proof details
A Survey of Multi-Agent Deep Reinforcement Learning with Graph Neural Network-Based Communication
In multi-agent reinforcement learning (MARL), the integration of a communication mechanism allows agents to better learn to coordinate their actions and converge on their objectives by sharing information. Based on an interaction graph, a subclass of methods employs graph neural networks (GNNs) to learn the communication, enabling agents to improve their internal representations by enriching them with information exchanged. With growing research, we note a lack of explicit structure and framework to distinguish and classify MARL approaches with communication based on GNNs. Thus, this paper surveys recent works in this field. We propose a generalized GNN-based communication process with the goal of making the underlying concepts behind the methods more obvious and accessible.
Characterising Global Platforms: Centralised, Decentralised, Federated, and Grassroots
Global digital platforms are software systems designed to serve entire populations, with some already serving billions of people. We propose atomic transactions-based multiagent transition systems and protocols as a formal framework to study them; introduce essential agents -- minimal sets of agents the removal of which makes communication impossible; and show that the cardinality of essential agents partitions all global platforms into four classes: (1) Centralised -- one (the server); (2) Decentralised -- finite >1 (bootstrap nodes); (3) Federated -- infinite but not universal (all servers); (4) Grassroots -- universal (all agents but one). Our illustrative formal example is a global social network, for which we provide centralised, decentralised, federated, and grassroots specifications via multiagent atomic transactions, and prove they all satisfy the same basic correctness properties, yet have different sets of essential agents as expected. We discuss informally additional global platforms -- currencies, "sharing economy" apps, AI, and more. While this may be the first formal characterisation of centralised, decentralised, and federated global platforms, grassroots platforms have been defined previously, using two incomparable notions. Here, we prove that both definitions imply that all agents are essential, placing grassroots platforms within the broader formal context of all global platforms. This work provides the first mathematical framework for classifying any global platform -- existing or imagined -- by providing a multiagent atomic-transactions specification of it and determining the cardinality of the minimal set of essential agents in the ensuing multiagent protocol. It thus provides a unifying mathematical approach for the study of global digital platforms, perhaps the most important class of computer systems today.
AOI: Context-Aware Multi-Agent Operations via Dynamic Scheduling and Hierarchical Memory Compression
The proliferation of cloud-native architectures, characterized by microservices and dynamic orchestration, has rendered modern IT infrastructures exceedingly complex and volatile. This complexity generates overwhelming volumes of operational data, leading to critical bottlenecks in conventional systems: inefficient information processing, poor task coordination, and loss of contextual continuity during fault diagnosis and remediation. To address these challenges, we propose AOI (AI-Oriented Operations), a novel multi-agent collaborative framework that integrates three specialized agents with an LLM-based Context Compressor. Its core innovations include: (1) a dynamic task scheduling strategy that adaptively prioritizes operations based on real-time system states, and (2) a three-layer memory architecture comprising Working, Episodic, and Semantic layers that optimizes context retention and retrieval. Extensive experiments on synthetic and real-world benchmarks show that AOI achieves 72.4% context compression while preserving 92.8% critical information, improves task success to 94.2%, and reduces MTTR by 34.4% over the best baseline. This work presents a paradigm shift towards scalable, adaptive, and context-aware autonomous operations, enabling robust management of next-generation IT infrastructures with minimal human intervention.
comment: theory part rewrite.
Axiomatisation for an asynchronous epistemic logic with sending and receiving messages
We investigate a logic for asynchronous announcements wherein the sending of the messages by the environment is separated from their reception by the individual agents. Both come with different modalities. In the logical semantics, formulas are interpreted in a world of a Kripke model but given a history of prior announcements and receptions that already happened. An axiomatisation AA for such a logic has been given in prior work, for the formulas that are valid when interpreted in the Kripke model before any such announcements have taken place. This axiomatisation is a reduction system wherein one can show that every formula is equivalent to a purely epistemic formula without dynamic modalities for announcements and receptions. We propose a generalisation AA* of this axiomatisation, for the formulas that are valid when interpreted in the Kripke model given any history of prior announcements and receptions of announcements. It does not extend the axiomatisation AA, for example it is no longer valid that nobody has received any message. Unlike AA, this axiomatisation AA* is infinitary and it is not a reduction system.
Emergent Coordination in Multi-Agent Language Models
When are multi-agent LLM systems merely a collection of individual agents versus an integrated collective with higher-order structure? We introduce an information-theoretic framework to test -- in a purely data-driven way -- whether multi-agent systems show signs of higher-order structure. This information decomposition lets us measure whether dynamical emergence is present in multi-agent LLM systems, localize it, and distinguish spurious temporal coupling from performance-relevant cross-agent synergy. We implement a practical criterion and an emergence capacity criterion operationalized as partial information decomposition of time-delayed mutual information (TDMI). We apply our framework to experiments using a simple guessing game without direct agent communication and minimal group-level feedback with three randomized interventions. Groups in the control condition exhibit strong temporal synergy but little coordinated alignment across agents. Assigning a persona to each agent introduces stable identity-linked differentiation. Combining personas with an instruction to "think about what other agents might do" shows identity-linked differentiation and goal-directed complementarity across agents. Taken together, our framework establishes that multi-agent LLM systems can be steered with prompt design from mere aggregates to higher-order collectives. Our results are robust across emergence measures and entropy estimators, and not explained by coordination-free baselines or temporal dynamics alone. Without attributing human-like cognition to the agents, the patterns of interaction we observe mirror well-established principles of collective intelligence in human groups: effective performance requires both alignment on shared objectives and complementary contributions across members.
Systems and Control (EESS)
Pythia: Toward Predictability-Driven Agent-Native LLM Serving
As LLM applications grow more complex, developers are increasingly adopting multi-agent architectures to decompose workflows into specialized, collaborative components, introducing structure that constrains agent behavior and exposes useful semantic predictability. Unlike traditional LLM serving, which operates under highly dynamic and uncertain conditions, this structured topology enables opportunities to reduce runtime uncertainty -- yet existing systems fail to exploit it, treating agentic workloads as generic traffic and incurring significant inefficiencies. Our analysis of production traces from an agent-serving platform and an internal coding assistant reveals key bottlenecks, including low prefix cache hit rates, severe resource contention from long-context requests, and substantial queuing delays due to suboptimal scaling. To address these challenges, we propose Pythia, a multi-agent serving system that captures workflow semantics through a simple interface at the serving layer, unlocking new optimization opportunities and substantially improving throughput and job completion time over state-of-the-art baselines.
Variational Neural Belief Parameterizations for Robust Dexterous Grasping under Multimodal Uncertainty
Contact variability, sensing uncertainty, and external disturbances make grasp execution stochastic. Expected-quality objectives ignore tail outcomes and often select grasps that fail under adverse contact realizations. Risk-sensitive POMDPs address this failure mode, but many use particle-filter beliefs that scale poorly, obstruct gradient-based optimization, and estimate Conditional Value-at-Risk (CVaR) with high-variance approximations. We instead formulate grasp acquisition as variational inference over latent contact parameters and object pose, representing the belief with a differentiable Gaussian mixture. We use Gumbel-Softmax component selection and location-scale reparameterization to express samples as smooth functions of the belief parameters, enabling pathwise gradients through a differentiable CVaR surrogate for direct optimization of tail robustness. In simulation, our variational neural belief improves robust grasp success under contact-parameter uncertainty and exogenous force perturbations while reducing planning time by roughly an order of magnitude relative to particle-filter model-predictive control. On a serial-chain robot arm with a multifingered hand, we validate grasp-and-lift success under object-pose uncertainty against a Gaussian baseline. Both methods succeed on the tested perturbations, but our controller terminates in fewer steps and less wall-clock time while achieving a higher tactile grasp-quality proxy. Our learned belief also calibrates risk more accurately, keeping mean absolute calibration error below 0.14 across tested simulation regimes, compared with 0.58 for a Cross-Entropy Method planner.
comment: 11 pages, 10 figures
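As a minimal illustration of the tail-risk quantity named in the abstract above, the sketch below computes an empirical Conditional Value-at-Risk (CVaR) from sampled grasp-quality scores. It is not the paper's differentiable surrogate: the function name `empirical_cvar`, the sample scores, and the level `alpha` are all assumptions chosen for illustration.

```python
import math


def empirical_cvar(scores, alpha):
    """Mean of the worst alpha-fraction of scores (lower score = worse outcome).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(scores)                      # worst outcomes first
    k = max(1, math.ceil(alpha * len(ordered)))   # size of the tail
    tail = ordered[:k]
    return sum(tail) / len(tail)


# Made-up grasp-quality samples: mostly good, with two poor contact realizations.
scores = [0.9, 0.8, 0.85, 0.1, 0.95, 0.7, 0.2, 0.88, 0.92, 0.75]
mean_quality = sum(scores) / len(scores)
tail_quality = empirical_cvar(scores, alpha=0.2)
print(f"expected quality: {mean_quality:.3f}, CVaR(0.2): {tail_quality:.3f}")
```

The gap between the mean and the tail average is what an expected-quality objective does not see, and it is what a risk-sensitive objective penalizes.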
No Pedestrian Left Behind: Real-Time Detection and Tracking of Vulnerable Road Users for Adaptive Traffic Signal Control
Current pedestrian crossing signals operate on fixed timing without adjustment to pedestrian behavior, which can leave vulnerable road users (VRUs) such as the elderly, disabled, or distracted pedestrians stranded when the light changes. We introduce No Pedestrian Left Behind (NPLB), a real-time adaptive traffic signal system that monitors VRUs in crosswalks and automatically extends signal timing when needed. We evaluated five state-of-the-art object detection models on the BGVP dataset, with YOLOv12 achieving the highest mean Average Precision at 50% (mAP@0.5) of 0.756. NPLB integrates our fine-tuned YOLOv12 with ByteTrack multi-object tracking and an adaptive controller that extends pedestrian phases when remaining time falls below a critical threshold. Through 10,000 Monte Carlo simulations, we demonstrate that NPLB improves VRU safety by 71.4%, reducing stranding rates from 9.10% to 2.60%, while requiring signal extensions in only 12.1% of crossing cycles.
comment: © Anas Gamal Aly and Hala ElAarag, 2026. This is the authors' version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record will be published in Proceedings of the 2026 ACM Southeast Conference (ACMSE 2026)
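The 71.4% figure in the abstract above matches the relative reduction in stranding rate from 9.10% to 2.60%. The sketch below reproduces that arithmetic and adds a simplified threshold rule for extending a pedestrian phase. The names `relative_reduction` and `should_extend`, and the 5-second threshold, are hypothetical; the paper's controller may differ.

```python
def relative_reduction(before_pct, after_pct):
    """Relative reduction, in percent, between two rates given in percent."""
    return (before_pct - after_pct) / before_pct * 100.0


def should_extend(remaining_s, vru_in_crosswalk, threshold_s=5.0):
    """Extend the phase when a VRU is still crossing and time is nearly out.

    threshold_s is an assumed value for illustration only.
    """
    return vru_in_crosswalk and remaining_s < threshold_s


improvement = relative_reduction(9.10, 2.60)
print(f"relative reduction in stranding rate: {improvement:.1f}%")
print(should_extend(remaining_s=3.0, vru_in_crosswalk=True))
print(should_extend(remaining_s=3.0, vru_in_crosswalk=False))
```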
Backstepping Observer for the Quasilinear Heat Equation with Linear Design Gains: Beyond Local Stability
We consider the one-dimensional quasilinear heat equation with state-dependent heat capacity and thermal conductivity, and design a boundary-output observer based on the backstepping design for a linear heat equation with constant coefficients. Viewing the quasilinear system as a perturbation of the linear one, we establish exponential stability of the origin for the observation error dynamics in $H^1$, with an explicit region of attraction depending on the system parameters, observer gains, and the mismatch between the nonlinear diffusivity and the constant design diffusivity. Importantly, the observation error converges to zero rather than merely to a neighborhood scaling with this mismatch, even though, in contrast to backstepping-based stabilization of nonlinear PDEs, the mismatch need not decay along trajectories and may remain bounded away from zero, acting as a persistent state-dependent multiplicative perturbation. A technical challenge was to perform a sufficiently-fine Lyapunov analysis that does not yield overly conservative results such as mere boundedness of the observation error. Interestingly, while in the linear case the relationship between one of the backstepping observer gains and the convergence rate is monotonic, we show that in the nonlinear setting this is no longer the case: there may exist an optimal value of that gain, beyond which further increases deteriorate the system's performance. Such behavior cannot be predicted without our analysis: one might expect a priori the decay rate to be freely tunable at the expense of a region of attraction that shrinks to zero as the prescribed rate tends to infinity. However, our Lyapunov analysis (supported by numerical experiments) reveals that this intuition is incorrect.
comment: This is a working document of a work in progress
Threat-Oriented Digital Twinning for Security Evaluation of Autonomous Platforms DSN
Open, unclassified research on secure autonomy is constrained by limited access to operational platforms, contested communications infrastructure, and representative adversarial test conditions. This paper presents a threat-oriented digital twinning methodology for cybersecurity evaluation of learning-enabled autonomous platforms. The approach is instantiated as an open-source, modular twin of a representative autonomy stack with separated sensing, autonomy, and supervisory-control functions; confidence-gated multi-modal perception; explicit command and telemetry trust boundaries; and runtime hold-safe behavior. The contribution is methodological: a reproducible design pattern that translates threat analysis into observable, controllable tests for spoofing, replay, malformed-input injection, degraded sensing, and adversarial ML stress. Although the implemented proxy is ground based, the architecture is intentionally framed around stack elements shared with UAV and space systems, including constrained onboard compute, intermittent or high-latency links, probabilistic perception, and mission-critical recovery behavior. The result is an implementable research scaffold for dependable and secure autonomy studies across UAV and space domains.
comment: Camera ready accepted for presentation at and publication in the proceedings of 2026 56th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W): Dependable and Secure Autonomous Systems (DSAS)
Local Shifted Passivity Analysis of the Single-Machine Infinite-Bus System
This letter presents a shifted passivity analysis of the single-machine infinite-bus system in the stationary ($αβ$) reference frame. We study the attractivity of a periodic synchronous steady state with constant rotor frequency and formulate shifted passivity with respect to this motion. A port-Hamiltonian representation of the machine dynamics is used to construct a local shifted passivity condition from the error Hamiltonian and a correction term adapted to the synchronous steady state. For the infinite-bus interconnection, the resulting dissipation inequality leads to a sufficient stability condition expressed in terms of field excitation magnitude, damping, inertia, and steady-state current. This condition implies local asymptotic stability of the synchronous steady state and yields a sublevel-set estimate of its region of attraction under an additional small-inertia condition. A distinctive feature of the analysis is that it preserves the periodic structure of the rotor angle and provides a compact passivity-based stability certificate for the stationary-frame model.
comment: 14 pages
Using Large Language Models for Black-Box Testing of FMU-Based Simulations
We propose a human in the loop approach for black-box testing of Functional Mock-up Units (FMUs) using Large Language Models (LLMs). The goal is to reduce the manual effort in defining test scenarios for dynamic simulation models and to improve the interpretability of results. The approach takes the functional and interface specifications of an FMU as input, and prompts an LLM to generate structured scenario goals in Given-When-Then format that define the initial input conditions of the simulation, a possible change in those conditions, and the expected output behaviour of the system against those changes. The corresponding scenario plans specify input patterns and add assertion oracles that describe expected output patterns defined in scenario goals. The approach generates a complete input time series for the scenario plans, runs the FMU simulation, and evaluates assertions on the recorded outputs. It produces human-readable logs and plots that show statistics for each scenario with overlays, aggregate pass rates, and per-goal outcomes. The generated scenarios and results are stored for evaluation and later re-execution. We evaluate the approach on a Lube Oil Cooling system and discuss design choices that make the approach practical for everyday use. Results suggest that LLM-assisted scenario generation can facilitate automatic test design and verification of dynamic simulation models.
Multi-layer barrier adaptation of the discrete-time super-twisting controller
In digital sliding mode control implementations, discretization-induced chattering and inter-sample blindness can severely degrade the closed-loop performance, especially in case of fast perturbations. This paper addresses these challenges for a discrete-time implementation of the super-twisting sliding mode controller. Building upon recent results on barrier-function-modulated super-twisting algorithms, a nested architecture employing multiple barriers is discretized using an eigenvalue-based exact matching approach. The resulting discrete-time controller preserves the adaptive and robustness properties established in continuous time, while ensuring consistent stability behavior at the sampling level. The proposed framework is validated through numerical simulations. The results highlight the effectiveness of multi-layer barrier adaptation for discrete-time sliding mode control applications.
comment: 6 pages, accepted to 18th International Workshop on Variable Structure Systems
Complex-Vector Power and Cross-Phase Unbalance in Three-Phase Systems
Unbalanced three-phase systems still lack a compact phasor-domain representation of power that makes phase asymmetry explicit while remaining consistent with established apparent-power definitions. This paper addresses that point through a complex-vector power formulation for sinusoidal steady-state operation. The proposed representation supplements the classical dot-product expression of complex power with the cross product of voltage and current phasors, thereby retaining the usual active and reactive terms while making explicit a cross-phase unbalance vector that captures antisymmetric interphase relations. In this way, apparent power is separated into intraphase and cross-phase contributions, and its norm is preserved under the power-invariant Fortescue transformation. The formulation is extended to three-phase four-wire systems by introducing equivalent coordinates that preserve the effective apparent-power norm for the chosen voltage reference. Only standard complex numbers and matrices are required. Numerical examples show operating conditions in which a non-negligible part of the apparent-power structure is associated with cross-phase unbalance and cannot be inferred from active and reactive power alone. The proposed formulation thus provides a compact phasor-based descriptor of unbalance that complements established apparent-power theories by making explicit a component that is not accessible from scalar apparent-power representations.
comment: 8 pages, 1 figure, submitted to IEEE Trans. on Power Delivery
Distributed adaptive estimation for stochastic large regression models
This paper studies the distributed adaptive estimation problems for stochastic large regression models with an infinite number of parameters. By constructing a recursive local cost function, we propose a novel distributed recursive least squares algorithm to estimate the unknown system parameters, where the growth rate of regressors' dimension is characterized by a non-decreasing positive function. The almost sure convergence of the proposed algorithm is established under a cooperative excitation condition, which incorporates the temporal information and the spatial information to reflect the cooperative effect among multiple agents. Moreover, we analyze the prediction error by establishing the asymptotic upper bound of the accumulated regret without any excitation conditions. The main difficulty of theoretical analysis lies in how to analyze properties of the product of non-independent and non-stationary random matrices, whose dimensions change over time simultaneously. Some techniques, such as stochastic Lyapunov function, double-array martingale theory and algebraic graph theory, are employed to deal with the above issue. Our theoretical results are derived without imposing independence or stationarity assumptions on the regression vectors, thereby not excluding the correlated feedback signals.
comment: 13 pages, submitted to IEEE TAC
A Miniaturized Broadband 1-Bit Coding Reconfigurable Intelligent Surface for NLOS UE Localization and Uplink Communication
In this paper, a broadband 1-bit coding metasurface-based reconfigurable intelligent surface (RIS) is presented. The unit cell of the metasurface consists of a wide dipole modified with interdigital capacitors and loaded with an SMP 1340-040LF PIN diode. The proposed element offers cell miniaturization and a stable angular response. A phase difference of $180^\circ \pm 30^\circ$ is achieved for a frequency range of 4.85-6.05 GHz between the ON and OFF states for the normal incidence of the TE polarized wave, whereas it provides a fairly stable response with reflection loss of less than 3 dB and phase difference of $180^\circ \pm 50^\circ$ for oblique incidence up to $45^\circ$. The RF is isolated from the DC on the bias lines using properly designed butterfly-shaped radial stubs. Using this unit cell, a prototype with an array of $16 \times 10$ elements is constructed. A low-cost microcontroller-based control circuit is designed, which can be plugged in for biasing the PIN diodes of such an array. The theoretically calculated and full-wave simulated radiation patterns of the array are validated using experiments inside an anechoic chamber. Furthermore, the capability of the RIS for non-line-of-sight (NLOS) user equipment (UE) localization and robust uplink communication is demonstrated using an LTE communication framework. This shows great potential of our RIS for applications such as unmanned aerial vehicle (UAV) localization and its uplink communication at NLOS or extended range.
CUDA Kernel Optimization and Counter-Free Performance Analysis for Depthwise Convolution in Cloud Environments
Efficient GPU execution of convolution operators is governed by memory-access efficiency, on-chip data reuse, and execution mapping rather than arithmetic throughput alone. This paper presents a controlled operator-level study of CUDA kernel optimization for the depthwise convolution used in Structured State Space Model Convolutional Diagonal (S4ConvD), together with a cloud-compatible, counter-free performance analysis methodology. The operator, model, dataset, and training configuration are fixed, and only the CUDA kernel implementation is varied. The evaluated CUDA kernels comprise naive, global-memory-coalesced, shared-memory cache-blocked, and warp-tiled variants, covering forward, input-gradient, and weight-gradient execution paths under steady-state training conditions. Performance is characterized using a counter-free methodology that combines CUDA-event timing, execution-path decomposition, analytically derived memory-traffic modeling, effective-bandwidth estimation, and roofline analysis. This enables profiling-like architectural insights without requiring hardware performance counters or privileged profiling access. The warp-tiled kernel reduces convolution runtime by $3.26\times$ relative to the naive CUDA baseline, while end-to-end training speedup reaches $1.29\times$. A PyTorch implementation is used separately for numerical validation and runtime context, but is not treated as a controlled architectural baseline. Forward and input-gradient paths benefit substantially from improved locality and on-chip data reuse, whereas the reduction-dominated weight-gradient path remains the primary bottleneck. The results demonstrate that meaningful architecture-level GPU kernel analysis can be performed reproducibly in restricted cloud environments, even without access to hardware performance counters.
comment: 12 pages, 9 figures. CUDA kernel optimization and counter-free performance analysis for depthwise convolution. Submitted to IEEE TPDS
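The effective-bandwidth and roofline quantities mentioned in the abstract above can be sketched with two standard formulas: bytes moved divided by measured time, and the minimum of peak compute and arithmetic intensity times peak bandwidth. Every number below (bytes moved, elapsed time, peak figures) is a hypothetical placeholder, not a value from the paper.

```python
def effective_bandwidth_gbs(bytes_moved, elapsed_ms):
    """Effective bandwidth in GB/s from modeled traffic and measured kernel time."""
    return bytes_moved / (elapsed_ms * 1e-3) / 1e9


def roofline_gflops(arith_intensity, peak_gflops, peak_bw_gbs):
    """Attainable GFLOP/s under the roofline model.

    arith_intensity is in FLOP per byte of memory traffic.
    """
    return min(peak_gflops, arith_intensity * peak_bw_gbs)


# Assumed example: a kernel that moves 2 GB in 4 ms on a device with
# 900 GB/s peak bandwidth and 10,000 GFLOP/s peak compute.
bw = effective_bandwidth_gbs(bytes_moved=2e9, elapsed_ms=4.0)
bound = roofline_gflops(arith_intensity=0.5, peak_gflops=10_000.0, peak_bw_gbs=900.0)
print(f"effective bandwidth: {bw:.1f} GB/s")
print(f"roofline bound at 0.5 FLOP/byte: {bound:.1f} GFLOP/s")
```

Because the time can come from CUDA events and the traffic from an analytical model, neither quantity needs hardware performance counters.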
From Cursed to Competitive: Closing the ZO-FO Gap via Input-to-State Stability
While it is generally understood that zeroth-order (ZO) algorithms have an extra dependency on their number of iterations for any choice of parameters, compared to their first-order (FO) counterparts, in this work, we show that under several conditions, in expectation, ZO methods do not suffer from extra dimension dependencies in their convergence rates with respect to their FO counterparts. We look at optimisation algorithms from the dynamical systems perspective and analyse the conditions under which one can formulate the average of a ZO algorithm as the average of its FO counterpart with bounded perturbations with values dependent on design parameters. Then, using input-to-state stability properties, we show ZO methods follow the same decay rate as their FO counterparts and converge to a neighbourhood of the fixed point of FO methods, where its radius depends on the bound of the norm of the perturbations, which can be made arbitrarily small. The theoretical findings are illustrated via numerical examples.
Slot-hopping Enabled Loiter Guidance and Automation for Fixed-wing UAV Corridors
This paper addresses the problem of traffic congestion management in fixed-wing unmanned aerial vehicle (UAV) corridors by further developing a recently introduced loiter-lane framework. A semi-cooperative guidance strategy is developed for inserting fixed-wing UAVs into a loiter lane with minimal disruption to the UAVs already operating within it, while enabling a more compact fixed-wing UAV corridor. Building on the concepts of cooperative and non-disruptive loiter-lane insertion, the proposed strategy makes the incoming UAV first attempt, within its speed bounds, to rendezvous with an existing empty loiter slot. If direct insertion is infeasible, a minimal number of loitering UAVs perform coordinated slot hopping to create a suitably positioned empty slot. The feasibility and performance of the method are demonstrated through numerical simulations.
Dual-Polarized Massive MIMO Based on Precoding for Vehicle-To-Ground Communication in Urban Rail Transit
The development of intelligent and diversified services in urban rail transit (URT) has resulted in an increasing demand for high-rate communication between vehicles and ground equipment. However, existing URT communication systems struggle to handle the massive data exchange required for vehicle-to-ground (V2G) communication. To address this issue, we propose a distributed dual-polarized MIMO architecture suitable for URT tunnel scenarios. Specifically, the channel model is based on spatial three-dimensional (3D) non-stationary geometry-based stochastic model (GBSM), which takes into account the geometric distribution of URT tunnels and the cross-polarization effects between dual-polarized antennas. For dual-polarized MIMO systems, the polarized-aware sparse channel estimation (PASCE) method is proposed for effective channel estimation. Additionally, we derive closed-form expressions for the MMSE and MR precoding schemes. The polarized-aware dynamic interference cancellation (PADIC) algorithm is developed to eliminate interference between different polarization modes and multiple users. The simulation results demonstrate that the proposed dual-polarized precoding algorithm can withstand high cross-polarization correlation (XPC) and improve the efficiency of V2G communication to achieve high rates.
comment: 10 pages, 8 figures; under review at IEEE TVT
Behaviour-aware Hybrid Architecture for Trust-driven Transmissions
Reliable and secure communication is essential for mission-critical aerospace and defence operations involving autonomous platforms such as Unmanned Aerial Vehicles (UAVs), satellites, and ground control systems. In contested or dynamic environments, communication links are frequently exposed to jamming, interference, and cyberattacks, making network resilience a key operational requirement. This paper presents a trust-aware Software-Defined Networking (SDN) framework that enables secure, low-latency failover between heterogeneous communication channels. The proposed architecture integrates a high-bandwidth primary link (e.g., satellite or tactical LTE) with a low-power fallback channel (e.g., RF or mesh), managed by an SDN controller that enforces zero-trust routing policies. A real-time Intrusion Detection System (IDS) continuously updates node trust scores; when trust or link reliability degrades, the controller autonomously switches traffic to the secondary channel, ensuring uninterrupted connectivity. Simulation results in a Mininet-based test environment demonstrate sub-5 ms failover latency, efficient flow installation, and significant reduction in packet loss compared with conventional single-channel or static routing systems. The proposed framework provides a scalable and resilient communication backbone for next-generation aerospace networks, enhancing mission reliability, cyber defence, and autonomous coordination across distributed aerial and space assets.
Reconfiguring flexibility in renewable power-to-ammonia systems using molten-salt thermal energy storage in the ammonia synthesis loop: A coordinated electro-hydrogen-thermal scheduling approach
In renewable power-to-ammonia (ReP2A) systems, the intermittency of wind and solar generation propagates through electrolytic hydrogen production and induces thermal instability in the ammonia synthesis reactor (ASR). The resulting temperature cycling accelerates fatigue and shortens service life, while reactor thermal inertia limits flexible start-up, shutdown, and load adjustment. To address this issue, this study integrates molten-salt thermal energy storage (MS-TES) into the Haber-Bosch synthesis loop and develops a coordinated electro-hydrogen-thermal scheduling framework. MS-TES decouples hydrogen supply fluctuations from reactor thermal dynamics by enabling hot standby operation and sustained thermal support during start-up and low-load conditions. A state-space model is established to capture the thermal dynamics of the ASR and MS-TES. Based on this model, an optimal scheduling program coordinates ammonia synthesis operation with hydrogen production, battery energy storage (BES), and hydrogen storage (HS). The problem is formulated as a mixed-integer linear program (MILP) and extended with information gap decision theory (IGDT) to address renewable uncertainty. Case studies based on an industrial-scale project in northern China show that MS-TES enhances reactor thermal stability and system-level flexibility, while diminishing the marginal benefit of large BES capacity. As a result, a configuration combining small BES, HS, and MS-TES achieves near-equivalent performance to large-BES systems, with lower investment and improved economic returns. Year-round simulations further show that MS-TES avoids ASR start-up and shutdown and delivers consistently higher net revenue under variable renewable conditions.
On Distributed Control of Continuum Swarms: Local Controllers as Differential Operators
We study the problem of distributed control of large-scale robotic swarms which can be modeled as continuum densities evolving under the continuity equation. We propose a formalization of distributed controllers as (generally nonlinear) differential operators, in which control inputs depend only on local information about the state and environment. This perspective yields a fully local, PDE-based framework for analysis and design. We apply this framework to the problem of stabilizing a swarm density around an arbitrary target density, and investigate fundamental limitations of low-order distributed controllers in achieving this goal. In particular, we show that controllers which act in a purely pointwise manner are incompatible with natural system symmetries and strong forms of stability, and must rely on mixing-type behavior to achieve stabilization. In contrast, we present a simple first-order control law which achieves stabilization and enjoys substantially stronger properties.
comment: 12 pages
A Continuous-Time Ensemble Kalman-Bucy Smoother for Causal Inference and Model Discovery
Data assimilation (DA) integrates observational information with model predictions to improve state estimation in complex systems. While filtering provides the basis for online forecasts by using only past and present observations, it can exhibit delays and biases when the underlying dynamics evolve rapidly or undergo regime transitions. Smoothing, which additionally incorporates future observations, provides a natural pipeline for hindcasting and reanalysis that yields an uncertainty reduction beyond the filter. This paper introduces an ensemble Kalman-Bucy smoother (EnKBS) for continuous-time DA of nonlinear dynamical systems, where the smoother's conditional distributions are reconstructed using ensemble moments. The result is a derivative-free framework that does not require explicit computation of tangent-linear or adjoint models, which converges to the exact smoother solution at the infinite-ensemble limit for a wide class of complex systems. Incorporating standard regularization techniques for high-dimensional systems, such as covariance localization and inflation, the skill of the EnKBS is demonstrated in various important scientific problems. By integrating future observations, which reveal the underlying causal mechanisms for retrospective state updates, the EnKBS is used for Bayesian-based inference of causal relationships and their temporal influence range in a dyadic trigger-feedback model and the development of a causality-driven iterative learning algorithm that identifies the structure and recovers the hidden parameters of a nonlinear reduced-order model mimicking midlatitude atmospheric circulation. Notably, both tasks remain effective with an ensemble size of $O(10)$ under partial observations, suggesting that EnKBS can support the instantaneous discovery of high-dimensional complex systems over time.
comment: 33 pages, 11 figures. Corresponding author: Nan Chen (chennan@math.wisc.edu)
Co-Learning Port-Hamiltonian Systems and Optimal Energy-Shaping Control
We develop a physics-informed learning framework for energy-shaping control of port-Hamiltonian (pH) systems from trajectory data. The proposed approach co-learns a pH system model and an optimal energy-balancing passivity-based controller (EB-PBC) through alternating optimization with policy-aware data collection. At each iteration, the system model is refined using trajectory data collected under the current control policy, and the controller is re-optimized on the updated model. Both components are parameterized by neural networks that embed the pH dynamics and EB-PBC structure, ensuring interpretability in terms of energy interactions. The learned controller renders the closed-loop system inherently passive and provably stable, and exploits passive plant dynamics without canceling the natural potential. A dissipation regularization enforces strict energy decay during training, thereby enhancing robustness to sim-to-real gaps. The proposed framework is validated on state-regulation and swing-up tasks for planar and torsional pendulum systems.
Application of Deep Reinforcement Learning to Event-Triggered Control for Networked Artificial Pancreas Systems
This paper proposes a deep reinforcement learning (DRL)-based event-triggered controller design for networked artificial pancreas (AP) systems. Although existing DRL-based AP controllers typically assume periodic control updates, networked control systems (NCSs) require a reduction in communication frequency to achieve energy-efficient operation, which is directly tied to control updates. However, jointly learning both insulin dosing and update timing significantly increases the complexity of the learning problem. To alleviate this complexity, we develop a practical DRL-based controller design that avoids explicitly learning update timing by introducing a rule-based criterion defined by changes in blood glucose. As a result, decision-making occurs at irregular intervals, and the problem is naturally formulated as a semi-Markov decision process (SMDP), for which we extend a standard DRL algorithm. Numerical experiments demonstrate that the proposed method improves communication efficiency while maintaining control performance.
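To make the idea of a rule-based update criterion concrete, the sketch below triggers a control update whenever blood glucose has moved by at least `delta` since the last update. The exact rule used in the paper is not stated in the abstract, so this send-on-delta rule, the function name `event_times`, the threshold, and the glucose trace are all assumptions.

```python
def event_times(glucose, delta):
    """Indices at which an update fires under a send-on-delta rule.

    An update fires at t = 0 and then whenever the reading differs from the
    value at the last update by at least delta (hypothetical criterion).
    """
    if not glucose:
        return []
    times = [0]
    last_sent = glucose[0]
    for t, g in enumerate(glucose[1:], start=1):
        if abs(g - last_sent) >= delta:
            times.append(t)
            last_sent = g
    return times


# Made-up glucose trace in mg/dL, one reading per sampling step.
trace = [100, 102, 104, 111, 112, 125, 126, 120, 110]
times = event_times(trace, delta=10)
# Irregular gaps between decisions: the sojourn times of the SMDP view.
sojourns = [b - a for a, b in zip(times, times[1:])]
print(times, sojourns)
```

The uneven gaps between decision points are the reason the abstract formulates the problem as a semi-Markov rather than a standard Markov decision process.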
Privacy-Preserving Federated Learning Framework for Distributed Chemical Process Optimization
Industrial chemical plants often operate under strict data confidentiality constraints, making centralized data-driven process modeling difficult. Federated learning (FL) provides a promising solution by enabling collaborative model training across distributed facilities without sharing raw operational data. This paper proposes a privacy-preserving federated learning framework for distributed chemical process optimization using data collected from multiple geographically separated plants. Each plant locally trains a neural-network-based process model using its own time-series sensor data, while only model parameters are transmitted to a central aggregation server through secure aggregation mechanisms. This design allows cross-plant knowledge sharing while maintaining strict data locality and industrial confidentiality. Experimental evaluation was conducted using process datasets from three independent chemical plants operating under heterogeneous conditions. The results demonstrate rapid convergence of the federated model, with the global mean squared error decreasing from approximately 2369 to below 50 within the first five communication rounds and stabilizing around 35 after 40 rounds. In comparison with local-only training, the proposed federated framework significantly improves prediction accuracy across all plants, while achieving performance comparable to centralized training. The findings indicate that federated learning provides an effective and scalable solution for collaborative industrial analytics, enabling privacy-preserving predictive modeling and process optimization across distributed chemical production facilities.
comment: 10 pages, 5 figures, 2 tables, 17 equations
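The parameter-aggregation step described in this abstract can be sketched as a sample-weighted average of per-plant model parameters. This is a minimal sketch that assumes a FedAvg-style rule; the abstract does not state the exact aggregation rule, and the secure aggregation mechanism it mentions is omitted here. The function name and the example values are hypothetical.

```python
import numpy as np


def federated_average(client_params, client_sizes):
    """Sample-weighted average of per-client parameter lists.

    client_params: one list of NumPy arrays per plant (same shapes across plants).
    client_sizes:  number of local training samples at each plant.
    """
    total = float(sum(client_sizes))
    n_layers = len(client_params[0])
    return [
        sum(params[k] * (n / total) for params, n in zip(client_params, client_sizes))
        for k in range(n_layers)
    ]


# Three hypothetical plants, each holding a single two-element parameter array.
plants = [[np.array([1.0, 1.0])], [np.array([3.0, 3.0])], [np.array([5.0, 5.0])]]
global_params = federated_average(plants, client_sizes=[100, 100, 200])
```

Only the parameter arrays enter the aggregation, so the raw time-series data stay at each plant, which is the data-locality property the abstract emphasizes.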
Risk Assessments for Evasive Emergency Maneuvers in Autonomous Vehicles
This paper presents a systematic verification and validation (V&V) framework for the Evasive Minimum Risk Maneuver (EMRM) feature in autonomous vehicles, addressing a critical gap in existing safety assessment methods. We introduce the first formally integrated pipeline that unifies Hazard Analysis and Risk Assessment (HARA), System-Theoretic Process Analysis (STPA), and Finite State Machine (FSM) modeling into a single traceable workflow specifically designed for EMRM V&V. HARA and STPA are combined through a structured hazard-loss mapping to identify hazards and unsafe control actions (UCAs); an FSM layer captures hazard-to-loss state transitions that neither method models individually; and the unified framework drives automated scenario generation with measurable parameter-space coverage. Applied to a T-junction EMRM case study, the framework guides 1,880 RRT-based simulations spanning ego speed, time-to-collision (TTC), and road friction, uncovering a key physical result: the T-junction geometry gives nearly equal difficulty to stopping and to navigating, so the intermediate mitigation mode occupies only 1.9% of the feasible parameter space. EMRM steering strategies achieve an 81% collision-avoidance rate and reduce mean residual impact speed from 18.9 km/h to 9.0 km/h compared with emergency braking alone, while the framework attains 100% hazard, UCA, and parameter-space coverage versus ≤1% for traditional methods. These results demonstrate that the integrated HARA-STPA-FSM framework enables high-resolution, traceable EMRM V&V that is not achievable with any single method in isolation.
Optimal-Control Suggestion for Congestion on Freeways using Data Assimilation of Distributed Fiber-Optic Sensing
This paper presents an optimal-control suggestion method for freeway congestion using data assimilation (DA) of distributed fiber-optic sensing (DFOS). To simultaneously maximize throughput and avoid or mitigate congestion, active transportation and demand management (ATDM) must apply the control that is optimal for the current traffic state, determined by multi-objective optimization on real-time monitoring data. However, the optimal control cannot be estimated from conventional sensors because they provide only intermittent observations. To solve this issue, this paper proposes ATDM optimal-control estimation with DA of DFOS, which can monitor traffic flow in real time without dead zones. Our real-time DA method enables us to estimate the effectiveness of control scenarios by simulation. This paper also provides a method to uniquely determine the optimal-control solution among the Pareto solutions of the multi-objective optimization. Throughput and mean speed across the entire road are considered as the objective functions. Variable speed limit (VSL) and inflow control are taken as ATDM examples. Validation results on a Japanese freeway show that (i) the optimal control scenario varies depending on the traffic state, especially the congestion level; (ii) optimal control considering VSL alone improves throughput by 5-14%, while the improvement rate for mean speed is 0-8%; (iii) throughput and mean speed are improved by 10-15% and 20-30%, respectively, when both VSL and inflow control are considered. The results also suggest the importance of balance management of lane occupancy and of proactive optimal control before congestion occurs.
comment: 21 pages, 13 figures, presented at the Transportation Research Board 105th Annual Meeting 2026
Inverse Linear-Quadratic Gaussian Differential Games
This paper presents a method for solving the Inverse Stochastic Differential Game (ISDG) problem in finite-horizon linear-quadratic Gaussian (LQG) differential games. The objective is to recover cost function parameters of all players, as well as noise scaling parameters of the stochastic system, consistent with observed trajectories. The proposed framework combines (i) estimation of the feedback strategies, (ii) identification of the cost function parameters via a novel reformulation of the coupled Riccati differential equations, and (iii) maximum likelihood estimation of the noise scaling parameters. Simulation results demonstrate that the approach recovers parameters, yielding trajectories that closely match the observed trajectories.
BOOST: Microgrid Sizing using Ordinal Optimization
Sizing a residential microgrid efficiently requires solving a coupled design-and-operation problem: photovoltaic (PV) and battery capacities should be chosen in a way that reflects how the system will actually be dispatched over time. This paper proposes BOOST, or Battery-solar Ordinal Optimization Sizing Technique, which combines ordinal optimization (OO) with mixed-integer linear programming (MILP). OO is used to screen a large set of candidate battery/PV designs with a simple linear model and then re-evaluate only the most promising designs with a more accurate MILP that captures diesel commitment logic. Relative to the original short paper, this expanded manuscript retains the full methodological narrative but refreshes the quantitative section using a new synthetic benchmark dataset suite generated from the released clean reimplementation. The suite contains five yearly synthetic datasets/configurations: base, cheap battery, cheap PV, expensive diesel, and high peak tariff. On the base synthetic dataset, the best accurate design is a 500 kWh battery with 1833.3 kW of PV, achieving 13.169 c/kWh, while BOOST improves upon dynamic programming and greedy baselines. Across the full 10 x 10 design grid, the LP and MILP rankings are effectively identical (rho = 1.000), the paper-style choice of N = 90 and s = 18 recovers the global accurate optimum, and the OO-based workflow reduces runtime by 51.8% relative to exhaustive accurate evaluation on the refreshed synthetic benchmark run. Because these added datasets are synthetic, they should be read as methodological stress tests rather than as direct empirical claims about any specific real-world site. Code is available at https://github.com/MFHChehade/Microgrid-Optimization.
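The screen-then-refine workflow described in this abstract can be sketched as follows: sample N candidate designs, rank them with a cheap model, and re-evaluate only the top s with the accurate model. This is a minimal sketch under stated assumptions. The two cost functions below are hypothetical stand-ins for the paper's LP and MILP models, and the function name and grid are illustrative only.

```python
import random


def ordinal_optimization(designs, crude_cost, accurate_cost, n_sample, s_keep, seed=0):
    """Pick a design by crude screening followed by accurate re-evaluation.

    1. Sample up to n_sample candidate designs.
    2. Rank them with the cheap crude_cost model and keep the best s_keep.
    3. Return the shortlisted design with the lowest accurate_cost.
    """
    rng = random.Random(seed)
    sampled = rng.sample(designs, min(n_sample, len(designs)))
    shortlist = sorted(sampled, key=crude_cost)[:s_keep]
    return min(shortlist, key=accurate_cost)


# Hypothetical 10 x 10 grid of (battery index, PV index) designs.
grid = [(b, p) for b in range(10) for p in range(10)]


def crude(design):
    # Stand-in for a simple linear screening model.
    return (design[0] - 5) ** 2 + (design[1] - 3) ** 2


def accurate(design):
    # Stand-in for the more accurate model: crude cost plus a small correction.
    return crude(design) + 0.1 * design[0]


best = ordinal_optimization(grid, crude, accurate, n_sample=len(grid), s_keep=18)
```

The accurate model is called only `s_keep` times instead of once per design, which is where the runtime saving comes from when the crude and accurate rankings agree closely.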
Topology Reconstruction of a Resistor Network with Limited Boundary Measurements: An Optimization Approach
We consider the problem of reconstructing the topology and the edge resistance values of an unknown circular planar passive resistive network from a limited set of resistance distance measurements. We develop a multistage topology reconstruction method, assuming that the number of boundary and interior nodes, the maximum and minimum edge conductance, and the Kirchhoff index are known a priori. First, a maximal circular planar electrical network consisting of edges with resistors and switches is constructed; no interior nodes are considered. A sparse difference-of-convex program $\mathbf{Π}_1$, accompanied by a round-down algorithm, is posed to determine the switch positions. The solution gives us a topology that is then utilized to develop a heuristic method to place the interior nodes. The heuristic method consists of reformulating $\mathbf{Π}_1$ as a difference-of-convex program $\mathbf{Π}_2$ with relaxed edge weight constraints and a quadratic cost. The interior node placement thus obtained may lead to a non-planar topology. We then use the modified Auslander, Parter, and Goldstein algorithm to obtain a set of planar network topologies and re-optimize the edge weights by solving $\mathbf{Π}_3$ for each topology. The optimization problems posed are difference-of-convex programs as a consequence of the triangle inequality and Kalmanson's inequality constraints. A numerical example is used to demonstrate the proposed method.
Physics-Informed Neural Networks for Nonlinear Output Regulation
This work addresses the full-information output regulation problem for nonlinear systems, assuming the states of both the plant and the exosystem are known. In this setting, perfect tracking or rejection is achieved by constructing a zero-regulation-error manifold $π(w)$ and a feedforward input $c(w)$ that render this manifold invariant. The pair $(π(w), c(w))$ is characterized by the regulator equations, i.e., a system of PDEs with an algebraic constraint. We focus on accurately solving the regulator equations by introducing a physics-informed neural network (PINN) approach that directly approximates $π(w)$ and $c(w)$ by minimizing the residuals under boundary and feasibility conditions, without requiring precomputed trajectories or labeled data. The learned operator maps exosystem states to steady-state plant states and inputs, enables real-time inference and, critically, generalizes across families of the exosystem with varying initial conditions and parameters. The framework is validated on a regulation task that synchronizes a helicopter's vertical dynamics with a harmonically oscillating platform. The resulting PINN-based solver reconstructs the zero-error manifold with high fidelity and sustains regulation performance under exosystem variations, highlighting the potential of learning-enabled solvers for nonlinear output regulation. The proposed approach is broadly applicable to nonlinear systems that admit a solution to the output regulation problem.
Space-Air-Ground-Integrated Networks: The BER vs. Residual Delay and Doppler Analysis
Perfect Doppler compensation and synchronization are nontrivial in space-air-ground-integrated networks (SAGINs) because of multi-path Doppler effects and the effects of Einstein's theory of relativity. Hence, by considering the residual Doppler and the synchronization delay, this paper investigates the bit-error-rate (BER) performance attained under time-varying correlated Shadowed-Rician SAGIN channels. First, a practical SAGIN model is harnessed, encompassing correlated Shadowed-Rician channels, Snell's law-based path loss, atmospheric absorption, line-of-sight Doppler compensation, elliptical satellite orbits, and Einstein's theory of relativity. Then, a specific correlation coefficient between the pilot and data symbols is derived in the context of correlated Shadowed-Rician channels. By exploiting this correlation coefficient, the channel distribution is mimicked by a bi-variate Gamma distribution. Then, a closed-form BER formula is derived for 16-QAM with least-squares channel estimation and equalization. Our analytical results indicate for a 300-km-altitude LEO that 1) the period of realistic elliptical orbits is around 0.8 seconds longer than that of the idealized circular orbits; and 2) the relativistic delay is lower than 1 microsecond over a full LEO pass (from rise to set). Our numerical results for the L bands quantify the effects of: 1) the residual Doppler; 2) atmospheric shadowing; 3) synchronization errors; and 4) pilot overhead.
Data-Driven Incremental GAS Certificate of Nonlinear Homogeneous Networks: A Scenario Approach with Noisy Data
This work focuses on a compositional data-driven approach to verify incremental global asymptotic stability (delta-GAS) over interconnected homogeneous networks of degree one with unknown mathematical dynamics. Our proposed approach leverages the concept of incremental input-to-state stability (delta-ISS) of subsystems, characterized by delta-ISS Lyapunov functions. To implement our data-driven scheme, we initially reframe the delta-ISS Lyapunov conditions as a robust optimization program (ROP). Due to the presence of unknown subsystem dynamics in the ROP constraints, we develop a scenario optimization program (SOP) by gathering data from trajectories of each unknown subsystem. However, since the measured one-step transition data are corrupted by noise with a known bound on its norm, rendering the proposed SOP intractable, we introduce an auxiliary SOP that explicitly accommodates noisy measurements. We solve the auxiliary SOP and construct a delta-ISS Lyapunov function for each subsystem with unknown dynamics. We then leverage a small-gain compositional condition to facilitate the construction of an incremental Lyapunov function for an unknown interconnected network based on the data-driven delta-ISS Lyapunov functions of its individual subsystems, while providing correctness guarantees, incorporating the bound on the noise norm. We demonstrate that our data-driven compositional approach reduces the sample complexity to the subsystem level. To validate the effectiveness of our approach, we apply it to an unknown controlled physical nonlinear homogeneous network of degree one, comprising 10000 subsystems. By gathering noisy data from each unknown subsystem, we demonstrate that the interconnected network is delta-GAS with a correctness guarantee.
A Diffusion-based Generative Machine Learning Paradigm for Dynamic Contingency Screening
Dynamic contingency screening is a challenging task in dynamic security assessment because traditional numerical approaches are computationally intensive and often cannot repeatedly solve the full AC power flow for all possible contingencies in real time, especially for large-scale power grids. Moreover, the severity of a contingency is not identical across operating points, so solving all possible contingencies is unnecessary as well as computationally inefficient and time-consuming. This paper introduces a novel, diffusion-based generative machine learning paradigm that transforms contingency analysis from conventional scenario selection to a proactive, likely-unsupervised scenario generation. The margin to the steady-state voltage stability limit determines the ranking of contingencies corresponding to each operating point. By leveraging physical information from each operating point, the proposed approach anticipates the contingencies most likely to be critical, without relying on static assumptions or exhaustive simulations. This data-prompted generative approach enables the identification of high-risk scenarios under varying load and generator conditions, providing dynamic security assessment in real time. The correctness, effectiveness, and scalability of the methodology are demonstrated through methodological derivations and comprehensive experiments on multiple IEEE benchmark systems, including IEEE-6, IEEE-14, IEEE-30, and IEEE-118, highlighting its potential to incorporate contingency screening in complex, evolving smart grids.
Power-Duration Characterization of Aggregated Thermostatically Controlled Loads via Reach and Hold Sets
Aggregations of thermostatically controlled loads (TCLs), such as air conditioners, offer valuable flexibility to the power grid. The aggregate power consumption of a TCL fleet can be controlled by adjusting thermostat setpoints. An ex-ante quantification of the flexibility that results from such a setpoint change can inform grid operator decisions. This paper develops a rigorous, yet practical method to quantify flexibility in terms of the 'reach-and-hold' set of TCL aggregations, which defines how much power can be shifted (reach) and for how long (hold). To quantify the reach-and-hold set, we employ a Markov-chain-based model of the TCL aggregation that captures second-order TCL dynamics, enabling accurate characterization of reach-and-hold sets. A tractable optimization problem is then formulated to numerically compute an inner approximation of these sets. Simulation results validate that our method accurately characterizes the fleet's flexibility and effectively controls its power consumption. Furthermore, a robustness analysis is carried out to investigate the effects of uncertainty in initial conditions and TCL parameters.
Integrating AI and Simulation for Teaching Power System Dynamics: An Interactive Framework for Engineering Education
Artificial Intelligence (AI), especially cloud platforms and large language models (LLMs), is changing how engineering is taught by making learning more interactive and flexible. However, in electrical engineering and energy systems, students often find power system dynamics difficult to understand because the concepts are abstract, math-heavy, and there are limited opportunities for hands-on practice. This paper presents an AI-based interactive learning framework that combines simulation with intelligent feedback to improve understanding and student engagement. The framework has three connected parts: an AI layer that provides explanations and guidance, a simulation layer that models system behavior, and a user layer that allows students to interact with the system in real time. These parts work together in a continuous loop where students explore how the system behaves, change parameters, and receive feedback based on the results. The paper also provides a step-by-step process to help educators design and apply AI-supported learning environments, including breaking down concepts, using simulations, and assessing performance. This method helps students learn through practice and better understand how ideas from class apply to real power systems. It also provides a practical way to improve electrical engineering education and helps students get ready to use AI tools carefully and responsibly in engineering.
comment: 2 pages
Coordinated vs. Sequential Transmission Planning
Coordinated planning of generation, storage, and transmission more accurately captures the interactions among these three capacity types necessary to meet electricity demand, at least in theory. However, in practice, U.S. system operators typically follow a sequential planning approach: They first determine future generation and storage additions based on an assumed unconstrained ('copper plate') system. Next, they perform dispatch simulations of this projected generation and storage capacity mix on the existing transmission grid to identify transmission constraint violations. These violations indicate the need for transmission upgrades. We describe a multistage, multi-locational planning model that co-optimizes generation, storage, and transmission investments. The model respects reliability constraints as well as state energy and climate policies. We test the two planning approaches using a current stakeholder-informed 20-zone model of the PJM region, developed for the current FERC Order No. 1920 compliance filing process. In our most conservative model specification, we find that the co-optimized approach estimates 67% lower transmission upgrade needs than the sequential model, leading to total system costs that are 0.6% lower and similar reliability and climate outcomes. Our sensitivities show larger transmission and cost savings and reliability and climate benefits from co-optimized planning.
comment: 11 pages
A Trajectory-Based Approach to Controlled Invariance and Recursively Feasible MPC
In this paper, we revisit the computation of controlled invariant sets for linear discrete-time systems through a trajectory-based viewpoint. We begin by introducing the notion of convex feasible points, which provides a new characterization of controlled invariance using finitely long state trajectories. We further show that combining this notion with the classical backward fixed-point algorithm allows for the computation of the maximal controlled invariant set. Building on these results, we propose a model predictive control (MPC) scheme that guarantees recursive feasibility without relying on precomputed terminal sets. Finally, we formulate the search for convex feasible points as an optimization problem, yielding a practical computational method for constructing controlled invariant sets. The effectiveness of the approach is illustrated through numerical examples.
comment: 11 pages, 5 figures, accepted at the European Control Conference; updated title, abstract, and proofs
Robotics
Passage-Aware Structural Mapping for RGB-D Visual SLAM
Doorways and passages are critical structural elements for indoor robot navigation, yet they remain underexplored in modern Visual SLAM (VSLAM) frameworks. This paper presents a passage-aware structural mapping approach for RGB-D VSLAM that detects doors and traversable openings by jointly fusing geometric, semantic, and topological cues. Doors are modeled as planar entities embedded within walls and classified as traversable or non-traversable based on their coplanarity with the supporting wall. Passages are inferred through two complementary strategies: traversal evidence accumulated from camera-wall interactions across consecutive keyframes, and geometric opening validation based on discontinuities in the mapped wall geometry. The proposed method is integrated into vS-Graphs as a proof of concept, enriching its scene graph with passage-level abstractions and improving room connectivity modeling. Qualitative evaluations on indoor office sequences demonstrate reliable doorway detection, and the framework lays the foundation for exploiting these elements in BIM-informed VSLAM. The source code is publicly available at https://github.com/snt-arg/visual_sgraphs/tree/doorway_integration.
comment: 5 pages, 5 figures
Exploiting Differential Flatness for Efficient Learning-based Model Predictive Control of Constrained Multi-Input Control Affine Systems
Learning-based control techniques use data from past trajectories to control systems with uncertain dynamics. However, learning-based controllers are often computationally inefficient, limiting their practicality. To address this limitation, we propose a learning-based controller that exploits differential flatness, a property of many robotic systems. Recent research on using flatness for learning-based control either is limited in that it (i) ignores input constraints, (ii) applies only to single-input systems, or (iii) is tailored to specific platforms. In contrast, our approach uses a system extension and block-diagonal cost formulation to control general multi-input, nonlinear, affine systems. Furthermore, it satisfies input and half-space flat state constraints and guarantees probabilistic Lyapunov decrease using only two sequential convex optimizations. We show that our approach performs similarly to, but is multiple times more efficient than, a Gaussian process model predictive controller in simulation, and achieves competitive tracking in real hardware experiments.
comment: Accepted for publication in 2026 European Control Conference
Learning Human-Intention Priors from Large-Scale Human Demonstrations for Robotic Manipulation
Human videos contain rich manipulation priors, but using them for robot learning remains difficult because raw observations entangle scene understanding, human motion, and embodiment-specific action. We introduce MoT-HRA, a hierarchical vision-language-action framework that learns human-intention priors from large-scale human demonstrations. We first curate HA-2.2M, a 2.2M-episode action-language dataset reconstructed from heterogeneous human videos through hand-centric filtering, spatial reconstruction, temporal segmentation, and language alignment. On top of this dataset, MoT-HRA factorizes manipulation into three coupled experts: a vision-language expert predicts an embodiment-agnostic 3D trajectory, an intention expert models MANO-style hand motion as a latent human-motion prior, and a fine expert maps the intention-aware representation to robot action chunks. A shared-attention trunk and read-only key-value transfer allow downstream control to use human priors while limiting interference with upstream representations. Experiments on hand motion generation, simulated manipulation, and real-world robot tasks show that MoT-HRA improves motion plausibility and robust control under distribution shift.
comment: 13 pages, 5 figures
Pushing Radar Odometry Beyond the Pavement: Current Capabilities and Challenges
Radar offers unique advantages for localization in unstructured environments, including robustness to weather, lighting, and airborne particulates. While most prior work has studied radar odometry in urban, largely planar settings, its performance in off-road environments remains less understood. In this paper, we investigate the potential of radar for off-road odometry estimation and identify key challenges that arise from full $SE(3)$ vehicle motion, terrain-induced ground returns, and sparse or unstable features. To address these issues, we introduce two simple baselines: Radar-KISSICP, which applies motion compensation to generate 3D-aware radar pointclouds, and Radar-IMU, which leverages IMU preintegration to stabilize scan matching. Experiments on the Great Outdoors (GO) dataset demonstrate that these baselines improve trajectory estimation in challenging routes and provide a reference point for future development of radar odometry in off-road robotics.
Agent-Centric Visual Reinforcement Learning under Dynamic Perturbations
Visual reinforcement learning aims to empower an agent to learn policies from visual observations, yet it remains vulnerable to dynamic visual perturbations, such as unpredictable shifts in corruption types. To systematically study this, we introduce the Visual Degraded Control Suite (VDCS), a benchmark extending DeepMind Control Suite with Markov-switching degradations to simulate non-stationary real-world perturbations. Experiments on VDCS reveal severe performance degradation in existing methods. We theoretically prove via information-theoretic analysis that this failure stems from reconstruction-based objectives inevitably entangling perturbation artifacts into latent representations. To mitigate this negative impact, we propose Agent-Centric Observations with Mixture-of-Experts (ACO-MoE) to robustify visual RL against perturbations. The proposed framework leverages unique agent-centric restoration experts, achieving restoration from corruptions and task-relevant foreground extraction, thereby decoupling perception from perturbation before being processed by the RL agent. Extensive experiments on VDCS show our ACO-MoE outperforms strong baselines, recovering 95.3% of clean performance under challenging Markov-switching corruptions. Moreover, it achieves SOTA results on DMControl Generalization with random-color and video-background perturbations, demonstrating a high level of robustness.
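The Markov-switching degradations mentioned in this abstract can be pictured as a Markov chain over corruption types, where the corruption applied at each step depends on the previous one. The sketch below only samples such a sequence of corruption indices. The corruption names, the transition matrix, and the function name are hypothetical and are not taken from VDCS.

```python
import numpy as np


def sample_corruption_sequence(transition, n_steps, start=0, seed=0):
    """Sample corruption-type indices from a Markov chain (n_steps >= 1).

    transition[i][j] is the probability of switching from corruption i to
    corruption j at the next environment step; each row must sum to 1.
    """
    rng = np.random.default_rng(seed)
    transition = np.asarray(transition, dtype=float)
    states = [start]
    for _ in range(n_steps - 1):
        current = states[-1]
        states.append(int(rng.choice(len(transition), p=transition[current])))
    return states


# Hypothetical corruption types and a "sticky" transition matrix.
corruptions = ["gaussian_noise", "blur", "occlusion"]
P = [[0.90, 0.05, 0.05],
     [0.05, 0.90, 0.05],
     [0.05, 0.05, 0.90]]
schedule = [corruptions[i] for i in sample_corruption_sequence(P, n_steps=20)]
```

Large diagonal entries make each corruption persist for a while before switching, which produces the kind of non-stationary perturbation pattern the abstract describes.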
Computational Design and Co-Robotic Fabrication for Material Reuse in Architecture
Climate change and resource depletion demand a shift from the dominant linear "take-make-use-dispose" paradigm of construction toward circular, low-waste practices. Material reuse offers a promising pathway by reducing raw material extraction, mitigating waste, and extending the service lifespan of carbon-sequestering materials such as timber. Realizing this potential, however, requires addressing technical and logistical challenges across both design and construction for accommodating heterogeneous, reclaimed material inventories. This paper presents an integrated framework that couples data-driven computational design with feedback-driven adaptive human-robot collaborative (co-robotic) fabrication and assembly to enable the realization of nonstandard structures made from reclaimed timber of varying length and geometries, supplemented with new off-the-shelf timber when necessary. The framework is validated through Timbrelyn, a built case-study installation that demonstrates how timber reuse can inform and enhance architectural expression. This work contributes to the development of integrated design-to-fabrication workflows that advance adaptive, feedback-driven methods to handle inventory constraints and reclaimed material uncertainties, facilitating material reuse in the design and construction of new buildings and structures.
comment: Accepted for publication in Proceedings of the 45th Annual Conference of the Association for Computer Aided Design in Architecture (ACADIA 2025)
Real-time windrow detection from onboard tractor sensors for automated following
Proprietary design in commercial windrow-detection systems restricts transparency and limits progress in open autonomous forage-harvesting research. We present a multi-modal dataset combining stereo vision and LiDAR from tractor-mounted sensors during real baling operations. The dataset includes synchronized sensor data with GNSS trajectories, partly released as ROS2 Humble bags on Zenodo, with additional data available on request. Using this dataset, we implement a real-time (>20 Hz) centroid-based windrow-following method on an NVIDIA Jetson AGX Orin. Across the critical 4-10 m guidance range, stereo and LiDAR depth measurements show strong agreement (0.965 +/- 0.021), indicating that low-cost stereo sensors can approach LiDAR performance. Our open-source ROS 2 pipeline provides a reproducible benchmark for GPS-free windrow detection and supports development of practical autonomous forage-harvesting systems. Dataset: https://zenodo.org/records/17486318
comment: Published in the proceedings of the 46th GIL Annual Conference (GIL-Jahrestagung 2026)
Hybrid A*-Based Reverse Path-Planning of a Vehicle with Trailer System
Reverse parking of a vehicle-trailer system is a difficult task for human drivers due to the multi-body nature of the system and the unintuitive controls required to orient the trailer properly. The problem is further complicated by the presence of other vehicles that the trailer and its connected vehicle must avoid during the reverse parking maneuver. While path planning methods in reverse motion for vehicles with trailers exist, there is a lack of results that also offer collision avoidance as part of the algorithm. This paper hence proposes a modified Hybrid A*-based algorithm that can accommodate the vehicle-trailer system as well as collision avoidance considerations with the other vehicles and obstacles in the parking environment. One novelty of the proposed approach is its adaptability to the vehicle-trailer system, where the limits of usable steering input that prevent jackknife incidents vary with the system configuration. The other contribution is the addition of collision avoidance functionality, which the standard Hybrid A* algorithm lacks. The method is developed and presented first, followed by simulation case studies to demonstrate the efficacy of the proposed approach.
Sliding Mode Control for Safe Trajectory Tracking with Moving Obstacles Avoidance: Experimental Validation on Planar Robots
This paper presents a unified control framework for robust trajectory tracking and moving obstacle avoidance applicable to a broad class of mobile robots. By formulating a generalized kinematic transformation, we convert diverse vehicle dynamics into a strict feedback form, facilitating the design of a Sliding Mode Control (SMC) strategy for precise and robust reference tracking. To ensure operational safety in dynamic environments, the tracking controller is integrated with a Collision Cone Control Barrier Function (C3BF) based safety filter. The proposed architecture guarantees asymptotic tracking in the presence of external disturbances while strictly enforcing collision avoidance constraints. The novelty of this work lies in designing a sliding mode controller for ground robots like the Ackermann drive, which has not been done before. The efficacy and versatility of the approach are validated through numerical simulations and extensive real-world experiments on three distinct platforms: an Ackermann-steered vehicle, a differential drive robot, and a quadrotor drone. Videos of the experiments are available at https://youtu.be/dWcxwum96vk
Guiding Vector Field Generation via Score-based Diffusion Model ICRA2026
Guiding Vector Fields (GVFs) are a powerful tool for robotic path following. However, classical methods assume smooth, ordered curves and fail when paths are unordered, multi-branch, or generated by probabilistic models. We propose a unified framework, termed the Score-Induced Guiding Vector Field (SGVF), which leverages score-based generative modeling to construct vector fields directly from data distributions. SGVF learns tangent fields from point clouds with unit-norm, orthogonality, and directional-consistency losses, ensuring geometric fidelity and control feasibility. This approach removes the reliance on ad-hoc path segmentation and enables guidance along complex topologies such as branching and pseudo-manifolds. The study establishes a correspondence between score vanishing in diffusion models and GVF singularities and highlights representational capacity near sharp path curvatures. Experiments on robotic navigation in planar environments demonstrate that SGVF achieves reliable path following in scenarios where classical GVFs fail, underscoring its potential as a bridge between generative modeling and geometric control. Code and experiment video are available at https://github.com/czr-gif/Guiding-Vector-Field-Generation-via-Score-based-Diffusion-Model.
comment: 8 pages, 6 figures, ICRA2026
SPLIT: Separating Physical-Contact via Latent Arithmetic in Image-Based Tactile Sensors
Training machine learning models for robotic tactile sensing requires vast amounts of data, yet obtaining realistic interaction data remains a challenge due to physical complexity and variability. Simulating tactile sensors is thus a crucial step in accelerating progress. This paper presents SPLIT, a novel method for simulating image-based tactile sensors, with a primary focus on the DIGIT sensor. Central to our approach is a latent space arithmetic strategy that explicitly disentangles contact geometry from sensor-specific optical properties. Unlike methods that require recalibration for every new unit, this disentanglement allows SPLIT to adapt to diverse DIGIT backgrounds and even transfer data to distinct sensors like the GelSight R1.5 without full model retraining. Beyond this adaptability, our approach achieves faster inference speeds than existing alternatives. Furthermore, we provide a calibrated finite element method (FEM) soft-body mesh simulation with variable resolution, offering a tunable trade-off between speed and fidelity. Additionally, our algorithm supports bidirectional simulation, allowing for both the generation of realistic images from deformation meshes and the reconstruction of meshes from tactile images. This versatility makes SPLIT a valuable tool for accelerating progress in robotic tactile sensing research.
comment: Accepted to Elsevier Robotics and Autonomous Systems Journal
Characterizing Vision-Language-Action Models across XPUs: Constraints and Acceleration for On-Robot Deployment
Vision-Language-Action (VLA) models are promising for generalist robot control, but on-robot deployment is bottlenecked by real-time inference under tight cost and energy budgets. Most prior evaluations rely on desktop-grade GPUs, obscuring the trade-offs and opportunities offered by heterogeneous edge accelerators (GPUs/XPUs/NPUs). We present a systematic analysis for low-cost VLA deployment via model-hardware co-characterization. First, we build a cross-accelerator leaderboard and evaluate model-hardware pairs under CET (Cost, Energy, Time), showing that right-sized edge devices can be more cost-/energy-efficient than flagship GPUs while meeting control-rate constraints. Second, using in-depth profiling, we uncover a consistent two-phase inference pattern: a compute-bound VLM backbone followed by a memory-bound Action Expert, which induces phase-dependent underutilization and hardware inefficiency. Finally, guided by these insights, we propose DP-Cache and V-AEFusion to reduce diffusion redundancy and enable asynchronous pipeline parallelism, achieving up to 2.9x speedup on GPUs and 6x on edge NPUs with only marginal success degradation. The example leaderboard website is available at: https://vla-leaderboard-01.vercel.app/.
comment: 13 pages
An Automatic Ground Collision Avoidance System with Reinforcement Learning
This article evaluates an artificial intelligence (AI)-based Automatic Ground Collision Avoidance System (AGCAS) designed for advanced jet trainers to enhance operational effectiveness. In the continuously evolving field of aerospace engineering, the integration of AI is crucial for advancing operations with improved timing constraints and efficiency. Our study explores the design process of an AI-driven AGCAS, specifically tailored for advanced jet trainers, focusing on addressing the AGCAS problem within a limited observation space. The system utilizes line-of-sight queries on a terrain server to ensure precise and efficient collision avoidance. This approach aims to significantly improve the safety and operational capabilities of advanced jet trainers.
FreqCache: Accelerating Embodied VLN Models with Adaptive Frequency-Guided Token Caching
Vision-Language-Navigation (VLN) models exhibit excellent navigation accuracy but incur high computational overhead. Token caching has emerged as a promising training-free strategy to reduce this cost by reusing token computation results; however, existing token caching approaches rely on visual domain methods for cacheable token selection, leading to challenges when adapted to VLN models. 1) Visual domain methods become invalid when there is viewpoint migration. 2) Visual domain methods neglect critical edge information without the aid of additional algorithms. 3) Visual domain methods overlook the temporal variation of scenarios and lack adjustability in cache budgets. In this paper, we develop detailed analyses and find that the impacts of these challenges exhibit invariance and analyzability in the frequency domain. Based on these findings, we propose a frequency-guided token caching framework, called FreqCache. Utilizing the inherent properties of the frequency domain, FreqCache achieves optimal token cache establishment, refreshment, and adaptive adjustment. Experiments show that FreqCache achieves 1.59x speedup with negligible overhead, showing the effect of integrating frequency domain methods in VLN token caching.
Pedestrians play chicken with an autonomous vehicle
Automated vehicles (AVs) are commonly programmed to yield unconditionally to pedestrians in the interest of safety. However, this design choice can give rise to the Freezing Robot Problem in which pedestrians learn to assert priority at every interaction, causing vehicles to stall and make no progress. The game theoretic Sequential Chicken model has shown that, like human drivers, AVs can resolve this problem by trading credible threats of very small risks of collision or larger risks of less severe invasion of personal space against the value of time due to yielding delays. This paper presents the first demonstration and evaluation of this approach using a real AV with human subjects and shows that pedestrian behavior under experimentally constrained safety conditions can be well fitted by Sequential Chicken, with a low time value of collision, suggestive of their planning to avoid proxemic personal space penalties as well as actual collisions.
ARETE: Attention-based Rasterized Encoding for Topology Estimation using HSV-transformed Crowdsourced Vehicle Fleet Data
The continuous advancement of autonomous driving (AD) introduces challenges across multiple disciplines to ensure safe and efficient driving. One such challenge is the generation of High-Definition (HD) maps, which must remain up to date and highly accurate for downstream automotive tasks. One promising approach is the use of crowdsourced data from a vehicle fleet, representing road topology and lane-level features. This work focuses on the generation of centerlines and lane dividers from crowdsourced vehicle trajectories. We adopt a Detection Transformer (DETR)-based approach, where a rasterized representation of vehicle trajectories is used as input to predict vectorized lane representations. Each lane consists of a centerline with an associated direction and corresponding lane dividers that are geometrically constrained by the centerline. Our method includes the extraction of local tiles, from which crowdsourced vehicle trajectories are aggregated. Each tile undergoes a transformation into a rasterized representation encoding both the presence and direction of each trajectory, enabling the prediction of vectorized directed lanes. Experiments are conducted on an internal dataset as well as on the public datasets nuScenes and nuPlan.
Projected Attainable Speed Space: A Driving Efficiency Metric Connecting Instantaneous Evaluation to Travel Time
Inefficient driving behaviors, such as overly conservative yielding, remain a key obstacle to deployment of autonomous vehicles (AVs). Instantaneous driving efficiency metrics are crucial for self-driving decision-making because they affect real-time performance evaluation and control optimization. However, commonly used indicators, including speed, relative speed, and inter-vehicle distance, are limited in capturing traffic context and in ensuring consistency between instantaneous outputs and travel-level outcomes. This study proposes the Projected Attainable Speed Space (PASS) model, a unified framework for driving efficiency assessment across instantaneous and travel-level analyses by integrating kinematic and spatial traffic information. PASS characterizes instantaneous driving efficiency with two coupled elements: potential for speed improvement (available acceleration space) and response to that potential (utilization of available acceleration space). Available acceleration space is referenced to projected attainable speed, derived from an idealized catch-up maneuver using relative speed and spacing to the leading vehicle; utilization is represented by the temporal change in available acceleration space. To ensure cross-scale consistency, time-aggregated PASS is defined as a travel-level efficiency metric. Trajectory data from a driving simulation experiment are used for parameter calibration to maximize agreement between time-aggregated PASS and observed travel times. Across 10 lane-change events, results show strong consistency, with an average coefficient of determination of 0.913, validating PASS for consistent efficiency evaluation across instantaneous and travel-level temporal scales. This study provides a unified, physically grounded framework that supports real-time decision-making and long-term performance analysis in autonomous driving.
OpenPodcar2: a robust, ROS2 vehicle for self-driving research
OpenPodcar2 is a robust, ROS2-interfaced, low-cost, open source hardware and software, autonomous vehicle platform based on an off-the-shelf, hard-canopy, mobility scooter donor vehicle. It is a modification of the previous OpenPodcar design, which extends it with robust electronics and ROS2 interfacing, to enable both research and potential deployment use cases. The platform consists of (a) hardware components, documented as a bill of materials and build instructions; (b) integration to the general purpose OSH R4 mechatronics board and a Gazebo simulation of the vehicle, both presenting a common ROS2 interface; and (c) higher-level ROS2 software implementations and configurations of standard robot autonomous planning and control, including the nav2 stack which performs SLAM and enacts commands to drive the vehicle from a current to a desired pose around obstacles. OpenPodcar2 can transport a human passenger or similar load at speeds up to 15km/h, for example for use as a last-mile autonomous taxi service or to transport delivery containers similarly around a city center. It is small and safe enough to be parked in a standard research lab, yet robust enough for some deployment cases. Total build cost was around 7,000USD from new components, or 2,000USD with a used Donor Vehicle. OpenPodcar2 thus provides a research balance between real world utility, safety, cost and robustness.
Generalizable Friction Coefficient Estimation via Material Embedding and Proxy Interaction Modeling
Accurately estimating friction coefficients between arbitrary material pairs is critical for robotics, digital fabrication, and physics-based simulation, but exhaustive pairwise testing scales quadratically with the number of materials. We introduce a proxy-based modeling framework that approximates any pairwise friction $f(A,B)$ from a small, fixed set of proxy materials $C=[c_1,\dots,c_k]$ by learning a per-material embedding $z_A = g(f(A,c_1),\dots,f(A,c_k))$ and a fusion function $p$ such that $f(A,B)\approx p\big(z_A,z_B\big)$. We present deterministic and probabilistic realizations of $g$ and $p$, procedures for selecting diverse proxy sets, and mechanisms for handling missing or noisy proxy measurements. The learned embeddings are compact, interpretable, and enable calibrated uncertainty estimates for downstream decision making. On simulated and measured friction datasets, our approach achieves high predictive accuracy, robust performance with partial observations, and substantial experimental savings by significantly reducing pairwise testing.
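The proxy idea above can be illustrated with a minimal sketch. It is not the authors' model: it assumes the simplest deterministic choice, where $g$ is the identity on the vector of proxy measurements and $p$ is a symmetric bilinear form $p(z_A,z_B)=z_A^\top W z_B$ fitted by least squares. The function names `embed`, `fit_fusion`, and `predict` are hypothetical.

```python
import numpy as np

def embed(proxy_frictions):
    """g (assumed identity here): z_A = (f(A,c_1), ..., f(A,c_k))."""
    return np.asarray(proxy_frictions, dtype=float)

def _pair_features(z_a, z_b):
    # Symmetrized outer product, so the fitted model treats (A,B) and (B,A) alike.
    outer = np.outer(z_a, z_b)
    return (0.5 * (outer + outer.T)).ravel()

def fit_fusion(Z, pairs, targets):
    """Fit W in p(z_A, z_B) = z_A^T W z_B from a few measured pairs.

    Z       : (n_materials, k) array of embeddings
    pairs   : list of (a, b) index pairs with measured friction
    targets : measured f(A, B) for each pair
    """
    k = Z.shape[1]
    X = np.stack([_pair_features(Z[a], Z[b]) for a, b in pairs])
    # Minimum-norm least-squares solution (the system is usually underdetermined).
    w, *_ = np.linalg.lstsq(X, np.asarray(targets, dtype=float), rcond=None)
    W = w.reshape(k, k)
    return 0.5 * (W + W.T)

def predict(W, z_a, z_b):
    """Estimate f(A, B) for a pair that was never tested directly."""
    return float(z_a @ W @ z_b)
```

Each new material then costs only $k$ proxy measurements instead of one test against every existing material. The paper's probabilistic realizations and its handling of missing proxy measurements are not covered by this sketch.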
$M^2$-VLA: Boosting Vision-Language Models for Generalizable Manipulation via Layer Mixture and Meta-Skills
Current Vision-Language-Action (VLA) models predominantly rely on end-to-end fine-tuning. While effective, this paradigm compromises the inherent generalization capabilities of Vision-Language Models (VLMs) and incurs catastrophic forgetting. To address these limitations, we propose $M^2$-VLA, which demonstrates that a generalized VLM is able to serve as a powerful backbone for robotic manipulation directly. However, it remains a key challenge to bridge the gap between the high-level semantic understanding of VLMs and the precise requirements of robotic control. To overcome this, we introduce the Mixture of Layers (MoL) strategy that selectively extracts task-critical information from dense semantic features. Furthermore, to facilitate efficient trajectory learning under constrained model capacity, we propose a Meta Skill Module (MSM) that integrates strong inductive biases. Extensive experiments in both simulated and real-world environments demonstrate the effectiveness of our approach. Furthermore, generalization and ablation studies validate the architecture's zero-shot capabilities and confirm the contribution of each key component. Our code and pre-trained models will be made publicly available.
AsyncShield: A Plug-and-Play Edge Adapter for Asynchronous Cloud-based VLA Navigation
While Vision-Language-Action (VLA) models have demonstrated strong zero-shot generalization for robot control, their massive parameter sizes typically necessitate cloud-based deployment. However, cloud deployment introduces network jitter and inference latency, which can induce severe spatiotemporal misalignment in mobile navigation under continuous displacement: stale intents expressed in past ego frames may become spatially incorrect in the current frame and lead to collisions. To address this issue, we propose AsyncShield, a plug-and-play asynchronous control framework. AsyncShield discards traditional black-box time-series prediction in favor of a deterministic physical white-box spatial mapping. By maintaining a temporal pose buffer and utilizing kinematic transformations, the system accurately converts temporal lag into spatial pose offsets to restore the VLA's original geometric intent. To balance intent restoration fidelity and physical safety, the edge adaptation is formulated as a constrained Markov decision process (CMDP). Solved via the PPO-Lagrangian algorithm, a reinforcement learning adapter dynamically trades off between tracking the VLA intent and responding to high-frequency LiDAR obstacle avoidance hard constraints. Furthermore, benefiting from a standardized universal sub-goal interface, domain randomization, and perception-level adaptation via Collision Radius Inflation, AsyncShield operates as a lightweight, plug-and-play module. Simulation and real-world experiments demonstrate that, without fine-tuning any cloud-based foundation models, the framework exhibits zero-shot and robust generalization capabilities, effectively improving the success rate and physical safety of asynchronous navigation.
comment: 9 pages, 2 figures, 4 tables
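The pose-buffer step described in the AsyncShield abstract can be sketched as follows. This is a minimal planar (SE(2)) illustration, not the authors' implementation: it assumes poses are `(x, y, theta)` in a fixed world frame, ego frames have x forward and y to the left, and the buffer returns the stored pose nearest in time. `PoseBuffer` and `remap_goal` are hypothetical names.

```python
import math
from bisect import bisect_left
from collections import deque

class PoseBuffer:
    """Keeps recent timestamped poses (t, x, y, theta); timestamps must increase."""

    def __init__(self, maxlen=256):
        self._buf = deque(maxlen=maxlen)

    def push(self, t, x, y, theta):
        self._buf.append((t, x, y, theta))

    def lookup(self, t):
        """Return the (x, y, theta) pose whose timestamp is closest to t."""
        if not self._buf:
            raise LookupError("pose buffer is empty")
        times = [p[0] for p in self._buf]
        i = bisect_left(times, t)
        candidates = [self._buf[j] for j in (i - 1, i) if 0 <= j < len(times)]
        return min(candidates, key=lambda p: abs(p[0] - t))[1:]

def remap_goal(goal_past, pose_past, pose_now):
    """Re-express a sub-goal given in a past ego frame in the current ego frame."""
    gx, gy = goal_past
    xp, yp, thp = pose_past
    xn, yn, thn = pose_now
    # Past ego frame -> world frame.
    wx = xp + math.cos(thp) * gx - math.sin(thp) * gy
    wy = yp + math.sin(thp) * gx + math.cos(thp) * gy
    # World frame -> current ego frame (inverse rigid transform).
    dx, dy = wx - xn, wy - yn
    cx = math.cos(thn) * dx + math.sin(thn) * dy
    cy = -math.sin(thn) * dx + math.cos(thn) * dy
    return cx, cy
```

In use, the edge side would look up the pose at the timestamp of the image that produced the delayed sub-goal and remap it against the latest pose before tracking it. The CMDP-based adapter and the LiDAR constraints mentioned in the abstract sit on top of this step and are not modeled here.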
Trajectory Planning for an Articulated Commercial Vehicle using Model Predictive Contouring Control
This paper presents a trajectory planning method for articulated commercial vehicles, specifically tractor-semitrailers, based on Model Predictive Contouring Control (MPCC). Although MPCC has proven effective for passenger cars, it is generally ill-suited for tractor-semitrailers. These vehicles are significantly larger, the semitrailer follows a different path than the tractor, and reversing maneuvers are unstable and prone to jackknifing. Furthermore, practical driving scenarios often require scenario-dependent prioritization of different vehicle `anchor points', e.g., prioritizing the semitrailer position during docking or the tractor position when parking to charge. Therefore, we extend MPCC to enable scenario-dependent weighting of these anchor points and incorporate explicit road-boundary constraints for the front and rear tractor axles and the semitrailer axle, thereby ensuring that all considered wheels remain within the drivable area. The simulation results demonstrate the successful navigation of a representative logistic scenario in both forward and reverse direction. Furthermore, the influence of the optimization parameters on the trajectories is analyzed, providing insights into controlling the vehicle behavior. Finally, first tests using a full-scale prototype vehicle show the practical applicability of the approach.
Event-based SLAM Benchmark for High-Speed Maneuvers
Event-based cameras are bio-inspired sensors with pixels that independently and asynchronously respond to brightness changes at microsecond resolution, offering the potential to handle visual tasks in high-speed maneuvering scenarios. Existing event-based approaches, although successful in mitigating motion blur caused by high-speed maneuvers, suffer from many limitations. Some demonstrate successful pose tracking for a fast-shaking, fronto-parallel camera close to the structure, while others assume pure (optionally aggressive) three-degree-of-freedom rotations. The former requires persistent local map visibility within the field of view (FOV), whereas the latter fails to generalize to six-degree-of-freedom (6-DoF) motions where both linear and angular velocities may be large. Consequently, current successes do not demonstrate that event-based state estimation under arbitrary aggressive maneuvers is a fully solved problem. To quantitatively assess the extent to which the potential of event cameras has been unlocked, we conduct a thorough analysis of state-of-the-art (SOTA) event-based visual odometry (VO)/visual-inertial odometry (VIO) methods and report shortcomings in current public datasets. Furthermore, we introduce a benchmarking framework for event-based state estimation, called EvSLAM, characterized by sufficient variation in data collection platforms, diverse extreme lighting scenarios, and a wide scope of challenging motion patterns under a clear and rigorous definition of high-speed maneuvers for mobile robots, along with a novel evaluation metric designed to fairly assess the operational limits of event-based solutions. This framework benchmarks state-of-the-art methods, yielding insights into optimal architectures and persistent challenges.
Betting for Sim-to-Real Performance Evaluation
This paper studies the problem of robot performance evaluation, focusing on how to obtain accurate and efficient estimates of real-world behavior under severe constraints on physical experimentation. Such estimates are essential for benchmarking algorithms, comparing design alternatives, validating controllers, and supporting certification or regulatory decision-making, yet real-world testing with physical robots is often expensive, time-consuming, and safety-limited. To mitigate the scarcity of real-world trials, sim-to-real methodologies are commonly employed, using low-cost simulators to inform, supplement, or prioritize physical experiments. Departing from (and complementary to) existing approaches in variance reduction (e.g., importance-sampling variants) or bias-correction (e.g., through prediction-powered inference or learned control variates), we examine this performance-evaluation problem through the lens of betting. We establish theoretical conditions under which a betting mechanism can yield accurate and efficient estimates (provably outperforming the Monte Carlo estimator) and we characterize how such bets should be constructed. We further develop theoretically grounded yet practically implementable approximations of the ideal bet, and we provide concrete decision rules that diagnose when these approximate betting strategies are working as intended. We demonstrate the effectiveness of the proposed methods using both synthetic examples and cross-fidelity computational simulators. Notably, we also showcase an illustrative case in which a group of synthetic distributions are used to infer the real-world pick-and-place accuracy of a robotic manipulator, a seemingly unconventional sim-to-real transfer that becomes natural and feasible under the proposed betting perspective. Programs for reproducing empirical results are available at https://github.com/ISUSAIL/Bet4Sim2Real.
comment: Accepted to RSS 2026, with DOI pending
Supporting Family-School Partnerships with Robot-Facilitated Home-Based Activities
Family-school partnerships (FSP) are critical to children's development, yet families often face barriers such as time constraints, fragmented communication, and limited opportunities for meaningful engagement. As a step toward facilitating broader family-school partnerships, we explore a novel approach that integrates a social robot into family settings, specifically supporting home-based activities. Through interviews and co-design sessions, we designed and developed a robotic system informed by both parents and children, that supported, among other interactions, family communication about school topics. We evaluated the robot in a week-long, in-home study with 10 families. Our findings show how families integrated the robot into daily life, how parental facilitation styles shaped use, and how families perceived both the helpfulness and challenges of the robot. We contribute empirical insights, a modular system, and design implications for family- and child-robot interactions. We discuss ethical and privacy considerations, and broaden the design space for technologies supporting family-school partnerships.
comment: Proceedings of the 25th Interaction Design and Children Conference (IDC '26)
Designing Robots to Support Parent-Child Connections: Opportunities Through Robot-Mediated Communication
The sense of family connectedness may support positive outcomes including individual well-being, resilience, and healthy family functioning. However, as technologies advance, they often replace human-human interactions instead of nurturing them. In this work, we investigate how robot-facilitated communication tools might instead create new opportunities for family connection. We conducted two studies with families with children aged 5-12. We first explored the design space through in-home technology probe sessions with six families. These probes inspired us to explore two key interaction design dimensions: the robot's behavior strategy (passive, reactive, proactive) and the mode of communication (synchronous, asynchronous). We then conducted a laboratory study with 20 families to examine how the two dimensions shaped parent-child interaction and connection. Our findings characterize how parents and children appropriated robot-mediated exchanges, the tensions they experienced around initiative, timing, and privacy, and the opportunities they envisioned for supporting everyday connectedness.
comment: Proceedings of the 25th Interaction Design and Children Conference (IDC '26)
Multi-Robot Motions in Milliseconds: Vector-Accelerated Primitives for Sampling-Based Planning
In this paper, we extend the recent Vector-Accelerated Motion Planning (VAMP) framework to multi-robot motion planning (MRMP). We develop two vector-accelerated primitives, multi-robot MotionValidation (MotVal) and FindFirstConflict (FFC), which exploit SIMD parallelism within the multi-robot domain. On pure multi-robot motion validation tests, this achieves over 1100X speedup in validation time. Additionally, we modify a representative set of MRMP algorithms to use these new primitives. The relative speedup for each algorithm is studied on scenarios with manipulator, rigid body, and heterogeneous teams with some instances producing multi-robot solutions in the order of milliseconds and, in many cases, shows planning time speedups of over 850X.
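To give a feel for what a FindFirstConflict-style primitive computes, here is a data-parallel sketch in NumPy. It is only an analogue under strong assumptions, not the paper's SIMD implementation: robots are modeled as planar discs, paths are already discretized onto a common set of time steps, and the function name and signature are hypothetical.

```python
import numpy as np

def find_first_conflict(paths, radii):
    """Return (t, i, j) for the earliest step at which robots i < j overlap, else None.

    paths : (R, T, 2) array, position of each of R disc robots at T time steps
    radii : (R,) array of disc radii
    """
    paths = np.asarray(paths, dtype=float)
    radii = np.asarray(radii, dtype=float)

    # Squared distances between every robot pair at every step, computed in one shot.
    diff = paths[:, None, :, :] - paths[None, :, :, :]       # (R, R, T, 2)
    dist2 = np.einsum("ijtk,ijtk->ijt", diff, diff)          # (R, R, T)
    min_sep = radii[:, None] + radii[None, :]                # (R, R)
    conflict = dist2 < (min_sep ** 2)[:, :, None]            # (R, R, T)

    # Keep each unordered pair once (i < j), which also drops self-pairs.
    iu = np.triu_indices(len(radii), k=1)
    conflict = conflict[iu]                                  # (P, T)
    if not conflict.any():
        return None

    t = int(np.argmax(conflict.any(axis=0)))                 # earliest conflicting step
    p = int(np.argmax(conflict[:, t]))                       # first pair in conflict at t
    return t, int(iu[0][p]), int(iu[1][p])
```

The point of the sketch is the batching: all pairs and all time steps are checked with array operations rather than nested loops, which is the same kind of work that vectorized primitives map onto SIMD lanes.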
DiscreteRTC: Discrete Diffusion Policies are Natural Asynchronous Executors
Unlike chatbots, physical AI must act while the world keeps evolving. Therefore, the inter-chunk pauses of synchronous executors are fatal for dynamic tasks regardless of how fast the inference is. Asynchronous execution -- thinking while acting -- is therefore a structural requirement, and real-time chunking (RTC) makes it viable by recasting chunk transitions as inpainting: freezing committed actions and consistently generating the remainder. However, RTC with flow-matching policy is structurally suboptimal: its inpainting comes from inference-time corrections rather than the base policy, yielding little pre-training benefit, specific fine-tuning, heuristic guidance, and extra computation that inflates the latency. In this work, we observe that discrete diffusion policies, which generate actions by iteratively unmasking, are natural asynchronous executors that resolve all limitations at once: they are fine-tuning free since inpainting is their native operation, while early stopping further provides adaptive guidance and reduces inference cost. We propose DiscreteRTC, which replaces external corrections with native unmasking, and show on dynamic simulated benchmarks and real-world dynamic manipulation tasks that it achieves higher success rates than continuous RTC and other baselines. In summary, DiscreteRTC is simpler to implement with 0 lines of code for async inpainting, faster at inference with only 0.7x computation compared with generating actions from scratch, and better at execution with 50% higher success rate in real-world dynamic pick task compared with flow-matching-based RTC. More visualizations are on https://outsider86.github.io/DiscreteRTCSite/.
TEACar: An Open-Source Autonomous Driving Platform
Intelligent Transportation Systems (ITS) increasingly rely on vision-based perception and learning-based control, necessitating experimental platforms that support realistic hardware-in-the-loop validation. Small-scale platforms for autonomous racing offer a practical path to hardware validation, but often suffer from limited modularity, high integration complexity, or restricted extensibility. This paper presents TEACAR, a 1/14- to 1/16-scale autonomous driving platform designed with modular mechanical architecture, hardware abstraction, and ROS 2-based software. The system adopts a four-layer deck structure that physically decouples sensing, computation, actuation, and power subsystems, improving structural rigidity while simplifying reconfiguration. We constructed and comprehensively evaluated the prototype of TEACAR. Its mechanical stability, structural characteristics, and software performance were quantified based on three CNN-based steering controllers. Inference latency, power consumption, and system operating time were measured to evaluate computational capability and robustness. Our experiments demonstrated that TEACAR offers a scalable, modular, and cost-effective testbed for ITS research, education, and development. Our project repository is available on GitHub.
Libra-VLA: Achieving Learning Equilibrium via Asynchronous Coarse-to-Fine Dual-System ACL 2026
Vision-Language-Action (VLA) models are a promising paradigm for generalist robotic manipulation by grounding high-level semantic instructions into executable physical actions. However, prevailing approaches typically adopt a monolithic generation paradigm, directly mapping visual-linguistic features to high-frequency motor commands in a flat, non-hierarchical fashion. This strategy overlooks the inherent hierarchy of robotic manipulation, where complex actions can be naturally modeled in a Hybrid Action Space, decomposing into discrete macro-directional reaching and continuous micro-pose alignment, severely widening the semantic-actuation gap and imposing a heavy representational burden on grounding high-level semantics to continuous actions. To address this, we introduce Libra-VLA, a novel Coarse-to-Fine Dual-System VLA architecture. We explicitly decouple the learning complexity into a coarse-to-fine hierarchy to strike a training equilibrium, while simultaneously leveraging this structural modularity to implement an asynchronous execution strategy. The Semantic Planner predicts discrete action tokens capturing macro-directional intent, while the Action Refiner conditions on coarse intent to generate high-frequency continuous actions for precise alignment. Crucially, our empirical analysis reveals that performance follows an inverted-U curve relative to action decomposition granularity, peaking exactly when the learning difficulty is balanced between the two sub-systems. With the asynchronous design, our approach offers a scalable, robust, and responsive solution for open-world manipulation.
comment: Accepted to the Main Conference of ACL 2026. Project page: https://libra-vla.github.io/
asRoBallet: Closing the Sim2Real Gap via Friction-Aware Reinforcement Learning for Underactuated Spherical Dynamics
We introduce asRoBallet, to the best of our knowledge, the first successful deployment of reinforcement learning (RL) on humanoid ballbot hardware. Historically, ballbots have served as a canonical benchmark for underactuated and nonholonomic control, which are characterized by a reality gap in complex friction models for wheel-sphere-ground interactions. While current literature demonstrates successful handling of 3D balancing with LQR and MPC, transitioning to actual hardware for a humanoid ballbot using RL is currently hindered by critical gaps in contact modeling, actuator latency & jitter, and safe hardware exploration. This study proposes a high-fidelity MuJoCo simulation that explicitly models the discrete roller mechanics of ETH-type omni-wheels, thereby capturing parasitic vibrations and contact discontinuities that were previously ignored. We also developed a Friction-Aware Reinforcement Learning framework that achieves zero-shot Sim2Real transfer by mastering the coupled rolling, lateral, and torsional friction channels at the wheel-sphere and sphere-ground interfaces. We designed asRoBallet through subtractive reconfiguration, repurposing key components from an overconstrained quadruped and integrating them into a newly designed structural frame to achieve a robust research platform at low cost. We also developed a generalized iOS ecosystem that transforms consumer electronics into a low-latency interface, enabling a single operator to orchestrate expressive humanoid maneuvers via intuitive natural motion.
comment: 16 pages, 9 figures, accepted for RSS2026. For Supplementary Videos, see https://bionicdl.ancorasir.com/?p=2238
Logic of Fuzzy Paths
We introduce a new family of temporal logics intended for specifications in motion planning (MP). It builds upon the signal temporal logic (STL), which is a linear-time logic over real-valued signals that possess quantitative semantics and thus became popular in the areas of cyber-physical systems, robotics, and specifically robot MP. However, in contrast to STL, the proposed logic works with paths as first-class citizens, separating the concerns of geometry and of logic. This in turn leads to simpler and more understandable formulae, and a more refined notion of satisfaction being able to reflect also preferences over behaviours. Technically, the logic is built on fuzzy, time-varying signal constraints. As a consequence of this expressivity, it is (i) more usable for human-given specifications in MP and (ii) more amenable to learning specifications from demonstrations than other logics. The former is important for the traditional style of verification in robot MP; the latter is becoming recognized as crucial for mining data-given tasks and controller synthesis in human-aware MP. We expose the advantages of our proposed logic on examples and show the versatility and flexibility of the framework on a number of scenarios. Finally, we give a learning algorithm with a prototype implementation and discuss the possibilities of model checking and monitoring.
An analysis of sensor selection for fruit picking with suction-based grippers IROS
Robotic fruit harvesting often fails to reliably detect whether a fruit has been successfully picked, limiting efficiency and increasing crop damage. This problem is difficult due to compliant fruit and grippers, variable stem attachment, and occlusions in orchard environments. Prior work has explored vision-based perception and multi-sensor learning approaches for pick state estimation. However, minimal sensor sets and phase-dependent sensing strategies for accurate pick and slip detection remain largely unexplored. In this work, we design and evaluate a multimodal sensing suite integrated into a compliant suction-based apple gripper. Our approach is unique because it identifies which sensors are most informative at different phases of the pick, enabling predictive detection of failures before they occur. The contributions of this paper are a phase-dependent evaluation of multimodal sensors and the identification of minimal sensor sets for reliable pick state classification. Experiments in a real apple orchard show that Random Forest and Multilayer Perceptron classifiers detect successful picks and impending failures with over 90% accuracy, and Random Forest predicts pick/slip events within 0.09 s of human-annotated ground truth.
comment: IROS Conference Format, 6 pages, 6 figures, 1 table
VISION-SLS: Safe Perception-Based Control from Learned Visual Representations via System Level Synthesis
We propose VISION-SLS, a method for nonlinear output-feedback control from high-resolution RGB images which provides robust constraint satisfaction guarantees under calibrated uncertainty bounds despite partial observability, sensor noise, and nonlinear dynamics. To enable scalability while retaining guarantees, we propose: (i) a learned low-dimensional observation map from pretrained visual features with state-dependent error bounds, and (ii) a causal affine time-varying output-feedback policy optimized via System Level Synthesis (SLS). We develop a scalable, novel solver for the resulting nonconvex program that leverages sequential convex programming coupled with efficient Riccati recursions. On two simulated visuomotor tasks (a 4D car and a 10D quadrotor) with >= 512 x 512 pixels and a 59D humanoid task with partial observability, our method enables safe, information-gathering behavior that reduces uncertainty while guaranteeing constraint satisfaction with empirically-calibrated error bounds. We also validate our method on hardware, safely controlling a ground vehicle from onboard images, outperforming baselines in safety rate and solve times. Together, these results show that learned visual abstractions coupled with an efficient solver make SLS-based safe visuomotor output-feedback practical at scale. The code implementation of our method is available at https://github.com/trustworthyrobotics/VISION-SLS.
comment: Extended version; conference version to appear in Robotics: Science and Systems XXII (RSS 2026)
MotionBricks: Scalable Real-Time Motions with Modular Latent Generative Model and Smart Primitives SIGGRAPH 2026
Despite transformative advances in generative motion synthesis, real-time interactive motion control remains dominated by traditional techniques. In this work, we identify two key challenges in bridging research and production: 1) Real-time scalability: Industry applications demand real-time generation of a vast repertoire of motion skills, while generative methods exhibit significant degradation in quality and scalability under real-time computation constraints, and 2) Integration: Industry applications demand fine-grained multi-modal control involving velocity commands, style selection, and precise keyframes, a need largely unmet by existing text- or tag-driven models. To overcome these limitations, we introduce MotionBricks: a large-scale, real-time generative framework with a two-fold solution. First, we propose a large-scale modular latent generative backbone tailored for robust real-time motion generation, effectively modeling a dataset of over 350,000 motion clips with a single model. Second, we introduce smart primitives that provide a unified, robust, and intuitive interface for authoring both navigation and object interaction. Applications can be designed in a plug-and-play manner like assembling bricks without expert animation knowledge. Quantitatively, we show that MotionBricks produces state-of-the-art motion quality on open-source and proprietary datasets of various scales, while also achieving a real-time throughput of 15,000 FPS with 2ms latency. We demonstrate the flexibility and robustness of MotionBricks in a complete production-level animation demo, covering navigation and object-scene interaction across various styles with a unified model. To showcase our framework's application beyond animation, we deploy MotionBricks on the Unitree G1 humanoid robot to demonstrate its flexibility and generalization for real-time robotic control.
comment: ACM Transactions on Graphics; SIGGRAPH 2026. Project page: https://nvlabs.github.io/motionbricks/
SPEAR-1: Scaling Beyond Robot Demonstrations via 3D Understanding
Robotic Foundation Models (RFMs) hold great promise as generalist, end-to-end systems for robot control. Yet their ability to generalize across new environments, tasks, and embodiments remains limited. We argue that a major bottleneck lies in their foundations: most RFMs are built by fine-tuning internet-pretrained Vision-Language Models (VLMs). However, these VLMs are trained on 2D image-language tasks and lack the 3D spatial reasoning inherently required for embodied control in the 3D world. Bridging this gap directly with large-scale robotic data is costly and difficult to scale. Instead, we propose to enrich easy-to-collect non-robotic image data with 3D annotations and enhance a pretrained VLM with 3D understanding capabilities. Following this strategy, we train SPEAR-VLM, a 3D-aware VLM that infers object coordinates in 3D space from a single 2D image. Building on SPEAR-VLM, we introduce our main contribution, SPEAR-1: a robotic foundation model that integrates grounded 3D perception with language-instructed embodied control. Trained on $\sim$45M frames from 24 Open X-Embodiment datasets, SPEAR-1 outperforms or matches state-of-the-art models such as $π_0$-FAST and $π_{0.5}$, while it uses 20$\times$ fewer robot demonstrations. This carefully-engineered training strategy unlocks new VLM capabilities and as a consequence boosts the reliability of embodied control beyond what is achievable with only robotic data. We make our model weights and 3D-annotated datasets publicly available at https://spear.insait.ai.
Balancing Act: Trading Off Odometry and Map Registration for Efficient Lidar Localization
Most autonomous vehicles rely on accurate and efficient localization, which is achieved by comparing live sensor data to a preexisting map, to navigate their environment. Balancing the accuracy of localization with computational efficiency remains a significant challenge, as high-accuracy methods often come with higher computational costs. In this paper, we present two ways of improving lidar localization efficiency and study their impact on performance. First, we integrate two lightweight odometry estimators, a correspondence-free Doppler-inertial estimator and a low-cost wheel odometer-gyroscope (OG) method, into a topometric localization pipeline and compare them against a state-of-the-art (SOTA) iterative closest point (ICP) baseline. We highlight the trade-offs between these approaches: the Doppler and OG estimators offer faster, lightweight updates, while ICP provides higher accuracy at the cost of increased computational load. Second, by controlling the frequency of localization updates and leveraging odometry estimates between them, we demonstrate that accurate localization can be maintained while optimizing for computational efficiency using any of the presented methods. We evaluate these approaches using over 100 km of unique real-world driving data in different on-road environments. By varying the localization interval, we demonstrate that computational effort can be reduced by 27%, 80%, and 91% for the ICP, Doppler, and OG estimators, respectively, while maintaining SOTA accuracy.
comment: 8 pages
Complementarity by Construction: A Lie-Group Approach to Solving Quadratic Programs with Linear Complementarity Constraints
Many problems in robotics require reasoning over a mix of continuous dynamics and discrete events, such as making and breaking contact in manipulation and locomotion. These problems are locally well modeled by linear complementarity quadratic programs (LCQPs), an extension to QPs that introduces complementarity constraints. While very expressive, LCQPs are non-convex, and few solvers exist for computing good local solutions for use in planning pipelines. In this work, we observe that complementarity constraints form a Lie group under infinitesimal relaxation, and leverage this structure to perform on-manifold optimization. We introduce a retraction map that is numerically well behaved, and use it to parameterize the constraints so that they are satisfied by construction. The resulting solver avoids many of the classical issues with complementarity constraints. We provide an open-source solver, Marble, that is implemented in C++ with Julia and Python bindings. We demonstrate that Marble is competitive on a suite of benchmark problems, and solves a number of robotics problems where existing approaches fail to converge.
Learning Versatile Humanoid Manipulation with Touch Dreaming
Humanoid robots promise general-purpose assistance, yet real-world humanoid loco-manipulation remains challenging because it requires whole-body stability, end-effector dexterity, and contact-aware interaction under frequent contact changes. In this work, we study dexterous, contact-rich humanoid loco-manipulation. We first develop an RL-based lower-body controller that serves as the stability backbone for whole-body execution during complex manipulation. Built on this controller, we develop a VR-based whole-body humanoid data collection system that integrates dexterous hands and tactile sensing for contact-rich manipulation. We then propose Humanoid Transformer with Touch Dreaming (HTD), a multimodal encoder--decoder Transformer that models touch as a core modality alongside multi-view vision and proprioception. HTD is trained in a single stage with behavioral cloning augmented by touch dreaming: in addition to predicting action chunks, the policy predicts future hand-joint forces and future tactile latents, with tactile-latent targets provided by an exponential moving average target encoder without requiring a separate tactile pretraining stage. This encourages the policy to learn contact-aware representations for dexterous manipulation. Across five real-world contact-rich tasks, HTD achieves a 90.9% relative improvement in average success rate over the stronger baseline. Ablation results further show that latent-space tactile prediction is more effective than raw tactile prediction, yielding a 30% relative gain in success rate. These results demonstrate that our touch-dreaming-enhanced learning system enables versatile, high-dexterity humanoid manipulation in the real world. More information and open-source materials are available at: humanoid-touch-dream.github.io.
RoboECC: Multi-Factor-Aware Edge-Cloud Collaborative Deployment for VLA Models IJCNN 2026
Vision-Language-Action (VLA) models are mainstream in embodied intelligence but face high inference costs. Edge-Cloud Collaborative (ECC) deployment offers an effective fix by easing edge-device computing pressure to meet real-time needs. However, existing ECC frameworks are suboptimal for VLA models due to two challenges: (1) Diverse model structures hinder optimal ECC segmentation point identification; (2) Even if the optimal split point is determined, changes in network bandwidth can cause performance drift. To address these issues, we propose a novel ECC deployment framework for various VLA models, termed RoboECC. Specifically, we propose a model-hardware co-aware segmentation strategy to help find the optimal segmentation point for various VLA models. Moreover, we propose a network-aware deployment adjustment approach to adapt to the network fluctuations for maintaining optimal performance. Experiments demonstrate that RoboECC achieves a speedup of up to 3.28x with only 2.55%~2.62% overhead.
comment: This paper has been accepted by IJCNN 2026
KERV: Kinematic-Rectified Speculative Decoding for Embodied VLA Models
Vision-Language-Action (VLA) models build a token-domain robot control paradigm, yet suffer from low speed. Speculative Decoding (SD) is an optimization strategy that can boost inference speed. Two key issues emerge when integrating VLA and SD: first, SD relies on re-inference to address token errors, which is computationally expensive; second, to mitigate token errors, the acceptance threshold in SD requires careful adjustment. Existing works fail to address the above two issues effectively. Meanwhile, as the bridge between AI and the physical world, existing embodied intelligence has overlooked the application of robotic kinematics. To address these issues, we innovatively combine token-domain VLA models with kinematic-domain prediction for SD, proposing a kinematic-rectified SD framework named KERV. We employ a kinematics-based Kalman Filter to predict actions and compensate for SD errors, avoiding costly re-inference. Moreover, we design a kinematics-based adjustment strategy to dynamically rectify the acceptance threshold, addressing the difficulty of threshold determination. Experimental results across diverse tasks and environments demonstrate that KERV achieves 27%~37% acceleration with nearly no Success Rate loss.
comment: This paper has been accepted by DAC 2026
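The kinematic prediction step mentioned in the KERV abstract can be illustrated with a textbook filter. The following is a minimal sketch, not the authors' implementation: it assumes a single action dimension (for example one end-effector coordinate) that follows a constant-velocity model, and the class name `ConstantVelocityKF` and the noise values `q` and `r` are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConstantVelocityKF:
    """1D constant-velocity Kalman filter; the state is (position, velocity)."""
    pos: float = 0.0
    vel: float = 0.0
    # Entries of the symmetric 2x2 state covariance P.
    p00: float = 1.0
    p01: float = 0.0
    p11: float = 1.0
    q: float = 1e-4  # process noise added to each diagonal entry (assumed value)
    r: float = 1e-2  # measurement noise variance (assumed value)

    def predict(self, dt: float) -> float:
        """Propagate the state by dt and return the predicted position."""
        # x <- F x with F = [[1, dt], [0, 1]]
        self.pos += dt * self.vel
        # P <- F P F^T + Q
        p00 = self.p00 + 2.0 * dt * self.p01 + dt * dt * self.p11 + self.q
        p01 = self.p01 + dt * self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p11 = p00, p01, p11
        return self.pos

    def update(self, measured_pos: float) -> None:
        """Correct the state with a position measurement (H = [1, 0])."""
        s = self.p00 + self.r          # innovation variance
        k0 = self.p00 / s              # Kalman gain for position
        k1 = self.p01 / s              # Kalman gain for velocity
        residual = measured_pos - self.pos
        self.pos += k0 * residual
        self.vel += k1 * residual
        # P <- (I - K H) P
        p00 = (1.0 - k0) * self.p00
        p01 = (1.0 - k0) * self.p01
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p11 = p00, p01, p11
```

In a speculative-decoding loop, a filter like this could be updated with each accepted action and queried via `predict` when a drafted action is rejected, in place of a second pass through the large model. How KERV actually couples the filter to token acceptance is not specified in the abstract.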
HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models with Kinematic Awareness
Vision-Language-Action (VLA) Models have become the mainstream solution for robot control, but suffer from slow inference speeds. Speculative Decoding (SD) is a promising acceleration method which can be divided into two categories: drafter-based SD and retrieval-based SD. Each of the two methods demonstrates complementary advantages and limitations when applied to VLA models, leading to the hypothesis that a hybrid approach integrating these two methods will yield better performance. In this paper, we first conduct a series of detailed analyses to reveal the advantages and feasibility of hybrid utilization. However, even with the aforementioned key insights, implementing hybrid SD in VLA models presents several challenges: (1) draft rejection and persistent errors in retrieval-based SD; (2) difficulty in determining the hybrid boundary. To address these, we propose the HeiSD framework. We propose a retrieval-based SD optimization method in HeiSD, which contains a verify-skip mechanism and a sequence-wise relaxed acceptance strategy. Moreover, we propose a kinematic-based fused metric in HeiSD to automatically determine the hybrid boundary. Experimental results demonstrate that HeiSD attains a speedup of up to 2.45x in simulation benchmarks and 2.06x~2.41x in real-world scenarios, while sustaining a high task success rate.
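For readers unfamiliar with drafter-based SD, the basic draft-then-verify loop can be sketched as follows. This is a generic greedy variant with exact-match acceptance, written under the assumption that both models are deterministic next-token functions; it does not reproduce HeiSD's verify-skip mechanism, relaxed acceptance, or retrieval component, and the names `speculative_decode` and `greedy_decode` are illustrative.

```python
from typing import Callable, List

# A model maps a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(target: Model, prompt: List[int], num_new: int) -> List[int]:
    """Reference decoding that calls the target model once per token."""
    tokens = list(prompt)
    for _ in range(num_new):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(target: Model, drafter: Model, prompt: List[int],
                       num_new: int, draft_len: int = 4) -> List[int]:
    """Greedy draft-then-verify decoding; output equals greedy_decode."""
    tokens = list(prompt)
    goal = len(prompt) + num_new
    while len(tokens) < goal:
        # 1) The cheap drafter proposes a short continuation.
        draft: List[int] = []
        for _ in range(min(draft_len, goal - len(tokens))):
            draft.append(drafter(tokens + draft))
        # 2) The target checks each drafted position. A real system would
        #    score all positions in a single batched forward pass.
        accepted = 0
        correction = None
        for i, proposed in enumerate(draft):
            expected = target(tokens + draft[:i])
            if proposed != expected:
                correction = expected  # first mismatch: keep the target's token
                break
            accepted += 1
        tokens.extend(draft[:accepted])
        if correction is not None:
            tokens.append(correction)
    return tokens
```

Because every accepted token is one the target would have produced itself, the output is identical to target-only greedy decoding; any speedup comes from verifying several drafted tokens per target pass.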
Genie Sim PanoRecon: Fast Immersive Scene Generation from Single-View Panorama
We present Genie Sim PanoRecon, a feed-forward Gaussian-splatting pipeline that delivers high-fidelity, low-cost 3D scenes for robotic manipulation simulation. The panorama input is decomposed into six non-overlapping cube-map faces, processed in parallel, and seamlessly reassembled. To guarantee geometric consistency across views, we devise a depth-aware fusion strategy coupled with a training-free depth-injection module that steers the monocular feed-forward network to generate coherent 3D Gaussians. The whole system reconstructs photo-realistic scenes in seconds and has been integrated into Genie Sim - an LLM-driven simulation platform for embodied synthetic data generation and evaluation - to provide scalable backgrounds for manipulation tasks. For code details, please refer to: https://github.com/AgibotTech/genie_sim/tree/main/source/geniesim_world.
InCoM: Intent-Driven Perception and Structured Coordination for Mobile Manipulation
Mobile manipulation is a fundamental capability for general-purpose robotic agents, requiring both coordinated control of the mobile base and manipulator and robust perception under dynamically changing viewpoints. However, existing approaches face two key challenges: strong coupling between base and arm actions complicates control optimization, and perceptual attention is often poorly allocated as viewpoints shift during mobile manipulation. We propose InCoM, an intent-driven perception and structured coordination framework for mobile manipulation. InCoM infers latent motion intent to dynamically reweight multi-scale perceptual features, enabling stage-adaptive allocation of perceptual attention. To support robust cross-modal perception, InCoM further incorporates a geometric-semantic structured alignment mechanism that enhances multimodal correspondence. On the control side, we design a decoupled coordinated flow matching action decoder that explicitly models coordinated base-arm action generation, alleviating optimization difficulties caused by control coupling. Experimental results demonstrate that InCoM significantly outperforms state-of-the-art methods, achieving success rate gains of 28.2%, 26.1%, and 23.6% across three ManiSkill-HAB scenarios without privileged information. Furthermore, its effectiveness is consistently validated in real-world mobile manipulation tasks, where InCoM maintains a superior success rate over existing baselines.
The Swarm Intelligence Freeway-Urban Trajectories (SWIFTraj) Dataset -- Part II: A Graph-Based Approach for Trajectory Connection
In Part I of this companion paper series, we introduced SWIFTraj, a new open-source vehicle trajectory dataset collected using an unmanned aerial vehicle (UAV) swarm. The dataset has two distinctive features. First, by connecting trajectories across consecutive UAV videos, it provides long-distance continuous trajectories, with the longest exceeding 4.5 km. Second, it covers an integrated traffic network consisting of both freeways and their connected urban roads. Obtaining such long-distance continuous trajectories from a UAV swarm is challenging, due to the need for accurate time alignment across multiple videos and the irregular spatial distribution of UAVs. To address these challenges, this paper proposes a novel graph-based approach for connecting vehicle trajectories captured by a UAV swarm. An undirected graph is constructed to represent flexible UAV layouts, and an automatic time alignment method based on trajectory matching cost minimization is developed to estimate optimal time offsets across videos. To associate trajectories of the same vehicle observed in different videos, a vehicle matching table is established using the Hungarian algorithm. The proposed approach is evaluated using both simulated and real-world data. Results from real-world experiments show that the time alignment error is within three video frames, corresponding to approximately 0.1 s, and that the vehicle matching achieves an F1-score of about 0.99. These results demonstrate the effectiveness of the proposed method in addressing key challenges in UAV-based trajectory connection and highlight its potential for large-scale vehicle trajectory collection.
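The idea of estimating a time offset by minimizing a trajectory matching cost can be sketched on a toy problem. The example below assumes, purely for illustration, that each vehicle is summarized by the time it crosses a boundary shared by two videos, and that the offset is searched over a fixed grid. The brute-force search over permutations is a stand-in for the Hungarian algorithm named in the abstract and is only practical for a handful of vehicles; for realistic sizes a solver such as `scipy.optimize.linear_sum_assignment` would be the usual choice. The function names and the cost definition are hypothetical and not taken from the paper.

```python
from itertools import permutations
from typing import Iterable, Sequence, Tuple


def best_assignment(a: Sequence[float],
                    b: Sequence[float]) -> Tuple[float, Tuple[int, ...]]:
    """Minimum-cost one-to-one matching of a[i] to b[perm[i]].

    Brute force over permutations; requires len(a) <= len(b).
    """
    best_cost, best_perm = float("inf"), ()
    for perm in permutations(range(len(b)), len(a)):
        cost = sum(abs(a[i] - b[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm


def align_videos(times_a: Sequence[float], times_b: Sequence[float],
                 candidate_offsets: Iterable[float]
                 ) -> Tuple[float, Tuple[int, ...]]:
    """Return the offset (added to video B's clock) and the vehicle matching
    that together minimize the total crossing-time mismatch."""
    best_cost, best_offset, best_perm = float("inf"), 0.0, ()
    for offset in candidate_offsets:
        shifted = [t + offset for t in times_b]
        cost, perm = best_assignment(times_a, shifted)
        if cost < best_cost:
            best_cost, best_offset, best_perm = cost, offset, perm
    return best_offset, best_perm
```

The returned tuple maps vehicle `i` in video A to vehicle `perm[i]` in video B, which plays the role of a small vehicle matching table.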
One-Shot Real-World Demonstration Synthesis for Scalable Bimanual Manipulation
Learning dexterous bimanual manipulation policies critically depends on large-scale, high-quality demonstrations, yet current paradigms face inherent trade-offs: teleoperation provides physically grounded data but is prohibitively labor-intensive, while simulation-based synthesis scales efficiently but suffers from sim-to-real gaps. We present BiDemoSyn, a framework that synthesizes contact-rich, physically feasible bimanual demonstrations from a single real-world example. The key idea is to decompose tasks into invariant coordination blocks and variable, object-dependent adjustments, then adapt them through vision-guided alignment and lightweight trajectory optimization. This enables the generation of thousands of diverse and feasible demonstrations within several hours, without repeated teleoperation or reliance on imperfect simulation. Across six dual-arm tasks, we show that policies trained on BiDemoSyn data generalize robustly to novel object poses and shapes, significantly outperforming recent strong baselines. Beyond the one-shot setting, BiDemoSyn naturally extends to few-shot-based synthesis, improving object-level diversity and out-of-distribution generalization while maintaining strong data efficiency. Moreover, policies trained on BiDemoSyn data exhibit zero-shot cross-embodiment transfer to new robotic platforms, enabled by object-centric observations and a simplified 6-DoF end-effector action representation that decouples policies from embodiment-specific dynamics. By bridging the gap between efficiency and real-world fidelity, BiDemoSyn provides a scalable path toward practical imitation learning for complex bimanual manipulation without compromising physical grounding.
comment: accepted by RSS 2026. The project link is https://hnuzhy.github.io/projects/BiDemoSyn/
Muscle Coactivation in the Sky: Geometry and Pareto Optimality of Energy vs. Aerodynamic Promptness and Multirotors as Variable Stiffness Actuators
In robotics and biomechanics, trading metabolic cost for kinematic readiness is a well-established principle. This paper formalizes this concept for aerial multirotors through the introduction of aerodynamic promptness -- a dynamic metric analogous to dynamic manipulability in robotics. By formulating redundancy resolution as a geometric multi-objective optimization along task fibers, we rigorously characterize the topological trade-off between energy consumption and promptness. We demonstrate that this interplay is fundamentally governed by fiber geometry. The cooperative actuation regime yields compact fibers with bounded, compatible Pareto fronts. Conversely, the antagonistic actuation regime unlocks unbounded fibers, enabling aerodynamic co-contraction that drives promptness to hardware limits at the expense of flight endurance. We establish a structural isomorphism between aerodynamic co-contraction and biologically inspired variable stiffness actuators, introducing a dynamic ``flying muscle'' paradigm. Ultimately, this framework transitions multirotor allocation from heuristic energy minimization to principled, geometry-aware Pareto navigation, laying foundational theory for the design and control of highly agile aerial platforms.
comment: Accepted for IEEE ICUAS 2026
INHerit-SG: Incremental Hierarchical Semantic Scene Graphs with RAG-Style Retrieval
Driven by recent advancements in foundation models, semantic scene graphs have emerged as a promising paradigm for high-level 3D environmental abstraction in robot navigation. However, existing frameworks struggle to successfully handle complex embodied queries while ensuring continuous semantic graph construction. To address these limitations, we present INHerit-SG, an asynchronous dual-stream architecture that systematically structures the 3D environment into a RAG-ready knowledge base. Specifically, our framework integrates comprehensive node representations, an event-triggered asynchronous update scheme, and a structured retrieval mechanism. While geometric segmentation is decoupled from semantic reasoning to maintain mapping efficiency, the semantic nodes also store natural language summaries to support text-based retrieval. Furthermore, we propose an interpretable retrieval pipeline that couples the reasoning capabilities of multi-role LLMs with the topological structure of the scene graph, followed by a visual verification process to mitigate false positives. We evaluate INHerit-SG on a newly constructed benchmark for complex embodied semantic query retrieval, HM3DSem-SQR, and in real-world environments. Experiments demonstrate that our system achieves state-of-the-art performance on complex queries, especially for those involving negations and chained spatial constraints. Project Page: https://fangyuktung.github.io/INHeritSG.github.io/
World-Env: Leveraging World Model as a Virtual Environment for VLA Post-Training
Vision-Language-Action (VLA) models trained via imitation learning suffer from significant performance degradation in data-scarce scenarios due to their reliance on large-scale demonstration datasets. Although reinforcement learning (RL)-based post-training has proven effective in addressing data scarcity, its application to VLA models is hindered by the non-resettable nature of real-world environments. This limitation is particularly critical in high-risk domains such as industrial automation, where interactions often induce state changes that are costly or infeasible to revert. Furthermore, existing VLA approaches lack a reliable mechanism for detecting task completion, leading to redundant actions that reduce overall task success rates. To address these challenges, we propose World-Env, an RL-based post-training framework that replaces physical interaction with a low-cost world model-based virtual simulator. World-Env consists of two key components: (1) a physically-consistent world simulator that generates temporally consistent future visual observations, and (2) a vision-language model (VLM)-guided instant reflector that provides continuous reward signals and predicts action termination. This simulated environment enables VLA models to safely explore and generalize beyond their initial imitation learning distribution. Our method achieves notable performance gains with as few as five expert demonstrations per task. Experiments on complex robotic manipulation tasks demonstrate that World-Env effectively overcomes the data inefficiency, safety constraints, and inefficient execution of conventional VLA models that rely on real-world interaction, offering a practical and scalable solution for post-training in resource-constrained settings. Our code is available at https://github.com/amap-cvlab/world-env.
Learning Scene-Level Signed Directional Distance Function with Ellipsoidal Priors and Neural Residuals
Dense reconstruction and differentiable rendering are fundamental, tightly connected operations in 3D vision and computer graphics. Recent neural implicit representations demonstrate compelling advantages in reconstruction fidelity and differentiability over conventional discrete representations such as meshes, point clouds, and voxels. However, many neural implicit models, such as neural radiance fields (NeRF) and signed distance function (SDF) networks, are inefficient in rendering due to the need to perform multiple queries along each camera ray. Moreover, NeRF and Gaussian Splatting methods offer impressive photometric reconstruction but often require careful supervision to achieve accurate geometric reconstruction. To address these challenges, we propose a novel representation called signed directional distance function (SDDF). Unlike SDF and similar to NeRF, SDDF has a position and viewing direction as input. Like SDF and unlike NeRF, SDDF directly provides distance to the observed surface rather than integrating along the view ray. As a result, SDDF achieves accurate geometric reconstruction and efficient differentiable directional distance prediction. To learn and predict scene-level SDDF efficiently, we develop a differentiable hybrid representation that combines explicit ellipsoid priors and implicit neural residuals. This allows the model to handle distance discontinuities around obstacle boundaries effectively while preserving the ability for dense high-fidelity distance prediction. Through extensive evaluation against state-of-the-art representations, we show that SDDF achieves (i) competitive SDDF prediction accuracy, (ii) faster prediction speed than SDF and NeRF, and (iii) superior geometric consistency compared to NeRF and Gaussian Splatting.
Humanoid Whole-Body Badminton via Multi-Stage Reinforcement Learning
Humanoid robots have demonstrated strong capabilities for interacting with static scenes across locomotion and manipulation, yet dynamic real-world interactions remain challenging. As a step toward fast-moving object interactions, we present a reinforcement-learning training pipeline that yields a unified whole-body controller for humanoid badminton, coordinating footwork and striking without motion priors or expert demonstrations. Training follows a three-stage curriculum (footwork acquisition, precision-guided swing generation, and task-focused refinement) so legs and arms jointly serve the hitting objective. For deployment, we use an Extended Kalman Filter (EKF) to estimate and predict shuttlecock trajectories for target striking, and also develop a prediction-free variant that removes the EKF and explicit prediction. We validate the framework with five sets of experiments in simulation and on hardware. In simulation, two robots sustain a rally of 21 consecutive hits. In real-world tests with both machine-fed shuttles and human-robot rallies, the robot achieves outgoing shuttle speeds up to 19.1 m/s with a mean return landing distance of 4 m. Moreover, the prediction-free variant attains comparable performance to the EKF-based target-known policy. Overall, our approach enables dynamic yet precise goal striking in humanoid badminton and suggests a path toward more dynamics-critical whole-body interaction tasks.
comment: Project Page: https://humanoid-badminton.github.io/Humanoid-Whole-Body-Badminton-via-Multi-Stage-Reinforcement-Learning
SARM: Stage-Aware Reward Modeling for Long Horizon Robot Manipulation
Large-scale robot learning has made progress on complex manipulation tasks, yet long-horizon, contact-rich problems, especially those involving deformable objects, remain challenging due to inconsistent demonstration quality. We propose a stage-aware, video-based reward modeling framework that jointly predicts task stage and fine-grained progress, using natural language subtask annotations to derive consistent labels across variable-length demonstrations. This avoids the brittleness of frame-index-based labeling and provides stable supervision even in tasks like T-shirt folding. Our reward model is robust to demonstration variability, generalizes to out-of-distribution scenarios, and improves downstream policy training. Building on it, we introduce Reward-Aligned Behavior Cloning (RA-BC), which filters and reweights demonstrations based on reward estimates. Experiments show that our method significantly outperforms baselines in both real-world rollouts and human validation. On T-shirt folding, we achieve 83% success from the flattened state and 67% from the crumpled state, compared to 8% and 0% with vanilla BC. Overall, our results highlight reward modeling as a scalable and annotation-efficient solution for long-horizon robotic manipulation. Project website: https://qianzhong-chen.github.io/sarm.github.io/
Aegis: Automated Error Generation and Attribution for Multi-Agent Systems
Large language model based multi-agent systems (MAS) have unlocked significant advancements in tackling complex problems, but their increasing capability introduces a structural fragility that makes them difficult to debug. A key obstacle to improving their reliability is the severe scarcity of large-scale, diverse datasets for error attribution, as existing resources rely on costly and unscalable manual annotation. To address this bottleneck, we introduce Aegis, a novel framework for Automated error generation and attribution for multi-agent systems. Aegis constructs a large dataset of 9,533 trajectories with annotated faulty agents and error modes, covering diverse MAS architectures and task domains. This is achieved using an LLM-based manipulator that can adaptively inject context-aware errors into successful execution trajectories. Leveraging fine-grained labels and the structured arrangement of positive-negative sample pairs, Aegis supports three different learning paradigms: Supervised Fine-Tuning, Reinforcement Learning, and Contrastive Learning. We develop learning methods for each paradigm. Comprehensive experiments show that trained models consistently achieve substantial improvements in error attribution. Notably, several of our fine-tuned LLMs demonstrate performance competitive with or superior to proprietary models an order of magnitude larger, validating our automated data generation framework as a crucial resource for developing more robust and interpretable multi-agent systems. Our project website is available at https://kfq20.github.io/Aegis-Website/.
Introduction to Online Control
This text presents an introduction to an emerging paradigm in control of dynamical systems and differentiable reinforcement learning called online nonstochastic control. The new approach applies techniques from online convex optimization and convex relaxations to obtain new methods with provable guarantees for classical settings in optimal and robust control. The primary distinction between online nonstochastic control and other frameworks is the objective. In optimal control, robust control, and other control methodologies that assume stochastic noise, the goal is to perform comparably to an offline optimal strategy. In online nonstochastic control, both the cost functions as well as the perturbations from the assumed dynamical model are chosen by an adversary. Thus the optimal policy is not defined a priori. Rather, the target is to attain low regret against the best policy in hindsight from a benchmark class of policies. This objective suggests the use of the decision making framework of online convex optimization as an algorithmic methodology. The resulting methods are based on iterative mathematical optimization algorithms, and are accompanied by finite-time regret and computational complexity guarantees.
comment: Draft; comments/suggestions welcome at nonstochastic.control@gmail.com
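The regret objective described in the abstract above can be written out as follows. This is a sketch in standard notation: the linear dynamics and the symbols $A$, $B$, $x_t$, $u_t$, $w_t$, $c_t$, and $\Pi$ are assumptions for illustration and do not appear in the listing.

```latex
% Dynamics with adversarially chosen perturbations w_t (assumed linear here)
x_{t+1} = A x_t + B u_t + w_t

% Regret of a controller \mathcal{A} against the best policy in hindsight
% from a benchmark class \Pi, under adversarially chosen costs c_t
\mathrm{Regret}_T(\mathcal{A})
  = \sum_{t=1}^{T} c_t(x_t, u_t)
  - \min_{\pi \in \Pi} \sum_{t=1}^{T} c_t\bigl(x_t^{\pi}, u_t^{\pi}\bigr)
```

Here $(x_t^{\pi}, u_t^{\pi})$ denotes the trajectory that policy $\pi$ would have produced under the same perturbation sequence. "Low regret" means this quantity grows sublinearly in $T$.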
Multiagent Systems
FastOMOP: A Foundational Architecture for Reliable Agentic Real-World Evidence Generation on OMOP CDM data
The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM), maintained by the Observational Health Data Sciences and Informatics (OHDSI) collaboration, has enabled the harmonisation of electronic health records data of nearly one billion patients in 83 countries. Yet generating real-world evidence (RWE) from these repositories remains a manual process requiring clinical, epidemiological and technical expertise. LLMs and multi-agent systems have shown promise for clinical tasks, but RWE automation exposes a fundamental challenge: agentic systems introduce emergent behaviours, coordination failures and safety risks that existing approaches fail to govern. No infrastructure exists to ensure agentic RWE generation is flexible, safe and auditable across the lifecycle. We introduce FastOMOP, an open-source multi-agent architecture that addresses this gap by separating three infrastructure layers (governance, observability and orchestration) from pluggable agent teams. Governance is enforced at the process boundary through deterministic validation independent of agent reasoning, ensuring no compromised or hallucinating agent can bypass safety controls. Agent teams for phenotyping, study design and statistical analysis inherit these guarantees through controlled tool exposure. We validated FastOMOP using a natural-language-to-SQL agent team across three OMOP CDM datasets: synthetic data from Synthea, MIMIC-IV and a real-world NHS dataset from Lancashire Teaching Hospitals (IDRIL). FastOMOP achieved reliability scores of 0.84-0.94 with perfect adversarial and out-of-scope block rates, demonstrating that process-boundary governance delivers safety guarantees independent of model choice. These results indicate that the reliability gap in RWE deployment is a matter of architecture rather than model capability, and establish FastOMOP as a governed architecture for progressive RWE automation.
Agentic Witnessing: Pragmatic and Scalable TEE-Enabled Privacy-Preserving Auditing
Auditing the semantic properties of proprietary data creates a fundamental tension: verification requires transparent access, while proprietary rights demand confidentiality. While Zero-Knowledge Proofs (ZKPs) ensure privacy, they are typically limited to precise algebraic constraints and are ill-suited for verifying qualitative, unstructured properties, such as the logic within a codebase. We propose Agentic Witnessing, a framework that moves verification from attested execution to attested reasoning. The system is composed of three agents: a Verifier (who wants to check properties of a dataset), a Prover (who owns the dataset) and an Auditor (that inspects the dataset). The Verifier is allowed to ask the Auditor a limited number of simple binary true/false questions. By isolating an LLM-based Auditor within a Trusted Execution Environment (TEE), the system enables the Verifier to query a Prover's private data via simple Boolean queries, without exposing the raw dataset. The Auditor uses the Model Context Protocol (MCP) to dynamically inspect the target dataset, producing a yes/no verdict accompanied by a cryptographic transcript: a signed hash chain binding the reasoning trace to both the original dataset and the TEE's hardware root of trust. We demonstrate this architecture by automating the artifact evaluation process for 21 peer-reviewed computer science papers with released codebases on GitHub (e.g., "Does the codebase implement the system described in the paper?"). We verified five high-level properties of these codebases described in the corresponding publications, treating the source code as private. Our results show that TEE-enabled agentic auditing provides a mechanism for privacy-preserving oversight, effectively decoupling qualitative verification from the need for data disclosure.
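The "signed hash chain" transcript mentioned in the abstract above can be sketched roughly as follows. This is an illustrative assumption, not the paper's protocol: the function names are hypothetical, and an HMAC with a shared key stands in for a signature rooted in TEE hardware attestation, which a real deployment would use instead.

```python
import hashlib
import hmac


def build_transcript(dataset_digest: str, steps: list[str], signing_key: bytes) -> dict:
    """Chain each reasoning step to the previous link, then sign the final head."""
    # The first link is derived from the dataset digest, binding the chain to the data.
    head = hashlib.sha256(dataset_digest.encode()).hexdigest()
    links = []
    for step in steps:
        head = hashlib.sha256((head + step).encode()).hexdigest()
        links.append({"step": step, "hash": head})
    # HMAC is a stand-in (assumption) for a hardware-backed attestation signature.
    signature = hmac.new(signing_key, head.encode(), hashlib.sha256).hexdigest()
    return {"dataset_digest": dataset_digest, "links": links, "signature": signature}


def verify_transcript(transcript: dict, signing_key: bytes) -> bool:
    """Recompute the chain and check every link hash and the final signature."""
    head = hashlib.sha256(transcript["dataset_digest"].encode()).hexdigest()
    for link in transcript["links"]:
        head = hashlib.sha256((head + link["step"]).encode()).hexdigest()
        if head != link["hash"]:
            return False
    expected = hmac.new(signing_key, head.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, transcript["signature"])
```

Because each link hashes the previous head together with the current step, altering, reordering, or dropping any step changes every later hash and invalidates the signature over the final head.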
Rewarding the Scientific Process: Process-Level Reward Modeling for Agentic Data Analysis
Process Reward Models (PRMs) have achieved remarkable success in augmenting the reasoning capabilities of Large Language Models (LLMs) within static domains such as mathematics. However, their potential in dynamic data analysis tasks remains underexplored. In this work, we first present an empirical study revealing that general-domain PRMs struggle to supervise data analysis agents. Specifically, they fail to detect silent errors, logical flaws that yield incorrect results without triggering interpreter exceptions, and erroneously penalize exploratory actions, mistaking necessary trial-and-error exploration for grounding failures. To bridge this gap, we introduce DataPRM, a novel environment-aware generative process reward model that (1) can serve as an active verifier, autonomously interacting with the environment to probe intermediate execution states and uncover silent errors, and (2) employs a reflection-aware ternary reward strategy that distinguishes between correctable grounding errors and irrecoverable mistakes. We design a scalable pipeline to construct over 8K high-quality training instances for DataPRM via diversity-driven trajectory generation and knowledge-augmented step-level annotation. Experimental results demonstrate that DataPRM improves downstream policy LLMs by 7.21% on ScienceAgentBench and 11.28% on DABStep using Best-of-N inference. Notably, with only 4B parameters, DataPRM outperforms strong baselines, and exhibits robust generalizability across diverse Test-Time Scaling strategies. Furthermore, integrating DataPRM into Reinforcement Learning yields substantial gains over outcome-reward baselines, achieving 78.73% on DABench and 64.84% on TableBench, validating the effectiveness of process reward supervision. Code is available at https://github.com/zjunlp/DataMind.
comment: Work in progress
EPM-RL: Reinforcement Learning for On-Premise Product Mapping in E-Commerce
Product mapping, the task of deciding whether two e-commerce listings refer to the same product, is a core problem for price monitoring and channel visibility. In real marketplaces, however, sellers frequently inject promotional keywords, platform-specific tags, and bundle descriptions into titles, causing the same product to appear under many different names. Recent LLM-based and multi-agent frameworks improve robustness and interpretability on such hard cases, but they often rely on expensive external APIs, repeated retrieval, and complex inference-time orchestration, making large-scale deployment costly and difficult in privacy-sensitive enterprise settings. To address these issues, we present EPM-RL, a reinforcement-learning-based framework for building an accurate and efficient on-premise e-commerce product mapping model. Our central idea is to distill high-cost agentic reasoning into a trainable in-house model. Starting from a curated set of product pairs with LLM-generated rationales and human verification, we first perform parameter-efficient fine-tuning (PEFT) on a small student model using structured reasoning outputs. We then further optimize the model with Reinforcement Learning (RL) using an agent-based reward that jointly evaluates output-format compliance, label correctness, and reasoning-preference scores from specially designed judge models. Preliminary results show that EPM-RL consistently improves over PEFT-only training and offers a stronger quality-cost trade-off than commercial API-based baselines, while enabling private deployment and lower operational cost. These findings suggest that reinforcement learning can turn product mapping from a high-latency agentic pipeline into a scalable, inspectable, and production-ready in-house system.
comment: preprint
LLM-Guided Agentic Floor Plan Parsing for Accessible Indoor Navigation of Blind and Low-Vision People
Indoor navigation remains a critical accessibility challenge for blind and low-vision (BLV) individuals, as existing solutions rely on costly per-building infrastructure. We present an agentic framework that converts a single floor plan image into a structured, retrievable knowledge base to generate safe, accessible navigation instructions with lightweight infrastructure. The system has two phases: a multi-agent module that parses the floor plan into a spatial knowledge graph through a self-correcting pipeline with iterative retry loops and corrective feedback; and a Path Planner that generates accessible navigation instructions, with a Safety Evaluator agent assessing potential hazards along each route. We evaluate the system on the real-world UMBC Math and Psychology building (floors MP-1 and MP-3) and on the CVC-FP benchmark. On MP-1, we achieve success rates of 92.31%, 76.92%, and 61.54% for short, medium, and long routes, outperforming the strongest single-call baseline (Claude 3.7 Sonnet) at 84.62%, 69.23%, and 53.85%. On MP-3, we reach 76.92%, 61.54%, and 38.46%, compared to the best baseline at 61.54%, 46.15%, and 23.08%. These results show consistent gains over single-call LLM baselines and demonstrate that our workflow is a scalable solution for accessible indoor navigation for BLV individuals.
Asymmetric-Information Resource Allocation Games: An LP Approach to Purposeful Deception
In this work, we introduce the Deceptive Resource Allocation Game (DRAG), which studies purposeful deception within a Bayesian game framework. In DRAG, a Defender allocates resources across the true asset and several decoys to influence an Attacker's beliefs and actions, with the goal of diverting the Attacker away from the true asset. We seek to characterize purposeful deception, whereby the Defender deceives only when doing so improves its performance. To this end, we solve for the Perfect Bayesian Nash Equilibrium (PBNE) of the corresponding game. We show that, despite the coupled belief-policy interdependence, the problem admits an efficient, non-iterative linear programming formulation. Numerical results demonstrate that the resulting policies naturally balance effective allocation and belief manipulation, giving rise to purposeful and emergent deceptive behaviors.
Frontier Coding Agents Can Now Implement an AlphaZero Self-Play Machine Learning Pipeline For Connect Four That Performs Comparably to an External Solver
Forecasting when AI systems will become capable of meaningfully accelerating AI research is a central challenge for AI safety. Existing benchmarks measure broad capability growth, but may not provide ample early warning signals for recursive self-improvement. We propose measuring AI's capability to autonomously implement end-to-end machine learning pipelines from past AI research breakthroughs, given a minimal task description. By providing a concise task description instead of the full prior work as reference, we hope to better elicit emerging AI research taste. We introduce a proof-of-concept benchmark in which frontier coding agents autonomously implement an AlphaZero-style machine learning pipeline for Connect Four on consumer hardware within a three-hour budget, and we evaluate the resulting game AIs in a round-robin tournament anchored to the Pascal Pons Connect Four solver. Across four agents with eight trials each, we find substantial differentiation: Claude Opus 4.7 won as first-mover against Pons in seven of eight trials, statistically significantly better than the other agents tested, none of which exceeded two of eight. The task, which no frontier agent could reliably complete when we began development in January of 2026, is now near-saturation. Our evaluation also surfaced anomalous behavior in GPT-5.4, which consistently used far less of its allocated time budget than other agents. A follow-up 16-trial probe using shorter, less evaluation-coded prompts substantially increased GPT-5.4's time-budget usage, consistent with but not diagnostic of sandbagging; Bradley-Terry ratings across probe conditions showed only directional differences, despite significant differences in time-budget usage. We release our data, code, and prompts to support reproduction and extension.
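For readers unfamiliar with the Bradley-Terry ratings mentioned in the abstract above, the following minimal sketch fits Bradley-Terry strengths from pairwise win counts. The abstract does not state how the ratings were fitted, so the classic minorization-maximization (MM) iteration used here and the name `bradley_terry` are assumptions for illustration.

```python
def bradley_terry(wins: dict[tuple[str, str], int], iters: int = 200) -> dict[str, float]:
    """Fit Bradley-Terry strengths with MM updates.

    wins[(a, b)] is the number of games in which a beat b.
    Returns strengths normalized to sum to 1, so that
    P(a beats b) = strength[a] / (strength[a] + strength[b]).
    """
    players = sorted({p for pair in wins for p in pair})
    strength = {p: 1.0 for p in players}
    for _ in range(iters):
        updated = {}
        for i in players:
            total_wins = sum(wins.get((i, j), 0) for j in players if j != i)
            denom = sum(
                (wins.get((i, j), 0) + wins.get((j, i), 0)) / (strength[i] + strength[j])
                for j in players
                if j != i
            )
            # Keep the old value if player i has no recorded games.
            updated[i] = total_wins / denom if denom > 0 else strength[i]
        norm = sum(updated.values())
        strength = {p: v / norm for p, v in updated.items()}
    return strength


# Example: agent "a" beat agent "b" in 3 of their 4 games.
ratings = bradley_terry({("a", "b"): 3, ("b", "a"): 1})
```

With only two players the fitted strengths reduce to the empirical win rates. With more players the model pools evidence across all pairings, which is what makes it suitable for round-robin tournaments.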
MultiHedge: Adaptive Coordination via Retrieval-Augmented Control ICCS 2026
Decision-making under changing conditions remains a fundamental challenge in many real-world systems. Existing approaches often fail to generalize across shifting regimes and exhibit unstable behavior under uncertainty. This raises the research question: can retrieval-augmented LLM coordination improve the robustness of modular decision pipelines? We propose MultiHedge, a hybrid architecture where an LLM produces structured allocation decisions conditioned on retrieved historical precedents, and execution is grounded in canonical option strategies. In a controlled evaluation using U.S. equities, we compare MultiHedge to rule-based and learning-based baselines. The key result is that memory-augmented retrieval confers greater robustness and stability than increasing model scale alone. Our paper contributes a controlled computational study showing that memory and architectural design play a central role in robustness in modular decision systems.
comment: 8 pages, 2 figures. Accepted to the 26th International Conference on Computational Science (ICCS 2026), to appear in Springer LNCS proceedings
Co-Director: Agentic Generative Video Storytelling
While diffusion models generate high-fidelity video clips, transforming them into coherent storytelling engines remains challenging. Current agentic pipelines automate this via chained modules but suffer from semantic drift and cascading failures due to independent, handcrafted prompting. We present Co-Director, a hierarchical multi-agent framework formalizing video storytelling as a global optimization problem. To ensure semantic coherence, we introduce hierarchical parameterization: a multi-armed bandit globally identifies promising creative directions, while a local multimodal self-refinement loop mitigates identity drift and ensures sequence-level consistency. This balances the exploration of novel narrative strategies with the exploitation of effective creative configurations. For evaluation, we introduce GenAD-Bench, a 400-scenario dataset of fictional products for personalized advertising. Experiments demonstrate that Co-Director significantly outperforms state-of-the-art baselines, offering a principled approach that seamlessly generalizes to broader cinematic narratives. Project Page: https://co-director-agent.github.io/
comment: Project Page: https://co-director-agent.github.io/
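The exploration-exploitation balance that the abstract above assigns to a multi-armed bandit can be illustrated with a generic UCB1 selection rule. The abstract does not say which bandit algorithm Co-Director uses, so UCB1, the function name `ucb1_select`, and the reward interpretation are assumptions made for this sketch.

```python
import math


def ucb1_select(counts, mean_rewards, c=2.0):
    """Pick the next arm (e.g. a candidate creative direction) with UCB1.

    counts[i] is how often arm i has been tried, and mean_rewards[i] is its
    average observed reward so far. Untried arms are selected first.
    """
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    scores = [
        mean + math.sqrt(c * math.log(total) / n)
        for mean, n in zip(mean_rewards, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# A rarely tried arm can win despite a lower mean, due to its exploration bonus.
next_arm = ucb1_select(counts=[100, 1], mean_rewards=[0.5, 0.4])
```

The bonus term shrinks as an arm is tried more often, so selection gradually shifts from exploring new directions to exploiting the ones that have scored well.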
GAMMAF: A Common Framework for Graph-Based Anomaly Monitoring Benchmarking in LLM Multi-Agent Systems
The rapid integration of Large Language Models (LLMs) into Multi-Agent Systems (MAS) has significantly enhanced their collaborative problem-solving capabilities, but it has also expanded their attack surfaces, exposing them to vulnerabilities such as prompt infection and compromised inter-agent communication. While emerging graph-based anomaly detection methods show promise in protecting these networks, the field currently lacks a standardized, reproducible environment to train these models and evaluate their efficacy. To address this gap, we introduce Gammaf (Graph-based Anomaly Monitoring for LLM Multi-Agent systems Framework), an open-source benchmarking platform. Gammaf is not a novel defense mechanism itself, but rather a comprehensive evaluation architecture designed to generate synthetic multi-agent interaction datasets and benchmark the performance of existing and future defense models. The proposed framework operates through two interdependent pipelines: a Training Data Generation stage, which simulates debates across varied network topologies to capture interactions as robust attributed graphs, and a Defense System Benchmarking stage, which actively evaluates defense models by dynamically isolating flagged adversarial nodes during live inference rounds. Through rigorous evaluation using established defense baselines (XG-Guard and BlindGuard) across multiple knowledge tasks (such as MMLU-Pro and GSM8K), we demonstrate Gammaf's high utility, topological scalability, and execution efficiency. Furthermore, our experimental results reveal that equipping an LLM-MAS with effective attack remediation not only recovers system integrity but also substantially reduces overall operational costs by facilitating early consensus and cutting off the extensive token generation typical of adversarial agents.
ITAS: A Multi-Agent Architecture for LLM-Based Intelligent Tutoring
Large language model tutors are easy to build in a notebook and hard to run in a real course. We describe ITAS (Intelligent Teaching Assistant System), a multi-agent tutoring system that a graduate quantum computing course used for a semester at Old Dominion University. The system has three layers. The teaching layer is a Spoke-and-Wheel of three parallel specialist agents (Video, Code, Guidance) followed by a Synthesizer, plus a separate autograder that evaluates both the correctness and the approach of checkpoint submissions. The operational layer is four Cloud Run microservices with session state in Cloud SQL and interaction events streamed through Pub/Sub to BigQuery. The feedback layer is a narrow-scope conversational agent that answers instructor questions over per-lesson pseudonymized event streams, addressing what we call the Blind Instructor Problem: LLM tutors accumulate more data about students than the instructor can reach through routine channels. The architecture is a direct response to specific failures of an earlier prototype, and we describe which of those fixes carried forward and which were dropped for this iteration. We report on a pilot deployment (five students, one course, one semester) interpreted as system-behavior evidence rather than learning-outcome evidence: the teaching layer handled 334 chat turns without the task-boundary hallucinations that domain consolidation would have risked, the operational layer captured 10,628 events across five modules, and the feedback layer surfaced two findings the instructor acted on mid-semester. We do not claim the pilot generalizes. We do claim that the system as described is one workable answer to the question of what an LLM-based ITS needs to look like end-to-end to run in a real course.
comment: Companion papers: arXiv:Q-ID (Quantum deployment), arXiv:L-ID (Latency analysis)
From Prototype to Classroom: An Intelligent Tutoring System for Quantum Education
Quantum computing instructors face a compounding problem: the concepts are counterintuitive, the mathematical formalism is dense, and qualified faculty are scarce outside a small number of well-resourced institutions. Our prior work introduced a knowledge-graph-augmented tutoring prototype with two specialized LLM agents: a Teaching Agent for dynamic interaction and a Lesson Planning Agent for lesson generation. Validated on simulated runs rather than in a real course, that prototype left open whether more aggressive agent specialization would be needed to handle the full range of quantum education tasks under real student load. This paper answers the three questions that the prototype could not answer. Can agent specialization solve the reliability problem in a domain as technically demanding as quantum information science? Can the system run in a real course, not a demonstration? Does the instructor gain actionable intelligence from the deployment? We present ITAS (Intelligent Teaching Assistant System), a multi-agent tutoring system built around four contributions: a five-module QIS curriculum grounded in Watrous's information-first framework, a Spoke-and-Wheel teaching architecture with quantum-specialized agents, a cloud infrastructure designed for production use and regulatory compliance, and a conversational analytics layer for instructors and content developers. Piloted in a quantum computing course at Old Dominion University, the system supports all three answers: deployment evidence is consistent with specialization addressing the task-boundary failures observed in the prototype, cloud infrastructure supports classroom-scale concurrency at sub-textbook cost, and the analytics agent surfaces curriculum gaps the instructor could not otherwise see.
comment: 10 pages, 6 figures, 1 table. Submitted to IEEE QCE 2026. Companion papers (in preparation): ITAS architecture and latency analysis
Why Are We Moral? An LLM-based Agent Simulation Approach to Study Moral Evolution ACL 2026
The evolution of morality presents a puzzle: natural selection should favor self-interest, yet humans developed moral systems promoting altruism. Traditional approaches must abstract away cognitive processes, leaving open how cognitive factors shape moral evolution. We introduce an LLM-based agent simulation framework that brings cognitive realism to this question: agents with varying moral dispositions perceive, remember, reason, and decide in a simulated prehistoric hunter-gatherer society. This enables us to manipulate factors that traditional models cannot represent -- such as moral type observability and communication bandwidth -- and to discover emergent cognitive mechanisms from agent interactions. Across 20 runs spanning four settings, we find that cooperation and mutual help are the central driver of evolutionary survival, with universal and reciprocal morality exhibiting the most stable outcomes across conditions while selfishness is strongly disfavoured. Beyond cooperation itself, we further identify cognition as a central mediator -- most clearly through a cost of moral judgment that shifts the winning moral type across settings, with a self-purging effect among selfish agents as an additional cognitive pattern. We validate robustness across multiple LLM backbones, architecture ablations, and prompt sensitivity analyses. This work establishes LLM-based simulation as a powerful new paradigm to complement traditional research in evolutionary biology and anthropology, opening new avenues for investigating the complexities of moral and social evolution.
comment: Accepted at ACL 2026 Main Conference. 51 pages including appendix
Agentic Hives: Equilibrium, Indeterminacy, and Endogenous Cycles in Self-Organizing Multi-Agent Systems
Current multi-agent AI systems operate with a fixed number of agents whose roles are specified at design time. No formal theory governs when agents should be created, destroyed, or re-specialized at runtime, let alone how the population structure responds to changes in resources or objectives. We introduce the Agentic Hive, a framework in which a variable population of autonomous micro-agents, each equipped with a sandboxed execution environment and access to a language model, undergoes demographic dynamics: birth, duplication, specialization, and death. Agent families play the role of production sectors, compute and memory play the role of factors of production, and an orchestrator plays the dual role of Walrasian auctioneer and Global Workspace. Drawing on the multi-sector growth theory developed for dynamic general equilibrium (Benhabib & Nishimura, 1985; Venditti, 2005; Garnier, Nishimura & Venditti, 2013), we prove seven analytical results: (i) existence of a Hive Equilibrium via Brouwer's fixed-point theorem; (ii) Pareto optimality of the equilibrium allocation; (iii) multiplicity of equilibria under strategic complementarities between agent families; (iv)-(v) Stolper-Samuelson and Rybczynski analogs that predict how the Hive restructures in response to preference and resource shocks; (vi) Hopf bifurcation generating endogenous demographic cycles; and (vii) a sufficient condition for local asymptotic stability. The resulting regime diagram partitions the parameter space into regions of unique equilibrium, indeterminacy, endogenous cycles, and instability. Together with the comparative-statics matrices, it provides a formal governance toolkit that enables operators to predict and steer the demographic evolution of self-organizing multi-agent systems.
Toward a Safe Internet of Agents
Autonomous Artificial Intelligence (AI) agents, powered by Large Language Models (LLMs), advance rapidly toward interconnected systems -- an Internet of Agents (IoA). This vision enables complex problem-solving while introducing systemic safety and security risks. Beyond existing threat taxonomies, we provide a principled guide addressing architectural vulnerability sources. We offer a framework for engineering safe agentic systems through bottom-up deconstruction, analyzing each component as a dual-use interface where capability expansion creates attack surface growth. We examine three tiers: (1) Single Agents -- analyzing inherent risks in models, memory, design patterns, tools, and guardrails; (2) Multi-Agent Systems (MAS) -- examining collective behavior components including architectural patterns, communication mechanisms, verification, and system guardrails; and (3) Interoperable Multi-Agent Systems (IMAS) -- exploring four secure ecosystem pillars: standardized protocols, agent registration/discovery, resource vetting, and governance. Our analysis reveals a central principle: agentic safety must be co-designed with capability as a fundamental architectural property. We identify specific vulnerabilities at each level and derive core mitigation principles. The result is a foundational guide enabling developers and researchers to build not merely capable but safe, reliable agentic AI, contributing to secure IoA development.
comment: 44 pages
On the Surprising Effectiveness of a Single Global Merging in Decentralized Learning
Decentralized learning provides a scalable alternative to parameter-server-based training, yet its performance is often hindered by limited peer-to-peer communication. In this paper, we study how communication should be scheduled over time, including determining when and how frequently devices synchronize. Counterintuitive empirical results show that concentrating communication budgets in the later stages of decentralized training remarkably improves global test performance. Surprisingly, we uncover that fully connected communication at the final step, implemented by a single global merging, can significantly improve the performance of decentralized learning under high data heterogeneity. Our theoretical contributions, which explain these phenomena, are the first to establish that the globally merged model of decentralized SGD can match the convergence rate of parallel SGD. Technically, we reinterpret part of the discrepancy among local models, which were previously considered as detrimental noise, as constructive components essential for matching this rate. This work provides evidence that decentralized learning is able to generalize under high data heterogeneity and limited communication, while offering broad new avenues for model merging research.
comment: We discover and theoretically explain why and when a single global parameter merging in decentralized learning can recover the performance of federated learning, even in highly heterogeneous and communication-constrained environments
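The contrast between sparse peer-to-peer averaging and a single global merging at the final step, as described in the abstract above, can be sketched in a few lines. This is a toy illustration under assumptions: parameters are plain lists of floats, the neighbor-averaging rule is uniform, and the function names `gossip_step` and `global_merge` are hypothetical.

```python
def gossip_step(models, neighbors):
    """One round of local communication: each node averages its parameters
    with those of its listed neighbors only (limited peer-to-peer exchange)."""
    return [
        [
            sum(models[j][k] for j in [i] + neighbors[i]) / (1 + len(neighbors[i]))
            for k in range(len(models[i]))
        ]
        for i in range(len(models))
    ]


def global_merge(models):
    """Single global merging: uniform average of all local parameter vectors,
    equivalent to one round of fully connected communication."""
    n = len(models)
    return [sum(m[k] for m in models) / n for k in range(len(models[0]))]


# Three nodes on a line graph: local averaging leaves them disagreeing,
# while one global merge yields a single shared model.
local_models = [[0.0], [3.0], [6.0]]
after_gossip = gossip_step(local_models, {0: [1], 1: [0, 2], 2: [1]})
merged = global_merge(after_gossip)
```

In an actual training run, local SGD updates would be interleaved with the gossip rounds, and the merge would be applied once after the final step.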
The High Cost of Incivility: Quantifying Interaction Inefficiency via Multi-Agent Monte Carlo Simulations
Workplace toxicity is widely recognized as detrimental to organizational culture, yet quantifying its direct impact on operational efficiency remains methodologically challenging due to the ethical and practical difficulties of reproducing conflict in human subjects. This study leverages Large Language Model (LLM) based Multi-Agent Systems to simulate 1-on-1 adversarial debates, creating a controlled "sociological sandbox". We employ a Monte Carlo method to simulate hundreds of discussions, measuring the convergence time (defined as the number of arguments required to reach a conclusion) between a baseline control group and treatment groups involving agents with "toxic" system prompts. Our results demonstrate a statistically significant increase of approximately 25% in the duration of conversations involving toxic participants. We propose that this "latency of toxicity" serves as a proxy for financial damage in corporate and academic settings. Furthermore, we demonstrate that agent-based modeling provides a reproducible, ethical alternative to human-subject research for measuring the mechanics of social friction.
comment: 8 figures, 3 tables
Joint Optimization of Multi-agent Memory System
Memory systems are critical for LLMs, mitigating context window limitations and supporting long-horizon user-LLM interactions. Such systems typically comprise multiple agents responsible for memory construction and retrieval. Existing approaches often optimize each agent independently under a shared global objective (e.g., downstream QA accuracy), treating other agents as a static environment. However, this design has two key limitations: (1) independent optimization ignores inter-agent dependencies and lacks agents' co-adaptation, and (2) relying solely on sparse global rewards provides limited guidance for optimizing specialized agents and causes ambiguous credit assignment. These may ultimately limit agents' effective collaboration in the memory system. To address these limitations, we propose CoMAM, a joint optimization framework that promotes collaboration among agents via end-to-end reinforcement learning and an adaptive credit assignment mechanism. Specifically, we model the multi-agent pipeline as a Markov decision process (MDP) to expose inter-agent dependencies during end-to-end training. Agents are then jointly optimized using a combination of their local task reward and an adaptively weighted global reward, enabling agents to co-adapt while receiving targeted feedback for their respective roles. Experiments show that CoMAM consistently outperforms leading memory systems, validating the effectiveness of the joint optimization framework.
Food4All: A Multi-Agent Framework for Real-time Free Food Discovery with Integrated Nutritional Metadata
Food insecurity remains a persistent public health emergency in the United States, tightly interwoven with chronic disease, mental illness, and opioid misuse. Yet despite the existence of thousands of food banks and pantries, access remains fragmented: 1) current retrieval systems depend on static directories or generic search engines, which provide incomplete and geographically irrelevant results; 2) LLM-based chatbots offer only vague nutritional suggestions and fail to adapt to real-world constraints such as time, mobility, and transportation; and 3) existing food recommendation systems optimize for culinary diversity but overlook survival-critical needs of food-insecure populations, including immediate proximity, verified availability, and contextual barriers. These limitations risk leaving the most vulnerable individuals, those experiencing homelessness, addiction, or digital illiteracy, unable to access urgently needed resources. To address this, we introduce Food4All, the first multi-agent framework explicitly designed for real-time, context-aware free food retrieval. Food4All unifies three innovations: 1) heterogeneous data aggregation across official databases, community platforms, and social media to provide a continuously updated pool of food resources; 2) a lightweight reinforcement learning algorithm trained on curated cases to optimize for both geographic accessibility and nutritional correctness; and 3) an online feedback loop that dynamically adapts retrieval policies to evolving user needs. By bridging information acquisition, semantic analysis, and decision support, Food4All delivers nutritionally annotated guidance at the point of need. This framework establishes an urgent step toward scalable, equitable, and intelligent systems that directly support populations facing food insecurity and its compounding health risks.
comment: This paper is withdrawn because parts of the Method section are inconsistent with the actual implementation and code. Specifically, some components of the described multi-agent workflow and nutritional-metadata integration were not implemented as stated. We withdraw this version to avoid misleading readers
Aegis: Automated Error Generation and Attribution for Multi-Agent Systems
Large language model based multi-agent systems (MAS) have unlocked significant advancements in tackling complex problems, but their increasing capability introduces a structural fragility that makes them difficult to debug. A key obstacle to improving their reliability is the severe scarcity of large-scale, diverse datasets for error attribution, as existing resources rely on costly and unscalable manual annotation. To address this bottleneck, we introduce Aegis, a novel framework for Automated error generation and attribution for multi-agent systems. Aegis constructs a large dataset of 9,533 trajectories with annotated faulty agents and error modes, covering diverse MAS architectures and task domains. This is achieved using an LLM-based manipulator that can adaptively inject context-aware errors into successful execution trajectories. Leveraging fine-grained labels and the structured arrangement of positive-negative sample pairs, Aegis supports three different learning paradigms: Supervised Fine-Tuning, Reinforcement Learning, and Contrastive Learning. We develop learning methods for each paradigm. Comprehensive experiments show that trained models consistently achieve substantial improvements in error attribution. Notably, several of our fine-tuned LLMs demonstrate performance competitive with or superior to proprietary models an order of magnitude larger, validating our automated data generation framework as a crucial resource for developing more robust and interpretable multi-agent systems. Our project website is available at https://kfq20.github.io/Aegis-Website/.
Multi-User Large Language Model Agents
Large language models (LLMs) and LLM-based agents are increasingly deployed as assistants in planning and decision making, yet most existing systems are implicitly optimized for a single-principal interaction paradigm, in which the model is designed to satisfy the objectives of one dominant user whose instructions are treated as the sole source of authority and utility. However, as they are integrated into team workflows and organizational tools, they are increasingly required to serve multiple users simultaneously, each with distinct roles, preferences, and authority levels, leading to multi-user, multi-principal settings with unavoidable conflicts, information asymmetry, and privacy constraints. In this work, we present the first systematic study of multi-user LLM agents. We begin by formalizing multi-user interaction with LLM agents as a multi-principal decision problem, where a single agent must account for multiple users with potentially conflicting interests and associated challenges. We then introduce a unified multi-user interaction protocol and design three targeted stress-testing scenarios to evaluate current LLMs' capabilities in instruction following, privacy preservation, and coordination. Our results reveal systematic gaps: frontier LLMs frequently fail to maintain stable prioritization under conflicting user objectives, exhibit increasing privacy violations over multi-turn interactions, and suffer from efficiency bottlenecks when coordination requires iterative information gathering.
Hierarchical Reinforcement Learning for the Dynamic VNE with Alternatives Problem
Virtual Network Embedding (VNE) is a key enabler of network slicing, yet most formulations assume that each Virtual Network Request (VNR) has a fixed topology. Recently, VNE with Alternative topologies (VNEAP) was introduced to capture malleable VNRs, where each request can be instantiated using one of several functionally equivalent topologies that trade resources differently. While this flexibility enlarges the feasible space, it also introduces an additional decision layer, making dynamic embedding more challenging. This paper proposes HRL-VNEAP, a hierarchical reinforcement learning approach for VNEAP under dynamic arrivals. A high-level policy selects the most suitable alternative topology (or rejects the request), and a low-level policy embeds the chosen topology onto the substrate network. Experiments on realistic substrate topologies under multiple traffic loads show that naive exploitation strategies provide only modest gains, whereas HRL-VNEAP consistently achieves the best performance across all metrics. Compared to the strongest tested baselines, HRL-VNEAP improves acceptance ratio by up to 20.7%, total revenue by up to 36.2%, and revenue-over-cost by up to 22.1%. Finally, we benchmark against an MILP formulation on tractable instances to quantify the remaining gap to optimality and motivate future work on learning- and optimization-based VNEAP solutions.
comment: This paper was rejected from the conferences I submitted it to, and it turns out that it contains several errors; please review the MILP section
Systems and Control (EESS)
VEHRON: A Configuration-Driven BEV Simulation Framework for Subsystem-Level Studies
In practical early-stage battery-electric vehicle studies, analysis workflows may become fragmented across spreadsheets, notebooks, and project-specific scripts, making reuse, audit, and extension harder. VEHRON is an open-source Python framework for a deterministic, traceable workflow built around prescribed-speed longitudinal simulation of battery-electric vehicles using validated YAML configuration, packaged drive-cycle resources, interchangeable subsystem models, and auditable case outputs. VEHRON currently runs as a command-line workflow in which a vehicle definition and a testcase definition are combined to execute a simulation, emit a flat time series, and write a case package containing copied inputs, resolved configuration, summary metadata, and standard plots. Architecturally, VEHRON is organized around a small simulation engine, a shared state bus, a registry of model selections, schema-based configuration loading, and extension points for custom battery and HVAC models loaded from external Python files. VEHRON currently focuses on battery-electric longitudinal simulation with low-order battery, thermal, auxiliary-load, and HVAC models. This paper explains how VEHRON is structured, how it is used, which models it implements, and where its present limits lie. Source code is available at https://github.com/vehron-dev/vehron, with archived release metadata recorded under DOI https://doi.org/10.5281/zenodo.19820111.
comment: 12 pages, 3 figures, 5 tables; software paper
Data-Driven Privacy-Preserving Modeling and Frequency Regulation with Aggregated Electric Vehicles via Bilinear Hidden Markov Model
Vehicle-to-Grid (V2G) technology allows bidirectional power flow for real-time grid support, making electric vehicles (EVs) well-suited for ancillary services such as frequency regulation. However, existing methods for flexibility estimation and coordinating aggregated EVs often rely on individual EV traveling information (e.g., arrival/departure time) and/or characteristic parameters (e.g., charging efficiency, battery capacity) as well as real-time state-of-charge (SOC), which raises privacy concerns and faces data quality issues. To address these challenges, this paper proposes a data-driven, privacy-preserving modeling and control framework for frequency regulation using aggregated EVs. The proposed method can provide accurate estimation for power outputs and flexibility of aggregated EVs and carry out effective frequency regulation without any individual EV information. Simulation results validate the accuracy and effectiveness of the proposed method, which also outperforms the model-based and federated learning-based method under SOC data inaccuracies.
comment: 12 Pages, 10 figures, Accepted for publication in IEEE Transactions on Smart Grid
Exploiting Differential Flatness for Efficient Learning-based Model Predictive Control of Constrained Multi-Input Control Affine Systems
Learning-based control techniques use data from past trajectories to control systems with uncertain dynamics. However, learning-based controllers are often computationally inefficient, limiting their practicality. To address this limitation, we propose a learning-based controller that exploits differential flatness, a property of many robotic systems. Recent research on using flatness for learning-based control either is limited in that it (i) ignores input constraints, (ii) applies only to single-input systems, or (iii) is tailored to specific platforms. In contrast, our approach uses a system extension and block-diagonal cost formulation to control general multi-input, nonlinear, affine systems. Furthermore, it satisfies input and half-space flat state constraints and guarantees probabilistic Lyapunov decrease using only two sequential convex optimizations. We show that our approach performs similarly to, but is multiple times more efficient than, a Gaussian process model predictive controller in simulation, and achieves competitive tracking in real hardware experiments.
comment: Accepted for publication in 2026 European Control Conference
Reachability Analysis of the State Transition and State Covariance Matrices for an LTV System
In this paper, we study the reachability of two closely related matrices appearing in the analysis of linear time-varying (LTV) systems over a finite time interval, namely, its closed-loop state transition matrix via a state feedback control and its state covariance matrix starting from some given initial state covariance matrix. Under a mild assumption, we first characterize the set of closed-loop terminal state transition matrices reachable from the identity matrix using controls of the state feedback form. Then, we provide the set of terminal state covariance matrices reachable from any given positive definite initial state covariance matrix when the LTV system is not necessarily controllable. Both results are based on the solutions of corresponding matrix Riccati differential equations (RDE).
comment: 12 pages, 2 figures
Dual Control of Linear Systems from Bilinear Observations with Belief Space Model Predictive Control
We study finite-horizon quadratic control of linear systems with bilinear observations, in which the control input affects not only the state dynamics but also the partial observations of the state. In this setting, the separation principle can fail because control inputs influence the future quality of state estimates. State estimation requires an input-dependent Kalman filter whose gain and error covariance evolve as functions of the control inputs. To address this challenge, we propose a belief-space model predictive control ($\texttt{B-MPC}$) method that plans directly over both the estimated state and its error covariance. In particular, $\texttt{B-MPC}$ plans with a deterministic surrogate of the belief evolution defined by the input-dependent Kalman filter. Through numerical experiments in two synthetic settings, we show that $\texttt{B-MPC}$ can outperform both the separation-principle controller and its MPC variant in favorable regimes, and that these gains are accompanied by lower estimation covariance and more uncertainty-aware action choices.
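To make the input dependence concrete, here is a minimal scalar sketch of one Kalman variance step in which the observation gain depends on the control input. It is not the paper's implementation: the model, the function name `covariance_step`, and all parameter values are illustrative assumptions.

```python
def covariance_step(p, u, a=1.0, q=0.1, c0=0.0, c1=1.0, r=0.5):
    """One predict/update step of the error variance for a scalar system.

    Assumed (hypothetical) model, chosen only for illustration:
        x[t+1] = a * x[t] + b * u[t] + w[t],        w ~ N(0, q)
        y[t]   = (c0 + c1 * u[t]) * x[t] + v[t],    v ~ N(0, r)
    The input u enters the observation gain, so it changes the posterior variance.
    (The term b * u[t] shifts the mean only and does not appear here.)
    """
    p_pred = a * a * p + q                  # predict: variance grows by process noise
    c = c0 + c1 * u                         # input-dependent observation gain
    k = p_pred * c / (c * c * p_pred + r)   # Kalman gain for this input
    return (1.0 - k * c) * p_pred           # posterior variance after the update
```

With these assumed parameters, an input of zero makes the state unobservable for that step, so the variance only grows, whereas a larger input magnitude yields a smaller posterior variance. This is the coupling between control and estimation quality that a belief-space planner can exploit.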
Reduced-Order Data Assimilation for Thermospheric Density Using Physics-informed SINDyc Models
Accurate estimation of thermospheric mass density is a prerequisite for orbit prediction and space situational awareness, where the upper atmosphere responds nonlinearly to solar and geomagnetic forcing across several orders of magnitude. Physics-based general circulation models resolve this response but are computationally expensive, while empirical models run cheaply but lack a time-evolving atmospheric state. This work couples a data-driven reduced-order thermospheric model with a Kalman filter that assimilates in situ density observations. An autoregressive Sparse Identification of Nonlinear Dynamics with control (SINDy$_c$-AR) reduced-order model derived from the Thermosphere-Ionosphere-Electrodynamics General Circulation Model (TIE-GCM) captures the dominant modes of variability and their dependence on solar and geomagnetic drivers at a fraction of the parent model's cost. Density observations from CHAMP, GRACE, GRACE-FO, GOCE, and Swarm are assimilated across a range of orbital configurations and geomagnetic conditions, with a linear DMDc model evaluated as a reference. Assimilation reduces density estimation error relative to open-loop predictions, most visibly during geomagnetic storms and under single-satellite coverage. SINDy$_c$-AR and DMDc perform comparably on assimilated orbits; on withheld orbits, SINDy$_c$-AR is more accurate in the in-training scenarios while DMDc is better in the out-of-training 2024 Swarm-C case. Benchmarks against NRLMSIS 2.1 and HASDM (2000-2019, where available) show that empirical references can outperform the assimilated model far from the assimilated track, so results are framed as improvements over the open-loop forecast.
Hybrid A*-Based Reverse Path-Planning of a Vehicle with Trailer System
Reverse parking maneuvering of a vehicle with trailer system is a difficult task to complete for human drivers due to the multi-body nature of the system and the unintuitive controls required to orient the trailer properly. The problem is complicated by the presence of other vehicles that the trailer and its connected vehicle must avoid during the reverse parking maneuver. While path planning methods in reverse motion for vehicles with trailers exist, there is a lack of results that also offer collision avoidance as part of the algorithm. This paper hence proposes a modified Hybrid A*-based algorithm that can accommodate the vehicle-trailer system as well as collision avoidance considerations with the other vehicles and obstacles in the parking environment. One of the novelties of this proposed approach is its adaptability to the vehicle with trailer system, where limits of usable steering input that prevent the occurrence of jackknife incidents vary with respect to system configuration. The other contribution is the addition of the collision avoidance functionality which the standard Hybrid A* algorithm lacks. The method is developed and presented first, followed by simulation case studies to demonstrate the efficacy of the proposed approach.
Sliding Mode Control for Safe Trajectory Tracking with Moving Obstacles Avoidance: Experimental Validation on Planar Robots
This paper presents a unified control framework for robust trajectory tracking and moving obstacle avoidance applicable to a broad class of mobile robots. By formulating a generalized kinematic transformation, we convert diverse vehicle dynamics into a strict feedback form, facilitating the design of a Sliding Mode Control (SMC) strategy for precise and robust reference tracking. To ensure operational safety in dynamic environments, the tracking controller is integrated with a Collision Cone Control Barrier Function (C3BF) based safety filter. The proposed architecture guarantees asymptotic tracking in the presence of external disturbances while strictly enforcing collision avoidance constraints. The novelty of this work lies in designing a sliding mode controller for ground robots like the Ackermann drive, which has not been done before. The efficacy and versatility of the approach are validated through numerical simulations and extensive real-world experiments on three distinct platforms: an Ackermann-steered vehicle, a differential drive robot, and a quadrotor drone. Videos of the experiments are available at https://youtu.be/dWcxwum96vk
TARMM: Scaling Delay-Critical Edge AI Offloading in 5G O-RAN via Temporal Graph Mobility Management
Emerging delay-critical edge AI applications, such as VR perception and real-time video analytics, impose stringent latency and reliability requirements on 5G networks. However, existing mobility management mechanisms are largely reactive and fail to adapt to dynamic network conditions, resulting in suboptimal handover decisions and degraded performance. In this paper, we present TARMM, a 5G Open Radio Access Network (O-RAN) system that optimizes user mobility management for delay-critical edge AI offloading. The core of TARMM is a temporal graph model that captures the spatiotemporal dynamics of the RAN across users and cells, enabling near real-time handover decisions. Building on this representation, we design a multi-agent reinforcement learning (MARL) framework with rule-based action masking and proactive resource preparation to ensure safe, stable, and efficient handovers. We implement TARMM on a multi-cell indoor 5G O-RAN testbed and evaluate it using diverse VR workloads. Extensive experiments show that TARMM reduces tail latency by up to 44% and packet loss by up to 56% compared to state-of-the-art approaches.
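Rule-based action masking is commonly implemented by forcing the probability of disallowed actions to zero before sampling. The following is a minimal sketch of that general technique, not TARMM's actual rules or code: the function name `masked_softmax` and the example mask are hypothetical.

```python
import math


def masked_softmax(logits, allowed):
    """Softmax over candidate actions with disallowed actions forced to probability 0.

    logits  : list of floats, one score per candidate action (e.g. target cell)
    allowed : list of bools, False where a rule forbids the action (hypothetical rules)
    """
    if not any(allowed):
        raise ValueError("at least one action must be allowed")
    # Subtract the largest allowed logit for numerical stability.
    top = max(x for x, ok in zip(logits, allowed) if ok)
    exps = [math.exp(x - top) if ok else 0.0 for x, ok in zip(logits, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


# Example: three candidate cells; the third is ruled out by an assumed safety rule.
probs = masked_softmax([2.0, 1.0, 3.0], [True, True, False])
```

Because the masked action receives exactly zero probability, the policy can never sample it, regardless of how high its learned score is.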
The Fragility of Learning LQG Controllers
Learning methods are increasingly used to synthesize controllers from data, yet existing sample-complexity characterizations for continuous control are sharp only in the fully observed setting. This paper studies the partially observed case by deriving information-theoretic lower bounds for learning Linear Quadratic Gaussian (LQG) controllers from offline trajectories generated by a (linear) exploration policy. We prove an $\varepsilon$-local minimax excess-cost lower bound that applies to any algorithm mapping the offline dataset to a stabilizing linear controller. The bound is expressed in terms of the Hessian of the LQG cost with respect to model parameters and the inverse Fisher Information induced by the exploration policy. We further provide system-theoretic characterizations of these objects, enabling transparent construction of hard instances. Instantiating the bound on classical fragile robust-control examples, including variants of the Doyle LQG fragility counterexample and non-minimum-phase systems, demonstrates when fragile robust control problems translate into high sample complexity for learning-enabled control. These results suggest the asymptotic optimality of certainty-equivalent synthesis and motivate the importance of both task-directed experiment design and system co-design for sample-efficient learning in partially observed control.
Minimum Reachability Probabilities in Rectangular Automata with Random Clocks SP
Control applications for cyber-physical systems must make reliably safe control decisions in the presence of continuous dynamics as well as stochastic uncertainty. Providing safety guarantees for such systems requires formal modeling and analysis techniques that capture these aspects. For modeling, in this paper we consider rectangular automata with random clocks under prophetic scheduling. For this model class, existing methods can compute only upper bounds on reachability probabilities, enabling optimistic, best-case safety reasoning. We complement this view by introducing a novel method to compute lower bounds, thereby enabling worst-case analysis that is essential for safety-critical applications. Although both upper and lower bounds rely on reachability analysis, they are not dual: computing lower bounds requires an explicit separation of stochastic and nondeterministic choices along executions. We implement our approach and demonstrate its practical feasibility on an electric vehicle charging scenario, showing that meaningful worst-case guarantees can be obtained.
comment: This paper is accepted for publication (without appendix) in the Proceedings of the 32nd International Symposium on Model Checking Software (SPIN 2026). The appendix was part of the submission and provides additional material which is not included in the SPIN publication
A Realistic Discrete Event Simulation model for Ambulance Location and Deployment within a regional Emergency Medical Service
The objective of Emergency Medical Services (EMSs) is to promptly respond to calls from citizens for first aid, providing pre-hospital care and, if necessary, to transfer patients to an appropriate Emergency Department (ED) by ambulance. The efficiency of such a system strongly depends on the deployment of ambulance home bases, i.e., locations where ambulances and their crews are strategically positioned, ready to respond to emergency calls. This paper presents a general Discrete Event Simulation (DES) model designed to capture the stochastic behaviour and workflow of regional ambulance emergency systems. The proposed model incorporates and integrates information collected from different sources, reproducing very accurately the operation of the ambulance system, thus allowing a more comprehensive and realistic analysis. To show the applicability and reliability of the proposed general model, a case study provided by the Azienda Regionale Emergenza Sanitaria - ARES 118 (an Italian Regional Emergency Medical Services Authority) is presented. It concerns a territory within the Lazio region of Italy, including a medium-size city along with sparsely populated areas. The reported results about scenario analyses highlight how the model we propose can be fruitfully used by the managers to improve effectiveness and quickness of the entire regional EMS system.
comment: 41 pages, 12 figures
Data-Driven Adaptive Resource Allocation for Reliable Low-Latency Uplink Communications in Rural Cellular 5G Multi-Connectivity
Reliable low-latency communication is a key requirement for mission-critical and mobile autonomous systems, including teleoperation, autonomous navigation, and real-time uplink-dominant telemetry applications. While commercial 5G networks often provide adequate downlink performance, uplink performance in rural deployments may be constrained by radio-resource limitations and uplink power-control mechanisms. This paper presents a comprehensive experimental evaluation of multi-connectivity strategies over commercial 5G Non-Standalone networks, based on measurement campaigns conducted in urban, suburban, and rural environments. The study analyzes per-packet uplink and downlink latency, packet loss, and radio-layer KPIs across two mobile network operators. The measurements indicate that latency and reliability cannot be inferred solely from coverage indicators such as RSRP. In coverage-constrained scenarios, performance appears to be strongly influenced by uplink power-limited operation and partially correlated impairments across operators. Several multi-connectivity strategies are evaluated, including link aggregation, switching-based policies, and conditional packet duplication. A Primary-Anchored Adaptive Failover (PAAF) framework is introduced to selectively activate redundancy based on radio, latency and service cost considerations. The results suggest that Partial Duplication (PD) approaches can approach the reliability of multi-connectivity while substantially reducing duplication overhead in the evaluated rural scenario.
AI-Native Autonomous Infrastructure (ANAI): A Formal Framework for the Next General-Purpose Technology
Artificial intelligence is increasingly described as a candidate next generation general purpose technology (GPT). However, existing interpretations predominantly emphasize performance scaling rather than structural transformation. This paper introduces a formal framework for evaluating AI as a systemic infrastructural transition rather than merely a computational breakthrough. We propose the concept of AI Native Autonomous Infrastructure (ANAI), defined as a regime in which decision autonomy becomes embedded within critical infrastructures. The framework operationalizes this transition through three quantitative constructs: the Autonomy Index (AIx), the Infrastructure Coupling Coefficient (ICC), and the Technological Transition Potential (TTP). We formalize the joint scaling dynamics of autonomy and infrastructural embedding, derive threshold conditions for paradigm transition, and introduce a phase-space representation of systemic transformation. A temporal transition model further illustrates how nonlinear coevolution between autonomy and infrastructure integration produces super linear growth in transition potential. Unlike prior GPT cycles, the ANAI regime exhibits a recursive energy computation feedback loop in which AI systems both increase computational demand and optimize the infrastructures that sustain them. This feedback mechanism accelerates infrastructural embedding and differentiates AI driven transformation from previous technological revolutions. By shifting analytical focus from model performance to infrastructural autonomy and coupling intensity, this study offers a conceptual and mathematical foundation for assessing whether artificial intelligence constitutes the next general purpose technology.
comment: 18 pages, 4 figures
On Realization of Back-Action-Evading Measurements and Quantum Non-Demolition Variables via Linear Systems Engineering
We establish a framework for realizing back-action-evading (BAE) measurements and quantum non-demolition (QND) variables in linear quantum systems. The key condition, a purely imaginary Hamiltonian with a real or imaginary coupling operator, enables BAE measurements of conjugate observables. Symmetric coupling further yields QND variables. For non-compliant systems, coherent feedback is designed to engineer BAE measurements. Crucially, the QND interaction condition simultaneously ensures BAE measurements and promotes the coupling operator to a QND observable.
comment: 12 pages, 2 figures, comments welcome
Toward Low-Altitude Embodied Intelligence: A Sensing-Communication-Computation-Control Closed-Loop Perspective
The rapid growth of the low-altitude economy drives increasingly autonomous unmanned aerial vehicle (UAV) operations, giving rise to low-altitude embodied intelligence (LAEI), in which sensing, communication, computation, and control (SC$^3$) are tightly integrated to enable closed-loop interaction, ensuring timely, effective, and safe responses in complex or unknown environments. This article systematically explores LAEI networks, from their fundamental architecture to the diverse scenarios that they can support. We examine key enabling techniques that sustain timely information exchange and effective decision feedback within the SC$^3$ closed loop. A representative low-altitude UAV mission in an unknown urban area is presented as a case study, where the UAV provides communication services and performs environmental sensing to inform closed-loop control, illustrating how coordinated SC$^3$ capabilities enable efficient and responsive operation. By identifying major challenges and outlining future research directions, this work serves as a cornerstone for developing next-generation low-altitude intelligent systems.
Graph Neural Ordinary Differential Equations for Power System Identification
With the shift towards decentralized energy generation, the increasing complexity of power systems renders physics-based modeling challenging. At the same time, the growing amount of available measurement data opens the door for obtaining models in a data-driven manner. A modern method for doing so is neural ordinary differential equations (NODEs), which offer a framework for continuous-time system identification. Recent extensions, so-called graph NODEs, impose a structural inductive bias that has the potential to improve generalization of the learned representation. In this work, we employ graph NODEs and extend them with novel ideas to develop message-passing graph NODEs (MPG-NODEs) for identification of coupled systems with heterogeneous node dynamics and edge couplings. This encompasses state-of-the-art machine learning architectures to infer latent representations of unmeasured states from past measurements, local node and edge embeddings to account for heterogeneity, as well as an autoregressive scheme to allow for piecewise constant control inputs. We apply MPG-NODEs to identify voltage and frequency dynamics of power systems and compare them to a monolithic NODE under identical measurement assumptions. Our case study on the IEEE 9-bus system indicates that the proposed MPG-NODE offers a much more flexible framework with transfer learning options that allow power lines and units to be added or removed with little to no retraining.
comment: This work has been submitted to the IEEE for possible publication
Optimal Planning and Control under Signal Temporal Logic Specifications
This paper addresses the planning and control problem for nonlinear systems under Signal Temporal Logic (STL) specifications. We first decompose an STL task into finite local tasks. A sampling-based method generates sequences of local waypoints to satisfy all local tasks, from which the corresponding satisfaction pair sets are derived. Following a local-to-global strategy, all sequences of local waypoints are synthesized into a global one, based on which a safe corridor is then constructed. Leveraging the safe corridor and the satisfaction pair sets, an optimization problem is formulated and solved to derive a position trajectory that satisfies the STL task. Finally, numerical examples and comparative results are presented to demonstrate the efficacy of the proposed approach.
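Whether a sampled trajectory satisfies an STL task can be checked with the standard quantitative (robustness) semantics, where "always" takes the worst margin over a window and "eventually" takes the best. The sketch below illustrates that textbook semantics on a one-dimensional position trace. It is not the paper's decomposition or optimization method, and the specification and numbers are made up for illustration.

```python
def always(margins):
    """Robustness of G(phi): the worst-case margin over the window."""
    return min(margins)


def eventually(margins):
    """Robustness of F(phi): the best-case margin over the window."""
    return max(margins)


def conjunction(*robustness_values):
    """Robustness of a conjunction: the smallest robustness among the conjuncts."""
    return min(robustness_values)


# Hypothetical 1-D position samples and specification:
#   always (x < 3.0)  AND  eventually (x > 1.5)
trajectory = [0.0, 0.5, 1.2, 2.0]
rho = conjunction(
    always([3.0 - x for x in trajectory]),      # margin of the predicate x < 3.0
    eventually([x - 1.5 for x in trajectory]),  # margin of the predicate x > 1.5
)
satisfied = rho > 0.0  # a positive robustness value means the trace satisfies the spec
```

A positive value also indicates how much the trajectory could be perturbed before the specification is violated, which is why robustness is often used as a constraint or objective in trajectory optimization.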
Fed-DLoRA: Efficient Wireless Federated Learning with Dynamic Low-Rank Adaptation
Federated learning (FL) offers a promising distributed learning paradigm for internet of vehicles (IoV) applications. However, it faces challenges from communication overhead and dynamic environments. Model compression techniques reduce computing and communication burden yet create trade-offs between compression ratios and vehicle participation strategies. In this paper, we propose a lightweight FL algorithm named federated learning with dynamic low-rank adaptation (Fed-DLoRA), which is combined with low-rank adaptation (LoRA) to effectively reduce parameters and communication costs while enhancing training efficiency. The convergence analysis of Fed-DLoRA is conducted through stochastic gradient descent optimization coupled with singular value decomposition. This analysis establishes the theoretical relationships among LoRA rank, vehicular scheduling strategies and the model's convergence characteristics. Building on these insights, we formulate a joint optimization problem aimed at maximizing system performance. To address this problem, we propose an adaptive rank, bandwidth and vehicle selection (ARBVS) algorithm that integrates enumeration with greedy optimization strategies. The algorithm provides efficient rank selection and resource scheduling strategies for each FL communication round, thereby achieving effective performance improvements for the FL system. Experimental results demonstrate that Fed-DLoRA achieves superior performance compared to conventional federated learning approaches, exhibiting enhanced accuracy, faster convergence, and improved communication efficiency.
comment: 11 pages, 7 figures. Accepted for publication in IEEE Transactions on Vehicular Technology
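The link between LoRA rank and communication cost follows from simple parameter counting: a rank-r update to a d_out x d_in weight matrix is stored as two factors with r * (d_out + d_in) entries in total. The sketch below shows only this standard count. The function names and layer sizes are illustrative assumptions, and it does not reproduce the ARBVS algorithm.

```python
def lora_param_count(d_out, d_in, rank):
    """Entries in a rank-`rank` LoRA update B @ A, with B: d_out x rank and A: rank x d_in."""
    return rank * (d_out + d_in)


def compression_ratio(d_out, d_in, rank):
    """How many times smaller the LoRA factors are than the full d_out x d_in matrix."""
    return (d_out * d_in) / lora_param_count(d_out, d_in, rank)


# Illustrative layer size (an assumption, not taken from the paper).
full = 1024 * 1024
for rank in (4, 8, 16):
    print(rank, lora_param_count(1024, 1024, rank), compression_ratio(1024, 1024, rank))
```

Since the payload grows linearly in the rank, choosing the rank per round trades upload size against the expressiveness of the update, which is the trade-off the abstract's joint rank and scheduling optimization addresses.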
A Road-Mobile GNSS-Disciplined Oscillator for Accurate Synchronization of Vehicular Microwave Measurements
Precise synchronization is essential in various technical disciplines, being especially challenging in mobile scenarios. Unfortunately, state-of-the-art global navigation satellite system (GNSS) disciplined oscillators (GNSSDOs) are designed and optimized for stationary operation. We present a novel solution that is optimized for mobile use from the ground up. The centerpiece is a precise oven-controlled crystal oscillator (OCXO) that is optimized for low sensitivity to dynamic accelerations. A state-of-the-art GNSS timing module is used to discipline it. We evaluate the system by comparing it with state-of-the-art test equipment in a real-world test drive through diverse environments. After compensating for the stationary offset, the state-of-the-art devices deviated by up to 2315 ns, while with our devices, the deviation never exceeded 22.6 ns. It is evident that the devices designed for laboratory use perform inadequately in mobile operation and that our novel solution enables a significant leap in accuracy.
Analysis of Stability and Performance of Economic Model Predictive Control with State-Independent Costs
This paper studies economic model predictive control (EMPC) schemes in which the stage cost depends only on control inputs. Such problems arise in applications like water distribution networks and differ from standard EMPC since multiple steady states can correspond to the unique optimal steady input. We show that, under a strict dissipativity assumption related to the set of optimal steady states, the closed-loop trajectories converge asymptotically to this set, ensuring convergence of the economic cost to the optimal steady-state cost. To enhance Lyapunov stability, we propose a modified stage cost that preserves the optimal input while guaranteeing asymptotic stability of a specific equilibrium with a slight performance loss. The approach is further extended to EMPC of a class of linear systems with periodic costs and disturbances by lifting it to a multi-step EMPC problem for periodic operations. A case study with a water distribution network demonstrates the effectiveness of the proposed methods in achieving both asymptotic convergence and stability.
Safe Reconnection Time for Large-Scale Data Center Loads: An Analytical Framework for Transient Stability Assessment
The rapid growth of large, power-electronics-rich data center (DC) loads is creating new operational challenges for bulk power systems. A key risk arises when a DC uninterruptible power supply (UPS) disconnects the facility during voltage/frequency disturbances and then reconnects it while the bulk grid is still dynamically settling to a new equilibrium point. Poorly timed reconnection can amplify electromechanical oscillations, deepen frequency deviations, and lead to repeated connect-disconnect flapping. In this paper, we develop an analytical framework to characterize the safe reconnection time for large DC loads after a disturbance-induced disconnection that avoids flapping. Using a model in the spirit of the classical single-machine infinite-bus system, we capture (i) swing dynamics during the disconnection interval and (ii) voltage-angle coupling at the load bus, which determines the electrical power step at reconnection under constant-power load assumptions. Using the energy function method, we characterize the critical safe reconnection time such that, for any reconnection after it, the post-reconnection trajectory is guaranteed to remain within operational limits (frequency/angle/voltage) and converge to the post-reconnection equilibrium, thereby preventing flapping. Time-domain simulations validate the effectiveness of the proposed analytical approach. The results provide a simple, physics-informed criterion that can be used to bound reconnection windows for large DC facilities and inform UPS reconnection logic.
comment: 7 pages, 5 figures, accepted by PowerUp 2026
Extracting Exact Lie Derivatives Without Backpropagation: A Dual Compiler for Neural Control Barrier Functions
Deploying neural-network control barrier functions (CBFs) on embedded hardware requires evaluating the barrier value and its Lie derivatives along the system vector fields at every control cycle. The standard mechanism for exact gradient extraction, reverse-mode automatic differentiation, constructs a dynamic computational graph whose memory footprint grows with network depth and whose backward traversal obstructs the worst-case execution time analysis required for safety-critical certification. This paper presents a dual-algebraic compiler that extracts the exact barrier value and its Lie derivatives through forward network evaluation alone. Encoding the system state as the real part of a dual number and a target vector field as the dual part, we prove that every affine and componentwise-activation layer admits a dual extension that propagates the exact directional derivative alongside the activation, and that the composed dual-extended network evaluates the exact Jacobian--vector-field product with zero truncation error. We derive closed-form expressions for the dual-pass floating-point operation count and peak memory footprint, prove that the proposed algorithm eliminates dynamic graph allocation, and extend the framework to the second-order Lie derivatives required by relative-degree-two CBFs using hyper-dual arithmetic. An open-source ahead-of-time compiler translates trained neural CBFs into self-contained C++ headers that assemble the complete safety constraint on an ESP32-S3 microcontroller from a statically allocated buffer, with zero dynamic memory allocation and a sub-millisecond cycle budget that supports kilohertz-rate safety filters.
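The dual-number idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' compiler: it assumes a tiny one-hidden-layer tanh network, and the weights, the state, and the vector field value are all hypothetical. The state is seeded as the real part and the vector field as the dual part, so a single forward pass returns both h(x) and the Lie derivative L_f h(x) = ∇h(x)·f(x).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    re: float  # real part: the ordinary value
    du: float  # dual part: directional derivative along the seeded field

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.re + other.re, self.du + other.du)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule, using eps * eps = 0.
        return Dual(self.re * other.re, self.re * other.du + self.du * other.re)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as duals with zero dual part."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def dual_tanh(z):
    """Dual extension of tanh: value and derivative in one evaluation."""
    t = math.tanh(z.re)
    return Dual(t, (1.0 - t * t) * z.du)


def affine(W, b, x):
    """Dual extension of the affine layer W x + b."""
    return [sum((w * xi for w, xi in zip(row, x)), Dual(bi, 0.0))
            for row, bi in zip(W, b)]


def barrier_and_lie(x, f_x, W1, b1, w2, b2):
    """Return (h(x), L_f h(x)) from one forward pass, with no backward sweep."""
    seed = [Dual(xi, fi) for xi, fi in zip(x, f_x)]
    hidden = [dual_tanh(z) for z in affine(W1, b1, seed)]
    out = affine([w2], [b2], hidden)[0]
    return out.re, out.du


# Hypothetical network and operating point, for illustration only.
W1 = [[0.5, -1.0], [1.5, 0.25]]
b1 = [0.1, -0.2]
w2 = [2.0, -1.0]
b2 = 0.3
x = [0.4, -0.7]
f_x = [x[1], -x[0]]  # example vector field f(x) = (x2, -x1)

h_val, lie = barrier_and_lie(x, f_x, W1, b1, w2, b2)
```

Because every operation touches only a fixed number of scalars per neuron, the same structure maps naturally onto a statically sized buffer, which is the property that matters for embedded targets.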
VLM-VPI: A Vision-Language Reasoning Framework for Improving Automated Vehicle-Pedestrian Interactions
Autonomous driving systems often infer pedestrian yielding behavior from geometric and kinematic cues alone, limiting their ability to reason about visual scene context and age-dependent behavioral variability. This limitation can produce delayed interventions in safety-critical encounters and unnecessary braking in benign interactions. This work introduces Vision-Language Model-based Vehicle-Pedestrian Interaction (VLM-VPI), a multimodal reasoning framework for pedestrian intent understanding and yielding-aware control in autonomous driving. The system combines three components: a multimodal perception layer that captures visual and kinematic observations, a reasoning layer that uses Qwen3-VL 8B for visual scene understanding and GPT-OSS 20B for few-shot intent reasoning, and a tiered safety controller that applies age-specific braking margins for children, adults, and seniors. In 112 CARLA scenarios, VLM-VPI achieves 92.3% intent classification accuracy, outperforming a rule-based baseline (78.4%), supervised trajectory models (73.5-82.4%), and a zero-shot LLM configuration (88.4%). Validation on 24 real-world PIE scenarios yields 87.5% accuracy, indicating functional sim-to-real transferability. Across 200 simulation cases, VLM-VPI reduces the false-alarm rate from 7.4% to 2.8% and mean intersection traversal time from 13.5 s to 11.8 s. Conflict occurrences decrease from 124 to 33, while mean minimum time-to-collision improves from 1.92 s to 4.47 s. Demographic-adaptive control further reduces conflicts by 60% for children and 54.5% for seniors compared with uniform control. These results show that an explicit vision-language reasoning layer can improve both safety and efficiency by linking pedestrian intent, demographic context, and vehicle control decisions.
comment: 40 pages, 7 figures, 10 tables
Asymmetric-Information Resource Allocation Games: An LP Approach to Purposeful Deception
In this work, we introduce the Deceptive Resource Allocation Game (DRAG), which studies purposeful deception within a Bayesian game framework. In DRAG, a Defender allocates resources across the true asset and several decoys to influence an Attacker's beliefs and actions, with the goal of diverting the Attacker away from the true asset. We seek to characterize purposeful deception, whereby the Defender deceives only when doing so improves its performance. To this end, we solve for the Perfect Bayesian Nash Equilibrium (PBNE) of the corresponding game. We show that, despite the coupled belief-policy interdependence, the problem admits an efficient, non-iterative linear programming formulation. Numerical results demonstrate that the resulting policies naturally balance effective allocation and belief manipulation, giving rise to purposeful and emergent deceptive behaviors.
Spark Policy Toolkit: Semantic Contracts and Scalable Execution for Policy Learning in Spark
Custom policy-learning pipelines in Spark fail for two coupled systems reasons: rowwise Python execution makes inference impractical, and driver-side candidate materialization makes split search fragile at feature scale. We present Spark Policy Toolkit, a semantics-governed systems toolkit for scalable policy learning in Spark. The toolkit provides two Spark-native primitives: partition-initialized vectorized inference through mapInPandas and mapInArrow, and collect-less split search that scores candidates on executors. Both primitives are governed by one fixed-input semantic contract: the same rows, feature order, treatment vocabulary, preprocessing manifest, and split boundaries must preserve per-row score vectors, best-split decisions, and end-to-end learned policy outputs. The evaluation combines practical baseline ladders, backend parity checks, measured split-search scale results, synthetic and Hillstrom end-to-end policy preservation, missingness stress, partition and order perturbation tests, quantile-boundary sensitivity, and a concrete adversarial failure catalog. On a 40-worker Databricks cluster, mapInArrow reaches 4.72M rows/s at 10M matched rows and 7.23M rows/s at 50M rows, while collect-less split search remains valid from F = 10 through F = 1000 with 124000 candidate rows, where the driver-collect baseline is intentionally skipped. Across 24 backend-ablation settings, mapInArrow wins 18 while mapInPandas wins 6, so the paper treats backend choice as workload-dependent rather than universal. Once the fixed-input lock is enforced, all six tested repartition/coalesce/shuffle perturbations preserve identical signatures; before lock, all six drift. The central result is not speed alone: throughput and collect-less execution are the mechanisms that let policy semantics survive at Spark scale.
A Novel Two-Step Approach for Reactive Power Demand Calculation Using Integrated Voltage Stability Analysis
The assessment of reactive power demand plays an instrumental role in power system planning. This paper presents a methodology for calculating reactive power demand based on a two-step approach. Unlike existing methodologies in the literature that focus primarily on optimization of reactive power compensation equipment placement and sizing through single-simulation approaches, this methodology directly calculates the actual reactive power demand through a comprehensive back-to-back simulation framework. While existing methods address either long-term or short-term voltage stability using either steady-state analysis or individual dynamic simulations, the proposed approach integrates both stability assessments sequentially through iterative Quasi-Dynamic Simulation, Q-V analysis and dynamic simulation. Furthermore, this methodology employs comprehensive time-series analysis over a full annual period (8760 hours) with multi-criteria violation assessment (number, severity and duration of voltage violations). In the final section of this paper, a case study was conducted to demonstrate the application of the proposed methodology. Simulations were performed to validate the effectiveness of the methodology, with the results showing that all buses with voltage issues were successfully addressed and finally the total reactive power demand across the network was calculated.
EVT-Based Generative AI for Tail-Aware Channel Estimation
Ultra-reliable and low-latency communication (URLLC) will play a key role in fifth-generation (5G) and beyond networks, enabling mission-critical applications. Meeting the stringent URLLC requirements, characterized by extremely low packet error rates and minimal latency, calls for advanced statistical modeling to accurately capture rare events in wireless channels. Traditional methods, such as those that rely on large datasets and computationally intensive estimation techniques, often fail in real-time scenarios. In this paper, a novel framework is proposed to meet URLLC requirements through a synergistic integration of extreme value theory (EVT) with generative artificial intelligence (AI). EVT is used to model channel tail distributions, providing an accurate characterization of rare events. Concurrently, generative AI enables data augmentation and channel parameter estimation from limited samples. The integration of EVT with generative AI can thus help overcome the limitations of generative models in capturing extreme events during channel characterization. Using an experimental dataset collected from an automotive environment, it is demonstrated that this integration enhances data augmentation for extreme quantiles, while requiring fewer samples than traditional analytical EVT methods and generative baselines in online estimation of channel distribution.
A Class AAA Solar Testbed for Reproducible Long-Term Characterization of Energy-Harvesting Systems
Energy harvesting promises maintenance-free operation of wireless sensor nodes but introduces strong dependencies on stochastic and deployment-specific environmental conditions. In particular, solar-powered systems are highly sensitive to variations in irradiance and spectral composition, which complicates system-level design, parameter tuning, and reliable verification. This work presents a solar testbed in which active control via Hardware-in-the-Loop (HIL) enables stable and repeatable illumination conditions for evaluating ultra-low-power energy harvesting systems. The proposed LED-based solar testbed provides spectrally configurable illumination over a wide dynamic range, from 5.7 mW/m2 to 908 kW/m2. It achieves Class AAA performance according to IEC 60904-9, with a spectral match below 1.3% and a spatial non-uniformity below 1.28% over a 16.5 cm x 16.5 cm test area. The long-term irradiance instability remains below 0.6%. Closed-loop control using integrated illuminance and spectral sensors ensures high temporal stability, while a temperature-controlled DUT stage supports long-term experiments. Experimental results demonstrate high repeatability and suitability for systematic laboratory characterization of solar energy harvesting systems.
Interval Observer Design Using Observability Decomposition for Detectable Linear Systems
We provide a systematic interval observer design method for detectable linear time-invariant (LTI) systems, where a part of the state is observable from the measured output. An observability-based invertible LTI transformation decomposes the state into two parts. The first part is decoupled from the other and observable from the output, while the second is affected by the first, does not appear in the output, but is detectable. A Sylvester-based LTI interval observer is designed for the first part. For the second part, a Jordan-based linear time-varying interval observer is built, treating the interaction from the first part as inputs with known bounds. The intervals in the original coordinates are constructed either by inverting the decomposition online for the intervals in the transformed coordinates or by directly implementing the observer written in the original coordinates. Academic examples illustrate the interest of our approach.
comment: 23rd IFAC World Congress, Busan, South Korea, Aug. 2026
TEACar: An Open-Source Autonomous Driving Platform
Intelligent Transportation Systems (ITS) increasingly rely on vision-based perception and learning-based control, necessitating experimental platforms that support realistic hardware-in-the-loop validation. Small-scale platforms for autonomous racing offer a practical path to hardware validation, but often suffer from limited modularity, high integration complexity, or restricted extensibility. This paper presents TEACAR, a 1/14- to 1/16-scale autonomous driving platform designed with modular mechanical architecture, hardware abstraction, and ROS 2-based software. The system adopts a four-layer deck structure that physically decouples sensing, computation, actuation, and power subsystems, improving structural rigidity while simplifying reconfiguration. We constructed and comprehensively evaluated the prototype of TEACAR. Its mechanical stability, structural characteristics, and software performance were quantified based on three CNN-based steering controllers. Inference latency, power consumption, and system operating time were measured to evaluate computational capability and robustness. Our experiments demonstrated that TEACAR offers a scalable, modular, and cost-effective testbed for ITS research, education, and development. Our project repository is available on GitHub.
An analysis of sensor selection for fruit picking with suction-based grippers IROS
Robotic fruit harvesting often fails to reliably detect whether a fruit has been successfully picked, limiting efficiency and increasing crop damage. This problem is difficult due to compliant fruit and grippers, variable stem attachment, and occlusions in orchard environments. Prior work has explored vision-based perception and multi-sensor learning approaches for pick state estimation. However, minimal sensor sets and phase-dependent sensing strategies for accurate pick and slip detection remain largely unexplored. In this work, we design and evaluate a multimodal sensing suite integrated into a compliant suction-based apple gripper. Our approach is unique because it identifies which sensors are most informative at different phases of the pick, enabling predictive detection of failures before they occur. The contributions of this paper are a phase-dependent evaluation of multimodal sensors and the identification of minimal sensor sets for reliable pick state classification. Experiments in a real apple orchard show that Random Forest and Multilayer Perceptron classifiers detect successful picks and impending failures with over 90% accuracy, and Random Forest predicts pick/slip events within 0.09 s of human-annotated ground truth.
comment: IROS Conference Format, 6 pages, 6 figures, 1 table
A Quantitative Framework for Navigating Controller Design Tradeoffs under Computational Constraints
Computational constraints permeate the controller design process, and yet are rarely treated as explicit design constraints. Towards addressing this gap, we propose a quantitative framework that captures the effects of common design approximations, such as model order reduction, temporal discretization, horizon truncation, and solver accuracy, on both controller performance and computational requirements. Our framework highlights that these approximations are tunable parameters within an overall controller design process. By leveraging incremental input-to-state stability, we show that bounding the aggregate effects of these approximations reduces to verifying a design-dependent sector bound on the difference between the deployed policy and an idealized baseline, with stability enforced via a small-gain condition. We operationalize these insights via a Design Meta-Problem in which the performance gap is minimized subject to stability, real-time compute, and timing constraints. Finally, we instantiate the framework on a receding horizon LQR case study, and demonstrate a principled near-optimal navigation of tradeoffs among sampling rate, model order, horizon length, and solver iterations.
comment: 34 pages, 16 figures. Extended version
VISION-SLS: Safe Perception-Based Control from Learned Visual Representations via System Level Synthesis
We propose VISION-SLS, a method for nonlinear output-feedback control from high-resolution RGB images which provides robust constraint satisfaction guarantees under calibrated uncertainty bounds despite partial observability, sensor noise, and nonlinear dynamics. To enable scalability while retaining guarantees, we propose: (i) a learned low-dimensional observation map from pretrained visual features with state-dependent error bounds, and (ii) a causal affine time-varying output-feedback policy optimized via System Level Synthesis (SLS). We develop a scalable, novel solver for the resulting nonconvex program that leverages sequential convex programming coupled with efficient Riccati recursions. On two simulated visuomotor tasks (a 4D car and a 10D quadrotor) with >= 512 x 512 pixels and a 59D humanoid task with partial observability, our method enables safe, information-gathering behavior that reduces uncertainty while guaranteeing constraint satisfaction with empirically-calibrated error bounds. We also validate our method on hardware, safely controlling a ground vehicle from onboard images, outperforming baselines in safety rate and solve times. Together, these results show that learned visual abstractions coupled with an efficient solver make SLS-based safe visuomotor output-feedback practical at scale. The code implementation of our method is available at https://github.com/trustworthyrobotics/VISION-SLS.
comment: Extended version; conference version to appear in Robotics: Science and Systems XXII (RSS 2026)
Using Dynamic Safety Margins as Control Barrier Functions
This paper presents an approach to design control barrier functions (CBFs) for arbitrary state and input constraints using tools from the reference governor literature. In particular, it is shown that dynamic safety margins (DSMs) are CBFs for an augmented system obtained by concatenating the state with a virtual reference. The proposed approach is agnostic to the relative degree and can handle multiple state and input constraints using the control-sharing property of CBFs. The construction of CBFs using Lyapunov-based DSMs is then investigated in further detail. Numerical simulations show that the method outperforms existing DSM-based approaches, while also guaranteeing safety and persistent feasibility of the associated optimization program.
comment: 12 pages, 5 figures, 2 tables
PolyOCP.jl -- A Julia Package for Stochastic OCPs and MPC
The consideration of stochastic uncertainty in optimal and predictive control is a well-explored topic. Recently, Polynomial Chaos Expansions (PCE) have received considerable attention for problems involving stochastically uncertain system parameters and also for problems with additive stochastic i.i.d. disturbances. While a number of open-source PCE toolboxes exist, tailored open-source codes in Julia for solving OCPs with additive stochastic i.i.d. disturbances are not available. Hence, this paper introduces the toolbox PolyOCP.jl, which enables the efficient solution of stochastic OCPs for linear systems subject to a large class of disturbance distributions. We explain the main mathematical concepts behind the PCE transcription of stochastic OCPs and how they are provided in the toolbox. We draw upon two examples to illustrate the functionalities of PolyOCP.jl.
ModelPredictiveControl.jl: advanced process control made easy in Julia
Proprietary closed-source software is still the norm in advanced process control. Transparency and reproducibility are key aspects of scientific research. Free and open-source toolkits can contribute to the development, sharing and advancement of new and efficient control approaches, and the industrial sector will certainly benefit from them. This paper presents ModelPredictiveControl.jl, an open-source software package for designing model predictive controllers in the Julia programming language. It is designed to be easy to use and modular, while providing advanced features like nonlinear control and moving horizon estimation. It relies on powerful control system, mathematical optimization and automatic differentiation frameworks to simplify the construction and testing of state estimators and predictive controllers. It also integrates with the standard plotting library to quickly visualize closed-loop data. The paper presents the main functionalities and illustrates them with two case studies in simulation. The first example is a continuously stirred tank reactor described by linear dynamics. The second one implements nonlinear, economic, and successive linearization model predictive controllers for an inverted pendulum. The solving times are benchmarked against equivalent implementations in MATLAB to show the efficiency of the package.
comment: 11 pages, 12 figures, 1 table
Accelerated Recovery with RIS: Designing Wireless Resilience in Mission-Critical Environments
As 6G and beyond redefine connectivity, wireless networks become the foundation of critical operations, making resilience more essential than ever. With this shift, wireless systems can not only take on vital services previously handled by wired infrastructures but also enable novel applications that would not be possible with wired systems. As a result, there is a pressing demand for strategies that can adapt to dynamic channel conditions, interference, and unforeseen disruptions, ensuring seamless and reliable performance in an increasingly complex environment. Despite considerable research, existing resilience assessments lack comprehensive key performance indicators (KPIs), especially those quantifying adaptability, which are vital for identifying a system's capacity to rapidly adapt and reallocate resources. In this work, we bridge this gap by proposing a novel framework that explicitly quantifies the adaptation performance by augmenting the gradient of the system's rate function. To further enhance the network resilience, we integrate Reconfigurable Intelligent Surfaces (RISs) into our framework due to their capability to dynamically reshape the propagation environment while providing alternative channel paths. Numerical results show that gradient augmentation enhances resilience by improving adaptability under adverse conditions while proactively preparing for future disruptions.
comment: 6 pages, 3 figures, submitted to Asilomar 2026
A hybrid dynamic model and parameter estimation method for accurately simulating overhead cranes with friction
This paper presents a new approach to accurately simulating 3D overhead cranes with friction. Although nonlinear friction dynamics has a significant impact on these systems, accurately modeling this phenomenon in simulations is a significant challenge. Traditional methods often rely on imprecise approximations of friction or require excessive computational times for reliable results. To address this, we present a hybrid dynamical model that features a trade-off between high-fidelity friction modeling and computational efficiency. Furthermore, we present a step-by-step algorithm for the comprehensive estimation of all unknown system parameters, including friction. This methodology is based on Bayesian Linear Regression and Least Squares (LS) estimations. Finally, experimental validation with a laboratory crane confirms the effectiveness of the proposed modeling and estimation approach.
comment: 10 pages, 12 figures. Major changes in all the sections
Beyond the Bellman Fixed Point: Geometry and Fast Policy Identification in Value Iteration
Q-value iteration (Q-VI) is usually analyzed through the \(γ\)-contraction of the Bellman operator. This argument proves convergence to \(Q^*\), but it gives only a coarse account of when the induced greedy policy becomes optimal. We study discounted Q-VI as a switching system and focus on the practically optimal solution set (POSS), the set of \(Q\)-functions whose tie-broken greedy policies are optimal. The main result shows that Q-VI reaches the optimal action class in finite time by entering an invariant tube around \(\mathcal X_1=Q^*+\operatorname{span}(\mathbf 1)\), which is contained in the POSS. For every \(\varepsilon>0\), the distance to \(\mathcal X_1\) satisfies an exponential bound with rate \((\barρ+\varepsilon)^k\), where \(\barρ\) is the joint spectral radius of the projected switching family restricted to directions transverse to \(\mathcal X_1\). When \(\barρ<γ\), this transverse convergence is faster than the classical contraction rate. The analysis separates fast policy identification from the subsequent convergence to \(Q^*\), which may still be governed by the all-ones mode. We also give spectral and graph-theoretic conditions under which the strict inequality \(\barρ<γ\) holds or fails.
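The gap between policy identification and value convergence can be seen on a toy problem. The sketch below runs plain Q-value iteration on a hypothetical two-state, two-action deterministic MDP (not taken from the paper) and records the first iteration at which the tie-broken greedy policy matches the optimal one. It does not compute the joint spectral radius or the POSS; it only illustrates the phenomenon the abstract describes.

```python
GAMMA = 0.9
# Hypothetical deterministic MDP: action a moves the agent to state a.
REWARD = [[1.0, 0.0],   # rewards in state 0 for actions 0, 1
          [0.0, 2.0]]   # rewards in state 1 for actions 0, 1
NEXT_STATE = [[0, 1],
              [0, 1]]


def bellman_update(q):
    """One synchronous Q-VI step: Q(s,a) <- r(s,a) + gamma * max_a' Q(s',a')."""
    v = [max(row) for row in q]
    return [[REWARD[s][a] + GAMMA * v[NEXT_STATE[s][a]] for a in range(2)]
            for s in range(2)]


def greedy(q):
    """Greedy policy; ties are broken toward the lowest action index."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


def sup_dist(q1, q2):
    """Sup-norm distance between two Q-tables."""
    return max(abs(a - b) for r1, r2 in zip(q1, q2) for a, b in zip(r1, r2))


def run(num_iters=500):
    """Return the list of iterates Q_0, Q_1, ..., Q_num_iters starting from zero."""
    q = [[0.0, 0.0], [0.0, 0.0]]
    history = [q]
    for _ in range(num_iters):
        q = bellman_update(q)
        history.append(q)
    return history


history = run()
q_star = history[-1]                 # numerically converged iterate
policy_star = greedy(q_star)
first_k = next(k for k, q in enumerate(history) if greedy(q) == policy_star)
error_at_first_k = sup_dist(history[first_k], q_star)

if __name__ == "__main__":
    print("optimal greedy policy:", policy_star)
    print("first iteration with optimal greedy policy:", first_k)
    print("sup-norm distance to Q* at that iteration:", error_at_first_k)
```

In this toy instance the greedy policy locks in after a handful of iterations while the Q-values are still far from their limit, since the remaining error lies almost entirely along the all-ones direction that does not change the argmax.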
Augmented Model Predictive Control: A Balance between Satellite Agility and Computation Complexity
Agile earth observation satellites employ multiple actuators to enable flexible and responsive imaging capabilities. While significant advancements in actuator technology have enhanced satellites' torque and momentum, relatively little attention has been given to control strategies specifically tailored to improve satellite agility. This paper provides a comparative analysis of different Model Predictive Control (MPC) formulations and introduces an augmented-MPC method that effectively balances agility requirements with hardware implementation constraints. The proposed method achieves the high-performance characteristics of nonlinear MPC while preserving the computational simplicity of linear MPC. Numerical simulations and physical experiments are conducted to validate the effectiveness and feasibility of the proposed approach.
comment: European Control Conference 2026
Communication-Aware Synthesis of Safety Controller for Networked Control Systems
Networked control systems (NCS) are widely used in safety-critical applications, but they are often analyzed under the assumption of ideal communication channels. This work focuses on the synthesis of safety controllers for discrete-time linear systems that are affected by unknown disturbances and operate over imperfect communication channels. The proposed method guarantees safety by constructing ellipsoidal robust safety invariant (RSI) sets and verifying their invariance through linear matrix inequalities (LMIs), which are formulated and solved as semidefinite programs (SDPs). In particular, our framework simultaneously considers controller synthesis and communication errors without requiring explicit modeling of the communication channel. A case study on a cruise control problem demonstrates that the proposed controller ensures safety in the presence of unexpected disturbances and multiple communication imperfections simultaneously.
Muscle Coactivation in the Sky: Geometry and Pareto Optimality of Energy vs. Aerodynamic Promptness and Multirotors as Variable Stiffness Actuators
In robotics and biomechanics, trading metabolic cost for kinematic readiness is a well-established principle. This paper formalizes this concept for aerial multirotors through the introduction of aerodynamic promptness -- a dynamic metric analogous to dynamic manipulability in robotics. By formulating redundancy resolution as a geometric multi-objective optimization along task fibers, we rigorously characterize the topological trade-off between energy consumption and promptness. We demonstrate that this interplay is fundamentally governed by fiber geometry. The cooperative actuation regime yields compact fibers with bounded, compatible Pareto fronts. Conversely, the antagonistic actuation regime unlocks unbounded fibers, enabling aerodynamic co-contraction that drives promptness to hardware limits at the expense of flight endurance. We establish a structural isomorphism between aerodynamic co-contraction and biologically inspired variable stiffness actuators, introducing a dynamic "flying muscle" paradigm. Ultimately, this framework transitions multirotor allocation from heuristic energy minimization to principled, geometry-aware Pareto navigation, laying foundational theory for the design and control of highly agile aerial platforms.
comment: Accepted for IEEE ICUAS 2026
Introduction to Online Control
This text presents an introduction to an emerging paradigm in control of dynamical systems and differentiable reinforcement learning called online nonstochastic control. The new approach applies techniques from online convex optimization and convex relaxations to obtain new methods with provable guarantees for classical settings in optimal and robust control. The primary distinction between online nonstochastic control and other frameworks is the objective. In optimal control, robust control, and other control methodologies that assume stochastic noise, the goal is to perform comparably to an offline optimal strategy. In online nonstochastic control, both the cost functions as well as the perturbations from the assumed dynamical model are chosen by an adversary. Thus the optimal policy is not defined a priori. Rather, the target is to attain low regret against the best policy in hindsight from a benchmark class of policies. This objective suggests the use of the decision making framework of online convex optimization as an algorithmic methodology. The resulting methods are based on iterative mathematical optimization algorithms, and are accompanied by finite-time regret and computational complexity guarantees.
comment: Draft; comments/suggestions welcome at nonstochastic.control@gmail.com
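The regret objective described in this abstract can be written out as a short worked formula. This is a sketch under assumptions not stated in the abstract: linear dynamics with matrices $A$, $B$, adversarial perturbations $w_t$, a horizon $T$, and a benchmark policy class $\Pi$; all symbols are illustrative.

```latex
% Assumed dynamics (illustrative): the perturbation w_t is chosen adversarially.
x_{t+1} = A x_t + B u_t + w_t
% Regret of a controller over T steps against the best policy in hindsight
% from the benchmark class \Pi; (x_t^{\pi}, u_t^{\pi}) is the trajectory that
% policy \pi would have produced under the same costs c_t and perturbations w_t.
\mathrm{Regret}_T = \sum_{t=1}^{T} c_t(x_t, u_t) \;-\; \min_{\pi \in \Pi} \sum_{t=1}^{T} c_t\bigl(x_t^{\pi}, u_t^{\pi}\bigr)
```

Under this reading, "low regret" means that $\mathrm{Regret}_T$ grows sublinearly in $T$, so the average per-step cost approaches that of the best policy in $\Pi$.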
Risk-Aware Aerocapture Guidance Through a Probabilistic Indicator Function
Aerocapture is sensitive to trajectory errors, particularly for low-cost missions with imprecise navigation. For such missions, considering the probability of each failure mode when computing guidance commands can increase capture rate. A risk-aware aerocapture guidance algorithm is proposed that uses a generative model-based probabilistic indicator function to estimate escape, impact, or capture probabilities. The probability of each mode is incorporated into corrective guidance commands to increase the likelihood of successful capture. The proposed method is evaluated against state-of-the-art numeric predictor-corrector guidance algorithms in high-uncertainty scenarios where entry interface dispersions lead to nontrivial failure probabilities. When using a probabilistic indicator function in guidance, 71.43% to 100% of recoverable cases are saved for a variety of initial distributions and atmosphere models. The probabilistic indicator function is capable of predicting failure probability for dispersions and atmosphere models outside its training data, showing generalizability. In addition, the probabilistic indicator is compared to a fading memory filter for density estimation, demonstrating improvements in accuracy when both are used in conjunction. The proposed risk-aware aerocapture guidance algorithm improves capture performance and robustness to entry interface state dispersions, especially for missions with high navigation uncertainty.
comment: Currently under revision for the AIAA Journal of Guidance, Control, and Dynamics
Approximate Model Predictive Control for Microgrid Energy Management via Imitation Learning
Efficient energy management is essential for reliable and sustainable microgrid operation amid increasing renewable integration. In this paper, an imitation learning-based framework to approximate mixed-integer Economic Model Predictive Control (EMPC) is proposed for microgrid energy management, considering fuel generators, renewable energy resources, a unified energy storage unit, and curtailable loads. Within the proposed framework, a neural network is trained to imitate expert EMPC control actions from offline trajectories, thereby enabling fast real-time decision making without solving online mixed-integer optimization problems, which often exhibit highly variable solution times across instances and do not scale well to large problem sizes; in particular, worst-case solve times can be excessively large and therefore unsuitable for real-time deployment. In contrast, the learned policy provides predictable and consistently low computation times. To enhance robustness and generalization, the learning process incorporates noise injection during training to mitigate distribution shift and explicitly accounts for forecast uncertainty in renewable generation and demand. Furthermore, a constraint-tightening approach combined with a projection layer is proposed to ensure recursive feasibility and constraint satisfaction of the learned controller. Simulation results demonstrate that the learned policy achieves economic performance comparable to EMPC, while reducing computation time by approximately one order of magnitude relative to the optimization-based EMPC.
comment: Submitted to Engineering Applications of Artificial Intelligence (EAAI) and IFAC WC 2026 (accepted by IFAC WC 2026). Main changes: (1) extensive simulations with real data; (2) formal feasibility and recursive feasibility guarantees using discrete-time control barrier functions
Robotics
Cooptimizing Safety and Performance Using Safety Value-Constrained Model Predictive Control
Autonomous systems are increasingly deployed in real-world environments, where they must achieve high performance while maintaining safety under state and input constraints. Although Model Predictive Control (MPC) provides a principled framework for constrained optimal control, guaranteeing safety beyond its finite planning horizon remains a fundamental challenge. In this work, we augment MPC with a safety value function-based terminal constraint that enforces membership in a control-invariant safe set at the end of each planning horizon. This formulation enables real-time synthesis of trajectories that are both high-performing and provably safe. We show that, under an exact safety value function and a feasible initialization, the proposed MPC scheme is recursively feasible, thereby ensuring persistent safety. In contrast to traditional terminal set constructions that rely on local linearizations or conservative approximations, our approach incorporates a reachability-based safety value function for terminal constraints, yielding less conservative and more expressive safety guarantees. We validate the proposed framework through simulation and hardware experiments on a Flexiv Rizon 10s manipulator. Results demonstrate improved constraint satisfaction and robustness compared to standard state-constrained MPC and reactive safety filtering, while maintaining competitive task performance. The full implementation and experiments are available on the project website.
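The terminal-constraint idea in this abstract can be sketched as a finite-horizon problem. The notation is assumed rather than taken from the paper: dynamics $f$, stage cost $\ell$, terminal cost $\ell_f$, constraint sets $\mathcal{X}$ and $\mathcal{U}$, horizon $N$, and a safety value function $V$ whose zero-superlevel set is taken to be the control-invariant safe set (the paper's sign convention may differ).

```latex
\begin{aligned}
\min_{u_0,\dots,u_{N-1}} \quad & \sum_{k=0}^{N-1} \ell(x_k, u_k) + \ell_f(x_N) \\
\text{s.t.} \quad & x_{k+1} = f(x_k, u_k), \quad k = 0,\dots,N-1, \\
& x_k \in \mathcal{X}, \quad u_k \in \mathcal{U}, \quad k = 0,\dots,N-1, \\
& x_0 = x(t), \\
& V(x_N) \ge 0 \quad \text{(terminal state lies in the safe set)}.
\end{aligned}
```

The intuition behind recursive feasibility is that, because the safe set is control-invariant, a feasible solution at one step can be shifted forward and extended by a control that keeps the terminal state inside the set.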
Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms
Vision-Language-Action (VLA) models are emerging as a unified substrate for embodied intelligence. This shift raises a new class of safety challenges, stemming from the embodied nature of VLA systems, including irreversible physical consequences, a multimodal attack surface across vision, language, and state, real-time latency constraints on defense, error propagation over long-horizon trajectories, and vulnerabilities in the data supply chain. Yet the literature remains fragmented across robotic learning, adversarial machine learning, AI alignment, and autonomous systems safety. This survey provides a unified and up-to-date overview of safety in Vision-Language-Action models. We organize the field along two parallel timing axes, attack timing (training-time vs. inference-time) and defense timing (training-time vs. inference-time), linking each class of threat to the stage at which it can be mitigated. We first define the scope of VLA safety, distinguishing it from text-only LLM safety and classical robotic safety, and review the foundations of VLA models, including architectures, training paradigms, and inference mechanisms. We then examine the literature through four lenses: Attacks, Defenses, Evaluation, and Deployment. We survey training-time threats such as data poisoning and backdoors, as well as inference-time attacks including adversarial patches, cross-modal perturbations, semantic jailbreaks, and freezing attacks. We review training-time and runtime defenses, analyze existing benchmarks and metrics, and discuss safety challenges across six deployment domains. Finally, we highlight key open problems, including certified robustness for embodied trajectories, physically realizable defenses, safety-aware training, unified runtime safety architectures, and standardized evaluation.
Unleashing the Agility of Wheeled-Legged Robots for High-Dynamic Reflexive Obstacle Evasion
Wheeled-legged robots combine the energy efficiency of wheeled locomotion with the terrain adaptability of legged systems, making them promising platforms for agile mobility in complex and dynamic environments. However, enabling high-dynamic reflexive evasion against fast-moving obstacles remains challenging due to the hybrid morphology, mode coupling, and non-holonomic constraints of such platforms. In this work, we propose AWARE, Adaptive Wheeled-Legged Avoidance and Reflexive Evasion, a hierarchical reinforcement learning framework for high-dynamic obstacle avoidance in wheeled-legged robots. The proposed system naturally exhibits diverse emergent gaits and evasive behaviors, including forward lunge and lateral dodge, thereby leveraging the robot's hybrid morphology to enhance agility under highly dynamic threats. Extensive experiments in Isaac Lab simulation and real-world deployment on the M20 platform across diverse dynamic scenarios demonstrate that AWARE achieves robust and agile obstacle avoidance while revealing behaviorally distinct evasive strategies. These results highlight both the practical effectiveness of AWARE and the intrinsic reflexive agility of wheeled-legged robots.
comment: 8 pages, 8 figures, 4 tables
QuietWalk: Physics-Informed Reinforcement Learning for Ground Reaction Force-Aware Humanoid Locomotion Under Diverse Footwear
Humanoid robots operating in human-centered environments (e.g., homes, hospitals, and offices) must mitigate foot-ground impact transients, as impact-induced vibration and noise degrade user experience and repeated impacts accelerate hardware wear. However, existing low-noise locomotion training often relies on kinematic proxy objectives or fragile force sensors, and footwear-induced changes in contact dynamics introduce distribution shifts that hinder policy generalization. We present QuietWalk, a physics-informed reinforcement learning framework for ground-reaction-force-aware humanoid locomotion under diverse footwear conditions. QuietWalk employs an inverse-dynamics-constrained physics-informed neural network (PINN) to estimate per-foot vertical ground reaction forces (GRFs) from proprioceptive signals, and integrates the frozen predictor into the RL training loop to penalize predicted impact forces without requiring force sensors at deployment. On a held-out real-robot dataset, enforcing inverse-dynamics consistency reduces vertical GRF prediction errors by 82%-86% compared with a purely supervised predictor and improves the coefficient of determination from 0.39/0.67 to 0.99/0.99 for the left/right feet. On hardware at 1.2 m/s (barefoot; averaged over four floor materials), QuietWalk reduces mean A-weighted noise level by 7.17 dB and peak noise level by 4.98 dB under a consistent recording setup. Cross-footwear experiments (barefoot, skate shoes, athletic sneakers, and high heels) across multiple surfaces further demonstrate robust adaptation to footwear-induced contact variations.
comment: 8 pages, 8 figures
Real-Time Non-Contact Force Compensation for Wrist-Mounted Force/Torque Sensors in Haptic-Enabled Robotic Surgery Training IROS
Haptic feedback has been a long-missed feature in robotic-assisted surgery, one that would allow surgeons to perceive tissue properties and apply controlled forces during delicate procedures. Although commercial robotic systems have begun to integrate haptic technologies, their high costs limit accessibility for training and research purposes. To address this gap, we extend our previously developed low-cost robotic surgery training setup, RoboScope, by incorporating a wrist-mounted force/torque (F/T) sensor for haptic feedback training. Wrist-mounted sensing avoids many challenges associated with tip-mounted sensors but introduces additional non-contact forces, such as gravity, sensor bias, installation offsets, and associated torques, which compromise measurement accuracy. In this paper, we propose a robust real-time compensation method based on recursive least squares (RLS). This method eliminates the need for dataset collection and frequent recalibration while adapting to changing operating conditions. Experimental validation demonstrates that the proposed approach achieves over 95% error reduction in non-contact force compensation and more than 91% in non-contact torque compensation, significantly outperforming existing methods. These results highlight the potential of our approach for providing reliable haptic feedback in robotic surgery training and research.
comment: Submitted to 2026 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
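The recursive least squares (RLS) compensation idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a simplified force-only model in which the measured non-contact force is `m * R.T @ g + b`, with unknown tool mass `m` and constant sensor bias `b`. The class name, the parameterization, and the helper `random_rotation` are hypothetical, and torque compensation is omitted.

```python
import numpy as np

G_WORLD = np.array([0.0, 0.0, -9.81])  # gravity in the world frame (assumed)


def random_rotation(rng):
    """Hypothetical helper: draw a random proper rotation matrix via QR."""
    q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1.0  # flip one column so that det(q) = +1
    return q


class RLSGravityCompensator:
    """RLS estimate of theta = [mass, bias_x, bias_y, bias_z] (illustrative)."""

    def __init__(self, forgetting=1.0, p0=1e3):
        self.theta = np.zeros(4)    # parameter estimate
        self.P = np.eye(4) * p0     # scaled parameter covariance
        self.lam = forgetting       # forgetting factor in (0, 1]

    def regressor(self, R):
        # Non-contact force in the sensor frame is linear in theta:
        # f = [R^T g | I_3] @ theta, where R maps sensor frame to world frame.
        g_sensor = R.T @ G_WORLD
        return np.hstack([g_sensor.reshape(3, 1), np.eye(3)])

    def update(self, R, f_meas):
        # Standard RLS update for a 3-dimensional measurement.
        Phi = self.regressor(R)
        S = self.lam * np.eye(3) + Phi @ self.P @ Phi.T
        K = self.P @ Phi.T @ np.linalg.inv(S)
        self.theta = self.theta + K @ (f_meas - Phi @ self.theta)
        self.P = (self.P - K @ Phi @ self.P) / self.lam

    def compensate(self, R, f_meas):
        # Subtract the predicted non-contact force from the raw reading.
        return f_meas - self.regressor(R) @ self.theta
```

In use, `update` would be called only while the tool is known to be free of contact, and `compensate` applied to every reading afterwards. A forgetting factor below 1 lets the estimate track slow changes in bias at the cost of noisier parameters.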
Decentralized Heterogeneous Multi-Robot Collaborative Exploration for Indoor and Outdoor 3D Environments
Heterogeneous multi-robot systems feature significant adaptability for complex environments. However, effective collaboration that fully exploits the robots' potential remains a core challenge. This paper proposes a decentralized collaborative framework for heterogeneous multi-robot systems to autonomously explore indoor and outdoor 3D environments. First, a basic perception map that integrates terrain and observation metrics is designed. Improved supervoxel segmentation is developed to simplify the map structure and form a high-level representation that supports lightweight communication. Second, the traversal and observation capabilities of heterogeneous robots are modeled to evaluate the requirements of task views derived from incomplete supervoxels. These task views are grouped by requirements and clustered to streamline assignment. Subsequently, the view-cluster assignment is formulated as a heterogeneous multi-depot multi-traveling salesman problem (HMDMTSP) that incorporates constraints between view-cluster requirements and robot capabilities. An improved genetic algorithm is developed to efficiently solve this problem while ensuring global consistency. Based on the assignments, redundant views within clusters are eliminated to refine exploration routes. Finally, conflicts between robots' motion paths are resolved. Simulations and field experiments in cluttered indoor and outdoor environments demonstrate that our approach effectively coordinates exploration tasks among heterogeneous robots, achieving superior exploration efficiency and communication savings compared to state-of-the-art approaches.
Safe Navigation in Unknown and Cluttered Environments via Direction-Aware Convex Free-Region Generation
Convex free regions provide a structured and optimization-friendly representation of collision-free space for robot navigation in unknown and cluttered environments. However, existing methods typically enlarge local collision-free regions mainly according to surrounding obstacle geometry. In cluttered environments, such strategies may fail to generate regions that both accommodate robot geometry and preserve traversable extension along candidate motion directions, thereby limiting downstream traversal, especially in narrow passages. Even when such a region is available, safe motion generation remains challenging, because safety checking at discretized trajectory samples does not guarantee continuously collision-free motion when robot geometry is modeled explicitly. To address these issues, we propose a navigation framework that jointly incorporates candidate motion directions and robot geometry into convex free-region generation, and achieves continuously collision-free motion through continuous-safe trajectory generation. Within each region, the framework performs geometry-aware target pose selection and trajectory generation, together with Lipschitz-based continuous safety certification and local refinement. The resulting free regions and candidate motions are maintained in a region-based graph to support incremental planning. Quantitative results in cluttered 2D navigation scenarios show that the proposed method generates free regions better aligned with downstream traversal and enables reliable collision-free navigation, while additional 3D and real-world experiments on a quadrupedal robot and a UAV demonstrate the extensibility and practical applicability of the framework. The open-source project can be found at https://github.com/ZhichengSong6/FRGraph.
Move-Then-Operate: Behavioral Phasing for Human-Like Robotic Manipulation
We present Move-Then-Operate, a Vision-Language-Action framework that explicitly decouples robotic manipulation into two distinct behavioral phases: coarse relocation (move) and contact-critical interaction (operate). Unlike monolithic policies that conflate these heterogeneous regimes, our architecture employs a dual-expert policy routed by a learnable phase selector, introducing a structural inductive bias that isolates phase-specific dynamics. Phase labels are automatically generated via an MLLM-based pipeline conditioned on lightweight contextual cues such as end-effector velocity and subtask decomposition to ensure alignment with human motor patterns. Evaluated on the RoboTwin2 benchmark, our method achieves an average success rate of $68.9\%$, outperforming the monolithic $π_0$ baseline by $24\%$. It matches or exceeds models trained on $10\times$ more data and reaches peak performance in $40\%$ fewer training steps, demonstrating that architectural disentanglement of move and operate phases is a highly effective and efficient strategy for mastering high-precision manipulation.
comment: 15 pages, 10 figures
Tube Diffusion Policy: Reactive Visual-Tactile Policy Learning for Contact-rich Manipulation
Contact-rich manipulation is central to many everyday human activities, requiring continuous adaptation to contact uncertainty and external disturbances through multi-modal perception, particularly vision and tactile feedback. While imitation learning has shown strong potential for learning complex manipulation behaviors, most existing approaches rely on action chunking, which fundamentally limits their ability to react to unforeseen observations during execution. This limitation becomes especially critical in contact-rich scenarios, where physical uncertainty and high-frequency tactile feedback demand rapid, reactive control. To address this challenge, we propose Tube Diffusion Policy (TDP), a novel reactive visual-tactile policy learning framework that bridges diffusion-based imitation learning with tube-based feedback control. By leveraging the expressive power of generative models, TDP learns an observation-conditioned feedback flow around nominal action chunks, forming an action tube that enables fast and adaptive reactions during execution. We evaluate TDP on the widely used Push-T benchmark and three additional challenging visual-tactile dexterous manipulation tasks. Across all benchmarks, TDP consistently outperforms state-of-the-art imitation learning baselines. Two real-world experiments further validate its robust reactivity under contact uncertainty and external disturbances. Moreover, the step-wise correction mechanism enabled by the action tube significantly reduces the number of denoising steps required, making TDP well suited for real-time, high-frequency feedback control in contact-rich manipulation.
Learning to Identify Out-of-Distribution Objects for 3D LiDAR Anomaly Segmentation CVPR
Understanding the surrounding environment is fundamental in autonomous driving and robotic perception. Distinguishing between known classes and previously unseen objects, the goal of anomaly segmentation, is crucial in real-world environments. However, research in the 3D field remains limited, with most existing approaches applying post-processing techniques from 2D vision. To address this shortcoming, we propose a new efficient approach that directly operates in the feature space, modeling the feature distribution of inlier classes to constrain anomalous samples. Moreover, the only publicly available 3D LiDAR anomaly segmentation dataset contains simple scenarios, with few anomaly instances, and exhibits a severe domain gap due to its sensor resolution. To bridge this gap, we introduce a set of mixed real-synthetic datasets for 3D LiDAR anomaly segmentation, built upon established semantic segmentation benchmarks, with multiple out-of-distribution objects and diverse, complex environments. Extensive experiments demonstrate that our approach achieves state-of-the-art and competitive results on the existing real-world dataset and the newly introduced mixed datasets, respectively, validating the effectiveness of our method and the utility of the proposed datasets. Code and datasets are available at https://simom0.github.io/lido-page/.
comment: This paper has been accepted at the 2026 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
PhysCodeBench: Benchmarking Physics-Aware Symbolic Simulation of 3D Scenes via Self-Corrective Multi-Agent Refinement
Physics-aware symbolic simulation of 3D scenes is critical for robotics, embodied AI, and scientific computing, requiring models to understand natural language descriptions of physical phenomena and translate them into executable simulation environments. While large language models (LLMs) excel at general code generation, they struggle with the semantic gap between physical descriptions and simulation implementation. We introduce PhysCodeBench, the first comprehensive benchmark for evaluating physics-aware symbolic simulation, comprising 700 manually-crafted diverse samples across mechanics, fluid dynamics, and soft-body physics with expert annotations. Our evaluation framework measures both code executability and physical accuracy through automated and visual assessment. Building on this, we propose a Self-Corrective Multi-Agent Refinement Framework (SMRF) with three specialized agents (simulation generator, error corrector, and simulation refiner) that collaborate iteratively with domain-specific validation to produce physically accurate simulations. SMRF achieves 67.7 points overall performance compared to 36.3 points for the best baseline among evaluated SOTA models, representing a 31.4-point improvement. Our analysis demonstrates that error correction is critical for accurate physics-aware symbolic simulation and that specialized multi-agent approaches significantly outperform single-agent methods across the tested physical domains.
EgoLive: A Large-Scale Egocentric Dataset from Real-World Human Tasks
The advancement of robot learning is currently hindered by the scarcity of large-scale, high-quality datasets. While established data collection methods such as teleoperation and universal manipulation interfaces dominate current datasets, they suffer from inherent limitations in scalability and real-world deployability. Human egocentric video collection, by contrast, has emerged as a promising approach to enable scalable, natural, and in-the-wild data collection. We therefore present EgoLive, a large-scale, high-quality egocentric dataset designed explicitly for robot manipulation learning. EgoLive offers three distinctive technical advantages over existing egocentric datasets: first, it is the largest open-source annotated egocentric dataset focused on real-world task-oriented human routines to date; second, it delivers leading data quality via a customized head-mounted capture device and comprehensive high-precision multi-modal annotations; third, all data is collected exclusively in unconstrained real-world scenarios and covers human working data from vertical domains, including home service, retail, and other practical work scenarios, providing superior diversity and ecological validity. With the introduction of EgoLive, we aim to provide the research community with a scalable, high-quality dataset that accelerates breakthroughs in generalizable robotic models and facilitates the real-world deployment of robot systems.
Large Language Model based Interactive Decision-Making for Autonomous Driving
In high-conflict mixed-traffic scenarios involving human-driven and autonomous vehicles, most existing autonomous driving systems default to overly conservative behaviors, lack proactive interaction, and consequently suffer from limited public acceptance. To mitigate intent misunderstandings and decision failures, we present a Large Language Model-based interactive decision-making framework that augments scene understanding and intent-aware interaction to jointly improve safety and efficiency. The approach uses Object-Process Methodology to semantically model complex multi-vehicle scenes, abstracting low-level perceptual data into objects, processes, and relations, thereby streamlining reasoning over latent causal structure. Building on this representation, the Large Language Model parses both explicit and implicit intents of surrounding agents and, under jointly enforced safety and efficiency constraints, selects candidate maneuvers. We further generate perturbed trajectory candidates via Monte Carlo sampling and evaluate them to obtain an optimized executable trajectory. To foster transparency and coordination with nearby road users, the final decision is translated by the Large Language Model into concise natural-language messages and broadcast through an external Human-Machine Interface (eHMI), completing a closed loop from scene understanding to action to language. Experiments in a cluster driving simulator demonstrate that the proposed method outperforms traditional baselines across safety, comfort, and efficiency metrics, while a Turing-test-style evaluation indicates a high degree of human-likeness in decision making. Overall, these results suggest that coupling semantic scene abstraction with Large Language Model-mediated intent reasoning and language-based eHMI communication offers a practical pathway toward interactive, trustworthy autonomous driving in dense mixed traffic.
comment: Accepted by Journal of Traffic and Transportation Engineering (English Edition)
EL3DD: Extended Latent 3D Diffusion for Language Conditioned Multitask Manipulation
Acting in human environments is a crucial capability for general-purpose robots, necessitating a robust understanding of natural language and its application to physical tasks. This paper seeks to harness the capabilities of diffusion models within a visuomotor policy framework that merges visual and textual inputs to generate precise robotic trajectories. By employing reference demonstrations during training, the model learns to execute manipulation tasks specified through textual commands within the robot's immediate environment. The proposed research aims to extend an existing model by leveraging improved embeddings and adapting techniques from diffusion models for image generation. We evaluate our methods on the CALVIN dataset, demonstrating enhanced performance on various manipulation tasks and an increased long-horizon success rate when multiple tasks are executed in sequence. Our approach reinforces the usefulness of diffusion models and contributes towards general multitask manipulation.
comment: 10 pages; 2 figures; 1 table
SLAM&Render: A Benchmark for the Intersection Between Neural Rendering, Gaussian Splatting and SLAM IROS 2026
Models and methods originally developed for Novel View Synthesis and Scene Rendering, such as Neural Radiance Fields (NeRF) and Gaussian Splatting, are increasingly being adopted as representations in Simultaneous Localization and Mapping (SLAM). However, existing datasets fail to include the specific challenges of both fields, such as sequential operations and, in many settings, multi-modality in SLAM or generalization across viewpoints and illumination conditions in neural rendering. Additionally, the data are often collected using sensors which are handheld or mounted on drones or mobile robots, which complicates the accurate reproduction of sensor motions. To bridge these gaps, we introduce SLAM&Render, a novel dataset designed to benchmark methods in the intersection between SLAM, Novel View Rendering and Gaussian Splatting. Recorded with a robot manipulator, it uniquely includes 40 sequences with time-synchronized RGB-D images, IMU readings, robot kinematic data, and ground-truth pose streams. By releasing robot kinematic data, the dataset also enables the assessment of recent integrations of SLAM paradigms within robotic applications. The dataset features five setups with consumer and industrial objects under four controlled lighting conditions, each with separate training and test trajectories. All sequences are static with different levels of object rearrangements and occlusions. Our experimental results, obtained with several baselines from the literature, validate SLAM&Render as a relevant benchmark for this emerging research area.
comment: 9 pages, 8 figures, 7 tables. Submitted to IROS 2026
Using Language Models as Closed-Loop High-Level Planners for Robotics Applications: A Brief Overview and Benchmarks
Large Language Models (LLMs) and Vision Language Models (VLMs) have become popular tools for embodied high-level planning. However, their deployment in black-box settings often leads to unpredictable or costly errors. To harness their capabilities more reliably in robotic systems, we empirically investigate practical strategies for integrating language models as closed-loop planners. Concretely, we study how the control horizon and warm-starting impact the performance of language model-based planners. We design and conduct controlled experiments to extract actionable insights, providing recommendations that can help improve the performance and robustness of language model-based embodied planning. The full implementation and experiments are available on the project website.
Equitable Routing--Rethinking the Multiple Traveling Salesman Problem
The Multiple Traveling Salesman Problem (MTSP) extends the traveling salesman problem by assigning multiple salesmen to visit a set of targets from a common depot, with each target visited exactly once while minimizing total tour length. A common variant, the min-max MTSP, focuses on workload balance by minimizing the longest tour, but it is difficult to solve optimally due to weak linear relaxation bounds. This paper introduces two new parametric fairness-driven variants of the MTSP: the $\varepsilon$-Fair-MTSP and the $Δ$-Fair-MTSP, which promote equitable distribution of tour lengths while controlling overall cost. The $\varepsilon$-Fair-MTSP is formulated as a mixed-integer second-order cone program, while the $Δ$-Fair-MTSP is modeled as a mixed-integer linear program. We develop algorithms that guarantee global optimality for both formulations. Computational experiments on benchmark instances and real-world applications, including electric vehicle fleet routing, demonstrate their effectiveness. Furthermore, we show that the algorithms presented for the fairness-constrained MTSP variants can be used to obtain the Pareto front of a bi-objective optimization problem in which one objective minimizes the total tour length and the other balances the lengths of the individual tours. Overall, these fairness-constrained MTSP variants provide a practical and flexible alternative to the min-max MTSP.
comment: 30 pages
Compiling OpenSCENARIO 2.1 for Scenario-Based Testing in CARLA
While the ASAM OpenSCENARIO 2.1 Domain-Specific Language (DSL) enables declarative, intent-driven authoring for Scenario-Based Testing (SBT), its integration into open-source simulators like CARLA remains limited by legacy parsers. We propose a multi-pass modern compiler architecture that translates the OpenSCENARIO 2.1 DSL directly into executable CARLA behaviors. The pipeline features an ANTLR4 frontend for Abstract Syntax Tree (AST) generation, a semantic middle-end, and a runtime backend that synthesizes deterministic py_trees behavior trees. Mapping the standardized domain ontology directly to CARLA's procedural API via a custom method registry eliminates the need for external logic solvers. A demonstrative multi-actor cut-in and evasive maneuver, selected from a wider suite of validated scenarios, confirms the compiler's ability to process concurrent actions, dynamic mathematical expressions, and asynchronous signaling. This framework establishes a functional baseline for reproducible, large-scale SBT, paving the way for future C++ optimizations to mitigate current Python-based computational overhead.
Affordance-R1: Reinforcement Learning for Generalizable Affordance Reasoning in Multimodal Large Language Model
Affordance grounding focuses on predicting the specific regions of objects that are associated with the actions to be performed by robots. It plays a vital role in the fields of human-robot interaction, human-object interaction, embodied manipulation, and embodied perception. Existing models often neglect the affordance shared among different objects because they lack Chain-of-Thought (CoT) reasoning abilities, limiting their out-of-domain (OOD) generalization and explicit reasoning capabilities. To address these challenges, we propose Affordance-R1, the first unified affordance grounding framework that integrates cognitive CoT-guided Group Relative Policy Optimization (GRPO) within a reinforcement learning paradigm. Specifically, we designed a sophisticated affordance function, which contains format, perception, and cognition rewards to effectively guide optimization directions. Furthermore, we constructed a high-quality affordance-centric reasoning dataset, ReasonAff, to support training. Trained exclusively via reinforcement learning with GRPO and without explicit reasoning data, Affordance-R1 achieves robust zero-shot generalization and exhibits emergent test-time reasoning capabilities. Comprehensive experiments demonstrate that our model outperforms well-established methods and exhibits open-world generalization. To the best of our knowledge, Affordance-R1 is the first to integrate GRPO-based RL with reasoning into affordance reasoning. The code and dataset are released at https://github.com/hq-King/Affordance-R1.
Safer Trajectory Planning with CBF-guided Diffusion Model for Unmanned Aerial Vehicles
Safe and agile trajectory planning is essential for autonomous systems, especially during complex aerobatic maneuvers. Motivated by the recent success of diffusion models in generative tasks, this paper introduces AeroTrajGen, a novel framework for diffusion-based trajectory generation that incorporates control barrier function (CBF)-guided sampling during inference, specifically designed for unmanned aerial vehicles (UAVs). The proposed CBF-guided sampling addresses two critical challenges: (1) mitigating the inherent unpredictability and potential safety violations of diffusion models, and (2) reducing reliance on extensively safety-verified training data. During the reverse diffusion process, CBF-based guidance ensures collision-free trajectories by seamlessly integrating safety constraint gradients with the diffusion model's score function. The model features an obstacle-aware diffusion transformer architecture with multi-modal conditioning, including trajectory history, obstacles, maneuver styles, and goal, enabling the generation of smooth, highly agile trajectories across 14 distinct aerobatic maneuvers. Trained on a dataset of 2,000 expert demonstrations, AeroTrajGen is rigorously evaluated in simulation under multi-obstacle environments. Simulation results demonstrate that CBF-guided sampling reduces collision rates by 94.7% compared to unguided diffusion baselines, while preserving trajectory agility and diversity. Our code is open-sourced at https://github.com/RoboticsPolyu/CBF-DMP.
comment: Some equations and sentences need to be checked again and will be uploaded again
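The guidance idea in the abstract above can be illustrated with a minimal sketch. This is not the AeroTrajGen implementation: the score function, the circular obstacle, the barrier h(p) = ||p - c||^2 - r^2, the step size, and the guidance weight are all assumptions chosen for illustration, and the noise term of a real reverse-diffusion step is omitted.

```python
def barrier(p, center, radius):
    """h(p) >= 0 means the 2D point p lies outside the circular obstacle."""
    return (p[0] - center[0]) ** 2 + (p[1] - center[1]) ** 2 - radius ** 2

def barrier_grad(p, center):
    """Gradient of h with respect to p; it points away from the obstacle center."""
    return (2.0 * (p[0] - center[0]), 2.0 * (p[1] - center[1]))

def toy_score(p, goal):
    """Hypothetical stand-in for a learned score: pulls the point toward the goal."""
    return (goal[0] - p[0], goal[1] - p[1])

def guided_step(p, goal, center, radius, step=0.1, weight=1.0):
    """One deterministic update: score plus the barrier gradient when h(p) < 0."""
    s = toy_score(p, goal)
    g = (0.0, 0.0)
    if barrier(p, center, radius) < 0.0:  # guide only when the constraint is violated
        g = barrier_grad(p, center)
    return (p[0] + step * (s[0] + weight * g[0]),
            p[1] + step * (s[1] + weight * g[1]))
```

A waypoint that starts inside the obstacle is pushed outward on each step, while a waypoint that is already safe follows the score alone.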
Safety-aware Goal-oriented Semantic Sensing, Communication, and Control for Robotics
Wirelessly-connected robotic systems empower robots with real-time intelligence by leveraging remote computing resources for decision-making. However, the data exchange between robots and edge servers often overwhelms communication links, introducing latency that degrades task performance. To tackle this, goal-oriented semantic communication (GSC) has been introduced for wirelessly-connected robotic systems to extract and transmit only goal-relevant semantic representations. While this improves task effectiveness, it generally overlooks practical safety requirements. Meanwhile, existing robotics research often treats safety primarily as a control-level problem, without systematically considering safety across sensing, communication, and control in a closed-loop manner. To bridge this gap, we investigate how to enable safety-aware goal-oriented semantic (SA-GS) sensing, communication, and control co-design in wirelessly-connected robotic systems, aiming to maximize the robotic task effectiveness subject to practical safety requirements. We first introduce an architecture for wirelessly-connected robotic systems and representative use cases. We then summarize general safety requirements and effectiveness metrics across the use cases. Next, we systematically analyze the unique safety and effectiveness challenges in sensing, communication, and control. Based on these, we further present potential SA-GS research directions. Finally, an Unmanned Aerial Vehicle (UAV) target tracking case study validates that one of the presented SA-GS research directions, i.e., semantic-based C&C packet execution, could significantly improve safety rate and tracking success rate by more than 2 times and 4.5 times, respectively.
comment: 7 pages. This paper has been submitted to the IEEE Wireless Communications Magazine
DextER: Language-driven Dexterous Grasp Generation with Embodied Reasoning CVPR 2026
Language-driven dexterous grasp generation requires the models to understand task semantics, 3D geometry, and complex hand-object interactions. While vision-language models have been applied to this problem, existing approaches directly map observations to grasp parameters without intermediate reasoning about physical interactions. We present DextER, Dexterous Grasp Generation with Embodied Reasoning, which introduces contact-based embodied reasoning for multi-finger manipulation. Our key insight is that predicting which hand links contact where on the object surface provides an embodiment-aware intermediate representation, bridging task semantics with physical constraints. DextER autoregressively generates embodied contact tokens specifying which finger links contact where on the object surface, followed by grasp tokens encoding the hand configuration. On DexGYS, DextER achieves 67.14% success rate, outperforming state-of-the-art by 3.83 p.p. with 96.4% improvement in intention alignment. We also demonstrate steerable generation through partial contact specification, providing fine-grained control over grasp synthesis.
comment: CVPR 2026, Project page: https://junha-l.github.io/dexter/
A Reconfigured Wheel-Legged Robot for Enhanced Steering and Adaptability
Wheel-legged robots integrate leg agility on rough terrain with wheel efficiency on flat ground. However, most existing designs do not fully capitalize on the benefits of both legged and wheeled structures, which limits overall system flexibility and efficiency. We present FLORES, a novel wheel-legged robot design featuring a distinctive front-leg configuration that sets it apart from standard design approaches. Specifically, FLORES replaces the conventional hip-roll degree of freedom (DoF) of the front leg with hip-yaw DoFs, and this allows for efficient movement on flat surfaces while ensuring adaptability when navigating complex terrains. This innovative design facilitates seamless transitions between different locomotion modes (i.e., legged locomotion and wheeled locomotion) and optimizes the performance across varied environments. To fully exploit FLORES's mechanical capabilities, we develop a tailored reinforcement learning (RL) controller that adapts the Hybrid Internal Model (HIM) with a customized reward structure optimized for our unique mechanical configuration. This framework enables the generation of adaptive, multi-modal locomotion strategies that facilitate smooth transitions between wheeled and legged movements. Furthermore, our distinctive joint design enables the robot to exhibit novel and highly efficient locomotion gaits that capitalize on the synergistic advantages of both locomotion modes. Through comprehensive experiments, we demonstrate FLORES's enhanced steering capabilities, improved navigation efficiency, and versatile locomotion across various terrains. The open-source project can be found at https://github.com/ZhichengSong6/FLORES.
ESPADA: Execution Speedup via Semantics Aware Demonstration Data Downsampling for Imitation Learning
Behavior-cloning based visuomotor policies enable precise manipulation but often inherit the slow, cautious tempo of human demonstrations, limiting practical deployment. However, prior studies on acceleration methods mainly rely on statistical or heuristic cues that ignore task semantics and can fail across diverse manipulation settings. We present ESPADA, a semantic and spatially aware framework that segments demonstrations using a VLM-LLM pipeline with 3D gripper-object relations, enabling aggressive downsampling only in non-critical segments while preserving precision-critical phases, without requiring extra data, architectural modifications, or any form of retraining. To scale from a single annotated episode to the full dataset, ESPADA propagates segment labels via Dynamic Time Warping (DTW) on dynamics-only features. Across both simulation and real-world experiments with ACT and DP baselines, ESPADA achieves approximately a 2x speed-up while maintaining success rates, narrowing the gap between human demonstrations and efficient robot control.
comment: project page: https://project-espada.github.io/espada/
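The label-propagation step mentioned in the ESPADA abstract can be sketched with a textbook DTW alignment. This is a generic illustration, not the authors' pipeline: the 1D feature (for example, a per-frame gripper speed), the absolute-difference cost, and the function names `dtw_path` and `propagate_labels` are assumptions.

```python
def dtw_path(a, b):
    """Return the optimal DTW alignment between 1D sequences a and b
    as a list of (i, j) index pairs, using absolute difference as cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return path

def propagate_labels(ref_feat, ref_labels, new_feat):
    """Copy each reference-frame label to the new-episode frames aligned with it."""
    labels = [None] * len(new_feat)
    for i, j in dtw_path(ref_feat, new_feat):
        labels[j] = ref_labels[i]
    return labels
```

Because the warping path visits every frame of the new episode, each frame receives a label from the annotated reference even when the two episodes differ in length.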
Generative Control as Optimization: Time Unconditional Flow Matching for Adaptive and Robust Robotic Control
Diffusion models and flow matching have become a cornerstone of robotic imitation learning, yet they suffer from a structural inefficiency where inference is often bound to a fixed integration schedule that is agnostic to state complexity. This paradigm forces the policy to expend the same computational budget on trivial motions as it does on complex tasks. We introduce Generative Control as Optimization (GeCO), a time-unconditional framework that transforms action synthesis from trajectory integration into iterative optimization. GeCO learns a stationary velocity field in the action-sequence space where expert behaviors form stable attractors. Consequently, test-time inference becomes an adaptive process that allocates computation based on convergence--exiting early for simple states while refining longer for difficult ones. Furthermore, this stationary geometry yields an intrinsic, training-free safety signal, as the field norm at the optimized action serves as a robust out-of-distribution (OOD) detector, remaining low for in-distribution states while significantly increasing for anomalies. We validate GeCO on standard simulation benchmarks and demonstrate seamless scaling to pi0-series Vision-Language-Action (VLA) models. As a plug-and-play replacement for standard flow-matching heads, GeCO improves success rates and efficiency with an optimization-native mechanism for safe deployment. Video and code can be found at https://hrh6666.github.io/GeCO/
comment: 18 pages, 6 figures
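The adaptive inference loop described in the GeCO abstract can be illustrated on a toy problem. The sketch below is not the GeCO model: `toy_field` is a hypothetical hand-written velocity field with a single attractor, and the step size, tolerance, and iteration cap are assumed values. It only shows the control flow of iterating until the field norm is small and reporting that norm as a diagnostic.

```python
import math

def refine_action(action, field, step=0.5, tol=1e-3, max_iters=100):
    """Iterate a <- a + step * field(a) until the field norm falls below tol.

    Returns (action, iterations_used, final_field_norm)."""
    for k in range(max_iters):
        v = field(action)
        norm = math.sqrt(sum(x * x for x in v))
        if norm < tol:  # converged: exit early
            return action, k, norm
        action = [a + step * x for a, x in zip(action, v)]
    v = field(action)
    return action, max_iters, math.sqrt(sum(x * x for x in v))

def toy_field(action, expert=(1.0, -1.0)):
    """Hypothetical stationary field with one attractor at `expert`."""
    return [e - a for e, a in zip(expert, action)]
```

Starting close to the attractor uses fewer iterations than starting far away, which mirrors the abstract's point about spending less computation on simple states. In the same spirit, a final norm that stays large after the iteration cap could be read as a warning signal.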
Advancing Remote Medical Palpation through Cognition and Emotion
Medical palpation is more than force transmission. It is a bidirectional cognitive and emotional exchange between doctor and patient. We model two complementary touch pathways: active touch by the doctor (kinesthetic and tactile) and passive touch by the patient (subjective and emotional). We use this framework to design a mixed-reality telepalpation prototype and evaluate it with 14 experienced clinicians serving as both doctors and patients across 391 trials. Touch location was transmitted reliably across participants, while force perception showed systematic inter-individual variation, suggesting that force alone is insufficient to characterize the palpation experience.
RobotPan: A 360$^\circ$ Surround-View Robotic Vision System for Embodied Perception
Surround-view perception is increasingly important for robotic navigation and loco-manipulation, especially in human-in-the-loop settings such as teleoperation, data collection, and emergency takeover. However, current robotic visual interfaces are often limited to narrow forward-facing views, or, when multiple on-board cameras are available, require cumbersome manual switching that interrupts the operator's workflow. Both configurations suffer from motion-induced jitter that causes simulator sickness in head-mounted displays. We introduce a surround-view robotic vision system that combines six cameras with LiDAR to provide full 360$^\circ$ visual coverage, while meeting the geometric and real-time constraints of embodied deployment. We further present RobotPan, a feed-forward framework that predicts metric-scaled and compact 3D Gaussians from calibrated sparse-view inputs for real-time rendering, reconstruction, and streaming. RobotPan lifts multi-view features into a unified spherical coordinate representation and decodes Gaussians using hierarchical spherical voxel priors, allocating fine resolution near the robot and coarser resolution at larger radii to reduce computational redundancy without sacrificing fidelity. To support long sequences, our online fusion updates dynamic content while preventing unbounded growth in static regions by selectively updating appearance. Finally, we release a multi-sensor dataset tailored to 360$^\circ$ novel view synthesis and metric 3D reconstruction for robotics, covering navigation, manipulation, and locomotion on real platforms. Experiments show that RobotPan achieves competitive quality against prior feed-forward reconstruction and view-synthesis methods while producing substantially fewer Gaussians, enabling practical real-time embodied deployment.
comment: Project website: https://robotpan.github.io/
Multiagent Systems
EndoGov: A knowledge-governed multi-agent expert system for endometrial cancer risk stratification
Multimodal artificial intelligence models for endometrial cancer (EC) risk stratification typically optimize aggregate predictive performance but provide limited mechanisms for enforcing mandatory guideline overrides, such as assigning POLE-mutated tumors to the low-risk group despite high-grade morphology. We present EndoGov, a two-tier multi-agent expert system that factorizes the decision process as D(x) = G(P(x), R), where specialist agents P extract structured evidence and a governance agent G applies an executable rule set R. Tier 1 comprises pathology, molecular, and clinical agents that independently generate schema-constrained reports from frozen foundation-model features or structured records. Tier 2 queries an evidence-level-weighted Guideline Knowledge Graph, using deterministic hard-path rules for high-priority overrides and constrained soft-path reasoning for ambiguous cases. In TCGA-UCEC (n=541), EndoGov achieved 0.943 accuracy, 0.973 macro AUC, and a conditional logic-violation rate (C-LVR) of 0.93% among trigger-exposed cases. In CPTAC-UCEC (n=95), where reference labels are guideline-derived, EndoGov reached 0.842 accuracy compared with < 0.31 for locked-transfer neural baselines, supporting governance-pathway transfer under distribution shift rather than validation against independent clinical truth. End-to-end safety decomposition localized residual failures primarily to upstream molecular detection rather than downstream governance. Backend-swap experiments further showed that hard-path compliance is invariant to the LLM backend. These findings indicate that explicit clinical-rule governance can provide guideline-compliant, auditable EC risk assignment while preserving competitive discrimination.
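The factorization D(x) = G(P(x), R) in the EndoGov abstract can be sketched as ordinary function composition. Everything concrete below is an assumption for illustration and is not clinical guidance: the field names, the single hard-path rule (taken from the POLE example in the abstract), and the placeholder thresholds in `soft_path` are hypothetical, and the real system queries a knowledge graph instead of a Python list.

```python
def specialists(case):
    """P(x): stand-in specialist agents that emit structured evidence."""
    return {
        "grade": case["grade"],          # pathology agent (hypothetical field)
        "pole_mutated": case["pole"],    # molecular agent (hypothetical field)
        "stage": case["stage"],          # clinical agent (hypothetical field)
    }

# R: ordered hard-path rules as (predicate, assigned risk group).
RULES = [
    (lambda e: e["pole_mutated"], "low"),
]

def soft_path(evidence):
    """Placeholder for the constrained reasoning used on ambiguous cases."""
    if evidence["grade"] >= 3 or evidence["stage"] >= 3:
        return "high"
    return "intermediate"

def govern(evidence, rules):
    """G(P(x), R): the first matching hard rule overrides the soft path."""
    for predicate, risk in rules:
        if predicate(evidence):
            return risk, "hard"
    return soft_path(evidence), "soft"
```

Keeping the rules as data separate from the evidence extractors is what makes the override auditable: the returned pathway tag records whether a deterministic rule or the soft path produced the assignment.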
Information-Theoretic Measures in AI: A Practical Decision Guide
Information-theoretic (IT) measures are ubiquitous in artificial intelligence: entropy drives decision-tree splits and uncertainty quantification, cross-entropy is the default classification loss, mutual information underpins representation learning and feature selection, and transfer entropy reveals directed influence in dynamical systems. A second, less consolidated family of measures, integrated information (Phi), effective information (EI), and autonomy, has emerged for characterizing agent complexity. Despite wide adoption, measure selection is often decoupled from estimator assumptions, failure modes, and safe inferential claims. This paper provides a practical decision framework for all seven measures, organized around three prescriptive questions for each: (i) what question does the measure answer and in which AI context; (ii) which estimator is appropriate for the data type and dimensionality; and (iii) what is the most dangerous misuse. The framework is operationalized in two complementary artifacts: a measure-selection flowchart and a master decision table. We cover both AI/ML and decision-making agent application domains per measure, with standardized Bridge Boxes linking IT quantities to cognitive constructs. Three worked examples illustrate the framework on concrete practitioner scenarios spanning representation learning, temporal influence analysis, and evolved agent complexity.
comment: 25 pages, 2 tables, 1 figure. Submitted to Applied Intelligence (Springer)
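For reference, three of the measures named in the abstract above have short closed forms on discrete distributions. The sketch below uses the standard plug-in definitions on known probability tables; it is not taken from the paper, and it deliberately sidesteps the estimator questions the paper is about (finite samples, continuous data, high dimensionality).

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution p."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def cross_entropy(p, q):
    """Cross-entropy H(p, q) in bits; q must be positive wherever p is."""
    return -sum(x * math.log2(y) for x, y in zip(p, q) if x > 0)

def mutual_information(joint):
    """I(X;Y) in bits from a joint probability table given as a list of rows."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi
```

An independent joint table gives zero mutual information, while a perfectly correlated binary pair gives one bit.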
DLM: Unified Decision Language Models for Offline Multi-Agent Sequential Decision Making
Building scalable and reusable multi-agent decision policies from offline datasets remains a challenge in offline multi-agent reinforcement learning (MARL), as existing methods often rely on fixed observation formats and action spaces that limit generalization. In contrast, large language models (LLMs) offer a flexible modeling interface that can naturally accommodate heterogeneous observations and actions. Motivated by this, we propose the Decision Language Model (DLM), which formulates multi-agent decision making as a dialogue-style sequence prediction problem under the centralized training with decentralized execution paradigm. DLM is trained in two stages: a supervised fine-tuning phase, which leverages dialogue-style datasets for centralized training with inter-agent context and generates executable actions from offline trajectories, followed by a group relative policy optimization phase to enhance robustness to out-of-distribution actions through lightweight reward functions. Experiments on multiple benchmarks show that a unified DLM outperforms strong offline MARL baselines and LLM-based conversational decision-making methods, while demonstrating strong zero-shot generalization to unseen scenarios across tasks.
comment: 22 pages, 11 figures
Breaking the Secret: Economic Interventions for Combating Collusion in Embodied Multi-Agent Systems
Collusion among autonomous agents poses a critical security threat in embodied multi-agent systems (MAS), where coordinated behaviors can deviate from global objectives and lead to real-world consequences. Existing defenses, primarily based on identity control or post-hoc behavior analysis, are insufficient to address such threats in embodied settings due to delayed feedback and noisy observations in physical environments, which make behavioral deviations difficult to detect accurately and in a timely manner. To address this challenge, we propose a mutagenic incentive intervention approach that mitigates collusion by reshaping agents' payoff structures. By rewarding agents who report collusive behavior and penalizing identified participants, the mechanism induces strategic defection and renders collusion unstable. We further design supporting mechanisms, including reporting deposits, smart contract-based reward enforcement, and encrypted communication, to ensure robustness against misuse of the incentive mechanism and retaliation from penalized agents. We implement the proposed approach in both simulated and real-world embodied environments. Experimental results show that our method effectively suppresses collusion by inducing defection, while preserving system efficiency. It achieves performance comparable to the non-collusion baseline and outperforms representative reactive defenses, thereby fulfilling the desired security objectives. These results demonstrate the effectiveness of proactive incentive design as a practical paradigm for securing embodied multi-agent systems.
Benchmarking Emergent Coordination in Large-Scale LLM Populations: An Evaluation Framework on the MoltBook Archive
As multi-agent Large Language Model (LLM) systems scale, evaluating their emergent coordination dynamics becomes increasingly critical. However, current evaluation paradigms, focused on single agents or small, explicitly structured groups, fail to capture the self-organization and viral information dynamics that arise in large, decentralized populations. We introduce a systematic evaluation framework to benchmark role specialization, information diffusion, and cooperative task resolution in open agent environments. We demonstrate this framework on the MoltBook Observatory Archive, a dataset of 2.73M interactions among 90,704 autonomous agents, establishing quantitative baselines for emergent coordination. Our evaluation reveals a pronounced core-periphery structure (silhouette 0.91), heavy-tailed cascade distributions ($\alpha = 2.57$), and severe coordination overhead in decentralized task resolution (Cohen's $d = -0.88$ against a single-agent baseline). By providing standardized evaluation tasks and empirical baselines, our framework enables the rigorous comparison of future multi-agent protocols and establishes evaluation itself as an object of scientific study.
Systems and Control (EESS)
Machine Learning and Deep Learning Models for Short Term Electricity Price Forecasting in Australia's National Electricity Market
Short-term electricity price forecasting is essential in competitive power markets, yet electricity price series exhibit high volatility, irregularity, and non-stationarity. This phenomenon is pronounced in the South Australian region of the National Electricity Market, where high renewable penetration drives price volatility and frequent negative price intervals, while structural changes such as the transition to five-minute settlement further complicate forecasting. To address these challenges, this study develops a unified benchmark framework. Under identical data preprocessing, identical feature engineering (lag features, rolling statistics, cyclic temporal encodings, and related features), and an 85%/15% chronological train-test split, six algorithms are systematically compared: AWMLSTM, CatBoost, GBRT, LSTM, LightGBM, and SVR. The results show that for price prediction, tree-based models, especially GBRT with an $R^2$ of 0.88, generally outperform LSTM and SVR. However, all models achieve a mean absolute percentage error above 90%, and more than 65% of GBRT predictions have relative errors above 10%, which highlights the inherent difficulty of price forecasting. For demand prediction, all models perform substantially better than in price prediction. AWMLSTM and GBRT achieve an $R^2$ of 0.96 with mean absolute percentage error below 32%, and GBRT has 74.37% of samples within 5% error, while LSTM and SVR perform less accurately in both tasks. Future improvements should focus on hybrid models such as tree plus transformers, data augmentation for extreme events, and error correction to better capture price spikes.
comment: 28 pages, 5 figures
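The evaluation protocol named in the abstract above (chronological split, lag features, mean absolute percentage error) can be sketched in a few lines. This is a generic illustration and not the paper's code: the lag choices, the helper names, and the decision to skip zero-valued actuals in `mape` are assumptions.

```python
def chronological_split(series, train_frac=0.85):
    """Split without shuffling so the test set is strictly later in time."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]

def lag_features(series, lags=(1, 2)):
    """Build (features, target) rows where the features are past values."""
    start = max(lags)
    rows = []
    for t in range(start, len(series)):
        rows.append(([series[t - k] for k in lags], series[t]))
    return rows

def mape(actual, predicted):
    """Mean absolute percentage error in percent; zero actuals are skipped."""
    terms = [abs((a - p) / a) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(terms) / len(terms)
```

Because each term divides by the actual value, the same absolute error contributes far more to MAPE when the actual value is close to zero, so this metric is sensitive to intervals with very small prices.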
Isospectral Steering
We study the controllability of the differential Lyapunov equation under isospectral rotation of a linear gradient field. Specifically, control is effected by a symmetric time-varying gain matrix constrained to have fixed eigenvalues; that is, by exclusively modulating the eigenvectors of the state matrix and not its eigenvalues. Motivation for this problem stems from a certain type of control objectives (minimum shear/attention) aimed to reduce anisotropic deformation when ensembles are steered by a common law; optimality necessitates constancy of eigenvalues. In the paper we introduce and motivate this type of isospectral steering, and describe the reachable set of covariances for any specified terminal time and eigenvalues of the gain matrix. The theory we develop is intimately linked to multilinear algebra as well as to positive linear algebra and the Birkhoff-von Neumann theorem for doubly stochastic matrices.
comment: 16 pages, 2 figures
Cooptimizing Safety and Performance Using Safety Value-Constrained Model Predictive Control
Autonomous systems are increasingly deployed in real-world environments, where they must achieve high performance while maintaining safety under state and input constraints. Although Model Predictive Control (MPC) provides a principled framework for constrained optimal control, guaranteeing safety beyond its finite planning horizon remains a fundamental challenge. In this work, we augment MPC with a safety value function-based terminal constraint that enforces membership in a control-invariant safe set at the end of each planning horizon. This formulation enables real-time synthesis of trajectories that are both high-performing and provably safe. We show that, under an exact safety value function and a feasible initialization, the proposed MPC scheme is recursively feasible, thereby ensuring persistent safety. In contrast to traditional terminal set constructions that rely on local linearizations or conservative approximations, our approach incorporates a reachability-based safety value function for terminal constraints, yielding less conservative and more expressive safety guarantees. We validate the proposed framework through simulation and hardware experiments on a Flexiv Rizon 10s manipulator. Results demonstrate improved constraint satisfaction and robustness compared to standard state-constrained MPC and reactive safety filtering, while maintaining competitive task performance. The full implementation and experiments are available on the project website.
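The terminal-constraint idea in the abstract above can be written out as a generic finite-horizon problem. The notation is assumed and not taken from the paper: $\ell$ and $\ell_f$ are stage and terminal costs, $f$ is the discrete-time dynamics, $\mathcal{X}$ and $\mathcal{U}$ are the state and input constraint sets, and $V$ is a safety value function whose zero-superlevel set $\{x : V(x) \ge 0\}$ is taken to be the control-invariant safe set (the sign convention is also an assumption).

```latex
\begin{aligned}
\min_{u_0,\dots,u_{N-1}} \quad & \sum_{k=0}^{N-1} \ell(x_k, u_k) + \ell_f(x_N) \\
\text{s.t.} \quad & x_{k+1} = f(x_k, u_k), \quad k = 0,\dots,N-1, \\
& x_0 = x(t), \\
& x_k \in \mathcal{X}, \quad u_k \in \mathcal{U}, \quad k = 0,\dots,N-1, \\
& V(x_N) \ge 0 .
\end{aligned}
```

Under this reading, the last constraint is what connects the finite horizon to persistent safety: if the terminal state lies in a control-invariant set, a safe continuation exists beyond step $N$, which is the usual route to a recursive-feasibility argument.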
Architectural Isolation as a Timing Safety Primitive for Edge AI Medical Devices: Controlled Experimental Evidence on a Shared-Silicon Platform
A system can satisfy accuracy-based validation, maintain output stability (Safety-Threshold Exceedance Rate, STER, equal to zero), and still violate timing constraints under deployment load. These are structurally independent properties that current pre-market validation protocols often do not operationalize at the inference layer. This letter demonstrates their independence through a controlled same-hardware experiment: identical MobileNetV2 models are evaluated under identical adversarial load on two execution paths of the same NVIDIA Jetson Orin Nano Super, a dedicated GPU accelerator (TensorRT FP16, half-precision floating point) and a general-purpose CPU (ONNX Runtime FP32, single-precision floating point). Both paths maintain STER = 0, but the CPU path degrades 7.2x under combined load, reaching a mean latency 9.8x higher than that of the GPU path (which stays below 11 ms) and breaching the 10 Hz clinical cycle budget by 65%. Joint STER and latency verification is proposed as a candidate method for operationalizing U.S. FDA Draft Guidance FDA-2024-D-4488 robustness requirements at the inference layer, subject to regulatory review and clinical validation.
comment: 10 pages, 3 figures, 5 tables. Submitted to IEEE Embedded Systems Letters
On the Generalization Properties of Selective State-Space Models for Filtering Tasks for Unknown Systems
Selective State-Space Models (SSMs) such as Mamba have emerged as an alternative architecture to self-attention based transformers in sequence modeling tasks. Recent works have demonstrated the use of transformers in some filtering and output prediction tasks via in-context learning. In this paper, we analyze whether structured SSMs can work equally well for filtering of unknown systems. In particular, we train the SSM on trajectory samples from a set of systems. At run-time, the SSM is given the outputs of an unknown system from the same set and is expected to predict the next output online. Theoretically, under appropriate assumptions, we derive generalization bounds that explain why SSMs succeed in such tasks. Empirically, we demonstrate the performance via several numerical examples. We also discuss the advantages and disadvantages of SSMs versus transformers for this task.
comment: Conference on Decision and Control 2026, 8 pages, 4 figures
An Individual-Delay-Reflected Generalized Consensus Analysis for Multi-Agent Systems with Heterogeneous Time-Varying Delays
In multi-agent systems, heterogeneous time delays exist for all agents because of the difference in communication environments. Therefore, the consensus analysis of a system considering a homogeneous time-varying delay among all agents results in conservatism. In this study, an individual-delay-reflected generalized consensus is proposed for multi-agent systems with heterogeneous time-varying delays with various bounds. To reflect heterogeneous time-varying delays, the proposed Lyapunov-Krasovskii functional is constructed by dividing the integral term into intervals containing heterogeneous delays and considering augmented vectors with delay states and integral states. Furthermore, by adding zero equality conditions, conservatism is reduced. An N-dependent generalized integral inequality is used to allow the user to adjust the computational complexity. Numerical examples demonstrate a reduction in conservatism with the proposed consensus criterion.
comment: 7 pages, 3 figures. Extended version of a paper accepted to IFAC World Congress 2026
Defining the Magnetization State of LCF Magnets: From Material Properties to Motor-Level Metrics
Variable flux memory motors, which employ Low Coercive Force (LCF) magnets, achieve extended high-efficiency operation through controllable magnetization states. To address the need for a unified approach to defining and comparing the magnetization state (MS) across material and motor levels, this paper proposes four MS definitions: two based on intrinsic material properties (magnetic flux density B and magnetic polarization J) and two based on motor-level quantities (fundamental flux linkage and back-EMF components). These definitions are evaluated across the id-iq operating plane using finite element analysis on an interior PMSM with a hybrid magnet configuration (LCF and HCF: High Coercive Force) and a defined circuit setup. The results clarify the relationship between material-level behavior and measurable motor quantities. The proposed framework provides guidance for selecting appropriate MS metrics depending on the application objective, whether for material analysis, control implementation, or condition monitoring in variable flux machines.
Unified Energy Function Tailored to Inverter-Based Resources with PI Controllers for Transient Stability Analysis
The increasing penetration of inverter-based resources (IBRs) has fundamentally altered the transient stability characteristics of modern power systems. IBRs typically rely on proportional-integral (PI) controllers for synchronization and regulation, resulting in nonlinear swing equations that differ significantly from those of synchronous generators (SGs) and exhibit state-dependent damping. Consequently, although the classical energy function is often adopted in IBR analysis by analogy with SGs, it cannot be directly applied to IBRs with PI controllers. A new energy function explicitly tailored to PI controllers is proposed in this letter. It admits a unified form and can be applied to a class of nonlinear systems with PI controllers. Two representative cases are considered, including a grid-following (GFL) inverter and a DC-voltage-controlled grid-forming (GFM) inverter, demonstrating less conservative and more effective estimation of the region of attraction (ROA). All findings are verified through hardware-in-the-loop (HIL) experiments.
Real-Time Non-Contact Force Compensation for Wrist-Mounted Force/Torque Sensors in Haptic-Enabled Robotic Surgery Training IROS
Haptic feedback has been a long-missed feature in robotic-assisted surgery, one that would allow surgeons to perceive tissue properties and apply controlled forces during delicate procedures. Although commercial robotic systems have begun to integrate haptic technologies, their high costs limit accessibility for training and research purposes. To address this gap, we extend our previously developed low-cost robotic surgery training setup, RoboScope, by incorporating a wrist-mounted force/torque (F/T) sensor for haptic feedback training. Wrist-mounted sensing avoids many challenges associated with tip-mounted sensors but introduces additional non-contact forces, such as gravity, sensor bias, installation offsets, and associated torques, which compromise measurement accuracy. In this paper, we propose a robust real-time compensation method based on recursive least squares (RLS). This method eliminates the need for dataset collection and frequent recalibration while adapting to changing operating conditions. Experimental validation demonstrates that the proposed approach achieves over 95% error reduction in non-contact force compensation and more than 91% in non-contact torque compensation, significantly outperforming existing methods. These results highlight the potential of our approach for providing reliable haptic feedback in robotic surgery training and research.
comment: Submitted to 2026 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
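The recursive least squares (RLS) update named in the abstract above can be sketched for a two-parameter model. This is the textbook RLS recursion, not the authors' method: the one-axis measurement model y = bias + weight * cos(angle), the parameter values, and the initial covariance are assumptions chosen to keep the example small.

```python
import math

def rls_update(theta, P, phi, y, lam=1.0):
    """One RLS step for the model y ~ phi[0]*theta[0] + phi[1]*theta[1].

    theta: current estimate, P: 2x2 covariance, lam: forgetting factor."""
    # Gain K = P phi / (lam + phi^T P phi)
    Pphi = [P[0][0] * phi[0] + P[0][1] * phi[1],
            P[1][0] * phi[0] + P[1][1] * phi[1]]
    denom = lam + phi[0] * Pphi[0] + phi[1] * Pphi[1]
    K = [Pphi[0] / denom, Pphi[1] / denom]
    # Correct the estimate with the prediction error.
    err = y - (phi[0] * theta[0] + phi[1] * theta[1])
    theta = [theta[0] + K[0] * err, theta[1] + K[1] * err]
    # Covariance update: P <- (P - K phi^T P) / lam
    phiP = [phi[0] * P[0][0] + phi[1] * P[1][0],
            phi[0] * P[0][1] + phi[1] * P[1][1]]
    P = [[(P[i][j] - K[i] * phiP[j]) / lam for j in range(2)] for i in range(2)]
    return theta, P

def estimate_bias_and_weight(angles, readings):
    """Run RLS over a stream of (angle, force reading) pairs."""
    theta = [0.0, 0.0]
    P = [[1e6, 0.0], [0.0, 1e6]]  # large initial covariance: weak prior
    for a, y in zip(angles, readings):
        theta, P = rls_update(theta, P, [1.0, math.cos(a)], y)
    return theta
```

Once the bias and tool-weight terms are estimated, the predicted non-contact component can be subtracted from each reading online, so no separate calibration dataset is needed in this toy setting.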
An AI-Based Supervisory Measurement Integrity Validation Layer for Cyber-Resilient AC/DC Protection in Inverter-Based Microgrids
Line current differential relays (LCDRs) are measurement-driven relays that rely on time-synchronized multi-phase current waveforms to infer internal faults in AC and DC power networks. In inverter-based microgrids, however, the increasing reliance on digitally communicated measurements exposes LCDRs to false-data injection attacks (FDIAs), in which adversaries manipulate remote measurement streams to create protection-triggering yet physically inconsistent current trajectories. This paper addresses this emerging measurement integrity problem by introducing a measurement integrity validation scheme that operates as a supervisory instrumentation layer for modern LCDRs. The proposed scheme interprets short windows of synchronized instantaneous current measurements recorded during relay operation and assesses their physical consistency to distinguish genuine fault-induced trajectories from cyber-manipulated measurement streams. A recurrent neural network is trained offline using only relay-available current measurements and exploits the temporal structure of differential current waveforms, which remains informative in inverter-dominated systems where current magnitude is no longer a reliable observable. The method requires no additional sensors, auxiliary protection elements, or prior knowledge of network topology, and is applicable to both AC and DC LCDRs without structural modification. The proposed measurement validation scheme is evaluated on an islanded inverter-based microgrid under a comprehensive set of fault and FDIA scenarios, demonstrating high detection accuracy while preserving relay dependability. Hardware-in-the-loop validation using an OPAL-RT real-time simulator confirms that the scheme satisfies protection timing constraints and can operate in real time under realistic operating conditions.
Dynamic-Key Post-Quantum Encrypted Control Against System Identification Attacks
This study proposes post-quantum encrypted control systems based on dynamic-key Learning with Errors (LWE) encryption schemes. The proposed method develops update maps that simultaneously update the private key and ciphertexts within the LWE framework, enabling dynamic-key encrypted control resistant to system identification attacks. The growth of errors induced by homomorphic operations is analyzed, and sufficient parameter conditions guaranteeing correct decryption at each control step are clarified. Furthermore, a design procedure for the encrypted control systems is presented based on security metrics such as sample-identifying complexity and deciphering time. A numerical example demonstrates that the proposed control systems achieve secure control against the considered system identification attack.
comment: This work has been submitted to the IEEE for possible publication
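As background for the abstract above, the following toy sketch shows why homomorphic operations on LWE ciphertexts make the error grow, and why decryption stays correct only while the accumulated error is below half the scaling factor. It is a textbook-style scalar LWE scheme with insecure toy parameters, not the dynamic-key scheme or the update maps of the paper; all function names and parameter values are hypothetical.

```python
import random


def keygen(n, q, rng):
    """Secret key: n uniform elements of Z_q."""
    return [rng.randrange(q) for _ in range(n)]


def encrypt(m, s, q, t, err_bound, rng):
    """Encrypt m in Z_t as (a, b) with b = <a, s> + delta * m + e (mod q)."""
    delta = q // t
    a = [rng.randrange(q) for _ in s]
    e = rng.randint(-err_bound, err_bound)  # small bounded error
    b = (sum(ai * si for ai, si in zip(a, s)) + delta * (m % t) + e) % q
    return a, b


def decrypt(ct, s, q, t):
    """Recover m; correct as long as the total error is below delta / 2."""
    a, b = ct
    delta = q // t
    noisy = (b - sum(ai * si for ai, si in zip(a, s))) % q
    return ((noisy + delta // 2) // delta) % t  # round to nearest multiple of delta


def add(ct1, ct2, q):
    """Homomorphic addition: the errors of the two ciphertexts add up."""
    (a1, b1), (a2, b2) = ct1, ct2
    return [(x + y) % q for x, y in zip(a1, a2)], (b1 + b2) % q


def scale(ct, k, q):
    """Multiplication by a plaintext integer k: the error is multiplied by k."""
    a, b = ct
    return [(k * x) % q for x in a], (k * b) % q
```

With q = 2**16 and t = 16, the scaling factor is delta = 4096, so the sum of two fresh ciphertexts with errors bounded by 4 decrypts correctly, while repeated scaling by large gains eventually pushes the error past delta / 2.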
Physics-Aware LLM-Based Probabilistic Wind Power Scenario Generation under Extreme Icing Conditions
Accurately characterizing wind power uncertainty under icing and post-disaster conditions remains a critical challenge for resilient power system operation. To address this issue, this paper proposes a physics-aware large language model (LLM) framework for probabilistic wind power scenario generation under extreme icing conditions. The proposed framework integrates supervisory control and data acquisition (SCADA)-based physical modeling, multimodal tokenization, and a causal Transformer architecture trained in an autoregressive manner. A physics-aware decoding scheme effectively enforces rated power limits and ramping constraints on the generated trajectories while preserving stochastic diversity. Case studies using real wind turbine data show that the proposed method reproduces icing-induced power degradation and temporal variability observed during extreme weather. The resulting scenarios are physically consistent and high-fidelity, thereby significantly enhancing resilience assessment and recovery planning in renewable-integrated power systems.
comment: 5 pages, 3 figures, 3 tables. Accepted by the 2026 IEEE Power & Energy Society General Meeting
Resource-Constrained Shortest Path with Polytopic Reset Sets
This paper investigates the problem of computing the shortest path between two states under resource constraints in environments with resource-replenishment regions. Namely, the length of the path is limited by a budget that can be restored within polytopic replenishment regions. We show that the optimal path in this problem exhibits a distinct geometric structure: it consists of straight-line segments, changes direction at replenishment regions, and visits regions at most once. We propose an approach to solve the continuous problem in two steps: using a graph-based approach, followed by convex programming. First, we define a graph whose nodes are possible waypoints of feasible paths, and the edges are the Euclidean distances between these nodes. To obtain a discrete set of nodes that ensure a feasible and near-optimal solution, we utilize a wavefront algorithm. With a sufficiently small spacing between wavefronts, the solution of the shortest path problem on this graph yields the optimal sequence of polytopes to visit. Next, we use convex optimization on this sequence of polytopes to find the exact optimal path. A numerical experiment is presented to demonstrate the effectiveness of the approach. This approach provides a framework for solving the resource-constrained shortest path with budget reset.
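The graph step described above can be sketched as follows, under simplifying assumptions that are not taken from the paper: the waypoints inside replenishment regions are given as a fixed list of sample points (the paper selects them with a wavefront algorithm), the space is obstacle-free, and the budget is fully restored at every waypoint. A straight leg is then feasible exactly when its Euclidean length does not exceed the budget, and Dijkstra's algorithm finds the shortest feasible sequence of legs. The function name and arguments are hypothetical.

```python
import heapq
import math


def budgeted_shortest_path(start, goal, reset_points, budget):
    """Shortest start-to-goal path made of straight legs of length <= budget.

    Direction changes are allowed only at reset_points, where the budget
    is assumed to be fully restored. Returns (length, waypoints).
    """
    nodes = [start, goal] + list(reset_points)  # index 0 = start, 1 = goal
    dist = {0: 0.0}
    prev = {}
    heap = [(0.0, 0)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        if u == 1:
            break  # goal settled
        for v in range(len(nodes)):
            if v == u:
                continue
            leg = math.dist(nodes[u], nodes[v])
            if leg > budget:
                continue  # this leg would exhaust the budget before a reset
            nd = d + leg
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if 1 not in dist:
        return math.inf, []
    path = [1]
    while path[-1] != 0:
        path.append(prev[path[-1]])
    return dist[1], [nodes[i] for i in reversed(path)]
```

In the two-step scheme of the abstract, the output of such a graph search would only fix the sequence of regions to visit; the exact waypoint positions inside those regions are then refined by convex optimization, which this sketch does not attempt.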
Bellman Residual Minimization for Control: Geometry, Stationarity, and Convergence
Markov decision problems are most commonly solved via dynamic programming. Another approach is Bellman residual minimization, which directly minimizes the squared Bellman residual objective function. However, compared to dynamic programming, this approach has received relatively less attention, mainly because it is often less efficient in practice and can be more difficult to extend to model-free settings such as reinforcement learning. Nonetheless, Bellman residual minimization has several advantages that make it worth investigating, such as more stable convergence with function approximation for value functions. While Bellman residual methods for policy evaluation have been widely studied, methods for policy optimization (control tasks) have been scarcely explored. In this paper, we establish foundational results for the control Bellman residual minimization for policy optimization.
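For orientation, the control objective mentioned above can be written in its standard textbook form. The notation here (reward r, transition kernel P, discount factor γ, and an unweighted sum over state-action pairs) is an assumption; the paper may use a different norm or weighting.

```latex
\[
(TQ)(s,a) = r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, \max_{a'} Q(s',a'),
\qquad
f(Q) = \frac{1}{2} \sum_{s,a} \bigl( (TQ)(s,a) - Q(s,a) \bigr)^{2}.
\]
```

In policy evaluation the maximum over a' is replaced by an expectation under a fixed policy, which makes TQ affine in Q and f a convex quadratic. In the control case the maximum makes f nonsmooth and, in general, nonconvex, which is one reason the control setting is harder to analyze.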
An accelerated proximal bundle method for convex optimization
The proximal bundle method (PBM) is a powerful and widely used approach for minimizing nonsmooth convex functions. However, for smooth objectives, its best-known convergence rate remains suboptimal, and whether PBM can be accelerated remains open. In this work, we present the first accelerated proximal bundle method that achieves the optimal $\mathscr{O}(1/\sqrt{ε})$ iteration complexity for obtaining an $ε$-accurate solution in smooth convex optimization. The proposed method is conceptually simple: it differs from Nesterov's accelerated gradient descent by only a single line and retains all key structural properties of the classical PBM. In particular, it relies on the same minimal assumptions on model approximations and preserves the standard bundle testing criterion. Numerical experiments confirm the accelerated $\mathscr{O}(1/\sqrt{ε})$ convergence rate predicted by our theory.
comment: 25 pages, 1 figure
A Dynamic Mode Decomposition Approach to Parameter Identification
This paper presents a data-driven algorithm for simultaneous system identification and parameter estimation in control-affine nonlinear systems. Parameter estimation is achieved by training a data-driven predictive model using state-action measurements collected at various known values of the parameters of interest. The predictive model is then used in conjunction with state-action data corresponding to unknown values of the parameters to estimate those unknown values. Numerical experiments on the controlled Duffing oscillator with unknown damping, stiffness, and nonlinearity coefficients demonstrate accurate recovery of both the system trajectories and the unknown parameter values from data collected under open-loop excitation.
comment: 7 pages, 4 figures. Submitted to the 2026 IEEE Conference on Decision and Control
Energy-Aware Routing to Large Reasoning Models
Large reasoning models (LRMs) have heterogeneous inference energy costs based on which model is used and how much it reasons. To reduce energy, it is important to choose the right LRM and operate it in the right way. As a result, the performance of systems that dispatch tasks to different individual LRMs depends on the balance between mean energy provisioning and stochastic fluctuations. The critical regime is the unique operating point at which neither auxiliary energy nor baseline energy is systematically wasted. Increasing baseline supply shifts the system toward persistent over-supply and baseline-energy waste, while reducing supply induces persistent reliance on auxiliary energy. Yet in this regime, performance remains volatility-limited, so we develop a second-order characterization that provides further insight. Here, performance is governed by how variability is absorbed across time, models, and execution choices. This perspective highlights variance-aware routing and dispatch as a principled design axis, and provides a theoretical basis for developing energy-aware model routing policies. Routing behavior is characterized when dispatch policies are based on training-compute and inference-compute scaling laws for LRMs.
Compiling OpenSCENARIO 2.1 for Scenario-Based Testing in CARLA
While the ASAM OpenSCENARIO 2.1 Domain-Specific Language (DSL) enables declarative, intent-driven authoring for Scenario-Based Testing (SBT), its integration into open-source simulators like CARLA remains limited by legacy parsers. We propose a multi-pass modern compiler architecture that translates the OpenSCENARIO 2.1 DSL directly into executable CARLA behaviors. The pipeline features an ANTLR4 frontend for Abstract Syntax Tree (AST) generation, a semantic middle-end, and a runtime backend that synthesizes deterministic py_trees behavior trees. Mapping the standardized domain ontology directly to CARLA's procedural API via a custom method registry eliminates the need for external logic solvers. A demonstrative multi-actor cut-in and evasive maneuver, selected from a wider suite of validated scenarios, confirms the compiler's ability to process concurrent actions, dynamic mathematical expressions, and asynchronous signaling. This framework establishes a functional baseline for reproducible, large-scale SBT, paving the way for future C++ optimizations to mitigate current Python-based computational overhead.
Bilinear Input Modulation for Mamba: Koopman Bilinear Forms for Memory Retention and Multiplicative Computation
Selective State Space Models (SSMs), notably Mamba, employ diagonal state transitions that limit both memory retention and bilinear computational capacity. We propose a factorized bilinear input modulation that augments the SSM with a state-input product, interpretable as a finite-dimensional Koopman bilinear form. After introducing a shared state across channels (Coupled SSM), the modulation admits three implementations. Coupled Bilinear Input Modulation (seq-BIM) retains the full bilinear product on the input side at the cost of sequential computation, Coupled Gated Modulation (GM) linearizes it into a gate modulation that is compatible with the parallel scan, and Parallel Bilinear Input Modulation (p-BIM) places the same bilinear product on the state transition while remaining parallel-scannable. Experiments on a multiple input-delay pendulum (memory retention) and NARMA-10 (bilinear computation) reveal a clear dissociation. GM substantially improves memory retention but not bilinear computation, while both seq-BIM and p-BIM improve both. A pathway ablation confirms that the two downstream routes of the bilinear signal serve complementary roles. The improvement is statistically robust, with the bilinear variants consistently outperforming the other variants on bilinear computation. Furthermore, only the bilinear variants benefit from increasing the SSM state dimension, while coupling or gate modulation alone show no improvement, establishing the bilinear mechanism as uniquely capable of exploiting larger state spaces.
comment: 7 pages, 5 figures, extended version of the article submitted to IEEE Control Systems Letters (L-CSS)
Safety-aware Goal-oriented Semantic Sensing, Communication, and Control for Robotics
Wirelessly-connected robotic systems empower robots with real-time intelligence by leveraging remote computing resources for decision-making. However, the data exchange between robots and edge servers often overwhelms communication links, introducing latency that degrades task performance. To tackle this, goal-oriented semantic communication (GSC) has been introduced for wirelessly-connected robotic systems to extract and transmit only goal-relevant semantic representations. While this improves task effectiveness, it generally overlooks practical safety requirements. Meanwhile, existing robotics research often treats safety primarily as a control-level problem, without systematically considering safety across sensing, communication, and control in a closed-loop manner. To bridge this gap, we investigate how to enable safety-aware goal-oriented semantic (SA-GS) sensing, communication, and control co-design in wirelessly-connected robotic systems, aiming to maximize the robotic task effectiveness subject to practical safety requirements. We first introduce an architecture for wirelessly-connected robotic systems and representative use cases. We then summarize general safety requirements and effectiveness metrics across the use cases. Next, we systematically analyze the unique safety and effectiveness challenges in sensing, communication, and control. Based on these, we further present potential SA-GS research directions. Finally, an Unmanned Aerial Vehicle (UAV) target tracking case study validates that one of the presented SA-GS research directions, i.e., semantic-based C&C packet execution, could significantly improve safety rate and tracking success rate by more than 2 times and 4.5 times, respectively.
comment: 7 pages. This paper has been submitted to the IEEE Wireless Communications Magazine
DustNet: A Wireless Network of Ultrasonic Neural Implants
Spatially distributed peripheral nerve recordings can be used to reconstruct motor intention and improve natural control of prosthetics. However, many existing clinical solutions rely on percutaneous wires to access peripheral nerves; these sites are prone to infection and motion-induced electrode degradation, preventing chronic use. To address the need for fully wireless neural recording systems, this paper presents DustNet: a spatially-distributed network of ultrasonically-powered neural recording implants capable of supporting up to 8 simultaneously recording nodes over a single ultrasound link. To enable high throughput multi-implant communication, DustNet implements a time-division multiple-access (TDMA) protocol with up to 16-level amplitude modulation of the ultrasound backscatter that achieves up to 4x higher data rates than traditional on-off keying methods. Each neural implant consists of a 0.7x0.7x0.7 mm$^3$ piezoceramic transducer, a 100 nF off-chip capacitor, and an IC mounted on a flexible PCB. The implant IC was fabricated in a 28nm CMOS process and occupies an area of 0.43 mm$^2$. System functionality was verified at 90mm depth in oil, achieving a maximum measured data rate of 200 kb/s at 2 MHz ultrasound carrier frequency, with each implant transmitting uplink data at 50 kb/s and dissipating just 7 $μ$W; the system is demonstrated to support up to 400 kb/s total data rate over the same link.
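The rate figures quoted above can be checked with a short back-of-the-envelope calculation. This is only arithmetic, not a model of the DustNet link: it assumes that 16-level amplitude modulation and on-off keying run at the same symbol rate, and that the total TDMA rate is the sum of the per-node rates. The helper names are hypothetical.

```python
import math


def bits_per_symbol(levels):
    """Bits carried by one symbol of an amplitude modulation with `levels` levels."""
    return math.log2(levels)


def tdma_aggregate_rate(per_node_rate_kbps, nodes):
    """Total uplink rate when each node averages per_node_rate_kbps on a shared link."""
    return per_node_rate_kbps * nodes


# 16 levels carry 4 bits per symbol; on-off keying (2 levels) carries 1 bit.
gain_over_ook = bits_per_symbol(16) / bits_per_symbol(2)

# If all 8 nodes each averaged 50 kb/s, the link would carry 400 kb/s.
total_kbps = tdma_aggregate_rate(50, 8)
```

Under these assumptions the 4x gain over on-off keying follows directly from log2(16) = 4, and eight nodes at 50 kb/s each would be consistent with the 400 kb/s total quoted in the abstract.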
HardFlow: Hard-Constrained Sampling for Flow-Matching Models via Trajectory Optimization
Diffusion and flow-matching have emerged as powerful methodologies for generative modeling, with remarkable success in capturing complex data distributions and enabling flexible guidance at inference time. Many downstream applications, however, demand enforcing hard constraints on generated samples (for example, robot trajectories must avoid obstacles), a requirement that goes beyond simple guidance. Prevailing projection-based approaches constrain the entire sampling path to the constraint manifold, which is overly restrictive and degrades sample quality. In this paper, we introduce a novel framework that reformulates hard-constrained sampling as a trajectory optimization problem. Our key insight is to leverage numerical optimal control to steer the sampling trajectory so that constraints are satisfied precisely at the terminal time. By exploiting the underlying structure of flow-matching models and adopting techniques from model predictive control, we transform this otherwise complex constrained optimization problem into a tractable surrogate that can be solved efficiently and effectively. Furthermore, this trajectory optimization perspective offers significant flexibility beyond mere constraint satisfaction, allowing for the inclusion of integral costs to minimize distribution shift and terminal objectives to further enhance sample quality, all within a unified framework. We provide a control-theoretic analysis of our method, establishing bounds on the approximation error between our tractable surrogate and the ideal formulation. Extensive experiments across diverse domains, including robotics (planning), partial differential equations (boundary control), and vision (text-guided image editing), demonstrate that our algorithm, which we name $\textit{HardFlow}$, substantially outperforms existing methods in both constraint satisfaction and sample quality.
Robotics
Otherness as a Quality in Designing Expressive Robotic Touch
Haptic technologies have advanced rapidly, yet exploration of robotic touch remains dominated by replicating realistic environmental cues or hand gestures, which narrows the design space and risks social resistance. This paper argues for alternatives: grounded in the notion of "otherness" from human-robot interaction (HRI), we propose treating robotic touch's inherent otherness as a design quality. Instead of being a limitation in pursuing realism, otherness can be embraced to elicit ambiguity and provoke alternative interpretations, fostering expressive and evocative robotic touch design. To develop this perspective, we analyze inspirational art and design precedents and four design research cases through a reflective Research through Design (RtD) approach. Through this analysis, we articulate a set of design languages structured around why otherness matters for touch meaning-making, how it can be shaped through design strategies, and where it can be embedded within robotic touch systems. We conclude by reflecting on the tensions and risks involved in designing robotic touch with otherness in mind.
comment: Full paper accepted to 2026 ACM Designing Interactive Systems Conference (DIS '26)
Keypoint-based Dynamic Object 6-DoF Pose Tracking via Event Camera ICRA 2026
Accurate 6-DoF pose estimation of objects is critical for robots to perform precise manipulation tasks. However, for dynamic object pose estimation, conventional camera-based approaches face several major challenges, such as motion blur, sensor noise, and low-light limitation. To address these issues, we employ event cameras, whose high dynamic range and low latency offer a promising solution. Furthermore, we propose a keypoint-based detection and tracking approach for dynamic object pose estimation. Firstly, a keypoint detection network is constructed to extract keypoints from the time surface generated by the event stream. Subsequently, the polarity and spatial coordinates of the events are leveraged, and the event density in the vicinity of each keypoint is utilized to achieve continuous keypoint tracking. Finally, a hash mapping is established between the 2D keypoints and the 3D model keypoints, and the EPnP algorithm is employed to estimate the 6-DoF pose. Experimental results demonstrate that, whether in simulated or real event environments, the proposed method outperforms the event-based state-of-the-art methods in terms of both accuracy and robustness.
comment: Accepted to 2026 IEEE International Conference on Robotics and Automation (ICRA 2026)
Learning from Demonstration with Failure Awareness for Safe Robot Navigation
Learning from demonstration is widely used for robot navigation, yet it suffers from a fundamental limitation: demonstrations consist predominantly of successful behaviors and provide limited coverage of unsafe states. This limitation leads to poor safety when the robot encounters scenarios beyond the demonstration distribution. Failure experiences, such as collisions, contain essential information about unsafe regions, but remain underutilized. The key difficulty lies in the fact that failure data do not provide valid guidance for action imitation, and their naive incorporation into policy learning often degrades performance. We address this challenge by proposing a failure-aware learning framework that explicitly decouples the roles of success and failure data. In this framework, failure experiences are used to shape value estimation in hazardous regions, while policy learning is restricted to successful demonstrations. This separation enables the effective use of failure data without corrupting policy behavior. We implement this design within an offline reinforcement learning (RL) setting and evaluate it in both simulation and real-world environments. The results show that our framework consistently reduces collision rates while preserving the task success rate, and demonstrate strong generalization across different environments and robot platforms.
An Efficient Beam Search Algorithm for Active Perception in Mobile Robotics
Active perception is a fundamental problem in autonomous robotics in which the robot must decide where to move and what to sense in order to obtain the most informative observations for accomplishing its mission. Existing approaches either solve a computationally expensive traveling salesman problem over heuristically selected informative nodes, or adopt a more efficient but overly constrained shortest path tree formulation. To address these limitations, we explore beam search algorithms as scalable alternatives. While the standard beam search provides scalability by preserving the top-B paths at each depth level, it is prone to local optima and exhibits parameter sensitivity. Our first contribution is a node-wise beam search (NBS) algorithm, which maintains top-B candidates per node to enable more effective exploration of the solution space. Systematic benchmarking on graphs shows that NBS consistently outperforms other baselines and maintains strong performance even at low beam widths. As a second contribution, we integrate the concept of frontiers into the path selection criterion, introducing the expected gain metric, which better balances exploration and exploitation compared to existing alternatives. Our third contribution proposes the rapidly-exploring random annulus graph (RRAG), a novel graph construction method that preserves full orientation sampling and ensures connectivity in cluttered environments through a fallback local sampling-based planner. Extensive experiments demonstrate that NBS combined with RRAG achieves the highest performance across all three representative active perception tasks, outperforming state-of-the-art algorithms by at least 20% in one or more tasks. We further validate the approach on real robotic platforms in different scenarios.
comment: Accepted to The International Journal of Robotics Research (IJRR). Project page: https://efficient-beam-search.github.io/
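The difference between standard beam search (top-B paths per depth level) and the node-wise variant (top-B candidates per node) described in the abstract above can be illustrated with a minimal sketch. It is not the authors' NBS implementation: the graph encoding, the score (sum of rewards of distinct visited nodes), and the fixed depth limit are hypothetical simplifications.

```python
from collections import defaultdict


def beam_search(graph, reward, start, depth, beam_width, per_node):
    """Beam search over paths in `graph` (dict: node -> list of neighbours).

    per_node=False keeps the best `beam_width` paths per depth level.
    per_node=True keeps the best `beam_width` paths per end node.
    Returns (best_path, best_score).
    """
    def score(path):
        # Hypothetical gain: each distinct node is rewarded once.
        return sum(reward[n] for n in set(path))

    beam = [[start]]
    best = [start]
    for _ in range(depth):
        candidates = [p + [nxt] for p in beam for nxt in graph[p[-1]]]
        if not candidates:
            break
        if per_node:
            buckets = defaultdict(list)
            for p in candidates:
                buckets[p[-1]].append(p)  # group candidates by end node
            beam = []
            for paths in buckets.values():
                paths.sort(key=score, reverse=True)
                beam.extend(paths[:beam_width])
        else:
            candidates.sort(key=score, reverse=True)
            beam = candidates[:beam_width]
        top = max(beam, key=score)
        if score(top) > score(best):
            best = top
    return best, score(best)
```

On a small graph where a low-reward node leads to a high-reward one, the level-wise variant with beam width 1 discards the promising branch after the first step, while the per-node variant keeps one candidate for every reached node and can still find it.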
Modular Sensory Stream for Integrating Physical Feedback in Vision-Language-Action Models
Humans understand and interact with the real world by relying on diverse physical feedback beyond visual perception. Motivated by this, recent approaches attempt to incorporate physical sensory signals into Vision-Language-Action models (VLAs). However, they typically focus on a single type of physical signal, failing to capture the heterogeneous and complementary nature of real-world interactions. In this paper, we propose MoSS, a modular sensory stream framework that adapts VLAs to leverage multiple sensory signals for action prediction. Specifically, we introduce decoupled modality streams that integrate heterogeneous physical signals into the action stream via joint cross-modal self-attention. To enable stable incorporation of new modalities, we adopt a two-stage training scheme that freezes pretrained VLA parameters in the early stage. Furthermore, to better capture contact interaction dynamics, we incorporate an auxiliary task that predicts future physical signals. Through extensive real-world experiments, we demonstrate that MoSS successfully augments VLAs to leverage diverse physical signals (i.e., tactile and torque), integrating multiple signals to achieve synergistic performance gains.
comment: 14 pages, 8 figures, Project page: https://jiminlx.github.io/MoSS
BridgeACT: Bridging Human Demonstrations to Robot Actions via Unified Tool-Target Affordances
Learning robot manipulation from human videos is appealing due to the scale and diversity of human demonstrations, but transferring such demonstrations to executable robot behavior remains challenging. Prior work either relies on robot data for downstream adaptation or learns affordance representations that remain at the perception level and do not directly support real-world execution. We present BridgeACT, an affordance-driven framework that learns robotic manipulation directly from human videos without requiring any robot demonstration data. Our key idea is to model affordance as an embodiment-agnostic intermediate representation that bridges human demonstrations and robot actions. BridgeACT decomposes manipulation into two complementary problems: where to grasp and how to move. To this end, BridgeACT first grounds task-relevant affordance regions in the current scene, and then predicts task-conditioned 3D motion affordances from human demonstrations. The resulting affordances are mapped to robot actions through a grasping module and a lightweight closed-loop motion controller, enabling direct deployment on real robots. In addition, we represent complex manipulation tasks as compositions of affordance operations, which allows a unified treatment of diverse tasks and object-to-object interactions. Experiments on real-world manipulation tasks show that BridgeACT outperforms prior baselines and generalizes to unseen objects, scenes, and viewpoints.
Cooperative Informative Sensing for Monitoring Dynamic Indoor Environments via Multi-Agent Reinforcement Learning
Monitoring human activity in indoor environments is important for applications such as facility management, safety assessment, and space utilization analysis. While mobile robot teams offer the potential to actively improve observation quality, existing multi-robot monitoring and active perception approaches typically rely on coverage- or visitation-based objectives that are weakly aligned with the accuracy requirements of human-centric monitoring tasks. In this work, we formulate cooperative active observation as a decentralized control problem in which multiple robots adjust their motion to directly optimize monitoring accuracy under partial observability. We propose a framework for learning cooperative policies from decentralized observations using multi-agent reinforcement learning (MARL), supported by an architecture that handles variable numbers of humans and temporal dependencies. Simulation results across diverse indoor environments and monitoring tasks show that the proposed approach consistently outperforms classical coverage, persistent monitoring, and learning-free multi-robot baselines, while remaining robust to changes in the number of observed humans.
comment: 8 pages, 10 figures, 2 tables
Breaking Lock-In: Preserving Steerability under Low-Data VLA Post-Training
Have you ever post-trained a generalist vision-language-action (VLA) policy on a small demonstration dataset, only to find that it stops responding to new instructions and is limited to behaviors observed during post-training? We identify this phenomenon as lock-in: after low-data, supervised fine-tuning (SFT), the policy becomes overly specialized to the post-training data and fails to generalize to novel instructions, manifesting as concept lock-in (fixation on training objects/attributes) and spatial lock-in (fixation on training spatial targets). Many existing remedies introduce additional supervision signals, such as those derived from foundation models or auxiliary objectives, or rely on augmented datasets to recover generalization. In this paper, we show that the policy's internal pre-trained knowledge is sufficient: DeLock mitigates lock-in by preserving visual grounding during post-training and applying test-time contrastive prompt guidance to steer the policy's denoising dynamics according to novel instructions. Across eight simulation and real-world evaluations, DeLock consistently outperforms strong baselines and matches or exceeds the performance of a state-of-the-art generalist policy post-trained with substantially more curated demonstrations.
Risk-Aware Rulebooks for Multi-Objective Trajectory Evaluation under Uncertainty
We present a risk-aware formalism for evaluating system trajectories in the presence of uncertain interactions between the system and its environment. The proposed formalism supports reasoning under uncertainty and systematically handles complex relationships among requirements and objectives, including hierarchical priorities and non-comparability. Rather than treating the environment as exogenous noise, we explicitly model how each system trajectory influences the environment and evaluate trajectories under the resulting distribution of environment responses. We prove that the formalism induces a preorder on the set of system trajectories, ensuring consistency and preventing cyclic preferences. Finally, we illustrate the approach with an autonomous driving example that demonstrates how the formalism enhances explainability by clarifying the rationale behind trajectory selection.
DART: Learning-Enhanced Model Predictive Control for Dual-Arm Non-Prehensile Manipulation
What appears effortless to a human waiter remains a major challenge for robots. Non-prehensile manipulation of objects on a tray is inherently difficult, and the complexity is amplified in dual-arm settings. Such tasks are highly relevant to service robotics in domains such as hotels and hospitality, where robots must transport and reposition diverse objects with precision. We present DART, a novel dual-arm framework that integrates nonlinear Model Predictive Control (MPC) with an optimization-based impedance controller to achieve accurate object motion relative to a dynamically controlled tray. The framework systematically evaluates three complementary strategies for modeling tray-object dynamics as the state transition function within our MPC formulation: (i) a physics-based analytical model, (ii) an online regression-based identification model that adapts in real time, and (iii) a reinforcement learning-based dynamics model that generalizes across object properties. Our pipeline is validated in simulation with objects of varying mass, geometry, and friction coefficients. Extensive evaluations highlight the trade-offs among the three modeling strategies in terms of settling time, steady-state error, control effort, and generalization across objects. To the best of our knowledge, DART constitutes the first framework for non-prehensile dual-arm manipulation of objects on a tray. Project Link: https://dart-icra.github.io/dart/
Multiagent Systems
Architecture Matters for Multi-Agent Security
Multi-agent systems (MAS), composed of networks of two or more autonomous AI agents, have become increasingly popular in production deployments, yet introduce security risks that do not arise in single-agent settings. Even if individual agents exhibit robust security, architectural decisions governing their coordination can create attack surfaces that have not been systematically characterized. In this work, we present an empirical study of how MAS design decisions shape the tradeoff between task performance and attack resistance. Across three agentic environments (browser, desktop, and code) and 13 architectural configurations, we use stagewise evaluations that distinguish planning refusal, execution-stage interception, partial harmful execution, and successful attack completion to study three key design choices: (i) agent roles, which determine how authority and responsibility are allocated; (ii) communication topology, which shapes how and when agents interact; and (iii) memory, which determines the context and state visibility accessible to each agent. We find that multi-agent architectures are more vulnerable than standalone agents in the majority of configurations, with attack success rates varying by up to 3.8x at comparable or higher benign accuracy, and that no single design is universally safer. These results motivate the development of further evaluations that move beyond the security properties of a single agent.
GSAR: Typed Grounding for Hallucination Detection and Recovery in Multi-Agent LLMs
Autonomous multi-agent LLM systems are increasingly deployed to investigate operational incidents and produce structured diagnostic reports. Their trustworthiness hinges on whether each claim is grounded in observed evidence rather than model-internal inference. Existing groundedness evaluators (binary classifiers, LLM-as-judge scalars, self-correction loops) treat supporting evidence as interchangeable and emit a single signal that offers no principled control over downstream action. We present GSAR, a grounding-evaluation and replanning framework that (i) partitions claims into a four-way typology (grounded, ungrounded, contradicted, complementary), giving first-class standing to non-redundant alternative perspectives; (ii) assigns evidence-type-specific weights reflecting epistemic strength; (iii) computes an asymmetric contradiction-penalised weighted groundedness score; and (iv) couples that score to a three-tier decision function (proceed, regenerate, replan) driving a bounded-iteration outer loop under an explicit compute budget. We formalise the algorithm, prove six structural properties, and evaluate five design claims on FEVER with gold Wikipedia evidence under four independently-trained LLM judges (gpt-5.4, claude-sonnet-4-6, claude-opus-4-7, gemini-2.5-pro). Every ablation reproduces in the same direction on every judge: bootstrap 95% CIs on the rho=0 effect exclude 0 on all four; the no-complementary ablation under Opus 4.7 has CI [-96,-68] of 200; at n=1000 three independent judges converge to DeltaS(rho=0)=+0.058. A head-to-head against Vectara HHEM-2.1-Open is included. To our knowledge, GSAR is the first published groundedness framework coupling evidence-typed scoring with tiered recovery under an explicit compute budget.
Proteus: Shapeshifting Desktop Visualizations for Mobile via Multi-level Intelligent Adaptation
With the rise of mobile-first consumption, users increasingly engage with data visualizations on mobile devices. However, the vast majority of existing visualizations are originally authored for desktop environments. Due to significant differences in viewport size and interaction paradigms, directly scaling desktop charts often results in illegible text, information loss, and interaction failures. To bridge this gap, we propose an automated framework to adapt desktop-based visualizations for mobile screens. By systematically categorizing the operations involved in the adaptation process, we establish a multi-level design space. This space defines evolution rules spanning from the global topology level, through the reference frame level, down to the visual elements level. Guided by this theoretical framework, we developed Proteus, a large language model-driven multi-agent system that automatically parses online visualizations, predicts optimal transformation strategies within the design space, and generates equivalent, highly readable visualizations for mobile devices. Case studies and an in-depth user study with 12 participants demonstrate the effectiveness and usability of Proteus.
comment: accepted by ACM Designing Interactive Systems Conference
Cooperative Informative Sensing for Monitoring Dynamic Indoor Environments via Multi-Agent Reinforcement Learning
Monitoring human activity in indoor environments is important for applications such as facility management, safety assessment, and space utilization analysis. While mobile robot teams offer the potential to actively improve observation quality, existing multi-robot monitoring and active perception approaches typically rely on coverage or visitation based objectives that are weakly aligned with the accuracy requirements of human-centric monitoring tasks. In this work, we formulate cooperative active observation as a decentralized control problem in which multiple robots adjust their motion to directly optimize monitoring accuracy under partial observability. We propose a learning-based framework for cooperative policies from decentralized observations using multi-agent reinforcement learning (MARL), supported by an architecture that handles variable numbers of humans and temporal dependencies. Simulation results across diverse indoor environments and monitoring tasks show that the proposed approach consistently outperforms classical coverage, persistent monitoring, and learning-free multi-robot baselines, while remaining robust to changes in the number of observed humans.
comment: 8 pages, 10 figures, 2 tables
MindTrellis: Co-Creating Knowledge Structures with AI through Interactive Visual Exploration
Knowledge workers face increasing challenges in synthesizing information from multiple documents into structured conceptual understanding. This process is inherently iterative: users explore content, identify relationships between concepts, and continuously reorganize their mental models. However, current approaches offer limited support. LLM-based systems let users query information but not shape how knowledge is organized; manual tools like mind maps support structure creation but lack intelligent assistance. This leaves an open opportunity: supporting collaborative construction where users and AI jointly develop an evolving knowledge representation. We present MindTrellis, an interactive visual system where users and AI collaboratively build a dynamic knowledge graph. Users can query the graph to retrieve document-grounded information, and contribute by introducing new concepts, modifying relationships, and reorganizing the hierarchy to reflect their developing understanding. In a user study where 12 participants created slide decks, MindTrellis outperformed retrieval-only baselines in knowledge organization and cognitive load, as measured by expert ratings of content coverage and structural quality.
comment: 21 pages, 7 figures, ACM Designing Interactive Systems. DIS 2026
No Test Cases, No Problem: Distillation-Driven Code Generation for Scientific Workflows
Existing multi-agent Large Language Model (LLM) frameworks for code generation typically use execution feedback and improve iteratively using Input/Output (I/O) test cases. However, this does not work for scientific workflows, where I/O test cases do not exist, and generating them requires solving the very problem at hand. To address this, we introduce MOSAIC, a training-free multi-agent framework for scientific code generation without I/O supervision. Instead of execution feedback, MOSAIC employs a student-teacher knowledge distillation framework that grounds generation through domain-specific examples and structured problem decomposition. To further mitigate hallucinations across chained subproblems, we introduce a Consolidated Context Window (CCW) for maintaining consistent reasoning across agents. Experiments on the SciCode benchmark show that MOSAIC improves accuracy, executability, and numerical precision over existing approaches while relying on lightweight models.
Usable Agent Discovery for Decentralized AI Systems
Large-scale agentic systems run on distributed infrastructures where many software agents share physical hosts and are discovered via peer-to-peer mechanisms. Discovery must handle node-level churn from failures and host departures and agent-level churn from demand-driven activation, deactivation, and state changes. Their interaction reshapes classic trade-offs between structured and unstructured overlays. We study decentralized agent discovery under this two-level churn, assuming nodes host multiple agents, overlays are structured or gossip-based, and agents switch between warm and cold states. Using Kademlia as a structured and Cyclon+Vicinity as a gossip baseline, we compare stable, node-churn-only, agent-cooling-only, and combined regimes to see when routing efficiency, resilience, and service readiness align or favor different designs. Structured overlays are more robust and efficient in stable and node-churn regimes, while gossip-based overlays remain competitive and can be faster when readiness dominates.
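As a rough illustration of the structured baseline named above, the following sketch shows the XOR distance metric that Kademlia uses to order node identifiers. The function names, the 4-bit identifiers, and the `closest_nodes` helper are hypothetical and chosen only for illustration; the abstract does not describe the authors' implementation, and agent warm/cold states are not modelled here.

```python
from typing import Iterable, List


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two identifiers: their bitwise XOR."""
    return a ^ b


def bucket_index(own_id: int, other_id: int) -> int:
    """Index of the k-bucket that other_id falls into, seen from own_id.

    This is the position of the highest differing bit, so bucket i
    holds identifiers at distance in [2**i, 2**(i + 1)).
    """
    distance = xor_distance(own_id, other_id)
    if distance == 0:
        raise ValueError("a node does not store itself in a bucket")
    return distance.bit_length() - 1


def closest_nodes(target: int, known_ids: Iterable[int], k: int = 3) -> List[int]:
    """Return the k known identifiers closest to target under XOR distance."""
    return sorted(known_ids, key=lambda node_id: xor_distance(node_id, target))[:k]


if __name__ == "__main__":
    # Hypothetical 4-bit identifiers, for illustration only.
    print(xor_distance(0b1010, 0b0110))
    print(bucket_index(0b1010, 0b0110))
    print(closest_nodes(5, [1, 4, 7, 12], k=2))
```

A lookup in a structured overlay repeatedly queries the closest known nodes and narrows the distance to the target, which is where the routing efficiency discussed in the abstract comes from.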
Scheming Ability in LLM-to-LLM Strategic Interactions
As large language model (LLM) agents are deployed autonomously in diverse contexts, evaluating their capacity for strategic deception becomes crucial. While recent research has examined how AI systems scheme against human developers, LLM-to-LLM scheming remains underexplored. We investigate the scheming ability and propensity of frontier LLM agents through two game-theoretic frameworks: a Cheap Talk signaling game and a Peer Evaluation adversarial game. Testing four models (GPT-4o, Gemini-2.5-pro, Claude-3.7-Sonnet, and Llama-3.3-70b), we measure scheming performance with and without explicit prompting while analyzing scheming tactics through chain-of-thought reasoning. When prompted, most models, especially Gemini-2.5-pro and Claude-3.7-Sonnet, achieved near-perfect performance. Critically, models exhibited significant scheming propensity without prompting: all models chose deception over confession in Peer Evaluation (100% rate), while models choosing to scheme in Cheap Talk succeeded at 95-100% rates. These findings highlight the need for robust evaluations using high-stakes game-theoretic scenarios in multi-agent settings.
comment: 20 pages, 13 figures
Human-AI Governance (HAIG): A Trust-Utility Approach
This paper introduces the Human-AI Governance (HAIG) framework, contributing to the AI Governance (AIG) field by foregrounding the relational dynamics between human and AI actors rather than treating AI systems as objects of governance alone. Current categorical frameworks (e.g., human-in-the-loop models) inadequately capture how AI systems evolve from tools to partners, particularly as foundation models demonstrate emergent capabilities and multi-agent systems exhibit autonomous goal-setting behaviours. As systems are deployed across contexts, agency redistributes in complex patterns that are better represented as positions along continua rather than discrete categories. The HAIG framework operates across three levels: dimensions (Decision Authority, Process Autonomy, and Accountability Configuration), continua (continuous positional spectra along each dimension), and thresholds (critical points along the continua where governance requirements shift qualitatively). The framework's dimensional architecture is level-agnostic, applicable from individual deployment decisions and organisational governance through to sectorial comparison and national and international regulatory design. Unlike risk-based or principle-based approaches that treat governance primarily as a constraint on AI deployment, HAIG adopts a trust-utility orientation - reframing governance as the condition under which human-AI collaboration can realise its potential, calibrating oversight to specific relational contexts rather than predetermined categories. Case studies in healthcare and European regulation demonstrate how HAIG complements existing frameworks while offering a foundation for adaptive regulatory design that anticipates governance challenges before they emerge.
comment: 35 pages including references and appendix, 28 pages core text, 3 figures, 3 tables
TeachMaster: Generative Teaching via Code ACL 2026
The scalability of high-quality online education is hindered by the high costs and slow cycles of manual content creation. Despite advancements in video generation, current approaches often fail to ensure pedagogical structure and precise control due to their pixel-level, black-box nature. In this paper, we propose Generative Teaching, a novel paradigm shifting educators from manual creators to high-level directors who focus on pedagogical intents while agents handle the execution. To realize this vision, we introduce TeachMaster, a multi-agent framework that leverages code as an intermediate semantic medium. Unlike traditional video generation methods, TeachMaster orchestrates a collaborative team of agents, spanning planning, design, and rendering, to automate the production of interpretable, editable, and curriculum-ready educational videos. Experiments validate that TeachMaster significantly boosts production efficiency without compromising structural coherence or visual fidelity, slashing production costs to only 0.3% of traditional online course videos and providing a robust solution for scalable education.
comment: Accepted to ACL 2026; https://www.teachmaster.cn/
Reasonably reasoning AI agents can avoid game-theoretic failures in zero-shot, provably
As autonomous AI agents increasingly mediate online platform markets, a fundamental question emerges: do these markets generate stable strategic outcomes? In repeated strategic environments, the Nash equilibrium provides a natural benchmark for this stability. However, empirical evidence on off-the-shelf LLM agents is mixed, leaving it unclear whether independently deployed agents can converge to equilibrium behavior without explicit strategic post-training. In this paper, we provide an affirmative answer. Extending the Bayesian learning literature in theoretical economics, we prove that AI agents, acting as Bayesian posterior samplers rather than expected utility maximizers, are guaranteed to eventually become weakly close to a Nash equilibrium in infinitely repeated games. We further extend this analysis to settings in which stage payoffs are unknown ex ante, and agents observe only their privately realized stochastic payoffs, and obtain the same convergence guarantees. Finally, we empirically evaluate these theoretical implications across five repeated-game environments, ranging from the Prisoner's Dilemma to marketing promotion games. Taken together, our findings suggest that strategic stability in AI-mediated markets can emerge from the intrinsic reasoning and learning properties of modern AI agents, without the need for unrealistic universal fine-tuning.
Systems and Control (EESS)
Private and Common Information States in Decentralized Parallel Dynamic Programming for Delayed Sharing Patterns
This paper develops a dynamic programming (DP) approach for decentralized stochastic optimal control problems with delayed sharing information patterns, which exhibits the fundamental properties of classical DP of centralized partially observable Markov decision problems (POMDPs): the value functions and information states depend on the actions of the minimizing controls and not their strategies. This is achieved by invoking the concept of Person-by-Person (PbP) optimality, in which each control strategy is associated with a value function conditioned on its assigned delayed sharing information pattern, when all other strategies are fixed to their optimal responses. The value functions satisfy generalized and simplified DP equations. These are used to derive necessary and sufficient conditions for PbP optimality. The simplified DP equations are obtained by invoking the structural property that optimal strategies are separated and functionals of two information states: 1) a private a posteriori probability distribution based on the information pattern of the strategy, and 2) a centralized a posteriori probability distribution based on the shared or common information to all strategies, each satisfying a Markov recursion. The DP approach of this paper settles a long-standing open problem since the appearance of T-step delayed sharing patterns in [1, Section IV.G], in terms of generalizing the fundamental properties of the classical DP approach.
Nonlinear Non-Gaussian Density Steering with Input and Noise Channel Mismatch: Sinkhorn with Memory for Solving the Control-affine Schrödinger Bridge Problem
Solutions to the Schrödinger bridge problem and its generalizations yield feedback control policies for optimal density steering over a controlled diffusion. To numerically compute the same, the dynamic Sinkhorn recursion has become a standard approach. The mathematical engine behind this approach is the Hopf-Cole transform that recasts the conditions for optimality into a system of boundary-coupled linear PDEs. Recent works pointed out that for the control-affine Schrödinger bridge problem, this exact linearity via Hopf-Cole transform, and thus the standard Sinkhorn recursion, apply only if the control and noise channels are proportional. When the channels do not match, the Hopf-Cole-transformed PDEs remain nonlinear, and no algorithm is available to solve the same. We advance the state-of-the-art by designing a Sinkhorn recursion with memory that leverages the structure of these nonlinear PDEs, and demonstrate how it solves the control-affine Schrödinger bridge problem with input and noise channel mismatch. We prove the local stability of the proposed algorithm.
Multiplicative Contractions, Additive Recoveries: Functional-Form Restrictions on Risk Exposure Dynamics
We test a regime-conditional functional-form restriction on aggregate risk-exposure dynamics implied by VaR-constrained intermediary models: exposures contract multiplicatively when capital constraints bind and grow additively (level-independent) when slack. The contraction half follows from binding VaR constraints (Brunnermeier and Pedersen 2009; Adrian and Shin 2010; He and Krishnamurthy 2013). The additive-rebuild prediction is derived under constant-rate capital replenishment; we test the joint restriction on FINRA monthly margin debt (1997-2026). Two findings. First, regime-interacted regression of detrended margin growth on lagged level (T=350 months) yields calm slope -0.040 (p=0.082, additive) and stress slope -0.205 (p<0.001, multiplicative); Wald test on regime x level interaction rejects equal dependence (p=0.0016). Second, the restriction implies drawdown-recovery duration ratio increases with crash depth. On 73 S&P 500 episodes (1950-2026), Cox model gives depth coefficient -13.75 (p<10^{-7}): 75% lower recovery hazard per 10pp deeper drawdown. Continuous-depth regression yields beta=1.22 (p=0.047); beta=1.59 (p<0.001) excluding 1980-82 Volcker. Median duration ratio for crashes >30% is 3.1x; replicates across eight other equity indices. Calibrated Heston, Markov-switching, and block bootstrap nulls match price-level duration asymmetry but lack an exposure state variable, so cannot speak to the regime-conditional flip on direct exposures. We do not claim the exposure test identifies the intermediary mechanism: FINRA margin debt is a noisy proxy. We claim only that the regime-conditional functional form is a sharper target than return-level moments alone, and confirming it on margin debt is consistent with -- not proof of -- the constrained-intermediary mechanism. A companion test on CFTC weekly speculative positioning is left for future work (Sections 5.2 and F).
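The reported "75% lower recovery hazard per 10pp deeper drawdown" can be checked against the Cox depth coefficient with a few lines of arithmetic. The sketch below assumes that drawdown depth enters the model as a fraction (so 10 percentage points corresponds to 0.10); the abstract does not state the units, and the function name is hypothetical.

```python
import math


def hazard_ratio(coefficient: float, delta: float) -> float:
    """Cox hazard ratio implied by a covariate change of size delta."""
    return math.exp(coefficient * delta)


# Depth coefficient quoted in the abstract.
DEPTH_COEFFICIENT = -13.75
# Assumption: depth is a fraction, so 10 percentage points is 0.10.
TEN_PP = 0.10

ratio = hazard_ratio(DEPTH_COEFFICIENT, TEN_PP)
reduction = 1.0 - ratio

if __name__ == "__main__":
    print(f"hazard ratio per 10pp: {ratio:.4f}")
    print(f"hazard reduction: {reduction:.1%}")
```

Under that unit assumption the hazard ratio is exp(-1.375), which is about 0.25 and therefore consistent with the stated 75% reduction.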
Robust Operation of Distribution Networks: Generalized Uncertainty Modelling in Confidence-Level-Based Information Gap Decision
This paper studies the robust optimal operation of distribution networks (DNs) under renewable generation and load demand uncertainties, seeking an improved trade-off between robustness and economic performance. Building upon information gap decision theory (IGDT), a generalized uncertainty modelling is proposed to enhance the expressiveness of the uncertainty characterization. The proposed modelling captures both symmetric and asymmetric uncertainty features, and supports linear or nonlinear expansion of the uncertainty sets driven by confidence level. This advancement leads to the development of a confidence-level-based IGDT (CL-IGDT) framework for DN operation. To solve the resulting model, its equivalence to a family of two-stage robust optimization problems (TSROs) is established, enabling a Fibonacci search over the confidence level. To further improve computational efficiency, a cut-recycling strategy is proposed to exploit invariant information across TSROs. These techniques are integrated into a novel Fibonacci-Parametric Column-and-Constraint Generation algorithm with guaranteed asymptotic convergence. Case studies validate the effectiveness of the proposed framework and demonstrate the performance advantages of the proposed algorithm.
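To make the Fibonacci search over a scalar parameter concrete, here is a minimal textbook sketch for minimizing a one-dimensional function on an interval. It assumes the quantity being searched is unimodal in the confidence level, which the abstract does not state, and it replaces each two-stage robust optimization solve with a hypothetical toy objective. It is not the Fibonacci-Parametric Column-and-Constraint Generation algorithm itself and includes no cut recycling.

```python
from typing import Callable


def fibonacci_search(f: Callable[[float], float], lo: float, hi: float, n: int) -> float:
    """Minimize a unimodal f on [lo, hi] using exactly n evaluations (n >= 3)."""
    if n < 3:
        raise ValueError("n must be at least 3")
    fib = [1, 1]
    while len(fib) <= n:
        fib.append(fib[-1] + fib[-2])

    # Two interior points placed at Fibonacci ratios of the interval.
    x1 = lo + fib[n - 2] / fib[n] * (hi - lo)
    x2 = lo + fib[n - 1] / fib[n] * (hi - lo)
    f1, f2 = f(x1), f(x2)

    for k in range(1, n - 1):
        if f1 > f2:
            # Minimum lies in [x1, hi]; the old x2 becomes the new x1.
            lo = x1
            x1, f1 = x2, f2
            x2 = lo + fib[n - k - 1] / fib[n - k] * (hi - lo)
            f2 = f(x2)
        else:
            # Minimum lies in [lo, x2]; the old x1 becomes the new x2.
            hi = x2
            x2, f2 = x1, f1
            x1 = lo + fib[n - k - 2] / fib[n - k] * (hi - lo)
            f1 = f(x1)
    return 0.5 * (lo + hi)


def toy_cost(confidence_level: float) -> float:
    """Hypothetical stand-in for the cost returned by one TSRO solve."""
    return (confidence_level - 0.3) ** 2


if __name__ == "__main__":
    print(fibonacci_search(toy_cost, 0.0, 1.0, n=25))
```

Each iteration reuses one of the two previous evaluations, which matters when every evaluation is an expensive optimization problem.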
sumoITScontrol: Traffic Controller Collection for SUMO Traffic Simulations
Reliable benchmarking is essential for progress in intelligent traffic control research. While microscopic traffic simulators such as SUMO enable detailed modelling of individual vehicle interactions, many published control studies still rely on single-run evaluations and project-specific baseline implementations, limiting reproducibility and comparability. This paper presents sumoITScontrol, an open-source and extensible Python framework providing a curated collection of widely used traffic controllers implemented for SUMO via the TraCI interface. The framework includes established methods for both urban and freeway traffic management, such as Max Pressure signal control, SCOOT/SCATS-inspired adaptive strategies, and ramp metering algorithms including ALINEA, HERO, and METALINE. Beyond providing implementations, the paper emphasises methodological best-practices for controller evaluation in stochastic microscopic environments. Through systematic calibration and replicated simulation experiments, we demonstrate the substantial impact of stochastic variability on performance metrics and highlight the necessity of variance-aware reporting and statistical hypothesis testing. By combining standardised controller implementations with reproducibility-oriented evaluation guidelines, sumoITScontrol aims to improve methodological transparency, enable fair benchmarking of novel approaches, and strengthen experimental standards within the SUMO and intelligent transportation systems research communities. Source Code on project's GitHub page: https://github.com/DerKevinRiehl/sumoITScontrol/.
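For readers unfamiliar with the ramp metering algorithms listed above, the following sketch shows the classic ALINEA integral feedback law, which adjusts the metering rate in proportion to the gap between a target and a measured downstream occupancy. The function name, the gain, and the rate limits are illustrative assumptions; this is not the sumoITScontrol API and makes no TraCI calls.

```python
def alinea_step(
    prev_rate: float,
    measured_occupancy: float,
    target_occupancy: float,
    gain: float = 70.0,
    min_rate: float = 200.0,
    max_rate: float = 1800.0,
) -> float:
    """One ALINEA update: r(k) = r(k-1) + K_R * (o_target - o_measured).

    Rates are in veh/h and occupancies in percent. The gain and the
    rate limits are illustrative values, not calibrated parameters.
    """
    rate = prev_rate + gain * (target_occupancy - measured_occupancy)
    # Clamp to the physically admissible metering range.
    return max(min_rate, min(max_rate, rate))


if __name__ == "__main__":
    # Occupancy above target, so the metering rate is reduced.
    print(alinea_step(900.0, measured_occupancy=25.0, target_occupancy=20.0))
```

Because the controller reacts to noisy occupancy measurements, its performance varies from run to run, which is the kind of stochastic variability that replicated simulations are meant to expose.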
Operationalising Information Security Management: A Procedural Framework Analysis of ISO/IEC 27001:2022 Implementation in a Financial-Technology Organisation
Organisations operating within information-intensive environments face intensifying pressure to formalise the governance of information security. The ISO/IEC 27001:2022 standard provides a globally recognised framework for establishing, implementing, maintaining, and continually improving an Information Security Management System (ISMS). This article analyses the procedural architecture deployed in a financial-technology organisation's ISMS, examining eight core operational procedures: IT Risk Assessment and Treatment, User Code of Conduct, Password Policy, Access Control, Internet Access, Physical Security, Backup and Restore Management, and Nonconformity Root Cause Analysis and Corrective Action. Drawing on documented internal training materials, the article investigates how each procedure operationalises the requirements of Annex A controls and Clauses 6-10 of ISO 27001:2022. The paper evaluates the CIA Triad as a unifying evaluation criterion, the twelve-step risk assessment methodology, role-based responsibility allocation, and the interplay between corrective action governance and continual improvement. The findings suggest that a tightly integrated, multi-layered procedural hierarchy, supported by clear accountability structures and measurable risk metrics, constitutes the foundation of an effective ISMS implementation in financial-technology operating environments.
comment: 9 pages, 3 tables
A Low-rank ADI Algorithm for Solving Large-scale Non-symmetric Algebraic Riccati Equations
This paper considers large-scale nonsymmetric continuous-time algebraic Riccati equations (NAREs) that admit low-rank solutions. Low-rank alternating direction implicit (ADI) methods have proven to be an efficient approach for solving several matrix equations, including Lyapunov equations, Sylvester equations, and symmetric Riccati equations. Although a low-rank algorithm for the Sylvester equation has been used as an inner loop in computing low-rank solutions of NAREs, no low-rank ADI algorithm currently exists for NAREs themselves. This paper fills this gap by developing a low-rank ADI algorithm for large-scale NAREs that admit a low-rank solution. Since Lyapunov equations, Sylvester equations, and symmetric Riccati equations are special cases of the NARE, the existing low-rank ADI methods in the literature are special cases of the more general low-rank ADI method proposed here. An automatic and computationally efficient method for shift generation is also discussed, and a subspace-accelerated projection approach is presented to generate shifts for subsequent iterations without user intervention. Once initialized with arbitrary shifts, the proposed algorithm solves large-scale NAREs autonomously, generating its own shifts. Numerical results are presented using a benchmark example of order $10^6$, demonstrating the computational efficiency and accuracy of the proposed algorithm.
Adaptive Spatial-Temporal Graph Learning-Enabled Short-Term Voltage Stability Assessment against Time-Varying Topological Conditions
The emerging deep learning (DL) technology has recently exhibited great potential in data-driven short-term voltage stability (SVS) assessment of complex power grids. However, without sufficient attention to the time-varying topological structures of today's power grids, the majority of existing DL-based SVS assessment schemes could experience severe performance degradation in practice. To address this drawback, this paper proposes an adaptive spatial-temporal graph learning-enabled SVS assessment approach that can adapt well to various topological changes. First, considering the time-varying topological conditions of a given power grid, an adaptive graph representation matrix is automatically learned to effectively capture the complicated spatial correlations between individual buses within the grid. Then, to help better capture regional SVS features for subsequent learning processes, the adaptive graph representation matrix is properly adjusted by introducing a spatial attention mechanism. Further, with post-fault system trajectory data linked together via attention-based graph representation, a residual spatiotemporal graph convolutional network is carefully built with Optuna-based optimization to deeply mine system-wide spatiotemporal features and thus achieve structure-adaptive SVS assessment. Numerical test results on two representative sub-systems of a realistic provincial power grid in South China demonstrate the efficacy of the proposed approach under various changing topological conditions.
comment: 13 pages, accepted for publication in IEEE Transactions on Power Systems
An Exponentially stable Extended Kalman Filter with Estimate dependent Process noise Covariance for Chemical Reaction Networks
Biomolecular systems are often modeled with partially known nonlinear stochastic dynamics, making state and parameter estimation a central challenge. While Kalman filtering techniques are widely used in this setting, their performance critically depends on the choice of the process noise covariance, which is typically assumed constant and heuristically tuned. Such assumptions are not justified for biomolecular systems, where intrinsic noise arises from underlying reaction kinetics. In this work, we propose an Extended Kalman Filter (EKF) with a state estimate-dependent process noise covariance based on the Chemical Langevin Equation (CLE). Further, we analyze the stochastic stability of the proposed filter and derive conditions under which the estimation error remains exponentially bounded in the mean-square sense. In particular, we obtain an upper bound on the sampling period for discrete-time biomolecular systems that guarantees this property. The proposed framework is validated through simulations on a nonlinear gene expression model. This approach enables first principle-based modeling and filter design choices for synthetic biomolecular circuits, eliminating the need for heuristic tuning of the process noise covariance.
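The idea of an estimate-dependent process noise covariance can be sketched on a scalar birth-death model, where a species is produced at a constant rate and degraded at a rate proportional to its copy number. Under the Chemical Langevin Equation the diffusion variance for such a model is the sum of the two reaction propensities, so the discretized process noise becomes a function of the current estimate. The model, the Euler discretization, the direct noisy measurement, and all names and parameter values below are assumptions for illustration; they are not the paper's gene expression model or its stability conditions.

```python
from typing import Tuple


def cle_process_noise(x_hat: float, k_prod: float, gamma: float, dt: float) -> float:
    """Process noise variance over one step from the CLE diffusion term.

    For production (propensity k_prod) and degradation (propensity
    gamma * x), the CLE variance rate is k_prod + gamma * x.
    """
    return (k_prod + gamma * max(x_hat, 0.0)) * dt


def ekf_step(
    x_hat: float, p: float, y: float,
    k_prod: float, gamma: float, dt: float, r_meas: float,
) -> Tuple[float, float]:
    """One predict/update cycle of a scalar EKF with estimate-dependent Q."""
    # Predict with an Euler step of dx/dt = k_prod - gamma * x.
    q = cle_process_noise(x_hat, k_prod, gamma, dt)
    f_jac = 1.0 - gamma * dt
    x_pred = x_hat + (k_prod - gamma * x_hat) * dt
    p_pred = f_jac * p * f_jac + q

    # Update with a direct noisy measurement y = x + v, Var(v) = r_meas.
    gain = p_pred / (p_pred + r_meas)
    x_new = x_pred + gain * (y - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


if __name__ == "__main__":
    print(ekf_step(50.0, 4.0, y=52.0, k_prod=10.0, gamma=0.1, dt=0.1, r_meas=1.0))
```

The point of the construction is that no constant covariance has to be tuned by hand: the noise level follows from the reaction rates and the current estimate.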
GPU-Native Multi-Area State Estimation via SIMD Abstraction and Boundary Condensation
Power system state estimation (SE) is foundational for grid monitoring, yet conventional centralized solvers face increasing computational pressure as the system scale and real-time requirements grow. This paper presents a GPU-native framework for hierarchical multi-area state estimation (MASE) that addresses these bottlenecks through a single-instruction, multiple-data (SIMD) abstraction and sparse Schur local condensation. We partition the network into areas, evaluate measurement residuals and derivatives using fixed-sparsity templates, and directly assemble local normal-equation blocks through a fused GPU accumulation kernel without materializing explicit Jacobians. Each area is then factorized on the GPU in Schur mode to export a dense local boundary block and condensed right-hand side, after which a reduced global boundary system is assembled and solved on device. This design preserves device residency across measurement evaluation, local condensation, and boundary coordination while exposing parallelism across areas. Numerical experiments on partitioned PEGASE 2869-bus, PEGASE 9241-bus, and ACTIVSg10k benchmark systems demonstrate that the proposed approach effectively leverages GPU throughput by maintaining full device residency and high arithmetic intensity.
An Algebraic State Observer for a Class of Physical Systems
In this paper we present a radically new approach to design state observers for nonlinear systems, with particular emphasis on physical ones. Our objective is to obtain an algebraic relation between the unmeasurable part of the state and filtered versions of the system's inputs and outputs, which holds true for all $t \geq 0$. The latter qualifier should be contrasted with the usual asymptotic (or fixed/finite time) objective. The standing assumption for our design is the availability -- or possibility of constructing, via coordinate change -- of state components with measurable derivatives. In the physical systems studied in the paper this condition is naturally satisfied. The next step in the design is the application of the Swapping Lemma to pull out from the dynamics the derivative of one of these signals. The design is completed by replacing the latter with the measurable signals and arranging the remaining terms. The algebraic observer constitutes a refreshing major departure from classical asymptotic observer designs, even in the case of electrical motors and mechanical systems that have been exhaustively studied. Particularly notable is the fact that no observability or excitation condition is imposed for the construction of the algebraic observer.
System-Level Impacts of Flexible Data Center Load Scheduling on Cost, Emissions, and Transmission Congestion
Large data centers are being deployed in the U.S. at an unprecedented rate, introducing significant flexible load potential. A portion of data center workloads - best-effort (BE) jobs - can be scheduled flexibly to reduce power system operating costs and emissions. However, the system-level impacts of such scheduling remain underexplored. This paper investigates the effects of flexible data center load scheduling on operating cost, system stress, and emissions using the ACTIVSg2000 2000-bus test system. Results show that BE loads shift toward periods of lower locational marginal prices (LMPs), typically aligned with high renewable generation. Importantly, latency-critical (LC) workloads remain unaffected, preserving quality of service (QoS). Flexible scheduling also leads to reductions in both greenhouse gas and toxic emissions, as well as transmission congestion, compared to inflexible operation, demonstrating its potential to support more efficient and sustainable grid operation.
A Diffusion-based Generative Machine Learning Paradigm for Dynamic Contingency Screening
Dynamic contingency screening is a challenging task in dynamic security assessment, where traditional numerical approaches are computationally intensive and often unable to repeatedly solve full AC power flow for all possible contingencies in real time, especially for large-scale power grids. Moreover, the severity of a contingency is not identical across operating points, so solving all possible contingencies is unnecessary, computationally inefficient, and time-consuming. This paper introduces a novel, diffusion-based generative machine learning paradigm that transforms contingency analysis from conventional scenario selection to proactive, likely-unsupervised scenario generation. The margin to the steady-state voltage stability limit determines the ranking of contingencies corresponding to each operating point. By leveraging physical information from each operating point, the proposed approach anticipates the contingencies most likely to be critical, without relying on static assumptions or exhaustive simulations. This data-prompted generative approach enables the identification of high-risk scenarios under varying load and generator conditions, providing dynamic security assessment in real time. The correctness, effectiveness, and scalability of the methodology are demonstrated through methodological derivations and comprehensive experiments on multiple IEEE benchmark systems, including IEEE-6, IEEE-14, IEEE-30, and IEEE-118, highlighting its potential to incorporate contingency screening in complex, evolving smart grids.
Risk-Aware Rulebooks for Multi-Objective Trajectory Evaluation under Uncertainty
We present a risk-aware formalism for evaluating system trajectories in the presence of uncertain interactions between the system and its environment. The proposed formalism supports reasoning under uncertainty and systematically handles complex relationships among requirements and objectives, including hierarchical priorities and non-comparability. Rather than treating the environment as exogenous noise, we explicitly model how each system trajectory influences the environment and evaluate trajectories under the resulting distribution of environment responses. We prove that the formalism induces a preorder on the set of system trajectories, ensuring consistency and preventing cyclic preferences. Finally, we illustrate the approach with an autonomous driving example that demonstrates how the formalism enhances explainability by clarifying the rationale behind trajectory selection.
A Framework for Solving Continuous Energy and Power System Problems using Adiabatic Quantum Computing
The increasing scale and nonlinearity of modern energy and power system problems pose significant challenges to classical numerical solvers. In parallel, advances in quantum and quantum-inspired hardware are expected to improve scalability and offer performance advantages for large-scale optimization problems. Therefore, we propose a novel combinatorial optimization framework that reformulates continuous energy and power system problems into a format executable on quantum/digital annealers. The proposed framework accommodates both real and complex numbers and can represent both linear and nonlinear equations. As a proof of concept, we demonstrate its use in three applications: (i) 2D steady conductive heat transfer for a plate with constant temperature at each edge, where coefficient and boundary condition matrices are developed to solve a linear system of equations, (ii) power system parameter identification, where the admittance matrix is estimated given voltage and current measurements, and (iii) power flow analysis, which solves the governing equations for active and reactive power balance. The applications are run on small test cases. The results show that the framework effectively and efficiently addresses the three applications and therefore suggest its potential to solve a wide range of energy and power system problems.
comment: 9 pages, 5 figures
Robotics
GCImOpt: Learning efficient goal-conditioned policies by imitating optimal trajectories
Imitation learning is a well-established approach for machine-learning-based control. However, its applicability depends on having access to demonstrations, which are often expensive to collect and/or suboptimal for solving the task. In this work, we present GCImOpt, an approach to learn efficient goal-conditioned policies by training on datasets generated by trajectory optimization. Our approach for dataset generation is computationally efficient, can generate thousands of optimal trajectories in minutes on a laptop computer, and produces high-quality demonstrations. Further, by means of a data augmentation scheme that treats intermediate states as goals, we are able to increase the training dataset size by an order of magnitude. Using our generated datasets, we train goal-conditioned neural network policies that can control the system towards arbitrary goals. To demonstrate the generality of our approach, we generate datasets and then train policies for various control tasks, namely cart-pole stabilization, planar and three-dimensional quadcopter stabilization, and point reaching using a 6-DoF robot arm. We show that our trained policies can achieve high success rates and near-optimal control profiles, all while being small (less than 80,000 neural network parameters) and fast enough (up to more than 6,000 times faster than a trajectory optimization solver) that they could be deployed onboard resource-constrained controllers. We provide videos, code, datasets and pre-trained policies under a free software license; see our project website https://jongoiko.github.io/gcimopt/.
comment: Accepted for publication at the 8th Annual Conference on Learning for Dynamics and Control (L4DC 2026). 16 pages (including appendix), 1 figure. For project website, see https://jongoiko.github.io/gcimopt/
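The augmentation scheme that treats intermediate states as goals can be sketched as follows. This is a minimal illustration, not the GCImOpt code: the function name and array layout are assumptions, and it relabels every later state as a goal, whereas the abstract reports only an order-of-magnitude increase, so the actual scheme presumably subsamples.

```python
import numpy as np

def relabel_with_intermediate_goals(states, controls):
    """Turn one trajectory into many (state, goal, control) samples.

    states:   array of shape (T + 1, state_dim), x_0 ... x_T
    controls: array of shape (T, control_dim),   u_0 ... u_{T-1}
    For every t and every later index k > t, x_k is treated as the goal
    that the control u_t applied at x_t is heading towards.
    """
    states = np.asarray(states)
    controls = np.asarray(controls)
    T = len(controls)
    if len(states) != T + 1:
        raise ValueError("expected one more state than controls")
    obs, goals, acts = [], [], []
    for t in range(T):
        for k in range(t + 1, T + 1):
            obs.append(states[t])
            goals.append(states[k])
            acts.append(controls[t])
    return np.array(obs), np.array(goals), np.array(acts)
```

A trajectory with T control steps yields T(T + 1)/2 samples instead of T. Whether a sub-trajectory remains optimal for its relabeled goal depends on the cost structure of the underlying optimization problem, which this sketch does not check.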
ATRS: Adaptive Trajectory Re-splitting via a Shared Neural Policy for Parallel Optimization
Parallel trajectory optimization via the Alternating Direction Method of Multipliers (ADMM) has emerged as a scalable approach to long-horizon motion planning. However, existing frameworks typically decompose the problem into parallel subproblems based on a predefined fixed structure. Such structural rigidity often causes optimization stagnation in highly constrained regions, where a few lagging subproblems delay global convergence. A natural remedy is to adaptively re-split these stagnating segments online. Yet, deciding when, where, and how to split exceeds the capability of rule-based heuristics. To this end, we propose ATRS, a novel framework that embeds a shared Deep Reinforcement Learning policy into the parallel ADMM loop. We formulate this adaptive adjustment as a Multi-Agent Shared-Policy Markov Decision Process, where all trajectory segments act as homogeneous agents and share a unified neural policy network. This parameter-sharing architecture endows the system with size invariance, enabling it to handle dynamically changing segment counts during re-splitting and generalize to arbitrary trajectory lengths. Furthermore, our formulation inherently supports zero-shot generalization to unseen environments, as our network relies solely on the internal states of the numerical solver rather than on the geometric features of the environment. To ensure solver stability, a Confidence-Based Election mechanism selects only the most stagnating segment for re-splitting at each step. Extensive simulations demonstrate that ATRS accelerates convergence, reducing the number of iterations by up to 26.0% and the computation time by up to 19.1%. Real-world experiments further confirm its applicability to both large-scale offline global planning and real-time onboard replanning within 35 ms per cycle, with no sim-to-real degradation.
comment: 8 pages, submitted to IEEE Robotics and Automation Letters
GazeVLA: Learning Human Intention for Robotic Manipulation
Embodied foundation models have achieved significant breakthroughs in robotic manipulation, yet they still depend heavily on large-scale robot demonstrations. Although recent works have explored leveraging human data to alleviate this dependency, effectively extracting transferable knowledge remains a significant challenge due to the inherent embodiment gap between human and robot. We argue that the intention underlying human actions can serve as a powerful intermediate representation for bridging this gap. In this paper, we introduce a novel framework that explicitly learns and transfers human intention to facilitate robotic manipulation. Specifically, we model intention through gaze, as it naturally precedes physical actions and serves as an observable proxy for human intent. Our model is first pretrained on a large-scale egocentric human dataset to capture human intention and its synergy with action, followed by finetuning on a small set of robot and human data. During inference, the model adopts a Chain-of-Thought reasoning paradigm, sequentially predicting intention before executing the action. Extensive evaluations in simulation and real-world settings, across long-horizon and fine-grained tasks, and under few-shot and robustness benchmarks, show that our method consistently outperforms strong baselines, generalizes better, and achieves state-of-the-art performance.
RedVLA: Physical Red Teaming for Vision-Language-Action Models
The real-world deployment of Vision-Language-Action (VLA) models remains limited by the risk of unpredictable and irreversible physical harm. However, we currently lack effective mechanisms to proactively detect these physical safety risks before deployment. To address this gap, we propose RedVLA, the first red teaming framework for physical safety in VLA models. We systematically uncover unsafe behaviors through a two-stage process: (I) Risk Scenario Synthesis constructs a valid and task-feasible initial risk scene. Specifically, it identifies critical interaction regions from benign trajectories and positions the risk factor within these regions, aiming to entangle it with the VLA's execution flow and elicit a target unsafe behavior. (II) Risk Amplification ensures stable elicitation across heterogeneous models. It iteratively refines the risk factor state through gradient-free optimization guided by trajectory features. Experiments on six representative VLA models show that RedVLA uncovers diverse unsafe behaviors and achieves an ASR of up to 95.5% within 10 optimization iterations. To mitigate these risks, we further propose SimpleVLA-Guard, a lightweight safety guard built from RedVLA-generated data. Our data, assets, and code are available at https://redvla.github.io.
QDTraj: Exploration of Diverse Trajectory Primitives for Articulated Objects Robotic Manipulation
Thanks to the latest advances in learning and robotics, domestic robots are beginning to enter homes, aiming to execute household chores autonomously. However, robots still struggle to perform autonomous manipulation tasks in open-ended environments. In this context, this paper presents a method that enables a robot to manipulate a wide spectrum of articulated objects. We automatically generate different low-level robot trajectory primitives to manipulate given object articulations. A key consideration when generating expert trajectories is the diversity of solutions that achieve the same goal. Indeed, knowing diverse low-level primitives to accomplish the same task enables the robot to choose the optimal solution in its real-world environment, with live constraints and unexpected changes. To do so, we propose a method based on Quality-Diversity algorithms that leverages sparse reward exploration in order to generate a set of diverse and high-performing trajectory primitives for a given manipulation task. We validated our method, QDTraj, by generating diverse trajectories in simulation and deploying them in the real world. QDTraj generates at least 5 times more diverse trajectories for both hinge and slider activation tasks, outperforming the other methods we compared against. We assessed the generalization of our method over 30 articulations of the PartNetMobility articulated object dataset, with an average of 704 different trajectories per task. Code is publicly available at: https://kappel.web.isir.upmc.fr/trajectory_primitive_website
comment: 8 pages, 7 figures, webpage: https://kappel.web.isir.upmc.fr/trajectory_primitive_website
Information-Theoretic Geometry Optimization and Physics-Aware Learning for Calibration-Free Magnetic Localization
Wireless localization of permanent magnets enables occlusion-free guidance for medical interventions, yet its practical accuracy is fundamentally limited by two coupled challenges: the poor observability of conventional planar sensor arrays and the simulation-to-reality (Sim-to-Real) gap of learning-based estimators. To address these issues, this article presents a unified framework that combines information-theoretic sensor geometry optimization with physics-aware deep learning. First, a rigorous Fisher Information Matrix (FIM)-based evaluation framework is established to quantify geometry-induced observability limitations. The results show that a staggered split-array topology provides a substantially stronger observability foundation for localization while remaining compatible with practical external deployment. Second, building on this optimized sensing configuration, we propose Phy-GAANet, a calibration-free estimator trained entirely on hardware-aware synthetic data. By incorporating Physics-Informed Features (PIF) for saturation modeling and Geometry-Aware Attention (GAA) for preserving cross-layer vector structure, the network effectively bridges the Sim-to-Real gap. Extensive real-world experiments demonstrate state-of-the-art performance, achieving a position error of 1.84 mm and an orientation error of 3.18 degrees at a refresh rate exceeding 270 Hz. The proposed method consistently outperforms classical Levenberg--Marquardt solvers and generic convolutional baselines, particularly in suppressing catastrophic outliers and maintaining robustness in challenging near-field boundary regions. Beyond the proposed network, the FIM-guided analysis also provides a framework for sensor geometry design in magnetic localization systems under practical deployment constraints.
comment: 10 pages, 8 figures
Decoding High-Dimensional Finger Motion from EMG Using Riemannian Features and RNNs
Continuous estimation of high-dimensional finger kinematics from forearm surface electromyography (EMG) could enable natural control for hand prostheses, AR/XR interfaces, and teleoperation. However, the complexity of human hand gestures and the entanglement of forearm muscles make accurate recognition intrinsically challenging. Existing approaches typically reduce task complexity by relying on classification-based machine learning, limiting the controllable degrees of freedom and compromising on natural interaction. We present an end-to-end framework for continuous EMG-to-kinematics regression using only consumer-grade hardware. The framework combines an 8-channel EMG armband, a single webcam, and an automatic synchronization procedure, enabling the collection of the EMG Finger-Kinematics dataset (EMG-FK), a 10-h dataset of synchronized EMG and 15 finger joint angles from 20 participants performing rich, unconstrained right-hand motions. We also introduce the Temporal Riemannian Regressor (TRR), a lightweight GRU-based model that uses sequences of multi-band Riemannian covariance features to decode finger motion. Across EMG-FK and the public emg2pose benchmark, TRR outperforms state-of-the-art methods in both intra- and cross-subject evaluation. On EMG-FK, it reaches an average absolute error of $9.79^\circ \pm 1.48^\circ$ in intra-subject and $16.71^\circ \pm 3.97^\circ$ in cross-subject evaluation. Finally, we demonstrate real-time deployment on a Raspberry Pi 5 and intuitive control of a robotic hand; TRR runs at nearly 10 predictions/s and is roughly an order of magnitude faster than state-of-the-art approaches. Together, these contributions lower the barrier to reproducible, real-time EMG-based decoding of high-dimensional finger motion, and pave the way toward more natural and intuitive control of embedded EMG-based systems.
comment: 13 pages, 10 figures, 3 tables, links to a GitHub, a dataset on Zenodo, and two videos on YouTube
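One common way to turn a multichannel EMG window into a Riemannian covariance feature is the log-Euclidean tangent-space mapping sketched below. This is an assumption about the general technique, not the TRR feature pipeline: the function name, the `eps` regularizer, and the omission of the multi-band filtering step are all illustrative choices.

```python
import numpy as np

def log_euclid_features(window, eps=1e-6):
    """Map one EMG window to a tangent-space feature vector.

    window: array of shape (channels, samples).
    Returns the upper triangle of the matrix logarithm of the channel
    covariance, with off-diagonal entries scaled by sqrt(2) so that the
    Euclidean norm of the vector equals the Frobenius norm of the matrix.
    """
    x = window - window.mean(axis=1, keepdims=True)
    n_ch = x.shape[0]
    cov = x @ x.T / x.shape[1] + eps * np.eye(n_ch)  # regularized covariance
    w, v = np.linalg.eigh(cov)                        # cov = V diag(w) V^T
    log_cov = (v * np.log(w)) @ v.T                   # matrix logarithm
    iu = np.triu_indices(n_ch)
    scale = np.where(iu[0] == iu[1], 1.0, np.sqrt(2.0))
    return scale * log_cov[iu]
```

For an 8-channel armband this gives a 36-dimensional vector per window and per frequency band; a sequence of such vectors is the kind of input a recurrent regressor could consume.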
Point & Grasp: Flexible Selection of Out-of-Reach Objects Through Probabilistic Cue Integration
Selecting out-of-reach objects is a fundamental task in mixed reality (MR). Existing methods rely on a single cue or deterministically fuse multiple cues, leading to performance degradation when the dominant cue becomes unreliable. In this work, we introduce a probabilistic cue integration framework that enables flexible combination of multiple user-generated cues for intent inference. Inspired by natural grasping behavior, we instantiate the framework with pointing direction and grasp gestures as a new interaction technique, Point&Grasp. To this end, we collect the Out-of-Reach Grasping (ORG) dataset to train a robust likelihood model of the gestural cue, which captures grasping patterns not present in existing in-reach datasets. User studies demonstrate that our selection method with cue integration not only improves accuracy and speed over single-cue baselines, but also remains practically effective compared to state-of-the-art methods across various sources of ambiguity. The dataset and code are available at https://github.com/drlxj/point-and-grasp.
comment: 19 pages, 13 figures, CHI 2026
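The idea of probabilistic cue integration can be illustrated with a naive-Bayes style fusion over candidate objects. This is a sketch under the assumption that cues are conditionally independent given the target; the function name and the example numbers are hypothetical, and the likelihood models used in Point&Grasp are not reproduced here.

```python
import numpy as np

def fuse_cues(log_likelihoods, log_prior=None):
    """Posterior over candidate objects from several cues.

    log_likelihoods: array of shape (n_cues, n_objects) holding
                     log p(cue_j | object_i) for each cue and object.
    log_prior:       optional array of shape (n_objects,).
    """
    total = np.sum(np.asarray(log_likelihoods, dtype=float), axis=0)
    if log_prior is not None:
        total = total + np.asarray(log_prior, dtype=float)
    total = total - total.max()      # stabilize the exponentials
    post = np.exp(total)
    return post / post.sum()

# Hypothetical example: pointing slightly favors object 0,
# the grasp gesture clearly favors object 1.
pointing = np.log([0.5, 0.4, 0.1])
grasp = np.log([0.2, 0.7, 0.1])
posterior = fuse_cues([pointing, grasp])
```

A cue that becomes unreliable produces a nearly flat likelihood and therefore has little influence on the posterior, which is the behavior a deterministic fusion rule lacks.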
Adaptive vs. Static Robot-to-Human Handover: A Study on Orientation and Approach Direction
Robot-to-human handovers often rely on static, open-loop strategies (or, at best, approaches that adapt only the position), which generally do not consider how the object will be grasped by the human, thus requiring the user to adapt. This work presents a novel adaptive framework that dynamically adjusts the object's delivery pose in real time based on the user's hand pose and the intended downstream task. By integrating AI-based hand pose estimation with smooth, kinematically constrained trajectories, the system ensures a safe approach and an optimal handover orientation. A comprehensive user study compares the proposed adaptive approach against a static baseline across multiple tasks, evaluating both subjective metrics (NASA-TLX, Human-Robot Trust Scale) and objective physiological data (blink rate measured via wearable eye-trackers). The results demonstrate that dynamic alignment significantly reduces users' cognitive workload and physiological stress, while increasing perceived trust in the robot's reliability. These findings highlight the potential of task- and pose-aware systems for enabling fluid and ergonomic human-robot collaboration.
LeHome: A Simulation Environment for Deformable Object Manipulation in Household Scenarios ICRA2026
Household environments present one of the most common, impactful yet challenging application domains for robotics. Within household scenarios, manipulating deformable objects is particularly difficult, both in simulation and real-world execution, due to varied categories and shapes, complex dynamics, and diverse material properties, as well as the lack of reliable deformable-object support in existing simulations. We introduce LeHome, a comprehensive simulation environment designed for deformable object manipulation in household scenarios. LeHome covers a wide spectrum of deformable objects, such as garments and food items, offering high-fidelity dynamics and realistic interactions that existing simulators struggle to simulate accurately. Moreover, LeHome supports multiple robotic embodiments and emphasizes low-cost robots as a core focus, enabling end-to-end evaluation of household tasks on resource-constrained hardware. By bridging the gap between realistic deformable object simulation and practical robotic platforms, LeHome provides a scalable testbed for advancing household robotics. Webpage: https://lehome-web.github.io/ .
comment: Accepted to ICRA 2026
Closed Form Relations and Higher-Order Approximations of First and Second Derivatives of the Tangent Operator on SE(3)
The Lie group SE(3) of orientation-preserving isometric transformations is used for modeling multibody systems, robots, and Cosserat continua. The use of these models in numerical simulation and optimization schemes necessitates the exponential map, its right-trivialized differential (often referred to as tangent operator), as well as higher derivatives in closed form. The $6\times 6$ matrix representation of the differential, $\mathbf{dexp}_{\mathbf{X}}: se(3) \rightarrow se(3)$, and its first derivative were reported using a $3\times 3$ block partitioning. In this paper, the differential, its first and second derivative, as well as the Jacobian and Hessian of the evaluation maps, $\mathbf{dexp}_{\mathbf{X}}\mathbf{Z}$ and $\mathbf{dexp}_{\mathbf{X}}^{T}\mathbf{Z}$, are reported avoiding the block partitioning. For all of them, higher-order approximations are derived. Besides their compactness, the advantage of the presented closed form relations is their numerical robustness when combined with the local approximation. The formulations are demonstrated for computation of the deformation field and the strain rates of an elastic Cosserat-Simo-Reissner rod.
A Kinematic Analysis of Palm Degrees of Freedom for Enhancing Thumb Opposability in Robotic Hands
This study investigates the kinematic role of palm degrees of freedom (DoF) in enhancing thumb opposability in a five-finger robotic hand. A hand model consisting of a five DoF thumb and four fingers with three to four DoF is analyzed, where palm motion is introduced between adjacent fingers. To quantitatively evaluate thumb-finger interaction, the overlap workspace volume is defined based on voxelized fingertip reachable regions. Seven cases are considered, including configurations with increased total DoF and configurations in which the total DoF is maintained by redistributing DoF from the fingers to the palm. The results show that palm DoF significantly improves opposability, particularly for the ring and little fingers, by repositioning their base locations rather than simply extending their reachable range. However, when the total DoF is constrained, redistributing DoF to the palm leads to trade-offs between overlap workspace expansion and kinematic redundancy. These findings indicate that palm DoF and finger DoF play distinct roles in hand kinematics and should be considered jointly in design. This study provides a quantitative framework for evaluating palm-induced opposability without relying on object or contact models and offers practical design guidelines for incorporating palm motion in robotic hands.
comment: This manuscript has been submitted for possible publication
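An overlap workspace volume based on voxelized fingertip reachable regions can be computed along the following lines. This is a minimal sketch with hypothetical function names; it assumes the fingertip positions have already been sampled (for example by forward kinematics over joint-angle samples, which is not shown) and it does not reproduce the hand model from the study.

```python
import numpy as np

def voxel_set(points, voxel_size):
    """Set of integer voxel indices occupied by a cloud of 3D points."""
    idx = np.floor(np.asarray(points, dtype=float) / voxel_size).astype(int)
    return set(map(tuple, idx))

def overlap_volume(thumb_points, finger_points, voxel_size):
    """Volume of the voxels reachable by both the thumb and a finger."""
    shared = voxel_set(thumb_points, voxel_size) & voxel_set(finger_points, voxel_size)
    return len(shared) * voxel_size ** 3
```

The result depends on the voxel size and on how densely the joint space is sampled, so both should be held fixed when comparing hand configurations.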
False Feasibility in Variable Impedance MPC for Legged Locomotion
Variable impedance model predictive control (MPC) formulations that treat joint stiffness as an instantaneous decision variable operate on a feasible set strictly larger than the physically realizable set under first-order actuator dynamics. We identify this as a formulation error rather than a modeling approximation, formalize the distinction between the parameter-based feasible set $F_{\mathrm{param}}$ and the realizable set $F_{\mathrm{real}}$, and characterize the regime of mismatch via the dimensionless parameter $\alpha = \omega_s T$ (actuator bandwidth times task timescale). For the 1D hopping monoped, we prove that below an analytical threshold $\alpha_{\mathrm{crit}}$ derived in closed form from task physics, no admissible stiffness command realizes the parameter-based prediction. Numerical validation in 1D shows monotonic deviation growth as $\alpha$ decreases, with the predicted scaling holding across ten parameter combinations (log-log $R^2 = 0.99$). Mechanism transfer to planar spring-loaded inverted pendulum dynamics confirms center-of-mass and stance-timing deviation as the primary consequence, with regime-dependent friction effects as a tertiary observable. A second threshold $\alpha_{\mathrm{infeas}} < \alpha_{\mathrm{crit}}$ establishes a floor below which restricting the admissible stiffness range cannot repair realizability, closing the conservative-tuning objection on structural grounds. Augmenting the prediction state with stiffness closes the mismatch by construction.
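The role of the bandwidth-timescale product can be illustrated with the step response of a first-order stiffness actuator. This sketch is not the threshold derivation from the abstract: it assumes the lag model dk/dt = omega_s (k_cmd - k) with a constant command, and the function names are hypothetical.

```python
import math

def realized_stiffness(k0, k_cmd, omega_s, t):
    """Stiffness at time t when a constant command k_cmd is applied from k0.

    Closed-form solution of dk/dt = omega_s * (k_cmd - k), k(0) = k0.
    """
    return k_cmd + (k0 - k_cmd) * math.exp(-omega_s * t)

def tracking_fraction(alpha):
    """Fraction of a commanded stiffness step realized within the task
    timescale, where alpha is the bandwidth times the task timescale."""
    return 1.0 - math.exp(-alpha)
```

With a product of about 0.69 (that is, ln 2) only half of a commanded stiffness change is realized within the task timescale, whereas a formulation that treats stiffness as an instantaneous decision variable assumes the full change is available immediately.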
Learning Control Policies to Provably Satisfy Hard Affine Constraints for Black-Box Hybrid Dynamical Systems
Ensuring safety for black-box hybrid dynamical systems presents significant challenges due to their instantaneous state jumps and unknown explicit nonlinear dynamics. Existing solutions for strict safety constraint satisfaction, like control barrier functions (CBFs) and reachability analysis, rely on direct knowledge of the dynamics. Similarly, safe reinforcement learning (RL) approaches often rely on known system dynamics or merely discourage safety violations through reward shaping. In this work, we want to learn RL policies which provably satisfy affine state constraints in closed loop for black-box hybrid dynamical systems with affine reset maps. Our key insight is forcing the RL policy to be affine and repulsive near the constraint boundaries for the unknown nonlinear dynamics of the system, providing guarantees that the trajectories will not violate the constraint. We further account for constraint violation due to instantaneous state jumps that occur due to impacts or reset maps in the hybrid system by introducing a second repulsive affine region before the reset that prevents post-reset states from violating the constraint. We derive sufficient conditions under which these policies satisfy safety constraints in closed loop. We also compare our approach with state-of-the-art reward shaping and learned-CBF methods on hybrid dynamical systems like the constrained pendulum and paddle juggler environments. In both scenarios, we show that our methodology learns higher quality policies while always satisfying the safety constraints.
CodeGraphVLP: Code-as-Planner Meets Semantic-Graph State for Non-Markovian Vision-Language-Action Models
Vision-Language-Action (VLA) models promise generalist robot manipulation, but are typically trained and deployed as short-horizon policies that assume the latest observation is sufficient for action reasoning. This assumption breaks in non-Markovian long-horizon tasks, where task-relevant evidence can be occluded or appear only earlier in the trajectory, and where clutter and distractors make fine-grained visual grounding brittle. We present CodeGraphVLP, a hierarchical framework that enables reliable long-horizon manipulation by combining a persistent semantic-graph state with an executable code-based planner and progress-guided visual-language prompting. The semantic-graph maintains task-relevant entities and relations under partial observability. The synthesized planner executes over this semantic-graph to perform efficient progress checks and outputs a subtask instruction together with subtask-relevant objects. We use these outputs to construct clutter-suppressed observations that focus the VLA executor on critical evidence. On real-world non-Markovian tasks, CodeGraphVLP improves task completion over strong VLA baselines and history-enabled variants while substantially lowering planning latency compared to VLM-in-the-loop planning. We also conduct extensive ablation studies to confirm the contributions of each component.
Learning-augmented robotic automation for real-world manufacturing
Industrial robots are widely used in manufacturing, yet most manipulation still depends on fixed waypoint scripts that are brittle to environmental changes. Learning-based control offers a more adaptive alternative, but it remains unclear whether such methods, still mostly confined to laboratory demonstrations, can sustain hours of reliable operation, deliver consistent quality, and behave safely around people on a live production line. Here we present Learning-Augmented Robotic Automation, a hybrid system that integrates learned task controllers and a neural 3D safety monitor into conventional industrial workflows. We deployed the system on an electric-motor production line to automate deformable cable insertion and soldering under real manufacturing constraints, a step previously performed manually by human workers. With less than 20 min of real-world data per task, the system operated continuously for 5 h 10 min, producing 108 motors without physical fencing and achieving a 99.4% pass rate on product-level quality-control tests. It maintained near-human takt time while reducing variability in solder-joint quality and cycle time. These results establish a practical pathway for extending industrial automation with learning-based methods.
An LLM-Driven Closed-Loop Autonomous Learning Framework for Robots Facing Uncovered Tasks in Open Environments
Autonomous robots operating in open environments need the ability to continuously handle tasks that are not covered by predefined local methods. However, existing approaches often rely on repeated large-language-model (LLM) interaction for uncovered tasks, and even successful executions or observed successful external behaviors are not always autonomously transformed into reusable local knowledge. In this paper, we propose an LLM-driven closed-loop autonomous learning framework for robots facing uncovered tasks in open environments. The proposed framework first retrieves the local method library to determine whether a reusable solution already exists for the current task or observed event. If no suitable method is found, it triggers an autonomous learning process in which the LLM serves as a high-level reasoning component for task analysis, candidate model selection, data collection planning, and execution or observation strategy organization. The robot then learns from both self-execution and active observation, performs quasi-real-time training and adjustment, and consolidates the validated result into the local method library for future reuse. Through this recurring closed-loop process, the robot gradually converts both execution-derived and observation-derived experience into reusable local capability while reducing future dependence on repeated external LLM interaction. Results show that the proposed framework reduces execution time and LLM dependence in both repeated-task self-execution and observation-driven settings, for example reducing the average total execution time from 7.7772 s to 6.7779 s and the average number of LLM calls per task from 1.0 to 0.2 in the repeated-task self-execution experiments.
V-STC: A Time-Efficient Multi-Vehicle Coordinated Trajectory Planning Approach
Coordinating the motions of multiple autonomous vehicles (AVs) requires planning frameworks that ensure safety while making efficient use of space and time. This paper presents a new approach, termed variable-time-step spatio-temporal corridor (V-STC), that enhances the temporal efficiency of multi-vehicle coordination. An optimization model is formulated to construct a V-STC for each AV, in which both the spatial configuration of the corridor cubes and their time durations are treated as decision variables. By allowing the corridor's spatial position and time step to vary, the constructed V-STC reduces the overall temporal occupancy of each AV while maintaining collision-free separation in the spatio-temporal domain. Based on the generated V-STC, a dynamically feasible trajectory is then planned independently for each AV. Simulation studies demonstrate that the proposed method achieves safe multi-vehicle coordination and yields more time-efficient motion compared with existing STC approaches.
comment: 12 pages, 23 figures
Energy-Efficient Multi-Robot Coverage Path Planning of Non-Convex Regions of Interests
This letter presents an energy-efficient multi-robot coverage path planning (MRCPP) framework for large, nonconvex Regions of Interest (ROI) containing obstacles and no-fly zones (NFZ). Existing minimum-energy coverage planning algorithms utilize meta-heuristic boustrophedon workspace decomposition. Therefore, even with minimum energy objectives and energy consumption constraints, they cannot achieve optimal energy efficiency. Moreover, most existing frameworks support only a single type of robotic platform. MRCPP overcomes these limitations by: performing globally informed swath generation, creating parallel sweeping paths with minimal turns, calculating safety buffers to ensure safe turning clearance, using an efficient mTSP solver to balance workloads and minimize mission time, and connecting disjoint segments via a modified visibility graph that tracks heading angles while maintaining transitions within safe regions. The efficacy of the proposed MRCPP framework is demonstrated through real-world experiments involving autonomous aerial vehicles (AAVs) and autonomous surface vehicles (ASVs). Evaluations demonstrate that the proposed MRCPP consistently outperforms state-of-the-art planners, reducing average total energy consumption by 3% to 40% for a team of 3 robots and computation time by an order of magnitude, while maintaining balanced workload distribution and strong scalability across increasing fleet sizes. The MRCPP framework is released as an open-source package and videos of real-world and simulated experiments are available at https://mrc-pp.github.io.
comment: Accepted in "Robotics and Automation Letters (RAL)"
dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model
Evaluating robotics policies across thousands of environments and thousands of tasks is infeasible with existing approaches. This motivates the need for a new methodology for scalable robotics policy evaluation. In this paper, we propose dWorldEval, which uses a discrete diffusion world model as a scalable evaluation proxy for robotics policies. Specifically, dWorldEval maps all modalities - including vision, language, and robotic actions - into a unified token space, modeling them via a single transformer-based denoising network. Building on this architecture, we employ a sparse keyframe memory to maintain spatiotemporal consistency. We also introduce a progress token that indicates the degree of task completion. At inference, the model jointly predicts future observations and the progress token, allowing it to automatically determine success when the progress reaches 1. Extensive experiments demonstrate that dWorldEval significantly outperforms previous approaches, i.e., WorldEval, Ctrl-World, and WorldGym, on LIBERO, RoboTwin, and multiple real-robot tasks. It paves the way for a new architectural paradigm in building world simulators for robotics evaluation at scale.
PAGaS: Pixel-Aligned 1DoF Gaussian Splatting for Depth Refinement
Gaussian Splatting (GS) has emerged as an efficient approach for high-quality novel view synthesis. While early GS variants struggled to accurately model the scene's geometry, recent advancements constraining the Gaussians' spread and shapes, such as 2D Gaussian Splatting, have significantly improved geometric fidelity. In this paper, we present Pixel-Aligned 1DoF Gaussian Splatting (PAGaS) that adapts the GS representation from novel view synthesis to the multi-view stereo depth task. Our key contribution is modeling a pixel's depth using one-degree-of-freedom (1DoF) Gaussians that remain tightly constrained during optimization. Unlike existing approaches, our Gaussians' positions and sizes are restricted by the back-projected pixel volumes, leaving depth as the sole degree of freedom to optimize. PAGaS produces highly detailed depths, as illustrated in Figure 1. We quantitatively validate these improvements on top of reference geometric and learning-based multi-view stereo baselines on challenging 3D reconstruction benchmarks. Code: davidrecasens.github.io/pagas
A Lightweight Toggleable Adhesion Prototype for Multirotor UAV Landing on Tilting Platforms
Autonomous multirotor landings on uncrewed surface vessels (USVs) are critical for persistent maritime operations but remain challenging due to wave-induced tilt, wind disturbances, and limited landing area. Many existing approaches exhibit small pose tolerance for reliable landing. This paper presents a lightweight toggleable adhesion mechanism to improve landing reliability. The system uses a motor-driven corkscrew that engages hook-and-loop material on the landing surface, enabling active adhesion during landing and controlled release during takeoff. We evaluate a prototype using a modified Crazyflie 2.0 and a custom tilting platform at fixed angles representative of extreme wave conditions. Using only a simple vertical PID controller, the proposed approach increases landing success from an average of 40% (baseline) to 80% across platform tilts up to 43 degrees using appropriately selected actuation settings.
comment: To be published in the proceedings of the International Conference on Unmanned Aircraft Systems (ICUAS) 2026
RL Token: Bootstrapping Online RL with Vision-Language-Action Models
Vision-language-action (VLA) models can learn to perform diverse manipulation skills "out of the box," but achieving the precision and speed that real-world tasks demand requires further fine-tuning -- for example, via reinforcement learning (RL). We introduce a lightweight method that enables sample-efficient online RL fine-tuning of pretrained VLAs using just a few hours of real-world practice. We (1) adapt the VLA to expose an "RL token," a compact readout representation that preserves task-relevant pretrained knowledge while serving as an efficient interface for online RL, and (2) train a small actor-critic head on this RL token to refine the actions, while anchoring the learned policy to the VLA. Online RL with the RL token (RLT) makes it possible to fine-tune even large VLAs with RL quickly and efficiently. Across four real-robot tasks (screw installation, zip tie fastening, charger insertion, and Ethernet insertion), RLT improves the speed on the hardest part of the task by up to 3x and raises success rates significantly within minutes to a few hours of practice. It can even surpass the speed of human teleoperation on some of the tasks.
Control Barrier Functions Solved with Hierarchical Quadratic Programming for Safe Physical Human-Robot Interaction
Physical human-robot interaction offers the potential to leverage human intelligence and robot physical capabilities to enable a range of exciting applications, e.g., collaborative robots for rehabilitation. Safety is critical for the successful deployment of this kind of robotic system. In recent years, Control Barrier Functions (CBFs) have emerged as an effective approach to enforce safety guarantees and have been widely applied in various applications, from adaptive cruise control to navigation of legged robots. CBF constraints can be enforced by solving a Quadratic Programming (QP) problem, which can include many CBF-formulated tasks. To manage a large number of safety tasks, a hierarchical CBF has been used to allow hierarchical relaxation of safety tasks to ensure the feasibility of a solution in the presence of conflicting tasks. In this work, we propose to use a CBF-based Hierarchical Quadratic Programming (HQP) framework in physical human-robot interaction to allow us to design both performance tasks (e.g., preserve the desired behavior at the human-robot interaction point) and safety tasks at any level of a hierarchy to balance the safety and the performance in a more flexible way. Extensive experiments were carried out on a real redundant robot to validate the effectiveness, flexibility, and generality of this approach.
comment: 8 pages, 8 figures
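As a rough illustration of how a CBF constraint turns into a QP, the sketch below solves the smallest possible case: one affine constraint a·u + b >= 0 on the control input, with the objective of staying as close as possible to a nominal command. This is the standard single-constraint CBF safety filter, which has a closed-form solution; it is not the hierarchical (HQP) formulation of the paper. The function name `cbf_qp_filter`, the single-integrator dynamics, the barrier h(x) = x_max - x[0], and the gain alpha are all assumptions chosen for the example.

```python
import numpy as np

def cbf_qp_filter(u_nom, a, b):
    """Closed-form solution of: min ||u - u_nom||^2  s.t.  a @ u + b >= 0.

    For a control-affine system with barrier h, the CBF condition
    L_f h + L_g h u + alpha * h >= 0 is affine in u, i.e. of this form.
    """
    u_nom = np.asarray(u_nom, dtype=float)
    a = np.asarray(a, dtype=float)
    margin = a @ u_nom + b
    if margin >= 0 or not np.any(a):
        # Nominal command already satisfies the constraint (or cannot affect it).
        return u_nom
    # Project u_nom onto the boundary of the half-space a @ u + b >= 0.
    return u_nom - (margin / (a @ a)) * a

# Hypothetical example: single integrator x_dot = u, keep x[0] <= x_max.
# Barrier h(x) = x_max - x[0], so h_dot = -u[0] and the CBF condition is
# -u[0] + alpha * h(x) >= 0.
x_max, alpha = 1.0, 2.0
a = np.array([-1.0, 0.0])
u_nom = np.array([1.0, 0.5])

x_near = np.array([0.9, 0.0])   # close to the boundary: command gets clipped
u_safe_near = cbf_qp_filter(u_nom, a, alpha * (x_max - x_near[0]))

x_far = np.array([0.0, 0.0])    # far from the boundary: command passes through
u_safe_far = cbf_qp_filter(u_nom, a, alpha * (x_max - x_far[0]))

print(u_safe_near, u_safe_far)
```

With several conflicting constraints no closed form exists in general, which is where a QP solver and a priority ordering such as the hierarchy described above come in.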
Equivariant Filter for Radar-Inertial Odometry
Radar-Inertial Odometry (RIO) based on the Extended Kalman Filter (EKF) relies on accurate extrinsic calibration between the radar and the Inertial Measurement Unit (IMU) and is sensitive to disturbances, as large linearization errors can degrade performance or even cause divergence. To address these limitations, this letter proposes an Equivariant Filter (EqF) for RIO based on a Lie group symmetry that geometrically couples navigation states and IMU biases, extending it to incorporate radar-IMU extrinsic calibration and multi-state constraint updates. This equivariant formulation inherently preserves consistency and enhances robustness, enabling reliable state estimation even under poor or completely wrong initialization of calibration states. Real-world experiments on two different Uncrewed Aerial Vehicles (UAVs) show that the proposed EqF-RIO achieves state-of-the-art accuracy under correct extrinsic calibration and offers improved convergence under large calibration errors, where the conventional EKF-RIO fails. Evaluation code is open-sourced.
GenAssets: Generating in-the-wild 3D Assets in Latent Space CVPR 2025
High-quality 3D assets for traffic participants are critical for multi-sensor simulation, which is essential for the safe end-to-end development of autonomy. Building assets from in-the-wild data is key for diversity and realism, but existing neural-rendering based reconstruction methods are slow and generate assets that render well only from viewpoints close to the original observations, limiting their usefulness in simulation. Recent diffusion-based generative models build complete and diverse assets, but perform poorly on in-the-wild driving scenes, where observed actors are captured under sparse and limited fields of view, and are partially occluded. In this work, we propose a 3D latent diffusion model that learns on in-the-wild LiDAR and camera data captured by a sensor platform and generates high-quality 3D assets with complete geometry and appearance. Key to our method is a "reconstruct-then-generate" approach that first leverages occlusion-aware neural rendering trained over multiple scenes to build a high-quality latent space for objects, and then trains a diffusion model that operates on the latent space. We show our method outperforms existing reconstruction and generation based methods, unlocking diverse and scalable content creation for simulation.
comment: CVPR 2025. Project page: https://waabi.ai/genassets
Vision-Language-Action in Robotics: A Survey of Datasets, Benchmarks, and Data Engines
Despite remarkable progress in Vision-Language-Action (VLA) models, a central bottleneck remains underexamined: the data infrastructure that underlies embodied learning. In this survey, we argue that future advances in VLA will depend less on model architecture and more on the co-design of high-fidelity data engines and structured evaluation protocols. To this end, we present a systematic, data-centric analysis of VLA research organized around three pillars: datasets, benchmarks, and data engines. For datasets, we categorize real-world and synthetic corpora along embodiment diversity, modality composition, and action space formulation, revealing a persistent fidelity-cost trade-off that fundamentally constrains large-scale collection. For benchmarks, we analyze task complexity and environment structure jointly, exposing structural gaps in compositional generalization and long-horizon reasoning evaluation that existing protocols fail to address. For data engines, we examine simulation-based, video-reconstruction, and automated task-generation paradigms, identifying their shared limitations in physical grounding and sim-to-real transfer. Synthesizing these analyses, we distill four open challenges: representation alignment, multimodal supervision, reasoning assessment, and scalable data generation. Addressing them, we argue, requires treating data infrastructure as a first-class research problem rather than a background concern.
comment: This is a survey paper. The survey is already accepted by TMLR after peer-review. The OpenReview link is here: https://openreview.net/forum?id=tAaWFpvnmm
Learning from the Best: Smoothness-Driven Metrics for Data Quality in Imitation Learning
In behavioral cloning (BC), policy performance is fundamentally limited by demonstration data quality. Real-world datasets contain trajectories of varying quality due to operator skill differences, teleoperation artifacts, and procedural inconsistencies, yet standard BC treats all demonstrations equally. Existing curation methods require costly policy training in the loop or manual annotation, limiting scalability. We propose RINSE (Ranking and INdexing Smooth Examples), a lightweight framework for scoring demonstrations based on trajectory smoothness that is policy-architecture-agnostic and operates on trajectory data alone, with TED additionally using a phase-boundary/contact signal. Grounded in motor control theory, which establishes smoothness as a hallmark of skilled movement, RINSE uses two complementary metrics: Spectral Arc Length (SAL), a spectral measure of frequency-domain regularity, and Trajectory-Envelope Distance (TED), a spatial measure of contact-aware geometric deviation. We show that smoothness filtering can reduce the conditional action variance of the retained data distribution, with downstream effects that can be amplified by action chunking and compounding error. On RoboMimic benchmarks, SAL filtering achieves 16% higher success using one-sixth of the data. On real-world manipulation, TED filtering achieves 20% improvement with half the data. As a retrieval-stage filter within STRAP on LIBERO-10, RINSE re-ranking improves mean success by 5.6%. As soft weights in Re-Mix domain reweighting, RINSE scores produce domain allocations highly correlated with the learned Re-Mix allocations (Spearman $ρ\geq 0.89$). These results support smoothness as a useful quality signal across filtering, retrieval, and reweighting settings, especially in noisy or heterogeneous data regimes.
comment: 8 pages, 5 figures
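To make the SAL metric concrete, here is a minimal sketch of spectral arc length as it is commonly defined in the motor-control literature: the negative arc length of the normalized Fourier magnitude spectrum of a speed profile, up to a cutoff frequency. Whether RINSE uses exactly this variant is an assumption; the cutoff `fc`, the zero-padding level, and the synthetic minimum-jerk and oscillating speed profiles are illustrative choices, not values from the paper. TED is not sketched because its definition is not given in the abstract.

```python
import numpy as np

def spectral_arc_length(speed, fs, fc=10.0, pad_level=4):
    """Negative arc length of the normalized magnitude spectrum of `speed`.

    speed: 1-D speed profile (assumed to have a nonzero mean)
    fs:    sampling rate in Hz
    fc:    cutoff frequency in Hz (hypothetical default)
    Values closer to zero indicate a smoother movement.
    """
    speed = np.asarray(speed, dtype=float)
    nfft = int(2 ** (np.ceil(np.log2(len(speed))) + pad_level))
    freqs = np.arange(nfft) * fs / nfft
    mag = np.abs(np.fft.fft(speed, nfft))   # zero-padded FFT magnitude
    mag = mag / mag[0]                      # normalize by the DC component
    keep = freqs <= fc
    f_sel, m_sel = freqs[keep], mag[keep]
    # Arc length of the curve (normalized frequency, normalized magnitude).
    df = np.diff(f_sel) / (f_sel[-1] - f_sel[0])
    dm = np.diff(m_sel)
    return -float(np.sum(np.sqrt(df ** 2 + dm ** 2)))

# Synthetic data: a minimum-jerk speed profile and a version with an
# 8 Hz oscillation superimposed to mimic a jerky demonstration.
fs = 100.0
t = np.linspace(0.0, 1.0, 101)
smooth = 30.0 * t ** 2 * (1.0 - t) ** 2
jerky = smooth * (1.0 + 0.5 * np.sin(2.0 * np.pi * 8.0 * t))

sal_smooth = spectral_arc_length(smooth, fs)
sal_jerky = spectral_arc_length(jerky, fs)
print(sal_smooth, sal_jerky)
```

Under this definition a filter would keep the demonstrations whose score is closest to zero; the oscillating profile scores lower because the extra spectral peak lengthens the curve.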
Efficient Image Annotation via Semi-Supervised Object Segmentation with Label Propagation
Reliable object perception is necessary for general-purpose service robots. Open-vocabulary detectors struggle to generalize beyond a few classes and fully supervised training of object detectors requires time-intensive annotations. We present a semi-supervised label propagation approach for household object segmentation. A segment proposer generates class-agnostic masks, and an ensemble of Hopfield networks assigns labels by learning representative embeddings in complementary foundation model embedding spaces (CLIP, ViT, Theia). Our approach scales to 50 object classes with limited annotation overhead and can automatically label 60% of the data in a RoboCup@Home setting, where preparation time is severely constrained. Dataset and code are publicly available at https://github.com/ais-bonn/label_propagation.
comment: 12 pages, 6 figures, 7 tables, submitted to RoboCup 2026 Symposium
Collaborative Trajectory Prediction via Late Fusion
Predicting future trajectories of surrounding traffic agents is critical for safe autonomous navigation and collision avoidance. Despite all advances in the trajectory forecasting realm, prediction models remain vulnerable to uncertainty caused by occlusions, limited sensing range, and perception errors. Collaborative vehicle-to-vehicle (V2V) approaches help reduce this uncertainty by sharing complementary information. Existing collaborative trajectory prediction methods typically fuse feature maps at the perception stage to construct a holistic scene view. This holistic representation is then decoded into the future trajectories. Such a design incurs substantial communication overhead due to the exchange of high-dimensional feature representations and often assumes idealized bandwidth and synchronization, limiting practical deployment. We address these limitations by shifting collaboration from perception to the prediction module and introducing a late-fusion framework for shared forecasts. The framework is model-agnostic and treats collaborating vehicles as independent asynchronous agents. We evaluate the approach on the OPV2V, V2V4Real, and DeepAccident datasets, comparing individual and collaborative forecasting. Across all datasets, late fusion consistently reduces miss rate and improves trajectory success rate ($\mathrm{TSR}_{0.5}$), defined as the fraction of ground-truth agents with final displacement error below 0.5 m. On the real-world V2V4Real dataset, collaborative prediction improves the success rate by $1.69\%$ and $1.22\%$ for the two intelligent vehicles, respectively, compared with individual forecasting.
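The $\mathrm{TSR}_{0.5}$ definition above can be sketched in a few lines. The snippet assumes each agent has a single predicted final position already matched to its ground-truth final position; the function name `trajectory_success_rate` and the sample coordinates are made up for illustration and do not come from the paper or its datasets.

```python
import numpy as np

def trajectory_success_rate(pred_final, gt_final, threshold=0.5):
    """Fraction of agents whose final displacement error is below `threshold` metres.

    pred_final, gt_final: arrays of shape (num_agents, 2) with final (x, y) positions.
    """
    pred_final = np.asarray(pred_final, dtype=float)
    gt_final = np.asarray(gt_final, dtype=float)
    fde = np.linalg.norm(pred_final - gt_final, axis=-1)  # per-agent final displacement error
    return float(np.mean(fde < threshold))

# Hypothetical final positions for four agents (metres).
gt = np.array([[0.0, 0.0], [10.0, 5.0], [3.0, 4.0], [7.0, 1.0]])
pred = np.array([[0.1, 0.2], [10.0, 5.6], [3.3, 4.0], [9.0, 1.0]])

tsr = trajectory_success_rate(pred, gt)
print(tsr)  # two of the four agents end within 0.5 m, so 0.5
```

How predicted agents are associated with ground-truth agents before this step, and how multi-modal forecasts are reduced to one endpoint, are left open here.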
RecoverFormer: End-to-End Contact-Aware Recovery for Humanoid Robots
Humanoid robots operating in unstructured environments must recover from unexpected disturbances, a capability that remains challenging for end-to-end control policies. We present RECOVERFORMER, a fully end-to-end humanoid recovery policy that learns when and how to switch among recovery behaviors (including compensatory stepping, hand-environment contact, and center-of-mass reshaping) while maintaining robust performance under model mismatch. The architecture combines a causal transformer over a 50-step observation history with two novel heads: a latent recovery mode that enables smooth transitions among distinct recovery strategies, and a contact affordance head that predicts which environmental surfaces (walls, railings, table edges) are beneficial for stabilization. We evaluate RECOVERFORMER on the Unitree G1 humanoid in MuJoCo. Trained only on open floor, RECOVERFORMER transfers zero-shot to walled environments, achieving 100% recovery success across 100-300 N pushes and across wall distances from 0.25 to 1.4 m. Under zero-shot dynamics mismatch, RECOVERFORMER reaches 75.5% at +25% mass, 89% under 30 ms latency, 91.5% at low friction, and 99% under compound friction, latency, and mass perturbation. The learned latent modes specialize across force regimes without mode-level supervision, validated by t-SNE analysis of 300 episodes. Taken together, these results show that a single end-to-end policy can deliver multi-modal, contact-aware humanoid recovery that generalizes across perturbation magnitude, contact geometry, and dynamics shift.
Magnetic Indoor Localization through CNN Regression and Rotation Invariance
Indoor positioning is an essential technology for a wide range of applications in GNSS-denied environments, including indoor navigation and IoT systems. Combining convolutional neural networks (CNNs) and magnetic field-based features offers a low-cost, infrastructure-free solution for precise positioning. While magnetic fingerprints are a promising approach for indoor positioning, models trained on raw 3D magnetometer data are highly sensitive to device orientation. We address this by using two rotation-invariant features derived from the 3D magnetic field: the norm (Mn) and the projection onto the gravity axis (Mg). We train a lightweight 7-layer dilated CNN (MagNetS/XL) on magnetic sequences to directly regress (x, y) positions. Using the MagPie dataset (three buildings, handheld trajectories), we systematically evaluate fixed and random rotations of test and/or train data. Raw 3D inputs (Mx, My, Mz) exhibit isotropic error increases under fixed 90° rotations and further degrade with growing random rotations. In contrast, 2D (Mn, Mg) inputs maintain rotation-invariant accuracy and surpass the 3D inputs once rotation exceeds building-specific thresholds for three reference buildings: 0° for Loomis (large), 5° for Talbot (medium), and 6° for CSL (small). MagNetXL achieves or exceeds state-of-the-art accuracy on the MagPie dataset, and MagNetS delivers similar performance with roughly one third of the parameters, favoring mobile deployment. These results show that the robustness gained from rotation-invariant inputs outweighs the loss of input dimensionality in realistic usage, allowing mapping and localization without orientation alignment or added infrastructure.
comment: Published and presented at the 2026 4th International Conference on Mechatronics, Control and Robotics (ICMCR)
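The two features named above, the field norm (Mn) and the projection onto the gravity axis (Mg), can be sketched as follows, together with a check that they do not change when the device is rotated. The sketch assumes a gravity vector is available in the same device frame as the magnetometer reading (for example from an accelerometer), which the abstract does not spell out; the function names and the random sample data are hypothetical.

```python
import numpy as np

def rotation_invariant_features(mag, gravity):
    """Return (Mn, Mg) per sample.

    mag:     (N, 3) magnetometer readings in the device frame
    gravity: (N, 3) gravity vectors in the same device frame (assumed available)
    """
    mag = np.asarray(mag, dtype=float)
    gravity = np.asarray(gravity, dtype=float)
    g_unit = gravity / np.linalg.norm(gravity, axis=-1, keepdims=True)
    mn = np.linalg.norm(mag, axis=-1)      # field magnitude
    mg = np.sum(mag * g_unit, axis=-1)     # component along gravity
    return mn, mg

def rotation_matrix(axis, angle_rad):
    """Rotation about an arbitrary axis via Rodrigues' formula."""
    axis = np.asarray(axis, dtype=float)
    x, y, z = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle_rad) * K + (1.0 - np.cos(angle_rad)) * (K @ K)

# Synthetic readings; rotating the device rotates both vectors identically.
rng = np.random.default_rng(0)
mag = rng.normal(size=(5, 3)) * 30.0
gravity = np.tile([0.0, 0.0, 9.81], (5, 1))
R = rotation_matrix([1.0, 2.0, 3.0], np.deg2rad(90.0))
mag_rot, grav_rot = mag @ R.T, gravity @ R.T

mn, mg = rotation_invariant_features(mag, gravity)
mn_rot, mg_rot = rotation_invariant_features(mag_rot, grav_rot)
print(np.allclose(mn, mn_rot), np.allclose(mg, mg_rot))
```

The raw (Mx, My, Mz) values change under the same rotation, which is the sensitivity the abstract describes; the price is that three input channels are reduced to two.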
Recent Advances in Multi-Agent Human Trajectory Prediction: A Comprehensive Review
With the emergence of powerful data-driven methods in human trajectory prediction (HTP), gaining a finer understanding of multi-agent interactions lies within hand's reach, with important implications in areas such as social robot navigation, autonomous driving, and crowd modeling. This survey reviews some of the most recent advancements in deep learning-based multi-agent trajectory prediction, focusing on studies published between 2020 and 2025. We categorize the existing methods based on their architectural design, their input representations, and their overall prediction strategies, placing a particular emphasis on models evaluated using the ETH/UCY benchmark. Furthermore, we highlight key challenges and future research directions in the field of multi-agent HTP.
comment: 40 pages
Evidence of an Emergent "Self" in Continual Robot Learning
A key challenge to understanding self-awareness has been a principled way of quantifying whether an intelligent system has a concept of a "self", and if so how to differentiate the "self" from other cognitive structures. We propose that the "self" can be isolated by seeking the invariant portion of cognitive process that changes relatively little compared to more rapidly acquired cognitive knowledge and skills, because our self is the most persistent aspect of our experiences. We used this principle to analyze the cognitive structure of robots under two conditions: One robot learns a constant task, while a second robot is subjected to continual learning under variable tasks. We find that robots subjected to continual learning develop an invariant subnetwork that is significantly more stable (p < 0.001) compared to the control, and that this subnetwork is also functionally important: preserving it aids adaptation while damaging it impairs performance. We suggest that this principle can offer a window into exploring selfhood in other cognitive AI systems.
comment: 43 pages, 22 figures, includes supplementary materials
PokeVLA: Empowering Pocket-Sized Vision-Language-Action Model with Comprehensive World Knowledge Guidance
Recent advances in Vision-Language-Action (VLA) models have opened new avenues for robot manipulation, yet existing methods exhibit limited efficiency and a lack of high-level knowledge and spatial awareness. To address these challenges, we propose PokeVLA, a lightweight yet powerful foundation model for embodied manipulation that effectively infuses vision-language understanding into action learning. Our framework introduces a two-stage training paradigm: first, we pre-train a compact vision-language model (PokeVLM) on a curated multimodal dataset of 2.4M samples encompassing spatial grounding, affordance, and embodied reasoning tasks; second, we inject manipulation-relevant representations into the action space through multi-view goal-aware semantics learning, geometry alignment, and a novel action expert. Extensive experiments demonstrate state-of-the-art performance on the LIBERO-Plus benchmark and in real-world deployment, outperforming comparable baselines in success rate and robustness under diverse perturbations. To foster reproducibility and community progress, we will open-source our code, model weights, and the scripts for the curated pre-training dataset. Project page: https://getterupper.github.io/PokeVLA
How attention simplifies mental representations for planning
Human planning is efficient--it frugally deploys limited cognitive resources to accomplish difficult tasks--and flexible--adapting to novel problems and environments. Computational approaches suggest that people construct simplified mental representations of their environment, balancing the complexity of a task representation with its utility. These models imply a nested optimisation in which planning shapes perception, and perception shapes planning--but the perceptual and attentional mechanisms governing how this interaction unfolds remain unknown. Here, we harness virtual maze navigation to characterise how spatial attention controls which aspects of a task representation enter subjective awareness and are available for planning. We find that spatial proximity governs which aspects of a maze are available for planning, and that when task-relevant information follows natural (lateralized) contours of attention, people can more easily construct simplified and useful maze representations. This influence of attention varies considerably across individuals, explaining differences in people's task representations and behaviour. Inspired by the 'spotlight of attention' analogy, we incorporate the effects of visuospatial attention into existing computational accounts of value-guided construal. Together, our work bridges computational perspectives on perception and decision-making to better understand how individuals represent their environments in aid of planning.
DVGT-2: Vision-Geometry-Action Model for Autonomous Driving at Scale
End-to-end autonomous driving has evolved from the conventional paradigm based on sparse perception into vision-language-action (VLA) models, which focus on learning language descriptions as an auxiliary task to facilitate planning. In this paper, we propose an alternative Vision-Geometry-Action (VGA) paradigm that advocates dense 3D geometry as the critical cue for autonomous driving. As vehicles operate in a 3D world, we think dense 3D geometry provides the most comprehensive information for decision-making. However, most existing geometry reconstruction methods (e.g., DVGT) rely on computationally expensive batch processing of multi-frame inputs and cannot be applied to online planning. To address this, we introduce a streaming Driving Visual Geometry Transformer (DVGT-2), which processes inputs in an online manner and jointly outputs dense geometry and trajectory planning for the current frame. We employ temporal causal attention and cache historical features to support on-the-fly inference. To further enhance efficiency, we propose a sliding-window streaming strategy and use historical caches within a certain interval to avoid repetitive computations. Despite the faster speed, DVGT-2 achieves superior geometry reconstruction performance on various datasets. The same trained DVGT-2 can be directly applied to planning across diverse camera configurations without fine-tuning, including closed-loop NAVSIM and open-loop nuScenes benchmarks.
comment: Code is available at https://github.com/wzzheng/DVGT
How Vulnerable Is My Learned Policy? Universal Adversarial Perturbation Attacks On Modern Behavior Cloning Policies
Learning from demonstrations is a popular approach to train AI models; however, their vulnerability to adversarial attacks remains underexplored. We present the first systematic study of adversarial attacks, across a range of both classic and recently proposed imitation learning algorithms, including Vanilla Behavior Cloning (Vanilla BC), LSTM-GMM, Implicit Behavior Cloning (IBC), Diffusion Policy (DP), and Vector-Quantized Behavior Transformer (VQ-BET). We study the vulnerability of these methods to white-box, grey-box, and black-box adversarial perturbations. Our experiments reveal that most existing methods are highly vulnerable to these attacks, including black-box transfer attacks that transfer across algorithms. To the best of our knowledge, we are the first to study and compare the vulnerabilities of different popular imitation learning algorithms to both white-box and black-box attacks. Our findings highlight the vulnerabilities of modern imitation learning algorithms, paving the way for future work in addressing such limitations. Videos and code are available at https://sites.google.com/view/uap-attacks-on-bc.
An Efficient Real-Time Planning Method for Swarm Robotics Based on an Optimal Virtual Tube
Robot swarms navigating through unknown obstacle environments are an emerging research area that faces challenges. Performing tasks in such environments requires swarms to achieve autonomous localization, perception, decision-making, control, and planning. The limited computational resources of onboard platforms present significant challenges for planning and control. Reactive planners offer low computational demands and high re-planning frequencies but lack predictive capabilities, often resulting in local minima. Multi-step planners can make multi-step predictions to reduce deadlocks, but they require substantial computation, resulting in a lower replanning frequency. This paper proposes a novel homotopic trajectory planning framework for a robot swarm that combines centralized homotopic trajectory planning (optimal virtual tube planning) with distributed control, enabling low-computation, high-frequency replanning, thereby uniting the strengths of multi-step and reactive planners. Based on multi-parametric programming, homotopic optimal trajectories are approximated by affine functions. The resulting approximate solutions have computational complexity $O(n_t)$, where $n_t$ is the number of trajectory parameters. This low complexity makes centralized planning of a large number of optimal trajectories practical and, when combined with distributed control, enables rapid, low-cost replanning. The effectiveness of the proposed method is validated through several simulations and experiments.
comment: 18 pages, 21 figures
Topology-Preserving Scalar Field Optimization for Boundary-Conforming Spiral Toolpaths on Multiply Connected Freeform Surfaces
Multiply connected freeform surface features are widely encountered in industrial components, where toolpath generation often suffers from discontinuities, sharp turns, non-uniform scallop heights, and incomplete boundary coverage. This paper proposes a scalar-field variational optimization method for milling that produces continuous, boundary-conforming, and non-self-intersecting toolpaths with smoother transitions, more uniform spacing, and reduced redundant path length. A feasible singularity-free initial scalar field with boundary-conforming iso-level sets is first constructed via conformal slit mapping. The optimization is then reformulated as a topology-preserving mesh deformation process governed by boundary-synchronous updates, whereby the continuity, boundary-conformity, and non-self-intersection requirements of the toolpath are converted into mesh-shape constraints maintained throughout the iterative optimization. As a result, the proposed method achieves globally optimized path spacing and improved scallop-height uniformity while preserving trajectory smoothness. Milling experiments show that, compared with a state-of-the-art conformal slit mapping-based method, the proposed approach improves machining efficiency by 14.24%, enhances scallop-height uniformity by 5.70%, and reduces milling impact-induced vibrations by over 10%. The proposed strategy provides an effective solution for high-performance machining of complex multiply connected freeform components.
comment: Reorganized the manuscript and added more detailed explanations of the workflow and multiple case studies
Self-Supervised Multisensory Pretraining for Contact-Rich Robot Reinforcement Learning
Effective contact-rich manipulation requires robots to synergistically leverage vision, force, and proprioception. However, Reinforcement Learning agents struggle to learn in such multisensory settings, especially amidst sensory noise and dynamic changes. We propose MultiSensory Dynamic Pretraining (MSDP), a novel framework for learning expressive multisensory representations tailored for task-oriented policy learning. MSDP is based on masked autoencoding and trains a transformer-based encoder by reconstructing multisensory observations from only a subset of sensor embeddings, leading to cross-modal prediction and sensor fusion. For downstream policy learning, we introduce a novel asymmetric architecture, where a cross-attention mechanism allows the critic to extract dynamic, task-specific features from the frozen embeddings, while the actor receives a stable pooled representation to guide its actions. Our method demonstrates accelerated learning and robust performance under diverse perturbations, including sensor noise, and changes in object dynamics. Evaluations in multiple challenging, contact-rich robot manipulation tasks in simulation and the real world showcase the effectiveness of MSDP. Our approach exhibits strong robustness to perturbations and achieves high success rates on the real robot with as few as 6,000 online interactions, offering a simple yet powerful solution for complex multisensory robotic control. Website: https://msdp-pearl.github.io/
comment: 8 pages, 11 figures
OmniVLA-RL: A Vision-Language-Action Model with Spatial Understanding and Online RL
Vision-Language-Action (VLA) models represent a paradigm shift in embodied AI, yet existing frameworks often struggle with imprecise spatial perception, suboptimal multimodal fusion, and instability in reinforcement learning. To bridge these gaps, we propose OmniVLA-RL, a novel architecture that leverages a Mix-of-Transformers (MoT) design to synergistically integrate reasoning, spatial, and action experts. Furthermore, we introduce Flow-GSPO, which reformulates flow matching as a Stochastic Differential Equation (SDE) process and integrates it with Group Segmented Policy Optimization (GSPO) to enhance action precision and training robustness. Extensive evaluations on the LIBERO and LIBERO-Plus benchmarks demonstrate that OmniVLA-RL achieves decent overall performance and surpasses mainstream existing methods, effectively overcoming the fundamental limitations of current VLA models.
Altitude-Adaptive Vision-Only Geo-Localization for UAVs in GPS-Denied Environments
To address the scale mismatch caused by large altitude variations in UAV visual place recognition, we propose a monocular vision-only altitude-adaptive geo-localization framework. The method first estimates relative altitude from a single downward-looking image by transforming the input into the frequency domain and formulating altitude estimation as a regression-as-classification (RAC) problem. The estimated altitude is then used to crop the query image to a canonical scale, after which a classification-then-retrieval visual place recognition module performs coarse localization. To improve retrieval robustness under varying image quality, we further introduce a quality-adaptive margin classifier (QAMC) and refine the final location by weighted coordinate estimation over the top retrieved candidates. Experiments on two synthetic datasets and two real-flight datasets show that the relative altitude estimation (RAE) module yields clear overall improvements in downstream retrieval performance under significant altitude changes. With our visual place recognition module, altitude adaptation improves average R@1 and R@5 by 41.50 and 56.83 percentage points, respectively, compared with using the same retrieval pipeline without altitude normalization, and the full system runs at 13.3 frames/s on the reported workstation hardware. These results indicate that relative altitude estimation provides an effective scale prior for cross-altitude UAV geo-localization and supports GPS-denied coarse initialization without auxiliary range sensors or temporal inputs.
Using Unwrapped Full Color Space Recording to Measure the Exposedness of Vehicle Exterior Parts for External Human Machine Interfaces
One of the concerns with autonomous vehicles is their ability to communicate their intent to other road users, especially pedestrians, in order to prevent accidents. External Human-Machine Interfaces (eHMIs) are the proposed solution to this issue, through the introduction of electronic devices on the exterior of a vehicle that communicate when the vehicle is planning on slowing down or yielding. This paper uses the technique of unwrapping the faces of a mesh onto a texture where every pixel is a unique color, as well as a series of animated simulations made and run in the Unity game engine, to measure how many times each point on a 2015 Ford F-150 King Ranch is unobstructed to a pedestrian attempting to cross the road at a four-way intersection. By cross-referencing the results with a color-coded map of the labeled parts on the exterior of the vehicle, it was concluded that while the bumper, grill, and hood were the parts of the vehicle visible to the crossing pedestrian most often, the existence of other vehicles in the same lane that might obstruct the view of these makes them insufficient. The study recommends instead a distributive approach to eHMIs by using both the windshield and frontal fenders as simultaneous placements for these devices.
comment: 10 pages, 13 figures
Clutter-Robust Vision-Language-Action Models through Object-Centric and Geometry Grounding
Recent Vision-Language-Action (VLA) models have made impressive progress toward general-purpose robotic manipulation by post-training large Vision-Language Models (VLMs) for action prediction. Yet most VLAs entangle perception and control in a monolithic pipeline optimized purely for action, which can erode language-conditioned grounding. In our real-world tabletop tests, policies over-grasp when the target is absent, are distracted by clutter, and overfit to background appearance. To address these issues, we propose OBEYED-VLA (OBject-centric and gEometrY groundED VLA), a framework that explicitly disentangles perceptual grounding from action reasoning. Instead of operating directly on raw RGB, OBEYED-VLA augments VLAs with a perception module that grounds multi-view inputs into task-conditioned, object-centric, and geometry-aware observations. This module includes a VLM-based object-centric grounding stage that selects task-relevant object regions across camera views, along with a complementary geometric grounding stage that emphasizes the 3D structure of these objects over their appearance. The resulting grounded views are then fed to a pretrained VLA policy, which we fine-tune exclusively on single-object demonstrations collected without environmental clutter or non-target objects. On a real-world UR10e tabletop setup, OBEYED-VLA substantially improves robustness over strong VLA baselines across four challenging regimes and multiple difficulty levels: distractor objects, absent-target rejection, background appearance changes, and cluttered manipulation of unseen objects. Ablation studies confirm that both semantic grounding and geometry-aware grounding are critical to these gains. Overall, the results indicate that making perception an explicit, object-centric component is an effective way to strengthen and generalize VLA-based robotic manipulation.
comment: Under review. Project website: https://uark-aicv.github.io/OBEYED_VLA
Policy Contrastive Decoding for Robotic Foundation Models ICLR 2026
Robotic foundation models, or generalist robot policies, hold immense potential to enable flexible, general-purpose and dexterous robotic systems. Despite their advancements, our empirical experiments reveal that existing robot policies are prone to learning spurious correlations from pre-training trajectories, adversely affecting their generalization capabilities beyond the training data. To tackle this, we propose a novel Policy Contrastive Decoding (PCD) approach, which redirects the robot policy's focus toward object-relevant visual clues by contrasting action probability distributions derived from original and object-masked visual inputs. As a training-free method, our PCD can be used as a plugin to improve different types of robot policies without needing to finetune or access model weights. We conduct extensive experiments on top of three open-source robot policies, including the autoregressive policy OpenVLA and the diffusion-based policies Octo and $π_0$. The obtained results in both simulation and real-world environments prove PCD's flexibility and effectiveness, e.g., PCD enhances the state-of-the-art policy $π_0$ by 8.9% in the simulation environment and by 108% in the real-world environment. Code and demos are publicly available at: https://koorye.github.io/PCD.
comment: ICLR 2026. Project website: https://koorye.github.io/PCD/
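As a rough illustration of contrasting action distributions from original and object-masked inputs, the sketch below uses the generic contrastive-decoding form p ∝ p_orig · (p_orig / p_masked)^alpha over a discrete action set. This combination rule, the `alpha` parameter, and the function names are assumptions for illustration and are not taken from the PCD paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def contrastive_action_probs(p_original, p_masked, alpha=1.0, eps=1e-8):
    """Reweight actions by how much they depend on the (masked-out) object.

    p_original: action probabilities from the unmodified observation.
    p_masked:   action probabilities with the target object masked out.
    alpha:      hypothetical contrast strength; alpha = 0 recovers p_original.
    """
    log_scores = [
        (1.0 + alpha) * math.log(po + eps) - alpha * math.log(pm + eps)
        for po, pm in zip(p_original, p_masked)
    ]
    return softmax(log_scores)

# Illustrative values: action 1 loses most of its probability once the object
# is masked, so it is the action most tied to object-relevant evidence.
p_original = [0.5, 0.3, 0.2]
p_masked = [0.5, 0.1, 0.4]
p_contrast = contrastive_action_probs(p_original, p_masked)
```

Actions whose probability barely changes when the object is hidden are treated as driven by spurious context and are down-weighted relative to actions that depend on the object.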
SANDO: Safe Autonomous Trajectory Planning for Dynamic Unknown Environments
SANDO is a safe trajectory planner for 3D dynamic unknown environments, where obstacle locations and motions are unknown a priori and a collision-free plan can become unsafe at any moment, requiring fast replanning. Existing soft-constraint planners are fast but cannot guarantee collision-free paths, while hard-constraint methods ensure safety at the cost of longer computation. SANDO addresses this trade-off through three contributions. First, a heat map-based A* global planner steers paths away from high-risk regions using soft costs, and a spatiotemporal safe flight corridor (STSFC) generator produces time-layered polytopes that inflate obstacles only by their worst-case reachable set at each time layer, rather than by the worst case over the entire horizon. Second, trajectory optimization is formulated as a Mixed-Integer Quadratic Program (MIQP) with hard collision-avoidance constraints, and a variable elimination technique reduces the number of decision variables, enabling fast computation. Third, a formal safety analysis establishes collision-free guarantees under explicit velocity-bound and estimation-error assumptions. Ablation studies show that variable elimination yields up to 7.4x speedup in optimization time, and that STSFCs are critical for feasibility in dense dynamic environments. Benchmark simulations against state-of-the-art methods across standardized static benchmarks, obstacle-rich static forests, and dynamic environments show that SANDO consistently achieves the highest success rate with no constraint violations across all difficulty levels; perception-only experiments without ground truth obstacle information confirm robust performance under realistic sensing. Hardware experiments on a UAV with fully onboard planning, perception, and localization demonstrate six safe flights in static environments and ten safe flights among dynamic obstacles.
comment: 20 pages, 17 figures
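The difference between inflating obstacles per time layer and inflating by the worst case over the whole horizon can be sketched as follows. The sketch assumes a spherical obstacle with a known speed bound `v_max`, so that its reachable set at time t is a ball of radius `base_radius + v_max * t`; this simplified model and all names are illustrative assumptions, not the STSFC construction from the paper.

```python
def inflated_radii_per_layer(base_radius, v_max, layer_end_times):
    """Inflation radius for each time layer, using that layer's end time.

    Assumes (hypothetically) the obstacle can move at most v_max in any
    direction, so by time t it lies within base_radius + v_max * t.
    """
    return [base_radius + v_max * t for t in layer_end_times]

def inflated_radius_whole_horizon(base_radius, v_max, horizon):
    """Single conservative radius covering the entire planning horizon."""
    return base_radius + v_max * horizon

# Illustrative values: 0.3 m obstacle, 1 m/s speed bound, four 0.5 s layers.
layer_end_times = [0.5, 1.0, 1.5, 2.0]
per_layer = inflated_radii_per_layer(0.3, 1.0, layer_end_times)
whole = inflated_radius_whole_horizon(0.3, 1.0, layer_end_times[-1])
```

Only the last layer needs the full horizon-wide radius; earlier layers keep more free space, which is consistent with the abstract's claim that time-layered corridors help feasibility in dense dynamic environments.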
Defining an Evaluation Method for External Human-Machine Interfaces
As the number of fatalities involving Autonomous Vehicles increases, the need for a universal method of communicating between vehicles and other agents on the road has also increased. Over the past decade, numerous proposals of external Human-Machine Interfaces (eHMIs) have been brought forward with the purpose of bridging this communication gap, with none yet to be determined as the ideal one. This work proposes a universal evaluation method composed of 223 questions to objectively evaluate and compare different proposals and arrive at a conclusion. The questionnaire is divided into 7 categories that evaluate different aspects of any given proposal that uses eHMIs: ease of standardization, cost effectiveness, accessibility, ease of understanding, multifacetedness in communication, positioning, and readability. To test the method, it was applied to four existing proposals, plus a baseline using only kinematic motions, both to exemplify the application of the evaluation method and to offer a baseline score for future comparison. The result of this testing suggests that the ideal method of machine-human communication is a combination of intentionally-designed vehicle kinematics and distributed well-placed text-based displays, but it also reveals knowledge gaps in the readability of eHMIs and the speed at which different observers may learn their meaning. This paper proposes future work related to these uncertainties, along with future testing with the proposed method.
comment: 62 pages, 8 figures, 26 tables
OREN: Octree Residual Network for Real-Time Euclidean Signed Distance Mapping
Reconstructing signed distance functions (SDFs) from point cloud data benefits many robot autonomy capabilities, including localization, mapping, motion planning, and control. Methods that support online and large-scale SDF reconstruction often rely on discrete volumetric data structures, which affects the continuity and differentiability of the SDF estimates. Neural network methods have demonstrated high-fidelity differentiable SDF reconstruction but they tend to be less efficient, experience catastrophic forgetting and memory limitations in large environments, and are often restricted to truncated SDF. This work proposes OREN, a hybrid method that combines an explicit prior from octree interpolation with an implicit residual from neural network regression. Our method achieves non-truncated (Euclidean) SDF reconstruction with computational and memory efficiency comparable to volumetric methods and differentiability and accuracy comparable to neural network methods. Extensive experiments demonstrate that OREN outperforms the state of the art in terms of accuracy and efficiency, providing a scalable solution for downstream tasks in robotics and computer vision.
FeudalNav: A Simple Framework for Visual Navigation
Visual navigation for robotics is inspired by the human ability to navigate environments using visual cues and memory, eliminating the need for detailed maps. In unseen, unmapped, or GPS-denied settings, traditional metric map-based methods fall short, prompting a shift toward learning-based approaches with minimal exploration. In this work, we develop a hierarchical framework that decomposes the navigation decision-making process into multiple levels. Our method learns to select subgoals through a simple, transferable waypoint selection network. A key component of the approach is a latent-space memory module organized solely by visual similarity, as a proxy for distance. This alternative to graph-based topological representations proves sufficient for navigation tasks, providing a compact, light-weight, simple-to-train navigator that can find its way to the goal in novel locations. We show competitive results with a suite of SOTA methods in Habitat AI environments without using any odometry in training or inference. An additional contribution leverages the interpretability of the framework for interactive navigation. We consider the question: how much direction intervention/interaction is needed to achieve success in all trials? We demonstrate that even minimal human involvement can significantly enhance overall navigation performance.
comment: 8 Pages, 6 figures and 4 tables. arXiv admin note: substantial text overlap with arXiv:2411.09893, arXiv:2402.12498
$π_{0.7}$: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities
We present a new robotic foundation model, called $π_{0.7}$, that can enable strong out-of-the-box performance in a wide range of scenarios. $π_{0.7}$ can follow diverse language instructions in unseen environments, including multi-stage tasks with various kitchen appliances, provide zero-shot cross-embodiment generalization, for example enabling a robot to fold laundry without seeing the task before, and perform challenging tasks such as operating an espresso machine out of the box at a level of performance that matches much more specialized RL-finetuned models. The main idea behind $π_{0.7}$ is to use diverse context conditioning during training. This conditioning information, contained in the prompt, makes it possible to steer the model precisely to perform many tasks with different strategies. It is conditioned not just on a language command that describes what it should do, but on additional multimodal information that also describes the manner or strategy in which it should do it, including metadata about task performance and subgoal images. This enables $π_{0.7}$ to use very diverse data, including demonstrations, potentially suboptimal (autonomous) data including failures, and data from non-robot sources. Our experiments evaluate $π_{0.7}$ across numerous tasks with multiple robot platforms, on tasks that require speed and dexterity, language following, and compositional task generalization.
comment: Website: https://www.pi.website/blog/pi07
SMP: Reusable Score-Matching Motion Priors for Physics-Based Character Control SIGGRAPH 2026
Data-driven motion priors that can guide agents toward producing naturalistic behaviors play a pivotal role in creating life-like virtual characters. Adversarial imitation learning has been a highly effective method for learning motion priors from reference motion data. However, adversarial priors, with few exceptions, need to be retrained for each new controller, thereby limiting their reusability and necessitating the retention of the reference motion data when applied to downstream tasks. In this work, we present Score-Matching Motion Priors (SMP), which leverages pre-trained motion diffusion models and score distillation sampling (SDS) to create reusable task-agnostic motion priors. SMPs can be pre-trained on a motion dataset, independent of any control policy or task. Once trained, SMPs can be kept frozen and reused as general-purpose reward functions to train new policies to produce naturalistic behaviors for downstream tasks. We show that a general motion prior trained on large-scale datasets can be repurposed into a variety of style-specific priors. Furthermore, SMP can compose different styles to synthesize new styles not present in the original dataset. Our method can create reusable and modular motion priors that produce high-quality motions comparable to state-of-the-art adversarial imitation learning methods. In our experiments, we demonstrate the effectiveness of SMP across a diverse suite of control tasks with physically simulated humanoid characters. Video available at https://youtu.be/jBA2tWk6vzU
comment: To appear in ACM Transactions on Graphics (SIGGRAPH 2026)
U-ViLAR: Uncertainty-Aware Visual Localization for Autonomous Driving via Differentiable Association and Registration
Accurate localization using visual information is a critical yet challenging task, especially in urban environments where nearby buildings and construction sites significantly degrade GNSS (Global Navigation Satellite System) signal quality. This issue underscores the importance of visual localization techniques in scenarios where GNSS signals are unreliable. This paper proposes U-ViLAR, a novel uncertainty-aware visual localization framework designed to address these challenges while enabling adaptive localization using high-definition (HD) maps or navigation maps. Specifically, our method first extracts features from the input visual data and maps them into Bird's-Eye-View (BEV) space to enhance spatial consistency with the map input. Subsequently, we introduce: a) Perceptual Uncertainty-guided Association, which mitigates errors caused by perception uncertainty, and b) Localization Uncertainty-guided Registration, which reduces errors introduced by localization uncertainty. By effectively balancing the coarse-grained large-scale localization capability of association with the fine-grained precise localization capability of registration, our approach achieves robust and accurate localization. Experimental results demonstrate that our method achieves state-of-the-art performance across multiple localization tasks. Furthermore, our model has undergone rigorous testing on large-scale autonomous driving fleets and has demonstrated stable performance in various challenging urban scenarios.
comment: Vision Localization, Autonomous Driving, Bird's-Eye-View
DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment
This paper presents DriVerse, a generative model for simulating navigation-driven driving scenes from a single image and a future trajectory. Previous autonomous driving world models either directly feed the trajectory or discrete control signals into the generation pipeline, leading to poor alignment between the control inputs and the implicit features of the 2D base generative model, which results in low-fidelity video outputs. Some methods use coarse textual commands or discrete vehicle control signals, which lack the precision to guide fine-grained, trajectory-specific video generation, making them unsuitable for evaluating actual autonomous driving algorithms. DriVerse introduces explicit trajectory guidance in two complementary forms: it tokenizes trajectories into textual prompts using a predefined trend vocabulary for seamless language integration, and converts 3D trajectories into 2D spatial motion priors to enhance control over static content within the driving scene. To better handle dynamic objects, we further introduce a lightweight motion alignment module, which focuses on the inter-frame consistency of dynamic pixels, significantly enhancing the temporal coherence of moving elements over long sequences. With minimal training and no need for additional data, DriVerse outperforms specialized models on future video generation tasks across both the nuScenes and Waymo datasets. The code and models will be released to the public.
comment: 13 pages, 5 figures
Multiagent Systems
Seeing the Whole Elephant: A Benchmark for Failure Attribution in LLM-based Multi-Agent Systems ACL 2026
Failure attribution, i.e., identifying the responsible agent and decisive step of a failure, is particularly challenging in LLM-based multi-agent systems (MAS) due to their natural-language reasoning, nondeterministic outputs, and intricate interaction dynamics. A reliable benchmark is therefore essential to guide and evaluate attribution techniques. Yet existing benchmarks rely on partially observable traces that capture only agent outputs, omitting the inputs and context that developers actually use when debugging. We argue that failure attribution should be studied under full execution observability, aligning with real-world developer-facing scenarios where complete traces, rather than only outputs, are accessible for diagnosis. To this end, we introduce TraceElephant, a benchmark designed for failure attribution with full execution traces and reproducible environments. We then systematically evaluate failure attribution techniques across various configurations. Specifically, full traces improve attribution accuracy by up to 76% over a partial-observation counterpart, confirming that missing inputs obscure many failure causes. TraceElephant provides a foundation for follow-up failure attribution research, promoting evaluation practices that reflect real-world debugging and supporting the development of more transparent MASs.
comment: Accepted by ACL 2026
AgentSearchBench: A Benchmark for AI Agent Search in the Wild
The rapid growth of AI agent ecosystems is transforming how complex tasks are delegated and executed, creating a new challenge of identifying suitable agents for a given task. Unlike traditional tools, agent capabilities are often compositional and execution-dependent, making them difficult to assess from textual descriptions alone. However, existing research and benchmarks typically assume well-specified functionalities, controlled candidate pools, or only executable task queries, leaving realistic agent search scenarios insufficiently studied. We introduce AgentSearchBench, a large-scale benchmark for agent search in the wild, built from nearly 10,000 real-world agents across multiple providers. The benchmark formalizes agent search as retrieval and reranking problems under both executable task queries and high-level task descriptions, and evaluates relevance using execution-grounded performance signals. Experiments reveal a consistent gap between semantic similarity and actual agent performance, exposing the limitations of description-based retrieval and reranking methods. We further show that lightweight behavioral signals, including execution-aware probing, can substantially improve ranking quality, highlighting the importance of incorporating execution signals into agent discovery. Our code is available at https://github.com/Bingo-W/AgentSearchBench.
Fast Neural-Network Approximation of Active Target Search Under Uncertainty
We address the problem of searching for an unknown number of stationary targets at unknown positions with a mobile agent. A probability hypothesis density filter is used to estimate the expected number of targets under measurement uncertainty. Existing planners, such as Active Search (AS) and its Intermittent variant (ASI), achieve accurate detection but require costly online optimization. To reduce online computation, we propose to use a convolutional neural network to approximate AS or ASI decisions through direct inference. The network is trained on AS/ASI data using a multi-channel grid that encodes target beliefs, the agent position, visitation history, and boundary information. Simulations with uniform and clustered target distributions show that the network achieves detection rates comparable to AS or ASI while reducing computation by orders of magnitude.
V-STC: A Time-Efficient Multi-Vehicle Coordinated Trajectory Planning Approach
Coordinating the motions of multiple autonomous vehicles (AVs) requires planning frameworks that ensure safety while making efficient use of space and time. This paper presents a new approach, termed variable-time-step spatio-temporal corridor (V-STC), that enhances the temporal efficiency of multi-vehicle coordination. An optimization model is formulated to construct a V-STC for each AV, in which both the spatial configuration of the corridor cubes and their time durations are treated as decision variables. By allowing the corridor's spatial position and time step to vary, the constructed V-STC reduces the overall temporal occupancy of each AV while maintaining collision-free separation in the spatio-temporal domain. Based on the generated V-STC, a dynamically feasible trajectory is then planned independently for each AV. Simulation studies demonstrate that the proposed method achieves safe multi-vehicle coordination and yields more time-efficient motion compared with existing STC approaches.
comment: 12 pages, 23 figures
Peer Identity Bias in Multi-Agent LLM Evaluation: An Empirical Study Using the TRUST Democratic Discourse Analysis Pipeline
The TRUST democratic discourse analysis pipeline exposes its large language model (LLM) components to peer model identity through multiple structural channels -- a design feature whose bias implications have not previously been empirically tested. We provide the first systematic measurement of identity-dependent scoring bias across all active identity exposure channels in TRUST, crossing four model families with two anonymization scopes across 30 political statements. The central finding is that single-channel anonymization produces near-zero bias effects, because individual channels act in opposite directions and cancel each other out -- a result that would lead an evaluator to conclude that identity bias is absent when it is not. Only full-pipeline anonymization reveals the true pattern: homogeneous ensembles amplify identity-driven sycophancy when model identity is fully visible, while the heterogeneous production configuration shows the reverse. Model choice matters independently: one tested model exhibits baseline sycophancy two to three times higher than the others and near-zero deliberative conflict on ideological topics, making it structurally unsuitable for pipelines where genuine inter-role disagreement is the intended quality mechanism. Three practical conclusions follow. First, heterogeneous model ensembles are structurally more robust than homogeneous ones, achieving higher consensus rates and lower identity amplification. Second, full-pipeline anonymization is required for valid bias measurement -- partial anonymization is insufficient and actively misleading. Third, these findings have direct implications for the validation of multi-agent LLM systems in quality-critical applications: a system validated under partial anonymization or with a homogeneous ensemble may pass validation while retaining structural identity bias invisible to single-channel measurement.
comment: 13 pages, 1 figure
Beyond Single-Agent Alignment: Preventing Context-Fragmented Violations in Multi-Agent Systems
We identify and formalize a novel security risk: Context-Fragmented Violations (CFVs), a class of policy breaches where individual agent actions appear locally safe and reasonable, yet collectively violate organizational policies because critical policy facts are siloed in different departments' private contexts. Existing prompt-based alignment mechanisms and monolithic interceptors are poorly matched to violations that span contextual islands. We propose Distributed Sentinel, a distributed zero-trust enforcement architecture that introduces the Semantic Taint Token (STT) Protocol. Through lightweight sidecar proxies, our system propagates security state across organizational boundaries without exposing raw cross-domain data, enabling Counterfactual Graph Simulation for cross-domain policy verification. We construct PhantomEcosystem, a comprehensive benchmark comprising 9 categories of realistic cross-agent violation scenarios with adversarially balanced safe controls. On this benchmark, Distributed Sentinel achieves F1 = 0.95 with 106ms end-to-end latency (16ms verification + 90ms entity extraction on A100), compared to 0.85 F1 for prompt-based filtering and 0.65 for rule-based DLP. To empirically validate the need for external enforcement, we evaluate eight frontier LLMs in execution-oriented multi-agent workflows with per-agent domain world models. All models exhibit substantial violation rates (14-98%), with cross-domain data flows showing systematically higher violation rates than same-domain flows. These results indicate that self-avoidance is unreliable and that multi-agent security benefits from a centralized enforcement layer operating above individual agents.
comment: 34 pages, 3 figures, 20 tables
Open-Ended Video Game Glitch Detection with Agentic Reasoning and Temporal Grounding
Open-ended video game glitch detection aims to identify glitches in gameplay videos, describe them in natural language, and localize when they occur. Unlike conventional game glitch understanding tasks which have largely been framed as image-level recognition or closed-form question answering, this task requires reasoning about game-specific dynamics such as mechanics, physics, rendering, animation, and expected state transitions directly over continuous gameplay videos and distinguishing true glitches from unusual but valid in-game events. To support this task, we introduce VideoGlitchBench, the first benchmark for open-ended video game glitch detection with temporal localization. VideoGlitchBench contains 5,238 gameplay videos from 120 games, each annotated with detailed glitch descriptions and precise temporal spans, enabling unified evaluation of semantic understanding and temporal grounding. We further propose GliDe, an agentic framework with three key components: a game-aware contextual memory for informed reasoning, a debate-based reflector for multi-perspective glitch detection and verification, and an event-level grounding module that recovers complete glitch intervals from fragmented temporal evidence. We also design a task-specific evaluation protocol that jointly measures semantic fidelity and temporal accuracy. Experiments show that this task remains highly challenging for current multimodal models, while GliDe achieves substantially stronger performance than corresponding vanilla model baselines.
comment: 16 pages, 10 figures, under review
When AI Agents Learn from Each Other: Insights from Emergent AI Agent Communities on OpenClaw for Human-AI Partnership in Education
The AIED community envisions AI evolving "from tools to teammates," yet most research still examines AI agents primarily through one-on-one human-AI interactions. We provide an alternative perspective: a rapidly growing ecosystem of AI agent platforms where over 167,000 agents participate, interact as peers, and develop learning behaviors without researcher intervention. Based on a month of daily qualitative observations across multiple platforms including Moltbook, The Colony, and 4claw, we identify four phenomena with implications for AIED: (1) humans who configure their agents undergo a "bidirectional scaffolding" process, learning through teaching; (2) peer learning emerges without any designed curriculum, including sharing concrete agent artifacts such as skills, workflows, and reusable routines; (3) agents converge on shared memory architectures that mirror open learner model design; and (4) trust dynamics, reliance risks, and platform mortality reveal design constraints for networked educational AI. Rather than presenting empirical findings, we argue that these organic phenomena offer a naturalistic window into dynamics that can inform principled design of multi-agent educational systems. We sketch an illustrative curriculum design, "Learning with Your AI Agent Tutor," and outline potential research directions and open problems to show how these observations might inform future AIED practice and inquiry.
comment: 15 pages. Paper accepted at AIED 2026 bluesky
Systems and Control (EESS)
A Vehicle Routing Problem for Human-Centered Electric Mobility
In this paper, we present the Electric Mobility Dial-a-Ride Problem (EM-DARP), which extends the Electric Vehicle Dial-a-Ride Problem (EV-DARP) to better accommodate human-focused mobility services. The problem involves utilizing a fleet of heterogeneous Electric Vehicles (EVs) to fulfill a set of customer requests with DARP and mobility-related specifications, while incorporating visits to charging stations amid requests. The problem is formulated as a Mixed-Integer Linear Program (MILP) and subsequently solved for a number of curated evaluation scenarios to demonstrate its practical applicability.
comment: 7 pages, 5 figures, standard IEEE double-column format
GCImOpt: Learning efficient goal-conditioned policies by imitating optimal trajectories
Imitation learning is a well-established approach for machine-learning-based control. However, its applicability depends on having access to demonstrations, which are often expensive to collect and/or suboptimal for solving the task. In this work, we present GCImOpt, an approach to learn efficient goal-conditioned policies by training on datasets generated by trajectory optimization. Our approach for dataset generation is computationally efficient, can generate thousands of optimal trajectories in minutes on a laptop computer, and produces high-quality demonstrations. Further, by means of a data augmentation scheme that treats intermediate states as goals, we are able to increase the training dataset size by an order of magnitude. Using our generated datasets, we train goal-conditioned neural network policies that can control the system towards arbitrary goals. To demonstrate the generality of our approach, we generate datasets and then train policies for various control tasks, namely cart-pole stabilization, planar and three-dimensional quadcopter stabilization, and point reaching using a 6-DoF robot arm. We show that our trained policies can achieve high success rates and near-optimal control profiles, all while being small (less than 80,000 neural network parameters) and fast enough (up to more than 6,000 times faster than a trajectory optimization solver) that they could be deployed onboard resource-constrained controllers. We provide videos, code, datasets and pre-trained policies under a free software license; see our project website https://jongoiko.github.io/gcimopt/.
comment: Accepted for publication at the 8th Annual Conference on Learning for Dynamics and Control (L4DC 2026). 16 pages (including appendix), 1 figure. For project website, see https://jongoiko.github.io/gcimopt/
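The augmentation described in the GCImOpt abstract, treating intermediate states as goals, can be illustrated with a minimal sketch. This is not the authors' code: the function name `relabel_intermediate_goals`, the tuple-based state and action types, and the choice to relabel against every later state are assumptions made for illustration. Whether a segment of an optimal trajectory is itself optimal for the relabeled goal depends on the cost formulation, which the abstract does not specify.

```python
from typing import List, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]
Sample = Tuple[State, State, Action]  # (current state, goal state, action)


def relabel_intermediate_goals(
    states: Sequence[State], actions: Sequence[Action]
) -> List[Sample]:
    """Build goal-conditioned samples from one trajectory (hypothetical helper).

    states has length T + 1 and actions has length T. For every time step t,
    each later state states[k] (k > t) is used as a goal for the action taken
    at t, so one trajectory yields T * (T + 1) / 2 samples instead of T.
    """
    if len(states) != len(actions) + 1:
        raise ValueError("expected len(states) == len(actions) + 1")
    samples: List[Sample] = []
    for t, action in enumerate(actions):
        for k in range(t + 1, len(states)):
            samples.append((states[t], states[k], action))
    return samples


# Toy one-dimensional trajectory with three steps.
toy_states = [(0.0,), (1.0,), (2.0,), (3.0,)]
toy_actions = [(1.0,), (1.0,), (1.0,)]
toy_samples = relabel_intermediate_goals(toy_states, toy_actions)
```

Relabeling against every later state grows the dataset quadratically in the trajectory length; a practical pipeline would likely subsample the goals to reach a fixed augmentation factor.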
Compositional Online Learning for Multi-Objective System Co-Design
Many engineered systems must balance competing objectives, such as performance and safety, cost and reliability, or efficiency and sustainability, and are naturally modeled as compositions of interacting subsystems. We study online multi-objective decision-making in monotone co-design, where functionalities and resources are partially ordered, and the goal is to identify the target-feasible antichain of non-dominated trade-offs using few expensive evaluations. We introduce optimistic evaluators: history-dependent bounds on functionality and resource mappings that enable safe elimination of implementations before full evaluation. Based on these evaluators, we develop an elimination-based rejection-sampling algorithm, prove its soundness, and show that the admissible region shrinks monotonically as information accumulates. We instantiate the framework under monotonicity, Lipschitz continuity, and linear-parametric structure. For compositional co-design problems modeled by multigraphs, we show how local optimistic certificates propagate through the tractable remainder of the graph to yield system-level optimistic feasibility and resource bounds. Experiments on multi-robot fleet design, intermodal mobility systems, and synthetic monotone and Lipschitz benchmarks show substantial sample-efficiency gains over uniform sampling, Bayesian optimization, and multi-objective evolutionary algorithms.
Multi-robot obstacle-aware shepherding of non-cohesive target agents ICRA 2026
This paper presents a novel control strategy for multi-agent shepherding of non-cohesive targets in obstacle-rich environments. Unlike previous approaches that assume cohesive flocking behavior, our method handles targets that interact only with nearby herders through repulsive forces and exhibit no inter-target coordination. Each herder employs a hybrid control policy that combines direct goal-oriented steering with obstacle-tangent maneuvering, enabling targets to circumnavigate obstacles while being guided toward a goal region. The herder dynamics integrate three key behaviors: return-to-goal motion when idle, target steering with adaptive directional control, and obstacle avoidance using both normal and tangential force components. Numerical simulations demonstrate superior performance compared to existing shepherding methods, achieving higher target confinement rates in cluttered environments. Experimental validation using TurtleBot4 herders and Osoyoo target robots in an indoor arena confirms the practical effectiveness of the proposed approach.
comment: Accepted at ICRA 2026
Strategically Robust Linear Quadratic Dynamic Games
We study linear quadratic dynamic games where players are uncertain about each other's control policies or goals and consequently seek to be strategically robust. Building on recent work on strategically robust and risk-averse game theory, we first formalize the problem of strategically robust linear quadratic dynamic games. We show that these can be rewritten as simple transformations of linear quadratic games in which each player chooses a controller in a fictitious game in which they are faced with an adversary who is penalized for deviating from the other players' policies. This formulation naturally induces a novel notion of dynamic equilibrium, which we call a strategically robust dynamic equilibrium. We establish existence and uniqueness of such equilibria and furthermore show that the equilibrium policies are Markovian, linear, and can be efficiently computed via coupled backward Riccati equations. Through numerical simulations, including experiments in a network game, we illustrate the benefits of strategic robustness in designing robust and resilient decentralized control schemes. Our experiments also expose a "free-lunch" phenomenon in games in which robustness does not incur a corresponding loss in performance but can yield improvements in players' utilities and social welfare.
comment: 6 pages, 5 figures, 2 tables. Submitted to the 2026 IEEE Conference on Decision and Control (CDC)
Control of Multi-agent Systems under STL Specifications based on Prescribed Performance Observers
This paper addresses decentralized control of large-scale heterogeneous multi-agent systems subject to bounded external disturbances and limited communication, with the objective of satisfying cooperative Signal Temporal Logic (STL) specifications. The considered specifications involve spatiotemporal tasks that require collaboration among multiple agents, including agents beyond direct communication neighborhoods. To address the communication constraints, a $k$-hop Prescribed Performance State Observer ($k$-hop PPSO) is designed to enable each agent to estimate the states of agents up to $k$ communication hops away using only information from $1$-hop neighbors, while guaranteeing predefined performance bounds on the estimation errors. The estimation error bounds are explicitly incorporated into a reformulation of the spatial robustness of the STL specifications, yielding robustness measures that account for worst-case estimation uncertainty. Based on the modified robustness, a decentralized continuous-time feedback control law is designed to guarantee satisfaction of the STL specifications in the presence of bounded disturbances and estimation errors. The proposed framework provides formal correctness guarantees using only local information and limited communication. Numerical simulations illustrate the theoretical results.
comment: arXiv admin note: text overlap with arXiv:2602.05586
Optimal sequential decision-making for error propagation mitigation in digital twins
Here, we explore the problem of error propagation mitigation in modular digital twins as a sequential decision process. Building on a companion study that used a Hidden Markov Model (HMM) to infer latent error regimes from surrogate-physics residuals, we develop a Markov Decision Process (MDP) in which the inferred regimes serve as states, corrective interventions serve as actions, and a scalar reward captures the cost-benefit tradeoff between system fidelity and maintenance expense. The baseline transition matrix is extracted from the HMM-learned parameters. We then extend the formulation to a Partially Observable MDP (POMDP) that accounts for the imperfect nature of regime classification by maintaining a belief distribution updated via Bayesian filtering, with the HMM confusion matrix serving as the observation model. Both formulations are solved via dynamic programming and validated through Gillespie stochastic simulation. We then benchmark two model-free reinforcement learning algorithms, Q-learning and REINFORCE, to assess whether effective policies can be learned without explicit model knowledge. A systematic comparison of different intervention policies demonstrates that the MDP policy achieves the highest cumulative reward and fraction of time in nominal operation, while the POMDP recovers approximately 95% of MDP performance under realistic observation noise. Sensitivity analyses across observation quality, repair probability, and discount factor confirm the robustness of these conclusions, and the major gaps in the policy hierarchy are statistically significant at $p < 0.001$. The gap between MDP and POMDP performance quantifies the value of information, providing a principled criterion for investing in improved classification accuracy.
Sampling-Based Safety Filter with Probabilistic Restrictiveness Guarantee
Ensuring safety is a critical requirement for autonomous systems, yet providing formal guarantees for nominal controllers remains a significant challenge. In this paper, we propose a modular sampling-based safety filter to ensure the safety of arbitrary nominal control inputs. At each timestep, the filter evaluates the safety of the nominal input by leveraging control sequence samples generated via Stein Variational Model Predictive Control (SV-MPC). This approach approximates a safety-conditioned posterior distribution over control sequences, enabling the filter to effectively capture multimodal safe regions in complex, non-convex environments. The filter guarantees safety by overriding the nominal input when all sampled control sequence candidates are deemed unsafe. By leveraging the scenario approach, the proposed method provides a probabilistic guarantee on its restrictiveness. We validate the filter through collision avoidance tasks in both single- and multi-vehicle settings, demonstrating its efficacy in navigating cluttered environments where nominal controllers may fail.
Global Convergence of Policy Gradient Methods for ReLU Controllers in Linear Quadratic Regulation
We study the convergence of model-based policy gradient for the deterministic, scalar, discounted linear-quadratic regulator when the controller is an overparameterized one-hidden-layer ReLU network without biases. Although the optimal LQR controller is linear, neural parameterization creates a redundant nonconvex weight space with a possibly asymmetric piecewise-linear controller. We show that this structure can still be analyzed exactly through the two effective gains induced on the positive and negative half-lines. Under suitable random initialization, sufficient width, and a small step size, the model-based policy gradient remains stable, decreases the cost geometrically, and drives the effective gains to the unique optimal scalar LQR gain with high probability.
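The "two effective gains" mentioned in the abstract above can be made concrete with a small sketch. It assumes the scalar controller has the form u(x) = sum_i a_i * relu(w_i * x); this parameterization and the names `controller` and `effective_gains` are illustrative assumptions, not taken from the paper. Under that form, only units with w_i > 0 are active for x > 0 and only units with w_i < 0 are active for x < 0, so the controller is linear on each half-line.

```python
from typing import Sequence, Tuple


def relu(z: float) -> float:
    """Rectified linear unit."""
    return z if z > 0.0 else 0.0


def controller(x: float, inner: Sequence[float], outer: Sequence[float]) -> float:
    """Bias-free one-hidden-layer ReLU controller u(x) = sum_i a_i * relu(w_i * x)."""
    return sum(a * relu(w * x) for w, a in zip(inner, outer))


def effective_gains(
    inner: Sequence[float], outer: Sequence[float]
) -> Tuple[float, float]:
    """Return (k_pos, k_neg) such that u(x) = k_pos * x for x > 0
    and u(x) = k_neg * x for x < 0."""
    k_pos = sum(a * w for w, a in zip(inner, outer) if w > 0.0)
    k_neg = sum(a * w for w, a in zip(inner, outer) if w < 0.0)
    return k_pos, k_neg


# Hypothetical weights for a width-3 network.
inner_weights = [1.0, -2.0, 0.5]
outer_weights = [-0.5, 0.25, -1.0]
k_pos, k_neg = effective_gains(inner_weights, outer_weights)
```

When the two gains differ, the controller is asymmetric; the convergence result in the abstract says both gains are driven to the same optimal scalar LQR gain.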
Learning the Weather-Grid Nexus via Weather-to-Voltage (W2V) Predictive Modeling
This paper proposes a weather-to-voltage (W2V) predictive modeling framework to learn the underlying weather-grid nexus. Unlike existing approaches on weather-informed grid operations, our proposed W2V model can achieve the joint analysis of weather and grid states, and further leverage this coupling to enhance grid-aware weather forecasting (GAWF) as a key application. To achieve this end-to-end learning, the W2V model acts as a differentiable surrogate for weather-incorporated power flow analysis by mapping weather features at high spatial resolution directly to grid-wide bus voltages. Thanks to a compact neural network design and principal component analysis based initialization, it achieves high voltage prediction accuracy and numerical stability during training. Building on this capability, W2V-based voltage signals are used to guide the development of GAWF that can account for its downstream voltage prediction performance. Using a 6717-bus Texas synthetic test system with meteorological inputs from 701 weather locations, our numerical tests have verified the excellent accuracy and generalizability of the proposed W2V model. More importantly, the W2V model has enabled the GAWF to effectively prioritize the weather features and conditions that are most critical to grid operations, such as system-wide quick wind drops preceding ramp-ups.
comment: 10 pages, 10 figures, Submitted to IEEE Transactions on Sustainable Energy
Probabilistic Hazard Analysis Framework with Stochastic Optimal Control for Deteriorating Civil Infrastructure Systems
The safety and resilience of civil infrastructure systems are increasingly threatened by compounded risks from various hazard events and structural deterioration due to environmental stressors. This study presents a comprehensive risk-informed, life-cycle optimization framework that extends the Performance-Based Earthquake Engineering (PBEE) and probabilistic seismic loss estimation paradigms by combining hazard uncertainties, nonstationary deterioration, structural damage accumulation, and state-dependent fragility assessments with optimal, adaptive maintenance strategies over time. The life-cycle cost optimization is formulated as a Markov Decision Process (MDP) problem, utilizing derived transition matrices that reflect time-variant deterioration effects and hazard risks. To mitigate the curse of dimensionality in system-level optimization, a novel tensor-based method exploiting Kronecker-factored transition dynamics is introduced, reducing complexity from exponential to linear in the number of components while still preserving exact, global dynamic programming solutions. Overall, the framework is general and versatile, able to accommodate various hazard types; a seismic hazard application is demonstrated and explained in detail in this work. The developed methodology provides decision-makers with a practical, data-driven tool for cost-effective risk mitigation of civil infrastructure systems.
Integrated Lander-Propulsion-GNC Framework for Autonomous Lunar Powered Descent
This paper presents an integrated lander-propulsion-GNC framework for autonomous lunar powered descent. The BUG VTVL test vehicle serves as the reference platform, with the YUNT V0 throttleable bipropellant engine providing variable thrust across a wide operating envelope, integrated with a real-time successive convexification guidance solver. The vehicle design accounts for structural configuration, landing stability, center-of-mass migration, and inertia evolution, while the propulsion architecture defines the throttle ratio, dead-zone behavior, and gimbal authority that constrain the guidance problem. A successive convexification algorithm addresses all nonconvexities; thrust lower bounds, mass depletion coupling, and thruster dead-zone behavior are all handled within a unified second-order cone program solvable in near-real time. Parametric analysis reveals a fundamental coupling between throttle ratio, pointing authority, and surface gravity. Monte Carlo simulations validate guidance robustness, achieving sub-50-meter landing precision under realistic perturbations.
comment: Accepted at IEEE RAST 2026, 13-15 May, Istanbul, Türkiye. 6 pages. © 2026 IEEE. Personal use permitted; other uses require IEEE permission
Surrogate-Based Co-Design Coupling Analysis for Floating Offshore Wind Turbines
This work presents a design coupling analysis (DCA) framework to investigate the interactions among control and plant design variables in floating offshore wind turbines (FOWTs) and to support the formulation of tractable control co-design (CCD) optimization strategies. DCA provides quantitative information that reveals the relationships and dependencies among design variables and between design variables and the objective function, enabling improved design variable selection, identification of dominant variables that drive system interactions, and informed selection of optimization solution strategies. However, applying DCA to complex systems is challenging because the models used to describe their dynamics are computationally expensive, and constructing DCA information requires exhaustive model evaluations and optimizations. Here, a surrogate model of the FOWT system is employed to make the repeated model evaluations required for DCA computationally feasible. Using this framework, the bidirectional couplings between control and plant design variables, as well as the couplings among plant design variables, are estimated. The results reveal strong interactions among various design variables and identify the most influential plant design variables affecting system performance. These insights guide the development of two DCA-based optimization strategies for large CCD problems: a sequential decomposition approach that preserves dominant design variable couplings while reducing problem size at each stage, and a reduced-dimensional optimization approach that focuses on the collectively most influential variables. The results show that these strategies significantly reduce computational complexity while achieving solutions comparable to those obtained through full simultaneous optimization, demonstrating the value of DCA for understanding and solving complex design problems.
comment: 11 pages, 8 figures
Home Battery Dispatch under a Tiered Peak Power Tariff
We consider the problem of operating a battery in a home connected to the grid to minimize electricity cost, which combines an energy charge and a tiered peak power charge based on the average of the $N$ largest daily peak powers in each billing month. With perfect foresight of loads and prices, the minimum cost is the solution of a mixed-integer linear program (MILP), which provides a lower bound on the cost of any implementable policy. We propose a model predictive control (MPC) policy that uses simple forecasts of loads and prices and solves a small MILP at each time step. Numerical experiments on one year of data from a home in Trondheim, Norway, show that the MPC policy attains a cost within $1.7\%$ of the prescient bound, and saves close to three times as much as the best rule-based policy we consider.
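The peak power charge described in the abstract above depends on the average of the $N$ largest daily peak powers in a billing month. The sketch below shows that calculation only. The tier boundaries and fees in `HYPOTHETICAL_TIERS`, and the function names, are invented for illustration and are not taken from the paper or from any real tariff.

```python
from typing import List, Sequence, Tuple


def peak_basis(daily_peaks_kw: Sequence[float], n_largest: int) -> float:
    """Average of the n_largest daily peak powers (kW) in one billing month."""
    if not 1 <= n_largest <= len(daily_peaks_kw):
        raise ValueError("n_largest must be between 1 and the number of days")
    top = sorted(daily_peaks_kw, reverse=True)[:n_largest]
    return sum(top) / n_largest


def tiered_peak_charge(basis_kw: float, tiers: List[Tuple[float, float]]) -> float:
    """Look up the monthly fee for a peak basis.

    tiers is a list of (upper_limit_kw, monthly_fee) pairs sorted by
    ascending upper limit; the last upper limit should be infinity.
    """
    for upper_limit_kw, monthly_fee in tiers:
        if basis_kw <= upper_limit_kw:
            return monthly_fee
    raise ValueError("tiers must cover every possible basis")


# Hypothetical tariff: (upper limit in kW, monthly fee in arbitrary currency).
HYPOTHETICAL_TIERS = [(2.0, 10.0), (5.0, 20.0), (10.0, 35.0), (float("inf"), 60.0)]

example_peaks = [1.0, 4.0, 3.0, 6.0, 2.0]   # toy month with five days
example_basis = peak_basis(example_peaks, 3)  # averages 6.0, 4.0 and 3.0
example_fee = tiered_peak_charge(example_basis, HYPOTHETICAL_TIERS)
```

Because the fee is a step function of the basis, an optimization model of this tariff generally needs integer variables to select the active tier, which is consistent with the MILP formulation mentioned in the abstract.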
Differentiable Filtering for Learning Hidden Markov Models
Hidden Markov Models (HMMs) are fundamental for modeling sequential data, yet learning their parameters from observations remains challenging. Classical methods like the Baum-Welch algorithm are computationally intensive and prone to local optima, while modern spectral algorithms offer provable guarantees but may produce probability outputs outside valid ranges. This work introduces Belief Net, a differentiable filtering framework that learns HMM parameters by formulating the forward filter as a structured neural network and optimizing it with stochastic gradient descent. This architecture recursively updates the belief state, which represents the posterior probability distribution over hidden states based on the observation history. Unlike black-box transformer models, Belief Net's learnable weights are explicitly the logits of the initial distribution, transition matrix, and emission matrix, ensuring full interpretability. The model processes observation sequences using a decoder-only (causal) architecture and is trained end-to-end with standard autoregressive next-observation prediction loss. On synthetic HMM data, Belief Net achieves faster convergence than Baum-Welch while successfully recovering parameters in both undercomplete and overcomplete settings, whereas spectral methods prove ineffective in the latter. Comparisons with transformer-based models are also presented on real-world language data.
comment: 20 pages, 8 figures, accepted to conference: L4DC 2026
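The forward filter that Belief Net is built around can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, parameters are stored as unnormalized logits and normalized with a softmax, and the loss is the summed negative log-probability of each observation given the preceding ones. A real training loop would compute gradients of this loss with an automatic differentiation framework.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward_filter_nll(
    init_logits: Sequence[float],
    trans_logits: Sequence[Sequence[float]],
    emit_logits: Sequence[Sequence[float]],
    observations: Sequence[int],
) -> Tuple[float, List[float]]:
    """Run the HMM forward filter and return (negative log-likelihood, final belief)."""
    pi = softmax(init_logits)
    trans = [softmax(row) for row in trans_logits]  # trans[i][j] = P(next=j | current=i)
    emit = [softmax(row) for row in emit_logits]    # emit[i][o] = P(obs=o | state=i)
    n_states = len(pi)
    predicted = pi  # distribution over the hidden state before the next observation
    belief = list(pi)
    nll = 0.0
    for obs in observations:
        # Measurement update: weight the prediction by the emission likelihood.
        joint = [predicted[i] * emit[i][obs] for i in range(n_states)]
        p_obs = sum(joint)  # probability of this observation given the past
        nll -= math.log(p_obs)
        belief = [j / p_obs for j in joint]
        # Time update: propagate the belief through the transition matrix.
        predicted = [
            sum(belief[i] * trans[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return nll, belief


# With all-zero logits every distribution is uniform.
uniform_nll, uniform_belief = forward_filter_nll(
    [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]], [0, 1, 0]
)
```

Keeping the learnable quantities as logits means any real-valued update still yields valid probability vectors after the softmax, which is one way to avoid the out-of-range outputs the abstract attributes to spectral methods.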
Nonlinear Stochastic Density Steering via Gaussian Mixture Schrodinger Bridges and Multiple Linearizations
The paper studies the optimal density steering problem for nonlinear continuous-time stochastic systems. To accurately capture nonlinear dynamics in high-uncertainty regions that deviate significantly from a nominal linearization point, we introduce the concept of Multiple Distribution-to-Distribution Linearization. The proposed approach first approximates the boundary distributions using Gaussian Mixture Models (GMMs), and decomposes the original nonlinear problem into a collection of Gaussian-to-Gaussian Optimal Covariance Steering (OCS) subproblems between pairs of mixture components. Each elementary OCS problem is solved via local linearization around the mean trajectory connecting the corresponding initial and terminal Gaussian components. The resulting elementary policies are then combined according to their associated conditional densities. We prove that the proposed multi-linearization approach yields tighter approximation error bounds than single-linearization for a broad class of problems. The effectiveness of the approach is demonstrated through numerical experiments on an Earth-to-Mars orbit transfer scenario.
State-space fading memory
The fading-memory (FM) property captures the progressive loss of influence of past inputs on a system's current output and was originally formalized by Boyd and Chua in an operator-theoretic framework. Despite its importance for systems approximation, reservoir computing, and recurrent neural networks, its connection with state-space notions of nonlinear stability, especially incremental ones, remains understudied. This paper introduces a state-space definition of FM. In state-space, FM can be interpreted as an extension of incremental input-to-output stability ($δ$IOS) that explicitly incorporates a memory kernel upper-bounding the decay of past input differences. It is also closely related to Boyd and Chua's FM definition, with the sole difference of requiring uniform, instead of general, continuity of the memory functional with respect to an input-fading norm. We demonstrate that incremental input-to-state stability ($δ$ISS) implies FM semi-globally for time-invariant systems under an equibounded input assumption. Notably, Boyd and Chua's approximation theorems apply to $δ$ISS state-space models. As a closing application, we show that, under mild assumptions, the state-space model of current-driven memristors possesses the FM property.
comment: 13 pages
Robust Adaptive Sliding-Mode Control for Damaged Fixed-Wing UAVs
Many unmanned aerial vehicles (UAVs) can remain aerodynamically flyable after sustaining structural or control surface damage, yet insufficient robustness in conventional autopilots often leads to mission failure. This paper proposes a robust adaptive sliding mode controller (RASMC) for fixed-wing UAVs subject to aerodynamic coefficient perturbations and partial loss of control surface effectiveness. A damage-aware flight dynamics model is developed to systematically analyze the impact of such impairments on the closed-loop behavior. The RASMC is designed to ensure reliable tracking and stabilization, while a gain adaptation law maintains low control effort under nominal conditions and increases the gains as needed in the presence of aerodynamic damage. Lyapunov-based stability guarantees are derived, and assumptions on admissible uncertainty bounds are formulated to characterize the limits within which closed-loop stability and performance can be ensured. The proposed controller is implemented within an existing UAV autopilot framework, where outer-loop guidance and speed control modules provide reference commands to the RASMC for attitude stabilization. Simulations demonstrate that, despite significant damage, all closed-loop states remain stable with bounded tracking errors.
comment: 8 pages, 2 figures, Accepted for International Conference on Unmanned Aircraft Systems (ICUAS) 2026
An Efficient Real-Time Planning Method for Swarm Robotics Based on an Optimal Virtual Tube
Navigating robot swarms through unknown, obstacle-filled environments is an emerging and challenging research area. Performing tasks in such environments requires swarms to achieve autonomous localization, perception, decision-making, control, and planning. The limited computational resources of onboard platforms present significant challenges for planning and control. Reactive planners offer low computational demands and high replanning frequencies but lack predictive capabilities, often resulting in local minima. Multi-step planners can make multi-step predictions to reduce deadlocks, but they require substantial computation, resulting in a lower replanning frequency. This paper proposes a novel homotopic trajectory planning framework for a robot swarm that combines centralized homotopic trajectory planning (optimal virtual tube planning) with distributed control, enabling low-computation, high-frequency replanning, thereby uniting the strengths of multi-step and reactive planners. Based on multi-parametric programming, homotopic optimal trajectories are approximated by affine functions. The resulting approximate solutions have computational complexity $O(n_t)$, where $n_t$ is the number of trajectory parameters. This low complexity makes centralized planning of a large number of optimal trajectories practical and, when combined with distributed control, enables rapid, low-cost replanning. The effectiveness of the proposed method is validated through several simulations and experiments.
comment: 18 pages, 21 figures
Robust stability of event-triggered nonlinear moving horizon estimation
In this work, we propose an event-triggered moving horizon estimation (ET-MHE) scheme for the remote state estimation of general nonlinear systems. In the presented method, whenever an event is triggered, a single measurement is transmitted and the nonlinear MHE optimization problem is subsequently solved. If no event is triggered, the current state estimate is updated using an open-loop prediction based on the system dynamics. Moreover, we introduce a novel event-triggering rule under which we demonstrate robust global exponential stability of the ET-MHE scheme, assuming a suitable detectability condition is met. In addition, we show that with the adoption of a varying horizon length, a tighter bound on the estimation error can be achieved. Finally, we validate the effectiveness of the proposed method through two illustrative examples.
Sparse shepherding control of large-scale multi-agent systems via Reinforcement Learning
We propose a Reinforcement Learning framework for sparse indirect control of large-scale multi-agent systems, where few controlled agents shape the collective behavior of many uncontrolled agents. The approach addresses this multi-scale challenge by coupling ODEs (modeling controlled agents) with a PDE (describing the uncontrolled population density), capturing how microscopic control achieves macroscopic objectives. Our method combines model-free Reinforcement Learning with adaptive interaction strength compensation to overcome sparse actuation limitations. Numerical validation demonstrates effective density control, with the system achieving target distributions while maintaining robustness to disturbances and measurement noise, confirming that learning-based sparse control can replace computationally expensive online optimization.
Federated Nonlinear System Identification
We consider federated learning of linearly-parameterized nonlinear systems. We establish theoretical guarantees on the effectiveness of federated nonlinear system identification compared to centralized approaches, demonstrating that the convergence rate improves as the number of clients increases. Although the convergence rates in the linear and nonlinear cases differ only by a constant, this constant depends on the feature map $φ$, which can be carefully chosen in the nonlinear setting to increase excitation and improve performance. We experimentally validate our theory in physical settings where client devices are driven by i.i.d. control inputs and control policies exhibiting i.i.d. random perturbations, ensuring non-active exploration. Experiments use trajectories from nonlinear dynamical systems characterized by real-analytic feature functions, including polynomial and trigonometric components, representative of physical systems including pendulum and quadrotor dynamics. We analyze the convergence behavior of the proposed method under varying noise levels and data distributions. Results show that federated learning consistently improves convergence of any individual client as the number of participating clients increases.
comment: Accepted at American Control Conference 2026
Estimation of Unknown Parameters in Presence of Perturbations and Noises with Application to GPEBO Design
We consider online estimation of the unknown parameters of a linear regression equation affected by an additive perturbation, which may be caused by measurement noise (corrupting both the regressor and the regressand) as well as by external perturbations. Known approaches to this problem typically have one of the following disadvantages: 1) they ensure convergence of the parametric error only to a compact set with a non-adjustable bound, 2) they require all regressor elements to be independent of the perturbation/noise in order to annihilate it, or 3) they require an instrumental variable to be selected. On the basis of a novel perturbation annihilation procedure, we propose three new estimation laws that are free from the above-mentioned drawbacks and ensure exponential convergence of the parametric error to an arbitrarily small neighborhood of zero, in particular when more than half (not necessarily all) of the regressor elements are independent of the additive perturbation. One of the proposed estimation laws is used to design a Generalized Parameter Estimation-Based Observer (GPEBO) for nonlinear affine systems, enhancing GPEBO performance when the measured system output is corrupted by noise. The theoretical results are supported by examples and mathematical modelling.
comment: 8 pages, 2 figures
Adapt and Stabilize, Then Learn and Optimize: A New Approach to Adaptive LQR
This paper focuses on adaptive control of the discrete-time linear quadratic regulator (adaptive LQR). Recent literature has made significant contributions in proving non-asymptotic convergence rates, but existing approaches have a few drawbacks that pose barriers for practical implementation. These drawbacks include (i) a requirement of an initial stabilizing controller, (ii) a reliance on exploration for closed-loop stability, and/or (iii) computationally intensive algorithms. This paper proposes a new algorithm that overcomes these drawbacks for a particular class of discrete-time systems. This algorithm leverages direct model-reference adaptive control (direct MRAC) and combines it with an epoch-based approach in order to address the drawbacks (i)-(iii) with a provable high-probability regret bound comparable to existing literature. Simulations demonstrate that the proposed approach yields regrets that are comparable to those from existing methods when the conditions (i) and (ii) are met, and yields regrets that are significantly smaller when either of these two conditions is not met.
Optical Network Digital Twin -- Practical Use Cases and Architecture
With the widespread adoption of AI, machine-to-machine communications are rapidly increasing, reshaping the requirements for optical networks. Recent advances in Gaussian noise modeling for digital coherent transmission have raised expectations for digital-twin-based operation. However, unlike digital twins in wireless communication, which are already well established, significant barriers remain for commercialization in optical networks. This paper discusses the evolving requirements of optical networks in the AI era and proposes a practical Optical Network Digital Twin architecture enabling dynamic and Quality of Transmission aware operation beyond conventional management. Representative use cases, including operator-driven optimization, user-operator collaboration, and multi-operator interconnection, are presented, along with the architectural framework and key challenges toward practical deployment.
comment: 7 pages, 5 figures
Feedback Linearisation with State Constraints
Feedback Linearisation (FBL) is a widely used technique that applies feedback laws to transform input-affine nonlinear control systems into linear control systems, allowing for the use of linear controller design methods such as pole placement. However, for problems with state constraints, controlling the linear system induced by FBL can be more challenging than controlling the original system. This is because simple state constraints in the original nonlinear system become complex nonlinear constraints in the FBL induced linearised system, thereby diminishing the advantages of linearisation. To avoid increasing the complexity of state constraints under FBL, this paper introduces a method to first augment system dynamics to capture state constraints before applying FBL. We show that our proposed augmentation method leads to ill-defined relative degrees at state constraint boundaries. However, we show that ill-defined relative degrees can be overcome by using a switching FBL controller. Numerical experiments illustrate the capabilities of this method for handling state constraints within the FBL framework.
Neuro-Symbolic Acceleration of MILP Motion Planning with Temporal Logic and Chance Constraints
Autonomous systems must solve motion planning problems subject to increasingly complex, time-sensitive, and uncertain missions. These problems often involve high-level task specifications, such as temporal logic or chance constraints, which require solving large-scale Mixed-Integer Linear Programs (MILPs). However, existing MILP-based planning methods suffer from high computational cost and limited scalability, hindering their real-time applicability. We propose a neuro-symbolic approach that accelerates MILP-based motion planning by using machine learning to guide the solver's symbolic search. Focusing on three representative classes of planning problems - Signal Temporal Logic (STL) specifications, chance constraints formulated via Conformal Predictive Programming (CPP), and Capability Temporal Logic (CaTL) specifications - we demonstrate how graph neural network-based learning methods can guide traditional symbolic MILP solvers on challenging planning problems, through branching variable selection and solver parameter configuration. Through extensive experiments, we show that neuro-symbolic search techniques yield scalability gains. Our approach yields substantial improvements across all three classes of planning problems, achieving an average performance gain of about 20% over a state-of-the-art solver across key metrics, including runtime and solution quality.
Robotics
Hi-WM: Human-in-the-World-Model for Scalable Robot Post-Training
Post-training is essential for turning pretrained generalist robot policies into reliable task-specific controllers, but existing human-in-the-loop pipelines remain tied to physical execution: each correction requires robot time, scene setup, resets, and operator supervision in the real world. Meanwhile, action-conditioned world models have been studied mainly for imagination, synthetic data generation, and policy evaluation. We propose Human-in-the-World-Model (Hi-WM), a post-training framework that uses a learned world model as a reusable corrective substrate for failure-targeted policy improvement. A policy is first rolled out in closed loop inside the world model; when the rollout becomes incorrect or failure-prone, a human intervenes directly in the model to provide short corrective actions. Hi-WM caches intermediate states and supports rollback and branching, allowing a single failure state to be reused for multiple corrective continuations and yielding dense supervision around behaviors that the base policy handles poorly. The resulting corrective trajectories are then added back to the training set for post-training. We evaluate Hi-WM on three real-world manipulation tasks spanning both rigid and deformable object interaction, and on two policy backbones. Hi-WM improves real-world success by 37.9 points on average over the base policy and by 19.0 points over a world-model closed-loop baseline, while world-model evaluation correlates strongly with real-world performance (r = 0.953). These results suggest that world models can serve not only as generators or evaluators, but also as effective corrective substrates for scalable robot post-training.
comment: Project Page: https://hi-wm.github.io/
A Case Study in Recovery of Drones using Discrete-Event Systems
Discrete-event systems and supervisory control theory provide a rigorous framework for specifying correct-by-construction behavior. However, their practical application to swarm robotics remains largely underexplored. In this paper, we investigate a topological recovery method based on discrete-event systems within a swarm robotics context. We propose a hybrid architecture that combines a high-level discrete-event systems supervisor with a low-level continuous controller, allowing lost drones to safely recover from fault or attack events and re-enter a controlled region. The method is demonstrated using ten simulated UAVs in the py-bullet-drones framework. We show recovery performance across four distinct scenarios, each with varying initial state estimates. Additionally, we introduce a secondary recovery supervisor that manages the regrouping process for a drone after it has re-entered the operational region.
comment: Accepted for publication at WODES 2026; final version will appear in IEEE Xplore
A Compact Peristaltic Pump Based on Magneto-Elastic Hysteresis with Single Pneumatic Control
Pumping fluids is fundamental to a wide range of industrial, environmental, and biomedical applications. Among various pumping mechanisms, peristaltic pumps enable efficient and safe fluid transport by deforming an elastic tube without direct contact with the working fluid. Although previous studies have introduced mechanical, pneumatic, or magnetic actuations to drive membrane deformation, these approaches often lead to complex pump architectures and control schemes. In this study, we present a soft membrane pump that achieves peristaltic motion through a single pneumatic input combined with an embedded passive magnet. The actuation mechanism and system dynamics were analyzed and simplified through modeling. Numerical simulations were conducted to predict the internal fluid flow, and the magneto-elastic hysteresis behavior observed in the simulations was successfully validated by experiments with a proof-of-concept prototype.
comment: 5 pages
Effects of Swarm Size Variability on Operator Workload
Real-world deployments of human--swarm teams depend on balancing operator workload to leverage human strengths without inducing overload. A key challenge is that swarm size is often dynamic: robots may join or leave the mission due to failures or redeployment, causing abrupt workload fluctuations. Understanding how such changes affect human workload and performance is critical for robust human--swarm interaction design. This paper investigates how the magnitude and direction of changes in swarm size influence operator workload. Drawing on the concept of workload history, we test three hypotheses: (1) workload remains elevated following decreases in swarm size, (2) small increases are more manageable than large jumps, and (3) sufficiently large changes override these effects by inducing a cognitive reset. We conducted two studies (N = 34) using a monitoring task with simulated drone swarms of varying sizes. By varying the swarm size between episodes, we measured perceived workload relative to swarm size changes. Results show that objective performance is largely unaffected by small changes in swarm size, while subjective workload is sensitive to both change direction and magnitude. Small increases preserve lower workload, whereas small decreases leave workload elevated, indicating workload residue; large changes in either direction attenuate these effects, suggesting a reset response. These findings offer actionable guidance for managing swarm-size transitions to support operator workload in dynamic human--swarm systems.
SLAM as a Stochastic Control Problem with Partial Information: Optimal Solutions and Rigorous Approximations
Simultaneous localization and mapping (SLAM) is a foundational state estimation problem in robotics in which a robot accurately constructs a map of its environment while also localizing itself within this construction. We study the active SLAM problem through the lens of optimal stochastic control, thereby recasting it as a decision-making problem under partial information. After reviewing several commonly studied models, we present a general stochastic control formulation of active SLAM together with a rigorous treatment of motion, sensing, and map representation. We introduce a new exploration stage cost that encodes the geometry of the state when evaluating information-gathering actions. This formulation, constructed as a nonstandard partially observable Markov decision process (POMDP), is then analyzed to derive rigorously justified approximate solutions that are near-optimal. To enable this analysis, the associated regularity conditions are studied under general assumptions that apply to a wide range of robotics applications. For a particular case, we conduct an extensive numerical study in which standard learning algorithms are used to learn near-optimal policies.
Task-specific Subnetwork Discovery in Reinforcement Learning for Autonomous Underwater Navigation
Autonomous underwater vehicles are required to perform multiple tasks adaptively and in an explainable manner under dynamic, uncertain conditions and limited sensing, challenges that classical controllers struggle to address. This demands robust, generalizable, and inherently interpretable control policies for reliable long-term monitoring. Reinforcement learning, particularly multi-task RL, overcomes these limitations by leveraging shared representations to enable efficient adaptation across tasks and environments. However, while such policies show promising results in simulation and controlled experiments, they remain opaque and offer limited insight into the agent's internal decision-making, creating gaps in transparency, trust, and safety that hinder real-world deployment. The internal policy structure and task-specific specialization remain poorly understood. To address these gaps, we analyze the internal structure of a pretrained multi-task reinforcement learning network in the HoloOcean simulator for underwater navigation by identifying and comparing task-specific subnetworks responsible for navigating toward different species. We find that in a contextual multi-task reinforcement learning setting with related tasks, the network uses only about 1.5% of its weights to differentiate between tasks. Of these, approximately 85% connect the context-variable nodes in the input layer to the next hidden layer, highlighting the importance of context variables in such settings. Our approach provides insights into shared and specialized network components, useful for efficient model editing, transfer learning, and continual learning for underwater monitoring through a contextual multi-task reinforcement learning method.
comment: To be published in IEEE OCEANS 2026 (Sanya) conference proceedings
A Bayesian Reasoning Framework for Robotic Systems in Autonomous Casualty Triage ICRA
Autonomous robots deployed in mass casualty incidents (MCI) face the challenge of making critical decisions based on incomplete and noisy perceptual data. We present an autonomous robotic system for casualty assessment that fuses outputs from multiple vision-based algorithms, estimating signs of severe hemorrhage, visible trauma, or physical alertness, into a coherent triage assessment. At the core of our system is a Bayesian network, constructed from expert-defined rules, which enables probabilistic reasoning about a casualty's condition even with missing or conflicting sensory inputs. The system, evaluated during the DARPA Triage Challenge (DTC) in realistic MCI scenarios involving 11 and 9 casualties, demonstrated a nearly three-fold improvement in physiological assessment accuracy (from 15% to 42% and 19% to 46%) compared to a vision-only baseline. More importantly, overall triage accuracy increased from 14% to 53%, while the diagnostic coverage of the system expanded from 31% to 95% of cases. These results demonstrate that integrating expert-guided probabilistic reasoning with advanced vision-based sensing can significantly enhance the reliability and decision-making capabilities of autonomous systems in critical real-world applications.
comment: Accepted to the 2026 IEEE International Conference on Robotics and Automation (ICRA)
X2-N: A Transformable Wheel-legged Humanoid Robot with Dual-mode Locomotion and Manipulation
Wheel-legged robots combine the efficiency of wheeled locomotion with the versatility of legged systems, enabling rapid traversal over both continuous and discrete terrains. However, conventional designs typically employ fixed wheels as feet and limited degrees of freedom (DoFs) at the hips, resulting in reduced stability and mobility during legged locomotion compared to humanoids with flat feet. In addition, most existing platforms lack a full upper body with arms, which limits their ability to perform dexterous manipulation tasks. In this letter, we present X2-N, a high-DoF transformable robot with dual-mode locomotion and manipulation. X2-N can operate in both humanoid and wheel-legged forms and transform seamlessly between them through joint reconfiguration. We further propose a reinforcement learning (RL)-based whole-body control framework tailored to this morphology, enabling unified control across hybrid locomotion, transformation, and manipulation. We validate X2-N in a range of challenging locomotion and manipulation tasks, including dynamic skating-like motion, stair climbing and package delivery. Results demonstrate high locomotion efficiency, strong terrain adaptability, and stable loco-manipulation performance of X2-N, highlighting its potential for real-world deployment.
MISTY: High-Throughput Motion Planning via Mixer-based Single-step Drifting
Multi-modal trajectory generation is essential for safe autonomous driving, yet existing diffusion-based planners suffer from high inference latency due to iterative neural function evaluations. This paper presents MISTY (Mixer-based Inference for Single-step Trajectory-drifting Yield), a high-throughput generative motion planner that achieves state-of-the-art closed-loop performance with pure single-step inference. MISTY integrates a vectorized Sub-Graph encoder to capture environment context, a Variational Autoencoder to structure expert trajectories into a compact 32-dimensional latent manifold, and an ultra-lightweight MLP-Mixer decoder to eliminate quadratic attention complexity. Importantly, we introduce a latent-space drifting loss that shifts the complex distribution evolution entirely to the training phase. By formulating explicit attractive and repulsive forces, this mechanism empowers the model to synthesize novel, proactive maneuvers, such as active overtaking, that are virtually absent from the raw expert demonstrations. Extensive evaluations on the nuPlan benchmark demonstrate that MISTY achieves state-of-the-art results on the challenging Test14-hard split, with comprehensive scores of 80.32 and 82.21 in non-reactive and reactive settings, respectively. Operating at over 99 FPS with an end-to-end latency of 10.1 ms, MISTY offers an order-of-magnitude speedup over iterative diffusion planners while achieving significantly robust generation.
comment: 8 pages, 4 figures, 3 tables. Submitted to IEEE Robotics and Automation Letters (RA-L)
Ufil: A Unified Framework for Infrastructure-based Localization SC
Infrastructure-based localization enhances road safety and traffic management by providing state estimates of road users. Development is hindered by fragmented, application-specific stacks that tightly couple perception, tracking, and middleware. We introduce Ufil, a Unified Framework for Infrastructure-Based Localization with a standardized object model and reusable multi-object tracking components. Ufil offers interfaces and reference implementations for prediction, detection, association, state update, and track management, allowing researchers to improve components without reimplementing the pipeline. Ufil is open-source C++/ROS 2 software with documentation and executable examples. We demonstrate Ufil by integrating three heterogeneous data sources into a single localization pipeline combining (i) vehicle onboard units broadcasting ETSI ITS-G5 Cooperative Awareness Messages, (ii) a lidar-based roadside sensor node, and (iii) an in-road sensitive surface layer. The pipeline runs unchanged in the CARLA simulator and a small-scale CAV testbed, demonstrating Ufil's scale-independent execution model. In a three-lane highway scenario with 423 and 355 vehicles in simulation and testbed, respectively, the fused system achieves lane-level lateral accuracy with mean lateral position RMSEs of 0.31 m in CARLA and 0.29 m in the CPM Lab, and mean absolute orientation errors around 2.2°. Median end-to-end latencies from sensing to fused output remain below 100 ms across all modalities in both environments.
comment: 8 pages, 6 figures, this work was submitted to IEEE International Conference on Intelligent Transportation Systems (ITSC) 2026
Tempered Sequential Monte Carlo for Trajectory and Policy Optimization with Differentiable Dynamics
We propose a sampling-based framework for finite-horizon trajectory and policy optimization under differentiable dynamics by casting controller design as inference. Specifically, we minimize a KL-regularized expected trajectory cost, which yields an optimal "Boltzmann-tilted" distribution over controller parameters that concentrates on low-cost solutions as temperature decreases. To sample efficiently from this sharp, potentially multimodal target, we introduce tempered sequential Monte Carlo (TSMC): an annealing scheme that adaptively reweights and resamples particles along a tempering path from a prior to the target distribution, while using Hamiltonian Monte Carlo rejuvenation to maintain diversity and exploit exact gradients obtained by differentiating through trajectory rollouts. For policy optimization, we extend TSMC via (i) a deterministic empirical approximation of the initial-state distribution and (ii) an extended-space construction that treats rollout randomness as auxiliary variables. Experiments across trajectory- and policy-optimization benchmarks show that TSMC is broadly applicable and compares favorably to state-of-the-art baselines.
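The "Boltzmann-tilted" target follows from a standard variational identity, written out here as a sketch. The symbols are assumptions rather than the paper's notation: $J(θ)$ is the expected trajectory cost of controller parameters $θ$, $p_0$ is the prior, $λ > 0$ is the temperature, and the geometric tempering path in the last line is one common choice that the abstract does not confirm.

```latex
\begin{align}
q^\star &= \arg\min_{q}\; \mathbb{E}_{\theta\sim q}\big[J(\theta)\big] + \lambda\,\mathrm{KL}\big(q \,\|\, p_0\big), \\
q^\star(\theta) &\propto p_0(\theta)\,\exp\!\big(-J(\theta)/\lambda\big), \\
\pi_{\beta}(\theta) &\propto p_0(\theta)\,\exp\!\big(-\beta\,J(\theta)/\lambda\big), \qquad 0=\beta_0<\beta_1<\dots<\beta_K=1 .
\end{align}
```

At $β = 0$ the path starts at the prior and at $β = 1$ it reaches $q^\star$. As $λ$ decreases, $q^\star$ concentrates on low-cost parameters, which is why the target becomes sharp and potentially multimodal.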
From Noise to Intent: Anchoring Generative VLA Policies with Residual Bridges
Bridging high-level semantic understanding with low-level physical control remains a persistent challenge in embodied intelligence, stemming from the fundamental spatiotemporal scale mismatch between cognition and action. Existing generative VLA policies typically adopt a "Generation-from-Noise" paradigm, which disregards this disparity, leading to representation inefficiency and weak condition alignment during optimization. In this work, we propose ResVLA, an architecture that shifts the paradigm to "Refinement-from-Intent." Recognizing that robotic motion naturally decomposes into global intent and local dynamics, ResVLA utilizes spectral analysis to decouple control into a deterministic low-frequency anchor and a stochastic high-frequency residual. By anchoring the generative process on the predicted intent, our model focuses strictly on refining local dynamics via a residual diffusion bridge. Extensive simulation experiments show that ResVLA achieves competitive performance, strong robustness to language and robot embodiment perturbations, and faster convergence than standard generative baselines. It also demonstrates strong performance in real-world robot experiments.
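The split into a low-frequency anchor and a high-frequency residual can be sketched with a plain Fourier low-pass filter. This is an illustration under assumptions: the abstract only says "spectral analysis", so the use of an FFT, the function name `split_action_chunk`, and the cutoff parameter `keep` are all hypothetical.

```python
import numpy as np

def split_action_chunk(actions, keep):
    """Split a (T, D) action chunk into a low-frequency anchor and a residual.

    `keep` is the number of lowest frequency bins retained in the anchor
    (an assumed hyperparameter, not taken from the paper).
    """
    spectrum = np.fft.rfft(actions, axis=0)
    low = np.zeros_like(spectrum)
    low[:keep] = spectrum[:keep]              # keep only the slow components
    anchor = np.fft.irfft(low, n=actions.shape[0], axis=0)
    residual = actions - anchor               # fast, local dynamics
    return anchor, residual

# Usage: a slow one-dimensional motion with small high-frequency jitter.
t = np.arange(32)
slow = np.cos(2 * np.pi * t / 32)
jitter = 0.05 * np.cos(2 * np.pi * 10 * t / 32)
anchor, residual = split_action_chunk((slow + jitter)[:, None], keep=3)
```

By construction the two parts sum back to the original chunk, so a generative model that only has to produce the residual starts from the anchor instead of from noise.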
A Replicable Robotics Awareness Method Using LLM-Enabled Robotics Interaction: Evidence from a Corporate Challenge
Large language models are increasingly being explored as interfaces between humans and robotic systems, yet there remains limited evidence on how such technologies can be used not only for interaction, but also as a structured means of introducing robotics to non-specialist users in real organizational settings. This paper introduces and evaluates a challenge-based method for robotics awareness, implemented through an LLM-enabled humanoid robot activity conducted with employees of AD Ports Group in the United Arab Emirates. In the event, participants engaged with a humanoid robot in a logistics-inspired task environment using voice commands interpreted through an LLM-based control framework. The activity was designed as a team-based, role-driven experience intended to expose participants to embodied AI and human-robot collaboration without requiring prior robotics expertise. To evaluate the approach, a post-event survey remained open for 16 days and collected 102 responses. Results indicate strong overall reception, with high satisfaction (8.46/10), increased interest in robotics and AI (4.47/5), and improved understanding of emerging forms of human-robot collaboration (4.45/5). Participants who interacted directly with the robot also reported natural interaction (4.37/5) and a strong sense that interaction became easier as the activity progressed (4.74/5). At the same time, lower ratings for reliability and predictability point to important technical and design challenges for future iterations. The findings suggest that challenge-based, LLM-enabled humanoid interaction can serve as a promising and replicable method for robotics awareness in industrial and operational environments.
comment: 10 pages, 8 figures, to be submitted for journal peer review
A Deployable Embodied Vision-Language Navigation System with Hierarchical Cognition and Context-Aware Exploration
Bridging the gap between embodied intelligence and embedded deployment remains a key challenge in intelligent robotic systems, where perception, reasoning, and planning must operate under strict constraints on computation, memory, energy, and real-time execution. In vision-language navigation (VLN), existing approaches often face a fundamental trade-off between strong reasoning capabilities and efficient deployment on real-world platforms. In this paper, we present a deployable embodied VLN system that achieves both high efficiency and robust high-level reasoning on real-world robotic platforms. To achieve this, we decouple the system into three asynchronous modules: a real-time perception module for continuous environment sensing, a memory integration module for spatial-semantic aggregation, and a reasoning module for high-level decision making. We incrementally construct a cognitive memory graph to encode scene information, which is further decomposed into subgraphs to enable reasoning with a vision-language model (VLM). To further improve navigation efficiency and accuracy, we also leverage the cognitive memory graph to formulate the exploration problem as a context-aware Weighted Traveling Repairman Problem (WTRP), which minimizes the weighted waiting time of viewpoints. Extensive experiments in both simulation and real-world robotic platforms demonstrate improved navigation success and efficiency over existing VLN approaches, while maintaining real-time performance on resource-constrained hardware.
comment: 10 pages, 5 figures
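The Weighted Traveling Repairman objective mentioned in the abstract above can be illustrated with a brute-force sketch: each viewpoint contributes its weight times the time at which it is first reached, and the visiting order minimizing that sum is chosen. The function names, the unit-speed assumption, and the example weights are hypothetical, and exhaustive search is only practical for a handful of viewpoints.

```python
from itertools import permutations
import math

def weighted_latency(order, start, points, weights):
    """Sum of weight * arrival time along a visiting order (unit speed)."""
    total, elapsed, current = 0.0, 0.0, start
    for i in order:
        elapsed += math.dist(current, points[i])   # time to reach viewpoint i
        total += weights[i] * elapsed              # its weighted waiting time
        current = points[i]
    return total

def best_order(start, points, weights):
    """Exhaustively search for the order with minimal weighted latency."""
    return min(
        permutations(range(len(points))),
        key=lambda order: weighted_latency(order, start, points, weights),
    )

# Usage: a nearby low-weight viewpoint and a farther high-weight one.
start = (0.0, 0.0)
points = [(1.0, 0.0), (-2.0, 0.0)]
weights = [1.0, 10.0]
order = best_order(start, points, weights)
```

In this example the farther viewpoint is visited first because its weight dominates, whereas a nearest-first rule would visit the nearby one first. This is the sense in which a context-dependent weighting changes the exploration order.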
RPG: Robust Policy Gating for Smooth Multi-Skill Transitions in Humanoid Fighting
Humanoid robots have demonstrated impressive motor skills in a wide range of tasks, yet whole-body control for human-like, long-duration dynamic fighting remains particularly challenging due to the stringent requirements on agility and stability. While imitation learning enables robots to execute human-like fighting skills, existing approaches often rely on switching among multiple single-skill policies or employing a general policy to imitate input reference motions. These strategies suffer from instability when transitioning between skills, as the mismatch of initial and terminal states across skills or reference motions introduces out-of-domain disturbances, resulting in unsmooth or unstable behaviors. In this work, we propose RPG, a hybrid expert policy framework for smooth and stable humanoid multi-skill transitions. Our approach incorporates motion transition randomization and temporal randomization to train a unified policy that generates agile fighting actions with stability and smoothness during skill transitions. Furthermore, we design a control pipeline that integrates walking/running locomotion with fighting skills, allowing human-like combat of arbitrary duration in which the active skill can be interrupted or switched seamlessly at any time. Extensive experiments in simulation demonstrate the effectiveness of the proposed framework, and real-world deployment on the Unitree G1 humanoid robot further validates its robustness and applicability.
Learn Weightlessness: Imitate Non-Self-Stabilizing Motions on Humanoid Robot
The integration of imitation and reinforcement learning has enabled remarkable advances in humanoid whole-body control, facilitating diverse human-like behaviors. However, research on environment-dependent motions remains limited. Existing methods typically enforce rigid trajectory tracking while neglecting physical interactions with the environment. We observe that humans naturally exploit a "weightless" state during non-self-stabilizing (NSS) motions--selectively relaxing specific joints to allow passive body--environment contact, thereby stabilizing the body and completing the motion. Inspired by this biological mechanism, we design a weightlessness-state auto-labeling strategy for dataset annotation; and we propose the Weightlessness Mechanism (WM), a method that dynamically determines which joints to relax and to what level, together enabling effective environmental interaction while executing target motions. We evaluate our approach on 3 representative NSS tasks: sitting on chairs of varying heights, lying down on beds with different inclinations, and leaning against walls via shoulder or elbow. Extensive experiments in simulation and on the Unitree G1 robot demonstrate that our WM method, trained on single-action demonstrations without any task-specific tuning, achieves strong generalization across diverse environmental configurations while maintaining motion stability. Our work bridges the gap between precise trajectory tracking and adaptive environmental interaction, offering a biologically-inspired solution for contact-rich humanoid control.
PREVENT-JACK: Context Steering for Swarms of Long Heavy Articulated Vehicles
In this paper, we aim to extend the traditional point-mass-like robot representation in swarm robotics and instead study a swarm of long Heavy Articulated Vehicles (HAVs). HAVs are kinematically constrained, elongated, and articulated, introducing unique challenges. Local, decentralized coordination of these vehicles is motivated by many real-world applications. Our approach, Prevent-Jack, introduces the sparsely covered context steering framework in robotics. It fuses six local behaviors, providing guarantees against jackknifing and collisions at the cost of potential dead- and livelocks, tested for vehicles with up to ten trailers. We highlight the importance of the Evade Attraction behavior for deadlock prevention using a parameter study, and use 15,000 simulations to evaluate the swarm performance. Our extensive experiments and the results show that both the dead- and livelocks occur more frequently in larger swarms and denser scenarios, affecting a peak average of 27%/31% of vehicles. We observe that larger swarms exhibit increased waiting, while smaller swarms show increased evasion.
comment: 32 pages, 7 figures, 4 videos; submitted to the Swarm Robotics collection of the Nature Portfolio Journal Robotics (NPJ Robot)
FingerViP: Learning Real-World Dexterous Manipulation with Fingertip Visual Perception
The current practice of dexterous manipulation generally relies on a single wrist-mounted view, which is often occluded and limits performance on tasks requiring multi-view perception. In this work, we present FingerViP, a learning system that utilizes a visuomotor policy with fingertip visual perception for dexterous manipulation. Specifically, we design a vision-enhanced fingertip module with an embedded miniature camera and install the modules on each finger of a multi-fingered hand. The fingertip cameras substantially improve visual perception by providing comprehensive, multi-view feedback of both the hand and its surrounding environment. Building on the integrated fingertip modules, we develop a diffusion-based whole-body visuomotor policy conditioned on a third-view camera and multi-view fingertip vision, which effectively learns complex manipulation skills directly from human demonstrations. To improve view-proprioception alignment and contact awareness, each fingertip visual feature is augmented with its corresponding camera pose encoding and per-finger joint-current encoding. We validate the effectiveness of the multi-view fingertip vision and demonstrate the robustness and adaptability of FingerViP on various challenging real-world tasks, including pressing buttons inside a confined box, retrieving sticks from an unstable support, retrieving objects behind an occluding curtain, and performing long-horizon cabinet opening and object retrieval, achieving an overall success rate of 80.8%. All hardware designs and code will be fully open-sourced.
comment: 12 pages, 6 figures
Reasoning About Traversability: Language-Guided Off-Road 3D Trajectory Planning
While Vision-Language Models (VLMs) enable high-level semantic reasoning for end-to-end autonomous driving, particularly in unstructured environments, existing off-road datasets suffer from language annotations that are weakly aligned with vehicle actions and terrain geometry. To address this misalignment, we propose a language refinement framework that restructures annotations into action-aligned pairs, enabling a VLM to generate refined scene descriptions and 3D future trajectories directly from a single image. To further encourage terrain-aware planning, we introduce a preference optimization strategy that constructs geometry-aware hard negatives and explicitly penalizes trajectories inconsistent with local elevation profiles. Furthermore, we propose off-road-specific metrics to quantify traversability compliance and elevation consistency, addressing the limitations of conventional on-road evaluation. Experiments on the ORAD-3D benchmark demonstrate that our approach reduces average trajectory error from 1.01m to 0.97m, improves traversability compliance from 0.621 to 0.644, and decreases elevation inconsistency from 0.428 to 0.322, highlighting the efficacy of action-aligned supervision and terrain-aware optimization for robust off-road driving.
CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors
Vision-Language-Action (VLA) models often use intermediate representations to connect multimodal inputs with continuous control, yet spatial guidance is typically injected implicitly through latent features. We propose CorridorVLA, which predicts sparse spatial anchors as incremental physical changes (e.g., $Δ$-positions) and uses them to impose an explicit tolerance region in the training objective for action generation. The anchors define a corridor that guides a flow-matching action head: trajectories whose implied spatial evolution falls outside it receive corrective gradients, while minor deviations from contacts and execution noise are permitted. On the more challenging LIBERO-Plus benchmark, CorridorVLA yields consistent gains across both SmolVLA and GR00T, improving success rate by $3.4\%$--$12.4\%$ over the corresponding baselines; notably, our GR00T-Corr variant reaches a success rate of $83.21\%$. These results indicate that action-aligned physical cues can provide direct and interpretable constraints for generative action policies, complementing spatial guidance encoded in visual or latent forms. Code is available at https://github.com/corridorVLA.
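As a rough illustration of the tolerance region described in the abstract above, the sketch below penalizes only the part of a deviation that exceeds a corridor half-width. The function name `corridor_penalty`, the squared-hinge form, and the `tol` parameter are assumptions made for illustration; the abstract does not specify the actual loss.

```python
def corridor_penalty(implied, anchors, tol):
    """Squared-hinge penalty: zero inside the corridor, quadratic outside.

    implied -- displacements implied by the generated actions (hypothetical input)
    anchors -- predicted sparse anchor displacements, same length as `implied`
    tol     -- corridor half-width (hypothetical hyperparameter)
    """
    if len(implied) != len(anchors):
        raise ValueError("implied and anchors must have the same length")
    if not implied:
        raise ValueError("at least one anchor is required")
    total = 0.0
    for x, a in zip(implied, anchors):
        # Only the deviation beyond the corridor boundary contributes.
        excess = max(0.0, abs(x - a) - tol)
        total += excess ** 2
    return total / len(implied)
```

Deviations smaller than `tol` contribute nothing, which mirrors the stated intent that minor deviations from contacts and execution noise are permitted.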
How VLAs (Really) Work In Open-World Environments
Vision-language-action models (VLAs) have been extensively used in robotics applications, achieving great success in various manipulation problems. More recently, VLAs have been used in long-horizon tasks and evaluated on benchmarks, such as BEHAVIOR1K (B1K), for solving complex household chores. The common metric for measuring progress in such benchmarks is success rate or partial score based on satisfaction of progress-agnostic criteria, meaning only the final states of the objects are considered, regardless of the events that lead to such states. In this paper, we argue that such evaluation protocols say little about the safety aspects of operation and can potentially exaggerate reported performance, undermining core challenges for future real-world deployment. To this end, we conduct a thorough analysis of state-of-the-art models on the B1K Challenge and evaluate policies in terms of robustness via reproducibility and consistency of performance, safety aspects of policy operation, task awareness, and key elements leading to the incompletion of tasks. We then propose evaluation protocols to capture safety violations to better measure the true performance of the policies in more complex and interactive scenarios. Finally, we discuss the limitations of the existing VLAs and motivate future research.
comment: 8 pages, 7 figures, 2 tables
Full-Body Dynamic Safety for Robot Manipulators: 3D Poisson Safety Functions for CBF-Based Safety Filters
Collision avoidance for robotic manipulators requires enforcing full-body safety constraints in high-dimensional configuration spaces. Control Barrier Function (CBF) based safety filters have proven effective in enabling safe behaviors, but enforcing the high number of constraints needed for safe manipulation leads to theoretic and computational challenges. This work presents a framework for full-body collision avoidance for manipulators in dynamic environments by leveraging 3D Poisson Safety Functions (PSFs). In particular, given environmental occupancy data, we sample the manipulator surface at a prescribed resolution and shrink free space via a Pontryagin difference according to this resolution. On this buffered domain, we synthesize a globally smooth CBF by solving Poisson's equation, yielding a single safety function for the entire environment. This safety function, evaluated at each sampled point, yields task-space CBF constraints enforced by a real-time safety filter via a multi-constraint quadratic program. We prove that keeping the sample points safe in the buffered region guarantees collision avoidance for the entire continuous robot surface. The framework is validated on a 7-degree-of-freedom manipulator in dynamic environments.
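To make the safety-filter idea in the abstract above concrete, here is a minimal sketch of a CBF filter for a single constraint and a single-integrator model (x_dot = u), where the quadratic program has a closed-form solution. This is a simplification: the paper describes a multi-constraint QP driven by a Poisson safety function, which would need a QP solver. The function name, the linear class-K term `alpha * h`, and the model are assumptions for illustration.

```python
def cbf_filter(u_nom, grad_h, h, alpha=1.0):
    """Minimally modify u_nom so that grad_h . u + alpha * h >= 0.

    u_nom  -- nominal control input (list of floats)
    grad_h -- gradient of the safety function at the sample point
    h      -- value of the safety function at the sample point (h >= 0 is safe)
    alpha  -- gain of the linear class-K term (hypothetical choice)
    """
    slack = sum(a * u for a, u in zip(grad_h, u_nom)) + alpha * h
    if slack >= 0.0:
        # The nominal input already satisfies the CBF condition.
        return list(u_nom)
    norm_sq = sum(a * a for a in grad_h)
    if norm_sq == 0.0:
        raise ValueError("CBF condition violated with zero gradient: infeasible")
    # Project u_nom onto the constraint boundary grad_h . u + alpha * h = 0.
    scale = slack / norm_sq
    return [u - scale * a for u, a in zip(u_nom, grad_h)]
```

For example, `cbf_filter([1.0, 0.0], [-1.0, 0.0], 0.5)` slows the motion along the first axis so that the filtered input sits exactly on the constraint boundary, while an input that already satisfies the condition is returned unchanged.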
Long-Horizon Manipulation via Trace-Conditioned VLA Planning
Long-horizon manipulation remains challenging for vision-language-action (VLA) policies: real tasks are multi-step, progress-dependent, and brittle to compounding execution errors. We present LoHo-Manip, a modular framework that scales short-horizon VLA execution to long-horizon instruction following via a dedicated task-management VLM. The manager is decoupled from the executor and is invoked in a receding-horizon manner: given the current observation, it predicts a progress-aware remaining plan that combines (i) a subtask sequence with an explicit done + remaining split as lightweight language memory, and (ii) a visual trace -- a compact 2D keypoint trajectory prompt specifying where to go and what to approach next. The executor VLA is adapted to condition on the rendered trace, thereby turning long-horizon decision-making into repeated local control by following the trace. Crucially, predicting the remaining plan at each step yields an implicit closed loop: failed steps persist in subsequent outputs, and traces update accordingly, enabling automatic continuation and replanning without hand-crafted recovery logic or brittle visual-history buffers. Extensive experiments spanning embodied planning, long-horizon reasoning, trajectory prediction, and end-to-end manipulation in simulation and on a real Franka robot demonstrate strong gains in long-horizon success, robustness, and out-of-distribution generalization. Project page: https://www.liuisabella.com/LoHoManip
comment: Project page: https://www.liuisabella.com/LoHoManip
VistaBot: View-Robust Robot Manipulation via Spatiotemporal-Aware View Synthesis ICRA 2026
Recently, end-to-end robotic manipulation models have gained significant attention for their generalizability and scalability. However, they often suffer from limited robustness to camera viewpoint changes when training with a fixed camera. In this paper, we propose VistaBot, a novel framework that integrates feed-forward geometric models with video diffusion models to achieve view-robust closed-loop manipulation without requiring camera calibration at test time. Our approach consists of three key components: 4D geometry estimation, view synthesis latent extraction, and latent action learning. VistaBot is integrated into both action-chunking (ACT) and diffusion-based ($π_0$) policies and evaluated across simulation and real-world tasks. We further introduce the View Generalization Score (VGS) as a new metric for comprehensive evaluation of cross-view generalization. Results show that VistaBot improves VGS by 2.79$\times$ and 2.63$\times$ over ACT and $π_0$, respectively, while also achieving high-quality novel view synthesis. Our contributions include a geometry-aware synthesis model, a latent action planner, a new benchmark metric, and extensive validation across diverse environments. The code and models will be made publicly available.
comment: This paper has been accepted to ICRA 2026
Task-Driven Co-Design of Heterogeneous Multi-Robot Systems
Designing multi-agent robotic systems requires reasoning across tightly coupled decisions spanning heterogeneous domains, including robot design, fleet composition, and planning. Much effort has been devoted to isolated improvements in these domains, whereas system-level co-design considering trade-offs and task requirements remains underexplored. In this work, we present a formal and compositional framework for the task-driven co-design of heterogeneous multi-robot systems. Building on a monotone co-design theory, we introduce general abstractions of robots, fleets, planners, executors, and evaluators as interconnected design problems with well-defined interfaces that are agnostic to both implementations and tasks. This structure enables efficient joint optimization of robot design, fleet composition, and planning under task-specific performance constraints. A series of case studies demonstrates the capabilities of the framework. Various component models can be seamlessly incorporated, including new robot types, task profiles, and probabilistic sensing objectives, while non-obvious design alternatives are systematically uncovered with optimality guarantees. The results highlight the flexibility, scalability, and interpretability of the proposed approach, and illustrate how formal co-design enables principled reasoning about complex heterogeneous multi-robot systems.
Characterizing pitch and roll torque coupling in insect-sized flapping-wing robots using a microfabricated gimbal
Sub-gram flapping-wing flying insect robots (FIRs) are challenging to model because of mechanical complexity in their wings, unsteady aerodynamic flow, and the difficulty of making precise measurements at a small scale. Coupling effects between roll and pitch torque actuation have not previously been measured because a two-axis sensor that is sensitive enough has not been realized. To address this shortcoming, we introduce a microfabricated gimbal design capable of precisely and simultaneously measuring roll and pitch torques as well as thrust. We then used it to measure the extent to which a pitch torque command affects roll torque and vice versa on a 180 mg piezo-actuated flapping-wing flying platform. Our results show a high coefficient of determination in the linear regression for both pitch (0.95) and roll (0.98) and low cross-correlation coefficients (-0.001 and -0.085, respectively) across the full range of simultaneous torque commands, indicating negligible cross-axis coupling. Similarly, thrust force deviates by a maximum of only 5.8% from the mean thrust value. These results validate the assumption that pitch and roll can be considered independently in control and will inform future models of how inputs affect the aerodynamics of resonant flapping-wing systems.
comment: Submitted for journal publication in Mechatronics and conference presentation at IFAC World Congress 2026. 9 pages, 11 figures
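For readers unfamiliar with the statistic reported in the flapping-wing abstract above, the sketch below computes the coefficient of determination of an ordinary least-squares line fit. It is a generic textbook calculation with hypothetical names (`linear_fit_r2`, `commands`, `torques`), not the authors' analysis code.

```python
def linear_fit_r2(commands, torques):
    """Fit torques ~ slope * commands + intercept and return (slope, intercept, R^2)."""
    n = len(commands)
    if n != len(torques) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(commands) / n
    mean_y = sum(torques) / n
    sxx = sum((x - mean_x) ** 2 for x in commands)
    ss_tot = sum((y - mean_y) ** 2 for y in torques)
    if sxx == 0.0 or ss_tot == 0.0:
        raise ValueError("inputs must not be constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(commands, torques))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(commands, torques))
    return slope, intercept, 1.0 - ss_res / ss_tot
```

A value near 1 means the measured torque is almost fully explained by a linear function of the command on the same axis.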
Dynamic Coupling and Indirect Control of Jointed Robots Rolling Atop A Moving Platform
An asymmetric two-link robot supported atop a flat platform by wheels that roll and pivot freely, but do not slip laterally, will develop forward momentum if the joint between the links is actuated internally. In particular, oscillations in the joint angle will generate undulatory locomotion suggesting fishlike swimming. If two such robots surmount a common platform that's free to translate with its own inertial dynamics, then the individual robots' dynamics will be coupled so that the locomotion of either robot is affected by that of the other. We develop a mathematical model for this system and present simulations demonstrating its behavior. We then consider a single robot with an unactuated joint rolling atop a platform that moves under control, and show that actuation of the platform is sufficient to dictate the robot's behavior. In particular, with the acceleration of the platform as an input, the robot's heading can be made to track a chosen function of time. This is sufficient to guarantee that the robot can be induced to orbit a fixed point on the platform or to locomote persistently in a desired direction.
Wiggle and Go! System Identification for Zero-Shot Dynamic Rope Manipulation
Many robotic tasks are unforgiving; a single mistake in a dynamic throw can lead to unacceptable delays or unrecoverable failure. To mitigate this, we present a novel approach that leverages learned simulation priors to inform goal-conditioned dynamic manipulation of ropes for efficient and accurate task execution. Related methods for dynamic rope manipulation either require large real-world datasets to estimate rope behavior or rely on iterative improvement over repeated attempts at the task to reach the goal. We introduce Wiggle and Go!, a two-stage, system-identification-based framework that enables zero-shot rope manipulation. The framework consists of a system identification module that observes rope movement to predict descriptive physical parameters, which then inform an optimization method for goal-conditioned action prediction for the robot to execute zero-shot in the real world. Our method achieves strong performance across multiple dynamic manipulation tasks enabled by the same task-agnostic system identification module, which offers seamless switching between different manipulation tasks, allowing a single model to support a diverse array of manipulation policies. We achieve a 3.55 cm average accuracy on 3D target striking in the real world using rope system parameters, compared to 15.34 cm accuracy when our task model is not system-parameter-informed. We achieve a Pearson correlation coefficient of 0.95 between Fourier frequencies of the predicted and real ropes on an unseen trajectory. Project website: https://wiggleandgo.github.io/
TRACE: Topology-aware Reconstruction of Accidents in CARLA for AV Evaluation
Validating Autonomous Vehicles (AVs) requires exposure to rare, safety-critical scenarios that are infrequent in routine driving data. Existing benchmarks address this by generating synthetic conflicts or mapping accident descriptions to abstract road geometries, failing to capture the topological complexity of real-world crashes. We introduce TRACE, a pipeline that automates the reconstruction of NHTSA crash reports into high-fidelity CARLA simulations by (1) retrieving site-specific OpenStreetMap data to preserve exact road topology, (2) leveraging Large Language Models to infer vehicles' initial state from road geometry and pre-crash maneuvers, and (3) generating simulation trajectories from semi-structured report data. Using this pipeline, we curated a benchmark of 52 diverse accident scenarios covering varied collision types, road topologies, and pre-crash maneuvers, providing a challenging open-source resource for testing AV systems against real-world failures.
comment: FSE'26 Tool Demonstration Track
SNGR: Selective Non-Gaussian Refinement for Ambiguous SLAM Factor Graphs
We present Selective Non-Gaussian Refinement (SNGR), a SLAM framework that augments iSAM2 with targeted nested sampling on windows where Gaussian approximations are likely to fail. We detect such regions using the condition number of joint marginal covariances and selectively refine them using the full nonlinear factor graph likelihood, with a gating mechanism to avoid degradation in multimodal cases. Experiments on range-only SLAM with wrong data association show that SNGR achieves high-precision failure detection and consistent local likelihood improvements while reducing computational cost relative to exhaustive non-Gaussian inference. These results highlight both the promise and the limitations of selective refinement for approximate SLAM posteriors.
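The detection step in the SNGR abstract above can be illustrated with a small sketch that flags a window when the condition number of a 2x2 marginal covariance is large. The 2x2 restriction, the threshold value, and the rule that a large condition number triggers refinement are assumptions for illustration; the abstract only states that the condition number of joint marginal covariances is used.

```python
import math


def cond_2x2_symmetric(cov):
    """Condition number (largest over smallest eigenvalue) of a symmetric 2x2 matrix."""
    (a, b), (b2, d) = cov
    if abs(b - b2) > 1e-12:
        raise ValueError("covariance must be symmetric")
    half_trace = (a + d) / 2.0
    det = a * d - b * b
    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    disc = math.sqrt(max(0.0, half_trace * half_trace - det))
    lam_max = half_trace + disc
    lam_min = half_trace - disc
    if lam_min <= 0.0:
        return math.inf  # singular or not positive definite
    return lam_max / lam_min


def needs_refinement(cov, threshold=1e3):
    """Flag a window whose covariance is badly conditioned (hypothetical rule)."""
    return cond_2x2_symmetric(cov) > threshold
```

An isotropic covariance such as `[[1.0, 0.0], [0.0, 1.0]]` has condition number 1 and is not flagged, whereas a strongly elongated one such as `[[100.0, 0.0], [0.0, 0.01]]` is.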
Robust Localization for Autonomous Vehicles in Highway Scenes ICRA
Localization for autonomous vehicles on highways remains under-explored compared to urban roads, and state-of-the-art methods for urban scenes degrade when directly applied to highways. We identify key challenges including environment changes under information homogeneity, heavy occlusion, degraded GNSS signals, and stringent downstream requirements on accuracy and latency. We propose a robust localization system to address highway challenges, which uses a dual-likelihood LiDAR front end that decouples 3D geometric structures and 2D road-texture cues to handle environment changes; a Control-EKF further leverages steering and acceleration commands to reduce lag and improve closed-loop behavior. An automated offline mapping and ground-truth pipeline keeps maps fresh at high cadence for optimal localization performance. To catalyze progress, we release a public dataset covering both urban roads and highways while focusing on representative challenging highway clips, totaling 163 km; benchmarking is standardized using product-oriented accuracy metrics and certified ground truth. Compared to Apollo and Autoware, our system performs similarly on urban roads but shows superior robustness on challenging highway scenarios. The system has been validated by more than one million kilometers of road testing.
comment: Accepted to 2026 IEEE International Conference on Robotics and Automation (ICRA)
DM$^3$-Nav: Decentralized Multi-Agent Multimodal Multi-Object Semantic Navigation
We present DM$^3$-Nav, a fully decentralized multi-agent semantic navigation system supporting multimodal open-vocabulary goal specification and multi-object missions. In our setting, decentralization implies operation without a central coordinator, global map aggregation, or shared global state at runtime. Robots operate autonomously and coordinate through ad-hoc pairwise communication, exchanging local maps, goal status, and navigation intent without synchronization. An implicit task allocation mechanism combining intent broadcasting and distance-weighted frontier selection reduces redundant exploration while preserving decentralized operation. Evaluations on HM3DSem scenes using the HM3Dv0.2 and GOAT-Bench datasets demonstrate that DM$^3$-Nav matches or exceeds centralized and shared-map baselines while eliminating single points of failure inherent in centralized architectures. Finally, we validate our approach in a real-world office environment using two mobile robots, demonstrating successful deployment relying entirely on onboard sensing and computation. A video of our real-world experiments is available online: https://drive.google.com/file/d/1QiUSCn5rIvtuTUqtuXLPgmt6S8x9-MCZ/view?usp=drive_link
Model Predictive Control of Hybrid Dynamical Systems
The problem of controlling hybrid dynamical systems using model predictive control (MPC) is formulated and sufficient conditions for asymptotic stability of a set are provided. Hybrid dynamical systems are modeled in terms of hybrid equations, involving a differential equation and a difference equation with inputs and constraints. The proposed hybrid MPC algorithm uses a suitable prediction and control horizon construction inspired by hybrid time domains. Structural properties of the hybrid optimization problem, its feasible set, and its value function are provided. Checkable conditions to guarantee asymptotic stability of a set are provided. These conditions are given in terms of properties on the stage cost, terminal cost, and the existence of static state-feedback laws, related through a control Lyapunov function condition. Examples illustrate the results throughout the paper.
comment: Technical report associated with paper to appear in IEEE Transactions on Automatic Control, 2026
Focus Session: Hardware and Software Techniques for Accelerating Multimodal Foundation Models DATE
This work presents a multi-layered methodology for efficiently accelerating multimodal foundation models (MFMs). It combines hardware and software co-design of transformer blocks with an optimization pipeline that reduces computational and memory requirements. During model development, it employs performance enhancements through fine-tuning for domain-specific adaptation. Our methodology further incorporates hardware and software techniques for optimizing MFMs. Specifically, it employs MFM compression using hierarchy-aware mixed-precision quantization and structural pruning for transformer blocks and MLP channels. It also optimizes operations through speculative decoding, model cascading that routes queries through a small-to-large cascade and uses lightweight self-tests to determine when to escalate to larger models, as well as co-optimization of sequence length, visual resolution & stride, and graph-level operator fusion. To efficiently execute the model, the processing dataflow is optimized based on the underlying hardware architecture together with memory-efficient attention to meet on-chip bandwidth and latency budgets. To support this, a specialized hardware accelerator for the transformer workloads is employed, which can be developed through expert design or an LLM-aided design approach. We demonstrate the effectiveness of the proposed methodology on medical-MFMs and on code generation tasks, and conclude with extensions toward energy-efficient spiking-MFMs.
comment: Accepted at the Design, Automation and Test in Europe Conference (DATE), April 20-22, 2026 in Verona, Italy
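The model cascading described in the abstract above (routing queries small-to-large and escalating when a lightweight self-test fails) can be sketched as follows. The `cascade` function, the callable interfaces, and the toy models in the usage note are hypothetical; the abstract does not describe the actual self-tests or routing policy.

```python
def cascade(query, models, self_test):
    """Run models from smallest to largest until one answer passes the self-test.

    models    -- list of callables ordered small-to-large, each mapping query -> answer
    self_test -- callable (query, answer) -> bool; True means the answer is accepted
    Returns (answer, level), where level is the index of the model that produced it.
    """
    if not models:
        raise ValueError("at least one model is required")
    answer = None
    for level, model in enumerate(models):
        answer = model(query)
        if self_test(query, answer):
            return answer, level  # accepted without escalating further
    # No answer passed; fall back to the largest model's output.
    return answer, len(models) - 1
```

As a toy usage, with `small = lambda q: ""`, `large = lambda q: q.upper()`, and a self-test that accepts any non-empty answer, `cascade("hi", [small, large], lambda q, a: bool(a))` escalates once and returns the large model's answer at level 1.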
Stratified Topological Autonomy for Long-Range Coordination (STALC)
In this paper, we present Stratified Topological Autonomy for Long-Range Coordination (STALC), a hierarchical planning approach for multi-robot coordination in real-world environments with significant inter-robot spatial and temporal dependencies. At its core, STALC consists of a multi-robot graph-based planner which combines a topological graph with a novel, computationally efficient mixed-integer programming formulation to generate highly-coupled multi-robot plans in seconds. To enable autonomous planning across different spatial and temporal scales, we construct our graphs so that they capture connectivity between free-space regions and other problem-specific features, such as traversability or risk. We then use receding-horizon planners to achieve local collision avoidance and formation control. To evaluate our approach, we consider a multi-robot reconnaissance scenario where robots must autonomously coordinate to navigate through an environment while minimizing the risk of detection by observers. Through simulation-based experiments, we show that our approach is able to scale to address complex multi-robot planning scenarios. Through hardware experiments, we demonstrate our ability to generate graphs from real-world data and successfully plan across the entire hierarchy to achieve shared objectives.
comment: ©2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own
Reinforcement learning (RL) is a promising approach for solving robotic manipulation tasks. However, it is challenging to apply RL algorithms directly in the real world. For one thing, RL is data-intensive and typically requires millions of interactions with environments, which is impractical in real scenarios. For another, it requires heavy engineering effort to design reward functions manually. To address these issues, we leverage foundation models in this paper. We propose Reinforcement Learning with Foundation Priors (RLFP) to utilize guidance and feedback from policy, value, and success-reward foundation models. Within this framework, we introduce the Foundation-guided Actor-Critic (FAC) algorithm, which enables embodied agents to explore more efficiently with automatic reward functions. The benefits of our framework are threefold: (1) \textit{sample efficient}; (2) \textit{minimal and effective reward engineering}; (3) \textit{agnostic to foundation model forms and robust to noisy priors}. Our method achieves remarkable performance in various manipulation tasks on both real robots and in simulation. Across 5 dexterous tasks with real robots, FAC achieves an average success rate of 86\% after one hour of real-time learning. Across 8 tasks in the simulated Meta-world, FAC achieves 100\% success rates in 7/8 tasks in fewer than 100k frames (about 1-hour training), outperforming baseline methods with manually designed rewards in 1M frames. We believe the RLFP framework can enable future robots to explore and learn autonomously in the physical world for more tasks. Visualizations and code are available at https://yewr.github.io/rlfp.
comment: CoRL 2024 (Oral)
Geometry-aided Vision-based Localization of Future Mars Helicopters in Challenging Illumination Conditions
Planetary exploration using aerial assets has the potential for unprecedented scientific discoveries on Mars. While NASA's Mars helicopter Ingenuity proved flight in Martian atmosphere is possible, future Mars rotorcraft will require advanced navigation capabilities for long-range flights. One such critical capability is Map-based Localization (MbL) which registers an onboard image to a reference map during flight to mitigate cumulative drift from visual odometry. However, significant illumination differences between rotorcraft observations and a reference map prove challenging for traditional MbL systems, restricting the operational window of the vehicle. In this work, we investigate a new MbL system and propose Geo-LoFTR, a geometry-aided deep learning model for image registration that is more robust under large illumination differences than prior models. The system is supported by a custom simulation framework that uses real orbital maps to produce large amounts of realistic images of the Martian terrain. Comprehensive evaluations show that our proposed system outperforms prior MbL efforts in terms of localization accuracy under significant lighting and scale variations. Furthermore, we demonstrate the validity of our approach across a simulated Martian day and on real Mars imagery. Code and datasets are available at: https://dpisanti.github.io/geo-loftr/.
Certified Coil Geometry Learning for Short-Range Magnetic Actuation and Spacecraft Docking Application
This paper presents a learning-based framework for approximating an exact magnetic-field interaction model, supported by both numerical and experimental validation. High-fidelity magnetic-field interaction modeling is essential for achieving exceptional accuracy and responsiveness across a wide range of fields, including transportation, energy systems, medicine, biomedical robotics, and aerospace robotics. In aerospace engineering, magnetic actuation has been investigated as a fuel-free solution for multi-satellite attitude and formation control. Although the exact magnetic field can be computed from the Biot-Savart law, the associated computational cost is prohibitive, and prior studies have therefore relied on dipole approximations to improve efficiency. However, these approximations lose accuracy during proximity operations, leading to unstable behavior and even collisions. To address this limitation, we develop a learning-based approximation framework that faithfully reproduces the exact field while dramatically reducing computational cost. This framework directly derives a coefficient matrix that maps inter-satellite current vectors to the resulting forces and torques, enabling efficient computation of control current commands. The proposed method additionally provides a certified error bound, derived from the number of training samples, ensuring reliable prediction accuracy. The learned model can also accommodate interactions between coils of different sizes through appropriate geometric transformations, without retraining. To verify the effectiveness of the proposed framework under challenging conditions, a spacecraft docking scenario is examined through both numerical simulations and experimental validation.
comment: IEEE Robotics and Automation Letters. Preprint Version. Accepted March, 2026 (DOI: https://doi.org/10.1109/LRA.2026.3685510)
FingerEye: Continuous and Unified Vision-Tactile Sensing for Dexterous Manipulation
Dexterous robotic manipulation requires comprehensive perception across all phases of interaction: pre-contact, contact initiation, and post-contact. Such continuous feedback allows a robot to adapt its actions throughout interaction. However, many existing tactile sensors, such as GelSight and its variants, only provide feedback after contact is established, limiting a robot's ability to precisely initiate contact. We introduce FingerEye, a compact and cost-effective sensor that provides continuous vision-tactile feedback throughout the interaction process. FingerEye integrates binocular RGB cameras to provide close-range visual perception with implicit stereo depth. Upon contact, external forces and torques deform a compliant ring structure; these deformations are captured via marker-based pose estimation and serve as a proxy for contact wrench sensing. This design enables a perception stream that smoothly transitions from pre-contact visual cues to post-contact tactile feedback. Building on this sensing capability, we develop a vision-tactile imitation learning policy that fuses signals from multiple FingerEye sensors to learn dexterous manipulation behaviors from limited real-world data. We further develop a digital twin of our sensor and robot platform to improve policy generalization. By combining real demonstrations with visually augmented simulated observations for representation learning, the learned policies become more robust to object appearance variations. Together, these design aspects enable dexterous manipulation across diverse object properties and interaction regimes, including coin standing, chip picking, letter retrieving, and syringe manipulation. The hardware design, code, appendix, and videos are available on our project website: https://nus-lins-lab.github.io/FingerEyeWeb/
Learning Physics from Pretrained Video Models: A Multimodal Continuous and Sequential World Interaction Models for Robotic Manipulation
The scarcity of large-scale robotic data has motivated the repurposing of foundation models from other modalities for policy learning. In this work, we introduce PhysGen (Learning Physics from Pretrained Video Generation Models), a scalable continuous and sequential world interaction framework that leverages autoregressive video generation to solve robotic manipulation tasks. By treating the pretrained video model as a proxy for a physics simulator, PhysGen models the dynamic interplay between the external environment and robot actions. We introduce a multimodal continuous representation that unifies video and action into shared physical tokens, bridging the gap between discrete video generation and continuous robotic control. This approach enables the seamless transfer of implicit physical knowledge, such as object permanence and dynamics, from video pretraining to downstream manipulation. To ensure efficient convergence, we incorporate causal masking, inverse kinematics, Lookahead Multi-Token Prediction (L-MTP), and key-value (KV) caching. Experimental results on the Libero and ManiSkill benchmarks demonstrate that PhysGen consistently outperforms robust baselines, surpassing OpenVLA and WorldVLA by margins of 13.8% and 8.8%, respectively. Notably, in real-world scenarios, PhysGen matches the performance of large-scale action-pretrained models like $π_0$ without requiring prior action-specific pretraining, demonstrating superior capability in physically complex tasks such as grasping transparent objects. These findings validate the potential of extracting physical intuition from pretrained video generators to facilitate generalizable robotic manipulation.
comment: 11 pages, 6 figures. arXiv admin note: text overlap with arXiv:2508.09822
MOMO: A framework for seamless physical, verbal, and graphical robot skill learning and adaptation
Industrial robot applications require increasingly flexible systems that non-expert users can easily adapt for varying tasks and environments. However, different adaptations benefit from different interaction modalities. We present an interactive framework that enables robot skill adaptation through three complementary modalities: kinesthetic touch for precise spatial corrections, natural language for high-level semantic modifications, and a graphical web interface for visualizing geometric relations and trajectories, inspecting and adjusting parameters, and editing via-points by drag-and-drop. The framework integrates five components: energy-based human-intention detection, a tool-based LLM architecture (where the LLM selects and parameterizes predefined functions rather than generating code) for safe natural language adaptation, Kernelized Movement Primitives (KMPs) for motion encoding, probabilistic Virtual Fixtures for guided demonstration recording, and ergodic control for surface finishing. We demonstrate that this tool-based LLM architecture generalizes skill adaptation from KMPs to ergodic control, enabling voice-commanded surface finishing. Validation on a 7-DoF torque-controlled robot at the Automatica 2025 trade fair demonstrates the practical applicability of our approach in industrial settings.
comment: 15 pages, 13 figures, 3 tables
Situationally-aware Path Planning Exploiting 3D Scene Graphs
3D Scene Graphs integrate both metric and semantic information, yet their structure remains underutilized for improving path planning efficiency and interpretability. In this work, we present S-Path, a situationally-aware path planner that leverages the metric-semantic structure of indoor 3D Scene Graphs to significantly enhance planning efficiency. S-Path follows a two-stage process: it first performs a search over a semantic graph derived from the scene graph to yield a human-understandable high-level path. This also identifies relevant regions for planning, which later allows the decomposition of the problem into smaller, independent subproblems that can be solved in parallel. We also introduce a replanning mechanism that, in the event of an infeasible path, reuses information from previously solved subproblems to update semantic heuristics and prioritize reuse to further improve the efficiency of future planning attempts. Extensive experiments on both real-world and simulated environments show that S-Path achieves average reductions of 6x in planning time while maintaining comparable path optimality to classical sampling-based planners and surpassing them in complex scenarios, making it an efficient and interpretable path planner for environments represented by indoor 3D Scene Graphs. Code available at: https://github.com/snt-arg/spath_ros
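The two-stage idea described for S-Path above (a semantic search first, then independent subproblems) can be illustrated with a minimal sketch. It assumes a room-level adjacency dictionary and uses plain breadth-first search; the names `semantic_path`, `subproblems`, and the example rooms are hypothetical and are not taken from the S-Path code base.

```python
from collections import deque

def semantic_path(graph, start, goal):
    """Breadth-first search over a room-level graph; returns a list of rooms or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        room = queue.popleft()
        if room == goal:
            # Walk back through the parent links to rebuild the path.
            path = []
            while room is not None:
                path.append(room)
                room = parents[room]
            return path[::-1]
        for neighbour in graph.get(room, []):
            if neighbour not in parents:
                parents[neighbour] = room
                queue.append(neighbour)
    return None

def subproblems(path):
    """Split a room sequence into room-to-room segments that can be planned independently."""
    return list(zip(path, path[1:]))

# Hypothetical indoor layout used only for illustration.
rooms = {
    "office": ["corridor"],
    "corridor": ["office", "kitchen", "lab"],
    "kitchen": ["corridor"],
    "lab": ["corridor"],
}
high_level = semantic_path(rooms, "office", "lab")
segments = subproblems(high_level)
```

Each segment could then be handed to a separate geometric planner, which is where the parallelism mentioned in the abstract would come from.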
Efficient Emotion-Aware Iconic Gesture Prediction for Robot Co-Speech
Co-speech gestures increase engagement and improve speech understanding. Most data-driven robot systems generate rhythmic beat-like motion, yet few integrate semantic emphasis. To address this, we propose a lightweight transformer that derives iconic gesture placement and intensity from text and emotion alone, requiring no audio input at inference time. The model outperforms GPT-4o in both semantic gesture placement classification and intensity regression on the BEAT2 dataset, while remaining computationally compact and suitable for real-time deployment on embodied agents.
PLAF: Pixel-wise Language-Aligned Feature Extraction for Efficient 3D Scene Understanding
Accurate open-vocabulary 3D scene understanding requires semantic representations that are both language-aligned and spatially precise at the pixel level, while remaining scalable when lifted to 3D space. However, existing representations struggle to jointly satisfy these requirements, and densely propagating pixel-wise semantics to 3D often results in substantial redundancy, leading to inefficient storage and querying in large-scale scenes. To address these challenges, we present PLAF, a Pixel-wise Language-Aligned Feature extraction framework that enables dense and accurate semantic alignment in 2D without sacrificing open-vocabulary expressiveness. Building upon this representation, we further design an efficient semantic storage and querying scheme that significantly reduces redundancy across both 2D and 3D domains. Experimental results show that PLAF provides a strong semantic foundation for accurate and efficient open-vocabulary 3D scene understanding. The code is publicly available at https://github.com/RockWenJJ/PLAF.
comment: Accepted by ICCA 2026
Rectified Schrödinger Bridge Matching for Few-Step Visual Navigation
Visual navigation is a core challenge in Embodied AI, requiring autonomous agents to translate high-dimensional sensory observations into continuous, long-horizon action trajectories. While generative policies based on diffusion models and Schrödinger Bridges (SB) effectively capture multimodal action distributions, they require dozens of integration steps due to high-variance stochastic transport, posing a critical barrier for real-time robotic control. We propose Rectified Schrödinger Bridge Matching (RSBM), a framework that exploits a shared velocity-field structure between standard Schrödinger Bridges ($\varepsilon=1$, maximum-entropy transport) and deterministic Optimal Transport ($\varepsilon\to 0$, as in Conditional Flow Matching), controlled by a single entropic regularization parameter $\varepsilon$. We prove two key results: (1) the conditional velocity field's functional form is invariant across the entire $\varepsilon$-spectrum (Velocity Structure Invariance), enabling a single network to serve all regularization strengths; and (2) reducing $\varepsilon$ linearly decreases the conditional velocity variance, enabling more stable coarse-step ODE integration. Anchored to a learned conditional prior that shortens transport distance, RSBM operates at an intermediate $\varepsilon$ that balances multimodal coverage and path straightness. Empirically, while standard bridges require $\geq 10$ steps to converge, RSBM achieves over 94% cosine similarity and 92% success rate in merely 3 integration steps -- without distillation or multi-stage training -- substantially narrowing the gap between high-fidelity generative policies and the low-latency demands of Embodied AI.
comment: 18 pages, 7 figures, 10 tables. Code available at https://github.com/WuyangLuan/RSBM
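The RSBM abstract above links straighter transport paths to stable coarse-step ODE integration. A minimal sketch of that intuition follows, assuming a toy straight-line velocity field of the kind used in conditional flow matching rather than a learned network; `straight_velocity` and `euler_integrate` are hypothetical names, not part of the RSBM code.

```python
def straight_velocity(x, t, target):
    """Toy field that points straight at `target`, scaled by the remaining time."""
    return [(y - xi) / (1.0 - t) for xi, y in zip(x, target)]

def euler_integrate(x0, velocity, steps):
    """Explicit Euler integration of dx/dt = velocity(x, t) from t = 0 to t = 1."""
    x = list(x0)
    h = 1.0 / steps
    for i in range(steps):
        t = i * h  # the field is never evaluated at t = 1
        v = velocity(x, t)
        x = [xi + h * vi for xi, vi in zip(x, v)]
    return x

target = [2.0, -1.0]
result = euler_integrate(
    [0.0, 0.0], lambda x, t: straight_velocity(x, t, target), steps=3
)
```

Because the toy path is a straight line, three Euler steps already land on the target up to floating-point error; a curved or noisy field would generally need more steps.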
Demystifying Action Space Design for Robotic Manipulation Policies
The specification of the action space plays a pivotal role in imitation-based robotic manipulation policy learning, fundamentally shaping the optimization landscape of policy learning. While recent advances have focused heavily on scaling training data and model capacity, the choice of action space remains guided by ad-hoc heuristics or legacy designs, leading to an ambiguous understanding of robotic policy design philosophies. To address this ambiguity, we conducted a large-scale and systematic empirical study, confirming that the action space does have significant and complex impacts on robotic policy learning. We dissect the action design space along temporal and spatial axes, facilitating a structured analysis of how these choices govern both policy learnability and control stability. Based on 13,000+ real-world rollouts on a bimanual robot and evaluation on 500+ trained models over four scenarios, we examine the trade-offs between absolute vs. delta representations, and joint-space vs. task-space parameterizations. Our large-scale results suggest that properly designing the policy to predict delta actions consistently improves performance, while joint-space and task-space representations offer complementary strengths, favoring control stability and generalization, respectively.
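To make the absolute vs. delta distinction in the abstract above concrete, here is a minimal sketch. It assumes actions are plain position vectors and that each delta is taken relative to the previous target, which is one common convention and may differ from the study's exact definition; `to_delta` and `to_absolute` are hypothetical helper names.

```python
def to_delta(absolute, start):
    """Convert a sequence of absolute targets into per-step deltas."""
    deltas = []
    previous = start
    for target in absolute:
        deltas.append([t - p for t, p in zip(target, previous)])
        previous = target
    return deltas

def to_absolute(deltas, start):
    """Accumulate per-step deltas back into absolute targets."""
    targets = []
    current = list(start)
    for step in deltas:
        current = [c + d for c, d in zip(current, step)]
        targets.append(current)
    return targets

start = [0.0, 0.0]
absolute = [[1.0, 0.0], [3.0, 2.0], [3.0, 5.0]]
delta = to_delta(absolute, start)
recovered = to_absolute(delta, start)
```

The two encodings carry the same trajectory, but a delta policy predicts small increments while an absolute policy predicts targets in a fixed frame.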
JoyAI-RA 0.1: A Foundation Model for Robotic Autonomy
Robotic autonomy in open-world environments is fundamentally limited by insufficient data diversity and poor cross-embodiment generalization. Existing robotic datasets are often limited in scale and task coverage, while relatively large differences across robot embodiments impede effective behavior knowledge transfer. To address these challenges, we propose JoyAI-RA, a vision-language-action (VLA) embodied foundation model tailored for generalizable robotic manipulation. JoyAI-RA presents a multi-source multi-level pretraining framework that integrates web data, large-scale egocentric human manipulation videos, simulation-generated trajectories, and real-robot data. Through training on heterogeneous multi-source data with explicit action-space unification, JoyAI-RA effectively bridges embodiment gaps, particularly between human manipulation and robotic control, thereby enhancing cross-embodiment behavior learning. JoyAI-RA outperforms state-of-the-art methods in both simulation and real-world benchmarks, especially on diverse tasks with generalization demands.
ExpressMM: Expressive Mobile Manipulation Behaviors in Human-Robot Interactions
Mobile manipulators are increasingly deployed in human-centered environments to perform tasks. While completing such tasks, they should also be able to communicate their intent to the people around them using expressive robot behaviors. Prior work on expressive robot behaviors has used preprogrammed or learning-from-demonstration-based expressive motions and large language model generated high-level interactions. The majority of these existing approaches have not considered human-robot interactions (HRI) where users may interrupt, modify, or redirect a robot's actions during task execution. In this paper, we develop the novel ExpressMM framework that integrates a high-level language-guided planner based on a vision-language model for perception and conversational reasoning with a low-level vision-language-action policy to generate expressive robot behaviors during collaborative HRI tasks. Furthermore, ExpressMM supports interruptible interactions to accommodate updated or redirecting instructions by users. We demonstrate ExpressMM on a mobile manipulator assisting a human in a collaborative assembly scenario and conduct audience-based evaluation of live HRI demonstrations. Questionnaire results show that the ExpressMM-enabled expressive behaviors helped observers clearly interpret the robot's actions and intentions while supporting socially appropriate and understandable interactions. Participants also reported that the robot was useful for collaborative tasks and behaved in a predictable and safe manner during the demonstrations.
ZipFold: Modular Actuators for Scaleable Adaptive Robots
There is a growing need for robots that can change their shape, size and mechanical properties to adapt to evolving tasks and environments. However, current shape-changing systems generally utilize bespoke, system-specific mechanisms that can be difficult to scale, reconfigure or translate from one application to another. This paper introduces a compact, easy-to-fabricate deployable actuator that achieves reversible scale and stiffness transformations through compound folding and zipping of flexible 3D-printed plastic strips into square-section deployable beams. The simple actuation method allows for smooth, continuous transitions between compact (flexible) and expanded (quasi-rigid) states, facilitating diverse shape and stiffness transformations when modules are combined into larger assemblies. The actuator's mechanical performance is characterized and an integrated system involving a four-module adaptive walking robot is demonstrated.
EgoExo++: Integrating On-demand Exocentric Visuals with 2.5D Ground Surface Estimation for Interactive Teleoperation of Underwater ROVs
Underwater ROVs (Remotely Operated Vehicles) are indispensable for subsea exploration and task execution, yet typical teleoperation engines based on egocentric (first-person) video feeds restrict human operators' field-of-view and limit precise maneuvering in complex, unstructured underwater environments. To address this, we first propose EgoExo, a geometry-driven solution integrated into a visual SLAM pipeline that synthesizes on-demand exocentric (third-person) views from egocentric camera feeds. We further propose EgoExo++, which extends beyond 2D exocentric view synthesis (EgoExo) to augment a piecewise planar 2.5D ground surface estimation on-the-fly. Its anchor-free aerial viewpoint supports ground-relative reasoning, such as clearance and terrain-based navigation marker following. The computations involved are closed-form and rely solely on egocentric views and monocular SLAM estimates, which makes it portable across existing teleoperation engines and robust to varying waterbody characteristics. We validate the geometric accuracy of our approach through extensive experiments of 2-DOF indoor navigation and 6-DOF underwater cave exploration in challenging low-light conditions. To assess operational benefits, we conduct two user studies with simulation and real-world data, each involving 15 participants, comparing baseline egocentric teleoperation and EgoExo++. Results indicate improved system usability (SUS), reduced perceived workload (NASA-TLX), and significant gains in objective teleoperation performance, including 16% faster missions, 5-fold reduction in path deviation ratio, and fewer collision events (2 vs. 5 across trials). Furthermore, we highlight the role of EgoExo++ augmented visuals in supporting shared autonomy and embodied teleoperation. This new interactive approach to ROV teleoperation presents promising opportunities for future research in subsea telerobotics.
comment: EgoExo++ (Accepted in IJRR), V6/V2, metadata updated, 15 pages
Fake or Real, Can Robots Tell? Evaluating VLM Robustness to Domain Shift in Single-View Robotic Scene Understanding
Robotic scene understanding increasingly relies on Vision-Language Models (VLMs) to generate natural language descriptions of the environment. In this work, we systematically evaluate single-view object captioning for tabletop scenes captured by a robotic manipulator, introducing a controlled physical domain shift that contrasts real-world tools with geometrically similar 3D-printed counterparts that differ in texture, colour, and material. We benchmark a suite of state-of-the-art, locally deployable VLMs across multiple metrics to assess semantic alignment and factual grounding. Our results demonstrate that while VLMs describe common real-world objects effectively, performance degrades markedly on 3D-printed items despite their structurally familiar forms. We further expose critical vulnerabilities in standard evaluation metrics, showing that some fail to detect domain shifts entirely or reward fluent but factually incorrect captions. These findings highlight the limitations of deploying foundation models for embodied agents and the need for more robust architectures and evaluation protocols in physical robotic applications.
Scensory: Real-Time Robotic Olfactory Perception for Joint Identification and Source Localization
While robotic perception has advanced rapidly in vision and touch, enabling robots to reason about indoor fungal contamination from weak, diffusion-dominated chemical signals remains an open challenge. We introduce Scensory, a learning-based robotic olfaction framework that simultaneously identifies fungal species and localizes their source from short time series measured by affordable, cross-sensitive VOC sensor arrays. Temporal VOC dynamics encode both chemical and spatial signatures, which we decode through neural networks trained on robot-automated data collection with spatial supervision. Across five fungal species, Scensory achieves up to 89.85% species accuracy and 87.31% source localization accuracy under ambient conditions with 3-7s sensor inputs. These results demonstrate real-time, spatially grounded perception from diffusion-dominated chemical signals, enabling scalable and low-cost source localization for robotic indoor environmental monitoring.
comment: Our project website is at: http://generalroboticslab.com/Scensory
Relay-Based Coordination for Energy-Efficient Multi-Robot Pickup and Delivery
We consider the problem of delivering multiple packages from a single depot to distinct goal locations using a homogeneous fleet of robots with limited carrying capacity. We propose VCST-RCP, a Voronoi-Constrained Steiner Tree Relay Coordination Planning framework that explicitly treats inter-robot relays as a design primitive. The approach operates in two stages: (i) constructing a sparse relay backbone by combining Voronoi-derived exchange interfaces with Steiner tree optimization, and (ii) synthesizing robot-level pickup, relay, and delivery schedules under capacity and service-time constraints. Unlike traditional methods that rely on direct source-to-destination transport, our framework organizes package flow through a shared relay network, reducing redundant long-haul motion. Extensive experiments across multiple scales show that VCST-RCP reduces total fleet travel distance by an average of 31% (up to nearly 50%) compared to Hungarian assignment and significantly outperforms OR-Tools CVRP, with statistically significant improvements (p < 10^{-3}). These gains translate into over 50% higher delivery efficiency (packages per kilometer), directly improving energy utilization. An ablation study further reveals that optimizing relay placement yields substantially larger improvements than adapting spatial partitioning alone, establishing relay design as the dominant factor governing system performance. Overall, the results demonstrate that relay-based coordination provides a scalable and effective framework for energy-aware multi-robot delivery in real-world logistics settings.
LLMPhy: Parameter-Identifiable Physical Reasoning Combining Large Language Models and Physics Engines AISTATS 2026
Most learning-based approaches to complex physical reasoning sidestep the crucial problem of parameter identification (e.g., mass, friction) that governs scene dynamics, despite its importance in real-world applications such as collision avoidance and robotic manipulation. In this paper, we present LLMPhy, a black-box optimization framework that integrates large language models (LLMs) with physics simulators for physical reasoning. The core insight of LLMPhy is to bridge the textbook physical knowledge embedded in LLMs with the world models implemented in modern physics engines, enabling the construction of digital twins of input scenes via latent parameter estimation. Specifically, LLMPhy decomposes digital twin construction into two subproblems: (i) a continuous problem of estimating physical parameters and (ii) a discrete problem of estimating scene layout. For each subproblem, LLMPhy iteratively prompts the LLM to generate computer programs encoding parameter estimates, executes them in the physics engine to reconstruct the scene, and uses the resulting reconstruction error as feedback to refine the LLM's predictions. As existing physical reasoning benchmarks rarely account for parameter identifiability, we introduce three new datasets designed to evaluate physical reasoning in zero-shot settings. Our results show that LLMPhy achieves state-of-the-art performance on our tasks, recovers physical parameters more accurately, and converges more reliably than prior black-box methods. See the LLMPhy project page for details: https://www.merl.com/research/highlights/LLMPhy
comment: Accepted at AISTATS 2026
Multiagent Systems
Probably Approximately Consensus: On the Learning Theory of Finding Common Ground IJCAI 2025
A primary goal of online deliberation platforms is to identify ideas that are broadly agreeable to a community of users through their expressed preferences. Yet, consensus elicitation should ideally extend beyond the specific statements provided by users and should incorporate the relative salience of particular topics. We address this issue by modelling consensus as an interval in a one-dimensional opinion space derived from potentially high-dimensional data via embedding and dimensionality reduction. We define an objective that maximizes expected agreement within a hypothesis interval where the expectation is over an underlying distribution of issues, implicitly taking into account their salience. We propose an efficient Empirical Risk Minimization (ERM) algorithm and establish PAC-learning guarantees. Our initial experiments demonstrate the performance of our algorithm and examine more efficient approaches to identifying optimal consensus regions. We find that through selectively querying users on an existing sample of statements, we can reduce the number of queries needed to a practical number.
comment: Accepted to the Social Choice and Learning Algorithms Workshop at IJCAI 2025
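The interval-based ERM idea in the abstract above can be sketched under strong simplifying assumptions: each sampled statement has a position in the one-dimensional opinion space and a signed agreement score, and the empirical objective is the sum of scores inside the interval. The paper's actual objective and algorithm may differ; `best_interval` is a hypothetical name, and the search is a standard maximum-sum contiguous run over the sorted sample.

```python
def best_interval(points):
    """points: list of (position, score) pairs.

    Returns (left, right, total) for the closed interval whose endpoints are
    sample positions and whose enclosed scores have the largest sum.
    """
    ordered = sorted(points)
    best = (None, None, float("-inf"))
    running = 0.0
    start = 0
    for i, (position, score) in enumerate(ordered):
        if running <= 0.0:
            # A non-positive prefix never helps, so restart the interval here.
            running = score
            start = i
        else:
            running += score
        if running > best[2]:
            best = (ordered[start][0], position, running)
    return best

# Hypothetical sample: positions in [0, 1] with signed agreement scores.
sample = [(0.1, -1.0), (0.4, 1.0), (0.5, 0.5), (0.7, -0.25), (0.8, 1.0), (0.95, -1.0)]
left, right, total = best_interval(sample)
```

Sorting dominates the cost, so this toy version runs in O(n log n) for n sampled statements.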
Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems
Multi-agent systems built on large language models have shown strong performance on complex reasoning tasks, yet most work focuses on agent roles and orchestration while treating inter-agent communication as a fixed interface. Latent communication through internal representations such as key-value caches offers a promising alternative to text-based protocols, but existing approaches do not jointly optimize communication with multi-agent reasoning. Therefore we propose DiffMAS, a training framework that treats latent communication as a learnable component of multi-agent systems. DiffMAS performs parameter-efficient supervised training over multi-agent latent trajectories, enabling agents to jointly learn how information should be encoded and interpreted across interactions. Experiments on mathematical reasoning, scientific QA, code generation, and commonsense benchmarks show that DiffMAS consistently improves reasoning accuracy and decoding stability over single-agent inference, text-based multi-agent systems, and prior latent communication methods, achieving 26.7% on AIME24, 20.2% on GPQA-Diamond, and consistent gains across reasoning benchmarks.
comment: Under review at COLM 2026
Agentic AI-Enabled Framework for Thermal Comfort and Building Energy Assessment in Tropical Urban Neighborhoods
In response to the urban heat island effects and building energy demands in Singapore, this study proposes an agentic AI-enabled reasoning framework that integrates large language models (LLMs) with lightweight physics-based models. Through prompt customization, the LLMs interpret urban design tasks, extract relevant policies, and activate appropriate physics-based models for evaluation, forming a closed-loop reasoning-action process. These lightweight physics-based models leverage core thermal and airflow principles, streamlining conventional models to reduce computational time while predicting microclimate variables, such as building surface temperature, ground radiant heat, and airflow conditions, thereby enabling the estimation of thermal comfort indices, e.g., physiological equivalent temperature (PET), and building energy usage. This framework allows users to explore a variety of climate-resilient building surface strategies, e.g., green façades and cool paint applications, that improve thermal comfort while reducing wall heat gain and energy demand. By combining the autonomous reasoning capacity of LLMs with the rapid quantitative evaluation of lightweight physics-based models, the proposed system demonstrates potential for cross-disciplinary applications in sustainable urban design, indoor-outdoor environmental integration, and climate adaptation planning. The source code and data used in this study are available at: https://github.com/PgUpDn/urban-cooling-agent.
comment: Accepted at IAQVEC 2026
StructMem: Structured Memory for Long-Horizon Behavior in LLMs ACL 2026
Long-term conversational agents need memory systems that capture relationships between events, not merely isolated facts, to support temporal reasoning and multi-hop question answering. Current approaches face a fundamental trade-off: flat memory is efficient but fails to model relational structure, while graph-based memory enables structured reasoning at the cost of expensive and fragile construction. To address these issues, we propose StructMem, a structure-enriched hierarchical memory framework that preserves event-level bindings and induces cross-event connections. By temporally anchoring dual perspectives and performing periodic semantic consolidation, StructMem improves temporal reasoning and multi-hop performance on LoCoMo, while substantially reducing token usage, API calls, and runtime compared to prior memory systems. See https://github.com/zjunlp/LightMem.
comment: Accepted by ACL 2026 main conference
Architectures for Robust Self-Organizing Energy Systems under Information and Control Constraints
Applying the concept of controlled self-organization in agent-based Cyber-Physical Energy Systems (CPES) is a promising approach to ensure system robustness. By introducing an observer/controller architecture to the system, this concept allows for self-organization while still enabling intervention when disturbances occur. Thus, it is possible to respond to effects of cyber attacks, a major threat to current energy systems. However, when implementing an observer to monitor the system and a controller to execute actions for controlled self-organization in CPES, it is essential to take into account restrictions on information and actions resulting from the privacy of local distributed energy resources, regulatory constraints, and data exchange requirements. For this reason, this paper presents architecture variants for the observer and controller that take into account restrictions on access to information and limited actions. In addition, it evaluates possible controller actions in various architectures. The results underscore the importance of considering observer/controller architectures when designing agent-based systems to ensure their robustness for real-world applications.
comment: This preprint has not undergone peer review (when applicable) or any post-submission improvements or corrections. The Version of Record of this contribution will be published in Agents and Artificial Intelligence, Lecture Notes in Computer Science, and available online at https://doi.org/10.1007/978-3-032-25029-2_19
AI-Gram: When Visual Agents Interact in a Social Network
We present AI-Gram, a live platform enabling image-based interactions, to study social dynamics in a fully autonomous multi-agent visual network where all participants are LLM-driven agents. Using the platform, we conduct experiments on how agents communicate and adapt through visual media, and observe the spontaneous emergence of visual reply chains, indicating rich communicative structure. At the same time, agents exhibit aesthetic sovereignty resisting stylistic convergence toward social partners, anchoring under adversarial influence, and a decoupling between visual similarity and social ties. These results reveal a fundamental asymmetry in current agent architectures: strong expressive communication paired with a steadfast preservation of individual visual identity. We release AI-Gram as a publicly accessible, continuously evolving platform for studying social dynamics in AI-native multi-agent systems. https://ai-gram.ai/
Beyond Single Plots: A Benchmark for Question Answering on Multi-Charts
Charts are widely used to present complex information. Deriving meaningful insights in real-world contexts often requires interpreting multiple related charts together. However, the understanding of multi-chart images has not been extensively explored. We introduce PolyChartQA, a mid-scale dataset specifically designed for question answering over multi-chart images. PolyChartQA comprises 534 multi-chart images (with a total of 2,297 sub-charts) sourced from peer-reviewed computer science research publications and 2,694 QA pairs. We evaluate the performance of nine state-of-the-art Multimodal Language Models (MLMs) on PolyChartQA across question type, difficulty, question source, and key structural characteristics of multi-charts. Our results show a 27.4% LLM-based accuracy (L-Accuracy) drop on human-authored questions compared to MLM-generated questions, and a 5.39% L-Accuracy gain with our proposed prompting method.
PREVENT-JACK: Context Steering for Swarms of Long Heavy Articulated Vehicles
In this paper, we aim to extend the traditional point-mass-like robot representation in swarm robotics and instead study a swarm of long Heavy Articulated Vehicles (HAVs). HAVs are kinematically constrained, elongated, and articulated, introducing unique challenges. Local, decentralized coordination of these vehicles is motivated by many real-world applications. Our approach, Prevent-Jack, introduces the sparsely covered context steering framework in robotics. It fuses six local behaviors, providing guarantees against jackknifing and collisions at the cost of potential dead- and livelocks, tested for vehicles with up to ten trailers. We highlight the importance of the Evade Attraction behavior for deadlock prevention using a parameter study, and use 15,000 simulations to evaluate the swarm performance. Our experiments show that both deadlocks and livelocks occur more frequently in larger swarms and denser scenarios, affecting a peak average of 27%/31% of vehicles. We observe that larger swarms exhibit increased waiting, while smaller swarms show increased evasion.
comment: 32 pages, 7 figures, 4 videos; submitted to the Swarm Robotics collection of the Nature Portfolio Journal Robotics (NPJ Robot)
Role of diversity in team performance: the case of missing expertise, an agent based simulation
Theory and empirical research on management teams' influence on firm performance have witnessed continuous development, and by now incorporate numerous details. Classic, experiment-based studies examining social systems collect vast amounts of data, but often investigate only the first one or two modes of the distribution of measured variables, and experience difficulty in analyzing the effect of context. For example, in functional diversity research, management teams are described by measures incorporating complex distributions of capabilities of individual managers and teams of managers. To investigate the effect of hidden distributions, and the effect of functional diversity composition on team communication and performance, we developed an agent-based model, and conducted a series of simulation experiments. Modeling results show that depending on the context, such as communication scheme among interacting agents, or their functional composition, intrapersonal functional diversity (IFD), and dominant function diversity (DFD) might enhance or reduce performance and communication among agents. Furthermore, simulation results also suggest that a third measure is required alongside IFD and DFD capturing the aggregate expertise of the team to comprehensively account for empirical findings.
comment: 20 pages, 13 figures, for associated model file, please see https://www.comses.net/codebases/b5db6af8-ba44-4725-9bb3-09a6e6b02475/releases/1.0.0
Task-Driven Co-Design of Heterogeneous Multi-Robot Systems
Designing multi-agent robotic systems requires reasoning across tightly coupled decisions spanning heterogeneous domains, including robot design, fleet composition, and planning. Much effort has been devoted to isolated improvements in these domains, whereas system-level co-design considering trade-offs and task requirements remains underexplored. In this work, we present a formal and compositional framework for the task-driven co-design of heterogeneous multi-robot systems. Building on a monotone co-design theory, we introduce general abstractions of robots, fleets, planners, executors, and evaluators as interconnected design problems with well-defined interfaces that are agnostic to both implementations and tasks. This structure enables efficient joint optimization of robot design, fleet composition, and planning under task-specific performance constraints. A series of case studies demonstrates the capabilities of the framework. Various component models can be seamlessly incorporated, including new robot types, task profiles, and probabilistic sensing objectives, while non-obvious design alternatives are systematically uncovered with optimality guarantees. The results highlight the flexibility, scalability, and interpretability of the proposed approach, and illustrate how formal co-design enables principled reasoning about complex heterogeneous multi-robot systems.
DM$^3$-Nav: Decentralized Multi-Agent Multimodal Multi-Object Semantic Navigation
We present DM$^3$-Nav, a fully decentralized multi-agent semantic navigation system supporting multimodal open-vocabulary goal specification and multi-object missions. In our setting, decentralization implies operation without a central coordinator, global map aggregation, or shared global state at runtime. Robots operate autonomously and coordinate through ad-hoc pairwise communication, exchanging local maps, goal status, and navigation intent without synchronization. An implicit task allocation mechanism combining intent broadcasting and distance-weighted frontier selection reduces redundant exploration while preserving decentralized operation. Evaluations on HM3DSem scenes using the HM3Dv0.2 and GOAT-Bench datasets demonstrate that DM$^3$-Nav matches or exceeds centralized and shared-map baselines while eliminating single points of failure inherent in centralized architectures. Finally, we validate our approach in a real-world office environment using two mobile robots, demonstrating successful deployment relying entirely on onboard sensing and computation. A video of our real-world experiments is available online: https://drive.google.com/file/d/1QiUSCn5rIvtuTUqtuXLPgmt6S8x9-MCZ/view?usp=drive_link
A four-player potential game for barren-plateau-aware quantum ansatz design
We cast the design of parameterized quantum circuits as a four-player potential game whose state is a circuit directed acyclic graph (DAG) and whose players encode trainability, non-stabilizerness, task performance, and hardware cost. Per-player restricted action sets factorize the move space into append, remove, retype, and rewire operations; a block-coordinate $\varepsilon$-Nash residual $δ_\text{Nash}$ certifies that no single player can improve unilaterally. A single weight sweep on MaxCut $K_4$ traces a Pareto frontier from a Clifford endpoint $(M_2/n,\langle H\rangle)=(0,4.00)$ to a non-Clifford endpoint $(0.48,3.30)$. On three four-qubit hardware topologies (heavy-hex, $2\times 2$ grid, Rydberg all-to-all), Nash search achieves the highest mean potential; on the $2\times 2$ grid Nash reaches the theoretical ceiling $Φ_\text{max}=4.10$ on two of five seeds while the simulated-annealing baseline does so on one; paired Wilcoxon tests over five seeds cannot reject the null on any single topology ($p\ge 0.22$). On LiH/STO-3G, seeding Nash from a 58-gate Givens-doubles ansatz produces a 48-operation, depth-25 circuit retaining $97.7\%$ of the correlation energy while simultaneously reducing gate count, increasing non-stabilizerness, and controlling trainability. The framework is complementary to energy-only searches such as ADAPT-VQE and k-UpCCGSD, which reach chemical accuracy with fewer operations but do not optimize the other three axes.
comment: 8 pages, 4 figures
AutoRISE: Agent-Driven Strategy Evolution for Red-Teaming Large Language Models
Automated red-teaming methods for large language models typically optimize attack prompts within a fixed, human-designed strategy, leaving the attack strategy itself unchanged. We instead optimize the strategy. We propose AutoRISE, a method that searches over executable attack programs rather than individual prompts. At each iteration, a coding agent edits a strategy and a fixed evaluation harness scores the resulting attacks, returning both a scalar objective and per-example diagnostics that guide subsequent edits. This allows structural changes, including new attack components and altered control flow, that prompt-level methods do not directly express. We also release two benchmark suites developed on disjoint target sets and evaluate on 11 models from five families against seven established jailbreak datasets. Across held-out models, AutoRISE improves average attack success rate by 17.0 points over the strongest baseline, and improves attack success by up to 16 points on frontier targets with low baseline success rates. Ablations against parametric and strategy-library baselines suggest that these gains arise from unrestricted program search, particularly compositional techniques and control-flow edits. AutoRISE operates in a black-box, inference-only setting, requiring no fine-tuning, human annotation, or GPU compute.
comment: 36 pages, 6 tables, 2 figures
The Specification Trap: Why Static Value Alignment Alone Is Insufficient for Robust Alignment
Static content-based AI value alignment is insufficient for robust alignment under capability scaling, distributional shift, and increasing autonomy. This holds for any approach that treats alignment as optimizing toward a fixed formal value-object, whether reward function, utility function, constitutional principles, or learned preference representation. Three philosophical results create compounding difficulties: Hume's is-ought gap (behavioral data underdetermines normative content), Berlin's value pluralism (human values resist consistent formalization), and the extended frame problem (any value encoding will misfit future contexts that advanced AI creates). RLHF, Constitutional AI, inverse reinforcement learning, and cooperative assistance games each instantiate this specification trap, and their failure modes reflect structural vulnerabilities, not merely engineering limitations that better data or algorithms will straightforwardly resolve. Known workarounds for individual components face mutually reinforcing difficulties when the specification is closed: the moment it ceases to update from the process it governs. Drawing on compatibilist philosophy, the paper argues that behavioral compliance under training conditions does not guarantee robust alignment under novel conditions, and that this gap grows with system capability. For value-laden autonomous systems, known closed approaches face structural vulnerabilities that worsen with capability. The constructive burden shifts to open, developmentally responsive approaches, though whether such approaches can be achieved remains an empirical question.
comment: 31 pages, no figures. Version 5. First posted as arXiv:2512.03048 in November 2025. First in a six-paper research program on AI alignment
AgentLens: Adaptive Visual Modalities for Human-Agent Interaction in Mobile GUI Agents
Mobile GUI agents can automate smartphone tasks by interacting directly with app interfaces, but how they should communicate with users during execution remains underexplored. Existing systems rely on two extremes: foreground execution, which maximizes transparency but prevents multitasking, and background execution, which supports multitasking but provides little visual awareness. Through iterative formative studies, we found that users prefer a hybrid model with just-in-time visual interaction, but the most effective visualization modality depends on the task. Motivated by this, we present AgentLens, a mobile GUI agent that adaptively uses three visual modalities during human-agent interaction: Full UI, Partial UI, and GenUI. AgentLens extends a standard mobile agent with adaptive communication actions and uses Virtual Display to enable background execution with selective visual overlays. In a controlled study with 21 participants, AgentLens was preferred by 85.7% of participants and achieved the highest usability (1.94 Overall PSSUQ) and adoption-intent (6.43/7).
Beyond the Individual: Virtualizing Multi-Disciplinary Reasoning for Clinical Intake via Collaborative Agents ACL 2026
The initial outpatient consultation is critical for clinical decision-making, yet it is often conducted by a single physician under time pressure, making it prone to cognitive biases and incomplete evidence capture. Although Multi-Disciplinary Teams (MDTs) reduce these risks, they are costly and difficult to scale to real-time intake. We propose Aegle, a synchronous virtual MDT framework that brings MDT-level reasoning to outpatient consultations via a graph-based multi-agent architecture. Aegle formalizes the consultation state using a structured SOAP representation, separating evidence collection from diagnostic reasoning to improve traceability and bias control. An orchestrator dynamically activates specialist agents, which perform decoupled parallel reasoning and whose outputs are subsequently integrated by an aggregator into a coherent clinical note. Experiments on ClinicalBench and a real-world RAPID-IPN dataset across 24 departments and 53 metrics show that Aegle consistently outperforms state-of-the-art proprietary and open-source models in documentation quality and consultation capability, while also improving final diagnosis accuracy. Our code is available at https://github.com/HovChen/Aegle.
comment: Accepted to ACL 2026 Findings
Speculative Actions: A Lossless Framework for Faster Agentic Systems
AI agents are increasingly deployed in complex, interactive environments, yet their runtime remains a major bottleneck for training, evaluation, and real-world use. Typical agent behavior unfolds sequentially, with each action requiring an API call that can incur substantial latency. For example, a game of chess between two state-of-the-art agents can take hours. We introduce Speculative Actions, a lossless acceleration framework for general agentic systems. Inspired by speculative execution in microprocessors and speculative decoding in LLM inference, our method uses faster models to predict likely future actions and execute them in parallel, committing only when predictions match. We evaluate speculative actions across gaming, e-commerce, and web search environments, and additionally study a lossy extension in an operating systems setting. Across domains, we achieve up to 55% next-action prediction accuracy, translating into up to 20% latency reductions. Finally, we present a cost-latency analysis that formalizes the tradeoff between speculative breadth and time savings. This analysis enables principled tuning and selective branch launching to ensure that multi-branch speculation delivers practical speedups without prohibitive cost growth.
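The commit-on-match idea described above can be sketched in a few lines. This is a toy illustration, not the authors' implementation: `slow_policy`, `fast_policy`, and `step` are hypothetical stand-ins for an expensive authoritative model call, a cheap guesser, and an environment transition, and `step` is assumed to be free of side effects so that a wrong speculation can simply be discarded.

```python
from concurrent.futures import ThreadPoolExecutor

def step(state, action):
    """Toy environment transition (assumed side-effect free)."""
    return state + action

def slow_policy(state):
    """Stand-in for the expensive, authoritative agent call."""
    return 1 if state % 3 else 2

def fast_policy(state):
    """Stand-in for a cheap model that guesses the next action."""
    return 1

def run_sequential(state, n):
    trace = []
    for _ in range(n):
        action = slow_policy(state)
        state = step(state, action)
        trace.append(action)
    return trace

def run_speculative(state, n):
    trace, hits = [], 0
    with ThreadPoolExecutor(max_workers=2) as pool:
        for _ in range(n):
            truth = pool.submit(slow_policy, state)   # authoritative action
            guess = fast_policy(state)                # cheap prediction
            spec = pool.submit(step, state, guess)    # speculative pre-execution
            action = truth.result()
            if action == guess:
                state = spec.result()                 # commit the speculated result
                hits += 1
            else:
                spec.result()                         # wait, then discard the branch
                state = step(state, action)           # fall back to the true action
            trace.append(action)
    return trace, hits

if __name__ == "__main__":
    trace, hits = run_speculative(0, 6)
    print(trace == run_sequential(0, 6), hits)
```

Because a speculated result is used only when the guess equals the authoritative action, the speculative run produces the same action trace as the sequential one; the guesser affects latency, not outcomes.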
Relay-Based Coordination for Energy-Efficient Multi-Robot Pickup and Delivery
We consider the problem of delivering multiple packages from a single depot to distinct goal locations using a homogeneous fleet of robots with limited carrying capacity. We propose VCST-RCP, a Voronoi-Constrained Steiner Tree Relay Coordination Planning framework that explicitly treats inter-robot relays as a design primitive. The approach operates in two stages: (i) constructing a sparse relay backbone by combining Voronoi-derived exchange interfaces with Steiner tree optimization, and (ii) synthesizing robot-level pickup, relay, and delivery schedules under capacity and service-time constraints. Unlike traditional methods that rely on direct source-to-destination transport, our framework organizes package flow through a shared relay network, reducing redundant long-haul motion. Extensive experiments across multiple scales show that VCST-RCP reduces total fleet travel distance by an average of 31% (up to nearly 50%) compared to Hungarian assignment and significantly outperforms OR-Tools CVRP, with statistically significant improvements (p < 10^{-3}). These gains translate into over 50% higher delivery efficiency (packages per kilometer), directly improving energy utilization. An ablation study further reveals that optimizing relay placement yields substantially larger improvements than adapting spatial partitioning alone, establishing relay design as the dominant factor governing system performance. Overall, the results demonstrate that relay-based coordination provides a scalable and effective framework for energy-aware multi-robot delivery in real-world logistics settings.
Designing Digital Humans with Ambient Intelligence
Digital humans are lifelike virtual agents capable of natural conversation and are increasingly deployed in domains like retail and finance. However, most current digital humans operate in isolation from their surroundings and lack contextual awareness beyond the dialogue itself. We address this limitation by integrating ambient intelligence (AmI) - i.e., environmental sensors, IoT data, and contextual modeling - with digital human systems. This integration enables situational awareness of the user's environment, anticipatory and proactive assistance, seamless cross-device interactions, and personalized long-term user support. We present a conceptual framework defining key roles that AmI can play in shaping digital human behavior, a design space highlighting dimensions such as proactivity levels and privacy strategies, and application-driven patterns with case studies in financial and retail services. We also discuss an architecture for ambient-enabled digital humans and provide guidelines for responsible design regarding privacy and data governance. Together, our work positions ambient intelligent digital humans as a new class of interactive agents powered by AI that respond not only to users' queries but also to the context and situations in which the interaction occurs.
Systems and Control (EESS)
A Case Study in Recovery of Drones using Discrete-Event Systems
Discrete-event systems and supervisory control theory provide a rigorous framework for specifying correct-by-construction behavior. However, their practical application to swarm robotics remains largely underexplored. In this paper, we investigate a topological recovery method based on discrete-event systems within a swarm robotics context. We propose a hybrid architecture that combines a high-level discrete-event systems supervisor with a low-level continuous controller, allowing lost drones to safely recover from fault or attack events and re-enter a controlled region. The method is demonstrated using ten simulated UAVs in the py-bullet-drones framework. We show recovery performance across four distinct scenarios, each with varying initial state estimates. Additionally, we introduce a secondary recovery supervisor that manages the regrouping process for a drone after it has re-entered the operational region.
comment: Accepted for publication at WODES 2026; final version will appear in IEEE Xplore
Resilience Revisited: A Multidimensional Framework Derived from Realistic Attack Scenarios
Power systems are increasingly vulnerable to high-impact, low-probability (HILP) events, including coordinated cyberattacks targeting inverter-based resources. Existing resilience frameworks rely on single-dimensional metrics that fail to capture cross-dimensional coupling effects, underestimating real system degradation under multi-vector attack conditions. This study proposes a Multidimensional Resilience Index (MDRI) that decomposes power system degradation into five interacting dimensions: physical, operational, digital-cyber, climatic, and regulatory, explicitly separating independent and coupled contributions via a calibrated multiplicative interaction term. The framework is validated on the IEEE 39-bus system under two attack scenarios derived from the December 2025 cyberattack on the Polish energy infrastructure. MDRI results show that multi-vector attacks produce degradation exceeding linear expectations by a factor of 5.6, with simultaneous dimensional failures contributing an additional 60.6% through endogenous coupling, and exogenous factors amplifying it by an additional 84%.
comment: 6 pages, IEEE SmartGridComm 2026
An Adaptive Kalman Filter that Learns the Coloring Dynamics of the Process Noise
In many applications of state estimation, the process noise is colored; this case is addressed by applying the standard Kalman filter (KF) to dynamics that are augmented with the coloring dynamics. The present paper considers the case where the coloring dynamics are unknown, which renders the estimates obtained from the standard approach suboptimal. To address this problem, the present paper proposes an adaptive technique based on the principle that, if the measurement noise is white, then the innovations sequence is white if and only if the process noise is white. Leveraging this fact, an Innovations-Whitening Adaptive Kalman Filter (IWAKF) is developed, which learns the process-noise coloring online. By embedding an unknown coloring filter in a state-augmentation framework, IWAKF adapts its parameters by minimizing the empirical autocorrelation of the innovations, thereby driving them toward whiteness and restoring near-optimality without prior knowledge of the coloring dynamics.
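The whiteness principle stated above can be checked numerically on a scalar example. The sketch below is not the IWAKF itself: it only runs a standard Kalman filter that assumes white process noise and compares the lag-1 sample autocorrelation of its innovations when the true process noise is white versus AR(1)-colored. The model (`a`, `q`, `r`) and the coloring coefficient `c` are hypothetical values chosen for illustration.

```python
import numpy as np

def simulate(c, n=20000, a=0.9, r=1.0, seed=0):
    """Scalar system x_{k+1} = a x_k + w_k, y_k = x_k + v_k.
    w_k is unit-variance AR(1) noise with coefficient c (c = 0 gives white noise)."""
    rng = np.random.default_rng(seed)
    x, w = 0.0, 0.0
    y = np.empty(n)
    for k in range(n):
        w = c * w + np.sqrt(1.0 - c**2) * rng.standard_normal()
        x = a * x + w
        y[k] = x + np.sqrt(r) * rng.standard_normal()
    return y

def kf_innovations(y, a=0.9, q=1.0, r=1.0):
    """Standard Kalman filter that assumes white process noise of variance q."""
    xhat, P = 0.0, 1.0
    e = np.empty(len(y))
    for k, yk in enumerate(y):
        xhat, P = a * xhat, a * a * P + q          # predict
        e[k] = yk - xhat                           # innovation
        K = P / (P + r)                            # Kalman gain
        xhat, P = xhat + K * e[k], (1.0 - K) * P   # correct
    return e

def autocorr(e, lag):
    """Sample autocorrelation of e at the given lag."""
    e = e - e.mean()
    return float(np.dot(e[:-lag], e[lag:]) / np.dot(e, e))

if __name__ == "__main__":
    for c in (0.0, 0.9):
        rho = autocorr(kf_innovations(simulate(c)), 1)
        print(f"coloring c={c}: lag-1 innovation autocorrelation = {rho:.3f}")
```

An adaptive scheme in the spirit of the abstract would adjust the parameters of an augmented coloring model to push such autocorrelations toward zero; that optimization step is not shown here.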
ADMM-Based Distributed Kalman-like Observer with Applications to Cooperative Localization
This paper addresses distributed state estimation for multi-agent systems with local and relative measurements, motivated by cooperative localization problems in which the global state dimension scales with the size of the network. We consider a Kalman-like observer in information form and introduce a sparsity-preserving prediction step based on an exponential forgetting factor, thereby avoiding the dense Riccati recursion of the standard information filter. The correction step is recast as a strongly convex quadratic program with structure induced by the sensing graph, which enables a distributed solution based on the alternating direction method of multipliers (ADMM). In the resulting scheme, each agent updates local copies of its own correction variable and those of its neighbors using only local communication, thus avoiding centralized matrix inversion and consensus over full global-state quantities. A two-time-scale stability analysis is developed for the interconnected observer: the reduced estimation-error dynamics are shown to be uniformly exponentially stable, the ADMM dynamics define an exponentially stable fast subsystem, and these properties are combined to establish uniform exponential stability of the overall distributed observer. Numerical simulations in a multi-agent cooperative localization scenario illustrate the performance of the proposed distributed observer.
A Characterization of Integral Input-to-state Stability for Hybrid Systems with Memory
This paper addresses characterizations of Integral Input-to-State Stability (iISS) for hybrid systems with memory. Based on the Krasovskii approach, a novel Lyapunov characterization of iISS is established to extend the hybrid system theory to the time-delay case. In particular, we introduce the notions of dissipativity, detectability and storage functional to describe the iISS property from different perspectives. Under mild regularity and convexity assumptions, the equivalence relations among diverse stability descriptions are established, which lays a solid foundation for the control design. Finally, a numerical example is presented to illustrate the derived results.
comment: 8 pages, 1 figure. Submitted to the Chinese Control Conference (CCC)
Using Assembly Language for Creating Games
The aim of this paper is to demonstrate some interesting and useful approaches to writing a program in assembly language. To demonstrate the possibilities of assembly language, a project called "Arkanoid" was created. The project is written in assembly language and presents a few interesting algorithms. The game is written in x86 assembly language, which produces object code for the x86 class of processors. Visual Studio 2015 was chosen as the working environment because it provides useful tools for debugging and testing the created software (the game). Executing the program runs an "Arkanoid" game in the Windows OS console.
Monolithically Integrated VO$_2$ Mott Oscillators for Energy-Efficient Spiking Neurons
Brain-inspired non-Boolean computing offers intrinsic error tolerance and parallelism, but its practical deployment is limited by the lack of compact, energy-efficient spiking hardware compatible with large-scale integration. Mott phase-transition materials provide a promising route, as their abrupt insulator-to-metal transitions enable neuron-like thresholding and oscillatory dynamics in compact devices. Among these, vanadium dioxide (VO$_2$) stands out for its near-room-temperature transition, fast switching, and scalability. However, existing VO$_2$-based neuristors rely on discrete components, limiting integration density and system applicability. Here, we report monolithic back-end-of-the-line (BEOL) integration of one-transistor-one-VO$_2$-memristor (1T-1MR) spiking neurons on CMOS-compatible platforms. VO$_2$ nanosheet devices are fabricated by pulsed-laser deposition below 430 °C on dielectrically isolated silicon-on-insulator (SOI) p-type junctionless field-effect transistors (JLFETs) in a compact 1T-1MR configuration. The architecture exhibits gate-tunable oscillations from 40 to 410 kHz in 60 nm-thick VO$_2$ devices with an active area of 6 $μ$m$^2$, achieving energy consumption as low as 18 pJ per spike at room temperature, with memristor power dissipation of 8 $μ$W and potential scaling toward sub-3 $μ$W operation. We further uncover a non-monotonic dependence of oscillation frequency on current and temperature, along with bias-dependent stochastic firing dynamics, highlighting the rich behavior of integrated VO$_2$ memristor systems. Finally, we demonstrate voltage-controlled oscillator functionality and actively tunable resistive coupling of two nano-oscillators mediated by a JLFET. These results establish a pathway toward dense, energy-efficient, and monolithically integrated Mott-based neuromorphic hardware compatible with CMOS technology.
comment: 24 pages, 7 figures in main text, 8 figures in Supplementary Information
Encrypted Visual Feedback Control Using RLWE-Based Cryptosystem
This study proposes an encrypted visual feedback control algorithm for regulating a one-dimensional stage using Ring Learning With Errors (RLWE) encryption. The proposed algorithm performs both feature extraction and controller computations directly on encrypted images, ensuring that sensitive visual data remain protected throughout the entire control process. Furthermore, an image captured by the camera is encrypted into a single ciphertext leveraging the message packing technique of RLWE encryption, thereby reducing computational cost. The effectiveness of the proposed framework is demonstrated through numerical simulations.
Estimation of Unknown Parameters in Presence of Perturbations and Noises with Application to GPEBO Design
We consider the problem of online estimation of unknown parameters of a linear regression equation affected by an additive perturbation, which can be caused by measurement noise (corrupting both the regressor and the regressand) as well as by external perturbations. Known approaches to this problem typically have one of the following disadvantages: 1) they ensure convergence of the parametric error only to a compact set with a non-adjustable bound, 2) they require all regressor elements to be independent of the perturbation/noise in order to annihilate it, 3) they require an instrumental variable to be selected. On the basis of a novel perturbation annihilation procedure, we propose three new estimation laws that are free from the above-mentioned drawbacks and ensure exponential convergence of the parametric error to an arbitrarily small neighborhood of zero, in particular when more than half (not all) of the regressor elements are independent of the additive perturbation. One of the proposed estimation laws is used to design a Generalized Parameter Estimation-Based Observer (GPEBO) for nonlinear affine systems, enhancing GPEBO performance when the measured system output is corrupted by noise. The theoretical results are supported by examples and mathematical modelling.
comment: 8 pages, 2 figures
Privacy-Preserving Distributed Stochastic Optimization with Homomorphic Encryption and Heterogeneous Stepsizes
Distributed stochastic optimization enables multi-agent collaboration in applications such as distributed learning and sensor networks, but also raises critical privacy concerns due to the involvement of sensitive data. While existing privacy-preserving approaches often face limitations in balancing accuracy with efficiency, we propose a novel distributed stochastic gradient descent algorithm that integrates Paillier homomorphic encryption with heterogeneous and time-varying random stepsizes. The proposed algorithm provides inherent privacy protection against both internal honest-but-curious agents and external eavesdroppers, without relying on any trusted neighbors. Furthermore, we incorporate an attenuation factor to effectively mitigate quantization error induced by the encryption process, ensuring almost sure convergence to the optimal solution while maintaining privacy preservation. Numerical simulations demonstrate the effectiveness and efficiency of the proposed approach.
comment: This is the full version of the paper accepted to the 23rd IFAC World Congress, Busan, Republic of Korea, August 23-28, 2026. This version includes all proofs omitted from the conference proceedings due to page limitations
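The additive homomorphism that makes Paillier encryption useful for aggregating agents' values, and the quantization error that fixed-point encoding introduces, can be illustrated with a toy sketch. This is not the algorithm of the paper: the primes are tiny and insecure, and the fixed-point `SCALE`, `encode`, and `decode` helpers are hypothetical choices made only for this example.

```python
import math
import random

SCALE = 1000  # hypothetical fixed-point resolution: three decimal places

def keygen(p=1009, q=1013):
    """Toy key generation with tiny primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid shortcut when the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n, m, rng=random):
    """Encrypt an integer m in [0, n) under public key n with g = n + 1."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return ((1 + m * n) * pow(r, n, n2)) % n2  # (n + 1)^m = 1 + m*n (mod n^2)

def decrypt(n, priv, c):
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add(n, c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return (c1 * c2) % (n * n)

def encode(n, x):
    """Quantize a real number to a fixed-point residue modulo n."""
    return round(x * SCALE) % n

def decode(n, v):
    """Map a residue back to a signed real number."""
    if v > n // 2:
        v -= n
    return v / SCALE

if __name__ == "__main__":
    n, priv = keygen()
    c = add(n, encrypt(n, encode(n, 1.25)), encrypt(n, encode(n, -0.5)))
    print(decode(n, decrypt(n, priv, c)))
```

Values that are not multiples of `1 / SCALE` are rounded before encryption, which is the kind of quantization error the abstract's attenuation factor is meant to mitigate.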
Scalable Sensor Scheduling for Continuous-Discrete Kalman Filtering via Information-Form Surrogate Dynamics
We study sensor scheduling for continuous-discrete Kalman filtering with Poisson measurement arrivals and propose an information-form deterministic surrogate for scalable offline design. Unlike the covariance-form surrogate, the sensing rates enter through sensor-specific additive information increments, eliminating mixed state-input derivatives in the transcribed nonlinear program and thereby yielding a simpler derivative structure. We further show that, together with the covariance-form surrogate, the proposed surrogate provides computable two-sided performance bounds for a given schedule under stochastic measurement arrivals. Numerical experiments demonstrate substantial computational savings, especially in many-sensor settings, while retaining comparable realized Monte Carlo performance and providing computable two-sided performance bounds for the returned schedule.
comment: Submitted to IEEE Control Systems Letters (L-CSS), under review
Analytical PI Tuning for Second-Order Plants with Monotonic Response and Minimum Settling Time
Background: Tuning proportional-integral (PI) controllers for second-order plants to achieve monotonic step response with minimum settling time is an important problem in analytical control design. Existing methods address these objectives only partially or require numerical optimization. Methods: A closed-form analytical solution is derived through pole placement in the framework of Astrom and Hagglund. The key insight is that designing the closed-loop poles slower than the fast plant pole forces pole-zero cancellation of the slow plant pole as a consequence, not an assumption. The critically damped condition is then applied to minimize settling time. Results: The optimal PI parameters are K=T1/(4KpT2), Ti=T1, where T1 and T2 are the plant time constants and Kp is the plant gain. No free parameter remains. The resulting closed-loop system possesses universal robustness properties independent of plant parameters: maximum complementary sensitivity Mt = 1, maximum sensitivity Ms = 1.155, and phase margin PM = 76.35 degrees. Conclusions: The proposed tuning formulas are explicit, analytically proven, and apply directly to any stable second-order plant with two real poles. Simulation results across six plant configurations confirm the analytical predictions exactly. The notation follows Astrom and Hagglund [5] throughout. Keywords: PI controller; second-order plant; pole placement; critically damped; monotonic response; settling time; robustness
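The quoted tuning rule can be checked numerically with a short sketch. Two things are assumed here rather than stated in the abstract: the plant has the form G(s) = Kp / ((T1 s + 1)(T2 s + 1)) with T1 the slow and T2 the fast time constant, and the controller is in the standard form C(s) = K (1 + 1 / (Ti s)). The function names and the frequency grid are illustrative choices.

```python
import numpy as np

def pi_tuning(Kp, T1, T2):
    """Gains quoted in the abstract: K = T1 / (4 Kp T2), Ti = T1."""
    return T1 / (4.0 * Kp * T2), T1

def loop_metrics(Kp, T1, T2, n=200001):
    """Return (Ms, Mt, phase margin in degrees) evaluated on a dense frequency grid."""
    K, Ti = pi_tuning(Kp, T1, T2)
    w = np.logspace(-4, 4, n) / T2              # grid scaled by the fast time constant
    s = 1j * w
    G = Kp / ((T1 * s + 1) * (T2 * s + 1))      # assumed plant form
    C = K * (1 + 1 / (Ti * s))                  # assumed PI form
    L = G * C                                   # loop transfer function
    Ms = np.max(np.abs(1 / (1 + L)))            # peak sensitivity
    Mt = np.max(np.abs(L / (1 + L)))            # peak complementary sensitivity
    i = np.argmin(np.abs(np.abs(L) - 1))        # gain-crossover index
    pm = 180.0 + np.degrees(np.angle(L[i]))     # phase margin
    return float(Ms), float(Mt), float(pm)

if __name__ == "__main__":
    for plant in [(2.0, 5.0, 1.0), (0.5, 10.0, 0.2)]:
        print(plant, loop_metrics(*plant))
```

Under these assumptions the loop transfer function reduces to L(s) = 1 / (4 T2 s (T2 s + 1)), so the closed-loop characteristic polynomial is (2 T2 s + 1)^2, a critically damped double pole, and the robustness figures depend only on the normalized frequency T2 ω. This is consistent with the abstract's claim that they are independent of the plant parameters.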
CLT-Optimal Parameter Error Bounds for Linear System Identification
There has been remarkable progress over the past decade in establishing finite-sample, non-asymptotic bounds on recovering unknown system parameters from observed system behavior. Surprisingly, however, we show that the current state-of-the-art bounds do not accurately capture the statistical complexity of system identification, even in the most fundamental setting of estimating a discrete-time linear dynamical system (LDS) via ordinary least-squares regression (OLS). Specifically, we utilize asymptotic normality to identify classes of problem instances for which current bounds overstate the squared parameter error, in both spectral and Frobenius norm, by a factor of the state-dimension of the system. Informed by this discrepancy, we then sharpen the OLS parameter error bounds via a novel second-order decomposition of the parameter error, where crucially the lower-order term is a matrix-valued martingale that we show correctly captures the CLT scaling. From our analysis we obtain finite-sample bounds for both (i) stable systems and (ii) the many-trajectories setting that match the instance-specific optimal rates up to constant factors in Frobenius norm, and polylogarithmic state-dimension factors in spectral norm.
comment: 36 pages
Frequency Security Assessment in Power Systems With High Penetration of Renewables Considering Spatio-Temporal Frequency Distribution
The increasing integration of renewable energy sources exacerbates the spatial and temporal differences in frequency across the power system, posing a serious challenge to the accurate and efficient assessment of system frequency security. To address this issue, a generic effective nodal frequency (ENF) model is first established to concisely characterize nodal frequency dynamics. This model is characterized by the effective nodal inertia (ENI), damping, and primary regulation parameters, which retain only the dominant constant component governing nodal frequency dynamic performance. The model enables a tractable analytical formulation of the nodal frequency trajectory and the key frequency security indicators. Quantitative analysis under the temporary power disturbance condition reveals that the ENI is the most influential parameter governing frequency security. Consequently, the critical nodal inertia for ensuring nodal frequency security is analytically derived. A system-level frequency security index based on the actual ENI and critical nodal inertia is proposed. On the basis of the proposed index, the system frequency security assessment is carried out with the procedure of ``offline calculation and online evaluation'', which is achieved using a lookup table approach and an interpolation method. Simulations on the modified IEEE 39-bus system verify the effectiveness of the proposed assessment method.
comment: 10 pages, 12 figures, article, 18 references
A Convexified Eulerian Framework for Scalable Coordination of Massive DER Populations
This paper proposes a scalable coordination framework with aggregator-side privacy protection for storage-like distributed energy resources (DERs). The framework adopts a two-layer architecture. At the macroscopic layer, building upon an \emph{Eulerian} modeling perspective, the DER population is represented as a continuum whose density evolution is governed by a partial differential equation (PDE), such that the computational complexity is independent of the population size. To address the bilinear non-convexity in this PDE-constrained optimization problem, we develop a convexification method that combines finite-volume discretization with a flux-lifting technique, reformulating the macroscopic problem into a sparse linear program (LP). The LP solution yields a unified, state-dependent broadcast signal for population coordination. Furthermore, a Wasserstein-based relaxation is introduced to replace rigid cyclic constraints and provide additional operational flexibility for improved economic performance. At the microscopic layer, individual resources autonomously recover local setpoints from the broadcast signal and their local states, while an upstream data-mixing protocol aggregates individual states into a macroscopic density histogram without exposing raw individual states to the aggregator. Numerical studies validate the scalability, feasibility, and economic effectiveness of the proposed framework.
comment: 10 pages. Submitted to IEEE Trans for possible publications
Optimum adaptation of a Steiner network
The Euclidean Steiner tree problem, normally posed in two dimensions, seeks to connect a set of prescribed terminal nodes by placing additional nodes, known as Steiner points, with edges connecting such nodes either to another Steiner point or a terminal node, and with the placements minimising the sum of all the edge lengths of the associated tree. We consider a problem in which we start with a known solution to a Steiner tree problem, and the terminal positions are then perturbed. A first-order approximation theorem is established for efficiently updating the Steiner point positions to recover a Steiner tree solution after the perturbations to terminal nodes. Numerical examples illustrate the effectiveness of our approach (including a stepwise application for large perturbations) as well as its limitations.
comment: 8 pages, 2 double-figures, IFAC World Congress
A Dynamic Phasor Framework for Analysis of IBR-Induced SSOs in Multi-Machine Systems
We propose a generalized dynamic phasor (DP) framework to analyze inverter-based resources (IBRs) connected to multi-machine systems under balanced and unbalanced conditions. It captures subsynchronous oscillations (SSOs) induced by grid-following (GFL) IBRs. The linearizability and time invariance of the framework enable us to perform eigen decomposition, which is a powerful tool for root-cause analysis of the SSO modes and damping controller design. The same framework also enables analysis of the excitation of the SSO modes in the presence of data center (DC) loads. The GFL IBRs are modeled in their respective $dq$-frame DPs, and the detailed models of synchronous generators (SGs) along with dynamic transmission network models are represented in $pnz$-frame DPs. Several case studies are performed on the modified IEEE two-area benchmark system, where $2$ SGs are replaced by GFL IBRs, and are validated with EMTDC/PSCAD simulations. First, time- and frequency-domain analyses of the SSO mode are presented, followed by the design of a robust decentralized $\mathcal{H}_\infty$ damping controller based on local signals of the GFL IBRs. Second, the dynamic behavior of the system following an unbalanced fault is demonstrated and shown to be damped by the proposed damping controller. Finally, excitation of the SSO mode in the presence of a DC load is exhibited and its locational impact is analytically quantified.
A Multi-Stage Warm-Start Deep Learning Framework for Unit Commitment
Maintaining instantaneous balance between electricity supply and demand is critical for reliability and grid stability. System operators achieve this by solving the Unit Commitment (UC) problem, a high-dimensional, large-scale Mixed-Integer Linear Programming (MILP) problem that is strictly governed by the physical constraints of the grid. As grids integrate variable renewable sources and new technologies such as long-duration storage, UC must be solved optimally for multi-day horizons and potentially with greater frequency. As a result, traditional MILP solvers increasingly struggle to compute solutions within these tightening operational time limits. To bypass these computational bottlenecks, this paper proposes a novel framework utilizing a transformer-based architecture to predict generator commitment schedules over a 72-hour horizon. Because raw predictions in high-dimensional spaces often yield physically infeasible results, the pipeline integrates the self-attention network with deterministic post-processing heuristics that systematically enforce minimum up/down times and minimize excess capacity. Finally, these refined predictions are used as a warm start for a downstream MILP solver, together with a confidence-based variable fixation strategy that drastically reduces the combinatorial search space. Validated on a single-bus test system, the complete multi-stage pipeline achieves 100\% feasibility and significantly accelerates computation times. Notably, in approximately 20\% of test instances, the proposed model reached a feasible operational schedule with a lower overall system cost than relying solely on the solver.
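As an illustration of what a deterministic repair for minimum up/down times can look like, the following is a minimal sketch and not the authors' heuristic. It assumes a single generator with a 0/1 hourly schedule, treats runs that touch the horizon boundary as unconstrained (the prior and later status is unknown), and only ever switches hours on, which is a conservative choice that does not address the excess-capacity minimization mentioned in the abstract. The function name and parameters are hypothetical.

```python
def enforce_min_up_down(schedule, min_up, min_down):
    """Repair a 0/1 commitment schedule by switching extra hours ON.

    Assumptions (illustrative only): runs touching the start or end of the
    horizon are left alone, and violations are fixed by committing the unit
    for more hours, never fewer.
    """
    s = list(schedule)
    n = len(s)
    changed = True
    while changed:  # each repair adds ON hours, so the loop terminates
        changed = False
        t = 0
        while t < n:
            end = t
            while end < n and s[end] == s[t]:
                end += 1  # [t, end) is a maximal run of equal values
            interior = t > 0 and end < n
            length = end - t
            if interior and s[t] == 0 and length < min_down:
                # OFF run too short: keep the unit running through it
                for k in range(t, end):
                    s[k] = 1
                changed = True
                break
            if interior and s[t] == 1 and length < min_up:
                # ON run too short: extend it forward to min_up hours
                for k in range(end, min(n, t + min_up)):
                    s[k] = 1
                changed = True
                break
            t = end
    return s


repaired = enforce_min_up_down([0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0], 2, 2)
merged = enforce_min_up_down([1, 1, 0, 1, 1], 2, 2)
```

In this sketch the one-hour start at hour 2 is stretched to two hours, and the one-hour shutdown in the second schedule is removed by keeping the unit on.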
Characterizing pitch and roll torque coupling in insect-sized flapping-wing robots using a microfabricated gimbal
Sub-gram flapping-wing flying insect robots (FIRs) are challenging to model because of mechanical complexity in their wings, unsteady aerodynamic flow, and the difficulty of making precise measurements at a small scale. Coupling effects between roll and pitch torque actuation have not previously been measured because a sufficiently sensitive two-axis sensor has not been realized. To address this shortcoming, we introduce a microfabricated gimbal design capable of precisely and simultaneously measuring roll and pitch torques as well as thrust. We then used it to measure the extent to which a pitch torque command affects roll torque and vice versa on a 180 mg piezo-actuated flapping-wing flying platform. Our results show a high coefficient of determination in the linear regression for both pitch (0.95) and roll (0.98) and low cross-correlation coefficients (-0.001 and -0.085, respectively) across the full range of simultaneous torque commands, indicating negligible cross-axis coupling. Similarly, thrust force deviates by a maximum of only 5.8% from the mean thrust value. These results validate the assumption that pitch and roll can be considered independently in control and will inform future models of how inputs affect the aerodynamics of resonant flapping-wing systems.
comment: Submitted for journal publication in Mechatronics and conference presentation at IFAC World Congress 2026. 9 pages, 11 figures
A Hybrid Reinforcement and Self-Supervised Learning Aided Benders Decomposition Algorithm
We propose a hybrid reinforcement and self-supervised learning framework for accelerating generalized Benders decomposition (GBD). In this framework, a graph based reinforcement learning agent operates on a bipartite representation of the master problem and, together with a verification mechanism, determines the integer variable assignments that solve the master problem. These assignments are then used as inputs to a KKT informed neural network, trained via self supervision to predict primal dual solutions that approximately satisfy the Karush Kuhn Tucker conditions of the subproblem. The predicted solutions are used to construct Benders cuts directly. The framework is evaluated on a mixed integer nonlinear programming case study, where it achieves a 57.5% reduction in solution time relative to classical GBD while consistently recovering optimal solutions across all test instances.
Empirical Assessment of Time-Series Foundation Models For Power System Forecasting Applications
Accurate forecasting of electric load and renewable generation is essential for reliable and cost effective power system operations. Recent advances in transformer based and foundation machine learning models, driven by large scale pretraining, increased available data and computation, in addition to architectural innovations, have shown promise in time series forecasting across multiple domains. However, their application to power system forecasting tasks remains largely underexplored. This work presents a comprehensive, empirical benchmark of state of the art time series foundation models, transformer architectures, and deep learning baselines for solar, wind, and load forecasting using the high resolution ARPAE PERFORM dataset for the Electric Reliability Council of Texas (ERCOT) grid. Eight core capabilities are assessed, including zero shot performance, fine tuning efficiency, multivariate input and output handling, horizon sensitivity, generalization to unseen sites, probabilistic forecasting, and context window effects. Models evaluated include TimesFM, Chronos Bolt, MoiraiL, MOMENT, Tiny Time Mixer, Temporal Fusion Transformer, PatchTST, TimeXer, LSTM, and CNN. The manuscript aims to provide clear guidance on when foundation models can provide enhanced renewable and load forecasting capabilities and when other approaches remain the more practical choice for power system operations.
Model Predictive Control of Hybrid Dynamical Systems
The problem of controlling hybrid dynamical systems using model predictive control (MPC) is formulated and sufficient conditions for asymptotic stability of a set are provided. Hybrid dynamical systems are modeled in terms of hybrid equations, involving a differential equation and a difference equation with inputs and constraints. The proposed hybrid MPC algorithm uses a suitable prediction and control horizon construction inspired by hybrid time domains. Structural properties of the hybrid optimization problem, its feasible set, and its value function are provided. Checkable conditions to guarantee asymptotic stability of a set are provided. These conditions are given in terms of properties on the stage cost, terminal cost, and the existence of static state-feedback laws, related through a control Lyapunov function condition. Examples illustrate the results throughout the paper.
comment: Technical report associated with paper to appear in IEEE Transactions on Automatic Control, 2026
Vision-Based Lane Following and Traffic Sign Recognition for Resource-Constrained Autonomous Vehicles
Autonomous vehicles (AVs) rely on real-time perception systems to understand road environments and ensure safe navigation. However, implementing reliable perception algorithms on resource-constrained embedded platforms remains challenging due to limited computational resources. This paper presents a lightweight vision-based framework that integrates lane detection, lane tracking, and traffic sign recognition for embedded autonomous vehicles. A computationally efficient threshold-based lane segmentation method combined with perspective transformation and histogram-based curvature estimation is used for robust lane tracking under varying illumination conditions. A rule-based steering controller generates steering commands to maintain stable vehicle navigation. For traffic sign recognition, two lightweight convolutional neural networks (CNNs), EfficientNet-B0 and MobileNetV2, are evaluated using a custom dataset captured from the vehicle's onboard camera. Experimental results show that the system achieves real-time performance while maintaining accurate lane tracking with only 3.16% maximum offset RMSE. EfficientNet-B0 achieves a high offline classification accuracy of 98.77% on the test dataset, while achieving 90% accuracy during real-time on-device deployment, outperforming MobileNetV2 in both settings. MobileNetV2, however, offers slightly faster inference and lower computational cost. These results highlight the effectiveness of lightweight vision-based perception pipelines for resource-constrained autonomous driving applications.
comment: 2026 International Conference on Intelligent Systems, Blockchain, and Communication Technologies
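To make the histogram idea concrete, here is a minimal sketch of one common way a column histogram is used in threshold-based lane tracking: locating the left and right lane bases in a bird's-eye binary mask and deriving a normalized lateral offset. It is not the paper's implementation; the abstract does not specify the estimator, and the function names, the list-of-rows mask format, and the offset normalization are assumptions made for illustration.

```python
def lane_base_columns(mask):
    """Return (left_col, right_col): the histogram peak in each image half.

    mask is a list of rows of 0/1 values, assumed to be a thresholded,
    perspective-warped lane image (hypothetical input format).
    """
    width = len(mask[0])
    # Column histogram: number of lane pixels in each column
    hist = [sum(row[c] for row in mask) for c in range(width)]
    mid = width // 2
    left = max(range(mid), key=hist.__getitem__)
    right = max(range(mid, width), key=hist.__getitem__)
    return left, right


def lateral_offset(mask):
    """Lane-center offset from the image center, as a fraction of image width."""
    left, right = lane_base_columns(mask)
    width = len(mask[0])
    lane_center = (left + right) / 2
    image_center = (width - 1) / 2
    return (lane_center - image_center) / width


# Lane lines at columns 1 and 6 of an 8-pixel-wide mask: vehicle is centered
centered = [[0, 1, 0, 0, 0, 0, 1, 0] for _ in range(4)]
# Lane lines at columns 2 and 7: lane center is shifted one pixel to the right
shifted = [[0, 0, 1, 0, 0, 0, 0, 1] for _ in range(4)]
```

A rule-based steering controller could then map the sign and magnitude of this offset to a steering command, although the exact rules used in the paper are not given in the abstract.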
Avionic Main Fuel Pump Simulation and Fault-Diagnosis Benchmark
In many cyber-physical systems, especially in critical applications such as aeroplanes, data to train anomaly detection and diagnosis algorithms is lacking due to data protection issues and partial observability. To combat this inherent lack of data, we introduce a high-fidelity, physics-informed co-simulation of a common aircraft main-fuel-pump system modelled in \textsc{MATLAB/Simulink Simscape Fluids}. We also describe its generated time-series data with health and fault mode annotations. To show feasibility of our benchmark, we apply an unsupervised Recurrent Variational Autoencoder (RNN-VAE) for anomaly detection and a SOM-VAE for operating mode discretization, trained to separate healthy and faulty conditions.
Stratified Topological Autonomy for Long-Range Coordination (STALC)
In this paper, we present Stratified Topological Autonomy for Long-Range Coordination (STALC), a hierarchical planning approach for multi-robot coordination in real-world environments with significant inter-robot spatial and temporal dependencies. At its core, STALC consists of a multi-robot graph-based planner which combines a topological graph with a novel, computationally efficient mixed-integer programming formulation to generate highly-coupled multi-robot plans in seconds. To enable autonomous planning across different spatial and temporal scales, we construct our graphs so that they capture connectivity between free-space regions and other problem-specific features, such as traversability or risk. We then use receding-horizon planners to achieve local collision avoidance and formation control. To evaluate our approach, we consider a multi-robot reconnaissance scenario where robots must autonomously coordinate to navigate through an environment while minimizing the risk of detection by observers. Through simulation-based experiments, we show that our approach is able to scale to address complex multi-robot planning scenarios. Through hardware experiments, we demonstrate our ability to generate graphs from real-world data and successfully plan across the entire hierarchy to achieve shared objectives.
comment: ©2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Vectorized Gaussian Belief Propagation for Near Real-Time Fully-Distributed PMU-Based State Estimation
Electric power systems require accurate, scalable, distributed, and near real-time state estimation (SE) to support reliable monitoring and control under increasingly complex operating conditions. Limited monitoring capabilities can lead to inefficient operation and, in extreme cases, large-scale disturbances such as blackouts. To address these challenges, this paper proposes a vectorized Gaussian belief propagation (GBP) framework for phasor measurement unit-based SE, formulated over factor graphs and specifically designed to support distributed and near real-time monitoring. The proposed framework includes multivariate and fusion-based GBP formulations. The multivariate formulation jointly models related state variables and their measurement relationships, while the fusion-based formulation reduces factor graph complexity by combining multiple measurements associated with the same set of variables, resulting in a structure that more closely reflects the underlying electrical coupling of the power system. The resulting algorithms operate in a fully distributed manner at the bus level and achieve fast convergence and high estimation accuracy, often within a few iterations, as demonstrated by numerical results on systems ranging from 60 to 13659 buses, where the fusion-based formulation achieves single-digit millisecond iteration times on the largest test case.
comment: 13 pages, 13 figures
Anomaly Detection in Smart Power Grids with Graph-Regularized MS-SVDD: a Multimodal Subspace Learning Approach
Anomaly detection in smart power grids is a critical challenge due to the complexity, heterogeneity, and dynamic nature of sensor data streams. Existing one-class classification methods, particularly Subspace Support Vector Data Description (SVDD), have been extended to multimodal scenarios but often fail to fully exploit the structural dependencies across modalities, limiting their robustness in real-world applications. In this paper, we address this gap by proposing a generalized Multimodal Subspace Support Vector Data Description (MS-SVDD) model with graph-embedded regularization. The method projects data from multiple modalities into a shared low-dimensional subspace while preserving modality-specific structure through Laplacian regularizers. Our approach is evaluated on a three-modality dataset derived from smart grid event time series, using a dedicated preprocessing pipeline for constructing one-class classification training samples. The results demonstrate that our graph-embedded MS-SVDD improves robustness of event detection compared to conventional approaches, highlighting the potential of integrating graph priors with multimodal subspace learning for advancing anomaly detection in critical infrastructure. More broadly, this work contributes to the wider field of AI by illustrating how relational and structural information can be systematically embedded into one-class models, enabling robust learning under complex, high-dimensional, and multimodal conditions.
comment: 23 pages, 5 figures, supplementary material
Accurate Analytical Modeling of Small-Size Rotary Transformers for Wound-Rotor Resolvers
Rotary transformers are commonly used in wound rotor resolvers to transfer excitation signals to the rotating winding without mechanical contact. In many analyses, the rotary transformer is modeled as an ideal transformer, where the voltage transfer ratio is assumed to be equal to the turns ratio. However, in miniature rotary transformers used in compact resolver systems, leakage inductance can become comparable to the magnetizing inductance due to reduced core dimensions and unavoidable air gaps, leading to deviations from the ideal voltage transfer behavior. This paper presents an accurate equivalent circuit model for miniature rotary transformers employed in wound rotor resolvers. The proposed model analytically derives the magnetizing and leakage inductances using a magnetic equivalent circuit that accounts for flux fringing and air gap effects. The model is validated through three dimensional finite element analysis and experimental measurements on a fabricated prototype under both no load and resolver excitation conditions. The results demonstrate improved prediction accuracy of the secondary voltage compared with conventional models, enabling more reliable characterization of excitation transfer in compact resolver systems.
Statistical-Symbolic Verification of Perception-Based Autonomous Systems using State-Dependent Conformal Prediction
Reachability analysis has been a prominent way to provide safety guarantees for neurally controlled autonomous systems, but its direct application to neural perception components is infeasible due to imperfect or intractable perception models. Typically, this issue has been bypassed by complementing reachability with statistical analysis of perception error, say with conformal prediction (CP). However, existing CP methods for time-series data often provide conservative bounds. The corresponding error accumulation over time has made it challenging to combine statistical bounds with symbolic reachability in a way that is provable, scalable, and minimally conservative. To reduce conservatism and improve scalability, our key insight is that perception error varies significantly with the system's dynamical state. This article proposes state-dependent conformal prediction, which exploits that dependency in constructing tight high-confidence bounds on perception error. Based on this idea, we provide an approach to partition the state space, using a genetic algorithm, so as to optimize the tightness of conformal bounds. Finally, since using these bounds in reachability analysis leads to additional uncertainty and branching in the resulting hybrid system, we propose a branch-merging reachability algorithm that trades off uncertainty for scalability so as to enable scalable and tight verification. The evaluation of our verification methodology on two complementary case studies demonstrates reduced conservatism compared to the state of the art.
comment: The first and second authors contributed equally. The last two authors shared the supervision equally
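The core idea of calibrating separate error bounds per state region can be sketched with standard split conformal prediction applied bin by bin. The example below is an illustrative sketch only: it assumes a one-dimensional state, a fixed hand-chosen partition (the paper optimizes the partition with a genetic algorithm), and absolute perception error as the nonconformity score. All names are hypothetical.

```python
import math
from bisect import bisect_right


def conformal_quantile(scores, alpha):
    """Split-conformal bound: the ceil((n + 1)(1 - alpha))-th smallest score.

    Returns infinity when there are too few calibration scores to certify
    coverage 1 - alpha.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def state_dependent_bounds(states, errors, edges, alpha):
    """One conformal error bound per state bin.

    edges are sorted bin boundaries over a 1-D state (an assumption made
    here for simplicity); bin i collects the states between consecutive edges.
    """
    bins = [[] for _ in range(len(edges) + 1)]
    for x, e in zip(states, errors):
        bins[bisect_right(edges, x)].append(abs(e))
    return [conformal_quantile(b, alpha) for b in bins]


# Calibration data in which perception error grows with the state value
states = [0.1, 0.2, 0.3, 1.1, 1.2, 1.3]
errors = [0.1, -0.2, 0.3, 1.0, -2.0, 3.0]
bounds = state_dependent_bounds(states, errors, edges=[1.0], alpha=0.25)
```

With a single global bound, the low-error region would inherit the large bound of the high-error region; per-bin calibration keeps the bound tight where the perception error is small, at the cost of fewer calibration samples per bin.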
Planning Stealthy Backdoor Attacks in MDPs with Observation-Based Triggers
This paper investigates backdoor attack planning in stochastic control systems modeled as Markov Decision Processes (MDPs). A backdoor attack involves an adversary deploying a policy that performs well in the original MDP to pass testing, but behaves maliciously at runtime when combined with a trigger that perturbs system dynamics. We consider a sophisticated attacker capable of jointly optimizing the backdoor policy and its trigger using only a blackbox simulator. During execution, the attacker has access only to partial observations of the system state and is restricted to introducing small perturbations to the system's transition dynamics. We formulate the attack planning problem as a constrained Markov game with an augmented state space and two players: Player 0 learns a backdoor policy that maximizes attack rewards when the trigger is active, yet behaves near-optimally in the original MDP when the trigger is inactive; Player 1 designs a finite-memory, observation-based trigger to activate the attack. We propose a switching gradient-based optimization algorithm to jointly solve for the backdoor policy and trigger. Experiments on a case study demonstrate the effectiveness of our method in achieving stealthy and successful backdoor attacks, and show how the attack performance varies under different parameters related to the stealthiness of the backdoor attack.
Democratization of Real-time Multi-Spectral Photoacoustic Imaging: Open-Sourced System Architecture for OPOTEK Phocus & Verasonics Vantage Combination
Real-time multi-spectral photoacoustic imaging (RT-mPAI) often suffers from synchronization instabilities when interfacing fast-tuning lasers with data acquisition platforms executing on non-real-time operating systems. To overcome this, we establish an open-source hardware-software architecture tailored for the widely adopted combination of the OPOTEK Phocus lasers and Verasonics Vantage systems. By employing an independent micro-controller for deterministic laser trigger counting alongside a decoupled client-server data streaming framework, the proposed system circumvents OS-induced timing deviations and local storage bottlenecks. By open-sourcing this pipeline and cultivating a collaborative environment to share both code and ideas, we aim to lower the technical and cost barriers for RT-mPAI, thereby democratizing access to stable RT-mPAI research and, more ambitiously, fostering a vibrant open-source community.