Robotics
★ Source-Free Bistable Fluidic Gripper for Size-Selective and Stiffness-Adaptive Grasping
Zhihang Qin, Yueheng Zhang, Wan Su, Linxin Hou, Shenghao Zhou, Zhijun Chen, Yu Jun Tan, Cecilia Laschi
Conventional fluid-driven soft grippers typically depend on external sources,
which limit portability and long-term autonomy. This work introduces a
self-contained soft gripper of fixed size that operates solely through
internal liquid redistribution among three interconnected bistable snap-through
chambers. When the top sensing chamber deforms upon contact, the displaced
liquid triggers snap-through expansion of the grasping chambers, enabling
stable and size-selective grasping without continuous energy input. The
internal hydraulic feedback further allows passive adaptation of gripping
pressure to object stiffness. This source-free and compact design opens new
possibilities for lightweight, stiffness-adaptive fluid-driven manipulation in
soft robotics, providing a feasible approach for targeted size-specific
sampling and operation in underwater and field environments.
★ Unconscious and Intentional Human Motion Cues for Expressive Robot-Arm Motion Design
This study investigates how human motion cues can be used to design
expressive robot-arm movements. Using the imperfect-information game Geister,
we analyzed two types of human piece-moving motions: natural gameplay
(unconscious tendencies) and instructed expressions (intentional cues). Based
on these findings, we created phase-specific robot motions by varying movement
speed and stop duration, and evaluated observer impressions under two
presentation modalities: a physical robot and a recorded video. Results
indicate that late-phase motion timing, particularly during withdrawal, plays
an important role in impression formation and that physical embodiment enhances
the interpretability of motion cues. These findings provide insights for
designing expressive robot motions based on human timing behavior.
comment: 5 pages, 5 figures, HAI2025 Workshop on Socially Aware and
Cooperative Intelligent Systems
★ Motion Planning Under Temporal Logic Specifications In Semantically Unknown Environments
This paper addresses a motion planning problem to achieve
spatio-temporal-logical tasks, expressed by syntactically co-safe linear
temporal logic specifications without the next operator (scLTL\next), in
uncertain environments. Here, the uncertainty is modeled as probabilistic
knowledge of the semantic labels of the environment. For example, the task may
be "first go to region 1, then go to region 2"; however, the exact locations of
regions 1 and 2 are not known a priori; instead, a probabilistic belief over
them is available. We propose a novel
automata-theoretic approach, where a special product automaton is constructed
to capture the uncertainty related to semantic labels, and a reward function is
designed for each edge of this product automaton. The proposed algorithm
utilizes value iteration for online replanning. We present theoretical
results along with simulations and experiments that demonstrate the efficacy
of the proposed approach.
comment: 8 pages, 6 figures
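
As a concrete illustration of the planning step the abstract describes, the
following is a minimal sketch of value iteration over a product automaton
whose edges carry rewards. It is not the authors' code; the state, action,
and transition structures are hypothetical placeholders.

```python
# Hedged sketch: value iteration on a product automaton with edge rewards.
# Assumes every product state has at least one enabled action.

def value_iteration(states, actions, transitions, reward, gamma=0.95, tol=1e-6):
    """states: iterable of product-automaton states
    actions(s): enabled actions at state s
    transitions[(s, a)]: list of (next_state, probability)
    reward(s, a, s2): reward on the product-automaton edge (s, a, s2)"""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                sum(p * (reward(s, a, s2) + gamma * V[s2])
                    for s2, p in transitions.get((s, a), []))
                for a in actions(s)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V  # a greedy policy follows by one-step lookahead
```

For online replanning, the same iteration can simply be rerun whenever the
belief over semantic labels is updated.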
★ Flying Robotics Art: ROS-based Drone Draws the Record-Breaking Mural
This paper presents the innovative design and successful deployment of a
pioneering autonomous unmanned aerial system developed for executing the
world's largest mural painted by a drone. Addressing the dual challenges of
maintaining artistic precision and operational reliability under adverse
outdoor conditions such as wind and direct sunlight, our work introduces a
robust system capable of navigating and painting outdoors with unprecedented
accuracy. Key to our approach is a novel navigation system that combines an
infrared (IR) motion capture camera and LiDAR technology, enabling precise
location tracking tailored specifically for large-scale artistic applications.
We employ a unique control architecture that applies separate regulators in the
tangential and normal directions relative to the planned path, enabling precise
trajectory tracking and stable line rendering. We also present algorithms for
trajectory planning and path optimization, allowing for complex curve drawing
and area filling. The system includes a custom-designed paint spraying
mechanism, specifically engineered to function effectively amidst the turbulent
airflow generated by the drone's propellers, which also protects the drone's
critical components from paint-related damage, ensuring longevity and
consistent performance. Experimental results demonstrate the system's
robustness and precision in varied conditions, showcasing its potential for
autonomous large-scale art creation and expanding the functional applications
of robotics in creative fields.
★ Multi-robot searching with limited sensing range for static and mobile intruders
We consider the problem of searching for an intruder in a geometric domain by
utilizing multiple search robots. The domain is a simply connected orthogonal
polygon with edges parallel to the Cartesian coordinate axes. Each robot has a
limited sensing capability. We study the problem for both static and mobile
intruders. It turns out that the problem of finding an intruder is NP-hard,
even for a stationary intruder. Given this intractability, we turn our
attention towards developing efficient and robust algorithms, namely methods
based on space-filling curves, random search, and cooperative random search.
Moreover, for each proposed algorithm, we evaluate the trade-off between the
number of search robots and the time required for the robots to complete the
search process while considering the geometric properties of the connected
orthogonal search area.
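
Of the three strategies, the space-filling-curve search is the easiest to make
concrete. Below is the standard Hilbert-curve index-to-cell mapping (a
well-known algorithm, not code from the paper); partitioning the index range
among m robots yields a simple cooperative sweep over a gridded approximation
of the search area.

```python
# Standard Hilbert-curve mapping from a 1-D index to 2-D grid coordinates.
# A gridded approximation of the orthogonal polygon is assumed.

def hilbert_d2xy(order, d):
    """Map index d in [0, 4**order) to (x, y) on a 2**order x 2**order grid."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate the quadrant so locality is preserved
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# The k-th of m robots sweeps its own contiguous slice of the curve:
# cells = [hilbert_d2xy(order, d) for d in range(k * N // m, (k + 1) * N // m)]
```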
★ Manifold-constrained Hamilton-Jacobi Reachability Learning for Decentralized Multi-Agent Motion Planning
Safe multi-agent motion planning (MAMP) under task-induced constraints is a
critical challenge in robotics. Many real-world scenarios require robots to
navigate dynamic environments while adhering to manifold constraints imposed by
tasks. For example, service robots must carry cups upright while avoiding
collisions with humans or other robots. Despite recent advances in
decentralized MAMP for high-dimensional systems, incorporating manifold
constraints remains difficult. To address this, we propose a
manifold-constrained Hamilton-Jacobi reachability (HJR) learning framework for
decentralized MAMP. Our method solves HJR problems under manifold constraints
to capture task-aware safety conditions, which are then integrated into a
decentralized trajectory optimization planner. This enables robots to generate
motion plans that are both safe and task-feasible without requiring assumptions
about other agents' policies. Our approach generalizes across diverse
manifold-constrained tasks and scales effectively to high-dimensional
multi-agent manipulation problems. Experiments show that our method outperforms
existing constrained motion planners and operates at speeds suitable for
real-world applications. Video demonstrations are available at
https://youtu.be/RYcEHMnPTH8 .
★ Multi-User Personalisation in Human-Robot Interaction: Using Quantitative Bipolar Argumentation Frameworks for Preferences Conflict Resolution
While personalisation in Human-Robot Interaction (HRI) has advanced
significantly, most existing approaches focus on single-user adaptation,
overlooking scenarios involving multiple stakeholders with potentially
conflicting preferences. To address this, we propose the Multi-User Preferences
Quantitative Bipolar Argumentation Framework (MUP-QBAF), a novel multi-user
personalisation framework based on Quantitative Bipolar Argumentation
Frameworks (QBAFs) that explicitly models and resolves multi-user preference
conflicts. Unlike prior work in Argumentation Frameworks, which typically
assumes static inputs, our approach is tailored to robotics: it incorporates
both users' arguments and the robot's dynamic observations of the environment,
allowing the system to adapt over time and respond to changing contexts.
Preferences, both positive and negative, are represented as arguments whose
strength is recalculated iteratively based on new information. The framework's
properties and capabilities are presented and validated through a realistic
case study, where an assistive robot mediates between the conflicting
preferences of a caregiver and a care recipient during a frailty assessment
task. This evaluation further includes a sensitivity analysis of argument base
scores, demonstrating how preference outcomes can be shaped by user input and
contextual observations. By offering a transparent, structured, and
context-sensitive approach to resolving competing user preferences, this work
advances the field of multi-user HRI. It provides a principled alternative to
data-driven methods, enabling robots to navigate conflicts in real-world
environments.
comment: Preprint submitted to a journal
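
For readers unfamiliar with QBAFs, the following sketch shows the general
shape of iterative strength recalculation under a simple gradual semantics.
It is illustrative only; the update rule actually used by MUP-QBAF may
differ, and all inputs are placeholders.

```python
# Hedged sketch of gradual strength evaluation in a QBAF. The aggregation
# rule below is illustrative, not the MUP-QBAF semantics.

def qbaf_strengths(args, base, attackers, supporters, iters=100):
    """base[a]: base score in [0, 1]; attackers/supporters map an argument
    to the arguments attacking/supporting it."""
    s = dict(base)  # start every argument at its base score
    for _ in range(iters):
        s = {
            a: min(1.0, max(0.0, base[a] + 0.5 * (
                sum(s[b] for b in supporters.get(a, []))
                - sum(s[b] for b in attackers.get(a, []))
            )))
            for a in args
        }
    return s
```

In the paper's setting, the robot's environment observations would enter as
arguments whose base scores change over time, shifting the fixed point this
iteration converges to.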
★ OneOcc: Semantic Occupancy Prediction for Legged Robots with a Single Panoramic Camera
Hao Shi, Ze Wang, Shangwei Guo, Mengfei Duan, Song Wang, Teng Chen, Kailun Yang, Lin Wang, Kaiwei Wang
Robust 3D semantic occupancy is crucial for legged/humanoid robots, yet most
semantic scene completion (SSC) systems target wheeled platforms with
forward-facing sensors. We present OneOcc, a vision-only panoramic SSC
framework designed for gait-induced body jitter and 360° continuity.
OneOcc combines: (i) Dual-Projection fusion (DP-ER) to exploit the annular
panorama and its equirectangular unfolding, preserving 360° continuity and
grid alignment; (ii) Bi-Grid Voxelization (BGV) to reason in Cartesian and
cylindrical-polar spaces, reducing discretization bias and sharpening
free/occupied boundaries; (iii) a lightweight decoder with Hierarchical AMoE-3D
for dynamic multi-scale fusion and better long-range/occlusion reasoning; and
(iv) plug-and-play Gait Displacement Compensation (GDC) learning feature-level
motion correction without extra sensors. We also release two panoramic
occupancy benchmarks: QuadOcc (real quadruped, first-person 360°) and
Human360Occ (H3O) (CARLA human-ego 360° with RGB, depth, and semantic
occupancy; standardized within-/cross-city splits). OneOcc sets new
state-of-the-art (SOTA): on QuadOcc it beats strong vision baselines and
popular LiDAR ones; on H3O it gains +3.83 mIoU (within-city) and +8.08
(cross-city). Modules are lightweight, enabling deployable full-surround
perception for legged/humanoid robots. Datasets and code will be publicly
available at https://github.com/MasterHow/OneOcc.
comment: Datasets and code will be publicly available at
https://github.com/MasterHow/OneOcc
★ Indicating Robot Vision Capabilities with Augmented Reality
Research indicates that humans can mistakenly assume that robots and humans
have the same field of view (FoV), possessing an inaccurate mental model of
robots. This misperception may lead to failures during human-robot
collaboration tasks where robots might be asked to complete impossible tasks
about out-of-view objects. The issue is more severe when robots do not have a
chance to scan the scene to update their world model while focusing on assigned
tasks. To help align humans' mental models of robots' vision capabilities, we
propose four FoV indicators in augmented reality (AR) and conducted a
human-subjects experiment (N=41) to evaluate them in terms of accuracy,
confidence, task efficiency, and workload. These indicators span a spectrum
from egocentric (robot's eye and head space) to allocentric (task space).
Results showed that the allocentric blocks at the task space yielded the
highest accuracy, though with a delay in interpreting the robot's FoV. The
egocentric indicator of deeper eye sockets, which could also be realized as a
physical alteration, likewise increased accuracy. Across all indicators,
participants' confidence was high while cognitive load remained low. Finally,
we contribute six guidelines for practitioners to
apply our AR indicators or physical alterations to align humans' mental models
with robots' vision capabilities.
★ ROSBag MCP Server: Analyzing Robot Data with LLMs for Agentic Embodied AI Applications
Agentic AI systems and Physical or Embodied AI systems have been two key
research verticals at the forefront of Artificial Intelligence and Robotics,
with Model Context Protocol (MCP) increasingly becoming a key component and
enabler of agentic applications. However, the literature at the intersection of
these verticals, i.e., Agentic Embodied AI, remains scarce. This paper
introduces an MCP server for ROS and ROS 2 bags, enabling the analysis,
visualization, and processing of robot data with natural language through LLMs
and VLMs. We describe specific tooling built with robotics domain
knowledge, with our initial release focused on mobile robotics and supporting
natively the analysis of trajectories, laser scan data, transforms, or time
series data. This is in addition to providing an interface to standard ROS 2
CLI tools ("ros2 bag list" or "ros2 bag info"), as well as the ability to
filter bags to a subset of topics or trim them in time. Coupled with the MCP
server, we provide a lightweight UI that allows the benchmarking of the tooling
with different LLMs, both proprietary (Anthropic, OpenAI) and open-source
(through Groq). Our experimental results include the analysis of tool calling
capabilities of eight different state-of-the-art LLM/VLM models, both
proprietary and open-source, large and small. Our experiments indicate that
there is a large divide in tool calling capabilities, with Kimi K2 and Claude
Sonnet 4 demonstrating clearly superior performance. We also conclude that
there are multiple factors affecting the success rates, from the tool
description schema to the number of arguments, as well as the number of tools
available to the models. The code is available with a permissive license at
https://github.com/binabik-ai/mcp-rosbags.
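
To show roughly what such a server looks like, here is a hedged sketch using
the official MCP Python SDK's FastMCP helper. The tool name and the toy
trajectory computation are hypothetical; see the repository above for the
actual released tools.

```python
# Hypothetical MCP tool in the style described; not the released server.
import math

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("rosbag-tools")

@mcp.tool()
def trajectory_length(xs: list[float], ys: list[float]) -> float:
    """Return the 2-D path length of a trajectory given as x/y lists."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for x1, y1, x2, y2 in zip(xs, ys, xs[1:], ys[1:])
    )

if __name__ == "__main__":
    mcp.run()  # serve the tool over stdio to an LLM client
```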
★ Development of the Bioinspired Tendon-Driven DexHand 021 with Proprioceptive Compliance Control
The human hand plays a vital role in daily life and industrial applications,
yet replicating its multifunctional capabilities-including motion, sensing, and
coordinated manipulation-with robotic systems remains a formidable challenge.
Developing a dexterous robotic hand requires balancing human-like agility with
engineering constraints such as complexity, size-to-weight ratio, durability,
and force-sensing performance. This letter presents DexHand 021, a
high-performance, cable-driven five-finger robotic hand with 12 active and 7
passive degrees of freedom (DoFs), achieving 19 DoFs dexterity in a lightweight
1 kg design. We propose a proprioceptive force-sensing-based admittance control
method to enhance manipulation. Experimental results demonstrate its superior
performance: a single-finger load capacity exceeding 10 N, fingertip
repeatability under 0.001 m, and force estimation errors below 0.2 N. Compared
with PID control, joint torques in multi-object grasping are reduced by 31.19%,
which significantly improves force-sensing capability while preventing overload
during collisions. The hand excels in both power and precision grasps,
successfully executing 33 GRASP taxonomy motions and complex manipulation
tasks. This work advances the design of lightweight, industrial-grade dexterous
hands and enhances proprioceptive control, contributing to robotic manipulation
and intelligent manufacturing.
comment: 8 pages, 18 figures, accepted by IEEE RA-L
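
The admittance scheme can be pictured with a textbook update: the estimated
fingertip force drives a virtual mass-damper-spring system whose state becomes
the compliant position command. The sketch below is generic (scalar,
Euler-integrated), not the letter's implementation, and all gains are
placeholders.

```python
# Generic admittance update: M*ddx + D*dx + K*(x - x_ref) = f_ext.
# Gains and time step are illustrative placeholders.

def admittance_step(x, dx, f_ext, x_ref, M=0.05, D=2.0, K=50.0, dt=0.001):
    ddx = (f_ext - D * dx - K * (x - x_ref)) / M
    dx = dx + ddx * dt
    x = x + dx * dt
    return x, dx  # x is the new compliant position command
```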
★ Value Elicitation for a Socially Assistive Robot Addressing Social Anxiety: A Participatory Design Approach ECAI 2025
Social anxiety is a prevalent mental health condition that can significantly
impact overall well-being and quality of life. Despite its widespread effects,
adequate support or treatment for social anxiety is often insufficient.
Advances in technology, particularly in social robotics, offer promising
opportunities to complement traditional mental health care. As an initial step
toward developing effective solutions, it is essential to understand the values
that shape what is considered meaningful, acceptable, and helpful. In this
study, a participatory design workshop was conducted with mental health
academic researchers to elicit the underlying values that should inform the
design of socially assistive robots for social anxiety support. Through
creative, reflective, and envisioning activities, participants explored
scenarios and design possibilities, allowing for systematic elicitation of
values, expectations, needs, and preferences related to robot-supported
interventions. The findings reveal rich insights into design-relevant
values, including adaptivity, acceptance, and efficacy, that are core to
supporting individuals with social anxiety. This study highlights the significance of
a research-led approach to value elicitation, emphasising user-centred and
context-aware design considerations in the development of socially assistive
robots.
comment: Accepted at Value Engineering in AI (VALE) Workshop (ECAI 2025)
★ GUIDES: Guidance Using Instructor-Distilled Embeddings for Pre-trained Robot Policy Enhancement IROS 2025
Pre-trained robot policies serve as the foundation of many validated robotic
systems, which encapsulate extensive embodied knowledge. However, they often
lack the semantic awareness characteristic of foundation models, and replacing
them entirely is impractical in many situations due to high costs and the loss
of accumulated knowledge. To address this gap, we introduce GUIDES, a
lightweight framework that augments pre-trained policies with semantic guidance
from foundation models without requiring architectural redesign. GUIDES employs
a fine-tuned vision-language model (Instructor) to generate contextual
instructions, which are encoded by an auxiliary module into guidance
embeddings. These embeddings are injected into the policy's latent space,
allowing the legacy model to adapt to this new semantic input through brief,
targeted fine-tuning. For inference-time robustness, a large language
model-based Reflector monitors the Instructor's confidence and, when confidence
is low, initiates a reasoning loop that analyzes execution history, retrieves
relevant examples, and augments the VLM's context to refine subsequent actions.
Extensive validation in the RoboCasa simulation environment across diverse
policy architectures shows consistent and substantial improvements in task
success rates. Real-world deployment on a UR5 robot further demonstrates that
GUIDES enhances motion precision for critical sub-tasks such as grasping.
Overall, GUIDES offers a practical and resource-efficient pathway to upgrade,
rather than replace, validated robot policies.
comment: 8 pages, 4 figures, Accepted by IEEE IROS 2025 Workshop WIR-M
★ Collaborative Assembly Policy Learning of a Sightless Robot
This paper explores a physical human-robot collaboration (pHRC) task
involving the joint insertion of a board into a frame by a sightless robot and
a human operator. While admittance control is commonly used in pHRC tasks, it
can be challenging to measure the force/torque applied by the human for
accurate human intent estimation, limiting the robot's ability to assist in the
collaborative task. Other methods that attempt to solve pHRC tasks using
reinforcement learning (RL) are also unsuitable for the board-insertion task
due to its safety constraints and sparse rewards. Therefore, we propose a novel
RL approach that utilizes a human-designed admittance controller to facilitate
more active robot behavior and reduce human effort. Through simulation and
real-world experiments, we demonstrate that our approach outperforms admittance
control in terms of success rate and task completion time. Additionally, we
observed a significant reduction in measured force/torque when using our
proposed approach compared to admittance control. The video of the experiments
is available at https://youtu.be/va07Gw6YIog.
comment: Accepted by IEEE ROBIO 2025
★ Periodic Skill Discovery NeurIPS 2025
Unsupervised skill discovery in reinforcement learning (RL) aims to learn
diverse behaviors without relying on external rewards. However, current methods
often overlook the periodic nature of learned skills, focusing instead on
increasing the mutual dependence between states and skills or maximizing the
distance traveled in latent space. Considering that many robotic tasks --
particularly those involving locomotion -- require periodic behaviors across
varying timescales, the ability to discover diverse periodic skills is
essential. Motivated by this, we propose Periodic Skill Discovery (PSD), a
framework that discovers periodic behaviors in an unsupervised manner. The key
idea of PSD is to train an encoder that maps states to a circular latent space,
thereby naturally encoding periodicity in the latent representation. By
capturing temporal distance, PSD can effectively learn skills with diverse
periods in complex robotic tasks, even with pixel-based observations. We
further show that these learned skills achieve high performance on downstream
tasks such as hurdling. Moreover, integrating PSD with an existing skill
discovery method offers more diverse behaviors, thus broadening the agent's
repertoire. Our code and demos are available at
https://jonghaepark.github.io/psd/
comment: NeurIPS 2025
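
The core idea, as we read the abstract, is an encoder whose output is
constrained to the unit circle so that the latent acts as a phase. A minimal
sketch under that assumption (sizes arbitrary; not the released PSD code):

```python
# Hedged sketch: encoder projecting states onto a circular latent space.
import torch
import torch.nn as nn

class CircularEncoder(nn.Module):
    def __init__(self, state_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # raw 2-D latent
        )

    def forward(self, state):
        z = self.net(state)
        # normalize onto the unit circle: the angle of z acts as a phase,
        # so periodic state sequences trace closed loops in latent space
        return z / (z.norm(dim=-1, keepdim=True) + 1e-8)
```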
★ Learning-based Cooperative Robotic Paper Wrapping: A Unified Control Policy with Residual Force Control
Human-robot cooperation is essential in environments such as warehouses and
retail stores, where workers frequently handle deformable objects like paper,
bags, and fabrics. Coordinating robotic actions with human assistance remains
difficult due to the unpredictable dynamics of deformable materials and the
need for adaptive force control. To explore this challenge, we focus on the
task of gift wrapping, which exemplifies a long-horizon manipulation problem
involving precise folding, controlled creasing, and secure fixation of paper.
Success is achieved when the robot completes the sequence to produce a neatly
wrapped package with clean folds and no tears.
We propose a learning-based framework that integrates a high-level task
planner powered by a large language model (LLM) with a low-level hybrid
imitation learning (IL) and reinforcement learning (RL) policy. At its core is
a Sub-task Aware Robotic Transformer (START) that learns a unified policy from
human demonstrations. The key novelty lies in capturing long-range temporal
dependencies across the full wrapping sequence within a single model. Unlike
vanilla Action Chunking with Transformer (ACT), typically applied to short
tasks, our method introduces sub-task IDs that provide explicit temporal
grounding. This enables robust performance across the entire wrapping process
and supports flexible execution, as the policy learns sub-goals rather than
merely replicating motion sequences.
Our framework achieves a 97% success rate on real-world wrapping tasks. We
show that the unified transformer-based policy reduces the need for specialized
models, allows controlled human supervision, and effectively bridges high-level
intent with the fine-grained force control required for deformable object
manipulation.
★ Optimizing Earth-Moon Transfer and Cislunar Navigation: Integrating Low-Energy Trajectories, AI Techniques and GNSS-R Technologies
The rapid growth of cislunar activities, including lunar landings, the Lunar
Gateway, and in-space refueling stations, requires advances in cost-efficient
trajectory design and reliable integration of navigation and remote sensing.
Traditional Earth-Moon transfers suffer from rigid launch windows and high
propellant demands, while Earth-based GNSS systems provide little to no
coverage beyond geostationary orbit. This limits autonomy and environmental
awareness in cislunar space. This review compares four major transfer
strategies by evaluating velocity requirements, flight durations, and fuel
efficiency, and by identifying their suitability for both crewed and robotic
missions. The emerging role of artificial intelligence and machine learning is
highlighted: convolutional neural networks support automated crater recognition
and digital terrain model generation, while deep reinforcement learning enables
adaptive trajectory refinement during descent and landing to reduce risk and
decision latency. The study also examines how GNSS-Reflectometry and advanced
Positioning, Navigation, and Timing architectures can extend navigation
capabilities beyond current limits. GNSS-R can act as a bistatic radar for
mapping lunar ice, soil properties, and surface topography, while PNT systems
support autonomous rendezvous, Lagrange point station-keeping, and coordinated
satellite swarm operations. Combining these developments establishes a scalable
framework for sustainable cislunar exploration and long-term human and robotic
presence.
★ Learning Natural and Robust Hexapod Locomotion over Complex Terrains via Motion Priors based on Deep Reinforcement Learning
Multi-legged robots offer enhanced stability to navigate complex terrains
with their multiple legs interacting with the environment. However, how to
effectively coordinate multiple legs within a large action-exploration space
to generate natural and robust movements remains a key issue. In this paper, we
introduce a motion prior-based approach, successfully applying deep
reinforcement learning algorithms to a real hexapod robot. We generate a
dataset of optimized motion priors, and train an adversarial discriminator
based on the priors to guide the hexapod robot to learn natural gaits. The
learned policy is then successfully transferred to a real hexapod robot, where
it demonstrates natural gait patterns and remarkable robustness without visual
information in complex terrains. This is the first time that a reinforcement
learning controller has been used to achieve complex terrain walking on a real
hexapod robot.
★ SENT Map - Semantically Enhanced Topological Maps with Foundation Models ICRA 2025
We introduce SENT-Map, a semantically enhanced topological map for
representing indoor environments, designed to support autonomous navigation and
manipulation by leveraging advancements in foundation models (FMs). Through
representing the environment in a JSON text format, we enable semantic
information to be added and edited in a format that both humans and FMs
understand, while grounding the robot to existing nodes during planning to
avoid infeasible states during deployment. Our proposed framework employs a
two-stage approach, first mapping the environment alongside an operator with a
Vision-FM, then using the SENT-Map representation alongside a natural-language
query within an FM for planning. Our experimental results show that
semantic-enhancement enables even small locally-deployable FMs to successfully
plan over indoor environments.
comment: Accepted at ICRA 2025 Workshop on Foundation Models and
Neuro-Symbolic AI for Robotics
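
As a flavor of what a JSON-format topological map might look like, here is a
hypothetical example in the spirit of SENT-Map; the paper's actual schema may
differ.

```python
# Hypothetical SENT-Map-style topological map; schema is illustrative only.
import json

sent_map = {
    "nodes": [
        {"id": "kitchen_counter", "semantics": ["counter", "graspable items"]},
        {"id": "hallway", "semantics": ["corridor"]},
        {"id": "office_desk", "semantics": ["desk", "monitor"]},
    ],
    "edges": [  # traversability between named nodes
        {"from": "kitchen_counter", "to": "hallway"},
        {"from": "hallway", "to": "office_desk"},
    ],
}

# Grounding a plan to existing nodes: reject steps that reference unknown
# nodes, which is one way to avoid infeasible states at deployment.
node_ids = {n["id"] for n in sent_map["nodes"]}
plan = ["kitchen_counter", "hallway", "office_desk"]
assert all(step in node_ids for step in plan)
print(json.dumps(sent_map, indent=2))
```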
♻ ★ RoboRAN: A Unified Robotics Framework for Reinforcement Learning-Based Autonomous Navigation
Matteo El-Hariry, Antoine Richard, Ricard M. Castan, Luis F. W. Batista, Matthieu Geist, Cedric Pradalier, Miguel Olivares-Mendez
Autonomous robots must navigate and operate in diverse environments, from
terrestrial and aquatic settings to aerial and space domains. While
Reinforcement Learning (RL) has shown promise in training policies for specific
autonomous robots, existing frameworks and benchmarks are often constrained to
unique platforms, limiting generalization and fair comparisons across different
mobility systems. In this paper, we present a multi-domain framework for
training, evaluating and deploying RL-based navigation policies across diverse
robotic platforms and operational environments. Our work presents four key
contributions: (1) a scalable and modular framework, facilitating seamless
robot-task interchangeability and reproducible training pipelines; (2)
sim-to-real transfer demonstrated through real-world experiments with multiple
robots, including a satellite robotic simulator, an unmanned surface vessel,
and a wheeled ground vehicle; (3) the release of the first open-source API for
deploying Isaac Lab-trained policies to real robots, enabling lightweight
inference and rapid field validation; and (4) uniform tasks and metrics for
cross-medium evaluation, through a unified evaluation testbed to assess
performance of navigation tasks in diverse operational conditions (aquatic,
terrestrial and space). By ensuring consistency between simulation and
real-world deployment, RoboRAN lowers the barrier to developing adaptable
RL-based navigation strategies. Its modular design enables straightforward
integration of new robots and tasks through predefined templates, fostering
reproducibility and extension to diverse domains. To support the community, we
release RoboRAN as open-source.
comment: Accepted at Transactions on Machine Learning Research (TMLR)
♻ ★ Depth Matters: Multimodal RGB-D Perception for Robust Autonomous Agents ICRA 2025
Autonomous agents that rely purely on perception to make real-time control
decisions require efficient and robust architectures. In this work, we
demonstrate that augmenting RGB input with depth information significantly
enhances our agents' ability to predict steering commands compared to using RGB
alone. We benchmark lightweight recurrent controllers that leverage the fused
RGB-D features for sequential decision-making. To train our models, we collect
high-quality data using a small-scale autonomous car controlled by an expert
driver via a physical steering wheel, capturing varying levels of steering
difficulty. Our models were successfully deployed on real hardware and
inherently avoided dynamic and static obstacles, under out-of-distribution
conditions. Specifically, our findings reveal that the early fusion of depth
data results in a highly robust controller, which remains effective even with
frame drops and increased noise levels, without compromising the network's
focus on the task.
comment: Submitted to ICRA 2025
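
Early fusion here means depth enters the network as an extra input channel
before the first convolution. A minimal sketch of that design choice (layer
sizes are arbitrary, and this is not the paper's architecture):

```python
# Hedged sketch of early RGB-D fusion for steering prediction.
import torch
import torch.nn as nn

class EarlyFusionBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=5, stride=2), nn.ReLU(),  # 3 RGB + 1 depth
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.steer = nn.Linear(64, 1)  # steering-command regression head

    def forward(self, rgb, depth):
        x = torch.cat([rgb, depth], dim=1)  # fuse at the input, not later
        return self.steer(self.features(x))
```

In the paper's setup, the fused features would then feed a lightweight
recurrent controller for sequential decision-making.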
♻ ★ An explicit construction of Kaleidocycles by elliptic theta functions
We consider the configuration space of ordered points on the two-dimensional
sphere that satisfy a specific system of quadratic equations. We construct
periodic orbits in this configuration space using elliptic theta functions and
show that they simultaneously satisfy semi-discrete analogues of mKdV and
sine-Gordon equations. The configuration space we investigate corresponds to
the state space of a linkage mechanism known as the Kaleidocycle, and the
constructed orbits describe the characteristic motion of the Kaleidocycle. A
key consequence of our construction is the proof that Kaleidocycles exist for
any number of tetrahedra greater than five. Our approach is founded on the
relationship between the deformation of spatial curves and integrable systems,
offering an intriguing example where an integrable system is explicitly solved
to generate an orbit in the space of real solutions to polynomial equations
defined by geometric constraints.
♻ ★ Autonomous Robotic Drilling System for Mice Cranial Window Creation
Robotic assistance for experimental manipulation in the life sciences is
expected to enable favorable outcomes, regardless of the skill of the
scientist. Experimental specimens in the life sciences are subject to
individual variability and hence require intricate algorithms for successful
autonomous robotic control. As a use case, we are studying the cranial window
creation in mice. This operation requires the removal of an 8-mm circular patch
of the skull, which is approximately 300 µm thick, but the shape and thickness
of the mouse skull vary significantly depending on the strain of the mouse, its
sex, and its age. In this work, we develop an autonomous robotic drilling
system with no offline planning, consisting of a trajectory planner whose
execution-time feedback is driven by drilling-completion-level recognition
based on image and force information. In the experiments, we first evaluate the
image-and-force-based drilling completion level recognition by comparing it
with other state-of-the-art deep learning image processing methods and conduct
an ablation study in eggshell drilling to evaluate the impact of each module on
system performance. Finally, the system performance is further evaluated in
postmortem mice, achieving a success rate of 70% (14/20 trials) with an average
drilling time of 9.3 min.
comment: 14 pages, 11 figures, accepted on T-ASE 2025
♻ ★ Toward Humanoid Brain-Body Co-design: Joint Optimization of Control and Morphology for Fall Recovery
Humanoid robots represent a central frontier in embodied intelligence, as
their anthropomorphic form enables natural deployment in humans' workspace.
Brain-body co-design for humanoids presents a promising approach to realizing
this potential by jointly optimizing control policies and physical morphology.
Within this context, fall recovery emerges as a critical capability. It not
only enhances safety and resilience but also integrates naturally with
locomotion systems, thereby advancing the autonomy of humanoids. In this paper,
we propose RoboCraft, a scalable humanoid co-design framework for fall recovery
that iteratively improves performance through the coupled updates of control
policy and morphology. A shared policy pretrained across multiple designs is
progressively finetuned on high-performing morphologies, enabling efficient
adaptation without retraining from scratch. Concurrently, morphology search is
guided by human-inspired priors and optimization algorithms, supported by a
priority buffer that balances reevaluation of promising candidates with the
exploration of novel designs. Experiments show that RoboCraft achieves an
average performance gain of 44.55% on seven public humanoid robots, with
morphology optimization driving at least 40% of the improvement when
co-designing four humanoid robots, underscoring the critical role of humanoid
co-design.
♻ ★ Mastering Contact-rich Tasks by Combining Soft and Rigid Robotics with Imitation Learning
Mariano Ramírez Montero, Ebrahim Shahabi, Giovanni Franzese, Jens Kober, Barbara Mazzolai, Cosimo Della Santina
Soft robots have the potential to revolutionize the use of robotic systems
with their capability of establishing safe, robust, and adaptable interactions
with their environment, but their precise control remains challenging. In
contrast, traditional rigid robots offer high accuracy and repeatability but
lack the flexibility of soft robots. We argue that combining these
characteristics in a hybrid robotic platform can significantly enhance overall
capabilities. This work presents a novel hybrid robotic platform that
integrates a rigid manipulator with a fully developed soft arm. This system is
equipped with the intelligence necessary to perform flexible and generalizable
tasks autonomously through imitation learning. The physical softness and
machine learning enable our platform to achieve highly generalizable skills,
while the rigid components ensure precision and repeatability.
comment: Update with additional results and experiments
♻ ★ Augmented Reality for RObots (ARRO): Pointing Visuomotor Policies Towards Visual Robustness
Visuomotor policies trained on human expert demonstrations have recently
shown strong performance across a wide range of robotic manipulation tasks.
However, these policies remain highly sensitive to domain shifts stemming from
background or robot embodiment changes, which limits their generalization
capabilities. In this paper, we present ARRO, a novel visual representation
that leverages zero-shot open-vocabulary segmentation and object detection
models to efficiently mask out task-irrelevant regions of the scene in real
time without requiring additional training, modeling of the setup, or camera
calibration. By filtering visual distractors and overlaying virtual guides
during both training and inference, ARRO improves robustness to scene
variations and reduces the need for additional data collection. We extensively
evaluate ARRO with Diffusion Policy on a range of tabletop manipulation tasks
in both simulation and real-world environments, and further demonstrate its
compatibility and effectiveness with generalist robot policies, such as Octo
and OpenVLA. Across all settings in our evaluation, ARRO yields consistent
performance gains, allows for selective masking to choose between different
objects, and shows robustness even to challenging segmentation conditions.
Videos showcasing our results are available at:
https://augmented-reality-for-robots.github.io/
♻ ★ mmE-Loc: Facilitating Accurate Drone Landing with Ultra-High-Frequency Localization
Haoyang Wang, Jingao Xu, Xinyu Luo, Ting Zhang, Xuecheng Chen, Ruiyang Duan, Jialong Chen, Yunhao Liu, Jianfeng Zheng, Weijie Hong, Xinlei Chen
For precise, efficient, and safe drone landings, ground platforms must locate
descending drones accurately and in real time and guide them to designated
spots. While mmWave sensing combined with cameras improves localization
accuracy, the lower sampling frequency of traditional frame cameras relative to
mmWave radar creates a bottleneck in system throughput. In this work, we
replace the traditional frame camera with an event camera, a novel sensor whose
sampling frequency harmonizes with that of mmWave radar in the ground-platform
setup, and introduce mmE-Loc, a high-precision, low-latency ground localization
system designed for precise drone landings. To fully exploit the temporal
consistency and spatial complementarity between these two modalities, we
propose two innovative modules: (i) the Consistency-instructed Collaborative
Tracking module, which further leverages the drone's physical knowledge of
periodic micro-motions and structure for accurate measurement extraction, and
(ii) the Graph-informed Adaptive Joint Optimization module, which integrates
drone motion information for efficient sensor fusion
and drone localization. Real-world experiments conducted in landing scenarios
with a drone delivery company demonstrate that mmE-Loc significantly
outperforms state-of-the-art methods in both accuracy and latency.
comment: 17 pages, 34 figures. Journal extended version of arXiv:2502.14992
♻ ★ Decentralized Aerial Manipulation of a Cable-Suspended Load using Multi-Agent Reinforcement Learning
This paper presents the first decentralized method to enable real-world 6-DoF
manipulation of a cable-suspended load using a team of Micro-Aerial Vehicles
(MAVs). Our method leverages multi-agent reinforcement learning (MARL) to train
an outer-loop control policy for each MAV. Unlike state-of-the-art controllers
that utilize a centralized scheme, our policy does not require global states,
inter-MAV communications, nor neighboring MAV information. Instead, agents
communicate implicitly through load pose observations alone, which enables high
scalability and flexibility. It also significantly reduces computing costs
during inference time, enabling onboard deployment of the policy. In addition,
we introduce a new action space design for the MAVs using linear acceleration
and body rates. This choice, combined with a robust low-level controller,
enables reliable sim-to-real transfer despite significant uncertainties caused
by cable tension during dynamic 3D motion. We validate our method in various
real-world experiments, including full-pose control under load model
uncertainties, showing setpoint tracking performance comparable to the
state-of-the-art centralized method. We also demonstrate cooperation amongst
agents with heterogeneous control policies, and robustness to the complete
in-flight loss of one MAV. Videos of experiments:
https://autonomousrobots.nl/paper_websites/aerial-manipulation-marl
♻ ★ Thor: Towards Human-Level Whole-Body Reactions for Intense Contact-Rich Environments
Humanoids hold great potential for service, industrial, and rescue
applications, in which robots must sustain whole-body stability while
performing intense, contact-rich interactions with the environment. However,
enabling humanoids to generate human-like, adaptive responses under such
conditions remains a major challenge. To address this, we propose Thor, a
humanoid framework for human-level whole-body reactions in contact-rich
environments. Based on the robot's force analysis, we design a force-adaptive
torso-tilt (FAT2) reward function to encourage humanoids to exhibit human-like
responses during force-interaction tasks. To mitigate the high-dimensional
challenges of humanoid control, Thor introduces a reinforcement learning
architecture that decouples the upper body, waist, and lower body. Each
component shares global observations of the whole body and jointly updates its
parameters. Finally, we deploy Thor on the Unitree G1, and it substantially
outperforms baselines in force-interaction tasks. Specifically, the robot
achieves a peak pulling force of 167.7 N (approximately 48% of the G1's body
weight) when moving backward and 145.5 N when moving forward, representing
improvements of 68.9% and 74.7%, respectively, compared with the
best-performing baseline. Moreover, Thor is capable of pulling a loaded rack
(130 N) and opening a fire door with one hand (60 N). These results highlight
Thor's effectiveness in enhancing humanoid force-interaction capabilities.
♻ ★ Multi-Agent Reinforcement Learning for Autonomous Multi-Satellite Earth Observation: A Realistic Case Study
The exponential growth of Low Earth Orbit (LEO) satellites has revolutionised
Earth Observation (EO) missions, addressing challenges in climate monitoring,
disaster management, and more. However, autonomous coordination in
multi-satellite systems remains a fundamental challenge. Traditional
optimisation approaches struggle to handle the real-time decision-making
demands of dynamic EO missions, necessitating the use of Reinforcement Learning
(RL) and Multi-Agent Reinforcement Learning (MARL). In this paper, we
investigate RL-based autonomous EO mission planning by modelling
single-satellite operations and extending to multi-satellite constellations
using MARL frameworks. We address key challenges, including energy and data
storage limitations, uncertainties in satellite observations, and the
complexities of decentralised coordination under partial observability. By
leveraging a near-realistic satellite simulation environment, we evaluate the
training stability and performance of state-of-the-art MARL algorithms,
including PPO, IPPO, MAPPO, and HAPPO. Our results demonstrate that MARL can
effectively balance imaging and resource management while addressing
non-stationarity and reward interdependency in multi-satellite coordination.
The insights gained from this study provide a foundation for autonomous
satellite operations, offering practical guidelines for improving policy
learning in decentralised EO missions.
♻ ★ AURA: Autonomous Upskilling with Retrieval-Augmented Agents
Designing reinforcement learning curricula for agile robots traditionally
requires extensive manual tuning of reward functions, environment
randomizations, and training configurations. We introduce AURA (Autonomous
Upskilling with Retrieval-Augmented Agents), a schema-validated curriculum
reinforcement learning (RL) framework that leverages Large Language Models
(LLMs) as autonomous designers of multi-stage curricula. AURA transforms user
prompts into YAML workflows that encode full reward functions, domain
randomization strategies, and training configurations. All files are statically
validated before any GPU time is used, ensuring efficient and reliable
execution. A retrieval-augmented feedback loop allows specialized LLM agents to
design, execute, and refine curriculum stages based on prior training results
stored in a vector database, enabling continual improvement over time.
Quantitative experiments show that AURA consistently outperforms LLM-guided
baselines in generation success rate, humanoid locomotion, and manipulation
tasks. Ablation studies highlight the importance of schema validation and
retrieval for curriculum quality. AURA successfully trains end-to-end policies
directly from user prompts and deploys them zero-shot on a custom humanoid
robot in multiple environments - capabilities that did not exist previously
with manually designed controllers. By abstracting the complexity of curriculum
design, AURA enables scalable and adaptive policy learning pipelines that would
be complex to construct by hand. Project page: https://aura-research.org/
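
The static-validation step can be pictured as ordinary schema checking of the
generated YAML before training launches. A hedged sketch follows; the field
names are invented, and AURA's real schema is internal to the project.

```python
# Hypothetical stage schema and check; AURA's actual schema differs.
import yaml                      # pip install pyyaml
from jsonschema import validate  # pip install jsonschema

STAGE_SCHEMA = {
    "type": "object",
    "required": ["name", "reward_terms", "num_envs"],
    "properties": {
        "name": {"type": "string"},
        "reward_terms": {"type": "array", "items": {"type": "string"}},
        "num_envs": {"type": "integer", "minimum": 1},
    },
}

stage_yaml = """
name: rough_terrain_walk
reward_terms: [velocity_tracking, torque_penalty]
num_envs: 4096
"""

validate(yaml.safe_load(stage_yaml), STAGE_SCHEMA)  # raises before GPU use
```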
♻ ★ Hybrid Dynamics Modeling and Trajectory Planning for a Cable-Trailer System with a Quadruped Robot
Inspired by sled-pulling dogs in transportation, we present a cable-trailer
system integrated with a quadruped robot. The motion planning of this system
faces challenges due to the interactions between the cable's state transitions,
the trailer's nonholonomic constraints, and the system's underactuation. To
address these challenges, we first develop a hybrid dynamics model that
captures the cable's taut and slack states. A search algorithm is then
introduced to compute a suboptimal trajectory while incorporating mode
transitions. Additionally, we propose a novel collision avoidance constraint
based on geometric polygons to formulate the trajectory optimization problem
for the hybrid system. The proposed method is implemented on a Unitree A1
quadruped robot with a customized cable-trailer and validated through
experiments. The real system demonstrates both agile and safe motion with cable
mode transitions.
comment: 8 pages, 8 figures, accepted by RA-L 2025
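
The taut/slack hybrid can be illustrated with a simple two-mode cable model
(our reading of the abstract, not the paper's equations; stiffness and rest
length are placeholder parameters):

```python
# Illustrative two-mode cable model: force is transmitted only when taut.
import numpy as np

def cable_force(p_robot, p_trailer, L0=1.0, k=200.0):
    """Force on the robot from a cable of rest length L0 and stiffness k."""
    d = np.asarray(p_trailer, dtype=float) - np.asarray(p_robot, dtype=float)
    dist = np.linalg.norm(d)
    if dist <= L0:      # slack mode: the cable transmits nothing
        return np.zeros_like(d)
    return k * (dist - L0) * d / dist  # taut mode: tension along the cable
```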