Robotics
★ TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction
Learning in simulation and transferring the learned policy to the real world
has the potential to enable generalist robots. The key challenge of this
approach is to address simulation-to-reality (sim-to-real) gaps. Previous
methods often require domain-specific knowledge a priori. We argue that a
straightforward way to obtain such knowledge is by asking humans to observe and
assist robot policy execution in the real world. The robots can then learn from
humans to close various sim-to-real gaps. We propose TRANSIC, a data-driven
approach to enable successful sim-to-real transfer based on a human-in-the-loop
framework. TRANSIC allows humans to augment simulation policies to overcome
various unmodeled sim-to-real gaps holistically through intervention and online
correction. Residual policies can be learned from human corrections and
integrated with simulation policies for autonomous execution. We show that our
approach can achieve successful sim-to-real transfer in complex and
contact-rich manipulation tasks such as furniture assembly. Through synergistic
integration of policies learned in simulation and from humans, TRANSIC is
effective as a holistic approach to addressing various, often coexisting
sim-to-real gaps. It displays attractive properties such as scaling with human
effort. Videos and code are available at https://transic-robot.github.io/
comment: Project website: https://transic-robot.github.io/
★ Stochastic Q-learning for Large Discrete Action Spaces
In complex environments with large discrete action spaces, effective
decision-making is critical in reinforcement learning (RL). Despite the
widespread use of value-based RL approaches like Q-learning, they come with a
computational burden, necessitating the maximization of a value function over
all actions in each iteration. This burden becomes particularly challenging
when addressing large-scale problems and using deep neural networks as function
approximators. In this paper, we present stochastic value-based RL approaches
which, in each iteration, as opposed to optimizing over the entire set of $n$
actions, only consider a variable stochastic set of a sublinear number of
actions, possibly as small as $\mathcal{O}(\log(n))$. The presented stochastic
value-based RL methods include, among others, Stochastic Q-learning, StochDQN,
and StochDDQN, all of which integrate this stochastic approach for both
value-function updates and action selection. The theoretical convergence of
Stochastic Q-learning is established, while an analysis of stochastic
maximization is provided. Moreover, through empirical validation, we illustrate
that the various proposed approaches outperform the baseline methods across
diverse environments, including different control problems, achieving
near-optimal average returns in significantly reduced time.
★ When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models
Xianzheng Ma, Yash Bhalgat, Brandon Smart, Shuai Chen, Xinghui Li, Jian Ding, Jindong Gu, Dave Zhenyu Chen, Songyou Peng, Jia-Wang Bian, Philip H Torr, Marc Pollefeys, Matthias Nießner, Ian D Reid, Angel X. Chang, Iro Laina, Victor Adrian Prisacariu
As large language models (LLMs) evolve, their integration with 3D spatial
data (3D-LLMs) has seen rapid progress, offering unprecedented capabilities for
understanding and interacting with physical spaces. This survey provides a
comprehensive overview of the methodologies enabling LLMs to process,
understand, and generate 3D data. Highlighting the unique advantages of LLMs,
such as in-context learning, step-by-step reasoning, open-vocabulary
capabilities, and extensive world knowledge, we underscore their potential to
significantly advance spatial comprehension and interaction within embodied
Artificial Intelligence (AI) systems. Our investigation spans various 3D data
representations, from point clouds to Neural Radiance Fields (NeRFs). It
examines their integration with LLMs for tasks such as 3D scene understanding,
captioning, question-answering, and dialogue, as well as LLM-based agents for
spatial reasoning, planning, and navigation. The paper also includes a brief
review of other methods that integrate 3D and language. The meta-analysis
presented in this paper reveals significant progress yet underscores the
necessity for novel approaches to harness the full potential of 3D-LLMs. Hence,
with this paper, we aim to chart a course for future research that explores and
expands the capabilities of 3D-LLMs in understanding and interacting with the
complex 3D world. To support this survey, we have established a project page
where papers related to our topic are organized and listed:
https://github.com/ActiveVisionLab/Awesome-LLM-3D.
★ Filling Missing Values Matters for Range Image-Based Point Cloud Segmentation
Point cloud segmentation (PCS) plays an essential role in robot perception
and navigation tasks. To efficiently understand large-scale outdoor point
clouds, their range image representation is commonly adopted. This image-like
representation is compact and structured, making range image-based PCS models
practical. However, undesirable missing values in the range images damage the
shapes and patterns of objects. This problem creates difficulty for the models
in learning coherent and complete geometric information from the objects.
Consequently, the PCS models only achieve inferior performance. Delving deeply
into this issue, we find that the use of unreasonable projection approaches and
deskewing scans mainly leads to unwanted missing values in the range images.
Besides, almost all previous works fail to consider filling in the unexpected
missing values in the PCS task. To alleviate this problem, we first propose a
new projection method, namely scan unfolding++ (SU++), to avoid massive missing
values in the generated range images. Then, we introduce a simple yet effective
approach, namely range-dependent $K$-nearest neighbor interpolation ($K$NNI),
to further fill in missing values. Finally, we introduce the Filling Missing
Values Network (FMVNet) and Fast FMVNet. Extensive experimental results on
SemanticKITTI, SemanticPOSS, and nuScenes datasets demonstrate that by
employing the proposed SU++ and $K$NNI, existing range image-based PCS models
consistently achieve better performance than the baseline models. Besides, both
FMVNet and Fast FMVNet achieve state-of-the-art performance in terms of the
speed-accuracy trade-off. The proposed methods can be applied to other range
image-based tasks and practical applications.
comment: This paper has been submitted to a journal
★ GS-Planner: A Gaussian-Splatting-based Planning Framework for Active High-Fidelity Reconstruction
Active reconstruction technique enables robots to autonomously collect scene
data for full coverage, relieving users from tedious and time-consuming data
capturing process. However, designed based on unsuitable scene representations,
existing methods show unrealistic reconstruction results or the inability of
online quality evaluation. Due to the recent advancements in explicit radiance
field technology, online active high-fidelity reconstruction has become
achievable. In this paper, we propose GS-Planner, a planning framework for
active high-fidelity reconstruction using 3D Gaussian Splatting. With
improvement on 3DGS to recognize unobserved regions, we evaluate the
reconstruction quality and completeness of 3DGS map online to guide the robot.
Then we design a sampling-based active reconstruction strategy to explore the
unobserved areas and improve the reconstruction geometric and textural quality.
To establish a complete robot active reconstruction system, we choose quadrotor
as the robotic platform for its high agility. Then we devise a safety
constraint with 3DGS to generate executable trajectories for quadrotor
navigation in the 3DGS map. To validate the effectiveness of our method, we
conduct extensive experiments and ablation studies in highly realistic
simulation scenes.
★ Towards Consistent and Explainable Motion Prediction using Heterogeneous Graph Attention
In autonomous driving, accurately interpreting the movements of other road
users and leveraging this knowledge to forecast future trajectories is crucial.
This is typically achieved through the integration of map data and tracked
trajectories of various agents. Numerous methodologies combine this information
into a singular embedding for each agent, which is then utilized to predict
future behavior. However, these approaches have a notable drawback in that they
may lose exact location information during the encoding process. The encoding
still includes general map information. However, the generation of valid and
consistent trajectories is not guaranteed. This can cause the predicted
trajectories to stray from the actual lanes. This paper introduces a new
refinement module designed to project the predicted trajectories back onto the
actual map, rectifying these discrepancies and leading towards more consistent
predictions. This versatile module can be readily incorporated into a wide
range of architectures. Additionally, we propose a novel scene encoder that
handles all relations between agents and their environment in a single unified
heterogeneous graph attention network. By analyzing the attention values on the
different edges in this graph, we can gain unique insights into the neural
network's inner workings leading towards a more explainable prediction.
★ Distribution of Test Statistic for Euclidean Distance Matrices
Methods for global navigation satellite system fault detection using
Euclidean Distance Matrices have been presented recently in the literature.
Published methods define a test statistic in terms of eigenvalues of a certain
matrix, but the distribution of the test statistic was not known, which
presented a barrier to practical implementation. This document was a personal
correspondence from Beatty to Derek Knowles. It includes a brief derivation of
the distribution of the test statistic and a representative case showing that
the theoretical distribution closely matches a simulated empirical
distribution.
★ Crash Landing onto "you": Untethered Soft Aerial Robots for Safe Environmental Interaction, Sensing, and Perching
There are various desired capabilities to create aerial forest-traversing
robots capable of monitoring both biological and abiotic data. The features
range from multi-functionality, robustness, and adaptability. These robots have
to weather turbulent winds and various obstacles such as forest flora and
wildlife thus amplifying the complexity of operating in such uncertain
environments. The key for successful data collection is the flexibility to
intermittently move from tree-to-tree, in order to perch at vantage locations
for elongated time. This effort to perch not only reduces the disturbance
caused by multi-rotor systems during data collection, but also allows the
system to rest and recharge for longer outdoor missions. Current systems
feature the addition of perching modules that increase the aerial robots'
weight and reduce the drone's overall endurance. Thus in our work, the key
questions currently studied are: "How do we develop a single robot capable of
metamorphosing its body for multi-modal flight and dynamic perching?", "How do
we detect and land on perchable objects robustly and dynamically?", and "What
important spatial-temporal data is important for us to collect?"
★ Optimizing Search and Rescue UAV Connectivity in Challenging Terrain through Multi Q-Learning
Using Unmanned Aerial Vehicles (UAVs) in Search and rescue operations (SAR)
to navigate challenging terrain while maintaining reliable communication with
the cellular network is a promising approach. This paper suggests a novel
technique employing a reinforcement learning multi Q-learning algorithm to
optimize UAV connectivity in such scenarios. We introduce a Strategic Planning
Agent for efficient path planning and collision awareness and a Real-time
Adaptive Agent to maintain optimal connection with the cellular base station.
The agents trained in a simulated environment using multi Q-learning,
encouraging them to learn from experience and adjust their decision-making to
diverse terrain complexities and communication scenarios. Evaluation results
reveal the significance of the approach, highlighting successful navigation in
environments with varying obstacle densities and the ability to perform optimal
connectivity using different frequency bands. This work paves the way for
enhanced UAV autonomy and enhanced communication reliability in search and
rescue operations.
★ Natural Language Can Help Bridge the Sim2Real Gap
The main challenge in learning image-conditioned robotic policies is
acquiring a visual representation conducive to low-level control. Due to the
high dimensionality of the image space, learning a good visual representation
requires a considerable amount of visual data. However, when learning in the
real world, data is expensive. Sim2Real is a promising paradigm for overcoming
data scarcity in the real-world target domain by using a simulator to collect
large amounts of cheap data closely related to the target task. However, it is
difficult to transfer an image-conditioned policy from sim to real when the
domains are very visually dissimilar. To bridge the sim2real visual gap, we
propose using natural language descriptions of images as a unifying signal
across domains that captures the underlying task-relevant semantics. Our key
insight is that if two image observations from different domains are labeled
with similar language, the policy should predict similar action distributions
for both images. We demonstrate that training the image encoder to predict the
language description or the distance between descriptions of a sim or real
image serves as a useful, data-efficient pretraining step that helps learn a
domain-invariant image representation. We can then use this image encoder as
the backbone of an IL policy trained simultaneously on a large amount of
simulated and a handful of real demonstrations. Our approach outperforms widely
used prior sim2real methods and strong vision-language pretraining baselines
like CLIP and R3M by 25 to 40%.
comment: To appear in RSS 2024
★ ACES: A Teleoperated Robotic Solution to Pipe Inspection from the Inside
This paper presents the definition of a teleoperated robotic system for
non-destructive corrosion inspection of Steel Cylinder Concrete Pipes (SCCP)
from the inside. A general description of in-pipe environment and a state of
the art of in-pipe navigation solutions are exposed, with a zoom on the
characteristics of the SCCP case of interest (pipe dimensions, curves, slopes,
humidity, payload, etc.). Then, two specific steel corrosion measurement
techniques are described. In order to operate them, several possible
architectures of inspection system (mobile platform combined with a robotic
inspection manipulator) are presented, depending if the mobile platform is
self-centred or not and regarding the robotic manipulator type, namely a basic
cylindrical manipulator, a self centred one, or a force-controlled 6 degrees of
freedom (DoF) robotic arm. A suitable mechanical architecture is then selected
according to SCCP inspection needs. This includes relevant interfaces between
the robot, the corrosion measurement Non Destructive Testing (NDT) device and
the pipe. Finally, possible future adaptation of the chosen solution are
exposed.
★ Servo Integrated Nonlinear Model Predictive Control for Overactuated Tiltable-Quadrotors
Quadrotors are widely employed across various domains, yet the conventional
type faces limitations due to underactuation, where attitude control is closely
tied to positional adjustments. In contrast, quadrotors equipped with tiltable
rotors offer overactuation, empowering them to track both position and attitude
trajectories. However, the nonlinear dynamics of the drone body and the
sluggish response of tilting servos pose challenges for conventional cascade
controllers. In this study, we propose a control methodology for tilting-rotor
quadrotors based on nonlinear model predictive control (NMPC). Unlike
conventional approaches, our method preserves the full dynamics without
simplification and utilizes actuator commands directly as control inputs.
Notably, we incorporate a first-order servo model within the NMPC framework.
Through simulation, we observe that integrating the servo dynamics not only
enhances control performance but also accelerates convergence. To assess the
efficacy of our approach, we fabricate a tiltable-quadrotor and deploy the
algorithm onboard at a frequency of 100Hz. Extensive real-world experiments
demonstrate rapid, robust, and smooth pose tracking performance.
comment: This article has been submitted to RA-L
★ SEEK: Semantic Reasoning for Object Goal Navigation in Real World Inspection Tasks
Muhammad Fadhil Ginting, Sung-Kyun Kim, David D. Fan, Matteo Palieri, Mykel J. Kochenderfer, Ali-akbar Agha-Mohammadi
This paper addresses the problem of object-goal navigation in autonomous
inspections in real-world environments. Object-goal navigation is crucial to
enable effective inspections in various settings, often requiring the robot to
identify the target object within a large search space. Current object
inspection methods fall short of human efficiency because they typically cannot
bootstrap prior and common sense knowledge as humans do. In this paper, we
introduce a framework that enables robots to use semantic knowledge from prior
spatial configurations of the environment and semantic common sense knowledge.
We propose SEEK (Semantic Reasoning for Object Inspection Tasks) that combines
semantic prior knowledge with the robot's observations to search for and
navigate toward target objects more efficiently. SEEK maintains two
representations: a Dynamic Scene Graph (DSG) and a Relational Semantic Network
(RSN). The RSN is a compact and practical model that estimates the probability
of finding the target object across spatial elements in the DSG. We propose a
novel probabilistic planning framework to search for the object using
relational semantic knowledge. Our simulation analyses demonstrate that SEEK
outperforms the classical planning and Large Language Models (LLMs)-based
methods that are examined in this study in terms of efficiency for object-goal
inspection tasks. We validated our approach on a physical legged robot in urban
environments, showcasing its practicality and effectiveness in real-world
inspection scenarios.
★ EFEAR-4D: Ego-Velocity Filtering for Efficient and Accurate 4D radar Odometry
Odometry is a crucial component for successfully implementing autonomous
navigation, relying on sensors such as cameras, LiDARs and IMUs. However, these
sensors may encounter challenges in extreme weather conditions, such as
snowfall and fog. The emergence of FMCW radar technology offers the potential
for robust perception in adverse conditions. As the latest generation of FWCW
radars, the 4D mmWave radar provides point cloud with range, azimuth,
elevation, and Doppler velocity information, despite inherent sparsity and
noises in the point cloud. In this paper, we propose EFEAR-4D, an accurate,
highly efficient, and learning-free method for large-scale 4D radar odometry
estimation. EFEAR-4D exploits Doppler velocity information delicately for
robust ego-velocity estimation, resulting in a highly accurate prior guess.
EFEAR-4D maintains robustness against point-cloud sparsity and noises across
diverse environments through dynamic object removal and effective region-wise
feature extraction. Extensive experiments on two publicly available 4D radar
datasets demonstrate state-of-the-art reliability and localization accuracy of
EFEAR-4D under various conditions. Furthermore, we have collected a dataset
following the same route but varying installation heights of the 4D radar,
emphasizing the significant impact of radar height on point cloud quality - a
crucial consideration for real-world deployments. Our algorithm and dataset
will be available soon at https://github.com/CLASS-Lab/EFEAR-4D.
★ Integrating Uncertainty-Aware Human Motion Prediction into Graph-Based Manipulator Motion Planning
There has been a growing utilization of industrial robots as complementary
collaborators for human workers in re-manufacturing sites. Such a human-robot
collaboration (HRC) aims to assist human workers in improving the flexibility
and efficiency of labor-intensive tasks. In this paper, we propose a
human-aware motion planning framework for HRC to effectively compute
collision-free motions for manipulators when conducting collaborative tasks
with humans. We employ a neural human motion prediction model to enable
proactive planning for manipulators. Particularly, rather than blindly trusting
and utilizing predicted human trajectories in the manipulator planning, we
quantify uncertainties of the neural prediction model to further ensure human
safety. Moreover, we integrate the uncertainty-aware prediction into a graph
that captures key workspace elements and illustrates their interconnections.
Then a graph neural network is leveraged to operate on the constructed graph.
Consequently, robot motion planning considers both the dependencies among all
the elements in the workspace and the potential influence of future movements
of human workers. We experimentally validate the proposed planning framework
using a 6-degree-of-freedom manipulator in a shared workspace where a human is
performing disassembling tasks. The results demonstrate the benefits of our
approach in terms of improving the smoothness and safety of HRC. A brief video
introduction of this work is available as the supplemental materials.
★ Combining RL and IL using a dynamic, performance-based modulation over learning signals and its application to local planning
This paper proposes a method to combine reinforcement learning (RL) and
imitation learning (IL) using a dynamic, performance-based modulation over
learning signals. The proposed method combines RL and behavioral cloning (IL),
or corrective feedback in the action space (interactive IL/IIL), by dynamically
weighting the losses to be optimized, taking into account the backpropagated
gradients used to update the policy and the agent's estimated performance. In
this manner, RL and IL/IIL losses are combined by equalizing their impact on
the policy's updates, while modulating said impact such that IL signals are
prioritized at the beginning of the learning process, and as the agent's
performance improves, the RL signals become progressively more relevant,
allowing for a smooth transition from pure IL/IIL to pure RL. The proposed
method is used to learn local planning policies for mobile robots, synthesizing
IL/IIL signals online by means of a scripted policy. An extensive evaluation of
the application of the proposed method to this task is performed in
simulations, and it is empirically shown that it outperforms pure RL in terms
of sample efficiency (achieving the same level of performance in the training
environment utilizing approximately 4 times less experiences), while
consistently producing local planning policies with better performance metrics
(achieving an average success rate of 0.959 in an evaluation environment,
outperforming pure RL by 12.5% and pure IL by 13.9%). Furthermore, the obtained
local planning policies are successfully deployed in the real world without
performing any major fine tuning. The proposed method can extend existing RL
algorithms, and is applicable to other problems for which generating IL/IIL
signals online is feasible. A video summarizing some of the real world
experiments that were conducted can be found in https://youtu.be/mZlaXn9WGzw.
comment: 17 pages, 11 figures
★ Collision Avoidance Metric for 3D Camera Evaluation
3D cameras have emerged as a critical source of information for applications
in robotics and autonomous driving. These cameras provide robots with the
ability to capture and utilize point clouds, enabling them to navigate their
surroundings and avoid collisions with other objects. However, current standard
camera evaluation metrics often fail to consider the specific application
context. These metrics typically focus on measures like Chamfer distance (CD)
or Earth Mover's Distance (EMD), which may not directly translate to
performance in real-world scenarios. To address this limitation, we propose a
novel metric for point cloud evaluation, specifically designed to assess the
suitability of 3D cameras for the critical task of collision avoidance. This
metric incorporates application-specific considerations and provides a more
accurate measure of a camera's effectiveness in ensuring safe robot navigation.
★ JIGGLE: An Active Sensing Framework for Boundary Parameters Estimation in Deformable Surgical Environments
Nikhil Uday Shinde, Xiao Liang, Fei Liu, Yutong Zhang, Florian Richter, Sylvia Herbert, Michael C. Yip
Surgical automation can improve the accessibility and consistency of life
saving procedures. Most surgeries require separating layers of tissue to access
the surgical site, and suturing to reattach incisions. These tasks involve
deformable manipulation to safely identify and alter tissue attachment
(boundary) topology. Due to poor visual acuity and frequent occlusions,
surgeons tend to carefully manipulate the tissue in ways that enable inference
of the tissue's attachment points without causing unsafe tearing. In a similar
fashion, we propose JIGGLE, a framework for estimation and interactive sensing
of unknown boundary parameters in deformable surgical environments. This
framework has two key components: (1) a probabilistic estimation to identify
the current attachment points, achieved by integrating a differentiable
soft-body simulator with an extended Kalman filter (EKF), and (2) an
optimization-based active control pipeline that generates actions to maximize
information gain of the tissue attachments, while simultaneously minimizing
safety costs. The robustness of our estimation approach is demonstrated through
experiments with real animal tissue, where we infer sutured attachment points
using stereo endoscope observations. We also demonstrate the capabilities of
our method in handling complex topological changes such as cutting and
suturing.
comment: Accepted at RSS 2024
♻ ★ Learning Reward for Robot Skills Using Large Language Models via Self-Alignment ICML 2024
Learning reward functions remains the bottleneck to equip a robot with a
broad repertoire of skills. Large Language Models (LLM) contain valuable
task-related knowledge that can potentially aid in the learning of reward
functions. However, the proposed reward function can be imprecise, thus
ineffective which requires to be further grounded with environment information.
We proposed a method to learn rewards more efficiently in the absence of
humans. Our approach consists of two components: We first use the LLM to
propose features and parameterization of the reward, then update the parameters
through an iterative self-alignment process. In particular, the process
minimizes the ranking inconsistency between the LLM and the learnt reward
functions based on the execution feedback. The method was validated on 9 tasks
across 2 simulation environments. It demonstrates a consistent improvement over
training efficacy and efficiency, meanwhile consuming significantly fewer GPT
tokens compared to the alternative mutation-based method.
comment: ICML 2024
♻ ★ GrainGrasp: Dexterous Grasp Generation with Fine-grained Contact Guidance ICRA2024
One goal of dexterous robotic grasping is to allow robots to handle objects
with the same level of flexibility and adaptability as humans. However, it
remains a challenging task to generate an optimal grasping strategy for
dexterous hands, especially when it comes to delicate manipulation and accurate
adjustment the desired grasping poses for objects of varying shapes and sizes.
In this paper, we propose a novel dexterous grasp generation scheme called
GrainGrasp that provides fine-grained contact guidance for each fingertip. In
particular, we employ a generative model to predict separate contact maps for
each fingertip on the object point cloud, effectively capturing the specifics
of finger-object interactions. In addition, we develop a new dexterous grasping
optimization algorithm that solely relies on the point cloud as input,
eliminating the necessity for complete mesh information of the object. By
leveraging the contact maps of different fingertips, the proposed optimization
algorithm can generate precise and determinable strategies for human-like
object grasping. Experimental results confirm the efficiency of the proposed
scheme.
comment: This paper is accepted by the ICRA2024
♻ ★ Reinforcement Learning based Autonomous Multi-Rotor Landing on Moving Platforms
Multi-rotor UAVs suffer from a restricted range and flight duration due to
limited battery capacity. Autonomous landing on a 2D moving platform offers the
possibility to replenish batteries and offload data, thus increasing the
utility of the vehicle. Classical approaches rely on accurate, complex and
difficult-to-derive models of the vehicle and the environment. Reinforcement
learning (RL) provides an attractive alternative due to its ability to learn a
suitable control policy exclusively from data during a training procedure.
However, current methods require several hours to train, have limited success
rates and depend on hyperparameters that need to be tuned by trial-and-error.
We address all these issues in this work. First, we decompose the landing
procedure into a sequence of simpler, but similar learning tasks. This is
enabled by applying two instances of the same RL based controller trained for
1D motion for controlling the multi-rotor's movement in both the longitudinal
and the lateral directions. Second, we introduce a powerful state space
discretization technique that is based on i) kinematic modeling of the moving
platform to derive information about the state space topology and ii)
structuring the training as a sequential curriculum using transfer learning.
Third, we leverage the kinematics model of the moving platform to also derive
interpretable hyperparameters for the training process that ensure sufficient
maneuverability of the multi-rotor vehicle. The training is performed using the
tabular RL method Double Q-Learning. Through extensive simulations we show that
the presented method significantly increases the rate of successful landings,
while requiring less training time compared to other deep RL approaches.
Finally, we deploy and demonstrate our algorithm on real hardware. For all
evaluation scenarios we provide statistics on the agent's performance.
comment: 24 pages, 13 figures, 13 tables
♻ ★ NID-SLAM: Neural Implicit Representation-based RGB-D SLAM in dynamic environments
Neural implicit representations have been explored to enhance visual SLAM
algorithms, especially in providing high-fidelity dense map. Existing methods
operate robustly in static scenes but struggle with the disruption caused by
moving objects. In this paper we present NID-SLAM, which significantly improves
the performance of neural SLAM in dynamic environments. We propose a new
approach to enhance inaccurate regions in semantic masks, particularly in
marginal areas. Utilizing the geometric information present in depth images,
this method enables accurate removal of dynamic objects, thereby reducing the
probability of camera drift. Additionally, we introduce a keyframe selection
strategy for dynamic scenes, which enhances camera tracking robustness against
large-scale objects and improves the efficiency of mapping. Experiments on
publicly available RGB-D datasets demonstrate that our method outperforms
competitive neural SLAM approaches in tracking accuracy and mapping quality in
dynamic environments.
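The depth-based refinement of semantic masks described above can be sketched as follows. This is one plausible realization under assumed details (mean-depth consistency plus a single dilation step), not the authors' exact method:

```python
import numpy as np

def refine_mask_with_depth(mask, depth, tol=0.1):
    """Grow a dynamic-object mask to depth-consistent neighbors (illustrative)."""
    # Mean depth of the currently masked (dynamic) region.
    obj_depth = depth[mask].mean()
    # Pixels whose depth is close to the object's depth are likely part of it,
    # even if the semantic network missed them at the object's margins.
    candidate = np.abs(depth - obj_depth) < tol
    # Keep only candidates adjacent to the existing mask (one dilation step).
    padded = np.pad(mask, 1)
    neighbor = (padded[:-2, 1:-1] | padded[2:, 1:-1] |
                padded[1:-1, :-2] | padded[1:-1, 2:])
    return mask | (candidate & neighbor)
```

Iterating the dilation step would grow the mask further along depth-consistent regions.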
♻ ★ Autonomous Drone Racing: A Survey
Drew Hanover, Antonio Loquercio, Leonard Bauersfeld, Angel Romero, Robert Penicka, Yunlong Song, Giovanni Cioffi, Elia Kaufmann, Davide Scaramuzza
Over the last decade, the use of autonomous drone systems for surveying,
search and rescue, or last-mile delivery has increased exponentially. With the
rise of these applications comes the need for highly robust, safety-critical
algorithms which can operate drones in complex and uncertain environments.
Additionally, flying fast enables drones to cover more ground which in turn
increases productivity and further strengthens their use case. One proxy for
developing algorithms used in high-speed navigation is the task of autonomous
drone racing, where researchers program drones to fly through a sequence of
gates and avoid obstacles as quickly as possible using onboard sensors and
limited computational power. Speeds and accelerations exceed 80 km/h and 4 g,
respectively, raising significant challenges across perception, planning,
control, and state estimation. To achieve maximum performance, systems require
real-time algorithms that are robust to motion blur, high dynamic range, model
uncertainties, aerodynamic disturbances, and often unpredictable opponents.
This survey covers the progression of autonomous drone racing across
model-based and learning-based approaches. We provide an overview of the field,
its evolution over the years, and conclude with the biggest challenges and open
questions to be faced in the future.
comment: 26 pages, submitted to T-RO January 3rd, 2022; accepted to T-RO May
8th, 2024
♻ ★ Geo-Localization Based on Dynamically Weighted Factor-Graph
Miguel Ángel Muñoz-Bañón, Alejandro Olivas, Edison Velasco-Sánchez, Francisco A. Candelas, Fernando Torres
Feature-based geo-localization relies on associating features extracted from
aerial imagery with those detected by the vehicle's sensors. This requires that
the landmark types be observable from both sources. The resulting lack of
variety in feature types yields poor representations, where ambiguities produce
outliers and missed detections produce trajectory deviations. To
mitigate these drawbacks, in this paper, we present a dynamically weighted
factor graph model for the vehicle's trajectory estimation. The weight
adjustment in this implementation depends on information quantification in the
detections performed using a LiDAR sensor. Also, a prior (GNSS-based) error
estimation is included in the model. Then, when the representation becomes
ambiguous or sparse, the weights are dynamically adjusted to rely on the
corrected prior trajectory, mitigating outliers and deviations in this way. We
compare our method against state-of-the-art geo-localization ones in a
challenging and ambiguous environment, where we also cause detection losses. We
demonstrate mitigation of the mentioned drawbacks where the other methods fail.
comment: This paper is published in the journal "IEEE Robotics and Automation
Letters"
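The dynamic weighting idea can be sketched as follows, with an assumed information proxy (detection count relative to an expected count). The paper's actual information quantification over LiDAR detections is not reproduced here:

```python
# Illustrative sketch (not the paper's formulation): each landmark factor's
# weight scales with an information measure of the LiDAR detections, so that
# sparse or ambiguous detections shift trust toward the corrected GNSS prior.
def factor_weights(n_detections, n_expected, w_landmark_max=1.0, w_prior_min=0.1):
    info = min(n_detections / max(n_expected, 1), 1.0)  # crude info proxy in [0, 1]
    w_landmark = w_landmark_max * info
    w_prior = w_prior_min + (1.0 - info) * (1.0 - w_prior_min)
    return w_landmark, w_prior
```

With full detections the landmark factors dominate; with none, the prior carries the trajectory estimate.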
♻ ★ ViKi-HyCo: A Hybrid-Control approach for complex car-like maneuvers
Edison P. Velasco Sánchez, Miguel Ángel Muñoz-Bañón, Francisco A. Candelas, Santiago T. Puente, Fernando Torres
While Visual Servoing is deeply studied to perform simple maneuvers, the
literature does not commonly address complex cases where the target is far out
of the camera's field of view (FOV) during the maneuver. For this reason, in
this paper, we present ViKi-HyCo (Visual Servoing and Kinematic
Hybrid-Controller). This approach generates the necessary maneuvers for the
complex positioning of a non-holonomic mobile robot in outdoor environments. In
this method, we use LiDAR-camera fusion to estimate object bounding boxes in
both image and metric modalities. With the multi-modal nature of our
representation, we can automatically obtain a target for a visual servoing
controller. At the same time, we also have a metric target, which allows us to
hybridize with a kinematic controller. Given this hybridization, we can perform
complex maneuvers even when the target is far away from the camera's FOV. The
proposed approach does not require an object-tracking algorithm and can be
applied to any robotic positioning task where its kinematic model is known.
ViKi-HyCo has an error of 0.0428 ± 0.0467 m in the X-axis and 0.0515 ±
0.0323 m in the Y-axis at the end of a complete positioning task.
comment: This paper is published in the journal "IEEE Access"
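The hybridization can be sketched as a simple switching rule: use visual servoing while the target projects inside the camera's FOV, and fall back to the kinematic controller on the metric target otherwise. The FOV test and threshold below are assumptions for illustration, not the authors' controller:

```python
import math

def select_controller(target_xy, fov_half_angle=math.radians(35)):
    """Pick a controller from the target position in the robot/camera frame.

    target_xy: (x, y) with x pointing forward; fov_half_angle is an assumed
    camera half-FOV. Returns which controller should drive the robot.
    """
    x, y = target_xy
    bearing = math.atan2(y, x)
    if x > 0 and abs(bearing) <= fov_half_angle:
        return "visual_servoing"  # target visible: image-based control
    return "kinematic"            # target outside FOV: metric control
```

In practice a hysteresis band around the FOV boundary would avoid chattering between the two modes.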
♻ ★ Incremental Learning of Humanoid Robot Behavior from Natural Interaction and Large Language Models
Natural-language dialog is key for intuitive human-robot interaction. It can
be used not only to express humans' intents, but also to communicate
instructions for improvement if a robot does not understand a command
correctly. It is thus crucial to endow robots with the ability to learn
from such interaction experience in an incremental way to allow them to improve
their behaviors or avoid mistakes in the future. In this paper, we propose a
system to achieve incremental learning of complex behavior from natural
interaction, and demonstrate its implementation on a humanoid robot. Building
on recent advances, we present a system that deploys Large Language Models
(LLMs) for high-level orchestration of the robot's behavior, based on the idea
of enabling the LLM to generate Python statements in an interactive console to
invoke both robot perception and action. The interaction loop is closed by
feeding back human instructions, environment observations, and execution
results to the LLM, thus informing the generation of the next statement.
Specifically, we introduce incremental prompt learning, which enables the
system to interactively learn from its mistakes. For that purpose, the LLM can
call another LLM responsible for code-level improvements of the current
interaction based on human feedback. The improved interaction is then saved in
the robot's memory, and thus retrieved on similar requests. We integrate the
system in the robot cognitive architecture of the humanoid robot ARMAR-6 and
evaluate our methods both quantitatively (in simulation) and qualitatively (in
simulation and real-world) by demonstrating generalized incrementally-learned
knowledge.
comment: This version (v3) adds further quantitative evaluation and many
improvements. v2 was presented at the Workshop on Language and Robot Learning
(LangRob) at the Conference on Robot Learning (CoRL) 2023. Supplementary
video available at https://youtu.be/y5O2mRGtsLM
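The interaction loop described above, where the LLM emits Python statements into a console and receives execution results as feedback, can be sketched as follows. `query_llm`, the termination convention, and the result-capture detail are hypothetical stand-ins, not the ARMAR-6 integration:

```python
def interaction_loop(query_llm, scope, human_input, max_steps=10):
    """Run an LLM-orchestrated console session (illustrative sketch).

    query_llm: callable taking the history and returning the next Python
    statement; scope: dict exposing robot perception/action functions.
    """
    history = [("human", human_input)]
    for _ in range(max_steps):
        stmt = query_llm(history)           # LLM proposes the next statement
        if stmt.strip() == "done()":        # assumed termination convention
            break
        try:
            exec(stmt, scope)               # invoke perception/action APIs
            result = scope.get("_", "ok")
        except Exception as e:
            result = f"error: {e}"          # errors fed back for self-correction
        history.append(("exec", (stmt, str(result))))
    return history
```

Incremental prompt learning would add a second LLM call that rewrites failed statements from human feedback before storing the improved interaction in memory.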
♻ ★ Swarm Synergy: A Silent and Anonymous Way of Forming Community
In this paper, we present a novel swarm algorithm, swarm synergy, designed
for robots to form communities within a swarm autonomously and anonymously.
These communities, characterized as clusters of robots, emerge at locations
that are neither pre-defined nor communicated. Each robot operates as a silent
agent with no communication capability, making independent decisions based on
local parameters. The proposed algorithm allows silent robots to
achieve this self-organized swarm behavior using only sensory inputs from the
environment. The robots intend to form a community by sensing their neighbors,
creating synergy in a bounded environment. We further characterize the behavior
of swarm synergy to ensure the anonymity/untraceability of both robots and
communities, and report results on the dynamics of parameters relevant to swarm
communities, such as community size, community location, number of communities,
and the absence of any specific agent structure within a community. The results are
further analysed to observe the effect of sensing limitations posed by the
onboard sensor's field of view. Simulations and experiments are performed to
showcase the algorithm's scalability, robustness, and fast convergence.
Compared to state-of-the-art methods with similar objectives, the proposed
communication-free swarm synergy achieves comparable time to synergize or form
communities. The proposed algorithm finds applications in studying crowd
dynamics under high-stress scenarios such as fire, attacks, or disasters.
comment: 22 Pages, 8 figures, 6 tables, pre-print version
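A communication-free clustering rule of this kind can be sketched as follows; the centroid-seeking step, sensing radius, and gain are illustrative assumptions, not the paper's controller:

```python
import math

def step_toward_neighbors(pos, neighbors, radius=5.0, gain=0.1):
    """One motion step of a silent agent (illustrative sketch).

    The agent senses neighbor positions within its sensing radius only,
    and moves a fraction `gain` toward their centroid, so clusters
    (communities) emerge without any communication.
    """
    sensed = [n for n in neighbors if math.dist(pos, n) <= radius]
    if not sensed:
        return pos  # no neighbors sensed: hold position
    cx = sum(n[0] for n in sensed) / len(sensed)
    cy = sum(n[1] for n in sensed) / len(sensed)
    return (pos[0] + gain * (cx - pos[0]), pos[1] + gain * (cy - pos[1]))
```

Restricting `sensed` to a field-of-view cone instead of a full disk would model the onboard-sensor limitation analysed in the paper.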
♻ ★ Testing learning-enabled cyber-physical systems with Large-Language Models: A Formal Approach
Xi Zheng, Aloysius K. Mok, Ruzica Piskac, Yong Jae Lee, Bhaskar Krishnamachari, Dakai Zhu, Oleg Sokolsky, Insup Lee
The integration of machine learning (ML) into cyber-physical systems (CPS)
offers significant benefits, including enhanced efficiency, predictive
capabilities, real-time responsiveness, and the enabling of autonomous
operations. This convergence has accelerated the development and deployment of
a range of real-world applications, such as autonomous vehicles, delivery
drones, service robots, and telemedicine procedures. However, the software
development life cycle (SDLC) for AI-infused CPS diverges significantly from
traditional approaches, featuring data and learning as two critical components.
Existing verification and validation techniques are often inadequate for these
new paradigms. In this study, we pinpoint the main challenges in ensuring
formal safety for learning-enabled CPS. We begin by examining testing as the
pragmatic method for verification and validation, summarizing the current
state-of-the-art methodologies. Recognizing the limitations in current testing
approaches to provide formal safety guarantees, we propose a roadmap to
transition from foundational probabilistic testing to a more rigorous approach
capable of delivering formal assurance.