Accepted Papers – IJCAI 2026

IJCAI-ECAI 2026 Accepted Papers · Special Track on AI and Robotics

Presentation format

Every accepted paper is presented in two formats: an oral talk (6 min talk + 2 min for Q&A) — which must be delivered in person in Bremen by one of the authors — and a poster (A0, free format) during a dedicated poster session.

#AIR20

Session Aug 20 · 11:30–12:30 · Room 11

Poster Aug 20 · 16:30–18:00

GRALP: A Generative Representation Framework for Action Refinement and Latent Planning in Offline Robotic Control

Talha Zaidi, Arslan Munir, Sardar Ali Abbas

Offline robotic control requires long-horizon reasoning from fixed datasets while avoiding unsafe extrapolation beyond demonstrated behavior. We propose GRALP, a principled framework that resolves this tension by jointly enforcing support preservation and controllability at the level of temporal abstraction. GRALP adopts a deliberate architectural separation: diffusion is used exclusively as a deterministic action decoder for executing fixed latent skills, while planning and value estimation operate entirely in latent space under conservative constraints. This design enables stable value learning, controllable skill composition, and efficient planning without trajectory-level diffusion sampling at inference. Across unified D4RL benchmarks, GRALP achieves the highest average performance on Navigation, Sequential (Kitchen), and Adroit domains while remaining competitive on locomotion tasks. On contact-rich RoboSuite manipulation with human demonstrations (Lift and Pick-and-Place), GRALP achieves consistently high success rates (over 94%). These results indicate that reliable long-horizon offline control emerges when expressivity is confined to execution and decision-making operates over support-aligned latent abstractions.

AIR: Generative AI, robotic foundation models, and reinforcement learning · AIR: Robot control, planning, and execution with guarantees · Robot control, planning, and execution with guarantees: Safe and robust control under uncertainty
#AIR22

Session Aug 20 · 11:30–12:30 · Room 11

Poster Aug 20 · 16:30–18:00

FILD-Nav: Vision-and-Language Navigation with Instruction Landmark Features in Continuous Environments

Chuangye Hu, Lulu Liu, Huaiwei Si, Yawen Zhao, Nan Ding

Vision-and-language navigation (VLN) requires agents to follow natural language instructions to navigate autonomously in continuous environments. However, existing approaches often lack high-level semantic guidance in waypoint prediction and explicit language–landmark alignment in cross-modal planning. To address these limitations, we propose FILD-Nav, a vision-and-language navigation framework that integrates instruction landmark features. FILD-Nav extracts task-relevant landmarks from instructions and incorporates landmark semantics into both waypoint prediction and topological planning. Specifically, landmark-guided waypoint prediction improves waypoint relevance, while landmark-enhanced cross-modal planning enables more effective long-horizon navigation. Extensive experiments on the VLN-CE benchmark demonstrate that FILD-Nav consistently outperforms prior methods, achieving improvements of 2% in Success Rate (SR), 3% in Success weighted by Path Length (SPL), and 7% in Oracle Success Rate (OSR), particularly in unseen environments.
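For readers unfamiliar with the metrics named in this abstract, the following is a minimal sketch of how SR and SPL are commonly computed over a batch of episodes, using the standard SPL definition (success weighted by the ratio of shortest-path length to the larger of the agent's path length and the shortest-path length). The `Episode` class and its field names are hypothetical and are not taken from FILD-Nav or from VLN-CE tooling. OSR is omitted because it requires per-step distances to the goal.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    success: bool          # agent stopped within the success radius of the goal
    shortest_path: float   # geodesic start-to-goal distance
    agent_path: float      # length of the path the agent actually travelled


def success_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of episodes that ended in success."""
    return sum(e.success for e in episodes) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """Success weighted by Path Length: mean of S_i * l_i / max(p_i, l_i)."""
    total = 0.0
    for e in episodes:
        if e.success:
            total += e.shortest_path / max(e.agent_path, e.shortest_path)
    return total / len(episodes)


# Illustrative, made-up episodes (not results from the paper).
episodes = [
    Episode(success=True, shortest_path=10.0, agent_path=10.0),  # optimal route
    Episode(success=True, shortest_path=10.0, agent_path=20.0),  # long detour
    Episode(success=False, shortest_path=8.0, agent_path=5.0),   # failed episode
]
sr_value = success_rate(episodes)
spl_value = spl(episodes)
```

Because a successful but inefficient episode contributes less than 1 to the sum, SPL can never exceed SR, which is why the two metrics are usually reported together.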

Robot control, planning, and execution with guarantees: Architectures connecting high-level intent and constraints to low-level trajectories · Learning to understand, generalize, and explain actions: Robust generalization and transfer across tasks, objects, environments, embodiments, and long-horizon scenarios · AIR: Robot control, planning, and execution with guarantees · Foundations of human–robot interaction and assistance: Learning and inference methods for aligning robot behavior with human instructions, demonstrations, and feedback
#AIR36

Session Aug 20 · 11:30–12:30 · Room 11

Poster Aug 20 · 16:30–18:00

Perturbation-Resilient Autonomous Navigation with Distributionally Robust Reinforcement Learning

Zhaofan Zhang, Minghao Yang, Sihong Xie, Hui Xiong

The robustness of autonomous vehicles such as drones and Unmanned Surface Vehicles (USVs) is crucial in unknown and complex marine environments, especially when heteroscedastic observational noise poses significant challenges to sensor-based navigation tasks. Recently, Distributional Reinforcement Learning (DistRL) has shown promising results in some challenging autonomous navigation tasks without prior environmental information. However, these methods overlook situations where noise patterns vary across different environmental conditions, which hinders safe navigation and disrupts the learning of value functions. To address this problem, we propose DRIQN, which integrates Distributionally Robust Optimization (DRO) with implicit quantile networks to optimize worst-case performance under natural environmental conditions. Leveraging explicit subgroup modeling in the replay buffer, DRIQN incorporates heterogeneous noise sources and targets robustness-critical scenarios. Experimental results in a risk-sensitive environment demonstrate that DRIQN significantly outperforms state-of-the-art methods, achieving +13.51% success rate, -12.28% collision rate, +35.46% time saving, and +27.99% energy saving compared with the runner-up.
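To make the idea of optimizing worst-case performance over subgroups concrete, here is a minimal sketch that combines the quantile Huber loss used with implicit quantile networks and a worst-group selection over labelled replay samples. This is a simplified group-DRO-style illustration, not DRIQN's actual objective. The group labels, the sample layout `(group, td_error, tau)`, and all function names are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, float, float]  # (hypothetical noise group, TD error, quantile fraction tau)


def quantile_huber(td_error: float, tau: float, kappa: float = 1.0) -> float:
    """Quantile Huber loss |tau - 1{u < 0}| * L_kappa(u) / kappa for one TD error u."""
    abs_err = abs(td_error)
    if abs_err <= kappa:
        huber = 0.5 * td_error ** 2
    else:
        huber = kappa * (abs_err - 0.5 * kappa)
    weight = abs(tau - (1.0 if td_error < 0 else 0.0))
    return weight * huber / kappa


def group_losses(batch: List[Sample]) -> Dict[str, float]:
    """Mean quantile Huber loss for each subgroup in the batch."""
    per_group = defaultdict(list)
    for group, td_error, tau in batch:
        per_group[group].append(quantile_huber(td_error, tau))
    return {g: sum(v) / len(v) for g, v in per_group.items()}


def worst_group_loss(batch: List[Sample]) -> Tuple[str, float]:
    """Return the subgroup with the largest mean loss and that loss."""
    losses = group_losses(batch)
    worst = max(losses, key=losses.get)
    return worst, losses[worst]


# Made-up TD errors for two hypothetical noise regimes.
batch = [
    ("low_noise", 0.2, 0.5),
    ("low_noise", -0.1, 0.5),
    ("high_noise", 2.0, 0.9),
    ("high_noise", -1.5, 0.1),
]
worst_group, worst_loss = worst_group_loss(batch)
```

Minimizing `worst_loss` rather than the batch average keeps the harder noise regime from being averaged away by the easier one, which is the intuition behind a worst-case objective.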

AIR: Robot control, planning, and execution with guarantees · AIR: Generative AI, robotic foundation models, and reinforcement learning · Robot control, planning, and execution with guarantees: Safe and robust control under uncertainty · Safety, trustworthiness, generalizability, and evaluation: Theoretical and algorithmic guarantees on safety, robustness, and out-of-distribution generalization
#AIR41

Session Aug 20 · 11:30–12:30 · Room 11

Poster Aug 20 · 16:30–18:00

Disturbance-Aware Hybrid Learning for Robust and Adaptive UAV Flight in Extreme Winds

Huidong Liu, Jiarui Dou, Jiangshan Ai, Enwen Hu, Xianlei Long, Mingyan Li, Chao Chen, Fuqiang Gu

Safe and precise maneuvering of quadrotor unmanned aerial vehicles (UAVs) in high-speed wind environments remains a critical challenge. Wind disturbances are nonlinear, time-varying, and difficult to model, causing traditional controllers to struggle with perception and compensation, especially under unseen wind distributions. To address these limitations, we introduce WA-TD3, a data-driven control framework that enables real-time wind disturbance perception and adaptive compensation without dedicated wind sensors. WA-TD3 employs a deep residual network to extract wind characteristics from temporal patterns in state deviations, forming a dynamics residual-driven perception mechanism that implicitly models and compensates for unknown winds. This residual is integrated into a perception-augmented reinforcement learning architecture, providing the policy with enhanced state information for proactive disturbance-aware control. Extensive experiments on complex trajectories under varying wind intensities demonstrate that WA-TD3 consistently outperforms state-of-the-art methods, achieving over 62% improvement in tracking accuracy under strong winds.

AIR: Robot control, planning, and execution with guarantees · Robot control, planning, and execution with guarantees: Safe and robust control under uncertainty · Safety, trustworthiness, generalizability, and evaluation: Theoretical and algorithmic guarantees on safety, robustness, and out-of-distribution generalization
#AIR44

Session Aug 20 · 11:30–12:30 · Room 11

Poster Aug 20 · 16:30–18:00

RepSAM: Bridging Foundation Models to Robotic Vision via Representation-Guided Adaptation

Wenhui Chu

Robotic perception in unstructured environments remains challenging despite the zero-shot capabilities of foundation models such as SAM. This work attributes performance degradation to non-uniform representation shifts across transformer layers: shallow layers exhibit substantial domain gaps (CKA < 0.5), whereas deep layers transfer effectively (CKA > 0.7). Based on this observation, we propose RepSAM, a representation-guided parameter-efficient fine-tuning (PEFT) framework for adapting foundation models to robotic vision. RepSAM employs a theoretically grounded CKA-guided rank allocation strategy combined with a multi-modal fusion module for robust handling of challenging robotic scenarios, including transparent objects and cluttered scenes. Experimental evaluation across six benchmarks and robotic manipulation tasks demonstrates that RepSAM achieves 97.9% of full fine-tuning performance (89.0% vs. 90.9% mIoU) while reducing trainable parameters by 158× (from 632M to 4.0M). RepSAM outperforms DoRA by 7.9% mIoU with just 4 hours of training on a single A100 GPU (a 96× reduction from full fine-tuning, which takes 384 GPU-hours). These improvements are statistically significant (p<0.01) and translate to a 12.0% absolute improvement in robotic manipulation success rates over the LoRA (RGB) baseline.
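The CKA values quoted in this abstract refer to centered kernel alignment between layer representations. As a rough illustration, the sketch below computes linear CKA between two activation matrices (rows are samples, columns are features) and then maps a CKA score to an adapter rank. The linear-CKA formula is the standard one. The rank rule in `allocate_rank` (lower CKA gets a higher rank, interpolated linearly between two bounds) is purely an assumption for illustration and is not RepSAM's published allocation strategy.

```python
import math
from typing import List

Matrix = List[List[float]]


def _center(m: Matrix) -> Matrix:
    """Subtract the per-column mean so every feature has zero mean."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def _cross_sq_norm(a: Matrix, b: Matrix) -> float:
    """Squared Frobenius norm of a^T b."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA: ||X^T Y||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered features."""
    x, y = _center(x), _center(y)
    denom = math.sqrt(_cross_sq_norm(x, x) * _cross_sq_norm(y, y))
    return _cross_sq_norm(x, y) / denom  # undefined if a matrix has zero variance


def allocate_rank(cka: float, r_min: int = 4, r_max: int = 32) -> int:
    """Hypothetical rule: the larger the domain gap (1 - CKA), the higher the rank."""
    gap = min(max(1.0 - cka, 0.0), 1.0)
    return r_min + round(gap * (r_max - r_min))


# Toy activations: 4 samples, 2 features (made-up numbers).
source = [[1.0, 2.0], [3.0, 5.0], [4.0, 1.0], [0.0, 2.0]]
scaled = [[2.0 * v for v in row] for row in source]
self_similarity = linear_cka(source, source)
scaled_similarity = linear_cka(source, scaled)
```

Linear CKA is invariant to isotropic scaling, so `scaled_similarity` matches `self_similarity`. Under the assumed rule, a layer with CKA 0.5 would receive rank 18 with the default bounds.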

Generative AI, robotic foundation models, and reinforcement learning: Grounding large models in real robot interaction with cross-task, cross-environment, cross-platform, and open-domain generalization and transfer · Learning to understand, generalize, and explain actions: Robust generalization and transfer across tasks, objects, environments, embodiments, and long-horizon scenarios · Safety, trustworthiness, generalizability, and evaluation: Verification and validation methods for AI-powered robotic systems · Safety, trustworthiness, generalizability, and evaluation: Theoretical and algorithmic guarantees on safety, robustness, and out-of-distribution generalization
#AIR46

Session Aug 20 · 11:30–12:30 · Room 11

Poster Aug 20 · 16:30–18:00

VLAs Are Confined yet Capable of Generalizing to Novel Tasks

Quanyi Li

Vision-language-action models (VLAs) often achieve high performance on demonstrated tasks but struggle significantly when required to extrapolate, that is, to recombine skills used in different tasks in novel ways. For instance, VLAs might successfully put the cream cheese in the bowl and put the bowl on top of the cabinet, yet still fail to put the cream cheese on top of the cabinet. This motivates us to investigate whether VLAs merely overfit to demonstrated tasks or still hold the potential to extrapolate. Our study uses the text latent as its main ingredient: a task-specific vector derived from the model’s hidden states. It encodes the semantics necessary for completing a task and can be used to reconstruct the associated task behavior by writing it to the model’s residual stream. Furthermore, we find that skills used in distinct tasks can be combined to produce novel behaviors by blending their respective text latents. Applying this to π0, we increase its success rate from 9% to 83% on the proposed libero-ood benchmark, which features 20 tasks extrapolated from standard LIBERO tasks. This reveals that the skill representations encoded in text latents are individual yet composable, while π0 fails to autonomously combine these representations for extrapolation. This also validates the design of libero-ood: it comprises tasks that the model fails at, yet should be able to complete. We then tested other VLAs on libero-ood, and none of them achieved a success rate higher than 21%. Further analysis reveals that VLAs share a common pattern of spatial overfitting, associating object names with the object's location in the demonstrated scene rather than achieving true object and goal understanding.

Safety, trustworthiness, generalizability, and evaluation: Verification and validation methods for AI-powered robotic systems · AIR: Safety, trustworthiness, generalizability, and evaluation · Intentional, causal, and intuitive physics reasoning: Representations and inference for intentions, goals, and affordances · Learning to understand, generalize, and explain actions: Robust generalization and transfer across tasks, objects, environments, embodiments, and long-horizon scenarios
#AIR54

Session Aug 20 · 11:30–12:30 · Room 11

Poster Aug 20 · 16:30–18:00

TeNet: Text-to-Network for Compact Policy Synthesis

Ariyan Bighashdel, Kevin Sebastian Luck

Robots that follow natural-language instructions often either plan at a high level using hand-designed interfaces or rely on large end-to-end models that are difficult to deploy for real-time control. We propose TeNet (Text-to-Network), a framework for instantiating compact, task-specific robot policies directly from natural language descriptions. TeNet conditions a hypernetwork on text embeddings produced by a pretrained large language model (LLM) to generate a fully executable policy, which then operates solely on low-dimensional state inputs at high control frequencies. By using the language only once, at policy instantiation time, TeNet inherits the general knowledge and paraphrasing robustness of pretrained LLMs while remaining lightweight and efficient at execution time. To improve generalization, we optionally ground language in behavior during training by aligning text embeddings with demonstrated actions, while requiring no demonstrations at inference time. Experiments on MuJoCo and Meta-World benchmarks show that TeNet produces policies that are orders of magnitude smaller than sequence-based baselines, while achieving strong performance in both multi-task and meta-learning settings and supporting high-frequency control. These results show that text-conditioned hypernetworks offer a practical way to build compact, language-driven controllers for resource-constrained robot control tasks with real-time requirements.
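The split between one-time policy instantiation and state-only execution described in this abstract can be sketched in a few lines. The example below is a toy stand-in, not TeNet's architecture: the hypernetwork is a single random linear map, the generated policy is linear, and the text embedding is a hand-written vector where the paper would use an LLM embedding. All names and dimensions are assumptions for illustration.

```python
import random
from typing import Callable, List

Vector = List[float]


def make_hypernetwork(embed_dim: int, n_params: int, seed: int = 0) -> Callable[[Vector], Vector]:
    """Toy hypernetwork: a fixed random linear map from embedding to policy parameters."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.1) for _ in range(embed_dim)] for _ in range(n_params)]

    def hypernet(embedding: Vector) -> Vector:
        return [sum(w * e for w, e in zip(row, embedding)) for row in weights]

    return hypernet


def instantiate_policy(params: Vector, state_dim: int, action_dim: int) -> Callable[[Vector], Vector]:
    """Unpack a flat parameter vector into a linear policy a = W s + b."""
    w = [params[i * state_dim:(i + 1) * state_dim] for i in range(action_dim)]
    b = params[action_dim * state_dim:]

    def policy(state: Vector) -> Vector:
        return [sum(wi * si for wi, si in zip(row, state)) + bi for row, bi in zip(w, b)]

    return policy


state_dim, action_dim, embed_dim = 4, 2, 8
hypernet = make_hypernetwork(embed_dim, n_params=action_dim * state_dim + action_dim)

# Placeholder for an LLM text embedding of the task description (made-up values).
embedding = [0.1 * i for i in range(embed_dim)]

# Language is consumed once here; the resulting policy only ever sees states.
params = hypernet(embedding)
policy = instantiate_policy(params, state_dim, action_dim)
action = policy([1.0, 0.0, -0.5, 0.2])
```

After `instantiate_policy` returns, neither the embedding nor the hypernetwork is needed at control time, which is what keeps the per-step cost limited to the small generated policy.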

AIR: Generative AI, robotic foundation models, and reinforcement learning · Generative AI, robotic foundation models, and reinforcement learning: Model construction, representations, task and policy synthesis from language, demonstrations, or instructional video · Generative AI, robotic foundation models, and reinforcement learning: Grounding large models in real robot interaction with cross-task, cross-environment, cross-platform, and open-domain generalization and transfer · Learning to understand, generalize, and explain actions: Robust generalization and transfer across tasks, objects, environments, embodiments, and long-horizon scenarios
#AIR61

Session Aug 18 · 11:30–12:30 · Room 11

Poster Aug 18 · 16:30–18:00

PECHC: Robust Tactile Grasping Stabilization in Vision-Denied Peripersonal Space

Changlin Chen, Sisheng Chen, Hang Zhang, Xianglai Zhou, Zhen Tian, Weitao Liu, Feng-Qi Cui, Erbao Dong, Wenjing Chen

In the final "Last-Centimeter" phase of manipulation, where visual occlusion or calibration errors render vision unreliable, robots often suffer high failure rates due to local pose uncertainty and simulation dynamics deviations. To address these issues, this paper proposes the PECHC (Physics-Evolving Cascade Constraint and Human-Correction) algorithm. To rigorously isolate the contribution of tactile feedback in multi-finger coordination, we adopt a decoupled control strategy that focuses on grasp stabilization within the hand's workspace, acting as a fail-safe reflex. The core of our approach is Hybrid Correction Imitation Learning (HCIL), which establishes a "failure-triggered" human-machine mechanism to efficiently resolve the "model gap" via sparse expert corrections. To ensure sample efficiency and baseline performance, we introduce two supporting modules: Cascaded Constraint Scheduling (CCS) addresses the "geometric gap" by enforcing physically plausible behavioral constraints (geometric approach, force closure, and dynamic stability), while Temporal Heterogeneous Distillation (THED) resolves the "physical gap" by enabling implicit system identification from tactile history. Experiments demonstrate that PECHC achieves a 97.3% real-robot success rate on 150 objects from the Visual Dexterity Dataset under fully autonomous testing, where one object is used for one-time HCIL calibration and the remaining 149 objects are evaluated without further intervention. Compared to a standard Sim-to-Real reinforcement learning baseline (Vanilla PPO with Domain Randomization), PECHC delivers a significant performance improvement (+42.8%) and exhibits human-like force modulation capabilities for fragile objects.

AIR: Generative AI, robotic foundation models, and reinforcement learning · Learning to understand, generalize, and explain actions: Learning from language, corrections, preferences, and sparse feedback · AIR: Robot control, planning, and execution with guarantees · Robot control, planning, and execution with guarantees: Safe and robust control under uncertainty
#AIR66

Session Aug 18 · 11:30–12:30 · Room 11

Poster Aug 18 · 16:30–18:00

Operationalising Normative Rules in Autonomous Robotic Systems Through Context-Oriented Programming

Roberto Casadei, Martina De Sanctis, Gianluca Filippone, Sara Pettinari, Gian Luca Scoccia, Nicolas Troquard

The increasing sensitivity to human aspects in autonomous systems engineering calls for principled approaches to embed ethical and normative concerns into their behaviour. Indeed, recent research has focused on expressing and validating sets of social, legal, ethical, empathetic, and cultural (SLEEC) concerns as rules. However, existing work is limited to rule specification or verification, leaving the problem of semantics-preserving operationalisation of ethical rules in autonomous systems largely unaddressed. To fill this gap, we provide an operational solution for ethics-aware autonomous systems, applied to the realm of multi-service robots. Specifically, we devise a principled approach, named CO-SLEEC (Context-Oriented SLEEC), connecting the normative setting of SLEEC rules to context-oriented programming (COP). CO-SLEEC enables runtime adaptation while preserving the semantics of SLEEC rules during robot task execution. It features two reusable Python libraries for (i) parsing SLEEC rules into contextual elements for operationalising them and (ii) connecting the operational model to the Robot Operating System (ROS). We evaluate our implementation for correctness, efficiency, and maintainability over multi-service assistive robot scenarios.

Foundations of human–robot interaction and assistance: AI foundations for robot assistance and collaboration with humans, emphasizing explicit representations of tasks, roles, goals, norms, and shared context · Robot control, planning, and execution with guarantees: Architectures connecting high-level intent and constraints to low-level trajectories · AIR: Safety, trustworthiness, generalizability, and evaluation
#AIR71

Session Aug 18 · 11:30–12:30 · Room 11

Poster Aug 18 · 16:30–18:00

Closing the Motion Execution Gap: From Semantic Motion Task Constraints to Kinematic Control

Simon Stelter, Vanessa Hassouna, Malte Huerkamp, Michael Beetz

This paper addresses the Motion Execution Gap, the disconnect between high-level symbolic task descriptions using semantic constraints and executable robot motions. Motion Statecharts are introduced as an executable symbolic representation for complex motions. They allow the arbitrary arrangement of motion constraints, monitors, or nested statecharts in parallel and in sequence. World-centric motion specification and generalization across embodiments are enabled through a unified differentiable kinematic world model of both robots and environments. Motion execution is realized through an LMPC-based implementation of the task-function approach, in which smooth transitions during task switches are ensured using jerk bounds. Cross-platform transferability was demonstrated by deploying the method on eight robot platforms operating in diverse environments. The proposed framework is called Giskard and is available open source (https://github.com/cram2/cognitive_robot_abstract_machine).
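To illustrate why a jerk bound smooths a task switch, the sketch below tracks a velocity target for a single joint while limiting how much the acceleration may change per control step. This is a deliberately naive per-step clamp and not Giskard's formulation, which the abstract describes as an optimization-based controller. The function names and all numeric limits are assumptions for this example.

```python
from typing import List, Tuple


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def jerk_limited_step(vel: float, acc: float, target_vel: float, dt: float,
                      acc_max: float, jerk_max: float) -> Tuple[float, float]:
    """Advance one control step towards target_vel without exceeding the jerk bound."""
    desired_acc = clamp((target_vel - vel) / dt, -acc_max, acc_max)
    # The jerk bound limits how far acceleration may move within one step.
    new_acc = acc + clamp(desired_acc - acc, -jerk_max * dt, jerk_max * dt)
    return vel + new_acc * dt, new_acc


def simulate_switch(steps: int = 200, dt: float = 0.01,
                    acc_max: float = 2.0, jerk_max: float = 20.0) -> List[float]:
    """Accelerations after the velocity target jumps from 0 to 0.5 at step 0."""
    vel, acc = 0.0, 0.0
    accelerations = [acc]
    for _ in range(steps):
        vel, acc = jerk_limited_step(vel, acc, 0.5, dt, acc_max, jerk_max)
        accelerations.append(acc)
    return accelerations


accelerations = simulate_switch()
largest_acc_change = max(abs(b - a) for a, b in zip(accelerations, accelerations[1:]))
```

Without the inner clamp, the acceleration would jump to its limit in a single step when the target changes. With it, the acceleration ramps, at the cost of possible overshoot that this simple clamp does nothing to prevent.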

Robot control, planning, and execution with guarantees: Architectures connecting high-level intent and constraints to low-level trajectories · Robot control, planning, and execution with guarantees: Online monitoring, explanation, and adaptation · Structured, semantic, and explicit world models, digital twins, and action representations: Explicit action representations such as task schemas, scripts, and parameterized skills · Structured, semantic, and explicit world models, digital twins, and action representations: Structured world models and semantic digital twins
#AIR73

Session Aug 18 · 11:30–12:30 · Room 11

Poster Aug 18 · 16:30–18:00

Real-Time Multi-Robot Motion Planning with Safe-Interval Search and Learning-Guided Repair

Rajat Kumar, Kristin Predeck, Ken Meszaros, Trevor Dardik

Motion planning among multiple robots in a shared space is a fundamental yet computationally challenging problem in robotics, with applications ranging from warehouse automation to autonomous fleets. In this work, we introduce a fast, scalable motion planner that achieves real-time, collision-free trajectory planning via a two-stage algorithm combining deterministic search-based planning with machine learning-driven conflict resolution. We present a prioritized Safe Interval Path Planning algorithm (SIPP-PP) with a novel limited goal reservation strategy that prevents goal-blocking conflicts while allowing shared goal regions. On top of SIPP-PP, we add a second layer, an ML-guided Large Neighborhood Search (LNS) procedure, which improves success rates in highly congested environments via intelligent selection of conflict resolution actions. The result is a planning system that generates collision-free paths for multiple robots in complex environments within tens of milliseconds. For example, our planner is two to three orders of magnitude faster than recent learning-based methods such as diffusion planners. Our work demonstrates a multi-robot planner capable of real-time operation in dense scenarios, satisfying the stringent requirements of industrial applications such as drive units in fulfillment centers.
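The core data structure behind safe-interval planning is the list of collision-free time windows at each location. The sketch below shows one plausible way to derive those windows for a single cell from the time intervals already reserved by higher-priority robots, and to find the earliest time a lower-priority robot may enter. It illustrates the general safe-interval idea only. It does not reproduce SIPP-PP, its goal reservation strategy, or the LNS repair stage, and the function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # half-open time window [start, end)


def safe_intervals(reserved: List[Interval], horizon: float = math.inf) -> List[Interval]:
    """Maximal time windows in [0, horizon) not covered by any reserved interval."""
    safe: List[Interval] = []
    t = 0.0
    for start, end in sorted(reserved):
        if start > t:
            safe.append((t, start))
        t = max(t, end)  # merges overlapping reservations
    if t < horizon:
        safe.append((t, horizon))
    return safe


def earliest_entry(safe: List[Interval], ready: float, duration: float) -> Optional[float]:
    """Earliest time >= ready at which the cell can be occupied for `duration`."""
    for start, end in safe:
        t = max(start, ready)
        if t + duration <= end:
            return t
    return None


# Reservations made by higher-priority robots at one cell (made-up values).
reserved = [(8.0, 9.0), (2.0, 4.0), (3.0, 5.0)]
windows = safe_intervals(reserved)
entry_time = earliest_entry(windows, ready=1.5, duration=1.0)
```

Here the robot is ready at 1.5 but cannot finish before the reservation starting at 2.0, so it waits for the next window. Searching over a handful of windows per cell instead of every discrete timestep is what keeps this family of planners fast.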

AIR: Robot control, planning, and execution with guarantees · Robot control, planning, and execution with guarantees: Integrated task and motion planning with feedback control · Robot control, planning, and execution with guarantees: Safe and robust control under uncertainty
