Accepted Papers – IJCAI 2026

IJCAI-ECAI 2026 Accepted Papers · Demonstrations Track

Presentation format

Every accepted paper is presented in two formats: an oral talk (6 min talk + 2 min Q&A), which must be delivered in person in Bremen by one of the authors, and a poster (A0, free format) during a dedicated poster session.

#DM33

Wavelength.AI: Extending the Collaborative Game Wavelength as a Testbed for Studying Shared Understanding in Human–Agent Collaboration

Katelyn Morrison, Gabriel Enrique Gonzalez, Zahra Ashktorab, Matt Riemer, Andrew Anderson, Djallel Bouneffouf, Justin D. Weisz

AI's increasing role as a personal agent assisting knowledge workers in everyday tasks underscores the need to investigate how to help human–agent teams build a shared understanding. We extend the collaborative "mind-reading" game Wavelength to include an AI teammate, presenting the first demonstration of an LLM capable of playing this game. Based on our agent–agent play experiments, we developed Wavelength.AI, which implements two strategies to support shared understanding: an initial team grounding conversation and post-game reflective explanations. We interpret higher team scores as evidence for better shared understanding in a preliminary user study with 24 human–AI teams. Our findings reveal that Wavelength.AI can help researchers evaluate and design different strategies to shape human-agent teams' shared understanding. Human players can see if they are on the same wavelength with AI and view our demo video today at https://play-wavelength-ai.com.

Humans and AI · Agent-based and Multi-agent Systems

#DM67

SoilNet App: AI-Assisted Expert-level Annotations of Soil Horizons

Vipin Singh, Joey Pruessing, Teodor Chiaburu, Einar Eberhardt, Sina Hesse, Stefan Broda, Frank Haußer, Felix Biessmann

Precise descriptions of soil horizons are required for policy makers, agriculture and many applications in civil engineering. To date, correct soil horizon annotations require human experts, as they follow complex hierarchical taxonomies. We present the SoilNet App, a web-based demonstrator that guides experts through relevant tasks for expert-level soil horizon annotations from soil profile images. To demonstrate the reliability of the SoilNet App, we present results of a user study with soil horizon annotation experts, which highlights the difficulty of image-only-based annotation and suggests that collaborating with our model not only increases expert performance but also improves inter-annotator consistency. Our app is publicly accessible (https://soilnet.demo.calgo-lab.de).

Multidisciplinary Topics and Applications · Humans and AI · Computer Vision

#DM68

A Resilient Solution for Sewer Overflow Monitoring Across Cloud and Edge

Vipin Singh, Tianheng Ling, Peter Ghaly, Felix Grimmeisen, Gregor Schiele, Felix Biessmann

Aging combined sewer systems in many historical cities are increasingly stressed by extreme rainfall events, which can trigger combined sewer overflows (CSO) with significant environmental and public health impacts. Forecasting the filling dynamics of overflow basins is critical for anticipating capacity exceedance and enabling timely preventive actions for CSO. We present a web-based demonstrator that integrates Deep Learning forecasting methods in both cloud and edge settings into an interactive monitoring dashboard for overflow monitoring, resilient to network outages.

Multidisciplinary Topics and Applications · Planning and Scheduling · Humans and AI · AI Ethics, Trust, Fairness

#DM76

Neuro-Symbolic Logical Reasoning with Textual Entailment

Zacchary Sadeddine, Fabian Suchanek

Large Language Models can use logical deduction to answer natural language questions, but they remain black-boxes with potentially erroneous chains-of-thought. In this paper, we adapt VANESSA, a neuro-symbolic method for chain-of-thought verification, to reasoning-based question answering. VANESSA combines a logical reasoner with a neural textual entailment model to handle phrasing variations. Building on VANESSA, we develop a transparent, logic-based approach to answer natural language questions even with phrase variations. Our experiments across various datasets show our method is competitive with the state of the art, while also delivering proof trees for its answers. A demo interface allows users to interact with the system.

Natural Language Processing · Knowledge Representation and Reasoning

#DM80

RUVA: Personalized Transparent On-Device Graph Reasoning

Gabriele Conte, Alessio Mattiace, Gianni Carmosino, Potito Aghilar, Giovanni Servedio, Francesco Musicco, Vito Walter Anelli, Tommaso Di Noia, Francesco Maria Donini

The Personal AI landscape is currently dominated by "Black Box" Retrieval-Augmented Generation. While standard vector databases offer statistical matching, they suffer from a fundamental lack of accountability: when an AI hallucinates or retrieves sensitive data, the user cannot inspect the cause nor correct the error. Worse, "deleting" a concept from a vector space is mathematically imprecise, leaving behind probabilistic "ghosts" that violate true privacy. We propose Ruva, the first "Glass Box" architecture designed for Human-in-the-Loop Memory Curation.
Ruva grounds Personal AI in a Personal Knowledge Graph, enabling users to inspect what the AI knows and to perform precise redaction of specific facts. By shifting the paradigm from Vector Matching to Graph Reasoning, Ruva ensures the "Right to be Forgotten." Users are the editors of their own lives; Ruva hands them the pen.

Knowledge Representation and Reasoning · AI Ethics, Trust, Fairness · Humans and AI · Natural Language Processing

#DM81

MC-RAG System: A Structure-Driven RAG System for Multi-Constraint Queries

Xiao Zhang, Yang Wan, Yi Li, Miao Xie, Chunli Lv

Retrieval-Augmented Generation (RAG) systems are widely adopted in question answering, yet they often fail to satisfy complex multi-constraint queries, leading to constraint violations, factual inconsistencies, or hallucinations. We present Structure-Driven RAG System for Multi-Constraint Queries (MC-RAG), a structure-driven RAG system that reformulates retrieval as a subgraph matching problem over a knowledge graph. By integrating semantic and structural embeddings with path-level indexing, MC-RAG performs interpretable, structure-aware, and constraint-consistent retrieval and generation. During the demonstration, participants can input medical or encyclopedic multi-constraint queries and visualize how the system parses constraints, performs structural matching, and generates answers, thereby experiencing an end-to-end, interactive, and explainable RAG pipeline.
A demo video is available at https://youtu.be/J8kahzmAnu0.

Natural Language Processing · Knowledge Representation and Reasoning

#DM84

vSpeedUI: Turning Past GUI Experience into Fast Executable Plans

Xiaohan Zheng, Yihong Chen, Haiquan Qiu, Quanming Yao

LLM-based mobile GUI agents usually invoke large models for nearly every micro-action, making real-device automation slow even when similar workflows have been completed before. We present vSpeedUI, a public demo system that turns past GUI experience into fast executable plans. It organizes historical trajectories into an Executable Experience Graph (EXG), where UI states are connected by Semantic Step Summaries with explicit preconditions. At task initialization, vSpeedUI performs Global Look-ahead Planning to retrieve, validate, and rank candidate transitions into a pre-verified plan. During execution, the agent uses lightweight graph traversal with state localization, target adaptation, and fallback when needed. On HarmonyOS, vSpeedUI reduces LLM latency and total task time while maintaining strong success rates, showing a practical route toward data-efficient GUI automation. Code is available at: https://github.com/LARS-research/vSpeedUI.

Agent-based and Multi-agent Systems · Planning and Scheduling · Knowledge Representation and Reasoning · Machine Learning

#DM87

eNNcode: Optimization-Based Analysis of Neural Networks

Muhammad Sabri Farag Atallah, Lukas Dankwart, Daniel Neider, Mustafa Yalçıner

The increasing use of neural networks (NNs) in high-stakes decision-making requires rigorous analysis to ensure safety, fairness, and explainability. Formal verification tools for neural networks typically focus on determining the satisfiability of properties such as safety or fairness. However, many fairness and explainability tasks go beyond satisfiability and instead rely on arbitrary optimization objectives. To address such problems, neural networks are often encoded as mixed-integer linear optimization (MILO) problems with linear objectives. In practice, these encodings are usually implemented in an ad-hoc manner, limiting comparability across works, reducing transparency, and increasing implementation effort. We address this gap by introducing eNNcode, a user-friendly PyPI library that converts any piecewise-linear neural network in Open Neural Network Exchange (ONNX) format into a MILO instance. The library supports arbitrary constraints on input and output nodes, as well as user-defined optimization objectives. Our experiments show that eNNcode achieves performance comparable to existing libraries, despite its simplicity and ease of use. Overall, eNNcode facilitates reproducible and standardized optimization-based analysis of neural networks.
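
As a brief illustration of what a MILO encoding of a piecewise-linear network involves, the standard big-M formulation of a single ReLU unit is sketched below. This is the textbook construction and not necessarily the one eNNcode uses. The symbols x (pre-activation), y (output), z (binary phase indicator), and the bounds l and u are introduced here for illustration only, and the bounds l < 0 < u are assumed to be known in advance.

```latex
% y = max(0, x), assuming known bounds l <= x <= u with l < 0 < u
\begin{aligned}
y &\ge 0, \qquad y \ge x, \\
y &\le u\, z, \\
y &\le x - l\,(1 - z), \\
z &\in \{0, 1\}.
\end{aligned}
```

With z = 1 the constraints force y = x (active phase, x ≥ 0); with z = 0 they force y = 0 (inactive phase, x ≤ 0).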

Constraint Satisfaction and Optimization

#DM99

AI-Powered Interactive Multimodal Digital Book & Online Shop For Blind and Visually Impaired Users

Mazen Salous, Daniel Westphal, Wilko Heuen, Ayoub Ben Dhiab, Charles Hudin, Sabrina Paneels, Susanne Boll, Larbi Abdenebaoui

We present an AI-powered interactive multimodal system that enriches digital image accessibility for blind and visually impaired (BVI) users. Our demonstration showcases two application domains: (1) an educational digital book, and (2) an online shopping interface. In both use-cases, users can virtually feel material textures (leather, wood, etc.) and engage in voice-driven inquiry about images. The system integrates state-of-the-art AI components – including voice-to-voice conversational agents, vision models for object segmentation, and a custom 16-actuator vibrotactile display – to provide multimodal feedback (haptic vibrations, spoken descriptions, and audio cues).
The result is an inclusive technology with significant societal benefit, empowering BVI users to learn and shop more independently through natural multimodal interactions.

Humans and AI · Computer Vision · Machine Learning

#DM100

Anti-Slavery Intelligence (ASI): An AI-Powered Tool for Modern Slavery Compliance Analysis and Remediation

Mahmoud Gad, Abdessalam Elhabbash, Steven Young

We present Anti-Slavery Intelligence (ASI), a deployed AI system that analyses corporate modern slavery statements to identify compliance gaps and generate prioritised remediation advice. ASI orchestrates a multi-model pipeline: Gemini-2.5-Flash for vision-enhanced PDF parsing and structured compliance scoring against 48 expert-defined criteria, and Gemini-2.5-Pro for synthesising company-specific, timeline-based recommendations. A benchmarking engine compares each statement against industry and FTSE index averages across 11 sectors. Evaluation on 95 expert-annotated statements yields an F1-score of 0.86 and recall of 0.94, reducing time to generate an initial compliance assessment from several hours to approximately five minutes per document. ASI is publicly available at https://www.antislaveryintelligence.co.uk/ with free access for academics and NGOs. A demonstration video is at https://youtu.be/DNhdEGrRwzg.

Multidisciplinary Topics and Applications · Natural Language Processing · Humans and AI

#DM105

Visualizing Deep Agents in Long-Horizon Tasks: Towards Explainable and Trustworthy Agentic AI

Amirkia Rafiei Oskooei, Mehmet S. Aktas

The transition from prompt-based Large Language Models (LLMs) to autonomous Deep Agents has enabled the automation of long-horizon tasks. However, as these agents adopt hierarchical architectures with nested tool usage, they suffer from significant opacity. Existing linear tracing tools fail to capture the multi-dimensional complexity of parallel sub-agent execution, hindering both debugging and user trust. We propose a general-purpose observability framework that decomposes agent execution into four distinct visualization dimensions: Temporal, Cognitive, Hierarchical, and Spatial. We validate this framework through RepoLearn, an open-source workbench for automated codebase comprehension. Our user study demonstrates that this multi-dimensional approach reduces the Time-to-Insight (TTI) for complex behavioral analysis by 56% and significantly lowers cognitive load (NASA-TLX) compared to state-of-the-art linear traces. The source code is available at https://github.com/amirkiarafiei/repo-learn and the demo at https://www.youtube.com/watch?v=s3U6E9o94gk.

Agent-based and Multi-agent Systems · Humans and AI · AI Ethics, Trust, Fairness

#DM110

DeepMed Search: An Open-Source Agentic Platform for Medical Deep Research with Introspective Verification

Maolin Liu, Fanyu Xu, Ruoqing Xu, JiaHang Zhang, Hao Wang, Rui Wang

Navigating the deluge of heterogeneous medical data, from academic literature (PubMed) to clinical guidelines (Web) and private knowledge bases, remains a critical bottleneck for evidence-based medicine. While commercial black-box tools lack transparency, standard open-source RAG implementations frequently suffer from "reasoning drift" when handling complex, long-tail queries. We present DeepMed Search, a fully open-source, agentic platform designed for transparent medical deep research. Built on a high-performance Next.js architecture, DeepMed Search features a source-adaptive router that autonomously dispatches sub-queries to PubMed, web search, or local graph-based knowledge bases based on information density. Crucially, the platform integrates an introspective verification module, powered by a causal-consistent multi-agent debate framework, to validate retrieved evidence against diagnostic logic before synthesis. To demonstrate its robustness, we showcase DeepMed Search's ability to autonomously decompose high-difficulty rare disease queries, filter out confounding noise, and generate structured, citation-backed research reports in minutes. By open-sourcing this software, we provide the community with a robust infrastructure to democratize access to trustworthy, glass-box medical reasoning at a commercial-grade performance level. The platform is publicly available at https://www.deepmedsearch.cloud, and a demonstration video is available at https://youtu.be/4U4aok8yLpk.

Agent-based and Multi-agent Systems · AI Ethics, Trust, Fairness

#DM113

Intent Hub: A Self-Healing Semantic Agent Routing System for Resolving Overlap in Agentic Systems

Chenrui Liang, Peng Xu, Xinyuan Liu

Semantic overlap challenges accurate agent routing in large-scale agentic systems. We present Intent Hub, a self-healing semantic agent routing system that combines offline asynchronous human-in-the-loop (HITL) repair with online Dual Filtering. LLM-generated positive and adversarial negative utterances help construct explicit decision boundaries, enabling interpretable millisecond-level routing. Intent Hub further supports interactive semantic debugging, allowing developers to diagnose conflicts, repair rules, and observe online routing changes.

Agent-based and Multi-agent Systems · Humans and AI · Natural Language Processing

#DM114

A Scalable Cross-Domain Event Extraction System via a Unified Generative Training Framework

Siting Liang, Omar Adjali, Bhatti Omair, Daniel Sonntag

Event extraction is fundamental to information extraction. Prior approaches often separate event detection and argument extraction or depend on dataset-specific designs, limiting scalability and cross-domain generalization. We propose a unified generative, sequence-to-sequence framework that performs all event extraction subtasks jointly and supports both end-to-end and pipeline configurations. We fine-tune pre-trained language models on multiple event datasets across diverse domains, enabling a single model to retain domain-specific semantics while generalizing over large, evolving label spaces. Cross-domain experiments show strong, robust performance across datasets, demonstrating a scalable solution for real-world event extraction. We demonstrate these capabilities through a web-based application tailored for researchers and practitioners. The platform supports inspection of different configurations and facilitates cross-domain comparisons.
#DM118

RoboVineSim: A Simulation Tool for Human-Robot Collaboration in Vineyard Harvesting

Dimitrios Troullinos, Maria Nuria Conejero, Filippo Bistaffa, Jose M. Bengochea-Guevara, Ángela Ribeiro, Juan A. Rodriguez-Aguilar

In agricultural tasks, manual grape harvesting remains a labor-intensive activity facing challenges of efficiency, labor shortages, and sustainability. To this end, tailor-made robotic systems have been designed with the capabilities to transfer heavy boxes, navigate vineyard terrains, communicate, accurately locate, and safely interact with humans. The introduction of collaborative robotic fleets alongside human workers in large-scale vineyard harvesting effectively presents a Multi-Robot Task Allocation (MRTA) problem, where the real-world domain possesses characteristics that, when combined, pose a challenging research endeavor and align with open issues in MRTA research. Here, we present RoboVineSim, a simulation tool that can capture any vineyard area using geographical data and model the behavior of humans and robots in the environment. In addition, we have established the necessary mechanisms to facilitate the development of novel MRTA methods in this domain.

Agent-based and Multi-agent Systems · Constraint Satisfaction and Optimization · Multidisciplinary Topics and Applications

#DM121

ADP-MA: An Interactive System for Autonomous Data Processing using Meta-Agents

Udayan Khurana

We demonstrate ADP-MA (Autonomous Data Processing using Meta-Agents), a system that autonomously solves a complex and diverse set of data processing tasks. Three domain-agnostic meta-agents coordinate task-specific ground agents through a multi-stage pipeline: data understanding, planning, critique, expansion, execution, and finalization. Errors are caught early via progressive sampling on small data subsets before running on full data. The system supports three execution strategies, twelve domain knowledge packs, and confidence-based early stopping. An interactive web interface lets users watch pipelines being built in real time, replay completed runs at any stage, and compare results across cases. On four benchmarks, ADP-MA reaches 90.6% on DSEval, 44.8% on KramaBench, 50.0% on DA-Code, and 70.0% on AgentBench, outperforming published single-agent baselines.
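
The progressive-sampling idea mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not ADP-MA's implementation: the function `run_progressively`, the sample fractions, and the toy pipeline `sum_as_ints` are all hypothetical names chosen for this example.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_progressively(
    pipeline: Callable[[Sequence[T]], R],
    data: Sequence[T],
    sample_fractions: Sequence[float] = (0.01, 0.1),  # hypothetical schedule
) -> R:
    """Try `pipeline` on small prefixes of `data` before the full run.

    An exception on a small sample surfaces before any time is spent
    on the full dataset.
    """
    if not data:
        raise ValueError("data must not be empty")
    for fraction in sample_fractions:
        size = max(1, int(len(data) * fraction))
        try:
            pipeline(data[:size])
        except Exception as exc:
            raise RuntimeError(
                f"pipeline failed on a sample of {size} rows"
            ) from exc
    # All samples passed, so run on the full data.
    return pipeline(data)


# Toy pipeline (hypothetical): parse every row as an integer and sum them.
def sum_as_ints(rows: Sequence[str]) -> int:
    return sum(int(row) for row in rows)
```

A malformed row near the start of the data is reported from the smallest sample, so the full run is never attempted.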

Agent-based and Multi-agent Systems · Humans and AI

#DM122

Making Weak Supervision Interactive: Exploring Transfer from Sound Libraries to Passive Acoustic Monitoring Data

Novruz Mammadli, Rida Saghir, Kanwar Ammar Ali, Prathmesh Doddanawar, Thiago S. Gouvêa, Daniel Sonntag

Passive Acoustic Monitoring (PAM), an increasingly popular method for wildlife monitoring, generates large volumes of data whose analysis depends on instance-level annotations that are costly to obtain. Archival sound collections provide weak labels that lack temporal localisation. In prior work, we demonstrated that Multiple Instance Learning (MIL) can extract approximate event locations from weakly labelled PAM data, suggesting it may be applied to sound collection data.
This demo operationalizes that approach within an interactive workflow that connects weakly annotated sound collections to downstream PAM deployment. The system supports configurable MIL-based localisation, lightweight interactive refinement, and transfer to an independent PAM dataset.
We carried out a preliminary evaluation with an actual sound library from a museum collection and a benchmark PAM dataset. Results confirm that weakly annotated sound collections can serve as a viable training signal for downstream PAM detection and illustrate differences between alternative MIL instantiations under real transfer conditions. (Video available at https://cst.dfki.de/projects-weak-supervision-demo)

Machine Learning · Multidisciplinary Topics and Applications · Humans and AI

#DM124

AwakeForest: An Interactive Geospatial Platform for Large-Scale Forest Imagery

Suraj Prasai, Kangning Cui, Rongkun Zhu, Sarra Alqahtani, Ying Zhang, Victor Paúl Pauca, Miles R. Silman, Fan Yang

Forest imagery analysis often involves multiple tightly coupled vision tasks, which must be performed under substantial variation in geographic regions, sensors, and acquisition conditions. However, practitioners often lack a unified tool that is geospatial-native, cloud-optimized, and ML-integrated for end-to-end workflows spanning annotation, prediction, visualization, and downstream analysis at scale. We present AwakeForest, an interactive end-to-end platform designed for large-scale forest imagery that integrates model-assisted inference, automatic annotation, and human-in-the-loop refinement within a single workflow. Our platform supports plug-and-play integration of pretrained models and enables scalable interaction with forest imagery ranging from standard aerial scenes to large orthomosaics that can span several gigabytes to hundreds of gigabytes. AwakeForest produces analysis-ready outputs that can be directly used for downstream analysis and to support iterative model and annotation updates on new scenes. We demonstrate the system on the PALMS dataset and illustrate how AwakeForest supports an end-to-end workflow for practical forest management and analysis.

Computer Vision · Multidisciplinary Topics and Applications · Natural Language Processing · Knowledge Representation and Reasoning

#DM125

Low-Latency Real-Time Audio Game Commentary System via LLM-Based Parallel Text Generation

Ryota Kawamatsu, Anum Afzal, Yuki Saito, Shinnosuke Takamichi, Graham Neubig, Katsuhito Sudoh, Hiroya Takamura, Tatsuya Ishigaki

We present a low-latency real-time audio game commentary system that generates spoken commentary directly from live gameplay video. In this end-to-end setting, a key bottleneck is accumulated waiting time; conventional pipelines capture frames, generate text, and synthesize speech sequentially for each utterance, and do not request the next generation until speech playback has completed. This strict sequentiality causes long and unnatural silence between utterances. To address this latency bottleneck, our system runs text generation in parallel with speech playback and buffers multiple candidate utterances ahead of time, enabling immediate synthesis at playback boundaries. Experiments on fast-paced game videos show that our parallel design reduces the mean inter-utterance silence from 9.6 seconds to 0.3 seconds compared to sequential baselines. It also improves similarity to professional speaking–silence timing patterns by over 40%, and a user study with 120 experienced game players confirms significantly improved perceived speaking rhythm. Our demo video is available at: https://youtu.be/pmrRUlvav8M.
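
The effect of overlapping generation with playback can be illustrated with a small timing model. This is a simplified sketch, not the authors' system: the function `inter_utterance_silences` is hypothetical, it models only one utterance generated ahead (rather than a buffer of several candidates), and it folds capture, generation, and synthesis into a single per-utterance preparation time.

```python
from typing import List, Sequence


def inter_utterance_silences(
    prep_times: Sequence[int],   # time to prepare each utterance (assumed known)
    play_times: Sequence[int],   # playback duration of each utterance
    parallel: bool,
) -> List[int]:
    """Return the silence gap before each utterance after the first."""
    silences = []
    play_start = prep_times[0]  # the first utterance plays once it is ready
    for i in range(1, len(prep_times)):
        play_end = play_start + play_times[i - 1]
        if parallel:
            # Preparation of utterance i begins when utterance i-1 starts playing.
            ready_at = play_start + prep_times[i]
        else:
            # Preparation of utterance i begins only after utterance i-1 finishes.
            ready_at = play_end + prep_times[i]
        next_start = max(play_end, ready_at)
        silences.append(next_start - play_end)
        play_start = next_start
    return silences
```

In the sequential case every gap equals the full preparation time, whereas in the parallel case a gap appears only when preparation takes longer than the preceding playback.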

Multidisciplinary Topics and Applications · Natural Language Processing

#DM130

DataCube: A Video Retrieval Platform via Natural Language Semantic Profiling

Yiming Ju, Hanyu Zhao, Quanyue Ma, Donglin Hao, Chenwei Wu, Ming Li, Songjing Wang, Tengfei Pan

Large-scale video repositories are increasingly available for modern video understanding and generation tasks. However, transforming raw videos into high-quality, task-specific datasets remains costly and inefficient. We present DataCube, an intelligent platform for automatic video processing, multi-dimensional profiling, and query-driven retrieval. DataCube constructs structured semantic representations of video clips and supports hybrid retrieval with neural re-ranking and deep semantic matching. Through an interactive web interface, users can efficiently construct customized video subsets from massive repositories for training, analysis, and evaluation, and build searchable systems over their own private video collections. The system is publicly accessible at https://datacube.baai.ac.cn/. Demo Video: https://youtu.be/L7bKfPBm2tU

Computer Vision · Search · Data Mining · Natural Language Processing

#DM134

Visualizing and Interacting with Model Representation Space for Human-Centric Active Learning

Rida Saghir, Thiago S. Gouvêa, Daniel Sonntag

Active learning reduces annotation effort by selecting informative samples, yet most approaches remain model-driven, offering users little control over training or support for understanding model behaviour. Human-centric active learning brings users further into the loop by introducing additional points of interaction, particularly in the sample selection process. However, such systems are typically demonstrated using fixed feature projections or visualizations of shallow classifier outputs. We present a representation-centric active learning tool in which interaction takes place directly within the model’s representation space. By operating in the same space the model uses for decision making, the interface supports the co-evolution of representations and user understanding. We additionally report initial qualitative (think-aloud) and quantitative findings from a pilot study, illustrating that such representation-centric frameworks can achieve comparable performance to standard baselines while fostering improved human–model collaboration. (Video and code available at https://cst.dfki.de/demo-interacting-model-space)

Humans and AI · Machine Learning

#DM137

PEACE 2.0: Grounded Explanations and Counter-Speech for Combating Hate Expressions

Greta Damo, Stéphane Petiot, Elena Cabrio, Serena Villata

The increasing volume of hate speech on online platforms poses significant societal challenges. While the Natural Language Processing community has developed effective methods to automatically detect the presence of hate speech, generating responses to it, called counter-speech, is still an open challenge. We present PEACE 2.0, a novel tool that, besides analysing and explaining why a message is considered hateful or not, also generates a response to it. More specifically, PEACE 2.0 has three main new functionalities, which leverage a Retrieval-Augmented Generation (RAG) pipeline: i) grounding hate speech (HS) explanations in evidence and facts, ii) automatically generating evidence-grounded counter-speech, and iii) exploring the characteristics of counter-speech replies.
By integrating these capabilities, PEACE 2.0 enables in-depth analysis and response generation for both explicit and implicit hateful messages.

Natural Language Processing · Multidisciplinary Topics and Applications · AI Ethics, Trust, Fairness

#DM138

DeepLog: A Software Framework for Modular Neurosymbolic AI

Robin Manhaeve, Stefano Colamonaco, Vincent Derkinderen, Rik Adriaensen, Lucas Van Praet, Luc De Raedt, Giuseppe Marra

DeepLog is an operational neurosymbolic framework that unifies logic and deep learning within standard PyTorch workflows. While existing neurosymbolic systems focus on a particular paradigm and semantics, DeepLog serves as a universal backend that can emulate many systems in the neurosymbolic alphabet soup. By treating diverse neurosymbolic languages as high-level specifications, the DeepLog software automatically compiles them into optimized arithmetic circuits. This design lowers the barrier for machine learning practitioners by treating logic as composable modules, while providing neurosymbolic developers with a shared, high-performance basis for prototyping new integration strategies. The code is available here: https://github.com/ML-KULeuven/deeplog

Machine Learning · Knowledge Representation and Reasoning · Uncertainty in AI

#DM143

A Privacy-Preserving Intelligent Assistant for Clinical Psychology Practice

Aaron Pico, Joaquin Taverner, Emilio Vivancos, Ana Garcia-Fornes, Vicent Botti

This paper describes a fully local, privacy-preserving intelligent system designed to assist in clinical psychology practice. The system automatically transcribes therapy sessions and performs speaker attribution. Beyond transcription, the tool enhances clinical reasoning by detecting cognitive distortions and emotional patterns using specialized deep learning classifiers and Large Language Models (LLMs). By guiding an LLM locally through a multi-step analysis process, the assistant synthesizes the enriched data and generates a series of analysis reports and clinical documentation of the session. As a result, the assistant reduces the administrative burden on professionals while preserving privacy with an edge computing approach in which the data never leaves the therapist's device. Finally, the assistant uses human-in-the-loop validation so that the professional always remains in control, ensuring clinical accuracy and trust.

Humans and AI · Natural Language Processing · Multidisciplinary Topics and Applications · AI Ethics, Trust, Fairness

#DM149

Looking at Your Photo, What Comes to Mind? Personalized Memory Internalization for Dementia Reminiscence

Shunjie Wen, Kyung-Hwan Lee, Gimoon Lee, Seongsoo Heo, Jaeyeon Lee, Sangyeob Shin, Gyuwon Moon, Jiwoong Kim, Dong-Wan Choi

We present PhoMi, an interactive recall assistant that supports dementia reminiscence by engaging users with personal photographs captured via a live camera interface. Given a photograph, PhoMi delivers spoken questions and receives spoken responses, creating an accessible reminiscence setting with reduced reliance on human therapists. Over repeated sessions, user responses are incorporated into lightweight adapters of a vision–language model, enabling progressively personalized question generation without reprocessing prior interaction logs. PhoMi serves as a prototype toward scalable, lifelong AI companions for dementia reminiscence.

Humans and AI · Natural Language Processing

#DM159

TraceBrain: An Open-Source Framework for Agentic Trace Management

Quy Minh Le, Oscar Cao, Hoang Quoc Viet Pham, Hoang Thanh Lam, Hoang D. Nguyen

As Large Language Model (LLM) agents scale toward real-world deployment, they generate large volumes of fragmented, non-standardized execution traces. Many existing observability platforms treat these traces primarily as passive logging artifacts, lacking the unified infrastructure to operationalize them for active governance and agent adaptation across heterogeneous single-agent and multi-agent workflows. To address this gap, we introduce TraceBrain, an open-source infrastructure for autonomous agent trace management. TraceBrain adopts a framework-agnostic architecture built on a delta-based OpenTelemetry (OTLP) schema, which mitigates context explosion and supports on-demand reconstruction of long-horizon execution trajectories. For runtime governance, TraceBrain implements uncertainty-driven supervision, where an internal Trace Evaluator prioritizes ambiguous trajectories for human review, thereby reducing manual annotation workload. Moving beyond passive observation, the platform incorporates a hybrid semantic-lexical retrieval engine that combines dense vector similarity and exact keyword matching for operational memory retrieval. Furthermore, an automated curriculum mechanism continuously synthesizes failure patterns into structured training artifacts. Empirical evaluations demonstrate a ~100x reduction in storage overhead together with high precision in uncertainty-guided trace supervision. Ultimately, TraceBrain transforms the execution history into a reusable operational memory substrate, bridging runtime observability with retrieval-driven agent adaptation. The system is publicly available at https://github.com/ToolBrain/TraceBrain.
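
A hybrid semantic-lexical score of the kind described above can be sketched as a weighted blend of dense similarity and exact keyword overlap. This is a generic illustration, not TraceBrain's retrieval engine: the function names, the whitespace tokenization, and the weight `alpha` are assumptions made for this example, and the embedding vectors are taken as given.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear verbatim (case-insensitively) in the text."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(term in words for term in terms) / len(terms) if terms else 0.0


def hybrid_score(
    query: str,
    query_vec: Sequence[float],
    text: str,
    text_vec: Sequence[float],
    alpha: float = 0.5,  # hypothetical weight on the dense component
) -> float:
    """Blend dense vector similarity with exact keyword matching."""
    dense = cosine_similarity(query_vec, text_vec)
    lexical = keyword_score(query, text)
    return alpha * dense + (1.0 - alpha) * lexical
```

Candidates would then be ranked by `hybrid_score`; a larger `alpha` favours semantic similarity, while a smaller one favours exact term matches such as tool names or error codes.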

AIAgent-based and Multi-agent SystemsAIKnowledge Representation and ReasoningAISearchAIUncertainty in AI
#DM165

ORBIT: Optimal Recommendation Framework for Boarding with Interpretable Timelines

Joonseong Kang, Seung Ha Hwang, Jaehun Bang, Jiyoung Ko, Subeen Park, Jeffrey Gennari

Determining when to leave for the airport is a complex problem shaped by flight delay risk, traffic, weather, airport congestion, and passenger-specific constraints. Existing services rely on isolated delay estimates or simple travel time calculations, failing to capture real-time context. We propose ORBIT, a decision-support system that generates personalized leave-by recommendations by integrating predictive models with real-time operational signals. The system combines user input normalization, real-time data acquisition, Transformer-based delay prediction, and LLM-based reasoning to jointly account for statistical forecasts and dynamic factors like congestion, previous-leg propagation, weather, traffic, and airport processing times. Instead of a standalone delay estimate, ORBIT produces an actionable and interpretable departure plan. We implement ORBIT as an interactive system and demonstrate its applicability. A video demo is available at https://youtu.be/fZX42jceIM4.

AIMultidisciplinary Topics and ApplicationsAIPlanning and Scheduling
#DM170

Interactive Open-Set Semantic Mapping with a 3D Scene Graph Backend

Felix Igelbrink, Lennart Niecksch, Martin Günther, Marian Renz, Oscar Lima, Martin Atzmueller

While Open-Set Semantic Mapping and 3D Semantic Scene Graphs (3DSSGs) have become established paradigms in robotic perception in recent years, most existing works are limited to small environments or sacrifice geometric detail and instance granularity for scalability. Deploying these systems at scale for large multi-room environments remains a major challenge due to the computational overhead of high-dimensional feature integration and the maintenance of the 3DSSG structure. In this paper, we demonstrate a modular mapping architecture that establishes 3DSSGs as its foundational backend. Unlike approaches that generate scene graphs as a post-processing step, our system maintains the graph as the primary, incrementally updated knowledge representation. Our architecture is optimized for GPU-accelerated operations, enabling the dense representation of extensive environments containing thousands of unique object instances and supporting open-vocabulary queries via CLIP features without requiring any additional post-processing steps. In this live demonstration, we showcase our pipeline processing large-scale data from the Habitat Matterport 3D (HM3D) dataset as well as live data collected from a handheld device. Attendees will interact with the generated maps by performing real-time, open-set queries (e.g., “find the vintage wooden chair”) across complex, multi-story environments, highlighting the system's capability to represent dynamic, human-aligned environmental understanding suitable for downstream robotic tasks.

AIRoboticsAIComputer Vision
#DM171

SparseDR: Differentiable Rendering of Sparse Signed Distance Fields

Alexey Budak, Albert Garifullin, Vladimir Frolov

We present SparseDR, a novel differentiable rendering algorithm designed for sparse representations based on Signed Distance Fields (SDF). We leverage the Sparse Brick Set representation and propose an adaptation of redistancing for sparse SDFs. This enables SparseDR to surpass existing works in accuracy of surface reconstruction by increasing the effective resolution of the SDF representation. SparseDR is efficiently implemented in C++ and Vulkan, achieving several times shorter reconstruction time than other methods.

AIComputer VisionAIMachine Learning
#DM173

GRAIL: An Agentic AI Architecture for Interactive Grant Proposal Writing

Zhisheng Tang, Mayank Kejriwal

Securing research funding remains fragmented and time-consuming: researchers must navigate separate databases across dozens of agencies while simultaneously drafting competitive proposals. We present GRAIL, a web-based platform that unifies grant discovery and proposal writing through conversational AI. Users describe their research interests in natural language to explore opportunities from a unified index of 11.8K U.S. federal and nonprofit grant opportunities; within the document editor, integrated AI assistance supports real-time proposal revision and refinement. The system runs in any modern browser without installation. Conference attendees are invited to interact with the live system at the demo booth, explore the grant discovery and writing assistance workflows, and provide feedback on the user experience.

AIAgent-based and Multi-agent SystemsAIData MiningAINatural Language ProcessingAISearch
#DM183

Optimizing Spectrogram Resolution and Training Strategies for Real-Time Killer Whale Call Type Classification

Vladislav Naumov, Iaroslav Sheipak, Yuriy Ivanov, Ilya Makarov

Automated identification of killer whale call types from continuous acoustic recordings is essential for scalable population monitoring, yet existing general-purpose frameworks such as ANIMAL-SPOT suffer from suboptimal spectrogram resolution and lack strategies tailored to the spectral-temporal characteristics of killer whale vocalizations.
We identify two key limitations of the ANIMAL-SPOT framework: (1) it offers no option to choose a different CNN backbone architecture, and (2) the absence of regularization and class-balancing techniques limits generalization across call types with varying abundance.
To address these issues, we propose a framework that combines optimized STFT parameters (FFT size 1024, hop length 172) with label smoothing and targeted oversampling, evaluated across three CNN backbones and five segment lengths.
On a dataset of 12 killer whale vocalization classes from Avacha Gulf, Russia, our best configuration (ResNet-18 with 1200 ms segments) achieves 97.1% segment-level accuracy, compared to 96.2% for the ANIMAL-SPOT baseline, a relative error reduction of 23.7%.
We further present an interactive demonstration system for uploading recordings and obtaining time-resolved call-type predictions, enabling rapid analysis of passive acoustic monitoring data.
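The relative error reduction quoted in this abstract follows directly from the two reported accuracies. A quick check, where the helper name `relative_error_reduction` is ours and not part of the authors' code:

```python
def relative_error_reduction(baseline_acc, new_acc):
    """Relative reduction in error rate (percent), given accuracies in percent."""
    baseline_err = 100.0 - baseline_acc  # 3.8 for the ANIMAL-SPOT baseline
    new_err = 100.0 - new_acc            # 2.9 for the best configuration
    return 100.0 * (baseline_err - new_err) / baseline_err

reduction = relative_error_reduction(96.2, 97.1)
print(round(reduction, 1))  # prints 23.7
```

In other words, the error rate falls from 3.8% to 2.9%, and 0.9 / 3.8 is roughly 23.7%.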

Demo video link: https://drive.google.com/file/d/1ITA_52WdnAcyg7zRjIIshutjl6Lp_um1/view?usp=drive_link
Presentation slides: https://drive.google.com/file/d/1sd-JH1R5YPa9iRtsbRO-OZxY8-fyglsO/view?usp=drive_link

AIHumans and AIAIMachine LearningAIMultidisciplinary Topics and Applications
#DM191

RareDASH: A Dynamic Multi-Agent System for Holistic Rare Disease Care

Jialun Zhong, Jiayang Yu, Yanzeng Li, Meng Qin, Lei Zou, Yuqian Wang, Ying Zhang, Hanna Li, Liying Yan, Jie Qiao

Rare diseases are characterized by low prevalence and intricate pathogenesis, leading to highly heterogeneous clinical trajectories. Rare disease care presents formidable challenges because it requires highly specialized expertise and experience. Existing methods are typically tailored to isolated rare disease scenarios (e.g., diagnostic tasks, medication recommendations) and lack a comprehensive perspective on the entire care process. Inspired by recent studies of agent skills, we propose RareDASH, a multi-agent system (MAS) featuring dynamic workflow orchestration designed to provide a comprehensive solution for the full life-cycle of rare disease care. Our framework is inherently patient-centric, enhancing rare disease discovery capabilities through proactive inquiry and information elicitation directly from patients. Furthermore, we implement diverse agent memory to optimize both the accuracy and efficiency of the multi-agent collaboration. Finally, an online auditing module is integrated into the system to monitor and mitigate hallucinations, ensuring the reliability of clinical outputs. The work sheds light on the feasibility of leveraging MAS in holistic rare disease care.

AIAgent-based and Multi-agent SystemsAIPlanning and SchedulingAIHumans and AI
#DM194

LLM-to-Map: Transparent Conversational Tool Orchestration for Real-Time Multi-Domain Simulation

Marouane Benbrahim, Kavya Gautam, Zonghan Zhang, Zhiqian Chen

Coupled power-traffic simulations are valuable for studying EV charging stress and outage propagation, but existing tools typically require scripts and opaque configurations. We present LLM-to-Map, a demo system that exposes a real-time multi-domain simulator through structured LLM tool orchestration. Natural-language requests are mapped to typed tool calls that coordinate SUMO traffic simulation, PyPSA power-flow analysis, V2G actions, and map operations. The agent operates over a fixed tool schema and emits auditable execution logs in the UI so users can inspect every action and parameter. Tool execution is deterministic: each request resolves to explicit API calls, and unsafe or destructive actions can require confirmation. The tool layer provides JSON schemas and parameter validation, preventing arbitrary code execution and enabling reproducible runs. The agent receives live system state (loads, EV statistics, V2G status, time, temperature) to support context-aware multi-step commands; when an LLM is unavailable, a deterministic parser preserves the same tool interface. The backend synchronizes cross-domain events (EV charging demand and substation failures) and streams state updates to a 3D map interface. We describe the architecture, coupling and synchronization, and a demo workflow that showcases multi-step scenario control and reporting for non-experts without writing scripts. Demo video: https://youtu.be/CpEgCZPl_2g; code: https://github.com/MarouaneBenbrahim/Map-LLM.
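The pattern of mapping requests to typed tool calls over a fixed schema, with validation and confirmation for destructive actions, can be sketched as follows. This is an illustrative sketch only: the tool names, parameter names, and the `validate_call` helper are hypothetical and are not taken from the LLM-to-Map code base.

```python
# Hypothetical fixed tool schema; tool and parameter names are illustrative only.
TOOL_SCHEMAS = {
    "fail_substation": {"params": {"name": str}, "destructive": True},
    "set_ev_share": {"params": {"percent": float}, "destructive": False},
}

def validate_call(call, confirmed=False):
    """Check a tool call against the fixed schema before it is executed."""
    schema = TOOL_SCHEMAS.get(call.get("tool"))
    if schema is None:
        raise ValueError(f"unknown tool: {call.get('tool')!r}")
    args = call.get("args", {})
    if set(args) != set(schema["params"]):
        raise ValueError("arguments do not match the schema")
    for key, expected in schema["params"].items():
        if not isinstance(args[key], expected):
            raise TypeError(f"{key} must be {expected.__name__}")
    if schema["destructive"] and not confirmed:
        raise PermissionError("destructive action requires confirmation")
    # The returned record doubles as an auditable log entry.
    return {"tool": call["tool"], "args": dict(args)}

log_entry = validate_call({"tool": "set_ev_share", "args": {"percent": 30.0}})
```

Because only calls that pass this check would ever reach the simulator, free-form model output cannot trigger arbitrary code, and the returned records can be shown to the user as an execution log.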

AINatural Language ProcessingAIAgent-based and Multi-agent SystemsAIHumans and AIAIMultidisciplinary Topics and Applications
#DM195

UrbanMix: LLM-Guided Simulation of Mixed Autonomy Traffic with Heterogeneous Behavioral Profiles

Roman Sultimov, Daniil Efimov, Ivan Novikov, Aleksandr Volkov, Yury Maximov

Cities deploying autonomous vehicles (AVs) face urgent policy questions: would AV adoption reduce congestion or worsen it? What adoption rate would minimize congestion? How would cautious AVs (Waymo-style) and aggressive AVs (Tesla "Mad Max"-style) interact with human drivers and delivery robots on shared roads? We present UrbanMix, an interactive simulation platform that embeds cognitively diverse agents (human drivers, cautious AVs, aggressive AVs, and delivery robots) with distinct behavioral profiles inside a Simulation of Urban MObility (SUMO) framework of real urban road networks. Our LLM planner operates as an urban policy coordinator, setting traffic rules through a bounded action interface, while a regulation shield enforces infrastructure constraints.

Our experiments reveal three key phenomena: (i) road throughput may drop substantially as AV adoption increases; (ii) an aggressive cascade, where runtime behavior switching that models Tesla's user-selectable "Mad Max" mode triggers up to a 60-fold increase in emergency braking events on a real Austin Downtown network; and (iii) a delivery bottleneck, showing a 24-36% throughput reduction from slow robots. Results validated on synthetic and real data from Austin, TX, demonstrate that real network topology amplifies cascade effects by more than four times.

AIAgent-based and Multi-agent SystemsAIHumans and AIAIMultidisciplinary Topics and ApplicationsAIData Mining
#DM196

ElderMTL: Multi-Task Affect Monitoring for Elderly Care

Maria Razzhivina, Shahane Tigranyan, Aram Avetisyan, Ilya Makarov, Andrey Savchenko

We present ElderMTL, a multi-task affect monitoring system designed for elderly care settings. The system simultaneously estimates Facial Action Units (FAUs), Valence-Arousal (VA) signals, and categorical emotions (FER) from video, capturing multiple layers of affective information. To improve sensitivity to subtle affective cues common in older adults, our approach incorporates age-conditioned physiological modeling, including baseline muscle adjustments and a dynamic AU co-activation graph. This enables the system to adapt to age-related changes in facial expression patterns, providing more reliable and interpretable emotion assessments. In a live demonstration, we showcase ElderMTL processing video streams, visualizing AU activations, affective state predictions, and interpretable insights that highlight age-specific affective dynamics. This work demonstrates that physiologically grounded, multi-task affective monitoring can provide meaningful, real-world support for elderly care.

AIComputer VisionAIHumans and AIAIMachine Learning
#DM201

Secure Coding Unleashed: Boosting Productivity With On-Premise LLM-Powered IDE Plugins

Vasilii Krikunov, Nikolay Kotlyarov, Eugenii Nikolaev, Vasily Konovalov

The integration of Large Language Model (LLM)-based code assistants into IDEs has boosted developer productivity, but cloud‑based solutions pose severe privacy risks for enterprises handling proprietary code. Existing on‑premise alternatives lack semantic code retrieval, multi‑GPU optimization, and rigorous productivity validation. We present an enterprise‑oriented, customizable IDE plugin that uses internally hosted LLMs, combining client‑side tiered retrieval (Personal CodeRAG with SiamBERT/CodeSage) and vLLM‑optimized inference on dedicated hardware. No code leaves the organization. Our benchmark of four inference frameworks shows vLLM and LMDeploy achieve ≈20 ms time‑to‑first‑token on LLaMA‑3‑8B, suitable for real‑time interaction. In a three‑month study with 100 developers, the plugin reduced mean cycle time by 15.4%, with perceived productivity gains of 50-70% for tasks like code explanation and unit test generation. This work provides the first systematic evaluation of on‑premise LLM‑assisted coding that quantifies both objective and subjective productivity impacts.

AIAgent-based and Multi-agent SystemsAINatural Language Processing
#DM205

Interactive System for Reducing Error Propagation in Multi-Stage Ancient Egyptian Text Analysis

Maksim Golyadkin, Innokentiy Humonen, Ilya Makarov

We introduce a web tool that puts the full image-to-text pipeline for Ancient Egyptian hieroglyphs inside a single annotation workspace. Instead of only producing a final transcription, our system exposes editable intermediate results so users can validate and correct the pipeline step by step. User edits are stored as image-aligned annotations, which supports both text analysis and dataset creation. Quantitative results indicate improved efficiency and output quality relative to a manual baseline. A demonstration video is available at Google Drive.

AIComputer VisionAINatural Language ProcessingAIMultidisciplinary Topics and Applications
#DM209

Double Bounded Neural Ray Queries

Alexander Nikolaev, Nikolay Mozokhin, Roman Rodionov, Vladimir Frolov

We introduce a novel neural ray tracing method designed for compact scene representations and real-time rendering. To compress the scene with minimal fidelity losses, we address the issue of limiting the search space for intersections. For each scene we first construct two lightweight proxy shells that tightly bound the original surface from inside and outside. While executing ray queries, we intersect the rays with the shells and extract the regions that potentially contain the intersection with the original surface. Extracted regions are passed to the small neural network to retrieve the exact intersection location. We implement our method as part of a GPU-accelerated hybrid path tracing pipeline. We demonstrate it running real-time rendering on a variety of scenes, achieving up to 300x memory reduction and surpassing existing compressed ray tracing techniques in memory-quality trade-off.
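The bounding idea can be illustrated with a deliberately simplified sketch in which both proxy shells are origin-centered spheres; the actual method uses lightweight proxy shells fitted to each scene and a neural network to locate the exact hit, neither of which is modeled here. The function names are ours, the ray origin is assumed to lie outside the outer shell, and only the first surface hit is considered.

```python
import math

def ray_sphere(origin, direction, radius):
    """Return (t_near, t_far) for a ray against an origin-centered sphere, or None."""
    # direction is assumed to be unit length
    b = sum(o * d for o, d in zip(origin, direction))
    c = sum(o * o for o in origin) - radius * radius
    disc = b * b - c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    return (-b - root, -b + root)

def candidate_interval(origin, direction, r_inner, r_outer):
    """Parameter range along the ray that can contain the first surface hit."""
    outer = ray_sphere(origin, direction, r_outer)
    if outer is None:
        return None  # ray misses the outer shell, so it misses the surface
    inner = ray_sphere(origin, direction, r_inner)
    # If the ray reaches the inner shell, the surface was crossed before that point;
    # otherwise the hit, if any, lies between entering and leaving the outer shell.
    t_end = inner[0] if inner is not None else outer[1]
    return (outer[0], t_end)

interval = candidate_interval((0.0, 0.0, -5.0), (0.0, 0.0, 1.0), 1.0, 1.2)
```

For this ray the search range shrinks to roughly t in [3.8, 4.0], so a downstream intersection routine only needs to examine that short segment rather than the whole ray.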

AIMachine Learning
#DM210

CrossRefine: A Microservice for Cross-Domain Spatial Super-Resolution

Daniil Sukhorukov, Andrei Zakharov, Ilya Makarov

High-resolution spatial fields are critical for local decision-making, yet many operational and scientific workflows produce coarse outputs due to computational limits. We present CrossRefine, a deployable microservice for cross-domain spatial super-resolution that enhances multi-channel spatial tiles without modifying upstream models. It is built around a unified, topography-conditioned adversarial UNet trained across geographically diverse regions to ensure robustness to heterogeneous terrains and domain shifts. Unlike region-specific enhancement models, the system generalizes across domains within a single architecture, balancing numerical fidelity and structural realism through a hybrid regression–adversarial objective. The service provides REST API endpoints for batch and streaming inference, supports mixed-precision, and offers per-tile diagnostics and confidence maps to promote safe deployment.
In our demo, we show interactive refinement of coarse spatial inputs, side-by-side comparison with interpolation and non-adversarial baselines, and real-time profiling of latency and throughput on commodity hardware. CrossRefine illustrates how spatial super-resolution can be delivered as a practical AI microservice, enabling scalable refinement of existing computational workflows without requiring higher-resolution upstream simulations.

Demonstration video: https://shorturl.at/lz2un

AIComputer VisionAIMultidisciplinary Topics and Applications
#DM211

LELA: An End-to-end LLM-based Entity Linking Framework with Zero-shot Domain Adaptation

Samy Haffoudhi, Nikola Dobričić, Fabian Suchanek, Nils Holzenberger

Entity linking is a key component of many downstream NLP systems, yet existing approaches are often tied to specific target knowledge bases and domains, limiting their real-world applicability. In this paper, we extend LELA, a modular and domain-agnostic LLM-based entity disambiguation method, into a practical Python library that integrates zero-shot Named Entity Recognition (NER), thereby providing a complete end-to-end pipeline for entity linking in real-world usage. We provide experimental results validating LELA's performance and robustness across diverse entity linking settings. In our demo, users can try the system on their own input texts. All code is publicly available at https://github.com/dig-team/LELA, and a video is at https://www.youtube.com/watch?v=WdupiRjLbR4.

AINatural Language Processing
#DM216

Sparse ProtoPatient: Interactive Multi-Prototype Explanations for Clinical Diagnosis Prediction

Conor Fallon, Bogdan Kostić, Betty van Aken, Jens-Michalis Papaioannou, Alexei Figueroa, Keno Bressem, Alexander Löser

We present the Sparse ProtoPatient Demo, a publicly available interactive system for interpretable ICD-10 diagnosis prediction from clinical admission notes.
The system is designed for clinicians in training, researchers, and educators exploring prototype-based diagnostic reasoning.

The demo links predictions to learned prototypical patient representations and token-level evidence, allowing users to input custom text or select preset cases, inspect predicted ICD-10 codes, visualize label-wise saliency, retrieve supporting prototype notes, and compare alternative prototype cohorts.
The demo provides a reproducible platform for interactive inspection of prototype-based clinical reasoning, enabling complementary opinion exploration, model auditing, and teaching of interpretable diagnosis prediction.
The deployed model is trained on the publicly released CodiEsp corpus (1000 clinical notes, 955 ICD-10 labels) using a sparse multi-prototype architecture with five prototypes per label.
We use the official machine-translated English CodiEsp-MT release to support English-language interaction.
The model achieves a macro-AUROC of 0.92 on a held-out test set and supports real-time interaction (300 ms per query).
The system is fully containerized for public research and educational use.

AINatural Language ProcessingAIMultidisciplinary Topics and ApplicationsAIAI Ethics, Trust, FairnessAIHumans and AI
#DM218

DeepL Voice: Real-Time Speech-to-Speech Translation

Johannes Ernesti, Peter Kaiser, Jonas Heinze, Elnaz Shafaei-Bajestan, Kristina Geißler, Weiyue Wang, Johannes Beck, Sascha Brinker, Thorben Finke

DeepL Voice is a real-time speech-to-speech translation system for global business communication, following a pragmatic incremental approach: developing a production-grade cascaded speech-to-speech-translation (S2ST) system, while exploring end-to-end solutions in parallel.
The production system (launched November 2024) achieves competitive transcription quality through proprietary real-time ASR models and eliminates translation "flickering" via stable text streaming while maintaining low latency.
Supporting 18 input languages and 30+ target languages, it offers DeepL Voice for Meetings (Microsoft Teams/Zoom integration), DeepL Voice for Conversations (mobile apps), as well as the DeepL API for Voice.
Key features include customizable formality and glossary support for business-appropriate communication, with voice cloning TTS under development.

AIHumans and AIAINatural Language Processing
#DM220

SteelAgent: An LLM-Orchestrated System for Physics-Informed Steel Property Prediction and Generalization Auditing

Aleksandr Volkov, Roman Sultimov, Mikhail Kuzin, Yury Maximov

Machine learning models for steel property prediction routinely report strong metrics (R² > 0.85), yet these results rely on random splits that place similar grades in both the training and test sets. We present SteelAgent, an interactive system that exposes a critical generalization gap: the same models drop from R² > 0.85 to R² = 0.11 on unseen steel families, a more than sevenfold decline. Similarly, conformal prediction coverage degrades from 91% to 38% under distribution shift induced by holding out substantial data sources. SteelAgent combines physics-informed features grounded in classical metallurgy, interpretable models with conformal uncertainty quantification, and an LLM orchestrator that coordinates six domain-specific tools. The system supports property prediction with specification compliance checking, competitive steel comparison, and cost-aware inverse alloy design over 3,741 heat treatment records spanning 1,234 grades. All predictions are traceable through explicit tool calls, ensuring that all physical quantities are computed, not generated. We have made the code and data freely accessible to the community.
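The difference between a random split and a split that holds out whole steel families can be sketched in a few lines. The records, field names, and the `family_holdout_split` helper below are hypothetical and serve only to show the evaluation protocol, not the SteelAgent data or code.

```python
def family_holdout_split(records, held_out_families):
    """Split so that no held-out family appears in the training set."""
    train = [r for r in records if r["family"] not in held_out_families]
    test = [r for r in records if r["family"] in held_out_families]
    return train, test

# Toy records; the field names and values are hypothetical.
records = [
    {"grade": "A1", "family": "carbon"},
    {"grade": "A2", "family": "carbon"},
    {"grade": "B1", "family": "stainless"},
    {"grade": "C1", "family": "tool"},
]
train, test = family_holdout_split(records, {"stainless"})
train_families = {r["family"] for r in train}
test_families = {r["family"] for r in test}
```

With a random split, grades A1 and A2 could land on opposite sides and make the test set look easier than it is; holding out a family guarantees that the test grades are unseen in the sense the abstract describes.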

AIAgent-based and Multi-agent SystemsAIKnowledge Representation and ReasoningAINatural Language ProcessingAIMultidisciplinary Topics and Applications
#DM223

Lucyde: A Demonstrator for Explainable Artificial Intelligence and Interactive Machine Learning

Eda Ismail-Tsaous, Ute Schmid

As AI‑based decision‑support systems become increasingly widespread, methods aimed at improving the performance of human-AI teams are gaining attention. In recent years, explainable artificial intelligence (XAI) has received growing interest, as it provides methods to make the behavior of machine learning models more transparent and can help to identify errors and flaws, which is particularly important in safety-critical domains such as medicine and law. However, explanations themselves can be misleading, inconsistent, or incorrect, which makes it essential to raise awareness of the possibilities and limitations of these methods.
We introduce Lucyde, a web‑based demonstrator designed to help users explore, compare, and better understand XAI methods across different datasets, models, and configurations. Lucyde provides a curated collection of explanation techniques, enables side‑by‑side comparison of methods, and offers easy‑to‑understand supplementary information for different user groups. It also illustrates interactive machine learning workflows by allowing users to correct model outputs or explanations. Lucyde thereby fosters informed and reflective engagement with AI systems and their explanations.

AIAI Ethics, Trust, FairnessAIHumans and AIAIMachine Learning
#DM230

From Surface Learning to Deep Understanding: A Grounded AI Tutoring System for Moodle

Anna Ostrowska, Michał Kukla, Gabriela Majstrak, Jan Opala, Sebastian Pergała, Jan Skwarek, Anna Wróblewska

This demo paper describes the development of the AI Teaching & Learning Assistant, a modular Moodle plugin that leverages Retrieval-Augmented Generation (RAG) to deliver high-quality, highly selective answers strictly grounded in teachers’ materials. The system employs a dual-centric design, providing students with interactive, Socratic-based tutoring options and educators with a “human-in-the-loop” workspace for supervised content generation, e.g., quizzes, tasks, or short summaries. By grounding Large Language Model (LLM) responses in teacher-provided materials, the assistant addresses the risks of misinformation while encouraging deep conceptual mastery. Evaluation via the LLM-as-a-Judge framework and a preliminary user study confirms its effectiveness, achieving faithfulness scores up to 0.97 and a 4.00/5.00 recommendation rate.

Demo Video: https://tinyurl.com/4zz4bcjn

AIMultidisciplinary Topics and ApplicationsAIHumans and AIAINatural Language ProcessingAISearch
#DM231

[COMP25] The Automated Negotiating Agents Competition (ANAC) 2025 Challenges and Results

Reyhan Aydoğan, Tim Baarslag, Tamara C.P. Florijn, Katsuhide Fujita, Catholijn M. Jonker, Yasser Mohammad

This paper presents the primary research challenges and key findings from the 15th International Automated Negotiating Agents Competition (ANAC 2025), one of the official competitions of IJCAI 2025. We focus on two critical domains: multideal negotiations and the development of agents capable of concurrent negotiation within complex supply chain management environments. Furthermore, this work analyzes the results of the competition and outlines strategic directions for future iterations.

AIAgent-based and Multi-agent SystemsAIGame Theory and Economic Paradigms
#DM234

LLARS: Enabling Domain Expert & Developer Collaboration for LLM Prompting, Generation and Evaluation

Philipp Steigerwald, Mara Stieler, Jennifer Burghardt, Eric Rudolph, Jens Albrecht

We demonstrate LLARS, an open-source platform that bridges the gap between domain experts and developers for building LLM-based systems.
LLARS integrates three tightly connected modules into an end-to-end pipeline:
Collaborative Prompt Engineering for real-time co-authoring with version control and instant LLM testing,
Batch Generation for configurable output production across user-selected prompts × models × data with cost control, and
Hybrid Evaluation where human and LLM evaluators jointly assess outputs through diverse assessment methods, with live agreement metrics and provenance analysis to identify the best model-prompt combination for a given use case.
New prompts and models are automatically available for batch generation and completed batches can be turned into evaluation scenarios with a single click.
Interviews with six domain experts and three developers in online counselling confirmed that LLARS feels intuitive, saves considerable time by keeping everything in one place and makes interdisciplinary collaboration seamless.
Source code: github.com/th-nuernberg/llars

AIHumans and AIAINatural Language ProcessingAIMultidisciplinary Topics and Applications
#DM237

PERELMAN: Pipeline for Scientific Literature Meta-Analysis

Daniil Sherki, Daniil Merkulov, Aleksandra Savina, Dzhantemir Kikov, Dmitry Parpulov, Alexander Ivanov, Artem Abakumov, Ekaterina Muravleva

We present PERELMAN (PipEline foR sciEntific Literature Meta-ANalysis), an agentic framework designed to extract specific information from a large corpus of scientific articles to support large-scale literature reviews and meta-analyses. Our central goal is to reliably transform heterogeneous article content into a unified, machine-readable representation. PERELMAN first elicits domain knowledge (including target variables, inclusion criteria, units, and normalization rules) through a structured dialogue with a subject-matter expert. This domain knowledge is then reused across multiple stages of the pipeline and guides coordinated agents in extracting evidence from narrative text, tables, and figures, enabling consistent aggregation across studies. To assess reproducibility and validate our implementation, we evaluate the system on the task of reproducing a meta-analysis of the properties of the layered Li-ion cathode LiNi0.8Mn0.1Co0.1O2 (NMC811). We describe our solution, which has the potential to reduce the time required to prepare meta-analyses from months to minutes.

AIAgent-based and Multi-agent Systems
#DM239

An Automated Maintenance Plant for Highways

N'zebo Richard Anvo, Alwyn Mathew, Lavindra de Silva, Damian Palin, Jie Xu, Samuel Schaefer, Abir Al-Tabbaa, Fumiya Iida, Ioannis Brilakis

The Digital Roads project at Cambridge University is leveraging digitalisation, automation, and low-carbon materials to build an Automated Maintenance Plant (AMP) for UK road networks, aimed at minimising repair times to reduce congestion, improving safety, and contributing to the UK’s net-zero goals through faster, more accurate, and efficient road maintenance.

AIRoboticsAIMultidisciplinary Topics and Applications
#DM249

GEV: Statically Correct and Programmable Knowledge Graph Updates

Eduard Kamburjan, Shqiponja Ahmetaj, Chinmayi Prabhu Prasad Baramashetru, Paolo Pareti

Knowledge Graphs (KGs) evolve over time and it is critical to ensure that their integrity constraints are maintained after each update.
We introduce GEV, the first tool to statically ensure that a KG update in Java preserves satisfaction of SHACL constraints. This allows verification of updates at design time, and eliminates the need for costly continuous revalidation.
GEV is a command-line system that loads and verifies updates, applies them to a loaded KG, and keeps track of the validation status. Internally, it relies on SHACL graph updates, a theoretical framework with a method for static verification.

AIKnowledge Representation and Reasoning


[COMP25] The Automated Negotiating Agents Competition (ANAC) 2025 Challenges and Results

LLARS: Enabling Domain Expert & Developer Collaboration for LLM Prompting, Generation and Evaluation

PERELMAN: Pipeline for Scientific Literature Meta-Analysis

An Automated Maintenance Plant for Highways

GEV: Statically Correct and Programmable Knowledge Graph Updates