Accepted Papers

IJCAI-ECAI 2026 Accepted Papers · Demonstrations Track

Presentation format

Every accepted paper is presented in two formats: an oral talk — which must be delivered in person in Bremen by one of the authors — and a poster during a dedicated poster session.

  1. #DM33

    Wavelength.AI: Extending the Collaborative Game Wavelength as a Testbed for Studying Shared Understanding in Human–Agent Collaboration

    Katelyn Morrison, Gabriel Gonzalez, Zahra Ashktorab, Matt Riemer, Andrew Anderson, Djallel Bouneffouf, Justin Weisz
    AI's increasing role as a personal agent assisting knowledge workers in everyday tasks underscores the need to investigate how to help human–agent teams build a shared understanding. We extend the collaborative "mind-reading" game Wavelength to include an AI teammate, presenting the first demonstration of an LLM capable of playing this game. Based on our agent–agent play experiments, we developed Wavelength.AI, which implements two strategies to support shared understanding: an initial team grounding conversation and post-game reflective explanations. In a preliminary user study with 24 human–AI teams, we interpret higher team scores as evidence of better shared understanding. Our findings reveal that Wavelength.AI can help researchers evaluate and design different strategies to shape human–agent teams' shared understanding. Human players can see if they are on the same wavelength with AI today at https://play-wavelength-ai.com.
  2. #DM67

    SoilNet App: AI-Assisted Expert-level Annotations of Soil Horizons

    Vipin Singh, Joey Pruessing, Teodor Chiaburu, Einar Eberhardt, Sina Hesse, Stefan Broda, Frank Haußer, Felix Biessmann
    Precise descriptions of soil horizons are required by policy makers, in agriculture, and in many applications in civil engineering. To date, correct soil horizon annotations require human experts, as they follow complex hierarchical taxonomies. We present the SoilNet App, a web-based demonstrator that guides experts through the relevant tasks for expert-level soil horizon annotation from soil profile images. To demonstrate the reliability of the SoilNet App, we present results of a user study with soil horizon annotation experts, which highlights the difficulty of image-only annotation and suggests that collaborating with our model not only increases expert performance but also improves inter-annotator consistency. Our app is publicly accessible (https://soilnet.demo.calgo-lab.de).
    AI: Multidisciplinary Topics and Applications · AI: Humans and AI · AI: Computer Vision
  3. #DM68

    A Resilient Solution for Sewer Overflow Monitoring Across Cloud and Edge

    Vipin Singh, Tianheng Ling, Peter Ghaly, Felix Grimmeisen, Gregor Schiele, Felix Biessmann
    Aging combined sewer systems in many historical cities are increasingly stressed by extreme rainfall events, which can trigger combined sewer overflows (CSOs) with significant environmental and public health impacts. Forecasting the filling dynamics of overflow basins is critical for anticipating capacity exceedance and enabling timely preventive action against CSOs. We present a web-based demonstrator that integrates deep learning forecasting methods, in both cloud and edge settings, into an interactive dashboard for overflow monitoring that is resilient to network outages.
    AI: Multidisciplinary Topics and Applications · AI: Planning and Scheduling · AI: Humans and AI · AI: AI Ethics, Trust, Fairness
  4. #DM76

    Neuro-Symbolic Logical Reasoning with Textual Entailment

    Zacchary Sadeddine, Fabian M. Suchanek
    Large Language Models can use logical deduction to answer natural language questions, but they remain black boxes with potentially erroneous chains of thought. In this paper, we adapt VANESSA, a neuro-symbolic method for chain-of-thought verification, to reasoning-based question answering. VANESSA combines a logical reasoner with a neural textual entailment model to handle phrasing variations. Building on VANESSA, we develop a transparent, logic-based approach that answers natural language questions even when phrasing varies. Our experiments across various datasets show that our method is competitive with the state of the art, while also delivering proof trees for its answers. A demo interface allows users to interact with the system.
    AI: Natural Language Processing · AI: Knowledge Representation and Reasoning
  5. #DM80

    RUVA: Personalized Transparent On-Device Graph Reasoning

    Gabriele Conte, Alessio Mattiace, Gianni Carmosino, Potito Aghilar, Giovanni Servedio, Francesco Musicco, Vito Walter Anelli, Tommaso Di Noia, Francesco Donini
    The Personal AI landscape is currently dominated by "Black Box" Retrieval-Augmented Generation. While standard vector databases offer statistical matching, they suffer from a fundamental lack of accountability: when an AI hallucinates or retrieves sensitive data, the user cannot inspect the cause nor correct the error. Worse, "deleting" a concept from a vector space is mathematically imprecise, leaving behind probabilistic "ghosts" that violate true privacy. We propose Ruva, the first "Glass Box" architecture designed for Human-in-the-Loop Memory Curation.
    Ruva grounds Personal AI in a Personal Knowledge Graph, enabling users to inspect what the AI knows and to perform precise redaction of specific facts. By shifting the paradigm from Vector Matching to Graph Reasoning, Ruva ensures the "Right to be Forgotten." Users are the editors of their own lives; Ruva hands them the pen. The project and the demo video are available at http://sisinf00.poliba.it/ruva/.
  6. #DM81

    MC-RAG System: A Structure-Driven RAG System for Multi-Constraint Queries

    Xiao Zhang, Yang Wan, Yi Li, Miao Xie, Chunli Lv
    Retrieval-Augmented Generation (RAG) systems are widely adopted in question answering, yet they often fail to satisfy complex multi-constraint queries, leading to constraint violations, factual inconsistencies, or hallucinations. We present Structure-Driven RAG System for Multi-Constraint Queries (MC-RAG), a structure-driven RAG system that reformulates retrieval as a subgraph matching problem over a knowledge graph. By integrating semantic and structural embeddings with path-level indexing, MC-RAG performs interpretable, structure-aware, and constraint-consistent retrieval and generation. During the demonstration, participants can input medical or encyclopedic multi-constraint queries and visualize how the system parses constraints, performs structural matching, and generates answers, thereby experiencing an end-to-end, interactive, and explainable RAG pipeline.
    A demo video is available at https://youtu.be/J8kahzmAnu0.
    AI: Natural Language Processing · AI: Knowledge Representation and Reasoning
  7. #DM84

    vSpeedUI: Turning Past GUI Experience into Fast Executable Plans

    Xiaohan Zheng, Yihong Chen, Haiquan Qiu, Quanming Yao
    LLM-based mobile GUI agents usually invoke large models for nearly every micro-action, making real-device automation slow even when similar workflows have been completed before. We present vSpeedUI, a public demo system that turns past GUI experience into fast executable plans. It organizes historical trajectories into an Executable Experience Graph (EXG), where UI states are connected by Semantic Step Summaries with explicit preconditions. At task initialization, vSpeedUI performs Global Look-ahead Planning to retrieve, validate, and rank candidate transitions into a pre-verified plan. During execution, the agent uses lightweight graph traversal with state localization, target adaptation, and fallback when needed. On HarmonyOS, vSpeedUI reduces LLM latency and total task time while maintaining strong success rates, showing a practical route toward data-efficient GUI automation. Code is available at: https://github.com/LARS-research/vSpeedUI.
    AI: Agent-based and Multi-agent Systems · AI: Planning and Scheduling · AI: Knowledge Representation and Reasoning · AI: Machine Learning
  8. #DM87

    eNNcode: Optimization-Based Analysis of Neural Networks

    Muhammad Atallah, Lukas Dankwart, Daniel Neider, Mustafa Yalciner
    The increasing use of neural networks (NNs) in high-stakes decision-making requires rigorous analysis to ensure safety, fairness, and explainability. Formal verification tools for neural networks typically focus on determining the satisfiability of properties such as safety or fairness. However, many fairness and explainability tasks go beyond satisfiability and instead rely on arbitrary optimization objectives. To address such problems, neural networks are often encoded as mixed-integer linear optimization (MILO) problems with linear objectives. In practice, these encodings are usually implemented in an ad-hoc manner, limiting comparability across works, reducing transparency, and increasing implementation effort. We address this gap by introducing eNNcode, a user-friendly PyPI library that converts any piecewise-linear neural network in Open Neural Network Exchange (ONNX) format into a MILO instance. The library supports arbitrary constraints on input and output nodes, as well as user-defined optimization objectives. Our experiments show that eNNcode achieves performance comparable to existing libraries, despite its simplicity and ease of use. Overall, eNNcode facilitates reproducible and standardized optimization-based analysis of neural networks.
    AI: Constraint Satisfaction and Optimization
  9. #DM99

    AI-Powered Interactive Multimodal Digital Book & Online Shop For Blind and Visually Impaired Users

    Mazen Salous, Daniel Westphal, Wilko Heuen, Ayoub Ben Dhiab, Charles Hudin, Sabrina Paneels, Susanne Boll, Larbi Abdenebaoui
    We present an AI-powered interactive multimodal system that enriches digital image accessibility for blind and visually impaired (BVI) users. Our demonstration showcases two application domains: (1) an educational digital book, and (2) an online shopping interface. In both use-cases, users can virtually feel material textures (leather, wood, etc.) and engage in voice-driven inquiry about images. The system integrates state-of-the-art AI components – including voice-to-voice conversational agents, vision models for object segmentation, and a custom 16-actuator vibrotactile display – to provide multimodal feedback (haptic vibrations, spoken descriptions, and audio cues).
    The result is an inclusive technology with significant societal benefit, empowering BVI users to learn and shop more independently through natural multimodal interactions.
    AI: Humans and AI · AI: Computer Vision · AI: Machine Learning
  10. #DM100

    Anti-Slavery Intelligence (ASI): An AI-Powered Tool for Modern Slavery Compliance Analysis and Remediation

    Mahmoud Gad, Abdessalam Elhabbash, Steven Young
    We present Anti-Slavery Intelligence (ASI), a deployed AI system that analyses corporate modern slavery statements to identify compliance gaps and generate prioritised remediation advice. ASI orchestrates a multi-model pipeline: Gemini-2.5-Flash for vision-enhanced PDF parsing and structured compliance scoring against 48 expert-defined criteria, and Gemini-2.5-Pro for synthesising company-specific, timeline-based recommendations. A benchmarking engine compares each statement against industry and FTSE index averages across 11 sectors. Evaluation on 95 expert-annotated statements yields an F1-score of 0.86 and recall of 0.94, reducing time to generate an initial compliance assessment from several hours to approximately five minutes per document. ASI is publicly available at https://www.antislaveryintelligence.co.uk/ with free access for academics and NGOs. A demonstration video is at https://youtu.be/DNhdEGrRwzg.
    AI: Multidisciplinary Topics and Applications · AI: Natural Language Processing · AI: Humans and AI
  11. #DM105

    Visualizing Deep Agents in Long-Horizon Tasks: Towards Explainable and Trustworthy Agentic AI

    Amirkia Rafiei Oskooei, Mehmet S. Aktas
    The transition from prompt-based Large Language Models (LLMs) to autonomous Deep Agents has enabled the automation of long-horizon tasks. However, as these agents adopt hierarchical architectures with nested tool usage, they suffer from significant opacity. Existing linear tracing tools fail to capture the multi-dimensional complexity of parallel sub-agent execution, hindering both debugging and user trust. We propose a general-purpose observability framework that decomposes agent execution into four distinct visualization dimensions: Temporal, Cognitive, Hierarchical, and Spatial. We validate this framework through RepoLearn, an open-source workbench for automated codebase comprehension. Our user study demonstrates that this multi-dimensional approach reduces the Time-to-Insight (TTI) for complex behavioral analysis by 56% and significantly lowers cognitive load (NASA-TLX) compared to state-of-the-art linear traces. The source code is available at https://github.com/amirkiarafiei/repo-learn and the demo at https://www.youtube.com/watch?v=s3U6E9o94gk.
    AI: Agent-based and Multi-agent Systems · AI: Humans and AI · AI: AI Ethics, Trust, Fairness
  12. #DM110

    DeepMed Search: An Open-Source Agentic Platform for Medical Deep Research with Introspective Verification

    Maolin Liu, Fanyu Xu, xu ruoqing, JiaHang Zhang, Hao Wang, Rui Wang
    Navigating the deluge of heterogeneous medical data, from academic literature (PubMed) to clinical guidelines (Web) and private knowledge bases, remains a critical bottleneck for evidence-based medicine. While commercial black-box tools lack transparency, standard open-source RAG implementations frequently suffer from "reasoning drift" when handling complex, long-tail queries. We present DeepMed Search, a fully open-source, agentic platform designed for transparent medical deep research. Built on a high-performance Next.js architecture, DeepMed Search features a source-adaptive router that autonomously dispatches sub-queries to PubMed, web search, or local graph-based knowledge bases based on information density. Crucially, the platform integrates an introspective verification module, powered by a causal-consistent multi-agent debate framework, to validate retrieved evidence against diagnostic logic before synthesis. To demonstrate its robustness, we showcase DeepMed Search's ability to autonomously decompose high-difficulty rare disease queries, filter out confounding noise, and generate structured, citation-backed research reports in minutes. By open-sourcing this software, we provide the community with a robust infrastructure to democratize access to trustworthy, glass-box medical reasoning at a commercial-grade performance level. The platform is publicly available at https://www.deepmedsearch.cloud, and the demonstration video is available at https://youtu.be/4U4aok8yLpk.
    AI: Agent-based and Multi-agent Systems · AI: AI Ethics, Trust, Fairness
  13. #DM113

    Intent Hub: A Self-Healing Semantic Agent Routing System for Resolving Overlap in Agentic Systems

    Chenrui Liang, Peng Xu, Xinyuan Liu
    Semantic overlap poses a fundamental challenge to accurate agent routing in large-scale agentic systems. We present Intent Hub, a self-healing semantic agent routing system that combines offline semantic diagnosis with an online Dual Filtering Mechanism. By leveraging LLM-generated augmentative positive and adversarial negative utterances, Intent Hub constructs explicit decision boundaries and enables interpretable, millisecond-level routing under high semantic overlap. Intent Hub further supports interactive semantic debugging, allowing developers to visually diagnose conflicts, repair routing rules, and immediately observe changes in online agent routing.
  14. #DM114

    A Scalable Cross-Domain Event Extraction System via a Unified Generative Training Framework

    Siting Liang, Omar Adjali, Bhatti Omair, Daniel Sonntag
    Event extraction is fundamental to information extraction. Prior approaches often separate event detection and argument extraction or depend on dataset-specific designs, limiting scalability and cross-domain generalization. We propose a unified generative, sequence-to-sequence framework that performs all event extraction subtasks jointly and supports both end-to-end and pipeline configurations. We fine-tune pre-trained language models on multiple event datasets across diverse domains, enabling a single model to retain domain-specific semantics while generalizing over large, evolving label spaces. Cross-domain experiments show strong, robust performance across datasets, demonstrating a scalable solution for real-world event extraction. We demonstrate these capabilities through a web-based application tailored for researchers and practitioners. The platform supports inspection of different configurations and facilitates cross-domain comparisons.
  15. #DM118

    RoboVineSim: A Simulation Tool for Human-Robot Collaboration in Vineyard Harvesting

    Dimitrios Troullinos, Maria Nuria Conejero, Filippo Bistaffa, Jose Maria Bengochea-Guevara, Ángela Ribeiro, Juan Rodriguez-Aguilar
    In agricultural tasks, manual grape harvesting remains a labor-intensive activity facing challenges of efficiency, labor shortages, and sustainability. To this end, tailor-made robotic systems have been designed with the capabilities to transfer heavy boxes, navigate vineyard terrains, communicate, accurately locate, and safely interact with humans. The introduction of collaborative robotic fleets alongside human workers in large-scale vineyard harvesting effectively presents a Multi-Robot Task Allocation (MRTA) problem, where the real-world domain possesses characteristics that, when combined, pose a challenging research endeavor and align with open issues in MRTA research. Here, we present RoboVineSim, a simulation tool that can capture any vineyard area using geographical data and model the behavior of humans and robots in the environment. In addition, we have established the necessary mechanisms to facilitate the development of novel MRTA methods in this domain.
  16. #DM121

    ADP-MA: An Interactive System for Autonomous Data Processing using Meta-Agents

    Udayan Khurana
    We demonstrate ADP-MA (Autonomous Data Processing using Meta-Agents), a system that autonomously solves a complex and diverse set of data processing tasks. Three domain-agnostic meta-agents coordinate task-specific ground agents through a multi-stage pipeline: data understanding, planning, critique, expansion, execution, and finalization. Errors are caught early via progressive sampling on small data subsets before running on full data. The system supports three execution strategies, twelve domain knowledge packs, and confidence-based early stopping. An interactive web interface lets users watch pipelines being built in real time, replay completed runs at any stage, and compare results across cases. On four benchmarks, ADP-MA reaches 90.6% on DSEval, 44.8% on KramaBench, 50.0% on DA-Code, and 70.0% on AgentBench, outperforming published single-agent baselines.
    AI: Agent-based and Multi-agent Systems · AI: Humans and AI
  17. #DM122

    Making Weak Supervision Interactive: Exploring Transfer from Sound Libraries to Passive Acoustic Monitoring Data

    Novruz Mammadli, Rida Saghir, Kanwar Ammar Ali, Prathmesh Doddanawar, Thiago S. Gouvêa, Daniel Sonntag
    Passive Acoustic Monitoring (PAM), an increasingly popular method for wildlife monitoring, generates large volumes of data whose analysis depends on instance-level annotations that are costly to obtain. Archival sound collections provide weak labels that lack temporal localisation. In prior work, we demonstrated that Multiple Instance Learning (MIL) can extract approximate event locations from weakly labelled PAM data, suggesting it may be applied to sound collection data.
    This demo operationalizes that approach within an interactive workflow that connects weakly annotated sound collections to downstream PAM deployment. The system supports configurable MIL-based localisation, lightweight interactive refinement, and transfer to an independent PAM dataset.
    We carried out a preliminary evaluation with an actual sound library from a museum collection and a benchmark PAM dataset. Results confirm that weakly annotated sound collections can serve as a viable training signal for downstream PAM detection and illustrate differences between alternative MIL instantiations under real transfer conditions. (Video available at https://cst.dfki.de/projects-weak-supervision-demo)
    AI: Machine Learning · AI: Multidisciplinary Topics and Applications · AI: Humans and AI
  18. #DM124

    AwakeForest: An Interactive Geospatial Platform for Large-Scale Forest Imagery

    Suraj Prasai, Kangning Cui, Rongkun Zhu, Sarra Alqahtani, Ying Zhang, Victor Paúl Pauca, Miles R. Silman, Fan Yang
    Forest imagery analysis often involves multiple tightly coupled vision tasks, which must be performed under substantial variation in geographic regions, sensors, and acquisition conditions. However, practitioners often lack a unified tool that is geospatial-native, cloud-optimized, and ML-integrated for end-to-end workflows spanning annotation, prediction, visualization, and downstream analysis at scale. We present AwakeForest, an interactive end-to-end platform designed for large-scale forest imagery that integrates model-assisted inference, automatic annotation, and human-in-the-loop refinement within a single workflow. Our platform supports plug-and-play integration of pretrained models and enables scalable interaction with forest imagery ranging from standard aerial scenes to large orthomosaics that can span several gigabytes to hundreds of gigabytes. AwakeForest produces analysis-ready outputs that can be directly used for downstream analysis and to support iterative model and annotation updates on new scenes. We demonstrate the system on the PALMS dataset and illustrate how AwakeForest supports an end-to-end workflow for practical forest management and analysis.
    AI: Computer Vision · AI: Multidisciplinary Topics and Applications · AI: Natural Language Processing · AI: Knowledge Representation and Reasoning
  19. #DM125

    Low-Latency Real-Time Audio Game Commentary System via LLM-based Parallel Text Generation

    Ryota Kawamatsu, Anum Afzal, Yuki Saito, Shinnosuke Takamichi, Graham Neubig, Katsuhito Sudoh, Hiroya Takamura, Tatsuya Ishigaki
    We present a low-latency real-time audio game commentary system that generates spoken commentary directly from live gameplay video. In this end-to-end setting, a key bottleneck is accumulated waiting time; conventional pipelines capture frames, generate text, and synthesize speech sequentially for each utterance, and do not request the next generation until speech playback has completed. This strict sequentiality causes long and unnatural silence between utterances. To address this latency bottleneck, our system runs text generation in parallel with speech playback and buffers multiple candidate utterances ahead of time, enabling immediate synthesis at playback boundaries. Experiments on fast-paced game videos show that our parallel design reduces the mean inter-utterance silence from 9.6 seconds to 0.3 seconds compared to sequential baselines. It also improves similarity to professional speaking–silence timing patterns by over 40%, and a user study with 120 experienced game players confirms significantly improved perceived speaking rhythm. Our demo video is available at: https://youtu.be/pmrRUlvav8M.
  20. #DM130

    DataCube: A Video Retrieval Platform via Natural Language Semantic Profiling

    Yiming Ju, Hanyu Zhao, Quanyue Ma, Donglin Hao, Chenwei Wu, Ming Li, Songjing Wang, Tengfei Pan
    Large-scale video repositories are increasingly available for modern video understanding and generation tasks. However, transforming raw videos into high-quality, task-specific datasets remains costly and inefficient. We present DataCube, an intelligent platform for automatic video processing, multi-dimensional profiling, and query-driven retrieval. DataCube constructs structured semantic representations of video clips and supports hybrid retrieval with neural re-ranking and deep semantic matching. Through an interactive web interface, users can efficiently construct customized video subsets from massive repositories for training, analysis, and evaluation, and build searchable systems over their own private video collections. The system is publicly accessible at https://datacube.baai.ac.cn/. Demo Video: https://youtu.be/L7bKfPBm2tU
    AI: Computer Vision · AI: Search · AI: Data Mining · AI: Natural Language Processing
  21. #DM134

    Visualizing and Interacting with Model Representation Space for Human-Centric Active Learning

    Rida Saghir, Thiago S. Gouvêa, Daniel Sonntag
    Active learning reduces annotation effort by selecting informative samples, yet most approaches remain model-driven, offering users little control over training or support for understanding model behaviour. Human-centric active learning brings users further into the loop by introducing additional points of interaction, particularly in the sample selection process. However, such systems are typically demonstrated using fixed feature projections or visualizations of shallow classifier outputs. We present a representation-centric active learning tool in which interaction takes place directly within the model’s representation space. By operating in the same space the model uses for decision making, the interface supports the co-evolution of representations and user understanding. We additionally report initial qualitative (think-aloud) and quantitative findings from a pilot study, illustrating that such representation-centric frameworks can achieve comparable performance to standard baselines while fostering improved human–model collaboration. (Video and code available at https://cst.dfki.de/demo-interacting-model-space)
    AI: Humans and AI · AI: Machine Learning
  22. #DM137

    PEACE 2.0: Grounded Explanations and Counter-Speech for Combating Hate Expressions

    Greta Damo, Stéphane Petiot, Elena Cabrio, Serena Villata
    The increasing volume of hate speech on online platforms poses significant societal challenges. While the Natural Language Processing community has developed effective methods to automatically detect the presence of hate speech, responses to it, called counter-speech, are still an open challenge. We present PEACE 2.0, a novel tool that, besides analysing and explaining why a message is considered hateful or not, also generates a response to it. More specifically, PEACE 2.0 has three main new functionalities: i) grounding hate speech explanations in evidence and facts and ii) automatically generating evidence-grounded counter-speech, both by leveraging a Retrieval-Augmented Generation (RAG) pipeline, and iii) exploring the characteristics of counter-speech replies.
    By integrating these capabilities, PEACE 2.0 enables in-depth analysis and response generation for both explicit and implicit hateful messages.
    AI: Natural Language Processing · AI: Multidisciplinary Topics and Applications · AI: AI Ethics, Trust, Fairness
  23. #DM138

    DeepLog: A Software Framework for Modular Neurosymbolic AI

    Robin Manhaeve, Stefano Colamonaco, Vincent Derkinderen, Rik Adriaensen, Lucas Van Praet, Luc De Raedt, Giuseppe Marra
    DeepLog is an operational neurosymbolic framework that unifies logic and deep learning within standard PyTorch workflows. While existing neurosymbolic systems focus on a particular paradigm and semantics, DeepLog serves as a universal backend that can emulate many systems in the neurosymbolic alphabet soup. By treating diverse neurosymbolic languages as high-level specifications, the DeepLog software automatically compiles them into optimized arithmetic circuits. This design lowers the barrier for machine learning practitioners by treating logic as composable modules, while providing neurosymbolic developers with a shared, high-performance basis for prototyping new integration strategies.
    The video is available here: https://youtu.be/CJAQJeaTWB0
  24. #DM143

    A Privacy-Preserving Intelligent Assistant for Clinical Psychology Practice

    Aaron Pico, Joaquin Taverner, Emilio Vivancos, Ana Garcia-Fornes, Vicent Botti
    This paper describes a fully local, privacy-preserving intelligent system designed to assist in clinical psychology practice. The system automatically transcribes therapy sessions and performs speaker attribution. Beyond transcription, the tool enhances clinical reasoning by detecting cognitive distortions and emotional patterns using specialized deep learning classifiers and Large Language Models (LLMs). By guiding an LLM locally through a multi-step analysis process, the assistant synthesizes the enriched data and generates a series of analysis reports and clinical documentation of the session. As a result, the assistant reduces the administrative burden on professionals while preserving privacy with an edge computing approach in which the data never leaves the therapist's device. Finally, the assistant uses human-in-the-loop validation so that the professional always remains in control, ensuring clinical accuracy and trust.
    AI: Humans and AI · AI: Natural Language Processing · AI: Multidisciplinary Topics and Applications · AI: AI Ethics, Trust, Fairness
  25. #DM149

    Looking at Your Photo, What Comes to Mind? Personalized Memory Internalization for Dementia Reminiscence

    Shunjie Wen, Kyung-Hwan Lee, GiMoon Lee, Seongsoo Heo, Jaeyeon Lee, Sangyeob Shin, Gyuwon Moon, Jiwoong Kim, Dong-Wan Choi
    We present PhoMi, an interactive recall assistant that supports dementia reminiscence by engaging users with personal photographs captured via a live camera interface. Given a photograph, PhoMi delivers spoken questions and receives spoken responses, creating an accessible reminiscence setting with reduced reliance on human therapists. Over repeated sessions, user responses are incorporated into lightweight adapters of a vision–language model, enabling progressively personalized question generation without reprocessing prior interaction logs. PhoMi serves as a prototype toward scalable, lifelong AI companions for dementia reminiscence.
  26. #DM159

    TraceBrain: An Open-Source Framework for Agentic Trace Management

    Quy Minh Le, Oscar Cao, Hoang Quoc Viet Pham, Hoang Thanh Lam, Hoang D. Nguyen
    As Large Language Model (LLM) agents scale toward real-world deployment, they generate large volumes of fragmented, non-standardized execution traces. Many existing observability platforms treat these traces primarily as passive logging artifacts, lacking the unified infrastructure to operationalize them for active governance and agent adaptation across heterogeneous single-agent and multi-agent workflows. To address this gap, we introduce TraceBrain, an open-source infrastructure for autonomous agent trace management. TraceBrain adopts a framework-agnostic architecture built on a delta-based OpenTelemetry (OTLP) schema, which mitigates context explosion and supports on-demand reconstruction of long-horizon execution trajectories. For runtime governance, TraceBrain implements uncertainty-driven supervision, where an internal Trace Evaluator prioritizes ambiguous trajectories for human review, thereby reducing manual annotation workload. Moving beyond passive observation, the platform incorporates a hybrid semantic-lexical retrieval engine that combines dense vector similarity and exact keyword matching for operational memory retrieval. Furthermore, an automated curriculum mechanism continuously synthesizes failure patterns into structured training artifacts. Empirical evaluations demonstrate a ~100x reduction in storage overhead together with high precision in uncertainty-guided trace supervision. Ultimately, TraceBrain transforms the execution history into a reusable operational memory substrate, bridging runtime observability with retrieval-driven agent adaptation. The system is publicly available at https://github.com/ToolBrain/TraceBrain.
    AI: Agent-based and Multi-agent Systems · AI: Knowledge Representation and Reasoning · AI: Search · AI: Uncertainty in AI
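The hybrid retrieval idea in the TraceBrain abstract (dense vector similarity combined with exact keyword matching) can be illustrated with a minimal sketch. The abstract does not say how the two signals are fused, so the weighted sum, the `alpha` parameter, the trace dictionary layout, and all function names below are assumptions made for illustration, not TraceBrain's actual API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def keyword_score(query_terms, text):
    """Fraction of query terms that appear as exact whitespace-separated tokens."""
    if not query_terms:
        return 0.0
    tokens = set(text.lower().split())
    hits = sum(1 for term in query_terms if term.lower() in tokens)
    return hits / len(query_terms)


def hybrid_rank(query_vec, query_terms, traces, alpha=0.5):
    """Rank trace ids by alpha * dense score + (1 - alpha) * lexical score.

    The linear fusion is an assumption; other fusion rules are equally plausible.
    """
    scored = []
    for trace in traces:
        dense = cosine(query_vec, trace["vec"])
        lexical = keyword_score(query_terms, trace["text"])
        scored.append((alpha * dense + (1 - alpha) * lexical, trace["id"]))
    return [trace_id for _, trace_id in sorted(scored, reverse=True)]


# Hypothetical traces with toy 2-D embeddings.
TRACES = [
    {"id": "t1", "vec": [1.0, 0.0], "text": "tool call timeout on search api"},
    {"id": "t2", "vec": [0.0, 1.0], "text": "planner loop exceeded budget"},
    {"id": "t3", "vec": [0.6, 0.8], "text": "search returned empty result"},
]

ranking = hybrid_rank([1.0, 0.0], ["timeout"], TRACES)
```

With these toy inputs, `t1` matches on both signals, `t3` only on the dense signal, and `t2` on neither, so the ranking orders them accordingly.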
  27. #DM165

    ORBIT: Optimal Recommendation Framework for Boarding with Interpretable Timelines

    Joonseong Kang, Jaehun Bang, Seung Ha Hwang, Jiyoung Ko, Subeen Park, Jeffrey Gennari
    Determining when to leave for the airport is a complex problem shaped by flight delay risk, traffic, weather, airport congestion, and passenger-specific constraints. Existing services rely on isolated delay estimates or simple travel time calculations, failing to capture real-time context. We propose ORBIT, a decision-support system that generates personalized leave-by recommendations by integrating predictive models with real-time operational signals. The system combines user input normalization, real-time data acquisition, Transformer-based delay prediction, and LLM-based reasoning to jointly account for statistical forecasts and dynamic factors like congestion, previous-leg propagation, weather, traffic, and airport processing times. Instead of a standalone delay estimate, ORBIT produces an actionable and interpretable departure plan. We implement ORBIT as an interactive system and demonstrate its applicability. A video demo is available at https://youtu.be/fZX42jceIM4.
  28. #DM170

    Interactive Open-Set Semantic Mapping with a 3D Scene Graph Backend

    Felix Igelbrink, Lennart Niecksch, Martin Günther, Marian Renz, Oscar Lima, Martin Atzmueller
    While Open-Set Semantic Mapping and 3D Semantic Scene Graphs (3DSSGs) have become established paradigms in robotic perception in recent years, most existing works are limited to small environments or sacrifice geometric detail and instance granularity for scalability. Deploying these systems at scale for large multi-room environments remains a major challenge due to the computational overhead of high-dimensional feature integration and the maintenance of the 3DSSG structure. In this paper, we demonstrate a modular mapping architecture that establishes 3DSSGs as its foundational backend. Unlike approaches that generate scene graphs as a post-processing step, our system maintains the graph as the primary, incrementally updated knowledge representation. Our architecture is optimized for GPU-accelerated operations, enabling the dense representation of extensive environments containing thousands of unique object instances and supporting open-vocabulary queries via CLIP features without requiring any additional post-processing steps. In this live demonstration, we showcase our pipeline processing large-scale data from the Habitat Matterport 3D (HM3D) dataset as well as live data collected from a handheld device. Attendees will interact with the generated maps by performing real-time, open-set queries (e.g., “find the vintage wooden chair”) across complex, multi-story environments, highlighting the system's capability to represent dynamic, human-aligned environmental understanding suitable for downstream robotic tasks.
  29. #DM171

    SparseDR: Differentiable Rendering of Sparse Signed Distance Fields

    Alexey Budak, Albert Garifullin, Vladimir Frolov
    We present SparseDR, a novel differentiable rendering algorithm designed for sparse representations based on Signed Distance Fields (SDF). We leverage the Sparse Brick Set representation and propose an adaptation of redistancing for sparse SDFs. This enables SparseDR to surpass existing works in accuracy of surface reconstruction by increasing the effective resolution of the SDF representation. SparseDR is efficiently implemented in C++ and Vulkan, achieving several times shorter reconstruction time than other methods.
    AI: Computer Vision · AI: Machine Learning
  30. #DM173

    GRAIL: An Agentic AI Architecture for Interactive Grant Proposal Writing

    Zhisheng Tang, Mayank Kejriwal
    Securing research funding remains fragmented and time-consuming: researchers must navigate separate databases across dozens of agencies while simultaneously drafting competitive proposals. We present GRAIL, a web-based platform that unifies grant discovery and proposal writing through conversational AI. Users describe their research interests in natural language to explore opportunities from a unified index of 11.8K U.S. federal and nonprofit grant opportunities; within the document editor, integrated AI assistance supports real-time proposal revision and refinement. The system runs in any modern browser without installation. Conference attendees are invited to interact with the live system at the demo booth, explore the grant discovery and writing assistance workflows, and provide feedback on the user experience.
    AI: Agent-based and Multi-agent Systems · AI: Data Mining · AI: Natural Language Processing · AI: Search
  31. #DM183

    Optimizing Spectrogram Resolution and Training Strategies for Real-Time Killer Whale Call Type Classification

    Vladislav Naumov, Iaroslav Sheipak, Yuriy Ivanov, Ilya Makarov
    Automated identification of killer whale call types from continuous acoustic recordings is essential for scalable population monitoring, yet existing general-purpose frameworks such as ANIMAL-SPOT suffer from suboptimal spectrogram resolution and lack strategies tailored to the spectral-temporal characteristics of killer whale vocalizations.
    We identify two key limitations of the ANIMAL-SPOT framework: (1) the architecture of the CNN backbone cannot be changed, and (2) the absence of regularization and class-balancing techniques limits generalization across call types with varying abundance.
    To address these issues, we propose a framework that combines optimized STFT parameters (FFT size 1024, hop length 172) with label smoothing and targeted oversampling, evaluated across three CNN backbones and five segment lengths.
    On a dataset of 12 killer whale vocalization classes from Avacha Gulf, Russia, our best configuration (ResNet-18 with 1200 ms segments) achieves 97.1% segment-level accuracy, compared to 96.2% for the ANIMAL-SPOT baseline, a relative error reduction of 23.7%.
    We further present an interactive demonstration system for uploading recordings and obtaining time-resolved call-type predictions, enabling rapid analysis of passive acoustic monitoring data.

    Demo video link: https://drive.google.com/file/d/1ITA_52WdnAcyg7zRjIIshutjl6Lp_um1/view?usp=drive_link
    Presentation slides: https://drive.google.com/file/d/1sd-JH1R5YPa9iRtsbRO-OZxY8-fyglsO/view?usp=drive_link
    AI: Humans and AI · AI: Machine Learning · AI: Multidisciplinary Topics and Applications
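The relative error reduction quoted in the killer whale abstract follows directly from the two reported accuracies (97.1% against 96.2% for the baseline). The short check below reproduces the arithmetic; the function name is illustrative only.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Fraction of the baseline's error rate that the new model removes."""
    baseline_err = 1.0 - baseline_acc
    new_err = 1.0 - new_acc
    return (baseline_err - new_err) / baseline_err


# Accuracies as reported in the abstract: ANIMAL-SPOT baseline vs. best configuration.
reduction = relative_error_reduction(0.962, 0.971)
print(f"{reduction:.1%}")  # prints 23.7%
```

The error rate falls from 3.8% to 2.9%, and 0.9 / 3.8 is roughly 0.237, which matches the stated figure.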
  32. #DM191

    RareDASH: A Dynamic Multi-Agent System for Holistic Rare Disease Care

    Jialun Zhong, Jiayang Yu, Yanzeng Li, Meng Qin, Lei Zou, Yuqian Wang, Ying Zhang, Hanna Li, Liying Yan, Jie Qiao
    Rare diseases are characterized by low prevalence and intricate pathogenesis, leading to highly heterogeneous clinical trajectories. Caring for patients with rare diseases presents formidable challenges because it requires highly specialized expertise and experience. Existing methods are typically tailored to isolated rare disease scenarios (e.g., diagnostic tasks, medication recommendations) and lack a comprehensive perspective on the entire care process. Inspired by recent studies of agent skills, we propose RareDASH, a multi-agent system (MAS) featuring dynamic workflow orchestration designed to provide a comprehensive solution for the full life-cycle of rare disease care. Our framework is inherently patient-centric, enhancing rare disease discovery capabilities through proactive inquiry and information elicitation directly from the patients. Furthermore, we implement diverse agent memory to optimize both the accuracy and efficiency of the multi-agent collaboration. Finally, an online auditing module is integrated into the system to monitor and mitigate hallucinations, ensuring the reliability of clinical outputs. The work sheds light on the feasibility of leveraging MAS in holistic rare disease care.
    AI: Agent-based and Multi-agent Systems · AI: Planning and Scheduling · AI: Humans and AI
  33. #DM194

    LLM-to-Map: Transparent Conversational Tool Orchestration for Real-Time Multi-Domain Simulation

    Marouane Benbrahim, Kavya Gautam, Zonghan Zhang, Zhiqian Chen
    Coupled power-traffic simulations are valuable for studying EV charging stress and outage propagation, but existing tools typically require scripts and opaque configurations. We present LLM-to-Map, a demo system that exposes a real-time multi-domain simulator through structured LLM tool orchestration. Natural-language requests are mapped to typed tool calls that coordinate SUMO traffic simulation, PyPSA power-flow analysis, V2G actions, and map operations. The agent operates over a fixed tool schema and emits auditable execution logs in the UI so users can inspect every action and parameter. Tool execution is deterministic: each request resolves to explicit API calls, and unsafe or destructive actions can require confirmation. The tool layer provides JSON schemas and parameter validation, preventing arbitrary code execution and enabling reproducible runs. The agent receives live system state (loads, EV statistics, V2G status, time, temperature) to support context-aware multi-step commands; when an LLM is unavailable, a deterministic parser preserves the same tool interface. The backend synchronizes cross-domain events (EV charging demand and substation failures) and streams state updates to a 3D map interface. We describe the architecture, coupling and synchronization, and a demo workflow that showcases multi-step scenario control and reporting for non-experts without writing scripts.
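The LLM-to-Map abstract describes typed tool calls over a fixed schema, parameter validation, and confirmation for unsafe actions. A minimal sketch of that pattern follows. The tool names, parameter names, and the `validate_call` helper are hypothetical and are not taken from the system itself; plain Python type checks stand in for the JSON schemas the abstract mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    name: str
    params: dict            # parameter name -> expected Python type
    destructive: bool = False


# Hypothetical fixed tool schema; the real system's tools are not listed in the abstract.
REGISTRY = {
    "set_charging_demand": ToolSpec(
        "set_charging_demand", {"station_id": str, "kw": float}
    ),
    "fail_substation": ToolSpec(
        "fail_substation", {"substation_id": str}, destructive=True
    ),
}


def validate_call(call: dict, confirmed: bool = False) -> dict:
    """Check a proposed tool call against the registry before it is executed."""
    spec = REGISTRY.get(call.get("tool"))
    if spec is None:
        raise ValueError(f"unknown tool: {call.get('tool')!r}")
    args = call.get("args", {})
    if set(args) != set(spec.params):
        raise ValueError(f"{spec.name} expects parameters {sorted(spec.params)}")
    for key, expected in spec.params.items():
        if not isinstance(args[key], expected):
            raise TypeError(f"{spec.name}.{key} must be {expected.__name__}")
    if spec.destructive and not confirmed:
        raise PermissionError(f"{spec.name} requires explicit confirmation")
    # The returned record doubles as an auditable log entry.
    return {"tool": spec.name, "args": dict(args)}


ok = validate_call(
    {"tool": "set_charging_demand", "args": {"station_id": "s1", "kw": 7.5}}
)
```

Because every request must resolve to a registered tool with validated arguments, the model never gets a path to arbitrary code execution, which is the property the abstract emphasizes.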
  34. #DM195

    UrbanMix: LLM-Guided Simulation of Mixed Autonomy Traffic with Heterogeneous Behavioral Profiles

    Roman Sultimov, Daniil Efimov, Ivan Novikov, Aleksandr Volkov, Yury Maximov
    Cities deploying autonomous vehicles face urgent policy questions: would the adoption of autonomous vehicles (AVs) improve the congestion rate or worsen it? What would be the optimal adoption rate to minimize the congestion rate? How would cautious AVs (Waymo-style) and aggressive AVs (Tesla "Mad Max"-style) interact with human drivers and delivery robots on shared roads? We present UrbanMix, an interactive simulation platform that embeds cognitively diverse agents (human drivers, cautious AVs, aggressive AVs, and delivery robots) with distinct behavioral profiles inside a Simulation of Urban MObility (SUMO) framework of real urban road networks. Our LLM planner operates as an urban policy coordinator, setting traffic rules through a bounded action interface, while a regulation shield enforces infrastructure constraints.

    Our experiments reveal three key phenomena: (i) road throughput may drop substantially as AV adoption increases; (ii) an aggressive cascade, where runtime behavior switching that models Tesla's user-selectable "Mad Max" mode triggers up to a 60-fold increase in emergency braking events on a real Austin Downtown network; and (iii) a delivery bottleneck, showing a 24–36% throughput reduction from slow robots.
    Results validated on synthetic and real data from Austin, TX, demonstrate that real network topology amplifies cascade effects by more than four times.

    Demonstration video: https://youtu.be/ksgdX5iguFw. Live demo: https://evacuation-viz.vercel.app/urban.
  35. #DM196

    ElderMTL: Multi-Task Affect Monitoring for Elderly Care

    Maria Razzhivina, Shahane Tigranyan, Aram Avetisyan, Ilya Makarov, Andrey Savchenko
    We present ElderMTL, a multi-task affect monitoring system designed for elderly care settings. The system simultaneously estimates Facial Action Units (FAUs), Valence-Arousal (VA) signals, and categorical emotions (FER) from video, capturing multiple layers of affective information. To improve sensitivity to subtle affective cues common in older adults, our approach incorporates age-conditioned physiological modeling, including baseline muscle adjustments and a dynamic AU co-activation graph. This enables the system to adapt to age-related changes in facial expression patterns, providing more reliable and interpretable emotion assessments. In a live demonstration, we showcase ElderMTL processing video streams, visualizing AU activations, affective state predictions, and interpretable insights that highlight age-specific affective dynamics. This work demonstrates that physiologically grounded, multi-task affective monitoring can provide meaningful, real-world support for elderly care.
    AI: Computer Vision · AI: Humans and AI · AI: Machine Learning
  36. #DM201

    Secure Coding Unleashed: Boosting Productivity With On-Premise LLM-Powered IDE Plugins

    Vasilii Krikunov, Nikolay Kotlyarov, Eugenii Nikolaev, Vasily Konovalov
    The integration of code assistance powered by Large Language Models (LLMs) into Integrated Development Environments (IDEs) has rapidly expanded, significantly influencing developer productivity. However, existing cloud-based solutions offered by third-party providers introduce critical privacy concerns due to storage and reuse of proprietary codebases for further model training. Addressing these concerns, we propose an enterprise-oriented approach to developing a customizable code-generation plugin for widely used IDEs, utilizing internally hosted LLMs. Through the proposed custom solution, this research explicitly quantifies the impact on developer productivity across various coding tasks.

    https://drive.google.com/file/d/10Y2_cOgUtZPViz5-oiXvG97fYxOUxKVH/view
  37. #DM205

    Interactive System for Reducing Error Propagation in Multi-Stage Ancient Egyptian Text Analysis

    Maksim Golyadkin, Innokentiy Humonen, Ilya Makarov
    We present a web-based system that brings an image-to-text pipeline for Ancient Egyptian hieroglyphs into a single interactive workspace. Instead of only producing a final transcription, our system exposes editable intermediate results so users can validate and correct the pipeline step by step. User edits are stored as image-aligned annotations, which supports both real-time text analysis and dataset creation. Quantitative results indicate improved efficiency and output quality relative to a manual baseline. A demonstration video is available at https://drive.google.com/file/d/1Wjy5vwbnX8kOqhb1ZHWWqi_qfXTUJHVr/view?usp=sharing.
  38. #DM209

    Double Bounded Neural Ray Queries

    Alexander Nikolaev, Nikolay Mozokhin, Roman Rodionov, Vladimir Frolov
    We introduce a novel neural ray tracing method designed for compact scene representations and real-time rendering. To compress the scene with minimal loss of fidelity, we address the issue of limiting the search space for intersections. For each scene we first construct two lightweight proxy shells that tightly bound the original surface from inside and outside. While executing ray queries, we intersect the rays with the shells and extract the regions that potentially contain the intersection with the original surface. Extracted regions are passed to a small neural network to retrieve the exact intersection location. We implement our method as part of a GPU-accelerated hybrid path tracing pipeline. We demonstrate it running real-time rendering on a variety of scenes, achieving up to 300x memory reduction and surpassing existing compressed ray tracing techniques in memory-quality trade-off.
    AI: Machine Learning
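The bounding step described in the Double Bounded Neural Ray Queries abstract (intersect the ray with an inner and an outer shell, keep only the regions between them) can be sketched with a toy geometry. Two concentric origin-centred spheres stand in for the proxy shells here purely as an assumption for illustration; the abstract does not specify the shell representation, and the neural network that refines the hit inside each interval is omitted.

```python
import math


def ray_sphere(origin, direction, radius):
    """Ray parameters (t_near, t_far) for an origin-centred sphere, or None on a miss.

    Assumes `direction` has unit length.
    """
    b = sum(o * d for o, d in zip(origin, direction))
    c = sum(o * o for o in origin) - radius * radius
    disc = b * b - c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    return (-b - root, -b + root)


def candidate_intervals(origin, direction, r_inner, r_outer):
    """Ray segments lying between the two shells, where the true surface may be hit.

    Assumes the ray starts outside the outer shell.
    """
    outer = ray_sphere(origin, direction, r_outer)
    if outer is None:
        return []                      # ray misses the outer shell: no hit possible
    inner = ray_sphere(origin, direction, r_inner)
    if inner is None:
        return [outer]                 # ray grazes only the gap between the shells
    # Entering segment (outer -> inner) and exiting segment (inner -> outer).
    return [(outer[0], inner[0]), (inner[1], outer[1])]


intervals = candidate_intervals((0.0, 0.0, -5.0), (0.0, 0.0, 1.0), 0.9, 1.1)
```

For this ray the search space shrinks from the whole ray to two short segments of length about 0.2 each, which is the kind of narrowing that lets a small network resolve the exact intersection.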
  39. #DM210

    CrossRefine: A Microservice for Cross-Domain Spatial Super-Resolution

    Daniil Sukhorukov, Andrei Zakharov, Ilya Makarov
    High-resolution spatial fields are critical for local decision-making, yet many operational and scientific workflows produce coarse outputs due to computational limits. We present CrossRefine, a deployable microservice for cross-domain spatial super-resolution that enhances multi-channel spatial tiles without modifying upstream models. It is built around a unified, topography-conditioned adversarial UNet trained across geographically diverse regions to ensure robustness to heterogeneous terrains and domain shifts. Unlike region-specific enhancement models, the system generalizes across domains within a single architecture, balancing numerical fidelity and structural realism through a hybrid regression–adversarial objective. The service provides REST API endpoints for batch and streaming inference, supports mixed-precision, and offers per-tile diagnostics and confidence maps to promote safe deployment.
    In our demo, we show interactive refinement of coarse spatial inputs, side-by-side comparison with interpolation and non-adversarial baselines, and real-time profiling of latency and throughput on commodity hardware. CrossRefine illustrates how spatial super-resolution can be delivered as a practical AI microservice, enabling scalable refinement of existing computational workflows without requiring higher-resolution upstream simulations.

    Demonstration video: https://shorturl.at/lz2un
    AI: Computer Vision · AI: Multidisciplinary Topics and Applications
  40. #DM211

    LELA: An End-to-end LLM-based Entity Linking Framework with Zero-shot Domain Adaptation

    Samy Haffoudhi, Nikola Dobricic, Fabian M. Suchanek, Nils Holzenberger
    Entity linking is a key component of many downstream NLP systems, yet existing approaches are often tied to specific target knowledge bases and domains, limiting their real-world applicability. In this paper, we extend LELA, a modular and domain-agnostic LLM-based entity disambiguation method, into a practical Python library that integrates zero-shot Named Entity Recognition (NER), thereby providing a complete end-to-end pipeline for entity linking in real-world usage. We provide experimental results validating LELA's performance and robustness across diverse entity linking settings. In our demo, users can try the system on their own input texts. All code is publicly available at https://github.com/NDobricic/LELA, and a video is at https://www.youtube.com/watch?v=WdupiRjLbR4.
  41. #DM216

    Sparse ProtoPatient: Interactive Multi-Prototype Explanations for Clinical Diagnosis Prediction

    Conor Fallon, Bogdan Kostić, Betty Van Aken, Jens-Michalis Papaioannou, Alexei Figueroa, Keno Bressem, Alexander Löser
    We present the Sparse ProtoPatient Demo, a publicly available interactive system for interpretable ICD-10 diagnosis prediction from clinical admission notes.
    The system is designed for clinicians in training, researchers, and educators exploring prototype-based diagnostic reasoning.

    The demo links predictions to learned prototypical patient representations and token-level evidence, allowing users to input custom text or select preset cases, inspect predicted ICD-10 codes, visualize label-wise saliency, retrieve supporting prototype notes, and compare alternative prototype cohorts.
    The demo provides a reproducible platform for interactive inspection of prototype-based clinical reasoning, enabling complementary opinion exploration, model auditing, and teaching of interpretable diagnosis prediction.
    The deployed model is trained on the publicly released CodiEsp corpus (1000 clinical notes, 955 ICD-10 labels) using a sparse multi-prototype architecture with five prototypes per label.
    We use the official machine-translated English CodiEsp-MT release to support English-language interaction.
    It achieves a macro-AUROC of 0.92 on a held-out test set and supports real-time interaction (300 ms per query).
    The system is fully containerized for public research and educational use.
  42. #DM218

    DeepL Voice: Real-Time Speech-to-Speech Translation

    Johannes Ernesti, Peter Kaiser, Jonas Heinze, Elnaz Shafaei-Bajestan, Kristina Geißler, Weiyue Wang, Johannes Beck, Sascha Brinker, Thorben Finke
    DeepL Voice is a real-time speech-to-speech translation system for global business communication, following a pragmatic incremental approach: developing a production-grade cascaded speech-to-speech-translation (S2ST) system, while exploring end-to-end solutions in parallel.
    The production system (launched November 2024) achieves competitive transcription quality through proprietary real-time ASR models and eliminates translation "flickering" via stable text streaming while maintaining low latency.
    Supporting 18 input languages and 30+ target languages, it offers DeepL Voice for Meetings (Microsoft Teams/Zoom integration) and DeepL Voice for Conversations (mobile apps).
    Key features include customizable formality and glossary support for business-appropriate communication, with voice cloning TTS under development.
    Demo Video: https://youtu.be/DMMcti2f4rc
  43. #DM220

    SteelAgent: An LLM-Orchestrated System for Physics-Informed Steel Property Prediction and Generalization Auditing

    Aleksandr Volkov, Roman Sultimov, Mikhail Kuzin, Yury Maximov
    Machine learning models for steel property prediction routinely report high quality metrics with R² > 0.85, yet these results rely on random splits that allow similar grades in both train and test sets. We present SteelAgent, an interactive system that exposes a critical generalization gap: the same models drop from R² > 0.85 to R² = 0.11 on unseen steel families, a more than sevenfold degradation in quality. Similarly, conformal prediction coverage degrades from 91% to 38% under distribution shift induced by holding out substantial data sources. SteelAgent combines physics-informed features grounded in classical metallurgy and interpretable models with conformal uncertainty quantification, and an LLM orchestrator that coordinates six domain-specific tools. The system supports property prediction with specification compliance checking, competitive steel comparison, and cost-aware inverse alloy design over 3,741 heat treatment records spanning 1,234 grades. All predictions are traceable through explicit tool calls, ensuring that all physical quantities are computed, not generated. We have made the code and data open-source and freely accessible to the community.

    Demo video: https://www.youtube.com/watch?v=BwVBJ-SwuQo. Live demo: https://steelagent.vercel.app
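The coverage figures in the SteelAgent abstract refer to how often conformal prediction intervals contain the true value. The abstract does not say which conformal variant is used, so the sketch below assumes plain split conformal prediction with absolute residuals. The function names and toy numbers are illustrative and are not drawn from the paper's data.

```python
import math


def conformal_radius(cal_residuals, alpha=0.1):
    """Interval half-width from calibration residuals (split conformal, absolute error).

    Uses the ceil((n + 1) * (1 - alpha))-th smallest residual; infinite if n is too small.
    """
    n = len(cal_residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(cal_residuals)[k - 1]


def coverage(y_true, y_pred, radius):
    """Share of targets that fall inside [prediction - radius, prediction + radius]."""
    hits = sum(abs(t - p) <= radius for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Toy calibration residuals from an in-distribution split.
radius = conformal_radius([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])

# Toy held-out family where errors are much larger than anything seen in calibration.
shifted_coverage = coverage([0.0, 0.0, 0.0, 0.0], [1.0, 5.0, 20.0, 30.0], radius)
```

The coverage guarantee of split conformal prediction relies on calibration and test data being exchangeable, which is why holding out entire data sources or steel families can push empirical coverage well below the nominal level.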
  44. #DM223

    Lucyde: A Demonstrator for Explainable Artificial Intelligence and Interactive Machine Learning

    Eda Ismail-Tsaous, Ute Schmid
    As AI‑based decision‑support systems become increasingly widespread, methods aimed at improving the performance of human-AI teams are gaining attention. In recent years, explainable artificial intelligence (XAI) has received growing interest, as it provides methods to make the behavior of machine learning models more transparent and can help to identify errors and flaws, which is particularly important in safety critical domains such as medicine and law. However, explanations themselves can be misleading, inconsistent, or incorrect, which makes it essential to raise awareness of the possibilities and limitations of these methods.
    We introduce Lucyde, a web‑based demonstrator designed to help users explore, compare, and better understand XAI methods across different datasets, models, and configurations. Lucyde provides a curated collection of explanation techniques, enables side‑by‑side comparison of methods, and offers easy‑to‑understand supplementary information for different user groups. It also illustrates interactive machine learning workflows by allowing users to correct model outputs or explanations. Lucyde thereby fosters informed and reflective engagement with AI systems and their explanations.

    The video can be found here: https://cloud.smartcitybamberg.de/s/pRHXdr6qxxng3Me
  45. #DM230

    From Surface Learning to Deep Understanding: A Grounded AI Tutoring System for Moodle

    Anna Ostrowska, Michał Kukla, Gabriela Majstrak, Jan Opala, Sebastian Pergała, Jan Skwarek, Anna Wróblewska
    This demo paper describes the development of the AI Teaching & Learning Assistant, a modular Moodle plugin that leverages Retrieval-Augmented Generation (RAG) to deliver high-quality, hallucination-free education. The system employs a dual-centric design, providing students with interactive, Socratic-based tutoring and educators with a "Human-in-the-loop" workspace for supervised content generation. By grounding Large Language Model (LLM) responses in teacher-provided materials, the assistant addresses the risks of misinformation while encouraging deep conceptual mastery. Evaluation via the Ragas (LLM-as-a-Judge) framework and a preliminary user study confirms its effectiveness, achieving faithfulness scores up to 0.97 and a 4.00/5.00 recommendation rate.
    Demo Video: https://tinyurl.com/4zz4bcjn
  46. #DM231

    [COMP25] The Automated Negotiating Agents Competition (ANAC) 2026 Challenges and Results

    Yasser Mohammad, Reyhan Aydoğan, Catholijn Jonker, Katsuhide Fujita, Tim Baarslag, Tamara Florijn
    This paper presents the primary research challenges and key findings from the 15th International Automated Negotiating Agents Competition (ANAC 2025), one of the official competitions of IJCAI 2025. We focus on two critical domains: multi-deal negotiations and the development of agents capable of concurrent negotiation within complex supply chain management environments. Furthermore, this work analyzes the results of the competition and outlines strategic directions for future iterations.
  47. #DM234

    LLARS: Enabling Domain Expert & Developer Collaboration for LLM Prompting, Generation and Evaluation

    Philipp Steigerwald, Mara Stieler, Jennifer Burghardt, Eric Rudolph, Jens Albrecht
    We demonstrate LLARS, an open-source platform that bridges the gap between domain experts and developers for building LLM-based systems.
    LLARS integrates three tightly connected modules into an end-to-end pipeline:
    Collaborative Prompt Engineering for real-time co-authoring with version control and instant LLM testing,
    Batch Generation for configurable output production across user-selected prompts × models × data with cost control, and
    Hybrid Evaluation where human and LLM evaluators jointly assess outputs through diverse assessment methods, with live agreement metrics and provenance analysis to identify the best model-prompt combination for a given use case.
    New prompts and models are automatically available for batch generation and completed batches can be turned into evaluation scenarios with a single click.
    Interviews with six domain experts and three developers in online counselling confirmed that LLARS feels intuitive, saves considerable time by keeping everything in one place and makes interdisciplinary collaboration seamless.
    Source code: github.com/th-nuernberg/llars
    AI: Humans and AI · AI: Natural Language Processing · AI: Multidisciplinary Topics and Applications
  48. #DM237

    PERELMAN: Pipeline for scientific literature meta-analysis

    Daniil Sherki, Daniil Merkulov, Aleksandra Savina, Dzhantemir Kikov, Dmitry Parpulov, Alexander Ivanov, Artem Abakumov, Ekaterina Muravleva
    We present PERELMAN (PipEline foR sciEntific Literature Meta-ANalysis), an agentic framework designed to extract specific information from a large corpus of scientific articles to support large-scale literature reviews and meta-analyses. Our central goal is to reliably transform heterogeneous article content into a unified, machine-readable representation. PERELMAN first elicits domain knowledge, including target variables, inclusion criteria, units, and normalization rules, through a structured dialogue with a subject-matter expert. This domain knowledge is then reused across multiple stages of the pipeline and guides coordinated agents in extracting evidence from narrative text, tables, and figures, enabling consistent aggregation across studies. In order to assess reproducibility and validate our implementation, we evaluate the system on the task of reproducing the meta-analysis of layered Li-ion cathode properties NMC811 reported in [Savina and Abakumov, 2023]. We describe our solution, which has the potential to reduce the time required to prepare meta-analyses from months to minutes.
  49. #DM239

    An Automated Maintenance Plant for Highways

    N'zebo Richard Anvo, Alwyn Mathew, Lavindra de Silva, Damian Palin, Jie Xu, Samuel Schaefer, Abir Al-Tabbaa, Fumiya Iida, Ioannis Brilakis
    The Digital Roads project at Cambridge University is leveraging digitalisation, automation, and low-carbon materials to build an Automated Maintenance Plant (AMP) for UK road networks, aimed at minimising repair times to reduce congestion, improving safety, and contributing to the UK’s net-zero goals through faster, more accurate, and efficient road maintenance.
  50. #DM249

    GEV: Statically Correct and Programmable Knowledge Graph Updates

    Eduard Kamburjan, Shqiponja Ahmetaj, Chinmayi Prabhu Prasad Baramashetru, Paolo Pareti
    Knowledge Graphs (KGs) evolve over time and it is critical to ensure that their integrity constraints are maintained after each update.
    We introduce GEV, the first tool to statically ensure that a KG update in Java preserves satisfaction of SHACL constraints. This allows verification of updates at design time, and eliminates the need for costly continuous revalidation.
    GEV is a command-line system that loads and verifies updates, applies them to a loaded KG, and keeps track of the validation status. Internally, it relies on SHACL graph updates, a theoretical framework with a method for static verification.
    AI: Knowledge Representation and Reasoning