Accepted Papers – IJCAI 2026

IJCAI-ECAI 2026 Accepted Papers · Special Track on AI and Social Good

Presentation format

Every accepted paper is presented in two formats: an oral talk (6 min talk + 2 min for Q&A), which one of the authors must deliver in person in Bremen, and a poster (A0, free format) during a dedicated poster session.

#AI4G6

Session Aug 20 · 15:00–16:30 · Room 12

Poster Aug 20 · 16:30–18:00

PhyTTA: Physics-Informed Test-Time Adaptation of Foundation Models for Regional Drought Prediction

Wentao Gao, Jiuyong Li, Lin Liu, Thuc Duy Le, Jixue Liu, Yun Chen, Yanchang Zhao

Drought prediction is crucial for disaster mitigation, yet it remains challenging because regional droughts reflect nonlinear interactions among precipitation supply, evaporative demand, and slowly varying land–atmosphere states. Although time-series foundation models (TSFMs) have shown strong zero-shot performance in general forecasting tasks, they can exhibit a systematic sensitivity mismatch when predicting regional precipitation and temperature anomalies in the Standardized Precipitation Evapotranspiration Index (SPEI). We introduce PhyTTA, a physics-informed test-time adaptation framework that keeps the TSFM backbone frozen and corrects the predictable part of its one-step residual through three components: a dynamic forcing correction, a static tracker of slowly varying background bias, and a conditional confidence gate that applies the correction only in supported regimes. Our analysis shows that, when frozen-model residuals contain a component predictable from local forcing changes or slowly varying background bias, correcting this component reduces the expected squared error in the population setting. PhyTTA implements this principle through a conservative online ridge-EMA correction with regime-support gating. Across four foundation backbones at three South Australian locations, and with TimesFM at three further regions for cross-regional evaluation, PhyTTA reduces MSE by up to 19.78%. With the TimesFM backbone, it yields consistent MSE reductions of 12.3% to 15.1% across the three South Australian locations.
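The correction principle above (freeze the backbone, predict the residual from a forcing change plus a slowly varying bias, and apply the correction only in supported regimes) can be sketched in a few lines. This is a minimal, hypothetical sketch, not the authors' implementation: the single scalar forcing feature `x`, the decay and ridge constants, and the range-based gate in `GatedResidualCorrector` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class GatedResidualCorrector:
    """Hypothetical residual corrector for a frozen forecaster (not PhyTTA's code).

    residual = observed value - frozen-model forecast, modelled as
    w * x + bias, where x is a scalar forcing change (an assumed feature).
    """

    decay: float = 0.9        # forgetting factor for the ridge statistics
    ridge: float = 1.0        # ridge penalty that shrinks w toward 0
    bias_decay: float = 0.95  # EMA decay for the slowly varying bias
    min_obs: int = 5          # gate stays closed until this many updates
    sxx: float = 0.0
    sxy: float = 0.0
    bias: float = 0.0
    n_obs: int = 0
    x_min: float = float("inf")
    x_max: float = float("-inf")

    def weight(self) -> float:
        # Closed-form 1-D ridge estimate on exponentially weighted statistics.
        return self.sxy / (self.sxx + self.ridge)

    def supported(self, x: float) -> bool:
        # Regime-support gate: enough history and x inside the range seen so far.
        return self.n_obs >= self.min_obs and self.x_min <= x <= self.x_max

    def correct(self, base_forecast: float, x: float) -> float:
        if not self.supported(x):
            return base_forecast  # unsupported regime: keep the frozen forecast
        return base_forecast + self.weight() * x + self.bias

    def update(self, x: float, residual: float) -> None:
        # Dynamic part: fit w on the residual left after removing the bias.
        target = residual - self.bias
        self.sxx = self.decay * self.sxx + x * x
        self.sxy = self.decay * self.sxy + x * target
        # Static part: EMA of whatever the dynamic term does not explain.
        leftover = residual - self.weight() * x
        self.bias = self.bias_decay * self.bias + (1 - self.bias_decay) * leftover
        self.n_obs += 1
        self.x_min = min(self.x_min, x)
        self.x_max = max(self.x_max, x)


corrector = GatedResidualCorrector()
for step in range(200):
    x = [-1.0, -0.5, 0.5, 1.0][step % 4]
    corrector.update(x, residual=0.5 * x + 0.2)  # synthetic residuals
print(corrector.correct(10.0, 0.5))  # corrected forecast; synthetic truth is 10.45
print(corrector.correct(10.0, 5.0))  # 10.0: x = 5.0 lies outside the range seen so far
```

The ridge penalty deliberately shrinks `w` toward zero, which is one simple way to keep an online correction conservative when little data has been seen.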

AI4G · Data Mining · Multidisciplinary Topics and Applications · Humans and AI

#AI4G10

Session Aug 20 · 15:00–16:30 · Room 12

Poster Aug 20 · 16:30–18:00

When Can We Trust Fairness Audits? Identifying Reliability Boundaries of Third-party Audit Conclusions

Yuanhao Liu, Qi Cao, Huawei Shen

Fairness auditing aims to assess whether a model is fair, playing a critical role in identifying potential risks in deployed AI systems. In practice, due to limited access, third-party auditors often rely on self-collected datasets (e.g., via sock-puppets), which may differ from real-world deployment scenarios. Such discrepancy can lead to inconsistencies between audit conclusions on the collected data and those in actual deployment, raising concerns about the reliability of third-party audits. This motivates a critical question: When can we trust the fairness audit conclusion derived from third-party datasets? Answering this question is challenging, as the actual deployment distribution is typically inaccessible or unobservable. To tackle this, we introduce the Consistency Radius, a metric that quantifies the maximum distribution shift under which an audit conclusion based on a third-party dataset remains consistent. We further propose an optimization method based on convex relaxation that estimates the radius relying solely on model responses over the audit dataset. Leveraging this framework, third-party auditors can provide their datasets to model providers and request the magnitude of the distributional discrepancy relative to the deployment distribution, enabling reliable audit conclusions without requiring any direct data access.

AI4G · AI Ethics, Trust, Fairness

#AI4G12

Session Aug 20 · 15:00–16:30 · Room 12

Poster Aug 20 · 16:30–18:00

STAMP: Multi-Pattern Attention-Aware Multiple Instance Learning for STAS Diagnosis in Multi-Center Histopathology Images

Liangrui Pan, Xiaoyu Li, Chenchen Nie, Yaning Yang, Shaoliang Peng

Spread through air spaces (STAS) constitutes a novel invasive pattern in lung adenocarcinoma (LUAD), associated with tumor recurrence and diminished survival rates. However, large-scale STAS diagnosis in LUAD remains labor-intensive and prone to oversight and misdiagnosis because of its distinctive pathological characteristics and morphological features. Consequently, there is a pressing clinical need to leverage deep learning models for STAS diagnosis. This study first assembled histopathological images from STAS patients at the Second Xiangya Hospital and the Third Xiangya Hospital of Central South University, alongside the TCGA-LUAD cohort. Three senior pathologists performed cross-verified annotation to construct the STAS-SXY, STAS-TXY, and STAS-TCGA datasets. We then propose STAMP, a multi-pattern attention-aware multiple instance learning framework, to analyze and diagnose the presence of STAS across multi-center histopathology images. Specifically, a dual-branch architecture guides the model to learn STAS-associated pathological features from distinct semantic spaces. Transformer-based instance encoding and a multi-pattern attention aggregation module dynamically select regions closely associated with STAS pathology, suppressing irrelevant noise and enhancing the discriminative power of global representations. Moreover, a similarity regularization constraint prevents feature redundancy across branches, thereby improving overall diagnostic accuracy. Extensive experiments demonstrate that STAMP achieves competitive diagnostic results on STAS-SXY, STAS-TXY, and STAS-TCGA, with AUCs of 0.8058, 0.8017, and 0.7928, respectively, surpassing the clinical level. Results for 10 open baselines establish a benchmark for STAS diagnostic research and support future work on the generalizability and clinical integration of computational pathology technologies.
Dataset features and code are accessible at https://github.com/panliangrui/IJCAI2026.
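STAMP's aggregation step selects diagnostically relevant regions by attention. As a rough picture of that idea, the sketch below implements generic softmax-attention pooling for multiple instance learning, with one pass per scoring vector and the pooled vectors concatenated. It is an assumption-laden toy, not STAMP's module: the scoring vectors are fixed by hand rather than learned, and the Transformer instance encoder, dual-branch split, and similarity regularizer are omitted. `attention_pool` and `multi_pattern_pool` are hypothetical names.

```python
import math


def attention_pool(instances, score_weights):
    """Softmax-attention pooling of instance (patch) embeddings into one bag embedding."""
    scores = [sum(w * v for w, v in zip(score_weights, inst)) for inst in instances]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    dim = len(instances[0])
    bag = [sum(a * inst[d] for a, inst in zip(attn, instances)) for d in range(dim)]
    return bag, attn


def multi_pattern_pool(instances, patterns):
    """One attention pass per (hypothetical) pattern vector; outputs are concatenated."""
    pooled, maps = [], []
    for score_weights in patterns:
        bag, attn = attention_pool(instances, score_weights)
        pooled.extend(bag)
        maps.append(attn)
    return pooled, maps


patches = [[1.0, 0.0], [0.0, 1.0], [5.0, 0.0]]  # toy patch embeddings
patterns = [[1.0, 0.0], [0.0, 1.0]]             # toy, hand-set scoring vectors
slide_vec, attn_maps = multi_pattern_pool(patches, patterns)
print(len(slide_vec))  # 4: two patterns x two embedding dimensions
```

Each attention map can be read as a per-patch relevance score, which is the property that makes attention pooling attractive for highlighting candidate regions in a slide.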

Knowledge Representation and Reasoning · Machine Learning · AI4G · Computer Vision · Data Mining

#AI4G33

Session Aug 20 · 15:00–16:30 · Room 12

Poster Aug 20 · 16:30–18:00

Suicide Risk Assessment from AI-powered Video Surveillance: An Interpretable Framework for Prevention in Metro Stations

Safwen Naimi, Wassim Bouachir, Guillaume-Alexandre Bilodeau, Brian Mishara

Understanding and monitoring human behavior in metro stations play an important role in supporting suicide prevention efforts, where early identification of high-risk situations can enable timely intervention. This requires assessing suicide risk from surveillance video by jointly reasoning about each passenger's behavior, spatial context, and temporal dynamics. However, performing this assessment on footage from surveillance cameras is challenging, as it demands accurate perception of human motion, understanding of platform geometry, and aggregation of heterogeneous behavioral cues over time. In this work, we formalize the task of Suicide Risk Assessment (SRA) in metro stations and introduce the first interpretable framework that addresses this challenge. Unlike approaches that focus on isolated subtasks or attempt to infer intent directly, our formulation assesses suicide risk from accumulated evidence by incorporating person tracking, activity recognition, semantic segmentation of the platform, and trajectory-driven risk heatmap modeling. By formalizing SRA as a distinct task and benchmarking a complete operational pipeline that achieves an ROC-AUC of 83.2% on real surveillance data, this work highlights the complexity of suicide risk assessment and opens new directions for research on interpretable AI systems for social good.
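One simple reading of "trajectory-driven risk heatmap modeling" is to accumulate each tracked person's positions into a grid and measure how much of their time falls in platform-edge cells produced by segmentation. The sketch below does exactly that and nothing more; the grid, the edge-cell set, and the dwell-ratio score (`dwell_heatmap`, `edge_dwell_ratio`) are illustrative assumptions, not the framework's actual risk model.

```python
from collections import Counter


def dwell_heatmap(trajectory, cell_size):
    """Count the frames a tracked person spends in each (col, row) grid cell."""
    return Counter((int(x // cell_size), int(y // cell_size)) for x, y in trajectory)


def edge_dwell_ratio(heatmap, danger_cells):
    """Fraction of observed frames spent in cells flagged as near the platform edge."""
    total = sum(heatmap.values())
    if total == 0:
        return 0.0  # no observations, no evidence
    return sum(n for cell, n in heatmap.items() if cell in danger_cells) / total


track = [(0.5, 0.5), (0.6, 0.4), (1.5, 0.5), (1.6, 0.2)]  # toy positions in metres
edge_cells = {(1, 0)}  # hypothetical output of platform segmentation
ratio = edge_dwell_ratio(dwell_heatmap(track, cell_size=1.0), edge_cells)  # 0.5
```

A per-person score like this is one kind of accumulated evidence; a real system would combine it with activity-recognition cues over time rather than use it alone.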

Computer Vision · Humans and AI · Multidisciplinary Topics and Applications

#AI4G34

Session Aug 20 · 15:00–16:30 · Room 12

Poster Aug 20 · 16:30–18:00

Policy-Embedded Graph Expansion: Networked HIV Testing with Diffusion-Driven Network Samples

Akseli Kangaslahti, Davin Choo, Lingkai Kong, Milind Tambe, Alastair van Heerden, Cheryl Johnson

HIV is a retrovirus that attacks the human immune system and can lead to death without proper treatment. In collaboration with the WHO and the University of the Witwatersrand, we study how to improve the efficiency of HIV testing with the goal of eventual deployment, directly supporting progress toward UN Sustainable Development Goal 3.3. While prior work has demonstrated the promise of intelligent algorithms for sequential, network-based HIV testing, existing approaches rely on assumptions that are impractical in our real-world implementations. Here, we study sequential testing on incrementally revealed disease networks and introduce Policy-Embedded Graph Expansion (PEGE), a novel framework that directly embeds a generative distribution over graph expansions into the decision-making policy rather than attempting explicit topological reconstruction. We further propose Dynamics-Driven Branching (DDB), a diffusion-based graph expansion model that supports decision making in PEGE and is designed for data-limited settings where forest structures arise naturally, as in our real-world referral process. Experiments on real HIV transmission networks show that the combined approach (PEGE + DDB) consistently outperforms baselines (e.g., a 17.3% improvement in discounted reward and 15.4% more HIV detections with 25% of the population tested), and we further explore the key trade-offs that drive solution quality.

Multidisciplinary Topics and Applications

#AI4G35

Session Aug 20 · 15:00–16:30 · Room 12

Poster Aug 20 · 16:30–18:00

DART: Navigating Last-Mile Heterogeneity in Instant Delivery via Distribution-Adaptive Splines

Hao Xiong, Yang Gao, Haiyong Luo, Fang Zhao, Dan Luo

On-demand delivery platforms rely on Travel Time Estimation (TTE) to balance courier earnings and overdue risks. In collaboration with one of China's largest platforms, we address a critical "Fairness Gap" in TTE: current systems fail to capture complex delivery patterns in GNSS-denied environments, subjecting couriers who handle high volumes of concurrent orders to disproportionate pressure from overdue deliveries. Analyzing 1.27 million real-world trajectories, we attribute this bias to unique challenges in GNSS-denied scenarios: distributional heterogeneity, structural heterogeneity, and contextual uncertainty. To bridge this gap, we propose DART (Distribution-Adaptive Robust Timing). DART incorporates a Learnable Adaptive Spline (LAS) encoder with a gradient-driven knot migration mechanism to enhance non-linear expressiveness for outliers, significantly improving long-tail accuracy. Furthermore, a Spatio-Temporal Transition Graph (STTG) reconstructs the latent topology by integrating sequence semantics, such as Wi-Fi-sensed merchant arrival timestamps. In addition, a Distribution Gating Mechanism characterizes delivery time distributions under distinct contexts. Through extensive experiments and large-scale online A/B testing, DART not only reduces MAE by 14.0% in complex environments but also decreases the Order Overdue Rate by 1.7% (saving $24,000 daily), demonstrating how AI effectively reconciles operational efficiency with labor fairness.

Machine Learning · Data Mining · Humans and AI

#AI4G38

Session Aug 20 · 15:00–16:30 · Room 12

Poster Aug 20 · 16:30–18:00

Beyond Vision: A Multimodal Dataset and Framework for Pest Recognition via Plant Electrophysiological Signals

Lu Wang, Jiaming Lin, Yuting Ye, Tao Wei, Chuchu Qin

Precise pest identification is essential for sustainable agriculture. Current visual recognition systems are brittle in the wild, where performance degrades due to occlusion and variable illumination. In contrast, plant electrophysiological signals serve as a robust, all-weather physiological modality, capable of detecting cryptic feeding behaviors that escape optical sensors. However, this field remains constrained by the scarcity of data and the absence of specialized algorithms. To bridge this gap, we introduce the Herbivory-Induced Plant Bio-signal Multimodal (HIPB-MM) dataset, the first fine-grained dataset comprising 4,023 synchronized plant electrophysiological signal-video pairs recording the feeding processes of three typical pest species. To address the weak and non-stationary nature of these signals, we propose the Herbivory-Induced Physiological Sensing (HIPS) framework. It integrates a Morphological Semantic Decoupling strategy to recover robust slow-wave semantics, and a Generation-State Encoder to model latent physiological states. Complementing this, an auxiliary dual-stream visual branch calibrates signal representations using explicit behavioral and morphological cues. Experiments demonstrate that HIPS establishes a solid benchmark (69.81% accuracy), comprehensively outperforming state-of-the-art baselines. Crucially, this work validates plant electrophysiology as a low-cost, all-weather modality for sustainable crop protection, effectively reducing pesticide dependency and safeguarding ecosystem health.

Data Mining · Machine Learning · Multidisciplinary Topics and Applications

#AI4G39

Session Aug 20 · 15:00–16:30 · Room 12

Poster Aug 20 · 16:30–18:00

Towards an Early Warning System for Ocean Heat Extremes Through AI-Ocean Dynamics Synergy

Zheng Jiang, Wei Wang, Gaowei Zhang, Yifei Bao, Zengzhou Hao, Lingyu Xu, Suixiang Shi, Lei Wang, Yi Wang

Ocean heat extremes, including marine heatwaves and the El Niño–Southern Oscillation (ENSO), exert profound impacts on marine ecosystems and socio-economic stability. Establishing robust early warning systems is critical for proactive risk management; however, conventional predictive models often fail to generalize to the intensifying, non-stationary extremes driven by rapid global warming. This project introduces a novel AI-Ocean Dynamics synergy designed to provide an integrated early warning system. By synthesizing multi-source observations with physics-informed neural networks, it ensures predictions remain constrained by fundamental physical laws. The system forecasts event onset, intensity, duration, and spatial extent while simultaneously attributing the underlying mechanisms, such as ocean advection and air–sea heat exchange. To validate performance, we establish a specialized ocean heat extremes benchmark to assess predictive skill and attribution reliability. Furthermore, the system incorporates an incremental learning mechanism, enabling continuous adaptation to long-term climatic and environmental change. This project advances the development of reliable, interpretable, and adaptive early warning systems, providing a vital tool for informed policy and maritime decision-making.

Multidisciplinary Topics and Applications

#AI4G43

Session Aug 20 · 15:00–16:30 · Room 12

Poster Aug 20 · 16:30–18:00

Improving Survey Participation in Low-Literacy Populations Through Value-Sensitive Conversational AI

Raj Gaurav Maurya

Collecting reliable social data from low-literacy populations remains a persistent challenge, particularly when surveys involve sensitive topics and marginalized communities. Traditional paper-based and web-based survey modalities often suffer from high attrition and incomplete responses due to literacy barriers, social pressure, and interactional discomfort. In this paper, we present findings from an initial field evaluation, conducted with low-literacy women across India, that compares four survey modalities: paper-based interviews, digital web-based surveys, conversational AI (convAI) surveys, and convAI enhanced with layered value-sensitive design. Using data from 315 participants, we show that convAI significantly improves survey completion rates relative to traditional modalities, with the highest completion and lowest drop-off observed when value-sensitive and culturally aligned conversational design elements are fully integrated. These results demonstrate the importance of human-centered and value-sensitive interaction design in enabling inclusive, ethical, and scalable data collection, and motivate more "AI for social good" applications.

AI4G · Humans and AI · Multidisciplinary Topics and Applications · AI Ethics, Trust, Fairness

#AI4G44

Session Aug 20 · 15:00–16:30 · Room 12

Poster Aug 20 · 16:30–18:00

CHAM-net: A Contrastive Hierarchical Adaptive Meta-network for Robust Global Methane Flux Prediction

Rongchao Dong, Yiming Sun, Shuo Chen, Youmi Oh, Licheng Liu, Yiqun Xie, Xiaowei Jia

Methane is a potent greenhouse gas that significantly contributes to global warming. However, accurately estimating global methane emissions and consumption remains challenging due to the complex interactions among environmental drivers that may vary across spatial and temporal scales. Prior data-driven methods often overlook the inherent spatiotemporal heterogeneity of ecosystems, failing to explicitly capture site-specific characteristics and cross-year evolutionary dynamics. To address these issues, we propose the Contrastive Hierarchical Adaptive Meta-network (CHAM-net), a novel framework that explicitly learns from historical context to capture site-specific dynamics. CHAM-net employs a hierarchical encoder–decoder architecture, in which the encoder captures site-specific characteristics from historical data and then dynamically conditions the decoder to generate the final prediction. Experimental results demonstrate that CHAM-net consistently outperforms all baseline methods on both simulation and observational datasets for methane emission and consumption, achieving nRMSE values as low as 0.43 and 0.88 with corresponding R² scores up to 0.97 and 0.68 for emission prediction.

Data Mining · Machine Learning · Multidisciplinary Topics and Applications

#AI4G50

Session Aug 19 · 15:00–16:30 · Room 12

Poster Aug 19 · 16:30–18:00

Optimizing Sensor Placement with Greedy Algorithms: A Case Study in Wildlife Camera Trapping for Spatial Capture-Recapture Population Estimation

Hannah Murray, Amrita Gupta, Arielle W. Parsons, Justin P. Suraci, Bistra Dilkina

Estimating wildlife populations is central to conservation planning, yet designing sensor deployments that produce reliable data for such estimates remains challenging. Spatial capture-recapture (SCR) models, widely used to estimate animal population sizes, are highly sensitive to sensor layout, where poor placement can substantially increase uncertainty in population estimates. We present a novel approach that formulates sensor placement for SCR as a scenario-based optimization problem under real-world resource constraints. Using collections of simulated animal capture histories spanning ecologically plausible parameter ranges, candidate sensor placements are evaluated via closed-form, SCR-derived design criteria linked to the precision of population estimates and optimized using both genetic algorithms and a greedy search strategy. We demonstrate our approach through an American marten camera-trapping case study in British Columbia's South Chilcotin Mountains, achieving lower relative standard error and bias in population estimates than single-scenario and grid-based baselines. Our method was co-developed with conservation practitioners and is currently being used in real-world monitoring programs. This framework offers a general approach for designing wildlife surveys that support reliable population estimation across a range of realistic ecological scenarios.
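The greedy strategy mentioned above adds sensors one at a time, each time choosing the candidate that most improves a design criterion averaged over simulated scenarios. The sketch below shows that loop with a deliberately simple stand-in criterion: the number of simulated activity centers within a detection radius of some sensor. This coverage score, and the names `detections` and `greedy_placement`, are assumptions for illustration; the paper instead uses closed-form, SCR-derived criteria tied to estimator precision, which are not reproduced here.

```python
import math


def detections(sites, centers, radius):
    """Simulated activity centers within `radius` of at least one chosen site."""
    return sum(any(math.dist(c, s) <= radius for s in sites) for c in centers)


def greedy_placement(candidates, scenarios, budget, radius):
    """Add, one at a time, the site that most improves the scenario-averaged score."""
    chosen = []
    for _ in range(budget):
        best, best_score = None, -1.0
        for cand in candidates:
            if cand in chosen:
                continue
            score = sum(detections(chosen + [cand], sc, radius) for sc in scenarios)
            score /= len(scenarios)
            if score > best_score:
                best, best_score = cand, score
        if best is None:  # budget exceeds the number of candidates
            break
        chosen.append(best)
    return chosen


candidates = [(0.0, 0.0), (10.0, 0.0), (5.0, 5.0)]
scenarios = [
    [(0.0, 1.0), (1.0, 0.0), (10.0, 1.0)],  # scenario 1: simulated activity centers
    [(0.0, 0.5), (9.0, 0.0)],               # scenario 2
]
print(greedy_placement(candidates, scenarios, budget=2, radius=2.0))
# [(0.0, 0.0), (10.0, 0.0)]
```

Averaging over scenarios, rather than optimizing for a single assumed parameter set, is what makes the layout robust to uncertainty about the animals' true movement and density.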

Constraint Satisfaction and Optimization · Multidisciplinary Topics and Applications

#AI4G52

Session Aug 19 · 15:00–16:30 · Room 12

Poster Aug 19 · 16:30–18:00

PACE: A Personalized Adaptive Curriculum Engine for 9-1-1 Call-taker Training

Zirong Chen, Hongchao Zhang, Meiyi Ma

9-1-1 call-taking training requires mastery of over a thousand interdependent skills, covering diverse incident types and protocol-specific nuances. A nationwide labor shortage is already straining training capacity, yet effective instruction still requires trainers to tailor objectives to each trainee's evolving competencies, a personalization burden that current practice cannot scale.
Partnering with the Metro Nashville Department of Emergency Communications (MNDEC), we propose PACE (Personalized Adaptive Curriculum Engine), a co-pilot system that augments trainer decision-making by (1) maintaining probabilistic beliefs over trainee skill states, (2) modeling individual learning and forgetting dynamics, and (3) recommending training scenarios that balance acquisition of new competencies with retention of existing ones.
PACE propagates evidence over a structured skill graph to accelerate diagnostic coverage and applies contextual bandits to select scenarios that target gaps the trainee is prepared to address.
Empirical results show that PACE achieves 19.50% faster time-to-competence and 10.95% higher terminal mastery than state-of-the-art frameworks. Co-pilot studies with practicing training officers further demonstrate a 95.45% alignment rate between PACE's and experts' pedagogical judgments on real-world cases. For estimation, PACE cuts turnaround time from 11.58 minutes to just 34 seconds, a reduction of up to 95.08%.
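PACE's first ingredient, a probabilistic belief over each trainee skill, can be illustrated with a Beta-Bernoulli model whose evidence decays over time to mimic forgetting. The sketch covers only belief tracking plus a naive "closest to a target mastery" scenario picker; it is not the contextual bandit or skill-graph propagation that PACE uses. The `SkillBelief` class, the `pick_scenario` helper, the 0.6 target, and the skill and scenario names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SkillBelief:
    """Beta(alpha, beta) belief that a trainee has mastered one skill."""

    alpha: float = 1.0  # pseudo-count of successful demonstrations
    beta: float = 1.0   # pseudo-count of failures

    @property
    def mastery(self) -> float:
        # Posterior mean probability of mastery.
        return self.alpha / (self.alpha + self.beta)

    def observe(self, success: bool) -> None:
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def forget(self, rate: float) -> None:
        # Shrink accumulated evidence toward the uniform Beta(1, 1) prior.
        self.alpha = 1.0 + (1.0 - rate) * (self.alpha - 1.0)
        self.beta = 1.0 + (1.0 - rate) * (self.beta - 1.0)


def pick_scenario(scenarios, beliefs, target=0.6):
    """Choose the scenario whose skills' mean mastery is closest to `target`."""
    def gap(name):
        skills = scenarios[name]
        mean = sum(beliefs[s].mastery for s in skills) / len(skills)
        return abs(mean - target)
    return min(scenarios, key=gap)


beliefs = {"cpr": SkillBelief(), "address_verification": SkillBelief()}
for outcome in (True, True, True):
    beliefs["cpr"].observe(outcome)
scenarios = {"cardiac_arrest": ["cpr"], "unknown_location": ["address_verification"]}
next_up = pick_scenario(scenarios, beliefs)  # "unknown_location"
```

A bandit-based selector would replace `pick_scenario` with a policy that learns, from trainee outcomes, which scenarios actually produce mastery gains in a given context.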

Multidisciplinary Topics and Applications

#AI4G53

Session Aug 19 · 15:00–16:30 · Room 12

Poster Aug 19 · 16:30–18:00

An LLM-based Chain-of-Response Counter-Scam System

Heedou Kim, Mogan Gim, Donghee Choi, Soonil Bae, Hoonick Lee, Mi-Young Kim, Jaewoo Kang

The rapid evolution of online scams, driven by transnational networks and mass-produced social engineering scenarios, has exposed the speed limitations of conventional detection, necessitating tighter inter-agency coordination. While LLMs show promise in scam identification, their role in accelerating integrated response frameworks remains underexplored. We propose Counter-Scam, a unified LLM-based multi-agent framework that orchestrates end-to-end response from initial detection to crime investigation. The framework first proposes safe data guidelines, emphasizing non-public scam data and secure dataset construction via scam-specific NER. Developed with insights from 37 stakeholders to reduce delays and improve analytical efficiency, the system integrates CSRA (multi-agent mitigation), CSRT (nine role-aligned NLP tasks), and CSRD (a corpus of 185,300 scam cases and 38,587 knowledge entries). Experiments show that fine-tuned sLLMs surpass commercial models by over 10% on all CSRT tasks and achieve a 0.24 F1 improvement in scam-specific NER. These results demonstrate the framework's capability to enable rapid, collaborative mitigation of online scams.

Agent-based and Multi-agent Systems · AI Ethics, Trust, Fairness · Multidisciplinary Topics and Applications · Natural Language Processing

#AI4G62

Session Aug 19 · 15:00–16:30 · Room 12

Poster Aug 19 · 16:30–18:00

Domain-Informed Graph Neural Networks for Climate Factor Forecasting to Support Sustainable Crop Management

Ziyue Sun, Zixin Jiang, Chenkai Xu, Xinggao Liu

Forecasting climate factors is critical for anticipating agro-climatic risks and enabling sustainable crop management. However, accurate prediction remains challenging due to complex spatiotemporal variability, heterogeneous seasonal patterns, and intricate interdependencies among climate variables. Inspired by agronomic knowledge, we propose DoIGNN, a Domain-Informed Graph Neural Network that injects a domain-structured graph constraint built from Agro-Climatic Homogeneous Zones (ACHZs). Specifically, we partition stations into agro-climatic zones using long-term climatic statistics and location attributes, and construct a hierarchical ACHZ-guided adjacency. To better capture shared climate dynamics, we introduce a spatiotemporal decomposition module with temporal regularization that factorizes the climate tensor into low-rank global temporal bases and station loadings, yielding a compact station-level global component as auxiliary information for target forecasting. Finally, DoIGNN performs forecasting on both the ACHZ-guided and static-dynamic graphs to learn cross-region dependencies. Experiments on real-world climate datasets demonstrate that DoIGNN consistently improves forecasting accuracy over strong baselines while yielding more interpretable spatial dependency patterns that support climate-informed crop management decisions. In cooperation with the Ningbo Natural Resources and Planning Big Data Center, the proposed model has been trained and deployed for local data analysis.
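The decomposition module factorizes the climate data into global temporal bases and station loadings. For a single climate variable and rank 1, that idea reduces to approximating a station-by-time matrix by an outer product, which alternating least squares can fit. This is a simplified, hypothetical sketch (`rank1_factorize` is an illustrative name): DoIGNN's module uses a low-rank, not rank-1, factorization of a tensor and adds temporal regularization, neither of which appears below.

```python
def rank1_factorize(X, iters=50):
    """Alternating least squares for X[s][t] ~ loading[s] * basis[t].

    X is a list of station rows, each a list of values over time.
    Assumes X is not all zeros (otherwise the normalizers below are 0).
    """
    n_s, n_t = len(X), len(X[0])
    basis = [1.0] * n_t  # assumed start; must not be orthogonal to the true basis
    for _ in range(iters):
        # Fix the temporal basis, solve for each station's loading.
        bb = sum(b * b for b in basis)
        loading = [sum(X[s][t] * basis[t] for t in range(n_t)) / bb for s in range(n_s)]
        # Fix the loadings, solve for the shared temporal basis.
        ll = sum(v * v for v in loading)
        basis = [sum(X[s][t] * loading[s] for s in range(n_s)) / ll for t in range(n_t)]
    return loading, basis


X = [[3.0, 4.0, 5.0], [6.0, 8.0, 10.0]]  # toy data: 2 stations x 3 time steps
loading, basis = rank1_factorize(X)
global_component = [[l * b for b in basis] for l in loading]
```

The product `loading[s] * basis[t]` plays the role of the station-level global component; the residual `X[s][t] - loading[s] * basis[t]` is what a downstream forecaster would still need to explain.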

Data Mining · Machine Learning · Multidisciplinary Topics and Applications

#AI4G69

Session Aug 19 · 15:00–16:30 · Room 12

Poster Aug 19 · 16:30–18:00

Scalable Mapping of Tree Traits to Study the Dynamics of Protected Forest Areas in India

Dhruvi Goyal, Aaditeshwar Seth

Monitoring forest functional diversity is essential to understand ecosystem resilience in the face of rapid environmental change. While existing remote sensing approaches primarily track structural attributes such as canopy density and tree height, functional traits like leaf phenology (evergreen vs. deciduous) and leaf type (broadleaf vs. needleleaf) reveal more direct information about adaptive strategies of tree species. This study presents a scalable machine learning framework for mapping these traits across India at 10 m resolution using Google AlphaEarth Foundations (AEF) embeddings, which capture the complete annual spectral reflectance and radar signatures of the land surface. A key contribution we make is to curate an ML-ready training dataset by combining tree traits information with tree species occurrence data, and to obtain a diverse sample from this data based on spectral time-series to ensure the dataset captures a wide range of phenological dynamics. We then build cross-validation folds to specifically test for spatial generalizability across different eco-regions in India and temporal generalizability across different years, for classifiers learned from the data. Multiple classifiers are evaluated: Random Forest models trained on AEF embedding features achieved the best performance for both classification tasks, outperforming models trained on conventional Sentinel-1 and Sentinel-2 time series while offering seamless deployment in Google Earth Engine. Compared to publicly available land-cover products that encode leaf phenology and leaf type, our model yields significantly higher accuracy while providing outputs at substantially finer spatial resolution. We then observe the outputs of our model over several protected forest areas in India to understand their dynamics over the last 8 years. 
Our contributions are an analysis-ready open dataset for learning tree traits from remotely sensed spectral data, a trained model that is spatially and temporally generalizable, and a demonstration of the insights the model can provide into the dynamics of protected forest areas, all of which can be replicated in other areas.

Humans and AI · Machine Learning · Multidisciplinary Topics and Applications

#AI4G78

Session Aug 19 · 15:00–16:30 · Room 12

Poster Aug 19 · 16:30–18:00

Human-in-the-Loop Meta Bayesian Optimization for Fusion Energy and Scientific Applications

Ricardo Luna Gutierrez, Sahand Ghorbanpour, Rahman Ejaz, Varchas Gopalaswamy, Riccardo Betti, Vineet Gundecha, Aarne Lees, Soumyendu Sarkar

Inertial Confinement Fusion (ICF) holds transformative promise for sustainable, near-limitless clean energy, yet remains constrained by prohibitively high costs and limited experimental opportunities. This paper presents Human-in-the-Loop Meta Bayesian Optimization (HL-MBO), a framework that integrates expert knowledge with few-shot, uncertainty-aware machine learning to accelerate discovery in data-scarce, high-stakes scientific domains. HL-MBO introduces a meta-learned surrogate model with an expert-informed acquisition function to recommend candidate experiments. To foster trust and enable informed decisions, HL-MBO also provides interpretable explanations of its suggestions. We show that HL-MBO outperforms current BO methods for ICF energy yield optimization, as well as benchmarks in molecular optimization and critical-temperature maximization for superconducting materials.

Machine Learning · Multidisciplinary Topics and Applications

#AI4G88

Session Aug 19 · 15:00–16:30 · Room 12

Poster Aug 19 · 16:30–18:00

iFire AI: AI-powered Wildfire Simulation and 3D Immersive Visualisation

Renhao Huang, Lara Clemente, Mario Flores Gonzalez, Yang Song, Greg Drummond, Gonzalo Herrera, Jason Sharples, Michael Ostwald, Maurice Pagnucco, Ali Asadipour, Dennis Del Favero

Wildfires, especially extreme wildfires, can cause prolonged damage to natural environments, human lives and global economies. To mitigate these impacts, enhancing wildfire preparedness is critical. This research proposal introduces iFire AI, a collaborative project to develop a world-leading 3D immersive visualisation system that enables users to experience and understand extreme wildfire scenarios. iFire AI utilises our advanced 360-degree immersive system AVIE, which visualises interactive landscapes and wildfire events enhanced by AI techniques. Firstly, we propose a deep learning model for wildfire behaviour modelling that generates minute-resolution wildfire progression information for immersive visualisation, supported by a Sim2Real pipeline that integrates simulated and real-world data to address data insufficiency and enhance model development and evaluation. We also explore 3D tree reconstruction using 3D Gaussian splatting, creating visually realistic and computationally efficient tree models. iFire AI can enhance users' risk perception, situational awareness and collaborative decision-making, and thereby reduce risks due to extreme wildfires and promote sustainable development.

Humans and AI · Machine Learning · Multidisciplinary Topics and Applications

#AI4G90

Session Aug 19 · 15:00–16:30 · Room 12

Poster Aug 19 · 16:30–18:00

Coarse-to-Fine Domain Incremental Learning with Attentive Distillation for Mining Footprint Segmentation in Multispectral Imagery

Alif Tri Handoyo, Vincent C.S. Lee, Rizka Widyarini Purwanto, Alex M. Lechner, Deanna Kemp, Muhamad Risqi U. Saputra

Automatically mapping and segmenting global mining footprints using remote sensing and deep learning is critical for monitoring the socio-environmental risks and impacts of mining, yet its progress is hindered by the scarcity of fine-grained annotated data. Although large-scale datasets with coarse boundaries are widely available, leveraging them to improve fine-grained segmentation is challenging due to significant domain shift. To address this, we propose MineC2FNet, a coarse-to-fine domain incremental learning framework that exploits abundant coarse data to enhance fine-grained mining footprint segmentation. MineC2FNet adopts a teacher–student architecture with attentive distillation at both the feature and prediction levels, selectively transferring generalized knowledge from the coarse domain while enabling boundary refinement using limited fine-grained data (fine domain). We further introduce an expertly validated dataset of 219 images with precise boundary annotations across diverse geographies and commodities. Extensive experiments against state-of-the-art approaches, including domain adaptation and domain incremental learning methods, demonstrate that MineC2FNet achieves superior performance while effectively handling domain shift. The dataset and code are publicly available at https://github.com/risqiutama/MineC2FNet.

Computer Vision · Machine Learning · Multidisciplinary Topics and Applications
#AI4G93

Session Aug 19 · 15:00–16:30 · Room 12

Poster Aug 19 · 16:30–18:00

Bridging the SEA Gap: An Initial Benchmark for Neural Audio Codec-Synthesized Speech Deepfakes in South-East Asian Languages

Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Arun Balaji Buduru

Codecfakes (CFs) are a type of speech deepfake generated through Audio Language Models (ALMs), with Neural Audio Codecs (NACs) forming the core mechanism for speech encoding and generation. CFs exhibit distributional characteristics that differ from those of vocoder-based deepfakes, causing detectors trained on vocoder data to generalize poorly to CF detection. Although this has led to the development of CF detection benchmarks, existing resources are largely confined to English (and, to a limited extent, Chinese), leaving South-East Asian (SEA) languages unexplored. To bridge this gap, we introduce SEA-CF, the first large-scale benchmark for CF detection spanning multiple SEA languages, diverse speaker profiles, and a wide range of NAC architectures. SEA-CF is constructed by synthesizing CFs from publicly available real speech corpora. Our experiments show that state-of-the-art (SOTA) CF detectors trained on English-centric datasets fail to generalize to SEA speech due to language-specific phonetic structures, tonal variations, and rich prosodic diversity. We further conduct a comprehensive zero-shot and fine-tuned evaluation of recent SOTA ALMs on SEA-CF. Fine-tuning the ALMs improves performance; however, these models are too large to be practical for real-world applications, particularly in low-resource and latency-constrained settings. To address this limitation, we propose GARUDA, a novel small ALM tailored for CF detection that delivers strong performance while remaining lightweight. Extensive evaluations demonstrate that GARUDA outperforms strong end-to-end and ALM-based baselines, establishing a new, practical direction for robust CF detection in SEA languages and beyond.

AI Ethics, Trust, Fairness · Natural Language Processing
#AI4G96

Session Aug 19 · 15:00–16:30 · Room 12

Poster Aug 19 · 16:30–18:00

Context-Aware Concept Distillation for Trustworthy Flood Prediction

Eli Levinkopf, Efrat Morin, Claudia V. Goldman

Effective flood risk management relies on accurate forecasting, yet the "black box" nature of state-of-the-art Deep Learning models creates a barrier to trust and accountability in high-stakes public safety decisions. While existing Explainable AI (XAI) methods offer local attributions, they fail to provide the verifiable, operationally meaningful causal narratives required by disaster response authorities. To address this societal challenge, we propose Context-Aware Concept Distillation (CACD), a framework developed in collaboration with domain experts to distill opaque LSTMs into interpretable, hydrology-aware surrogate models. We introduce an unsupervised pipeline to discover a "Hydrological Language" and a Residual Hypernetwork that dynamically modulates these concepts based on static basin characteristics. Evaluated on 5,203 basins globally, our model achieves high fidelity (median NSE 0.70), significantly outperforming black-box baselines (e.g., Multilayer Perceptrons) on unseen future data. By demonstrating that human-interpretable concepts are sufficient to reconstruct flood dynamics, this work balances AI accuracy with the transparency required for responsible environmental decision-making.
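For readers less familiar with the fidelity metric: assuming NSE here denotes the Nash-Sutcliffe efficiency that is standard in hydrology, it compares squared prediction error against the variance of the observations, so 1 is a perfect fit and 0 means the model does no better than predicting the observed mean. The minimal sketch below computes it per basin and takes the median across basins; the basin IDs and toy series are hypothetical, and this is not the authors' evaluation code.

```python
from statistics import mean, median


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of squared prediction error
    to the squared deviation of the observations from their own mean."""
    if len(observed) != len(simulated) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least two points")
    obs_mean = mean(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - obs_mean) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("observed series is constant, so NSE is undefined")
    return 1.0 - sse / sst


def median_nse(basins):
    """Median NSE over a hypothetical mapping basin_id -> (observed, simulated)."""
    return median(nse(obs, sim) for obs, sim in basins.values())


# Toy example with made-up streamflow values (not data from the paper).
basins = {
    "basin_a": ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),  # perfect fit -> 1.0
    "basin_b": ([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]),  # predicting the mean -> 0.0
    "basin_c": ([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]),  # partial fit -> 0.5
}
print(median_nse(basins))  # 0.5
```

Because NSE is unbounded below, a median is less sensitive than a mean to a few poorly fit basins.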

AI Ethics, Trust, Fairness · Humans and AI · Machine Learning · Multidisciplinary Topics and Applications
#AI4G101

Session Aug 19 · 11:30–12:30 · Room 12

Poster Aug 19 · 16:30–18:00

HighFM: Towards a Foundation Model for Learning Representations from High-Frequency Earth Observation Data

Stella Girtsou, Konstantinos Alexis, Giorgos Giannopoulos, Charalambos Kontoes

The increasing frequency and severity of climate-related disasters have intensified the need for real-time monitoring, early warning, and informed decision-making. Earth Observation (EO), powered by satellite data and Machine Learning (ML), offers powerful tools to meet these challenges. Foundation Models (FMs) have revolutionized EO ML by enabling general-purpose pretraining on large-scale remote sensing datasets. However, most existing models rely on high-resolution satellite imagery with low revisit rates, limiting their suitability for fast-evolving phenomena and time-critical emergency response. In this work, we present HighFM, a first-cut approach towards an FM for high-temporal-resolution, multispectral EO data. Leveraging over 2 TB of SEVIRI imagery from the Meteosat Second Generation (MSG) platform, we adapt the SatMAE masked autoencoding framework to learn robust spatiotemporal representations. To support real-time monitoring, we enhance the original architecture with fine-grained temporal encodings to capture short-term variability. The pretrained models are then fine-tuned on cloud masking and active fire detection tasks. We benchmark our SEVIRI-pretrained Vision Transformers against traditional baselines and recent geospatial FMs, demonstrating consistent gains in both balanced accuracy and IoU. Our results highlight the potential of temporally dense geostationary data for real-time EO, offering a scalable path toward foundation models for disaster detection and tracking.
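Active-fire masks in particular are often dominated by background pixels, so plain pixel accuracy can look high even for a model that misses most positives. As a hedged illustration of the two metrics named above (not the authors' evaluation code), the sketch below computes balanced accuracy and the intersection over union (IoU) of the positive class from flattened binary masks; the toy masks are made up, and the conventions for empty classes are our own choices.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for flattened binary masks (1 = positive pixel)."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t and p:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    # Our convention: a class absent from y_true contributes a recall of 0.0.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return (sensitivity + specificity) / 2


def iou(y_true, y_pred):
    """Intersection over union of the positive class: TP / (TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    union = tp + fp + fn
    # Our convention: if neither mask has positives, the masks agree perfectly.
    return tp / union if union else 1.0


# Toy 8-pixel masks (made up): 3 fire pixels in the reference.
y_true = [1, 1, 0, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 0, 0, 1]
print(confusion_counts(y_true, y_pred))  # (2, 1, 1, 4)
print(iou(y_true, y_pred))  # 0.5
print(balanced_accuracy(y_true, y_pred))
```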

Computer Vision · Machine Learning
#AI4G105

Session Aug 19 · 11:30–12:30 · Room 12

Poster Aug 19 · 16:30–18:00

Brazilian Indigenous Languages Revitalization at Scale: Reducing Cost and Development Time for Low-Resource Language Courses

Gustavo Polleti, Fabricio Gerardi, Carolina Aragon, Fernando Bororo, Mariel Kujiboekureu, Wayali Salomã, Fabio Cozman

Brazil is home to about 180 Indigenous languages, which span a wide range of sociolinguistic circumstances, from critically endangered to relatively stable. Across this continuum, Indigenous communities often lack pedagogical resources and are underserved by existing language-learning technologies, which are typically designed for high-resource languages and assume reliable connectivity and large datasets. This research project proposes AI-assisted tools and workflows for the collection and annotation of textual and speech data that substantially reduce the time and cost required to produce engaging language-learning game apps. Our goal is to implement a language-learning game app that Indigenous students can use to practice their reading, writing and speaking skills at home. We propose novel speech processing models for low-resource Indigenous languages and offline support for low-connectivity environments. Our project adopts a co-creation model that actively fosters collaboration among Indigenous educators, linguists, and youth, is adapted to their context, and complies with ethical guidelines. We outline an implementation plan with the Bororo and Enawene Nawe communities to test our methods and, potentially, produce an AI-driven platform for Indigenous language education that is applicable across diverse sociolinguistic contexts in Brazil and beyond.

AI Ethics, Trust, Fairness · Humans and AI · Multidisciplinary Topics and Applications · Natural Language Processing
#AI4G108

Session Aug 19 · 11:30–12:30 · Room 12

Poster Aug 19 · 16:30–18:00

Deep Reinforcement Learning Enhanced Semi-supervised Graph Neural Network for Credit Card Fraud Detection

Huilin He, Kun Zhu, Zewen Hu, Jie Wang, Dawei Cheng

Credit card fraud threatens global payment ecosystems, causing billions in losses and undermining public trust. Efficient fraud detection remains challenging due to surging transaction volumes and evolving tactics. While Graph Neural Networks (GNNs) excel at modeling structural relationships, they struggle in real-world scenarios characterized by label scarcity and often overlook discriminative feature-level signals, leaving rich risk signals underutilized without costly manual engineering. To address this, we propose DRESS, a Deep Reinforcement Learning (DRL) Enhanced Semi-supervised GNN framework. It employs a DRL agent to automatically capture and enhance feature-level risks, fuses them with graph-based structural risks, and propagates the result through a gated temporal attention network for the final prediction. To mitigate inefficient exploration in the DRL module, we incorporate a feature self-attention layer that weighs each feature's contribution to fraud detection and employ self-supervised intrinsic rewards to help optimize the DRL module efficiently. Extensive experiments on real-world datasets demonstrate that DRESS outperforms state-of-the-art methods, especially in low-label scenarios with only 2%–10% labeled samples. By empowering resource-limited institutions to combat fraud and prevent financial loss, DRESS secures the digital trust essential for inclusive growth, contributing to AI for poverty alleviation and economic development.

Data Mining · Knowledge Representation and Reasoning
#AI4G121

Session Aug 19 · 11:30–12:30 · Room 12

Poster Aug 19 · 16:30–18:00

Improving Scientific Formula Verbalization in Large Speech Language Models for Accessible Learning

Xueyi Li, Tianqiao Liu, Zitao Liu, Teng Guo, Yongdong Wu

Online learning systems provide accessible learning opportunities for blind or low-vision students. To support access to complex scientific materials, the speech models used in these systems need to deliver accurate scientific formula verbalization. While recent large speech language models (LSLMs) provide remarkable low-latency streaming capabilities, their potential for scientific formula verbalization remains underexplored. In this paper, we propose Formula-Speech, the first end-to-end LSLM designed for scientific formula verbalization. Specifically, we construct two high-quality scientific formula datasets with educational experts to align speech models with scientific formula verbalization patterns. We then adopt a lightweight and effective two-stage training framework, combining supervised fine-tuning for basic formula-to-speech alignment with reinforcement learning guided by a custom reward function to optimize for human-preferred verbalization. Experimental results show that our model significantly improves the verbalization performance of LSLMs and achieves state-of-the-art results across multiple scientific domains. Our code and datasets are available at https://github.com/ai4ed/FormulaSpeech.

Multidisciplinary Topics and Applications · Humans and AI · Data Mining
#AI4G122

Session Aug 19 · 11:30–12:30 · Room 12

Poster Aug 19 · 16:30–18:00

AG-STELLA: Spatio-Temporal Learning for Water-related Agricultural Land Use Activity Mapping with AlphaEarth

Nibir Chandra Mandal, Oishee Bintey Hoque, Kyle Luong, Samarth Swarup, Kirti Rajagopalan, Mandy L Wilson, Abhijin Adiga, Madhav Marathe

Accurate mapping of agricultural land use activity, particularly long-term transitions from cropland to pasture and short-term transitions between cropland and fallow land, is essential for sustainable water management, drought response, and food-system resilience, directly supporting the United Nations Sustainable Development Goals (SDG-2 and SDG-8). However, reliable agricultural land use activity mapping is challenging due to spectral ambiguity, temporal irregularities, severe class imbalance, and limited generalization across agricultural regions. In this work, we propose AG-STELLA, a knowledge-guided spatio-temporal model that (i) captures temporal changes of agricultural lands using pretrained spatio-temporal transformers; (ii) integrates geospatial context using AlphaEarth embeddings; (iii) introduces a temporal transition latent space with temporal consistency constraints; (iv) employs guidance through hydroclimatic consistency; and (v) uses a land use-aware gated decoder to improve robustness across regions. Through experiments across three water-stressed U.S. states, we show consistent gains over baseline vision and foundation models, achieving up to 27% F1-score improvement for pasture (the minority class) and 16% overall. We further show robustness across heterogeneous regions through cross-state transfer learning, where AG-STELLA consistently outperforms foundation model baselines and achieves up to 82.3% F1-score for fallow land, a 9.6% improvement over the best foundation model. Code: https://github.com/Nibir088/AG-STELLA

Computer Vision · Machine Learning · Multidisciplinary Topics and Applications
#AI4G124

Session Aug 19 · 11:30–12:30 · Room 12

Poster Aug 19 · 16:30–18:00

Rule-Bottleneck RL: Learning to Decide and Explain for Sequential Resource Allocation via LLM Agents in Public Health

Guojun Xiong, Mauricio Tec, Haichuan Wang, Francesca Dominici, Joseph Ngonzi, Adeline Boatin, Milind Tambe

Reducing preventable maternal mortality remains a global health priority. Under Sustainable Development Goal (SDG) target 3.1, the WHO emphasizes timely and equitable allocation of limited maternal health resources. Motivated by the needs of Departments of Obstetrics and Gynecology at several major hospitals in Uganda and Ghana, we study the sequential allocation of wearable vital sign monitoring devices among maternal patients. While deep reinforcement learning (RL) has shown promise for sequential resource allocation, its limited interpretability hinders adoption in such high-stakes settings. In contrast, large language model (LLM) agents provide human-readable reasoning but often struggle with effective long-term decision making. To bridge this gap, we introduce Rule-Bottleneck RL (RBRL), the first LLM agent framework for resource allocation problems that jointly optimizes a language-based decision policy and its explainability. At each step within RBRL, an LLM first generates candidate rules, i.e., language statements capturing decision priorities tailored to the current state. RL then optimizes rule selection to maximize environmental rewards and explainability, with the LLM acting as a judge. Finally, an LLM chooses the action (optimal allocation) based on the selected rule. We provide conditions for RBRL performance guarantees and characterize the finite-horizon evaluation gap of the learned RBRL policy. Experiments in maternal health show that RBRL outperforms baseline LLM agents and approaches the performance of deep RL, while producing clearer, policy-relevant explanations. Human evaluations further confirm improved trust and usability, demonstrating RBRL as a practical AI approach aligned with SDG target 3.1.

Agent-based and Multi-agent Systems · Multidisciplinary Topics and Applications · Planning and Scheduling
#AI4G134

Session Aug 19 · 11:30–12:30 · Room 12

Poster Aug 19 · 16:30–18:00

PoemDirector: A Multi-Agent Context-Adaptive Instructional Mode Selection Framework for Chinese Classical Poetry Video Generation

Tengteng Cheng, Xiaoli Zeng, Jialu Huang, Mingliang Hou, Zitao Liu, Xiangyu Zhao, Weiqi Luo

Classical poetry is central to aesthetic education and cultural inheritance in China's K-12 education, and web-based instructional videos have become a primary learning resource. However, existing production pipelines either require costly teacher-guided editing or generate videos automatically without systematic instructional design, leading to unclear logic, shallow explanations, and limited contextual adaptation. We propose PoemDirector, a multi-agent framework developed with a national K-12 educational platform partner that unifies context-adaptive instructional mode selection, hierarchical explanation, and end-to-end video generation. A Director Agent analyzes each poem, selects a pedagogically grounded instructional mode from a teacher-co-designed library, and coordinates specialized agents to produce a complete instructional video from a poem title alone. We further establish a multidimensional evaluation framework, refined by literary and education experts, to assess instructional quality and poetic presentation. Experiments show that PoemDirector substantially outperforms commercial baselines and approaches human-crafted video quality across multiple dimensions. The resources are publicly available at https://github.com/ai4ed/PoemDirector.

Agent-based and Multi-agent Systems · Humans and AI · Natural Language Processing
#AI4G136

Session Aug 20 · 10:00–11:00 · Room 12

Poster Aug 20 · 16:30–18:00

NeuroDALEC: A Differentiable and Interpretable Mass-Conserving Framework for Terrestrial Ecosystem Carbon Cycle Dynamics

Meng Wan, Tiantian Liu, Zhixin Xia, Ningming Nie, Jue Wang, Rongqiang Cao, Honglin He, Xiaoli Ren, Peng Shi, Yangang Wang

Accurate simulation of terrestrial ecosystem carbon cycles is crucial for addressing global climate change and for ecosystem management. Process-based carbon models have high interpretability but suffer from insufficient accuracy and slow computation due to fixed parameters. In contrast, deep-learning carbon models achieve high accuracy but disregard physical principles, which prevents ecologists from explaining ecosystem dynamics. We propose NeuroDALEC, an interpretable framework that embeds the DALEC carbon-cycle model within a neural network, enabling differentiable computation of ecological processes. Key parameters and ensemble learning strategies are designed, and mass-conserving carbon pool state transition equations are introduced to ensure physical consistency. Experiments show that NeuroDALEC outperforms existing models in both accuracy and efficiency. Moreover, it provides sufficient interpretability by predicting all components of the carbon cycle. Deployed in a real-time carbon assimilation system, NeuroDALEC supports daily carbon forecasting and decision-making. This work contributes to the United Nations' Sustainable Development Goals 13 (Climate Action) and 15 (Life on Land). The source code is available at: https://github.com/codesiena/NeuroDALEC.
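To make "mass-conserving carbon pool state transition" concrete, here is a generic donor-controlled pool update. It is a sketch under our own assumptions, not the DALEC equations or NeuroDALEC's implementation: each pool loses a fixed fraction of its start-of-step stock, that outflow is routed to other pools or respired, and the carbon in the pools plus respiration always equals the starting stock plus external inputs. The pool names, rates, and routing shares are hypothetical.

```python
def mass_conserving_step(pools, inputs, turnover, routing):
    """Advance hypothetical carbon pools by one time step.

    pools:    pool name -> carbon stock at the start of the step
    inputs:   pool name -> external carbon input this step (e.g. allocated GPP)
    turnover: pool name -> fraction of the start-of-step stock that leaves, in [0, 1]
    routing:  donor pool -> {receiver pool: share of the donor's outflow};
              any unrouted share is respired, so shares must sum to at most 1.
    Returns the updated pools and the total carbon respired this step.
    """
    new = {name: stock + inputs.get(name, 0.0) for name, stock in pools.items()}
    respired = 0.0
    for donor, rate in turnover.items():
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"turnover for {donor!r} must lie in [0, 1]")
        shares = routing.get(donor, {})
        routed_share = sum(shares.values())
        if routed_share > 1.0 + 1e-12:
            raise ValueError(f"routing shares for {donor!r} exceed 1")
        outflow = rate * pools[donor]  # never exceeds the start-of-step stock
        new[donor] -= outflow
        for receiver, share in shares.items():
            new[receiver] += share * outflow
        respired += (1.0 - routed_share) * outflow
    return new, respired


# Hypothetical three-pool example (illustrative numbers only).
pools = {"foliage": 100.0, "litter": 50.0, "soil": 1000.0}
inputs = {"foliage": 10.0}
turnover = {"foliage": 0.1, "litter": 0.2, "soil": 0.01}
routing = {"foliage": {"litter": 1.0}, "litter": {"soil": 0.4}}

new, respired = mass_conserving_step(pools, inputs, turnover, routing)
balance = sum(new.values()) + respired - (sum(pools.values()) + sum(inputs.values()))
print(new, respired)
print(abs(balance) < 1e-9)  # True: carbon is neither created nor destroyed
```

Writing every flux as "leaves one pool, enters another or the atmosphere" is what makes conservation hold by construction rather than only approximately.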

Machine Learning · Multidisciplinary Topics and Applications
#AI4G141

Session Aug 20 · 10:00–11:00 · Room 12

Poster Aug 20 · 16:30–18:00

UniST-Pred: A Robust Unified Framework for Spatio-Temporal Traffic Forecasting in Transportation Networks Under Disruptions

Yue Wang, Djellel Difallah, Areg Karapetyan, Samer Madanat

Spatio-temporal traffic forecasting is a core component of intelligent transportation systems, supporting various downstream tasks such as signal control and network-level traffic management. In real-world deployments, forecasting models must operate under structural and observational uncertainties, conditions that are rarely considered in model design. Recent approaches achieve strong short-term predictive performance by tightly coupling spatial and temporal modeling, often at the cost of increased complexity and limited modularity. In contrast, efficient time-series models capture long-range temporal dependencies without relying on explicit network structure. We propose UniST-Pred, a unified spatio-temporal forecasting framework that first decouples temporal modeling from spatial representation learning, then integrates both through adaptive representation-level fusion. To assess the robustness of the proposed approach, we construct a dataset using the agent-based, microscopic traffic simulator MATSim and evaluate UniST-Pred under severe network disconnection scenarios. Additionally, we benchmark UniST-Pred on standard traffic prediction datasets, demonstrating competitive performance against well-established models despite its lightweight design. The results illustrate that UniST-Pred maintains strong predictive performance across both real-world and simulated datasets, while also yielding interpretable spatio-temporal representations under infrastructure disruptions. The source code and the generated dataset are available at https://anonymous.4open.science/r/UniST-Pred-EF27.

Machine Learning · Multidisciplinary Topics and Applications
#AI4G142

Session Aug 20 · 10:00–11:00 · Room 12

Poster Aug 20 · 16:30–18:00

Integrating Atmospheric Dispersion Modeling Priors into Cuboid Splatting for Spatiotemporal Reconstruction of Airborne Radioiodine After Nuclear Accidents

Mareike Böckel, Stephan Doerfel, Kathrin Meisenberg, Oliver Meisenberg, Max Friedrich, Mattis Hartwig

In nuclear power plant accidents, airborne radioiodine poses major health risks, making reliable reconstructions of its spatiotemporal distribution crucial for emergency management. Current state-of-the-art prognosis systems use atmospheric dispersion modeling but ignore posterior evidence from emergency care centers, comprising movement profiles and thyroid measurements of affected individuals. A first study showed that the AI method Cuboid Splatting can reconstruct iodine air concentrations from such data, but it ignores simulations from established prognosis systems.
Our multidisciplinary team extends Cuboid Splatting by incorporating these simulations as priors and subsequently correcting them using movement and thyroid data. Several ways to translate and correct priors are developed. The best-performing approaches are combined into a novel Cuboid Splatting-with-prior mechanism, which we evaluate using constructed prior scenarios representing different error types and intensities.
Using Cuboid Splatting-with-prior yields more accurate reconstructions than (i) the dispersion simulations alone and (ii) plain Cuboid Splatting without a prior. Across reconstructions, the mean scenario error is 19.61%, improving on (i) by 28.02 percentage points (pp) and on (ii) by 89.86 pp, the latter with particularly large gains at high spatial resolution. These results demonstrate that combining simulation-based priors with measurement-based posterior inference can substantially improve the reconstruction of iodine air activity concentrations in nuclear emergencies.
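The improvements above are stated in percentage points, i.e. as absolute differences in error, which are easy to confuse with relative reductions. Assuming "improving by X pp" means that a baseline's mean scenario error exceeds 19.61% by exactly X, the short calculation below recovers the implied baseline errors and the corresponding relative reductions; it uses only the numbers reported in the abstract.

```python
def implied_baseline(error_pct, improvement_pp):
    """Baseline error implied by an absolute improvement stated in percentage points."""
    return error_pct + improvement_pp


def relative_reduction(error_pct, baseline_pct):
    """Fraction of the baseline error that was removed."""
    return (baseline_pct - error_pct) / baseline_pct


ours = 19.61  # reported mean scenario error of Cuboid Splatting-with-prior, in %
for label, pp in [("(i) dispersion simulations alone", 28.02),
                  ("(ii) plain Cuboid Splatting", 89.86)]:
    base = implied_baseline(ours, pp)
    print(f"{label}: implied baseline {base:.2f}%, "
          f"relative reduction {relative_reduction(ours, base):.1%}")
```

Under that assumption the script prints implied baselines of 47.63% and 109.47%, corresponding to relative error reductions of about 58.8% and 82.1%.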

Data Mining · Machine Learning · Multidisciplinary Topics and Applications
#AI4G144

Session Aug 20 · 10:00–11:00 · Room 12

Poster Aug 20 · 16:30–18:00

ICFD-31k: A Large-Scale Dataset and Benchmark for Real-Time Conversational Fraud Detection

Rishi Ahuja, Kumar Prateek, Simranjit Singh

The proliferation of sophisticated telephone scams poses a significant societal and economic threat across the diverse linguistic contexts of a country such as India. Furthermore, the lack of large-scale, publicly available datasets remains a critical barrier to research on robust, real-time countermeasures. In view of this, we introduce ICFD-31k, the first Indian Conversational Fraud Dataset, a new benchmark containing over 31,000 realistic conversational transcripts. ICFD-31k comprises systematically generated content covering 10 distinct fraud umbrellas, ranging from financial impersonation to job scams. ICFD-31k transcripts feature rich annotations comprising a final verdict, chunk-level streaming labels, and detailed "slow-thinking" rationales. In addition, a human-in-the-loop evaluation validates ICFD-31k's quality, achieving a Cohen's kappa of 0.534, which supports annotation reliability. We also introduce two fine-tuned models based on RoBERTa: M1 for non-streaming data and M2 for streaming data. Comprehensive experiments with these strong baselines (M1, M2) further demonstrate ICFD-31k's utility. The code and reproducibility materials are available at https://github.com/SPELLAILab/ICFD-31k.
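Cohen's kappa measures agreement between two annotators after discounting the agreement expected by chance from each annotator's label frequencies. The minimal sketch below implements the standard two-rater formula; the "fraud"/"legit" labels are hypothetical and it is not the ICFD-31k evaluation script. On the widely used Landis and Koch scale, values between 0.41 and 0.60 are conventionally described as moderate agreement.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the chance agreement implied by each annotator's label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("chance agreement is 1, so kappa is undefined")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical verdicts from two annotators on six transcripts.
annotator_1 = ["fraud", "fraud", "fraud", "legit", "legit", "legit"]
annotator_2 = ["fraud", "fraud", "legit", "legit", "legit", "fraud"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.333
```

Here raw agreement is 4/6, but with balanced labels half of the items would match by chance, so kappa is only about 0.33.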

AI Ethics, Trust, Fairness · Machine Learning · Multidisciplinary Topics and Applications · Natural Language Processing
#AI4G148

Session Aug 20 · 10:00–11:00 · Room 12

Poster Aug 20 · 16:30–18:00

Clinically-Oriented Screening Model for Diabetic Retinopathy Severity Grading and Diabetic Macular Edema Detection

Sanchika Menezes, Rohan Chawla, Nawazish Shaikh, Pradeep Venkatesh, Radhika Tandon, Srinivas Rana

Diabetic retinopathy (DR) and diabetic macular edema (DME) are leading causes of preventable blindness worldwide. Automated screening tools are critical for timely detection at scale, particularly in low-resource settings where access to ophthalmologists is limited. We propose DRDME-Net, a deployment-driven joint learning framework that, rather than using conventional classification, formulates DR grading as an ordinal regression task and models DME detection via a continuous surrogate. This design yields stable risk scores that align closely with the thresholds used in operational clinical decision-making. Evaluation on facility and community cohorts demonstrates that DRDME-Net achieves strong performance across severity boundaries. Insights from an initial feasibility pilot further demonstrate its scalability in real-world workflows. These results highlight the potential of DRDME-Net to expand equitable access to timely detection, reduce preventable vision loss, and provide a practical template for integrating AI into population screening initiatives.
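One common way to cast severity grading as ordinal regression is the cumulative-threshold encoding, in which grade k becomes a vector of binary targets "is the grade greater than t?" for each threshold t. The sketch below assumes a five-grade scale (0 to 4) and shows the encoding, a hard decoding rule, and a continuous score given by the expected grade (the sum of the threshold probabilities). It is a generic illustration; the abstract does not specify DRDME-Net's exact ordinal formulation or its DME surrogate.

```python
def ordinal_targets(grade, num_grades=5):
    """Encode grade k as num_grades - 1 binary targets [k > 0, k > 1, ...]."""
    if not 0 <= grade < num_grades:
        raise ValueError("grade out of range")
    return [1 if grade > t else 0 for t in range(num_grades - 1)]


def decode_grade(threshold_probs, cutoff=0.5):
    """Hard grade: count leading thresholds whose P(grade > t) exceeds the cutoff."""
    grade = 0
    for p in threshold_probs:
        if p <= cutoff:
            break
        grade += 1
    return grade


def expected_grade(threshold_probs):
    """Continuous score: sum over t of P(grade > t), which equals the expected
    grade when these probabilities come from one consistent distribution."""
    return sum(threshold_probs)


print(ordinal_targets(2))  # [1, 1, 0, 0]
probs = [0.9, 0.7, 0.3, 0.1]  # made-up P(grade > t) for t = 0..3
print(decode_grade(probs))  # 2
print(expected_grade(probs))
```

Because neighbouring grades share most of their targets, confusing grade 2 with grade 3 is penalized less than confusing it with grade 4, which is the usual motivation for ordinal formulations over plain classification.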

Computer Vision · Machine Learning · Multidisciplinary Topics and Applications
#AI4G162

Session Aug 20 · 10:00–11:00 · Room 12

Poster Aug 20 · 16:30–18:00

FlowID: Enhancing Forensic Identification with Latent Flow-Matching Models

Jules Ripoll, David Bertoin, Alasdair Newson, Charles Dossal, Jose Pablo Baraybar

Every day, many people die under violent circumstances, whether from crimes, war, migration, or climate disasters.
Medico-legal and law enforcement institutions document many portraits of the deceased for evidence, but cannot immediately carry out identification on them.
While traditional image editing tools can process these photos for public release, the workflow is lengthy and produces suboptimal results.
In this work, we leverage advances in image generation models, which can now produce photorealistic human portraits, to introduce FlowID, an identity-preserving facial reconstruction method. Our approach combines single-image fine-tuning, which adapts the generative model to out-of-distribution injured faces, with attention-based masking that localizes edits to damaged regions while preserving identity-critical features.
Together, these components enable the removal of artifacts from violent death while retaining sufficient identity information to support identification.
To evaluate our method, we introduce InjuredFaces, a novel benchmark for identity-preserving facial reconstruction under severe facial damage.
Beyond serving as an evaluation tool for this work, InjuredFaces provides a standardized resource for the community to study and compare methods addressing facial reconstruction in extreme conditions.
Experimental results show that FlowID outperforms state-of-the-art open-source methods while maintaining low memory requirements, making it suitable for local deployment without compromising data privacy.

Computer Vision · AI Ethics, Trust, Fairness · Humans and AI · Machine Learning · Multidisciplinary Topics and Applications
#AI4G169

Session Aug 20 · 10:00–11:00 · Room 12

Poster Aug 20 · 16:30–18:00

Agents Trusting Agents? Restoring Lost Capabilities with Inclusive Healthcare

Alba Aguilera, Georgina Curto, Nardine Osman, Ahmed Al-Awah

Agent-based simulations have untapped potential to inform social policies on urgent human development challenges in a non-invasive way, before these policies are implemented in real-world populations. This paper responds to a request from non-profit and governmental organizations to evaluate policies under discussion to improve equity in health care services for people experiencing homelessness (PEH) in the city of Barcelona. With this goal, we integrate the conceptual framework of the capability approach (CA), which is explicitly designed to promote and assess human well-being, to model and evaluate the behavior of agents who represent PEH and social workers. We define a reinforcement learning environment where agents aim to restore their central human capabilities under existing environmental and legal constraints. We use Bayesian calibration of profile-dependent behavioral parameters of PEH agents, modeling the degree of trust in, and resulting engagement with, social workers, which is reportedly a key element for the success of the policies in scope. Our results open a path to mitigating health inequity by building relationships of trust between social service workers and PEH.

Agent-based and Multi-agent Systems · Multidisciplinary Topics and Applications
#AI4G181

Session Aug 20 · 11:30–12:30 · Room 12

Poster Aug 20 · 16:30–18:00

Directional Hallucinations: Ideological Drift in News-Grounded LLM Question Answering

Chendi Wang, Liam Cunningham, Tom Yishay, Jieying Chen

Large language models (LLMs) are increasingly used to answer questions about political information, including in election-adjacent information settings where factual errors and ideological distortions are high-stakes. We present a reproducible measurement framework that treats hallucinations (unsupported statements in document-grounded QA) as diagnostic signals of ideological drift. Using 21,727 expert-labeled U.S. political news articles from QBias spanning left, center, and right sources, we (i) generate an article-specific question, (ii) elicit document-grounded answers from three open-weight LLMs and one proprietary model, (iii) detect sentence-level hallucinations via reference-based comparison, (iv) classify the ideological valence of hallucinated sentences with a fine-tuned stance classifier, and (v) probe output logits to relate token-level uncertainty to hallucination and drift. Hallucination rates vary substantially across models and concentrate in contentious topics, while source-ideology differences in hallucination frequency are modest. In contrast, hallucination content exhibits robust leftward drift: a majority of hallucinated sentences are classified as left-leaning, including among hallucinations generated from right-leaning sources. Logit-level analysis shows hallucinations arise in high-entropy generation contexts, and in some models uncertainty also predicts leftward drift, consistent with an "uncertainty → guessing" mechanism. We discuss implications for auditing AI-mediated political information and for designing safeguards in election-relevant deployments.
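A common way to quantify token-level uncertainty is the Shannon entropy of the next-token distribution obtained by applying softmax to the output logits. The sketch below assumes that definition and averages it over the tokens of a generated sentence to obtain a sentence-level score; the averaging step and the toy logits are our own assumptions, not the paper's protocol.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one vector of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0)


def mean_sentence_entropy(per_token_logits):
    """Average token entropy over one generated sentence (one logit vector per token)."""
    if not per_token_logits:
        raise ValueError("sentence has no tokens")
    return sum(token_entropy(logits) for logits in per_token_logits) / len(per_token_logits)


# Toy 4-token vocabulary: a flat distribution versus a confident one.
flat, peaked = [0.0, 0.0, 0.0, 0.0], [10.0, 0.0, 0.0, 0.0]
print(round(token_entropy(flat), 3))  # 1.386, i.e. ln 4, the maximum for 4 tokens
print(token_entropy(peaked) < 0.01)  # True
print(mean_sentence_entropy([flat, peaked]))
```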

AI Ethics, Trust, Fairness · Multidisciplinary Topics and Applications · Natural Language Processing · Uncertainty in AI
#AI4G183

Session Aug 20 · 11:30–12:30 · Room 12

Poster Aug 20 · 16:30–18:00

FACET: Multi-Agent AI Supporting Teachers in Scaling Differentiated Learning for Diverse Students

Jana Gonnermann-Müller, Jennifer Haase, Nicolas Leins, Moritz Igel, Konstantin Fackeldey, Sebastian Pokutta

Classrooms are increasingly heterogeneous, with learners varying in performance, motivation, language proficiencies, and learning differences such as dyslexia and ADHD. While teachers recognize the need for differentiated instruction, growing workloads make it an ideal that is often unrealized in practice. Current AI educational tools are predominantly student-facing and performance-centric, ignoring other aspects that shape learning outcomes.
We introduce FACET, a teacher-facing multi-agent framework that supports differentiation across motivation, performance, and learning differences. Developed with educational stakeholders from the outset, the framework coordinates four specialized agents (learner simulation, diagnostic assessment, material generation, and evaluation) within a teacher-in-the-loop design.
School principals (N = 30) shaped system requirements through participatory workshops, while in-service K–12 teachers (N = 70) evaluated material quality. Mixed-methods evaluation demonstrates strong perceived value for inclusive differentiation. Practitioners emphasized both the urgent need arising from classroom heterogeneity and the importance of maintaining pedagogical autonomy as a prerequisite for adoption. We discuss implications for future school deployment and outline partnerships for longitudinal classroom implementation.

Humans and AI · Agent-based and Multi-agent Systems · Multidisciplinary Topics and Applications

#AI4G191

Session Aug 20 · 11:30–12:30 · Room 12

Poster Aug 20 · 16:30–18:00

Democratizing Ski Safety: Real-Time Turn Segmentation with Smartphone IMU and Causal LSTM Networks

Michał Szymocha, Piotr Kacprzak, Jakub Robak, Wojciech Turek

Anterior cruciate ligament (ACL) injury is one of the most common and serious injuries in sports, particularly among recreational skiers. Research shows that structured technique awareness and continuous feedback can significantly reduce the risk of such injuries, yet access to professional instructors is limited to wealthy athletes who can afford continuous private coaching, creating a harmful inequity in injury prevention. This gap can be narrowed by automating real-time analysis of skiing technique and making it available to the wider recreational skiing community. Our approach relies exclusively on inertial sensors embedded in standard smartphones, eliminating the need for specialized equipment and enabling broad social scalability. To support immediate feedback, the system operates causally, producing predictions based solely on past observations. The work is conducted in cooperation with professional ski instructors, ensuring that problem formulation, data annotation, and result evaluation reflect real-world coaching practices and injury prevention needs. The model is evaluated using Leave-One-Subject-Out validation on a public, in-the-wild dataset, demonstrating robust generalization across skiers with an average directional accuracy of 89.8%, while maintaining extremely low inference latency suitable for on-device mobile deployment. This work outlines a practical pathway to democratizing injury prevention in recreational sports.
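Leave-One-Subject-Out (LOSO) validation holds out each skier in turn, so the model is never tested on a person whose data it was trained on. A minimal sketch of the splitting logic, assuming each IMU window is tagged with a skier ID; the majority-class "model" and the example labels are hypothetical placeholders, not the causal LSTM described in the paper.

```python
from collections import Counter
from typing import Iterator, List, Sequence, Tuple


def loso_splits(subject_ids: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    # dict.fromkeys keeps first-seen order, so the folds are deterministic.
    subjects = list(dict.fromkeys(subject_ids))
    for held_out in subjects:
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def majority_baseline_loso(subject_ids: Sequence[str], labels: Sequence[str]) -> float:
    """Mean per-subject accuracy of a placeholder majority-class predictor under LOSO."""
    accuracies = []
    for _, train, test in loso_splits(subject_ids):
        # "Train" only on the other subjects: predict their most common label.
        majority = Counter(labels[i] for i in train).most_common(1)[0][0]
        correct = sum(labels[i] == majority for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


# Hypothetical turn-direction labels for windows from three skiers.
subjects = ["s1", "s1", "s2", "s3", "s3"]
turns = ["left", "right", "left", "right", "right"]
print(majority_baseline_loso(subjects, turns))
```

Averaging per held-out subject, rather than pooling all windows, keeps skiers with many recorded turns from dominating the reported accuracy.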

Data Mining · Humans and AI · Machine Learning

#AI4G198

Session Aug 20 · 11:30–12:30 · Room 12

Poster Aug 20 · 16:30–18:00

Column Generation for the Micro-Transit Zoning Problem

Hins Hu, Rishav Sen, Jose Paolo Talusan, Abhishek Dubey, Aron Laszka, Samitha Samaranayake

Along with the rapid development of new urban mobility options like ride-sharing over the past decade, on-demand micro-transit services stand out as a middle ground, bridging the gap between fixed-line mass transit and single-request ride-hailing, balancing ridership maximization and travel time minimization. Micro-transit adoption can have significant social impact. It improves urban sustainability through lower energy consumption and reduced emissions, while enhancing equitable mobility access for disadvantaged communities, thanks to its lower vehicle miles per passenger, flexible schedules, and affordable pricing. However, effective operation of micro-transit services requires planning geo-fenced zones in advance, which involves solving a challenging combinatorial optimization problem. Existing approaches first enumerate candidate zones and then select a fixed number of optimal zones. In this paper, we generalize the Micro-Transit Zoning Problem (MZP) to allow a global budget rather than imposing a size limit on candidate zones. We also design a Column Generation (CG) framework to solve the problem exactly and several pricing heuristics to accelerate computation. Extensive numerical experiments across major U.S. cities demonstrate that our approach produces higher-quality solutions more efficiently and scales better in the generalized setting.

Constraint Satisfaction and Optimization · Multidisciplinary Topics and Applications · Planning and Scheduling

#AI4G199

Session Aug 20 · 11:30–12:30 · Room 12

Poster Aug 20 · 16:30–18:00

A Hybrid Modeling Framework for Crop Prediction Tasks via Dynamic Parameter Calibration and Multi-Task Learning

William Solow, Paola Pesantez-Cabrera, Markus Keller, Lav Khot, Sandhya Saisubramanian, Alan Fern

Accurate prediction of crop states (e.g., phenology stages and cold hardiness) is essential for timely farm management decisions such as irrigation, fertilization, and canopy management to optimize crop yield and quality. While traditional biophysical models can be used for season-long predictions, they lack the precision required for site-specific management. Deep learning methods are a compelling alternative, but they can produce biologically unrealistic predictions and require large-scale data. We propose a hybrid modeling approach that uses a neural network to parameterize a differentiable biophysical model and leverages multi-task learning for efficient data sharing across crop cultivars in data-limited settings. By predicting the parameters of the biophysical model, our approach improves prediction accuracy while preserving biological realism. Empirical evaluation using real-world and synthetic datasets demonstrates that our method improves prediction accuracy by 60% for phenology and 40% for cold hardiness compared to deployed biophysical models. Project site: https://tinyurl.com/DMC-MTL-Site.

Machine Learning · Multidisciplinary Topics and Applications

#AI4G202

Session Aug 20 · 11:30–12:30 · Room 12

Poster Aug 20 · 16:30–18:00

A Gloss-driven Indian Sign Language Production System Using Learned Pose Representations

Suvajit Patra, Arkadip Maitra, Swami Punyeshwarananda, Soumitra Samanta

A Sign Language Production (SLP) system translates spoken or written language into sign language, enabling accessible communication between deaf/hard-of-hearing and hearing people. Although Indian Sign Language (ISL) is one of the most widely used sign languages globally, it is a very low-resource language and lacks such SLP systems. This paper presents a scalable and modular SLP framework based on a Sign-Pose-VQ-VAE model, designed for low-resource settings. The model learns discrete pose representations (codes) by disentangling body, left-hand, and right-hand keypoints, enabling efficient pose modeling and co-articulated sign generation. The proposed system is evaluated using a Hindi movie subtitle corpus coupled with an off-the-shelf back-translation model and achieves a gloss BLEU-4 score of 47.20. Certified ISL interpreters rated the system-generated signs 4.33/5 on average, and the glosses achieve a BERT precision of 0.7683. In addition, the proposed system achieves state-of-the-art performance among keypoint-based methods on the PHOENIX14T benchmark, attaining a BLEU-4 score of 10.03 and surpassing the previous best method by 0.67 points. The system with code is available at https://cs.rkmvu.ac.in/~isl/sl_gen_vqvae.

Computer Vision · Humans and AI · Multidisciplinary Topics and Applications · Natural Language Processing

#AI4G206

Session Aug 20 · 11:30–12:30 · Room 12

Poster Aug 20 · 16:30–18:00

Climate Surrogates for Scalable Multi-Agent Reinforcement Learning: A Case Study with CICERO-SCM

Oskar Bohn Lassen, Serio Agriesti, Filipe Rodrigues, Blaz Kurnik, Francisco Pereira

Climate policy analysis requires models that capture multi-gas climate effects, but such models are too slow to embed in reinforcement learning loops at scale. In collaboration with a pan-European public-sector environmental agency, we develop a multi-agent reinforcement learning (MARL) framework that integrates a higher-fidelity climate surrogate as the environment transition, enabling regional agents to learn policies under multi-gas dynamics. We train a recurrent surrogate on 20,000 multi-gas emission pathways to emulate CICERO-SCM. The surrogate achieves near-simulator accuracy (global-mean temperature RMSE ≈ 4 × 10⁻⁴ K) with ∼1000× faster one-step inference and yields >100× end-to-end MARL training speed-up. We show policy agreement with the simulator in tractable settings and propose a replay- and rank-consistency test (Kendall’s τ) for assessing policy fidelity when simulator-in-the-loop training is infeasible. This enables large-scale multi-agent policy experiments while retaining high-fidelity multi-gas climate response.
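Kendall's τ measures how consistently two score lists order the same items. As a hedged illustration of the rank-consistency idea (not the authors' exact test), the function below computes Kendall's τ-a between hypothetical per-policy scores obtained under the surrogate and under the simulator; pairs tied in either list count as neither concordant nor discordant.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 (a tie in either list) counts as neither.
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores for five candidate policies (higher is better).
surrogate_scores = [0.91, 0.75, 0.60, 0.42, 0.30]
simulator_scores = [0.89, 0.58, 0.62, 0.40, 0.35]
print(kendall_tau_a(surrogate_scores, simulator_scores))
```

In this made-up example only the second and third policies swap order, so 9 of the 10 pairs agree and τ is 0.8; a value near 1 would indicate that the surrogate ranks policies the way the simulator would.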

Agent-based and Multi-agent Systems · Humans and AI · Machine Learning · AI Ethics, Trust, Fairness

#AI4G207

Session Aug 19 · 10:00–11:00 · Room 12

Poster Aug 19 · 16:30–18:00

Addressing Overcommitment in the Reasoning of Gendered Economic Memes Under Multimodal Ambiguity

Kushal Kanwar, Dushyant Singh Chauhan, Kapil Rana, Gopendra Vikram Singh, Nils Lukas

Multimodal meme understanding is increasingly used to analyze socially sensitive content, yet existing models often exhibit biased behavior when interpreting economic dependence and social roles under ambiguity. Many memes express economic relationships through sparse text or symbolic visual cues, providing insufficient evidence for gendered attribution. In such underspecified settings, models tend to rely on pretraining correlations, leading to hallucinated and stereotypical economic role assignments. In this work, we study gendered economic dependence in image-text memes through the lens of contextual sufficiency and identify epistemic overcommitment—inferring roles without adequate evidence—as a primary source of bias. We propose CGER-Net, a context-grounded multimodal framework that estimates whether the input provides sufficient evidence for gendered economic reasoning and applies evidence-gated inference to enable confident attribution when cues are explicit while favoring principled abstention otherwise. We evaluate CGER-Net on EconMeme-GE, a curated dataset of image-text memes annotated as Men, Women, Neutral, or Ambiguous. Across strong contemporary multimodal baselines, CGER-Net reduces Gender Overcommitment Rate by up to 44% on ambiguous instances while maintaining comparable accuracy on unambiguous cases. Human evaluation further shows that 79% of generated rationales are judged as epistemically aligned with the available evidence. These results highlight the importance of modeling when not to infer for reliable and responsible multimodal analysis.

AI Ethics, Trust, Fairness · Humans and AI · Multidisciplinary Topics and Applications

#AI4G218

Session Aug 19 · 10:00–11:00 · Room 12

Poster Aug 19 · 16:30–18:00

SMaRT: Online Reusable Resource Assignment and an Application to Mediation in the Kenyan Judiciary

Shafkat Farabi, Didac Marti Pinto, Wei Lu, Manuel Ramos-Maqueda, Sanmay Das, Antoine Deeb, Anja Sautmann

Motivated by the problem of assigning mediators to cases in the Kenyan judicial system, we study an online resource allocation problem where incoming tasks (cases) must be immediately assigned to available, capacity-constrained resources (mediators). The resources differ in their quality, which may need to be learned. In addition, resources can only be assigned to a subset of tasks that overlaps to varying degrees with the subset of tasks other resources can be assigned to. The objective is to maximize task completion while satisfying soft capacity constraints across all the resources. The scale of the real-world problem poses substantial challenges, since there are over 2000 mediators, and a multitude of combinations of geographic locations (87) and case types (12) that each mediator is qualified to work on. Together, these features—unknown quality of new resources (newly onboarded mediators), soft capacity constraints (due to the mandate to assign cases without delay), and high-dimensional state space—make existing scheduling and resource allocation algorithms either inapplicable or inefficient. We formalize the problem in a tractable manner, using a quadratic program formulation for assignment and a multi-agent bandit style framework for learning. We demonstrate the key properties and advantages of our new algorithm, SMaRT (Selecting Mediators that are Right for the Task), compared with baselines on some stylized instances of the mediator allocation problem. We then turn to considering its application to real-world data on cases and mediators from the Kenyan Judiciary. SMaRT outperforms baselines and allows for controlling the tradeoff between the strictness of the capacity constraints and overall case resolution rates, both in situations where mediator quality is known beforehand and when the problem is bandit-like in that learning is part of the problem definition. 
On the strength of these results, we plan to conduct a randomized controlled trial in which we deploy SMaRT in the Judiciary's mediation management system.

Agent-based and Multi-agent Systems · AI Ethics, Trust, Fairness

#AI4G222

Session Aug 19 · 10:00–11:00 · Room 12

Poster Aug 19 · 16:30–18:00

White-Hat Testing for the Ballot Box: A Framework for Election AI Auditing

Chendi Wang, Jieying Chen

Recent research shows that conversational AI can shift voter preferences, with effects persisting for weeks. Yet frontier models exhibit a documented "persuasion-reliability tradeoff", producing hallucinated or systematically distorted election information. Despite these risks, election officials lack standardized tools to systematically evaluate AI systems before deployment. We propose CivicAudit-Bench, a stakeholder-guided auditing framework to stress-test large language models for civic hallucinations, false confidence, jurisdiction-dependent failure, and asymmetric refusals/accuracy. This framework introduces a modular, counterfactual, and severity-aware auditing methodology that integrates roll-call–based alignment modeling, entity-swap probing, and jurisdiction-conditional correctness criteria. Informed by engagement with the U.S. Election Assistance Commission, the toolkit consists of three modules: (1) PoliBias-US, a multi-indicator alignment screen combining Congressional roll-call ideology scaling with party-cue counterfactual sensitivity, persona robustness, and narrative-framing alignment; (2) HalluBias-Election, an evidence-linked benchmark that measures hallucinations, severity-weighted critical errors, and asymmetries via Entity-Swap Counterfactual Probing and a jurisdiction-safe completion criterion; and (3) Disclosure-Test, pre-registered experiments assessing whether transparency and calibrated-uncertainty disclosures reduce overreliance and attenuate persuasion without blocking legitimate civic information. CivicAudit-Bench outputs versioned audit scorecards and a coordinated white-hat disclosure workflow, advancing UN SDG 16 by strengthening democratic information integrity.

AI Ethics, Trust, Fairness · Humans and AI · Multidisciplinary Topics and Applications · Uncertainty in AI

#AI4G225

Session Aug 19 · 10:00–11:00 · Room 12

Poster Aug 19 · 16:30–18:00

STELLAR: Spatio-Temporal Environmental Learning with Latent Alignment and Refinement for Long-Tailed Species Distribution Modeling

Shufeng Kong, Tao Yu, Yuanyuan Wei, Caihua Liu, Junwen Bai, Yingheng Wang, Marc Grimson, Daniel Fink, Carla P. Gomes

Joint Species Distribution Modeling (JSDM) is a key enabler for biodiversity monitoring and conservation planning. However, accurate JSDM faces two coupled challenges: environmental drivers and species distributions are inherently spatio-temporal, while species co-occurrence patterns exhibit complex non-linear community structure and severe long-tail imbalance driven by rare species. Existing approaches often address these factors in isolation, learning from static covariates or neglecting the historical trajectories of dynamic community structure. To overcome these limitations, we propose STELLAR (Spatio-Temporal Environmental Learning with Latent Alignment and Refinement), a novel framework that learns a shared latent space where dynamic habitat context and community structure are optimized jointly. Our approach integrates three complementary components: (1) a Graph-Temporal Encoder that employs graph attention and recurrent units to aggregate spatial neighborhood effects and capture the co-evolving historical dynamics of environmental context and community structure; (2) a Context-Anchored Latent Alignment mechanism that structures the latent space using a label-activated mixture prior and supervised contrastive learning, actively clustering species based on shared environmental preferences; and (3) an Imbalance-Aware Decoupled Decoding module that utilizes Asymmetric Loss to focus learning on hard, rare species samples, preventing mode collapse in the long tail. Experiments on the large-scale eBird dataset, curated with domain experts, demonstrate that our framework significantly outperforms state-of-the-art baselines, particularly in predicting rare species and revealing interpretable species interactions.
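Asymmetric Loss (ASL), introduced by Ridnik et al. (2021) for multi-label classification, down-weights easy negatives with a focusing exponent and a probability margin, which suits presence/absence labels dominated by absent rare species. A per-label sketch in plain Python, assuming sigmoid probabilities and 0/1 labels; the hyperparameter values are illustrative and not taken from STELLAR.

```python
import math
from typing import Sequence


def asymmetric_loss(
    probs: Sequence[float],
    labels: Sequence[int],
    gamma_pos: float = 0.0,  # focusing exponent for positives (illustrative)
    gamma_neg: float = 4.0,  # stronger focusing for negatives (illustrative)
    margin: float = 0.05,    # probability shift applied to negatives (illustrative)
    eps: float = 1e-8,
) -> float:
    """Mean asymmetric loss over labels, given sigmoid probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        if y == 1:
            # Positive term: (1 - p)^gamma_pos * log(p).
            total -= (1.0 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            # Negatives with p below the margin are discarded entirely.
            p_m = max(p - margin, 0.0)
            total -= p_m ** gamma_neg * math.log(max(1.0 - p_m, eps))
    return total / len(probs)


# Hypothetical predictions for four species at one site (1 = observed).
probs = [0.8, 0.3, 0.03, 0.6]
labels = [1, 0, 0, 0]
print(asymmetric_loss(probs, labels))
```

With `gamma_pos=0`, positives receive plain cross-entropy, while an easy negative such as p = 0.3 contributes far less than its binary cross-entropy, leaving more of the gradient for the rare positive labels.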

Data Mining · Machine Learning · Multidisciplinary Topics and Applications

#AI4G226

Session Aug 19 · 10:00–11:00 · Room 12

Poster Aug 19 · 16:30–18:00

Quantifying Error Disparities in Population Health Models

Aaron Marker, Salvatore Giorgi, Adithya V Ganesan, Vasudha Varadarajan, Ojas Deshpande, Laura Brandt, Gabriel Odom, H. Andrew Schwartz

Many high-stakes social applications of AI, such as public health surveillance and policy planning, operate at the community level rather than the individual level. However, most AI fairness research operates at the individual or data level (i.e., a document or an image) and relies on metrics defined over discrete demographic categories rather than population-level demographic proportions.
In this work, we first introduce the Bilateral Concentration Index (BCI) to quantify nonmonotonic error disparities missed by the category-based metrics used at the individual or data level.
We then conduct a large-scale audit of sociodemographic error disparities in both lexical- and transformer-based models of county-level health outcomes, over a dataset covering billions of community-mapped messages. While all tasks showed significant disparities, their size varied widely depending on the outcome and model, from a BCI of 2.1% for predicting life satisfaction to 17.0% for predicting fair or poor health. We further evaluate four approaches for incorporating sociodemographic information as potential bias mitigation strategies, finding that while demographic inclusion consistently improved predictive accuracy, it frequently amplified error disparities. The largest disparities were associated with education and income (BCI = 2.7–16.4%), often reducing accuracy for low-income and, in some cases, high-income communities. These findings highlight a critical accuracy–fairness trade-off in community-level models for public health tasks, demonstrating how seemingly beneficial modeling choices can increase disparities that could disadvantage communities if model predictions are used to make policy decisions in geographic areas where this public health data is not available.

AI Ethics, Trust, Fairness · AI4G · Natural Language Processing

#AI4G230

Session Aug 19 · 10:00–11:00 · Room 12

Poster Aug 19 · 16:30–18:00

LLM-Enhanced Knowledge and Learning Path Understanding for Graph-based Educational Recommendation

Qingqing Liang, Shuyan Zheng, Peiwei Xia, Chunyang Wang, Xuesong Lu, Aoying Zhou

Educational recommenders empower personalized learning by suggesting suitable learning resources to learners, and graph-based recommenders are widely adopted. Existing methods are mainly ID-based: they initialize learners and resources with trainable identifiers and optimize their representations solely from the interaction graph. As a result, the lack of semantic understanding of learning resources and learning paths hinders further improvements in recommendation accuracy. To alleviate this problem, we propose KLU4EduRec, which leverages large language models (LLMs) to understand resource knowledge and learning paths, thereby enhancing traditional graph-based educational recommenders. Specifically, for learning path understanding, we segment a learning path by detecting learning pattern drift in the resource knowledge sequence, and prompt LLMs to infer learners' learning patterns within each segment. The segment-level patterns are then chronologically aggregated to represent the overall learning path. Besides, we prompt LLMs to summarize the core knowledge of learning resources from their content as complementary semantic signals. Finally, the resulting semantic representations are aligned and fused with structural representations learned by a graph-based recommender to enable more accurate recommendations. Extensive experiments show that KLU4EduRec greatly outperforms existing methods, including traditional ID-based methods and recent LLM-powered methods. A case study shows how understanding pattern drift in a learning path leads to more suitable recommendations. Our code is available at https://github.com/DaSESmartEdu/KLU4EduRec.

Data Mining · Knowledge Representation and Reasoning · Multidisciplinary Topics and Applications

#AI4G237

Session Aug 19 · 10:00–11:00 · Room 12

Poster Aug 19 · 16:30–18:00

Sustainable Intelligence for the Wild: Democratizing Ecological Monitoring via Knowledge-Adaptive Edge Expert Agents

Jiaxing Li, Hao Fang, Chi Xu, Miao Zhang, Jiangchuan Liu, William I. Atlas, Katrina M. Connors, Mark A. Spoljaric

Rapid biodiversity loss underscores the urgency of effective monitoring, yet manual surveys remain resource-intensive. While on-device AI offers a scalable alternative, its performance in the wild is often challenged by environmental variability. Current methods rely heavily on cloud resources, which require continuous uploading of field data for model retraining. This approach is unsuitable for remote deployments, where power and network connectivity are scarce. To address these constraints, this research proposes a shift from model adaptation to knowledge adaptation. We introduce an architecture that separates visual perception from reasoning, combining a visual encoder with a dynamic knowledge base. Instead of implicitly encoding expert knowledge in model parameters, we store it in an explicit knowledge base. This method also supports knowledge sustainability by preserving expert insights in a structured form. Through cross-disciplinary collaboration with biologists and Indigenous communities, this work advances ethical AI co-development, fostering responsible and culturally informed ecosystem management.

Humans and AI · Knowledge Representation and Reasoning · Multidisciplinary Topics and Applications · Agent-based and Multi-agent Systems

#AI4G253

Session Aug 21 · 11:30–12:30 · Room 12

Poster Aug 21 · 15:00–16:00

Temporal Motif-aware Graph Test-time Adaptation for OOD Blockchain Anomaly Detection

Runang He, Tongya Zheng, Huiling Peng, Yuanyu Wan, Bingde Hu, Jiawei Chen, Canghong Jin, Mingli Song, Can Wang

Ever-evolving transaction patterns have significantly hindered anomaly detection on emerging cryptocurrency blockchains due to the vast number of addresses and diverse anomalous behaviors. Recently, advanced Graph Anomaly Detection (GAD) approaches applied to blockchains have faced two critical challenges: adversarial pattern evolution by malicious actors and the out-of-distribution (OOD) problem caused by varied transaction semantics on blockchains. To address these challenges, we propose a novel framework termed TEmporal Motif-aware Graph Test-Time Adaptation (TEMG-TTA). First, we comprehensively capture the 3-node temporal motif distribution of each active address using an efficient computational mechanism, enabling downstream temporal motif-aware graph learning. Second, we design a simple yet effective test-time adaptation strategy to facilitate the sharing of common patterns between training and testing graphs. Extensive experiments on 5 real-world datasets demonstrate that our proposed TEMG-TTA outperforms state-of-the-art GAD approaches by an average of 54.88%. A further case study on interpretable motif patterns reveals that TEMG-TTA explicitly characterizes the complex transaction patterns of anomalous addresses, thereby verifying the effectiveness of our technical designs. Our code is publicly available at https://github.com/LuoXishuang0712/TEMG-TTA/.

Data Mining · Machine Learning

#AI4G267

Session Aug 21 · 11:30–12:30 · Room 12

Poster Aug 21 · 15:00–16:00

ArogyaSutra: A Multi-Agent Framework for Multimodal Medical Reasoning in Indic Languages

Tanmoy Kanti Halder, Akash Ghosh, Subhadip Baidya, Arijit Roy, Sriparna Saha

Multimodal Large Language Models (MLLMs) have shown promising reasoning capabilities in general domains, yet their performance remains limited in specialized settings such as healthcare, especially in multilingual and low-resource scenarios. This gap is critical in regions like rural India, where patients often express complex medical queries in native Indic languages and rely on multimodal inputs such as medical images. Existing English-centric MLLMs struggle to support such use cases, limiting equitable access to AI-driven healthcare assistance. To address this challenge, we introduce ArogyaBodha, a large-scale multilingual multimodal medical question-answer dataset constructed from eight heterogeneous sources, covering 31 body systems, six imaging modalities, and 21 clinical domains across English and seven major Indian languages. We further propose ArogyaSutra, an actor-critic-based multi-agent framework that integrates tool grounding with dual-memory mechanisms for step-wise, reasoning-aware decision making, and uses stored actor-critic simulation trajectories for distillation. Experiments show that our dataset and framework improve multilingual medical reasoning accuracy across all Indic languages, with ablations validating the contribution of each component. The source code and dataset are available at: https://iitp-cse.github.io/ArogyaSutra/

Agent-based and Multi-agent Systems · Natural Language Processing

#AI4G271

Session Aug 21 · 11:30–12:30 · Room 12

Poster Aug 21 · 15:00–16:00

PhysTrans: A Physics-Aware Transferable Framework for Global Cold-Start Photovoltaic Forecasting

Meng Wan, Kaipeng Gao, Jue Wang, Siyan Fang, Xue Miao, Pufen Zhang, Sijie Chang, Peng Shi, Yangang Wang, Zhenbing Zhao

With the rapid expansion of photovoltaic (PV) power generation worldwide, PV systems have become a key part of global energy infrastructure. Accurate PV forecasting is essential for safe grid operation and renewable energy integration. However, most existing models rely heavily on site-specific historical data and perform poorly when deployed in cold-start scenarios of newly built power plants. We propose PhysTrans, a physics-aware transferable framework for cold-start PV forecasting. First, we design a physics-constrained residual network that uses a clear-sky module for better physical consistency. Furthermore, we propose a dynamic cloud cropping method that obtains cloud information over shaded PV stations by fitting the offsets of the sun angle. To fuse the asymmetric data, a query-based asymmetric fusion mechanism is introduced to achieve high-precision alignment of multi-modal data. We conduct experiments on global datasets, and the results show that PhysTrans outperforms state-of-the-art models with a 13.2% decrease in MAE in the single-site task, and also outperforms existing transfer models with an average decrease in MAE of 12.7% in the cross-site task. Our work advances reliable and transferable PV forecasting for early-stage grid integration and contributes to SDG 7 (Affordable and Clean Energy) and SDG 13 (Climate Action), in line with the Leave No One Behind principle.

Humans and AI · Machine Learning · Multidisciplinary Topics and Applications

#AI4G272

Session Aug 21 · 11:30–12:30 · Room 12

Poster Aug 21 · 15:00–16:00

PROB-EMOE: A Probabilistic Ensemble Mixture-of-Experts Framework for Metro Network Expansion Forecasting

Fangyi Ding, Zhan Zhao, Zhi Li, Xudong Guo, Ning Zhang, Yamin Wang, Yihong Tang

Forecasting Origin-Destination (OD) demand for new metro lines is critical for sustainable infrastructure planning but faces spatiotemporal out-of-distribution challenges. Existing models often struggle to capture heterogeneous interaction patterns in changing topologies and overlook inherent uncertainty and over-dispersion. To bridge these gaps, we propose PROB-EMOE, a planning-oriented probabilistic framework tailored for network expansion. To ensure robust generalization, we design a Mixture-of-Experts (MoE) predictor that integrates diverse expert views to capture heterogeneous demand patterns across changing topologies. To quantify extrapolation uncertainty, our framework functions as a unified probabilistic system by combining Deep Ensembles with a probabilistic output, quantifying both data and model uncertainty. Through a systematic investigation of likelihood families, we empirically demonstrate that the Negative Binomial distribution provides the best fit in this context. Extensive experiments on a multi-year Shenzhen metro dataset demonstrate that our approach achieves state-of-the-art predictive performance and provides the sharpest calibrated uncertainty intervals. The framework has been deployed in a metropolitan smart-data platform to support risk-aware investment decisions.
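For reference, the Negative Binomial likelihood in its common mean-dispersion form has variance μ + μ²/r, which is what lets it absorb over-dispersed OD counts. A minimal sketch, assuming each ensemble member outputs a mean μ and dispersion r per OD pair and that members are combined as an equal-weight mixture (the usual Deep Ensembles convention, but an assumption about PROB-EMOE's exact setup); all names and numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def nb_nll(y: int, mu: float, r: float) -> float:
    """Negative log-likelihood of count y under NB(mean=mu, dispersion=r).

    Variance is mu + mu**2 / r, so a small r means strong over-dispersion.
    """
    if mu <= 0 or r <= 0:
        raise ValueError("mu and r must be positive")
    log_pmf = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return -log_pmf


def ensemble_nll(y: int, members: Sequence[Tuple[float, float]]) -> float:
    """NLL of the equal-weight mixture of NB members given as (mu, r) pairs."""
    # log p_mix = logsumexp(log p_k) - log K, computed stably.
    log_probs = [-nb_nll(y, mu, r) for mu, r in members]
    peak = max(log_probs)
    log_sum = peak + math.log(sum(math.exp(lp - peak) for lp in log_probs))
    return -(log_sum - math.log(len(log_probs)))


# Hypothetical (mu, r) predictions for one OD pair from three ensemble members.
members = [(12.0, 4.0), (15.0, 3.5), (10.0, 5.0)]
print(nb_nll(14, *members[0]))
print(ensemble_nll(14, members))
```

Averaging member probabilities, rather than member parameters, lets disagreement between members widen the predictive distribution, which is how an ensemble expresses model uncertainty on top of the NB's data uncertainty.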

Data Mining · Multidisciplinary Topics and Applications · Uncertainty in AI

PhyTTA: Physics-Informed Test-Time Adaptation of Foundation Models for Regional Drought Prediction

When Can We Trust Fairness Audits? Identifying Reliability Boundaries of Third-party Audit Conclusions

STAMP: Multi-Pattern Attention-Aware Multiple Instance Learning for STAS Diagnosis in Multi-Center Histopathology Images

Suicide Risk Assessment from AI-powered Video Surveillance: An Interpretable Framework for Prevention in Metro Stations

Policy-Embedded Graph Expansion: Networked HIV Testing with Diffusion-Driven Network Samples

DART: Navigating Last-Mile Heterogeneity in Instant Delivery via Distribution-Adaptive Splines

Beyond Vision: A Multimodal Dataset and Framework for Pest Recognition via Plant Electrophysiological Signals

Towards an Early Warning System for Ocean Heat Extremes Through AI-Ocean Dynamics Synergy

Improving Survey Participation in Low-Literacy Populations Through Value-Sensitive Conversational AI

CHAM-net: A Contrastive Hierarchical Adaptive Meta-network for Robust Global Methane Flux Prediction

Optimizing Sensor Placement with Greedy Algorithms: A Case Study in Wildlife Camera Trapping for Spatial Capture-Recapture Population Estimation

PACE: A Personalized Adaptive Curriculum Engine for 9-1-1 Call-taker Training

An LLM-based Chain-of-Response Counter-Scam System

Domain-Informed Graph Neural Networks for Climate Factor Forecasting to Support Sustainable Crop Management

Scalable Mapping of Tree Traits to Study the Dynamics of Protected Forest Areas in India

Human-in-the-Loop Meta Bayesian Optimization for Fusion Energy and Scientific Applications

iFire AI: AI-powered Wildfire Simulation and 3D Immersive Visualisation

Coarse-to-Fine Domain Incremental Learning with Attentive Distillation for Mining Footprint Segmentation in Multispectral Imagery

Bridging the SEA Gap: An Initial Benchmark for Neural Audio Codec-Synthesized Speech Deepfakes in South-East Asian Languages

Context-Aware Concept Distillation for Trustworthy Flood Prediction

HighFM: Towards a Foundation Model for Learning Representations from High-Frequency Earth Observation Data

Brazilian Indigenous Languages Revitalization at Scale: Reducing Cost and Development Time for Low-Resource Language Courses

Deep Reinforcement Learning Enhanced Semi-supervised Graph Neural Network for Credit Card Fraud Detection

Improving Scientific Formula Verbalization in Large Speech Language Models for Accessible Learning

AG-STELLA: Spatio-Temporal Learning for Water-related Agricultural Land Use Activity Mapping with AlphaEarth

Rule-Bottleneck RL: Learning to Decide and Explain for Sequential Resource Allocation via LLM Agents in Public Health

PoemDirector: A Multi-Agent Context-Adaptive Instructional Mode Selection Framework for Chinese Classical Poetry Video Generation

NeuroDALEC: A Differentiable and Interpretable Mass-Conserving Framework for Terrestrial Ecosystem Carbon Cycle Dynamics

UniST-Pred: A Robust Unified Framework for Spatio-Temporal Traffic Forecasting in Transportation Networks Under Disruptions

Integrating Atmospheric Dispersion Modeling Priors into Cuboid Splatting for Spatiotemporal Reconstruction of Airborne Radioiodine After Nuclear Accidents

ICFD-31k: A Large-Scale Dataset and Benchmark for Real-Time Conversational Fraud Detection

Clinically-Oriented Screening Model for Diabetic Retinopathy Severity Grading and Diabetic Macular Edema Detection

FlowID: Enhancing Forensic Identification with Latent Flow-Matching Models

Agents Trusting Agents? Restoring Lost Capabilities with Inclusive Healthcare

Directional Hallucinations: Ideological Drift in News-Grounded LLM Question Answering

FACET: Multi-Agent AI Supporting Teachers in Scaling Differentiated Learning for Diverse Students

Democratizing Ski Safety: Real-Time Turn Segmentation with Smartphone IMU and Causal LSTM Networks

Column Generation for the Micro-Transit Zoning Problem

A Hybrid Modeling Framework for Crop Prediction Tasks via Dynamic Parameter Calibration and Multi-Task Learning

A Gloss-driven Indian Sign Language Production System Using Learned Pose Representations

Climate Surrogates for Scalable Multi-Agent Reinforcement Learning: A Case Study with CICERO-SCM

Addressing Overcommitment in the Reasoning of Gendered Economic Memes Under Multimodal Ambiguity

SMaRT: Online Reusable Resource Assignment and an Application to Mediation in the Kenyan Judiciary

White-Hat Testing for the Ballot Box: A Framework for Election AI Auditing

STELLAR: Spatio-Temporal Environmental Learning with Latent Alignment and Refinement for Long-Tailed Species Distribution Modeling

Quantifying Error Disparities in Population Health Models

LLM-Enhanced Knowledge and Learning Path Understanding for Graph-based Educational Recommendation

Sustainable Intelligence for the Wild: Democratizing Ecological Monitoring via Knowledge-Adaptive Edge Expert Agents

Temporal Motif-aware Graph Test-time Adaptation for OOD Blockchain Anomaly Detection

ArogyaSutra: A Multi-Agent Framework for Multimodal Medical Reasoning in Indic Languages

PhysTrans: A Physics-Aware Transferable Framework for Global Cold-Start Photovoltaic Forecasting

PROB-EMOE: A Probabilistic Ensemble Mixture-of-Experts Framework for Metro Network Expansion Forecasting