IJCAI-ECAI 2026 Accepted Papers · Special Track on AI and Social Good
Presentation format
Every accepted paper is presented in two formats: an oral talk — which must be delivered in person in Bremen by one of the authors — and a poster during a dedicated poster session.
#AI4G6
PhyTTA: Physics-Informed Test-Time Adaptation of Foundation Models for Regional Drought Prediction
Drought prediction is crucial for disaster mitigation, yet it remains challenging due to the complexity and variability of drought events. Although time series foundation models (TSFMs) have shown great potential in general time series forecasting problems, they struggle to adapt to regional hydrological information. They often underestimate the impact of regional precipitation or temperature anomalies on drought indices like SPEI. This problem arises because general pretraining captures averaged time series patterns, which do not account for the unique climatic and hydrological characteristics of specific regions. To bridge this gap, we introduce Phy-TTA, a physics-informed adaptation framework designed to restore the physical consistency of drought forecasts. Rather than updating model parameters, which might overfit random weather noise, Phy-TTA corrects prediction errors by explicitly modeling the causal link between physical forcing (e.g., rainfall deficits) and drought. Our theoretical analysis highlights the critical role of incorporating physics-driven information to enhance the accuracy and reliability of drought predictions. Experiments across multiple regions demonstrate that Phy-TTA consistently improves performance.
Topics: Multidisciplinary Topics and Applications; Humans and AI
#AI4G10
When Can We Trust Fairness Audits? Identifying Reliability Boundaries of Third-party Audit Conclusions
Fairness auditing aims to assess whether a model is fair and plays a critical role in identifying potential risks in deployed AI systems. In practice, because of limited access, third-party auditors often rely on self-collected datasets (e.g., via sock-puppets), which may differ from real-world deployment scenarios. Such discrepancies can lead to inconsistencies between audit conclusions on the collected data and those in actual deployment, raising concerns about the reliability of third-party audits. This motivates a critical question: when can we trust a fairness audit conclusion derived from a third-party dataset? Answering this question is challenging, as the actual deployment distribution is typically inaccessible or unobservable. To tackle this, we introduce the Consistency Radius, a metric that quantifies the maximum distribution shift under which an audit conclusion based on a third-party dataset remains consistent. We further propose an optimization method based on convex relaxation that estimates the radius relying solely on model responses over the audit dataset. Leveraging this framework, third-party auditors can provide their datasets to model providers and request the magnitude of distributional discrepancy relative to the deployment distribution, enabling reliable audit conclusions without requiring any direct data access.
Topics: AI Ethics, Trust, Fairness
#AI4G12
STAMP: Multi-Pattern Attention-Aware Multiple Instance Learning for STAS Diagnosis in Multi-Center Histopathology Images
Spread through air spaces (STAS) constitutes a novel invasive pattern in lung adenocarcinoma (LUAD), associated with tumor recurrence and diminished survival rates. However, large-scale STAS diagnosis in LUAD remains labor-intensive and prone to oversight and misdiagnosis because of its distinctive pathological characteristics and morphological features. Consequently, there is a pressing clinical need to leverage deep learning models for STAS diagnosis. We first assembled histopathological images from STAS patients at the Second Xiangya Hospital and the Third Xiangya Hospital of Central South University, alongside the TCGA-LUAD cohort. Three senior pathologists annotated the images with cross-verification to construct the STAS-SXY, STAS-TXY, and STAS-TCGA datasets. We then propose STAMP, a multi-pattern attention-aware multiple instance learning framework, to analyze and diagnose the presence of STAS across multi-center histopathology images. Specifically, a dual-branch architecture guides the model to learn STAS-associated pathological features from distinct semantic spaces. Transformer-based instance encoding and multi-pattern attention aggregation modules dynamically select regions closely associated with STAS pathology, suppressing irrelevant noise and enhancing the discriminative power of global representations. Moreover, a similarity regularization constraint prevents feature redundancy across branches, thereby improving overall diagnostic accuracy. Extensive experiments demonstrate that STAMP achieves competitive diagnostic results on STAS-SXY, STAS-TXY, and STAS-TCGA, with AUCs of 0.8058, 0.8017, and 0.7928, respectively, surpassing the clinical level. Results for 10 open baselines establish a benchmark for STAS diagnostic research and support future work on the generalizability and clinical integration of computational pathology technologies.
Dataset features and code are accessible at https://github.com/panliangrui/IJCAI2026.
Topics: Knowledge Representation and Reasoning; Machine Learning; Computer Vision; Data Mining
#AI4G33
Suicide Risk Assessment from AI-powered Video Surveillance: An Interpretable Framework for Prevention in Metro Stations
Understanding and monitoring human behavior in metro stations plays an important role in supporting suicide prevention efforts, where early identification of high-risk situations can enable timely intervention. This requires assessing suicide risk from surveillance video by jointly reasoning about each passenger's behavior, spatial context, and temporal dynamics. However, performing this assessment on footage from surveillance cameras is challenging, as it demands accurate perception of human motion, understanding of platform geometry, and aggregation of heterogeneous behavioral cues over time. In this work, we formalize the task of Suicide Risk Assessment (SRA) in metro stations and introduce the first interpretable framework that addresses this challenge. Unlike approaches that focus on isolated subtasks or attempt to infer intent directly, our formulation assesses suicide risk from accumulated evidence by incorporating person tracking, activity recognition, semantic segmentation of the platform, and trajectory-driven risk heatmap modeling. By formalizing SRA as a distinct task and benchmarking a complete operational pipeline that achieves 83.2% ROC-AUC on real surveillance data, this work highlights the complexity of suicide risk assessment and opens new directions for research on interpretable AI systems for social good.
Topics: Computer Vision; Humans and AI; Multidisciplinary Topics and Applications
#AI4G34
Policy-Embedded Graph Expansion: Networked HIV Testing with Diffusion-Driven Network Samples
HIV is a retrovirus that attacks the human immune system and can lead to death without proper treatment. In collaboration with the WHO and a large South African university, we study how to improve the efficiency of HIV testing with the goal of eventual deployment, directly supporting progress toward UN Sustainable Development Goal 3.3. While prior work has demonstrated the promise of intelligent algorithms for sequential, network-based HIV testing, existing approaches rely on assumptions that are impractical in our real-world implementations. Here, we study sequential testing on incrementally revealed disease networks and introduce Policy-Embedded Graph Expansion (PEGE), a novel framework that directly embeds a generative distribution over graph expansions into the decision-making policy rather than attempting explicit topological reconstruction. We further propose Dynamics-Driven Branching (DDB), a diffusion-based graph expansion model that supports decision making in PEGE and is designed for data-limited settings where forest structures arise naturally, as in our real-world referral process. Experiments on real HIV transmission networks show that the combined approach (PEGE + DDB) consistently outperforms existing baselines (e.g., a 13% improvement in discounted reward and 9% more HIV detections with 25% of the population tested); we also explore key tradeoffs that drive decision quality.
Topics: Multidisciplinary Topics and Applications
#AI4G35
DART: Navigating Last-Mile Heterogeneity in Instant Delivery via Distribution-Adaptive Splines
On-demand delivery platforms rely on Travel Time Estimation (TTE) to balance courier earnings against overdue risk. In collaboration with one of China's largest platforms, we address a critical "Fairness Gap" in TTE: current systems fail to capture complex delivery patterns in GNSS-denied environments, placing disproportionate pressure from overdue deliveries on couriers who handle high volumes of concurrent orders. Analyzing 1.27 million real-world trajectories, we attribute this bias to three challenges unique to GNSS-denied scenarios: distributional heterogeneity, structural heterogeneity, and contextual uncertainty. To bridge this gap, we propose DART (Distribution-Adaptive Robust Timing). DART incorporates a Learnable Adaptive Spline (LAS) encoder with a gradient-driven knot migration mechanism that enhances non-linear expressiveness for outliers, significantly improving long-tail accuracy. Furthermore, a Spatio-Temporal Transition Graph (STTG) reconstructs the latent topology by integrating sequence semantics, such as Wi-Fi-sensed merchant arrival timestamps. In addition, a Distribution Gating Mechanism characterizes delivery time distributions under distinct contexts. In extensive experiments and large-scale online A/B testing, DART not only reduces MAE by 14.0% in complex environments but also decreases the Order Overdue Rate by 1.7% (saving $24,000 daily), demonstrating how AI effectively reconciles operational efficiency with labor fairness.
Topics: Machine Learning; Data Mining; Humans and AI
#AI4G38
Beyond Vision: A Multimodal Dataset and Framework for Pest Recognition via Plant Electrophysiological Signals
Precise pest identification is essential for sustainable agriculture. Current visual recognition systems are brittle in the wild, where performance degrades due to occlusion and variable illumination. In contrast, plant electrophysiological signals serve as a robust, all-weather physiological modality, capable of detecting cryptic feeding behaviors that escape optical sensors. However, this field remains constrained by the scarcity of data and the absence of specialized algorithms. To bridge this gap, we introduce the Herbivory-Induced Plant Bio-signal Multimodal (HIPB-MM) dataset, the first fine-grained dataset comprising 4,023 synchronized plant electrophysiological signal-video pairs recording the feeding processes of three typical pest species. To address the weak and non-stationary nature of these signals, we propose the Herbivory-Induced Physiological Sensing (HIPS) framework. It integrates a Morphological Semantic Decoupling strategy to recover robust slow-wave semantics, and a Generation-State Encoder to model latent physiological states. Complementing this, an auxiliary dual-stream visual branch calibrates signal representations using explicit behavioral and morphological cues. Experiments demonstrate that HIPS establishes a solid benchmark (69.81% accuracy), comprehensively outperforming state-of-the-art baselines. Crucially, this work validates plant electrophysiology as a low-cost, all-weather modality for sustainable crop protection, effectively reducing pesticide dependency and safeguarding ecosystem health.
Topics: Data Mining; Machine Learning; Multidisciplinary Topics and Applications
#AI4G39
Towards an Early Warning System for Ocean Heat Extremes Through AI-Ocean Dynamics Synergy
Ocean heat extremes, including marine heatwaves and the El Niño–Southern Oscillation (ENSO), exert profound impacts on marine ecosystems and socio-economic stability. Establishing robust early warning systems is critical for proactive risk management; however, conventional predictive models often fail to generalize to the intensifying, non-stationary extremes driven by rapid global warming. This project introduces a novel AI-Ocean Dynamics synergy designed to provide an integrated early warning system. By synthesizing multi-source observations with physics-informed neural networks, it ensures that predictions remain constrained by fundamental physical laws. The system forecasts event onset, intensity, duration, and spatial extent while simultaneously attributing the underlying mechanisms, such as ocean advection and air–sea heat exchange. To validate performance, we establish a specialized ocean heat extremes benchmark to assess predictive skill and attribution reliability. Furthermore, the system incorporates an incremental learning mechanism, enabling continuous adaptation to long-term climatic and environmental change. This project advances the development of reliable, interpretable, and adaptive early warning systems, providing a vital tool for informed policy and maritime decision-making.
Topics: Multidisciplinary Topics and Applications
#AI4G43
Improving Survey Participation in Low-Literacy Populations Through Value-Sensitive Conversational AI
Collecting reliable social data from low-literacy populations remains a persistent challenge, particularly when surveys involve sensitive topics and marginalized communities. Traditional paper-based and web-based survey modalities often suffer from high attrition and incomplete responses due to literacy barriers, social pressure, and interactional discomfort. In this paper, we present findings from an initial field evaluation, conducted with low-literacy women across India, that compares four survey modalities: paper-based interviews, digital web-based surveys, conversational AI surveys, and conversational AI enhanced with layered value-sensitive design. Using data from 315 participants, we show that conversational AI significantly improves survey completion rates relative to traditional modalities, with the highest completion and lowest drop-off observed when value-sensitive and culturally aligned conversational design elements are fully integrated. These results demonstrate the importance of human-centered and value-sensitive interaction design in enabling inclusive, ethical, and scalable data collection for AI-for-social-good applications.
Topics: Humans and AI; Uncertainty in AI; Knowledge Representation and Reasoning
#AI4G44
CHAM-net: A Contrastive Hierarchical Adaptive Meta-network for Robust Global Methane Flux Prediction
Methane is a potent greenhouse gas that significantly contributes to global warming. However, accurately estimating global methane emissions and consumption remains challenging due to the complex interactions among environmental drivers that may vary across spatial and temporal scales. Prior data-driven methods often overlook the inherent spatiotemporal heterogeneity of ecosystems, failing to explicitly capture site-specific characteristics and cross-year evolutionary dynamics. To address these issues, we propose the Contrastive Hierarchical Adaptive Meta-network (CHAM-net), a novel framework that explicitly learns from historical context to capture site-specific dynamics. CHAM-net employs a hierarchical encoder–decoder architecture, in which the encoder captures site-specific characteristics from historical data and then dynamically conditions the decoder to generate the final prediction. Experimental results demonstrate that CHAM-net consistently outperforms all baseline methods on both simulation and observational datasets for methane emission and consumption, achieving nRMSE values as low as 0.43 and 0.88 with corresponding R² scores up to 0.97 and 0.68 for emission prediction.
Topics: Multidisciplinary Topics and Applications; Machine Learning; Data Mining
#AI4G50
Optimizing Sensor Placement with Greedy Algorithms: A Case Study in Wildlife Camera Trapping for Spatial Capture-Recapture Population Estimation
Estimating wildlife populations is central to conservation planning, yet designing sensor deployments that produce reliable data for such estimates remains challenging. Spatial capture-recapture (SCR) models, widely used to estimate animal population sizes, are highly sensitive to sensor layout, where poor placement can substantially increase uncertainty in population estimates. We present a novel framework that formulates camera trap placement as a scenario-based optimization problem under real-world resource constraints. Using collections of simulated animal capture histories spanning ecologically plausible parameter ranges, candidate sensor placements are evaluated via closed-form, SCR-derived design criteria linked to the precision of population estimates and optimized using both genetic algorithms and a greedy search strategy. We demonstrate our approach using data from a hypothetical American pine marten camera trapping study in British Columbia's South Chilcotin Mountains, achieving lower relative standard error and bias in population estimates than baselines. Our method was developed in close collaboration with conservation practitioners and is currently being used in real-world wildlife monitoring programs. This framework offers a general approach for designing wildlife surveys that support reliable population estimation across a range of realistic ecological scenarios.
Topics: Multidisciplinary Topics and Applications; Constraint Satisfaction and Optimization
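A minimal, generic sketch can show how a greedy search over camera sites works. It is not the authors' implementation. The closed-form SCR-derived design criterion is not given in the abstract, so the objective below is a hypothetical stand-in: the average fraction of simulated activity centers lying within a detection radius of at least one chosen camera, taken across simulated scenarios. The names `coverage_score` and `greedy_placement`, the 2-D coordinates and the radius are all illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def coverage_score(
    sites: Sequence[Point], scenarios: Sequence[Sequence[Point]], radius: float
) -> Callable[[List[int]], float]:
    """Hypothetical stand-in objective (NOT the paper's SCR criterion):
    mean fraction of simulated activity centers within `radius` of a chosen site.
    Assumes every scenario contains at least one activity center."""
    r2 = radius * radius

    def score(chosen: List[int]) -> float:
        total = 0.0
        for centers in scenarios:
            hits = sum(
                1
                for cx, cy in centers
                if any((cx - sites[s][0]) ** 2 + (cy - sites[s][1]) ** 2 <= r2 for s in chosen)
            )
            total += hits / len(centers)
        return total / len(scenarios)

    return score


def greedy_placement(
    candidates: Sequence[int], budget: int, score: Callable[[List[int]], float]
) -> List[int]:
    """Repeatedly add the candidate site with the largest marginal gain in `score`."""
    chosen: List[int] = []
    remaining = list(candidates)
    for _ in range(min(budget, len(remaining))):
        base = score(chosen)
        best = max(remaining, key=lambda s: score(chosen + [s]) - base)
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy example with three candidate sites and two simulated scenarios.
sites = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
scenarios = [[(0.0, 1.0), (1.0, 0.0), (10.0, 1.0)], [(0.0, 0.0), (0.0, 9.0)]]
print(greedy_placement(list(range(3)), 2, coverage_score(sites, scenarios, radius=2.0)))  # [0, 2]
```

Coverage-style objectives like this one are monotone and submodular. For such objectives, greedy selection under a cardinality budget is guaranteed to reach at least a (1 - 1/e) fraction of the optimum. This is one reason greedy search is a common baseline next to genetic algorithms.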
#AI4G52
PACE: A Personalized Adaptive Curriculum Engine for 9-1-1 Call-taker Training
9-1-1 call-taking training requires mastery of over a thousand interdependent skills, covering diverse incident types and protocol-specific nuances. A nationwide labor shortage is already straining training capacity, yet effective instruction still demands that trainers tailor objectives to each trainee's evolving competencies, a personalization burden that current practice cannot scale.
Partnering with a local 9-1-1 call center, we propose PACE (Personalized Adaptive Curriculum Engine), a co-pilot system that augments trainer decision-making by (1) maintaining probabilistic beliefs over trainee skill states, (2) modeling individual learning and forgetting dynamics, and (3) recommending training scenarios that balance the acquisition of new competencies with the retention of existing ones.
PACE propagates evidence over a structured skill graph to accelerate diagnostic coverage and applies contextual bandits to select scenarios that target gaps the trainee is prepared to address.
Empirical results show that PACE achieves 19.50% faster time-to-competence and 10.95% higher terminal mastery than state-of-the-art frameworks. Co-pilot studies with practicing training officers further demonstrate a 95.45% alignment rate between PACE's and experts' pedagogical judgments on real-world cases. For estimation, PACE cuts turnaround time from 11.58 minutes to merely 34 seconds, a reduction of up to 95.08%.
Topics: Multidisciplinary Topics and Applications
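The abstract says that PACE applies contextual bandits to select scenarios, but it does not name the bandit algorithm. The sketch below illustrates the general idea with a disjoint LinUCB, a standard contextual bandit. Each arm stands for one training scenario, and the context vector summarizes the trainee's current skill beliefs. The class and function names, the feature encoding and the reward signal (for example, an observed mastery gain) are hypothetical assumptions, not PACE's design.

```python
import math
from typing import List


class LinUCBArm:
    """One arm (hypothetically, one training scenario) of a disjoint LinUCB bandit."""

    def __init__(self, dim: int, alpha: float = 1.0) -> None:
        self.alpha = alpha  # exploration strength
        # A = I initially (ridge prior); we store A^-1 directly and update it incrementally.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * context

    def _mat_vec(self, v: List[float]) -> List[float]:
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.A_inv]

    def ucb(self, x: List[float]) -> float:
        """Upper confidence bound: theta^T x + alpha * sqrt(x^T A^-1 x), with theta = A^-1 b."""
        theta = self._mat_vec(self.b)
        A_inv_x = self._mat_vec(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * a for xi, a in zip(x, A_inv_x)))
        return mean + self.alpha * width

    def update(self, x: List[float], reward: float) -> None:
        """Rank-one update of A^-1 via Sherman-Morrison, then b += reward * x."""
        A_inv_x = self._mat_vec(x)  # A^-1 is symmetric, so x^T A^-1 = (A^-1 x)^T
        denom = 1.0 + sum(xi * a for xi, a in zip(x, A_inv_x))
        n = len(x)
        self.A_inv = [
            [self.A_inv[i][j] - A_inv_x[i] * A_inv_x[j] / denom for j in range(n)]
            for i in range(n)
        ]
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def select_scenario(arms: List[LinUCBArm], x: List[float]) -> int:
    """Pick the arm with the highest UCB for context x (ties go to the lowest index)."""
    return max(range(len(arms)), key=lambda k: arms[k].ucb(x))


arms = [LinUCBArm(dim=2), LinUCBArm(dim=2)]
x = [1.0, 0.0]  # hypothetical context features
first = select_scenario(arms, x)
arms[1].update(x, reward=1.0)
print(first, select_scenario(arms, x))  # 0 1
```

The confidence width shrinks along directions of the context that an arm has already seen, and the mean term rewards arms with past gains. This is how a bandit trades off exploring untried scenarios against exploiting ones that have helped before.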
#AI4G53
An LLM-based Chain-of-Response Counter-Scam System
The rapid evolution of online scams, driven by transnational networks and mass-produced social engineering scenarios, has exposed the speed limitations of conventional detection and made tighter inter-agency coordination necessary. While LLMs show promise in scam identification, their role in accelerating integrated response frameworks remains underexplored. We propose Counter-Scam, a unified LLM-based multi-agent framework that orchestrates an end-to-end response from initial detection to crime investigation. The framework first proposes safe data guidelines, emphasizing non-public scam data and secure dataset construction via scam-specific NER. Developed with insights from 37 stakeholders to reduce delays and improve analytical efficiency, the system integrates CSRA (multi-agent mitigation), CSRT (nine role-aligned NLP tasks), and CSRD (a corpus of 185,300 scam cases and 38,587 knowledge entries). Experiments show that fine-tuned sLLMs surpass commercial models by over 10% on all CSRT tasks and achieve a 0.24 F1 improvement in scam-specific NER. These results demonstrate the framework's capability to enable rapid, collaborative mitigation of online scams.
Topics: Agent-based and Multi-agent Systems; Multidisciplinary Topics and Applications; Natural Language Processing; AI Ethics, Trust, Fairness
#AI4G62
Domain-Informed Graph Neural Networks for Climate Factor Forecasting to Support Sustainable Crop Management
Forecasting climate factors is critical for anticipating agro-climatic risks and enabling sustainable crop management. However, accurate prediction remains challenging due to complex spatiotemporal variability, heterogeneous seasonal patterns, and intricate interdependencies among climate variables. Inspired by agronomic knowledge, we propose DoIGNN, a Domain-Informed Graph Neural Network that injects a domain-structured graph constraint built from Agro-Climatic Homogeneous Zones (ACHZs). Specifically, we partition stations into agro-climatic zones using long-term climatic statistics and location attributes, and construct a hierarchical ACHZ-guided adjacency. To better capture shared climate dynamics, we introduce a spatiotemporal decomposition module with temporal regularization that factorizes the climate tensor into low-rank global temporal bases and station loadings, yielding a compact station-level global component as auxiliary information for target forecasting. Finally, DoIGNN performs forecasting on both the ACHZ-guided and static-dynamic graphs to learn cross-region dependencies. Experiments on real-world climate datasets demonstrate that DoIGNN consistently improves forecasting accuracy over strong baselines while yielding more interpretable spatial dependency patterns that support climate-informed crop management decisions. In cooperation with the Ningbo Natural Resources and Planning Big Data Center, the proposed model has been trained and deployed for local data analysis.
Topics: Data Mining; Machine Learning; Multidisciplinary Topics and Applications
#AI4G69
Scalable Mapping of Tree Traits to Study the Dynamics of Protected Forest Areas in India
Monitoring forest functional diversity is essential to understand ecosystem resilience in the face of rapid environmental change. While existing remote sensing approaches primarily track structural attributes such as canopy density and tree height, functional traits like leaf phenology (evergreen vs. deciduous) and leaf type (broadleaf vs. needleleaf) reveal more direct information about the adaptive strategies of tree species. This study presents a scalable machine learning framework for mapping these traits across India at 10 m resolution using Google AlphaEarth Foundations (AEF) embeddings, which capture the complete annual spectral reflectance and radar signatures of the land surface. A key contribution is an ML-ready training dataset, curated by combining tree trait information with tree species occurrence data and then sampled on the basis of spectral time series so that it captures a wide range of phenological dynamics. We then build cross-validation folds that specifically test the learned classifiers for spatial generalizability across different eco-regions in India and for temporal generalizability across different years. Among the classifiers evaluated, Random Forest models trained on AEF embedding features achieved the best performance on both classification tasks, outperforming models trained on conventional Sentinel-1 and Sentinel-2 time series while offering seamless deployment in Google Earth Engine. Compared to publicly available land-cover products that encode leaf phenology and leaf type, our model yields significantly higher accuracy at a substantially finer spatial resolution. Finally, we examine the model's outputs over several protected forest areas in India to understand their dynamics over the last 8 years.
Our contributions are an analysis-ready open dataset for learning tree traits from remotely sensed spectral data, a trained model that generalizes spatially and temporally, and a demonstration of the insights the model can provide into the dynamics of protected forest areas, all of which can be replicated in other regions.
Topics: Humans and AI; Multidisciplinary Topics and Applications; Machine Learning
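Cross-validation folds that test spatial generalizability across eco-regions, or temporal generalizability across years, are commonly built by holding out one group at a time. The sketch below shows that idea; it is similar in spirit to scikit-learn's `LeaveOneGroupOut`. It assumes that each sample carries a group label (an eco-region name or a year). The function name and the toy labels are illustrative, and the abstract does not say exactly how the authors constructed their folds.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def leave_one_group_out(
    groups: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held_out_group, train_indices, test_indices), one fold per group.

    Every sample of the held-out group goes to the test split, so the model is
    always evaluated on a region (or year) it never saw during training.
    Group labels must be mutually comparable so that fold order is deterministic.
    """
    by_group: Dict[Hashable, List[int]] = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for held_out in sorted(by_group):
        test_idx = by_group[held_out]
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical eco-region labels for five samples; year labels work the same way.
regions = ["A", "B", "A", "C", "B"]
for region, train_idx, test_idx in leave_one_group_out(regions):
    print(region, train_idx, test_idx)
# A [1, 3, 4] [0, 2]
# B [0, 2, 3] [1, 4]
# C [0, 1, 2, 4] [3]
```

Random splits would leak spatially autocorrelated pixels from the same region into both train and test sets. Group-wise hold-out avoids that leakage, so the resulting scores reflect transfer to unseen regions or years.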
#AI4G78
Human-in-the-Loop Meta Bayesian Optimization for Fusion Energy and Scientific Applications
Inertial Confinement Fusion (ICF) holds transformative promise for sustainable, near-limitless clean energy, yet remains constrained by prohibitively high costs and limited experimental opportunities. This paper presents Human-in-the-Loop Meta Bayesian Optimization (HL-MBO), a framework that integrates expert knowledge with few-shot, uncertainty-aware machine learning to accelerate discovery in data-scarce, high-stakes scientific domains. HL-MBO combines a meta-learned surrogate model with an expert-informed acquisition function to recommend candidate experiments. To foster trust and enable informed decisions, HL-MBO also provides interpretable explanations of its suggestions. We show HL-MBO outperforms current BO methods on ICF energy yield optimization, as well as benchmarks in molecular optimization and critical temperature maximization for superconducting materials. By embedding human expertise into the optimization loop, HL-MBO opens a practical and scalable path to advance socially impactful scientific research.
Topics: Machine Learning; Multidisciplinary Topics and Applications
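Bayesian optimization ranks candidate experiments with an acquisition function computed from the surrogate's predictive mean and uncertainty. The abstract does not specify the form of HL-MBO's expert-informed acquisition, so it is not reproduced here. For orientation only, the sketch below computes standard Expected Improvement (EI) for maximization from a Gaussian predictive mean `mu` and standard deviation `sigma`. The exploration margin `xi` is a common optional parameter, and the function name is illustrative.

```python
import math


def expected_improvement(mu: float, sigma: float, best: float, xi: float = 0.0) -> float:
    """Standard EI for maximization under a Gaussian predictive distribution.

    EI = (mu - best - xi) * Phi(z) + sigma * phi(z),  z = (mu - best - xi) / sigma,
    where Phi and phi are the standard normal CDF and PDF.
    """
    improvement = mu - best - xi
    if sigma <= 0.0:
        # No predictive uncertainty: EI reduces to the deterministic improvement.
        return max(improvement, 0.0)
    z = improvement / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return improvement * cdf + sigma * pdf


# A candidate whose predicted mean equals the incumbent still has positive EI
# because of its uncertainty; more uncertainty means more expected upside.
print(round(expected_improvement(mu=0.0, sigma=1.0, best=0.0), 4))  # 0.3989
```

The two terms make the exploitation/exploration trade-off explicit. A human-in-the-loop variant would modify or reweight such a score, but the abstract does not say how HL-MBO does this.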
#AI4G88
iFire AI: AI-powered Wildfire Simulation and 3D Immersive Visualisation
Wildfires, especially extreme wildfires, cause irreversible damage to ecosystems, human lives and economies globally. To reduce such losses, understanding wildfires is crucial for effective preparedness. This research proposal introduces iFire AI, a collaborative project aimed at developing the world's leading 3D immersive visualisation system for experiencing and understanding extreme wildfire scenarios. iFire AI utilises our advanced 360-degree immersive system AVIE, which visualises interactive landscapes and wildfire events rendered by Unreal Engine. To provide realistic extreme fire scenarios at hourly temporal resolution, we propose a deep learning model supported by a Sim2Real pipeline that integrates simulated and real-world data to address data insufficiency and enhance model development and evaluation. Finally, we explore 3D tree reconstruction using 3D Gaussian splatting, creating visually realistic, computationally efficient, and dynamically interactive tree models. By placing users inside hyper-realistic wildfire environments, iFire AI can enhance users' risk perception, situational awareness and collaborative decision-making, and thereby reduce risks due to extreme wildfires and promote sustainable development.
Topics: Humans and AI; Machine Learning; Multidisciplinary Topics and Applications
#AI4G90
Coarse-to-Fine Domain Incremental Learning with Attentive Distillation for Mining Footprint Segmentation in Multispectral Imagery
Automatically mapping and segmenting global mining footprints using remote sensing and deep learning is critical for monitoring the socio-environmental risks and impacts of mining, yet its progress is hindered by the scarcity of fine-grained annotated data. Although large-scale datasets with coarse boundaries are widely available, leveraging them to improve fine-grained segmentation is challenging due to significant domain shift. To address this, we propose MineC2FNet, a coarse-to-fine domain incremental learning framework that exploits abundant coarse data to enhance fine-grained mining footprint segmentation. MineC2FNet adopts a teacher–student architecture with attentive distillation at both the feature and prediction levels, selectively transferring generalized knowledge from the coarse domain while enabling boundary refinement using limited fine-grained data (fine domain). We further introduce an expertly validated dataset of 219 images with precise boundary annotations across diverse geographies and commodities. Extensive experiments against state-of-the-art approaches, including domain adaptation and domain incremental learning methods, demonstrate that MineC2FNet achieves superior performance while effectively handling domain shift. The dataset and code are publicly available at https://github.com/risqiutama/MineC2FNet.
Topics: Computer Vision; Machine Learning; Multidisciplinary Topics and Applications
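Prediction-level distillation in a teacher–student setup is usually a temperature-softened KL divergence between teacher and student class probabilities. This sketch shows one way per-pixel weights could make such a loss selective. The weights are supplied as plain inputs; they are a hypothetical placeholder for an attention map, because the abstract does not describe how MineC2FNet computes its attention. Feature-level distillation is omitted. The function names are illustrative, and the T² scaling follows the common knowledge-distillation convention.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of logits divided by `temperature`."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for discrete distributions, with a small epsilon for safety."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def weighted_distillation_loss(
    teacher_logits: Sequence[Sequence[float]],  # one logit vector per pixel
    student_logits: Sequence[Sequence[float]],
    weights: Sequence[float],  # hypothetical per-pixel attention weights (>= 0)
    temperature: float = 2.0,
) -> float:
    """Weighted mean over pixels of KL(teacher || student) on softened probabilities."""
    total, norm = 0.0, 0.0
    for t_log, s_log, w in zip(teacher_logits, student_logits, weights):
        p_t = softmax(t_log, temperature)
        p_s = softmax(s_log, temperature)
        total += w * kl_divergence(p_t, p_s)
        norm += w
    # T^2 keeps gradient magnitudes comparable across temperatures.
    return (temperature ** 2) * total / max(norm, 1e-12)


teacher = [[2.0, 0.0], [0.0, 2.0]]  # two pixels, two classes (mine / background)
student = [[2.0, 0.0], [2.0, 0.0]]  # student disagrees on the second pixel
print(weighted_distillation_loss(teacher, student, weights=[1.0, 0.0]))  # 0.0
print(weighted_distillation_loss(teacher, student, weights=[1.0, 1.0]) > 0.0)  # True
```

A zero weight stops the teacher from supervising a pixel at all. This is the sense in which attention-style weighting can transfer coarse-domain knowledge selectively while leaving boundary regions to the fine-domain labels.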
#AI4G93
Bridging the SEA Gap: An Initial Benchmark for Neural Audio Codec-Synthesized Speech Deepfakes in South-East Asian Languages
Codecfakes (CFs) are a type of speech deepfake generated through Audio Language Models (ALMs), with Neural Audio Codecs (NACs) forming the core mechanism for speech encoding and generation. CFs exhibit distributional characteristics that differ from vocoder-based deepfakes, so detectors trained on vocoder data generalize poorly to CF detection. Although this has led to the development of CF detection benchmarks, existing resources are largely confined to English (and, to a limited extent, Chinese), leaving South-East Asian (SEA) languages unexplored. To bridge this gap, we introduce SEA-CF, the first large-scale benchmark for CF detection spanning multiple SEA languages, diverse speaker profiles, and a wide range of NAC architectures. SEA-CF is constructed by synthesizing codecfakes from publicly available real speech corpora. Our experiments show that state-of-the-art (SOTA) CF detectors trained on English-centric datasets fail to generalize to SEA speech due to language-specific phonetic structures, tonal variations, and rich prosodic diversity. We further conduct a comprehensive zero-shot and fine-tuned evaluation of recent SOTA ALMs on SEA-CF. Fine-tuning the ALMs improves performance; however, their large scale makes them impractical for real-world application, particularly in low-resource and latency-constrained settings. To address this limitation, we propose GARUDA, a novel small ALM tailored for CF detection that delivers strong performance while remaining lightweight. Extensive evaluations demonstrate that GARUDA outperforms strong end-to-end and ALM-based baselines, establishing a new, practical direction for robust CF detection in SEA languages and beyond.
Topics: AI Ethics, Trust, Fairness; Natural Language Processing
#AI4G96
Context-Aware Concept Distillation for Trustworthy Flood Prediction
Effective flood risk management relies on accurate forecasting, yet the "black box" nature of state-of-the-art deep learning models creates a barrier to trust and accountability in high-stakes public safety decisions. While existing Explainable AI (XAI) methods offer local attributions, they fail to provide the verifiable, operationally meaningful causal narratives required by disaster response authorities. To address this societal challenge, we propose Context-Aware Concept Distillation (CACD), a framework developed in collaboration with domain experts to distill opaque LSTMs into interpretable, hydrology-aware surrogate models. We introduce an unsupervised pipeline to discover a "Hydrological Language" and a Residual Hypernetwork that dynamically modulates these concepts based on static basin characteristics. Evaluated on 5,203 basins globally, our model achieves high fidelity (median NSE 0.70), significantly outperforming black-box baselines (e.g., Multi-Layer Perceptrons) on unseen future data. By demonstrating that human-interpretable concepts are sufficient to reconstruct flood dynamics, this work balances AI accuracy with the transparency required for responsible environmental decision-making.
Topics: AI Ethics, Trust, Fairness · Humans and AI · Machine Learning · Multidisciplinary Topics and Applications -
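The abstract reports fidelity as a median NSE without defining it. Assuming NSE denotes the Nash-Sutcliffe efficiency, the usual streamflow skill score in hydrology, a minimal sketch of a per-basin score and its median across basins follows. The basin names and series are hypothetical toy values, and this is not the authors' evaluation code.

```python
from statistics import mean, median


def nse(observed: list[float], simulated: list[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - (squared error) / (variance of observations about their mean).

    1.0 is a perfect fit; 0.0 means the model is no better than predicting the observed mean.
    """
    if len(observed) != len(simulated) or not observed:
        raise ValueError("need two non-empty series of equal length")
    obs_mean = mean(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - obs_mean) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("observed series is constant; NSE is undefined")
    return 1.0 - sse / sst


# Hypothetical per-basin (observed, simulated) streamflow series, not data from the paper.
basins = {
    "basin_a": ([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]),  # perfect reconstruction
    "basin_b": ([1.0, 2.0, 3.0, 4.0], [2.5, 2.5, 2.5, 2.5]),  # always predicts the mean
    "basin_c": ([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0]),  # small errors
}
scores = {name: nse(obs, sim) for name, (obs, sim) in basins.items()}
print(scores, median(scores.values()))
```

Under this definition, a basin with NSE 0.70 has a squared error equal to 30% of its observed series' variance about the mean.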
#AI4G101
HighFM: Towards a Foundation Model for Learning Representations from High-Frequency Earth Observation Data
The increasing frequency and severity of climate-related disasters have intensified the need for real-time monitoring, early warning, and informed decision-making. Earth Observation (EO), powered by satellite data and Machine Learning (ML), offers powerful tools to meet these challenges. Foundation Models (FMs) have revolutionized EO ML by enabling general-purpose pretraining on large-scale remote sensing datasets. However, most existing models rely on high-resolution satellite imagery with low revisit rates, limiting their suitability for fast-evolving phenomena and time-critical emergency response. In this work, we present HighFM, a first-cut approach towards an FM for high-temporal-resolution, multispectral EO data. Leveraging over 2 TB of SEVIRI imagery from the Meteosat Second Generation (MSG) platform, we adapt the SatMAE masked autoencoding framework to learn robust spatiotemporal representations. To support real-time monitoring, we enhance the original architecture with fine-grained temporal encodings to capture short-term variability. The pretrained models are then fine-tuned on cloud masking and active fire detection tasks. We benchmark our SEVIRI-pretrained Vision Transformers against traditional baselines and recent geospatial FMs, demonstrating consistent gains in both balanced accuracy and IoU. Our results highlight the potential of temporally dense geostationary data for real-time EO, offering a scalable path toward foundation models for disaster detection and tracking.
Topics: Computer Vision · Machine Learning -
#AI4G105
Brazilian Indigenous Languages Revitalization at Scale: Reducing Cost and Development Time for Low-Resource Language Courses
Brazil is home to about 180 Indigenous languages, which span a wide range of sociolinguistic circumstances, from critically endangered to relatively stable. Across this continuum, Indigenous communities often lack pedagogical resources and are underserved by existing language-learning technologies, which are typically designed for high-resource languages and assume reliable connectivity and large datasets. This research project proposes AI-assisted tools and workflows for collecting and annotating textual and speech data that substantially reduce the time and cost required to produce engaging language-learning game apps. Our goal is to implement a language-learning game app that Indigenous students can use to practice their reading, writing, and speaking skills at home. We propose novel speech processing models for low-resource Indigenous languages, as well as offline support for low-connectivity environments. Our project adopts a co-creation model that actively fosters collaboration among Indigenous educators, linguists, and youth, is adapted to their context, and complies with ethical guidelines. We outline an implementation plan with the Bororo and Enawene Nawe communities to test our methods and, potentially, produce an AI-driven platform for Indigenous language education that is applicable across diverse sociolinguistic contexts in Brazil and beyond.
Topics: AI Ethics, Trust, Fairness · Humans and AI · Multidisciplinary Topics and Applications · Natural Language Processing -
#AI4G108
Deep Reinforcement Learning Enhanced Semi-supervised Graph Neural Network for Credit Card Fraud Detection
Credit card fraud threatens global payment ecosystems, causing billions in losses and undermining public trust. Efficient fraud detection remains challenging due to surging transaction volumes and evolving fraud tactics. While Graph Neural Networks (GNNs) excel at modeling structural relationships, they struggle in real-world scenarios characterized by label scarcity and often overlook discriminative feature-level signals, leaving rich risk information underutilized without costly manual feature engineering. To address this, we propose DRESS, a Deep Reinforcement Learning (DRL) Enhanced Semi-supervised GNN framework. It employs a DRL agent to automatically capture and enhance feature-level risks, fuses them with graph-based structural risks, and propagates the result via a gated temporal attention network for final prediction. To mitigate inefficient exploration in the DRL module, we incorporate a feature self-attention layer that weighs each feature's contribution to fraud detection and employ self-supervised intrinsic rewards to optimize the DRL module efficiently. Extensive experiments on real-world datasets demonstrate that DRESS outperforms state-of-the-art methods, especially in low-label scenarios with only 2%–10% labeled samples. By empowering resource-limited institutions to combat fraud and prevent financial loss, DRESS secures the digital trust essential for inclusive growth, contributing to AI for poverty alleviation and economic development.
Topics: Data Mining · Knowledge Representation and Reasoning -
#AI4G121
Improving Scientific Formula Verbalization in Large Speech Language Models for Accessible Learning
Online learning systems provide accessible learning opportunities for blind or low-vision students. To support access to complex scientific materials, the speech models used in these systems need to deliver accurate scientific formula verbalization. While recent large speech language models (LSLMs) provide remarkable low-latency streaming capabilities, their potential for scientific formula verbalization remains underexplored. In this paper, we propose Formula-Speech, the first end-to-end LSLM designed for scientific formula verbalization. Specifically, we construct two high-quality scientific formula datasets with educational experts to align speech models with scientific formula verbalization patterns. We then adopt a lightweight and effective two-stage training framework, combining supervised fine-tuning for basic formula-to-speech alignment with reinforcement learning guided by a custom reward function to optimize for human-preferred verbalization. Experimental results show that our model significantly improves the verbalization performance of LSLMs and achieves state-of-the-art results across multiple scientific domains.
Topics: Multidisciplinary Topics and Applications · Humans and AI · Data Mining -
#AI4G122
AG-STELLA: Spatio-Temporal Learning for Water-related Agricultural Land Use Activity Mapping with AlphaEarth
Accurate mapping of agricultural land use activity, particularly the long-term transition from cropland to pasture and short-term transitions between cropland and fallow land, is essential for sustainable water management, drought response, and food-system resilience, directly supporting the United Nations Sustainable Development Goals (SDG-2 and SDG-8). However, reliable land use activity mapping is challenging due to spectral ambiguity, temporal irregularities, severe class imbalance, and limited generalization across agricultural regions. In this work, we propose AG-STELLA, a knowledge-guided spatiotemporal model that (i) captures temporal changes of agricultural lands using pretrained spatiotemporal transformers; (ii) integrates geospatial context using AlphaEarth embeddings; (iii) introduces a temporal transition latent space with temporal consistency constraints; (iv) employs guidance through hydroclimatic consistency; and (v) uses a land-use-aware gated decoder to improve robustness across regions. Through experiments across three water-stressed U.S. states, we show consistent gains over baseline vision and foundation models, achieving up to a 27% F1-score improvement for pasture (the minority class) and 16% overall. We further show robustness across heterogeneous regions through cross-state transfer learning, where AG-STELLA consistently outperforms foundation model baselines and achieves up to 82.3% F1 for fallow land, a 9.6% improvement over the best foundation model.
Topics: Machine Learning · Computer Vision · Multidisciplinary Topics and Applications -
#AI4G124
Rule-Bottleneck RL: Learning to Decide and Explain for Sequential Resource Allocation via LLM Agents in Public Health
Reducing preventable maternal mortality remains a global health priority. Under Sustainable Development Goal (SDG) target 3.1, the WHO emphasizes timely and equitable allocation of limited maternal health resources. Motivated by the Departments of Obstetrics and Gynecology at several major hospitals in Uganda and Ghana, we study the problem of sequentially allocating wearable vital sign monitoring devices among maternal patients. While deep reinforcement learning (RL) has shown promise for sequential resource allocation, its limited interpretability hinders adoption in such high-stakes settings. In contrast, large language model (LLM) agents provide human-readable reasoning but often struggle with effective long-term decision making. To bridge this gap, we introduce Rule-Bottleneck RL (RBRL), the first LLM agent framework for resource allocation problems that jointly optimizes a language-based decision policy and its explainability. At each step within RBRL, an LLM first generates candidate rules: language statements capturing decision priorities tailored to the current state. RL then optimizes rule selection to maximize environmental rewards and explainability, with the LLM acting as a judge. Finally, an LLM chooses the action (the optimal allocation) based on the selected rule. We provide conditions for RBRL performance guarantees and characterize the finite-horizon evaluation gap of the learned RBRL policy. Experiments in maternal health show that RBRL outperforms baseline LLM agents and approaches the performance of deep RL, while producing clearer, policy-relevant explanations. Human evaluations further confirm improved trust and usability, demonstrating RBRL as a practical AI approach aligned with SDG target 3.1.
Topics: Multidisciplinary Topics and Applications · Agent-based and Multi-agent Systems · Planning and Scheduling -
#AI4G134
PoemDirector: A Multi-Agent Context-Adaptive Instructional Mode Selection Framework for Chinese Classical Poetry Video Generation
Classical poetry is a significant component of aesthetic education and cultural inheritance in China's K–12 language education, and web-based instructional videos have become the primary way students learn about classical poetry. However, current approaches fail to deliver both high-quality explanations and large-scale automatic video production. Semi-automated processes rely on teachers to ensure sound instructional design, but the time and expense of production are prohibitive. Although end-to-end fully automated generation is efficient, its logic is unclear and its explanations are shallow because it lacks systematic instructional design. Most current approaches lack an end-to-end generative framework with a hierarchical explanation structure for deep understanding, context-adaptive instructional intelligence, and clear teaching objectives. In this paper, we propose PoemDirector, a multi-agent framework that combines context-adaptive strategies, multi-layered explanations, and pedagogically grounded end-to-end generation in a single framework. Based on the poem and its teaching context, the director agent in PoemDirector creates a structured creative blueprint and coordinates the other agents to collaborate on producing instructional videos on classical poetry. We further establish a multi-dimensional evaluation framework for instructional effectiveness and poetic presentation, and conduct comparative studies based on both this framework and additional objective video quality metrics.
The results demonstrate that PoemDirector significantly lowers labor costs and outperforms the baseline on a number of metrics, thereby resolving the conflict between high-quality instruction and mass production.
Topics: Agent-based and Multi-agent Systems · Humans and AI · Natural Language Processing -
#AI4G136
NeuroDALEC: A Differentiable and Interpretable Mass-Conserving Framework for Terrestrial Ecosystem Carbon Cycle Dynamics
Accurate simulation of terrestrial ecosystem carbon cycles is crucial for addressing global climate change and for ecosystem management. Process-based carbon models are highly interpretable but suffer from insufficient accuracy and slow computation due to fixed parameters. In contrast, deep-learning carbon models achieve high accuracy but disregard physical principles, which prevents ecologists from explaining ecosystem dynamics. We propose NeuroDALEC, an interpretable framework that embeds the DALEC carbon-cycle model within a neural network, enabling differentiable computation of ecological processes. We design key parameters and ensemble learning strategies, and introduce mass-conserving carbon pool state-transition equations to ensure physical consistency. Experiments show that NeuroDALEC outperforms existing models in both accuracy and efficiency. Moreover, it provides sufficient interpretability by predicting all components of the carbon cycle. Deployed in a real-time carbon assimilation system, NeuroDALEC supports daily carbon forecasting and decision-making. This work contributes to the United Nations' Sustainable Development Goals 13 (Climate Action) and 15 (Life on Land). The source code is available at: https://osf.io/ubcv4/overview?view_only=ac8753c98677438180e82926ae898aba.
Topics: Multidisciplinary Topics and Applications · Machine Learning -
#AI4G141
UniST-Pred: A Robust Unified Framework for Spatio-Temporal Traffic Forecasting in Transportation Networks Under Disruptions
Spatio-temporal traffic forecasting is a core component of intelligent transportation systems, supporting various downstream tasks such as signal control and network-level traffic management. In real-world deployments, forecasting models must operate under structural and observational uncertainties, conditions that are rarely considered in model design. Recent approaches achieve strong short-term predictive performance by tightly coupling spatial and temporal modeling, often at the cost of increased complexity and limited modularity. In contrast, efficient time-series models capture long-range temporal dependencies without relying on explicit network structure. We propose UniST-Pred, a unified spatio-temporal forecasting framework that first decouples temporal modeling from spatial representation learning, then integrates both through adaptive representation-level fusion. To assess the robustness of the proposed approach, we construct a dataset based on an agent-based, microscopic traffic simulator (MATSim) and evaluate UniST-Pred under severe network disconnection scenarios. Additionally, we benchmark UniST-Pred on standard traffic prediction datasets, demonstrating competitive performance against well-established models despite its lightweight design. The results illustrate that UniST-Pred maintains strong predictive performance across both real-world and simulated datasets, while also yielding interpretable spatio-temporal representations under infrastructure disruptions. The source code and the generated dataset are available at https://anonymous.4open.science/r/UniST-Pred-EF27.
Topics: Multidisciplinary Topics and Applications · Machine Learning -
#AI4G142
Integrating Atmospheric Dispersion Modeling Priors into Cuboid Splatting for Spatiotemporal Reconstruction of Airborne Radioiodine After Nuclear Accidents
In nuclear power plant accidents, airborne radioiodine poses major health risks, making reliable reconstructions of its spatiotemporal distribution crucial for emergency management. Current state-of-the-art prognosis systems use atmospheric dispersion modeling but ignore posterior evidence from emergency care centers, comprising movement profiles and thyroid measurements of affected individuals. A prior study showed that the AI method Cuboid Splatting can reconstruct iodine air concentrations from such data, but it ignores simulations from established prognosis systems.
Our multidisciplinary team extends Cuboid Splatting by incorporating these simulations as priors and subsequently correcting them using movement and thyroid data. We develop several ways to translate and correct priors and combine the best-performing approaches into a novel Cuboid Splatting-with-prior mechanism, which we evaluate using constructed prior scenarios representing different error types and intensities.
Using Cuboid Splatting-with-prior yields more accurate reconstructions than (i) the dispersion simulations alone and (ii) plain Cuboid Splatting without a prior. Across reconstructions, the mean scenario error is 19.6%, an improvement of 28.0 percentage points (pp) over (i) and 89.9 pp over (ii), the latter with particularly large gains at high spatial resolution. These results demonstrate that combining simulation-based priors with measurement-based posterior inference can substantially improve the reconstruction of iodine air activity concentrations in nuclear emergencies.
Topics: Data Mining · Multidisciplinary Topics and Applications · Machine Learning -
#AI4G144
ICFD-31k: A Large-Scale Dataset and Benchmark for Real-Time Conversational Fraud Detection
The proliferation of sophisticated telephone scams poses a significant societal and economic threat across diverse linguistic contexts in a country like India. Furthermore, the lack of large-scale, publicly available datasets remains a critical barrier to research on robust, real-time countermeasures. To address this, we introduce ICFD-31k, the first Indian Conversational Fraud Dataset, a new benchmark containing over 31,000 realistic conversational transcripts. ICFD-31k comprises systematically generated content covering 10 distinct fraud umbrellas, ranging from financial impersonation to job scams. ICFD-31k transcripts feature rich annotations comprising a final verdict, chunk-level streaming labels, and detailed "slow-thinking" rationales. In addition, a human-in-the-loop evaluation validates ICFD-31k's quality, achieving a Cohen's kappa of 0.534 that confirms annotation reliability. We also introduce two fine-tuned RoBERTa-based models: M1 for non-streaming data and M2 for streaming data. Comprehensive experiments with these strong baselines (M1, M2) further demonstrate ICFD-31k's utility.
Topics: Multidisciplinary Topics and Applications · AI Ethics, Trust, Fairness · Machine Learning · Natural Language Processing -
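The abstract reports annotation reliability as Cohen's kappa. For reference, a minimal sketch of the standard two-rater definition, kappa = (p_o - p_e) / (1 - p_e), is shown below. The abstract does not say which two label sources were compared, so the "auto" and "human" verdict lists are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters labelling the same items: (p_o - p_e) / (1 - p_e)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items on which both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement implied by each rater's own label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("chance agreement is 1; kappa is undefined")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical verdicts for eight transcripts (illustrative only, not ICFD-31k data).
auto = ["fraud", "fraud", "legit", "legit", "fraud", "legit", "fraud", "legit"]
human = ["fraud", "legit", "legit", "legit", "fraud", "legit", "fraud", "fraud"]
print(round(cohens_kappa(auto, human), 3))
```

Kappa discounts the agreement two raters would reach by chance given their own label frequencies, which is why it is reported instead of raw percent agreement (here 75% raw agreement shrinks to a kappa of 0.5).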
#AI4G148
Clinically-Oriented Screening Model for Diabetic Retinopathy Severity Grading and Diabetic Macular Edema Detection
Diabetic retinopathy (DR) and diabetic macular edema (DME) are leading causes of preventable blindness worldwide. Automated screening tools are critical for timely detection at scale, particularly in low-resource settings where access to ophthalmologists is limited. We propose DRDME-Net, a deployment-driven joint learning framework that formulates DR grading as an ordinal regression task and DME detection through a continuous surrogate, rather than as conventional classification. This design yields stable risk scores tightly aligned with operational clinical decision-making thresholds. Evaluation on facility and community cohorts demonstrates that DRDME-Net achieves strong performance across severity boundaries. Insights from an initial feasibility pilot further demonstrate its scalability in real-world workflows. These results highlight the potential of DRDME-Net to expand equitable access to timely detection, reduce preventable vision loss, and provide a practical template for integrating AI into population screening initiatives.
Topics: Computer Vision · Machine Learning · Multidisciplinary Topics and Applications -
#AI4G162
FlowID: Enhancing Forensic Identification with Latent Flow-Matching Models
Every day, many people die under violent circumstances, whether from crime, war, migration, or climate disasters.
Medico-legal and law enforcement institutions document many portraits of the deceased as evidence, but cannot immediately use them for identification.
While traditional image editing tools can process these photos for public release, the workflow is lengthy and produces suboptimal results.
In this work, we leverage advances in image generation models, which can now produce photorealistic human portraits, to introduce FlowID, an identity-preserving facial reconstruction method. Our approach combines single-image fine-tuning, which adapts the generative model to out-of-distribution injured faces, with attention-based masking that localizes edits to damaged regions while preserving identity-critical features.
Together, these components enable the removal of artifacts from violent death while retaining sufficient identity information to support identification.
To evaluate our method, we introduce InjuredFaces, a novel benchmark for identity-preserving facial reconstruction under severe facial damage.
Beyond serving as an evaluation tool for this work, InjuredFaces provides a standardized resource for the community to study and compare methods addressing facial reconstruction in extreme conditions.
Experimental results show that FlowID outperforms state-of-the-art open-source methods while maintaining low memory requirements, making it suitable for local deployment without compromising data privacy.
Topics: AI Ethics, Trust, Fairness · Multidisciplinary Topics and Applications · Machine Learning · Humans and AI · Computer Vision -
#AI4G169
Agents Trusting Agents? Restoring Lost Capabilities with Inclusive Healthcare
Agent-based simulations have untapped potential to inform social policies on urgent human development challenges in a non-invasive way, before these policies are implemented in real-world populations. This paper responds to a request from non-profit and governmental organizations to evaluate policies under discussion to improve equity in health care services for people experiencing homelessness (PEH) in the city of Barcelona. With this goal, we integrate the conceptual framework of the capability approach (CA), which is explicitly designed to promote and assess human well-being, to model and evaluate the behaviour of agents who represent PEH and social workers. We define a reinforcement learning environment where agents aim to restore their central human capabilities under existing environmental and legal constraints. We use Bayesian inverse reinforcement learning (IRL) to calibrate profile-dependent behavioural parameters in PEH agents, modeling the degree of trust and engagement with social workers, which is reportedly a key element in the success of the policies in scope. Our results open a path to mitigate health inequity by building relationships of trust between social service workers and PEH.
Topics: Multidisciplinary Topics and Applications · Agent-based and Multi-agent Systems -
#AI4G181
Directional Hallucinations: Ideological Drift in News-Grounded LLM Question Answering
Large language models (LLMs) are increasingly used to answer questions about political information, including in election-adjacent settings where factual errors and ideological distortions are high-stakes. We present a reproducible measurement framework that treats hallucinations (unsupported statements in document-grounded QA) as diagnostic signals of ideological drift. Using 21,727 expert-labeled U.S. political news articles from QBias spanning left, center, and right sources, we (i) generate an article-specific question, (ii) elicit document-grounded answers from three open-source LLMs, (iii) detect sentence-level hallucinations via reference-based comparison, (iv) classify the ideological valence of hallucinated sentences with a fine-tuned stance classifier, and (v) probe output logits to relate token-level uncertainty to hallucination and drift. Hallucination rates vary substantially across models and concentrate in contentious topics, while source-ideology differences in hallucination frequency are modest. In contrast, hallucination content exhibits robust leftward drift: a majority of hallucinated sentences are classified as left-leaning, including among hallucinations generated from right-leaning sources. Logit-level analysis shows that hallucinations arise in high-entropy generation contexts, and in some models uncertainty also predicts leftward drift, consistent with an "uncertainty → guessing" mechanism. In advisory consultation with an election administration stakeholder, we discuss implications for auditing AI-mediated political information and for designing safeguards in election-relevant deployments.
Topics: Uncertainty in AI · Natural Language Processing · Multidisciplinary Topics and Applications · AI Ethics, Trust, Fairness -
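The abstract links hallucinations to high-entropy generation contexts but does not name the estimator. Assuming the usual Shannon entropy of the softmax over a single decoding step's output logits, a minimal sketch is shown below; the logit vectors are hypothetical toy values, not outputs of any model in the study.

```python
import math


def token_entropy(logits: list[float]) -> float:
    """Shannon entropy (in nats) of the softmax distribution defined by one step's logits."""
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical logits over four candidate next tokens.
flat = token_entropy([0.0, 0.0, 0.0, 0.0])     # uniform: ln 4, the maximum for 4 tokens
peaked = token_entropy([10.0, 0.0, 0.0, 0.0])  # one dominant token: close to 0
print(flat, peaked)
```

High entropy means probability mass is spread over many candidate tokens, the situation the authors associate with the model "guessing".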
#AI4G183
FACET: Multi-Agent AI Supporting Teachers in Scaling Differentiated Learning for Diverse Students
Classrooms are becoming increasingly heterogeneous, comprising learners with diverse performance and motivation levels, language proficiencies, and learning differences such as dyslexia and ADHD. While teachers recognize the need for differentiated instruction, growing workloads create substantial barriers, making differentiated instruction an ideal that is often unrealized in practice. Current AI educational tools, which promise differentiated materials, are predominantly student-facing and performance-centric, ignoring other aspects that shape learning outcomes.
We introduce FACET, a teacher-facing multi-agent framework designed to address these gaps by supporting differentiation that accounts for motivation, performance, and learning differences. Developed with educational stakeholders from the outset, the framework coordinates four specialized agents for learner simulation, diagnostic assessment, material generation, and evaluation within a teacher-in-the-loop design.
School principals (N = 30) shaped system requirements through participatory workshops, while in-service K–12 teachers (N = 70) evaluated material quality. Mixed-methods evaluation demonstrates strong perceived value for inclusive differentiation. Practitioners emphasized both the urgent need arising from classroom heterogeneity and the importance of maintaining pedagogical autonomy as a prerequisite for adoption. We discuss implications for future school deployment and outline partnerships for longitudinal classroom implementation.
Topics: Agent-based and Multi-agent Systems · Humans and AI · Multidisciplinary Topics and Applications -
#AI4G191
Democratizing Ski Safety: Real-Time Turn Segmentation with Smartphone IMU and Causal LSTM Networks
Anterior cruciate ligament (ACL) injury is one of the most common and serious injuries in sports, particularly among recreational skiers. Research shows that structured technique awareness and continuous feedback can significantly reduce the risk of such injuries, yet access to professional instructors is limited to wealthy athletes who can afford continuous private coaching, creating a harmful inequity in injury prevention. This gap can be mitigated by automating real-time analysis of skiing technique and making it available to the wider recreational skiing community. The proposed approach relies exclusively on inertial sensors embedded in standard smartphones, eliminating the need for specialized equipment and enabling broad social scalability. To support immediate feedback, the system operates causally, producing predictions based solely on past observations. The work is conducted in cooperation with professional ski instructors, ensuring that problem formulation, data annotation, and result evaluation reflect real-world coaching practices and injury prevention needs. The model is evaluated using Leave-One-Subject-Out validation on a public, in-the-wild dataset, demonstrating robust generalization across skiers with an average directional accuracy of 89.8% while maintaining extremely low inference latency suitable for on-device mobile deployment. This work outlines a practical pathway to democratizing injury prevention in recreational sports.
Topics: Machine Learning · Data Mining · Humans and AI -
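Leave-One-Subject-Out validation holds out every recording from one skier per fold, so each fold tests on a person never seen in training. The sketch below shows only this generic splitting scheme over hypothetical per-window skier IDs; it is not the authors' pipeline and does not model their causal LSTM.

```python
from collections.abc import Iterator, Sequence


def leave_one_subject_out(
    subject_ids: Sequence[str],
) -> Iterator[tuple[str, list[int], list[int]]]:
    """Yield (held_out_subject, train_indices, test_indices), one fold per distinct subject."""
    for held_out in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, train, test


# Hypothetical: each IMU window is tagged with the skier who recorded it.
windows_by_skier = ["skier1", "skier1", "skier2", "skier3", "skier2", "skier3"]
for skier, train_idx, test_idx in leave_one_subject_out(windows_by_skier):
    # No window from the held-out skier ever appears in the training indices.
    assert not set(train_idx) & set(test_idx)
    print(skier, train_idx, test_idx)
```

Splitting by subject rather than by random window prevents the model from scoring well simply by recognizing an individual skier's style, which is what makes the reported accuracy a claim about generalization across skiers.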
#AI4G198
Column Generation for the Micro-Transit Zoning Problem
Along with the rapid development of new urban mobility options like ride-sharing over the past decade, on-demand micro-transit services stand out as a middle ground, bridging the gap between fixed-line mass transit and single-request ride-hailing and balancing ridership maximization against travel time minimization. Micro-transit adoption can have significant social impact. It improves urban sustainability through lower energy consumption and reduced emissions, while enhancing equitable mobility access for disadvantaged communities, thanks to its lower vehicle miles per passenger, flexible schedules, and affordable pricing. However, effective operation of micro-transit services requires planning geo-fenced zones in advance, which involves solving a challenging combinatorial optimization problem. Existing approaches first enumerate candidate zones and then select a fixed number of optimal zones. In this paper, we generalize the Micro-Transit Zoning Problem (MZP) to allow a global budget rather than imposing a size limit on candidate zones. We also design a Column Generation (CG) framework to solve the problem exactly, along with several pricing heuristics to accelerate computation. Extensive numerical experiments across major U.S. cities demonstrate that our approach produces higher-quality solutions more efficiently and scales better in the generalized setting.
Topics: Constraint Satisfaction and Optimization · Multidisciplinary Topics and Applications · Planning and Scheduling -
#AI4G199
A Hybrid Modeling Framework for Crop Prediction Tasks via Dynamic Parameter Calibration and Multi-Task Learning
Accurate prediction of crop states (e.g., phenology stages and cold hardiness) is essential for timely farm management decisions such as irrigation, fertilization, and canopy management, which optimize crop yield and quality. While traditional biophysical models can be used for season-long predictions, they lack the precision required for site-specific management. Deep learning methods are a compelling alternative, but they can produce biologically unrealistic predictions and require large-scale data. We propose a hybrid modeling approach that uses a neural network to parameterize a differentiable biophysical model and leverages multi-task learning for efficient data sharing across crop cultivars in data-limited settings. By predicting the parameters of the biophysical model, our approach improves prediction accuracy while preserving biological realism. Empirical evaluation on real-world and synthetic datasets demonstrates that our method improves prediction accuracy by 60% for phenology and 40% for cold hardiness compared to deployed biophysical models. Project site: https://tinyurl.com/DMC-MTL-Site.
Machine Learning · Multidisciplinary Topics and Applications
#AI4G202
A Gloss-Driven Indian Sign Language Production System Using Learned Pose Representations
A Sign Language Production (SLP) system translates spoken or written language into sign language, enabling accessible communication between the deaf/hard-of-hearing and the hearing population. Although Indian Sign Language (ISL) is one of the most widely used sign languages globally, it is a very low-resource language and lacks such SLP systems. This paper presents a scalable and modular SLP framework based on the Sign-Pose-VQ-VAE model, designed for low-resource settings. The model learns discrete pose representations (codes) by disentangling body, left-hand, and right-hand keypoints, enabling efficient pose modeling and co-articulated sign generation. The proposed system is evaluated on a Hindi movie subtitle corpus coupled with an off-the-shelf back-translation model and achieves a gloss BLEU-4 score of 47.20. Certified ISL interpreters rated the system-generated signs at an average of 4.33/5, and the generated glosses achieve a BERT precision of 0.7683. In addition, the proposed system achieves state-of-the-art performance among keypoint-based methods on the PHOENIX14T benchmark, with a BLEU-4 score of 10.03.
Natural Language Processing · Computer Vision · Humans and AI · Multidisciplinary Topics and Applications
#AI4G206
Climate Surrogates for Scalable Multi-Agent Reinforcement Learning: A Case Study with CICERO-SCM
Climate policy analysis requires models that capture multi-gas climate effects, but such models are too slow to embed in reinforcement learning loops at scale.
In collaboration with a pan-European public-sector environmental agency, we develop a multi-agent reinforcement learning (MARL) framework that integrates a high-fidelity climate surrogate as the environment transition, enabling regional agents to learn policies under multi-gas dynamics.
We train a recurrent surrogate on 20,000 multi-gas emission pathways to emulate CICERO-SCM.
The surrogate achieves near-simulator accuracy (global-mean temperature RMSE of 0.0004) with 1000x faster one-step inference, and it yields a 100x end-to-end MARL training speed-up.
We show policy agreement with the simulator in tractable settings and propose a replay- and rank-consistency test (Kendall's τ) for assessing policy fidelity when simulator-in-the-loop training is infeasible.
This enables large-scale multi-agent policy experiments while retaining high-fidelity multi-gas climate response.
Agent-based and Multi-agent Systems · Humans and AI · Machine Learning
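The rank-consistency idea can be illustrated as follows. Score a set of candidate policies under the surrogate and under simulator replay, then ask whether both put them in the same order. Kendall's τ is +1 for identical orderings and -1 for fully reversed ones. This minimal Python sketch computes the simple tau-a statistic with no tie correction. How the paper pairs policies and handles ties is not stated. The function name and the return values below are hypothetical. For tie-corrected tau-b, `scipy.stats.kendalltau` is the standard library routine.

```python
from itertools import combinations


def kendall_tau_a(scores_a, scores_b):
    """Kendall's tau-a between two score lists over the same items (no tie correction)."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(scores_a)), 2):
        # A pair is concordant if both score lists order items i and j the same way.
        sign = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(scores_a) * (len(scores_a) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical returns of four candidate policies, scored under each model.
surrogate_returns = [1.0, 2.5, 0.3, 4.1]
simulator_returns = [1.2, 2.2, 0.5, 3.9]
print(kendall_tau_a(surrogate_returns, simulator_returns))  # 1.0: identical ranking
```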
#AI4G207
Addressing Overcommitment in the Reasoning of Gendered Economic Memes Under Multimodal Ambiguity
Multimodal meme understanding is increasingly used to analyze socially sensitive content, yet existing models often exhibit biased behavior when interpreting economic dependence and social roles under ambiguity. Many memes express economic relationships through sparse text or symbolic visual cues, providing insufficient evidence for gendered attribution. In such underspecified settings, models tend to rely on pretraining correlations, leading to hallucinated and stereotypical economic role assignments. In this work, we study gendered economic dependence in image-text memes through the lens of contextual sufficiency and identify epistemic overcommitment (inferring roles without adequate evidence) as a primary source of bias. We propose CGER-Net, a context-grounded multimodal framework that estimates whether the input provides sufficient evidence for gendered economic reasoning. It then applies evidence-gated inference, making confident attributions when cues are explicit and favoring principled abstention otherwise. We evaluate CGER-Net on EconMeme-GE, a curated dataset of image-text memes annotated as Men, Women, Neutral, or Ambiguous. Compared with strong contemporary multimodal baselines, CGER-Net reduces the Gender Overcommitment Rate by up to 44% on ambiguous instances while maintaining comparable accuracy on unambiguous cases. Human evaluation further shows that 79% of generated rationales are judged as epistemically aligned with the available evidence. These results highlight the importance of modeling when not to infer for reliable and responsible multimodal analysis.
AI Ethics, Trust, Fairness · Humans and AI · Multidisciplinary Topics and Applications
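Evidence-gated inference is a form of selective prediction. The model commits to a label only when an evidence score clears a gate, and otherwise it abstains. The following minimal Python sketch illustrates the idea under clearly stated assumptions:

- The sufficiency score, the 0.5 threshold and the function names are hypothetical; CGER-Net's actual gating mechanism is not described in the abstract.
- The overcommitment metric is our assumed reading of "Gender Overcommitment Rate": the share of gold-Ambiguous items that receive a gendered label.

```python
GENDERED = {"Men", "Women"}


def gated_predict(class_probs, sufficiency, threshold=0.5):
    """Return the arg-max label only if the evidence score clears the gate; otherwise abstain."""
    if sufficiency < threshold:
        return "Ambiguous"
    return max(class_probs, key=class_probs.get)


def overcommitment_rate(gold_labels, predictions):
    """Assumed metric: share of gold-Ambiguous items that received a gendered label."""
    ambiguous_preds = [p for g, p in zip(gold_labels, predictions) if g == "Ambiguous"]
    if not ambiguous_preds:
        return 0.0
    return sum(p in GENDERED for p in ambiguous_preds) / len(ambiguous_preds)


# Hypothetical classifier output for one meme.
probs = {"Men": 0.7, "Women": 0.2, "Neutral": 0.1}
print(gated_predict(probs, sufficiency=0.9))  # Men
print(gated_predict(probs, sufficiency=0.2))  # Ambiguous
```

The design point is that the same classifier output can yield a confident label or an abstention, depending only on the separately estimated evidence score.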
#AI4G218
SMaRT: Online Reusable Resource Assignment and an Application to Mediation in the Kenyan Judiciary
Motivated by the problem of assigning mediators to cases in the Kenyan judicial system, we study an online resource allocation problem in which incoming tasks (cases) must be immediately assigned to available, capacity-constrained resources (mediators). The resources differ in quality, which may need to be learned. In addition, each resource can only be assigned to a subset of tasks, and this subset overlaps to varying degrees with the subsets available to other resources. The objective is to maximize task completion while satisfying soft capacity constraints across all resources. The scale of the real-world problem poses substantial challenges: there are over 2000 mediators, and many combinations of geographic locations (87) and case types (12) that each mediator may be qualified to work on. Together, three features make existing scheduling and resource allocation algorithms either inapplicable or inefficient: the unknown quality of new resources (newly onboarded mediators), soft capacity constraints (due to the mandate to assign cases without delay), and a high-dimensional state space. We formalize the problem in a tractable manner, using a quadratic program formulation for assignment and a multi-agent bandit-style framework for learning. We demonstrate the key properties and advantages of our new algorithm, SMaRT (Selecting Mediators that are Right for the Task), compared with baselines on stylized instances of the mediator allocation problem. We then apply it to real-world data on cases and mediators from the Kenyan judiciary. SMaRT outperforms baselines and allows the tradeoff between the strictness of the capacity constraints and overall case resolution rates to be controlled. This holds both when mediator quality is known beforehand and when the problem is bandit-like, with learning part of the problem definition.
On the strength of these results, we plan to run a randomized controlled trial with SMaRT in the judiciary in the near future.
AI Ethics, Trust, Fairness · Agent-based and Multi-agent Systems
#AI4G222
White-Hat Testing for the Ballot Box: A Framework for Election AI Auditing
Recent research shows that conversational AI can shift voter preferences, with effects persisting for weeks. Yet frontier models exhibit a documented "persuasion-reliability tradeoff," producing hallucinated or systematically distorted election information. Despite these risks, election officials lack standardized tools to evaluate AI systems systematically before deployment. We propose CivicAudit-Bench, a stakeholder-guided auditing framework that stress-tests large language models for civic hallucinations, false confidence, jurisdiction-dependent failure, and asymmetric refusals/accuracy. This framework introduces a modular, counterfactual, and severity-aware auditing methodology that integrates roll-call-based alignment modeling, entity-swap probing, and jurisdiction-conditional correctness criteria. Informed by engagement with the U.S. Election Assistance Commission, the toolkit consists of three modules:
(1) PoliBias-US, a multi-indicator alignment screen combining Congressional roll-call ideology scaling with party-cue counterfactual sensitivity, persona robustness, and narrative-framing alignment; (2) HalluBias-Election, an evidence-linked benchmark that measures hallucinations, severity-weighted critical errors, and asymmetries via Entity-Swap Counterfactual Probing and a jurisdiction-safe completion criterion; and (3) Disclosure-Test, pre-registered experiments assessing whether transparency and calibrated-uncertainty disclosures reduce overreliance and attenuate persuasion without blocking legitimate civic information. CivicAudit-Bench outputs versioned audit scorecards and a coordinated white-hat disclosure workflow, advancing UN SDG 16 by strengthening democratic information integrity.
AI Ethics, Trust, Fairness · Multidisciplinary Topics and Applications · Humans and AI · Uncertainty in AI
#AI4G225
STELLAR: Spatio-Temporal Environmental Learning with Latent Alignment and Refinement for Long-Tailed Species Distribution Modeling
Joint Species Distribution Modeling (JSDM) is a key enabler for biodiversity monitoring and conservation planning. However, accurate JSDM faces two coupled challenges: environmental drivers and species distributions are inherently spatio-temporal, while species co-occurrence patterns exhibit complex non-linear community structure and severe long-tail imbalance driven by rare species. Existing approaches often address these factors in isolation, learning from static covariates or neglecting the historical trajectories of dynamic community structure. To overcome these limitations, we propose STELLAR (Spatio-Temporal Environmental Learning with Latent Alignment and Refinement), a novel framework that learns a shared latent space where dynamic habitat context and community structure are optimized jointly. Our approach integrates three complementary components: (1) a Graph-Temporal Encoder that employs graph attention and recurrent units to aggregate spatial neighborhood effects and capture the co-evolving historical dynamics of environmental context and community structure; (2) a Context-Anchored Latent Alignment mechanism that structures the latent space using a label-activated mixture prior and supervised contrastive learning, actively clustering species based on shared environmental preferences; and (3) an Imbalance-Aware Decoupled Decoding module that utilizes Asymmetric Loss to focus learning on hard, rare species samples, preventing mode collapse in the long tail. Experiments on the large-scale eBird dataset, curated with domain experts, demonstrate that our framework significantly outperforms state-of-the-art baselines, particularly in predicting rare species and revealing interpretable species interactions.
Multidisciplinary Topics and Applications · Data Mining · Machine Learning
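Asymmetric Loss treats positive and negative labels differently. It keeps the full log-loss on positives (for example, a species that is present). It down-weights easy negatives with a focusing exponent and a probability margin, so the many absent common species do not swamp the gradient from rare ones. The following minimal Python sketch follows the commonly cited formulation of Ridnik et al. (2021) for one multi-label sample of sigmoid probabilities. The default `gamma_pos`, `gamma_neg` and `margin` values are that formulation's usual defaults and are assumptions here, because STELLAR's hyperparameters are not stated.

```python
import math


def asymmetric_loss(probs, targets, gamma_pos=0.0, gamma_neg=4.0, margin=0.05, eps=1e-8):
    """Asymmetric Loss summed over the labels of one multi-label sample.

    probs   -- per-label sigmoid probabilities (e.g. one entry per species)
    targets -- per-label 0/1 presence indicators
    """
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            # Positive term: focal weighting with gamma_pos (0 gives plain log-loss).
            total -= (1.0 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            # Probability shifting: negatives with p <= margin contribute nothing.
            p_m = max(p - margin, 0.0)
            total -= p_m ** gamma_neg * math.log(max(1.0 - p_m, eps))
    return total


# An easy negative (p = 0.03 < margin) adds zero; only the positive term remains.
print(asymmetric_loss([0.9, 0.03], [1, 0]))
```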
#AI4G226
Quantifying Error Disparities in Population Health Models
Many high-stakes social applications of AI, such as public health surveillance and policy planning, operate at the community level rather than the individual level. However, most model fairness research evaluates disparities at the individual or data level (e.g., documents or images) and relies on metrics defined over discrete demographic categories rather than population-level demographic proportions. In this work, we first introduce the Bilateral Concentration Index (BCI) to quantify nonmonotonic error disparities that are missed by the category-based metrics used at the individual or data level.
We then conduct a large-scale audit of sociodemographic error disparities in both lexical and transformer-based models of county-level health outcomes, over a dataset covering billions of community-mapped messages. While all tasks showed significant disparity, its size varied widely with the outcome and model, from a BCI of 2.1% for predicting life satisfaction to 17.0% for predicting fair or poor health. We further evaluate four approaches for incorporating sociodemographic information as potential bias mitigation strategies. Demographic inclusion consistently improved predictive accuracy, but it frequently amplified error disparities. The largest disparities were associated with education and income (BCI = 2.7–16.4%), often reducing accuracy for low-income (and in some cases high-income) communities. These findings highlight a critical accuracy–fairness trade-off in community-level models for public health tasks. They show how seemingly beneficial modeling choices can increase disparities, which could disadvantage communities if such models are used for policy decisions.
AI Ethics, Trust, Fairness · Natural Language Processing
#AI4G230
LLM-Enhanced Knowledge and Learning Path Understanding for Graph-based Educational Recommendation
Educational recommendation empowers personalized learning by suggesting suitable learning resources to learners, and graph-based recommenders are widely adopted for this task. Existing methods are mainly ID-based: they initialize learners and resources with trainable identifiers and optimize their representations solely from the interaction graph. As a result, the lack of semantic understanding of learning resources and learning paths hinders further improvements in recommendation accuracy. To alleviate this problem, we propose KLU4EduRec, which leverages large language models (LLMs) to understand resource knowledge and learning paths, thereby enhancing traditional graph-based educational recommender systems. Specifically, for learning path understanding, we segment a learning path by detecting learning pattern drift in the resource knowledge sequence, and we prompt LLMs to infer learners' learning patterns within each segment. The segment-level patterns are then chronologically aggregated to represent the overall learning path. In addition, we prompt LLMs to summarize the core knowledge of learning resources from their content as complementary semantic signals. Finally, the resulting semantic representations are aligned and fused with structural representations learned by a graph-based recommender to enable more accurate recommendations. Extensive experiments show that KLU4EduRec greatly outperforms existing methods, including traditional ID-based methods and recent LLM-powered methods. A case study shows how understanding pattern drift in a learning path leads to more suitable recommendations. A reproducibility package is available at https://anonymous.4open.science/r/KLU4EduRec-C08C.
Multidisciplinary Topics and Applications · Knowledge Representation and Reasoning · Data Mining
#AI4G237
Sustainable Intelligence for the Wild: Democratizing Ecological Monitoring via Knowledge-Adaptive Edge Expert Agents
Rapid biodiversity loss underscores the urgency of effective monitoring, yet manual surveys remain resource-intensive. While on-device AI offers a scalable alternative, its performance in the wild is often challenged by environmental variability. Current methods rely heavily on cloud resources, requiring continuous uploading of field data for model retraining. This approach is unsuitable for remote deployments because it strains their limited power and network connectivity. To address these constraints, this research proposes a shift from model adaptation to knowledge adaptation. We introduce an architecture that separates visual perception from reasoning, combining a visual encoder with a dynamic knowledge base. We use an explicit knowledge base instead of implicitly encoding expert knowledge into model parameters. This method also supports knowledge sustainability by preserving expert insights in a structured form. Through cross-disciplinary collaboration with biologists and Indigenous communities, this work advances ethical AI co-development, fostering responsible and culturally informed ecosystem management.
Humans and AI · Knowledge Representation and Reasoning · Multidisciplinary Topics and Applications · Agent-based and Multi-agent Systems
#AI4G253
Temporal Motif-aware Graph Test-time Adaptation for OOD Blockchain Anomaly Detection
Ever-evolving transaction patterns have significantly hindered anomaly detection on emerging cryptocurrency blockchains, owing to the vast number of addresses and diverse anomalous behaviors. Recently, advanced Graph Anomaly Detection (GAD) approaches applied to blockchains have faced two critical challenges: adversarial pattern evolution by malicious actors, and the Out-of-Distribution (OOD) problem caused by varied transaction semantics on blockchains. To address these challenges, we propose a novel framework termed TEmporal Motif-aware Graph Test-Time Adaptation (TEMG-TTA). First, we comprehensively capture the 3-node temporal motif distribution of each active address using an efficient computational mechanism, enabling downstream temporal motif-aware graph learning. Second, we design a simple yet effective test-time adaptation strategy that facilitates the sharing of common patterns between training and testing graphs. Extensive experiments on 5 real-world datasets demonstrate that TEMG-TTA outperforms state-of-the-art GAD approaches by an average of 37.65%. A further case study on interpretable motif patterns shows that TEMG-TTA explicitly characterizes the complex transaction patterns of anomalous addresses, verifying the effectiveness of our technical designs. Our code will be made publicly available at https://anonymous.4open.science/r/TEMG-TTA/.
Data Mining · Machine Learning
#AI4G267
ArogyaSutra: A Multi-Agent Framework for Multimodal Medical Reasoning in Indic Languages
Multimodal Large Language Models (MLLMs) have shown promising reasoning capabilities in general domains, yet their performance remains limited in specialized settings such as healthcare, particularly in multilingual and low-resource scenarios. This gap is critical in regions like rural India, where patients often express complex medical queries in native Indic languages and rely on multimodal inputs such as medical images. Existing MLLMs, trained predominantly on English-centric data, struggle to support such use cases, limiting equitable access to AI-driven healthcare assistance. To address this challenge, we construct ArogyaBodha, a large-scale multilingual multimodal medical question-answer dataset drawn from eight heterogeneous sources. It covers 31 body systems, six imaging modalities, and 21 clinical domains across English and seven major Indian languages. We further propose ArogyaSutra, an actor-critic-based multi-agent framework that combines tool grounding with dual-memory mechanisms to support step-wise, reasoning-aware decision making while explicitly retaining past mistakes to prevent their recurrence. The Actor predicts the answer to the multimodal query from visual and memory states, whereas the Critic evaluates the Actor's outcomes and delivers corrective feedback, enabling iterative refinement of the reasoning process. Experiments show that our dataset and framework improve the multilingual medical reasoning accuracy of an MLLM across all Indic languages, with ablation studies validating the effectiveness of each component. Our work supports UN SDGs 3, 4, and 10 by enabling reliable multilingual medical decision support, reducing healthcare inequities, and strengthening inclusive clinical education for underserved communities.
Natural Language Processing · Agent-based and Multi-agent Systems
#AI4G271
PhysTrans: A Physics-Aware Transferable Framework for Global Cold-Start Photovoltaic Forecasting
With the rapid expansion of photovoltaic (PV) power generation worldwide, PV systems have become central to global energy systems. Accurate PV forecasting is essential for safe grid operation and renewable energy integration. However, most existing models rely heavily on site-specific historical data and perform poorly in the cold-start scenarios of newly built power plants. We propose PhysTrans, a physics-aware transferable framework for cold-start PV forecasting. First, we design a physics-constrained residual network that uses a clear-sky module for better physical consistency. Second, we propose a dynamic cloud cropping method that obtains cloud information over shaded PV stations by fitting the offset induced by the sun's angle. To fuse the asymmetric data, we introduce a query-based asymmetric fusion mechanism that achieves high-precision alignment of multi-modal data. Experiments on global datasets show that PhysTrans outperforms state-of-the-art models with a 13.2% decrease in MAE in the single-site task. It also outperforms existing transfer models with an average decrease in MAE of 12.7% in the cross-site task. Our work advances reliable and transferable PV forecasting for early-stage grid integration and contributes to SDG 7 (Affordable and Clean Energy) and SDG 13 (Climate Action), in line with the Leave No One Behind principle.
Humans and AI · Multidisciplinary Topics and Applications · Machine Learning
#AI4G272
PROB-EMOE: A Probabilistic Ensemble Mixture-of-Experts Framework for Metro Network Expansion Forecasting
Forecasting Origin-Destination (OD) demand for new metro lines is critical for sustainable infrastructure planning but faces spatiotemporal out-of-distribution challenges. Existing models often struggle to capture heterogeneous interaction patterns in changing topologies and overlook inherent uncertainty and over-dispersion. To bridge these gaps, we propose PROB-EMOE, a planning-oriented probabilistic framework tailored to network expansion. To ensure robust generalization, we design a Mixture-of-Experts (MoE) predictor that integrates diverse expert views to capture heterogeneous demand patterns across changing topologies. To quantify extrapolation uncertainty, our framework acts as a unified probabilistic system that combines Deep Ensembles with a probabilistic output, quantifying both data and model uncertainty. Through a systematic investigation of likelihood families, we empirically demonstrate that the Negative Binomial distribution offers the best fit in this context. Extensive experiments on a multi-year Shenzhen metro dataset demonstrate that our approach achieves state-of-the-art predictive performance and provides the sharpest calibrated uncertainty intervals. The framework has been deployed in a metropolitan smart-data platform to support risk-aware investment decisions.
Data Mining · Multidisciplinary Topics and Applications · Uncertainty in AI
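A Negative Binomial likelihood suits over-dispersed OD counts because its variance, μ + μ²/r, can exceed the mean. A Poisson likelihood forces the two to be equal. The following minimal Python sketch gives the per-cell negative log-likelihood under the mean/dispersion parameterization. It assumes the network outputs a positive mean `mu` and dispersion `r` for each OD cell. The abstract does not state which parameterization PROB-EMOE uses or how its heads produce these values, and the example numbers are hypothetical.

```python
import math


def negative_binomial_nll(y, mu, r):
    """Negative log-likelihood of count y under a Negative Binomial with mean mu, dispersion r.

    Variance is mu + mu**2 / r, so small r means strong over-dispersion;
    as r grows large the distribution approaches Poisson(mu).
    """
    if y < 0 or mu <= 0 or r <= 0:
        raise ValueError("need y >= 0, mu > 0, r > 0")
    # log of the binomial coefficient C(y + r - 1, y), written with lgamma for real-valued r
    log_coeff = math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
    log_prob = log_coeff + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu))
    return -log_prob


# Hypothetical OD cell: predicted mean of 3 trips, dispersion 2, observed 7 trips.
print(negative_binomial_nll(7, mu=3.0, r=2.0))
```

In training, this quantity would be averaged over all OD cells and minimized. Under a deep-ensemble setup, each member would produce its own `mu` and `r`.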
