Accepted Papers

IJCAI-ECAI 2026 Accepted Papers · All Tracks

Presentation format

Every accepted paper is presented in two formats: an oral talk — which must be delivered in person in Bremen by one of the authors — and a poster during a dedicated poster session.

953 of 953 shown
  1. #29

    Frequency-Aware Augmentation and Alignment for Time Series Contrastive Learning

    Yusen Liu, Zhichen Lai, Hua Lu, Xu Cheng, Xiufeng Liu, Huan Huo
    Contrastive learning has become a dominant paradigm for learning time series representations from large-scale unlabeled data. However, current methods are often adapted from computer vision and rely on random time-domain augmentations (e.g., jittering and cropping). Such augmentations can unpredictably disrupt the natural frequency structure of signals, leading to representations failing to capture crucial patterns in the data. To address this, we propose a framework of Frequency-Aware Augmentation and Alignment for Time Series Contrastive Learning (FACL), which comprises two key innovations. First, FACL employs a novel frequency-structured augmentation mechanism based on wavelet transforms. The mechanism constructs controlled and interpretable contrastive views by the structured attenuation and recombination of specific wavelet components. Second, FACL introduces a multi-level contrastive objective that incorporates a subspace alignment strategy. This objective explicitly aligns representations within their corresponding frequency subspaces. Experiments across six forecasting and four classification benchmarks show that FACL achieves superior performance compared to recent baselines. Ablation studies and model analysis highlight the contribution of each component in FACL. Furthermore, low-sample semi-supervised learning experiments confirm the robustness and generalization of FACL.
    Data Mining: Mining spatial and/or temporal data; Machine Learning: Self-supervised Learning; Machine Learning: Unsupervised learning
  2. #35

    S³-TIR: Coupled Spatial-Spectral Splatting for Thermal Infrared Novel View Synthesis

    Yaoxing Wang, Yuheng Li, Yan Di, Shan Gao
    We present S³-TIR (Coupled Spatial-Spectral Splatting), a physics-inspired framework for high-fidelity thermal infrared (TIR) novel view synthesis. Existing 3D Gaussian splatting methods struggle to capture the unique duality of TIR imagery: the coexistence of diffusive smoothness from heat conduction and structural sharpness from material discontinuities. To address this issue, we leverage the intrinsic frequency properties of thermal scenes, where conduction creates smooth fields while discontinuities cause sharp signal jumps. Accordingly, S³-TIR couples adaptive spatial support with internal spectral modulation via a unified generalized-exponential Gabor primitive. This design utilizes spatial support as a physical truncation controller, confining high-frequency textures within sharp boundaries while maintaining smooth diffusion elsewhere. Furthermore, we propose a frequency-aware deferred splatting scheme that enforces multi-view frequency consistency, anchoring frequency distributions to the 3D structure rather than view-dependent appearance. Experiments on diverse benchmarks demonstrate that S³-TIR achieves state-of-the-art performance, notably reducing LPIPS by 42.5% and improving PSNR by 3.2% on the RGBT-Scenes dataset, while yielding structurally stable thermal sequences free from jittering artifacts.
    Computer Vision: 3D computer vision; Computer Vision: Scene analysis and understanding
  3. #52

    Understanding and Exploiting Phase Sensitivity for Attacking Large Vision–Language Models

    Daizong Liu, Junhao Dong, Xiang Fang, Hongyang He, Keyan Jin, Zhongliang Guo, Xiaoye Qu, Keke Tang
    Although Large Vision-Language Models (LVLMs) have demonstrated remarkable reasoning capabilities across various downstream multimodal tasks, they have proven vulnerable to carefully designed adversarial examples. Existing LVLM attackers show that exploring external components of adversarial guidance (e.g., forcing adversarial alignment, resembling harmful features) can help improve adversarial effects. However, leveraging the intrinsic patterns of LVLMs to induce adversarial perturbation generation by exploring how LVLMs perceive images has not been deeply studied. Inspired by cognitive science, we make the first attempt to investigate the interference of adversarial perturbations from the perspective of image phase, and find that LVLMs are sensitive to the phase-aware image structure. Motivated by this, we propose a novel LVLM attack method called BadPhase with further backdoor designs, to implant adversarial phase as triggers into any image inputs via data poisoning so as to control the LVLMs’ predictions. A textual trigger and a backdoor perturbation switcher are also introduced to activate the malicious behavior only when both triggers are present. The whole backdoor optimization is implemented at test time to reduce resource reliance. Experiments on four popular LVLMs and three benchmarks demonstrate the effectiveness of our proposed method.
    Computer Vision: Adversarial learning, adversarial attack and defense methods; Computer Vision: Vision, language and reasoning
  4. #56

    Can Edge Addition Be Safe and Effective? Adjacency-Centered Augmentation via Langevin and SDE Diffusion for Self-Supervised Graph Anomaly Detection

    Haokai Gao, Menghua Jiang, Xuantao Yang, Jiale Liu, Yubin Li, Yuncheng Jiang
    Edge addition is commonly considered risky in Graph Anomaly Detection (GAD), as random edge addition may induce anomaly–normal connectivity. Consequently, most existing augmentation strategies focus on feature perturbation, edge removal, or subgraph sampling, leaving edge addition largely unexplored. In contrast, we empirically find that moderate and targeted structural completion among normal nodes consistently improves GAD performance, revealing guided edge addition as an overlooked yet effective augmentation dimension. Motivated by this observation, we introduce two adjacency-centered edge generation strategies with complementary mechanisms. One performs a training-free structural completion scheme via spectrum-aware Langevin dynamics, enriching graph connectivity while preserving node features. The other models the joint evolution of node features and graph structure through a stochastic differential equation–based diffusion process, producing structurally coherent and anomaly-aware complementary graphs. Extensive experiments on 12 benchmark datasets with 7 state-of-the-art GAD models demonstrate consistent and substantial improvements in both AUROC and AUPRC. Code and appendices are available at https://github.com/GaoHaokai222/LangGen-JointSDE
    Data Mining: Anomaly/outlier detection; Data Mining: Mining graphs; Machine Learning: Self-supervised Learning
  5. #82

    JointScaler: A Hierarchical Multi-Indicator Distribution Forecasting Approach for Uncertainty-Aware Joint Scaling in Cloud Services

    Yang Luo, Zhemeng Yu, Yikang Fu, Wei Lu, Lintao Ma, Xiaofeng Gao, Guihai Chen
    Proactive scaling improves cloud resource efficiency by forecasting system-relevant indicators and dynamically provisioning resources to maximize utilization while satisfying quality requirements. Existing approaches forecast service indicators in isolation, ignore forecasting uncertainty, and scale resource types independently, violating bundled resource constraints and degrading service quality. Therefore, we propose JointScaler, a learning-based framework for multi-indicator distribution forecasting and uncertainty-aware scaling. It captures inter-indicator dependencies via hierarchical attention, models dynamic uncertainties with normalizing flows, and leverages full predictive distributions to optimize bundled resource allocations under quality requirements. Evaluated on 4 real-world datasets, JointScaler improves point and distribution forecasting accuracy by 5.87% and 14.74%, outperforming 12 advanced baselines. In a week-long A/B test on a payment application’s cloud platform, it reduced GPU and CPU usage by 2,400+ and 37,000+ hours with stable service quality, delivering significant economic benefits.
    Data Mining: Applications; Data Mining: Mining spatial and/or temporal data; Data Mining: Parallel, distributed and cloud-based high performance mining
  6. #108

    Music Atelier: Exploring the Knowing–Doing Gap in LLM Creativity via Symbolic Music Composition

    Zhejing Hu, Yan Liu, Zhi Zhang, Sean Fontaine, Gong Chen
    The knowing–doing gap, the mismatch between ideas articulated during model reasoning and the realized creative artifact, remains a fundamental challenge in creative AI and persists in LLM-based artistic creation. This paper responds to this gap by introducing a systematic and interpretable framework for examining how user intent is articulated during model reasoning and selectively realized, or lost, during action in LLMs, using symbolic music composition as an analytical lens. We present a realizational process theory that formalizes creative generation and localizes the knowing–doing gap at the realization stage. We instantiate this theory with Music Atelier (Mutelier), an LLM-as-a-Judge framework that operationalizes idea-level realization analysis and makes the gap observable and analyzable in practice. Across diverse evaluation settings, we show that Mutelier reveals reliable, previously invisible failure modes in which intent-aligned ideas are articulated during reasoning but fail to materialize in the final artifact. By reframing artistic creation as a realizational process, this work provides a principled process-level foundation for understanding how knowing–doing gaps emerge in machine creativity.
    Multidisciplinary Topics and Applications: Arts and creativity
  7. #119

    Variable-Oriented Adaptive Singleton Consistency

    Yaling Wu, Hongbo Li, Minghao Yin
    Singleton consistency significantly reduces the search space of backtracking search for solving constraint satisfaction problems (CSP). However, it is too expensive to be used throughout the search, and it has not been successfully applied in solving general CSPs. In this paper, we propose a variable-oriented adaptive singleton consistency, namely VOASC, that could be applied in general-purpose constraint solvers. It calculates the filtering efficiency of local consistency enforced by the default propagation engine and the singleton-based consistency for each variable, and then selects an appropriate consistency to propagate the decisions of the variable in subsequent searches. We performed extensive experimentation with the MiniZinc benchmark suite. VOASC demonstrates superior solving performance on both CSP and COP, compared to the two candidate local consistencies and several existing adaptive propagation methods.
    Constraint Satisfaction and Optimization: Constraint programming; Constraint Satisfaction and Optimization: Constraint satisfaction
  8. #128

    SPACE: Structure-Preserving Cross-Modal Image Enhancement for Extreme Low-Light Conditions

    Yue Zhang, Zhiliang Wu, Yuxuan Hou, Hehe Fan
    Low-Light Image Enhancement (LLIE) remains challenging due to severe noise, color artifacts, and structural degradation in extreme low-light environments. Existing methods primarily rely on photometric cues and often fail to preserve geometric consistency, resulting in blurred edges and distorted textures. To address this limitation, we introduce LAMP, a large-scale dataset of Low-light Aligned Multimodal Pairs for LLIE. LAMP contains over 18k high-quality aligned pairs and consists of two complementary subsets. LAMP-Real is captured using LiDAR-based depth sensors under physically controlled illumination, while LAMP-Synthetic is generated through a physics-calibrated degradation pipeline. Both subsets provide precisely aligned RGB images, depth maps, and pixel-wise semantic annotations. Building upon LAMP, we propose SPACE, a Structure Preservation Aware Cross-modal Enhancement framework that explicitly leverages geometric priors. SPACE introduces a Depth-Adaptive HVI Transformation to decouple luminance and chrominance under depth guidance, effectively suppressing color-space noise. Furthermore, a Depth-Manifold Modulated Attention mechanism constrains feature interactions within a learned depth manifold, ensuring structural coherence during enhancement. Extensive experiments demonstrate that SPACE consistently outperforms state-of-the-art methods in visual quality and structural fidelity. The code and data are available at https://github.com/YueCheong/SPACE.
    Computer Vision: Low-level Vision; Computer Vision: Multimodal learning; AI: Computer Vision
  9. #138

    The Price of Proportional Representation in Temporal Voting

    Nicholas Teh
    We study proportional representation in the temporal voting model, where collective decisions are made repeatedly over time over a fixed horizon. Prior work has extensively investigated how proportional representation axioms from multiwinner voting (e.g., justified representation (JR) and its variants) can be adapted, satisfied, and verified in this setting. However, much less is understood about their interaction with social welfare. In this work, we quantify the efficiency cost of enforcing proportionality. We formalize the welfare-proportionality tension via the worst-case ratio between the maximum achievable utilitarian welfare and the maximum welfare attainable subject to a proportionality axiom. We show that imposing proportional representation in the temporal setting can incur a growing, yet sublinear, welfare loss as the number of voters or rounds increases. We further identify a clean separation among axioms: for JR, the welfare loss diminishes as the time horizon grows and vanishes asymptotically, whereas for stronger axioms this conflict persists even with many rounds. Moreover, we prove that welfare maximization under each axiom is NP-complete and APX-hard, even under static preferences and bounded-degree approvals, and provide fixed-parameter algorithms under several natural structural parameters.
    Game Theory and Economic Paradigms: Computational social choice; Multidisciplinary Topics and Applications: Economics
  10. #157

    Debate with Myself: Zero-Shot Event Causality Identification with Adversarial Evidence Integration via Large Language Models

    Zefan Zeng, Yuehang Si, Xingchen Hu, Qing Cheng, Jiakun Liu, Zhong Liu
    Event Causality Identification (ECI) is a crucial task in knowledge discovery that extracts structured causal relationships between annotated event mentions from unstructured text. However, existing approaches typically rely on extensive labeled data, which is scarce for specialized domains and topics. Although Large Language Models (LLMs) show strong promise for few-shot and zero-shot information extraction, they are prone to “causal hallucination,” generating unreliable and spurious causal links. To address these limitations, we propose LLM-SD (Large Language Model Self-Debate), a novel framework that formulates ECI as a structured debate among multiple identical instances of a single LLM. Causality is determined through the integration of adversarial evidence. The framework employs LLMs in distinct roles: an affirmative team argues for the existence of causality, a negative team argues against it, and an adjudication committee evaluates the evidence for determination. An evidence strength grading rule guides the quantification and integration of adversarial evidence. The automatic and LLM-driven verification finally produce a reasoned verdict for event causality. LLM-SD reduces spurious causal links resulting from causal hallucination in LLMs and identifies more long-distance causalities by promoting a balanced evaluation of arguments. Extensive experiments on three benchmark datasets demonstrate that LLM-SD achieves state-of-the-art performance in a zero-shot setting.
    Natural Language Processing: Information extraction
  11. #165

    TriHAI: A Combination Enhanced Tri-Mode Deferral for Human-AI Team

    Minhui Zhang, Xuehan Zhao, Xin Zhang, Jiaqi Liu, Zhiwen Yu, Bin Guo
    Learning to Defer operates by deferring AI-uncertain samples to humans, enabling the system to outperform either alone. Some works extend it by introducing human-AI combination to deferral. However, they remain limited to binary choices. Given that AI, humans, and human-AI combination are each indispensable, we extend the binary deferral paradigm to a ternary one. There are two challenges: i) how to effectively select among the three modes, especially distinguishing humans from combination? ii) how to design an enhanced combination without being degraded by unreliable model predictions? To address the challenges, we propose TriHAI, a tri-mode deferral method that routes samples to suitable modes based on normalized confidence scores obtained through a discriminative gating network, and enhances combination mode by fusing model predictions refined by human-guided Conformal Prediction and human predictions via Bayesian theory. Experiments on three datasets show that TriHAI surpasses other human-AI baselines by up to 9.57% in accuracy.
    Humans and AI: Human-AI collaboration
  12. #168

    PMCE: Probabilistic Multi-Granularity Semantics with Caption-Guided Enhancement for Few-Shot Learning

    Jiaying Wu, Can Gao, Jinglu Hu, Hui Li, Xiaofeng Cao, Jingcai Guo
    Few-shot learning aims to recognize novel categories from limited labeled samples, where prototypes estimated from 1–5 supports per class are often unreliable. Semantic-based approaches alleviate this by introducing class-level priors, but they often ignore instance-level cues and rarely optimize queries under the inductive protocol. We propose PMCE, a Probabilistic framework that leverages Multi-granularity semantics with Caption-guided Enhancement for few-shot classification. On base classes, we build a knowledge bank with class-wise visual statistics and class-name embeddings. At test time, the semantic embedding of a novel class retrieves a few similar base classes whose visual priors are aggregated into a class-specific prior and combined with the support-based prototype via MAP estimation. Simultaneously, we train a lightweight enhancer on base classes that fuses frozen BLIP captions with visual features, and apply it to both supports and queries without using query-set statistics or novel labels. A simple caption-consistency regularizer further improves robustness to noisy captions. Experiments on four standard benchmarks with ResNet-12 and Swin-T backbones show that PMCE outperforms state-of-the-art few-shot baselines, achieving up to 7.71% gains over the strongest competitor on MiniImageNet in the challenging 1-shot setting. Our code is available at https://github.com/channa419/PMCE.
    Computer Vision: Transfer, low-shot, semi- and un- supervised learning; Machine Learning: Cost-sensitive learning; Machine Learning: Foundation models; Machine Learning: Learning sparse models; Machine Learning: Open-World/Open-Set/OOD Learning
  13. #197

    Guard4D: Robust Watermarking for 4D Gaussian Splatting via Decoupled Decoding

    Yingying Shi, Zhong Zhou, Bin Zhou
    4D Gaussian Splatting (4DGS) enables high-quality, real-time rendering of dynamic scenes and is becoming a popular format for sharing 4D assets. This trend calls for robust copyright watermarking that can be verified from rendered videos without access to the original scene or model parameters. However, watermarking 4DGS is challenging: verification relies on short multi-view clips whose frames vary with motion, viewpoint changes, and post-processing. More importantly, decoding from rendered frames is easily dominated by scene content (objects, textures, illumination) rather than subtle watermark traces, leading to unstable evidence across frames and poor generalization across scenes. We propose Guard4D, a decoupled watermarking framework for 4DGS. Guard4D pre-trains a general-purpose message decoder under text supervision and embeds a binary message by optimizing compact spherical-harmonic offsets while keeping geometry and opacity fixed. To improve extraction from short clips and suppress semantic interference, we introduce a temporal modeling and semantic decoupling (TMSD) module that temporally aggregates watermark information and uses clean and noise-perturbed control clips generated from the non-watermarked model to reduce the influence of semantics on watermark decoding. Extensive experiments on dynamic 4DGS scenes demonstrate the effectiveness of Guard4D. Code is available at https://github.com/shisyy/Guard4D.
    Computer Vision: 3D computer vision; Computer Vision: Image and video synthesis and generation; Computer Vision: Machine learning for vision
  14. #210

    Complementary Branching for #SAT: Bounded-Degree Bounds and Faster #3-SAT

    Konstantin Kutzkov
    We present complementary branching, a new branching approach for model counting based on the complement counting paradigm. Instead of branching on individual variables, complementary branching decomposes the counting problem for arbitrary CNF formulas into several subformulas, which naturally augments the classic DPLL branching. We show two novel results in which the main tool is complementary branching. First, we design new #SAT algorithms for sparse CNF formulas running in time O*(2^(alpha n)), for some constant alpha < 1. As a second application, we improve the best known deterministic upper bound for #3-SAT to O*(1.637^n) by using a version of clause learning based on complementary branching. These results demonstrate that complementary branching is a powerful tool for designing faster exact algorithms for propositional model counting.
    Constraint Satisfaction and Optimization: Satisfiability; Constraint Satisfaction and Optimization: Solvers and tools
  15. #221

    Distilling and Scaling Hierarchical Vision Transformer to 30B Parameters

    Shuguang Dou, Dongqi Li, Hao Jiang, Dongsheng Jiang
    Recent efforts have successfully scaled plain Vision Transformers (ViTs) to unprecedented sizes, ranging from 6B to 22B parameters. In contrast, their hierarchical counterparts have remained largely constrained to less than 1B parameters. To bridge this gap, we propose EHV, an efficient hierarchical ViT architecture. Our dense model scales from 200M to 5B parameters, and we further extend it into a Sparse Mixture-of-Experts (SMoE) variant, achieving an industry-leading scale of 30B parameters. The unsupervised pretraining process consists of two stages: first, we pretrain on ImageNet-21K using a Masked Autoencoder (MAE); second, the resulting model is distilled from multiple state-of-the-art foundation models on a nearly 27M-image dataset. With only 6.7B active parameters, EHV-5B-MoE demonstrates exceptional transfer learning performance across image classification, fine-grained classification, as well as video and dense prediction tasks, notably achieving a linear evaluation accuracy of 89.0% on ImageNet-1K. This result surpasses those of comparable models such as EVA-CLIP-18B and DINOv3-7B, indicating that its SMoE architecture learns high-quality, generalizable, and linearly separable feature representations.
    Computer Vision: Representation learning; Computer Vision: Transfer, low-shot, semi- and un- supervised learning
  16. #228

    Deterministic Implementation in Single-Item Auctions

    Yan Liu, Zeyu Ren, Pingzhong Tang, Zihe Wang, Yulong Zeng, Jie Zhang
    Deterministic auctions are attractive in practice due to their transparency, simplicity, and ease of implementation, motivating a sharper understanding of when they can attain the same outcomes as randomized mechanisms. We study deterministic implementation in single-item auctions under two notions of outcomes: (revenue, welfare) pairs and interim allocations. For (revenue, welfare) pairs, we show a separation in discrete settings: there exists a pair implementable by a deterministic Bayesian incentive-compatible (BIC) auction but not by any deterministic dominant-strategy incentive-compatible (DSIC) auction. For continuous atomless priors, we identify conditions under which deterministic DSIC auctions are equivalent to randomized BIC auctions in terms of achievable outcomes. For interim allocations, under a strict monotonicity condition, we establish a deterministic analogue of Border's theorem for two bidders, providing a necessary and sufficient condition for deterministic DSIC implementability. Using this characterization, we exhibit an interim allocation implementable by a randomized BIC auction but not by any deterministic DSIC auction.
    Game Theory and Economic Paradigms: Auctions and market-based systems; Game Theory and Economic Paradigms: Mechanism design
  17. #231

    ITBoost: Information-Theoretic Trust for Robust Boosting

    Ye Su, Longlong Zhao, Diego García-Gil, Jipeng Guo, Gangchun Zhang, Jinxin Chen, Jinsong Chen
    Gradient boosting remains a strong and widely used method for tabular data learning, but its performance often degrades when training labels are noisy. This behavior is largely related to the way boosting algorithms emphasize samples with large gradients, without explicitly accounting for whether such errors originate from informative hard cases or from unreliable labels. We address this issue by reconsidering how sample reliability is evaluated during boosting. Instead of relying on instantaneous error, we examine the evolution of each sample’s residuals across iterations. Based on this insight, we propose Information-Theoretic Trust Boosting (ITBoost), which uses the Minimum Description Length principle to measure the complexity of residual trajectories. Samples whose residual patterns fluctuate in an irregular manner are treated as less trustworthy and are down-weighted during learning. Theoretically, we derive a tighter generalization bound for ITBoost under label noise. Empirical results on various tabular benchmarks indicate that ITBoost provides improved robustness in noisy environments over leading boosting and deep tabular models, while retaining best average performance on clean data.
    Machine Learning: Classification; Machine Learning: Ensemble methods; Machine Learning: Robustness; Machine Learning: Supervised Learning
  18. #236

    Interaction Effects in Hybrid Compression of Small Language Models

    Iheb Bouriel, Qassim Nasir, Manar Abu Talib
    We investigate the interaction between quantization and pruning in the compression of open-weight small language models (SLMs), as these techniques are frequently combined in practice without a clear understanding of whether their effects are antagonistic, synergistic or additive. Prior work has reported ordering effects and studied pruning or quantization in isolation, but rarely quantifies their interaction across models and evaluation metrics. We introduce an interaction coefficient that isolates non-additive effects and apply it to Falcon3-1B-Base and LLaMA-3.2-1B, with a limited study on Qwen2.5-1.5B. Our experiments cover four hybrid application orders, multiple pruning sparsities and 4/8-bit quantization. We find non-monotonic interactions: memory savings are dominated by 4-bit quantization, while pruning alone provides limited benefits. Hybrids exhibit the greatest effects at global unstructured pruning sparsity s ∈ {0.2, 0.3} while getting close to additivity at s=0.5. At low sparsity, truthfulness improves, while reasoning losses diminish at higher sparsity. Future work will extend interaction analysis to hardware-aware sparse kernels and larger models.
    AI: Natural Language Processing; Natural Language Processing: Language models; Machine Learning: Optimization; Machine Learning: Benchmarks; Natural Language Processing: Interpretability and analysis of models for NLP
  19. #259

    TrajAR: Long-Term Trajectory Prediction at Urban Intersections via Multi-scale Interaction Perception

    Letian Gong, Yan Lin, Xinyue Zhang, Yiwei Shuang, Guanyu Yao, Cheng Long, Shengnan Guo, Youfang Lin, Huaiyu Wan
    Accurate trajectory prediction of multiple road users at urban intersections (including motorized and nonmotorized vehicles and pedestrians) is critical for cooperative vehicle-infrastructure systems and intelligent transportation systems. This study focuses on multiple road users' trajectory prediction in urban intersections. Such predictions face two key challenges: modeling interactions among multiple road users in complex scenarios and achieving accurate long-term prediction. Existing solutions inadequately incorporate these interactions and suffer from error accumulation or overall bias when performing long-term prediction. To overcome these challenges, we propose a long-term Trajectory prediction multi-scale AutoRegressive framework (TrajAR), which follows an encoder-decoder structure. The encoder dynamically perceives road users' motion trends, types, signals, and implicit interactions for effective handling of complex scenarios. The decoder employs a multi-scale prediction strategy that progressively refines the predicted trajectory from coarse to fine scales, where coarse-scale predictions establish long-term path backbones and finer scales enhance precision. These designs allow TrajAR to achieve accurate long-term prediction in real-world scenarios, demonstrated through experiments on four real-world urban datasets from different intersections. Our results show an average improvement of 23.8%. We provide the code of TrajAR at https://github.com/LetianGong/TrajAR.
    Data Mining: Mining spatial and/or temporal data
  20. #273

    Bridging LLMs and SAT Solving: Automated Evolution of High-Performance Heuristics

    Mao Luo, Hang Ding, Chu-Min Li, Junjie Zhang, Zhiwei Ye, Zhipeng Lü, Caiquan Xiong, Xinyun Wu
    Despite decades of intensive research and optimization, modern Boolean Satisfiability (SAT) solvers have reached a plateau where significant performance gains are increasingly difficult to achieve. While Large Language Models (LLMs) have demonstrated remarkable capabilities in pattern recognition and code generation for combinatorial optimization, their direct application to highly optimized SAT solvers remains a formidable challenge due to the extreme complexity and sensitivity of solver heuristics. In this paper, we introduce AESAT (Auto-Evolving SAT solving), a novel neuro-symbolic framework designed to automatically evolve and optimize the heuristic functions of SAT solvers. AESAT employs a memetic-inspired approach, synergizing LLM-guided individual optimization with evolutionary exploration. By leveraging self-optimized prompting techniques, our framework enables LLMs to iteratively discover and refine sophisticated heuristics that bypass the limitations of human-engineered designs. The efficacy of AESAT is demonstrated by its flagship derivative, AE-Kissat-MAB, which won the main track of the 2025 International SAT Competition by a wide margin. This result represents the first time an LLM-enhanced solver has dominated the world's premier SAT competition, marking a paradigm shift in the automated design of reasoning algorithms.
    Constraint Satisfaction and Optimization: Constraint satisfaction; Constraint Satisfaction and Optimization: Satisfiability; Constraint Satisfaction and Optimization: Solvers and tools
  21. #275

    Mitigating Dynamic Graph Distribution Shifts via Spectral Augmentation

    Qianyu Song, Chao Li, Zhongying Zhao, Hua Duan, Qingtian Zeng
    Dynamic Graph Neural Networks (DyGNNs) face challenges from distribution shifts between training and test data that are similar but not identical. Existing DyGNNs for out-of-distribution scenarios primarily focus on discovering invariant patterns in the spatial domain, overlooking the impact of such shifts on structural properties in the spectral domain. In this paper, we propose a spectral-based graph augmentation framework designed to investigate and improve generalization behavior in dynamic graphs under distribution shifts. Specifically, we augment the input graph spectra into a mixture of shift components by maximizing the variance of spectral distance and propose an efficient approximation to reduce the computational cost brought by eigen-decomposition. Building on this, we develop a multi-encoder architecture in which each encoder targets a specific spectral shift component to generate referential representations. Our method adopts a new learning objective to encourage the model to rely on robust spectral properties for better generalization under distribution shifts. Extensive experiments on both real-world and synthetic datasets demonstrate that our proposed method significantly outperforms state-of-the-art approaches in node classification and link prediction tasks. Source code is available at https://github.com/SSQiana/DSPA.
    Data MiningMining graphsData MiningMining spatial and/or temporal data
  22. #280

    D2ACE: Multi-Label Batch Selection Guided by Dual Dynamics and Adaptive Correlation Enhancement

    Bin Liu, Haoyu Peng, Zhijia Wei, Jiajing Zhang, Grigorios Tsoumakas
    Batch selection is crucial for improving both training efficiency and predictive performance in deep multi-label classification (MLC). Existing batch selection methods typically rely on a single metric to assess instance importance and use static label weights to distinguish label significance, neglecting the dynamic evolution of metric utility and label significance during training. In addition, the method that explicitly exploits label correlations is largely affected by abundant irrelevant labels and insensitive to local label distributions. To address these issues, we propose D2ACE, a novel multi-label batch selection method guided by Dual Dynamics and Adaptive Correlation Enhancement. D2ACE explicitly captures metric and label-level training dynamics by combining stage-wise Bernoulli mixture sampling, which balances uncertainty and noise-resistant hardness, with dynamic label weighting to recalibrate label priorities at each epoch based on current metric statistics. Furthermore, D2ACE introduces a local context-aware correlation enhancement to focus on relevant labels with instance-adaptive dependencies. Extensive experiments on tabular and image benchmarks demonstrate that D2ACE outperforms existing batch selection approaches across various deep MLC models, achieving stronger predictive performance and more efficient correlation modeling.
    Machine LearningDeep learning architecturesMachine LearningMulti-label learning
  23. #307

    Robust Federated Hyperspectral Image Clustering

    Xiang Yang, Zhengzhong Zhu, Dayu Hu, Xiaowen Ma, Zihao Li, Wenxuan Tu, Taichun Zhou, Wenxin Zhang, Renxiang Guan
    Hyperspectral image (HSI) clustering facilitates the unsupervised discrimination of complex surface materials but traditionally relies on the idealized assumption of centralized data availability. In real-world scenarios, however, this assumption clashes with data privacy regulations and the physical distribution of data across isolated silos. While Federated Learning offers a decentralized solution, existing frameworks in remote sensing are predominantly confined to supervised paradigms and struggle to address the heavy reliance on annotations and Non-IID distributions inherent among clients. To overcome these limitations, we propose a novel framework named Robust Federated HSI Clustering (RFHC) that enables collaborative unsupervised learning without requiring raw data exchange. Specifically, we design a dual-encoder architecture that incorporates a Federated Model Weight Augmentation (FMWA) strategy, which generates consistent views through local-global network interactions to mitigate the spectral distortion introduced by traditional data augmentation. Furthermore, we develop a hybrid optimization objective that synergizes prototype relationship optimization with contrastive learning. This mechanism utilizes global prototypes to guide local training, effectively stabilizing feature learning against data heterogeneity while ensuring intra-cluster compactness. Extensive experiments on three benchmark HSI datasets demonstrate the effectiveness and superiority of the proposed method against state-of-the-art (SOTA) federated approaches.
    Machine LearningClustering
  24. #312

    Constant-Memory Strategies in Stochastic Games: A Theoretical and Empirical Study

    Fengming Zhu, Fangzhen Lin
    Stochastic games have become a prevalent framework for studying long-term multi-agent interactions, especially in the context of multi-agent reinforcement learning.
    In this work, we comprehensively investigate the concept of constant-memory strategies in stochastic games.
    We first establish some results on best responses and Nash equilibria for behavioral constant-memory strategies, followed by a discussion on the computational hardness of best responding to mixed constant-memory strategies.
    These theoretical insights are later verified on several sequential decision-making testbeds, including the Iterated Prisoner's Dilemma, the Iterated Traveler's Dilemma, and the Pursuit domain.
    This work aims to enhance the understanding of theoretical issues in single-agent planning under multi-agent systems, and uncover the connection between decision models in single-agent and multi-agent contexts.
    The codebase and the full version of this paper are available at github.com/Fernadoo/Const-Mem.
    Agent-based and Multi-agent SystemsAgent theories and modelsGame Theory and Economic ParadigmsNoncooperative gamesMachine LearningPartially observable reinforcement learning and POMDPsPlanning and SchedulingPlanning with Incomplete Information
  25. #337

    Visual Implicit Geometry Transformer for Autonomous Driving

    Arsenii Shirokov, Mikhail Kuznetsov, Danila Stepochkin, Egor Evdokimov, Daniil Glazkov, Nikolay Patakin, Anton Konushin, Dmitry Senushkin
    We introduce the Visual Implicit Geometry Transformer (ViGT), an autonomous driving geometric model that estimates continuous 3D occupancy fields from surround-view camera rigs. ViGT represents a step towards foundational geometric models for autonomous driving, prioritizing scalability, architectural simplicity, and generalization across diverse sensor configurations. Our approach achieves this through a calibration-free architecture, enabling a single model to adapt to different sensor setups.
    Unlike general-purpose geometric foundational models that focus on pixel-aligned predictions, ViGT estimates a continuous 3D occupancy field in bird’s-eye view (BEV), addressing domain-specific requirements.
    ViGT naturally infers geometry from multiple camera views into a single metric coordinate frame, providing a common representation for multiple geometric tasks.
    Unlike most existing occupancy models, we adopt a self-supervised training procedure that leverages synchronized image-LiDAR pairs, eliminating the need for costly manual annotations.
    We validate the scalability and generalizability of our approach by training our model on a mixture of five large-scale autonomous driving datasets (NuScenes, Waymo, NuPlan, ONCE, and Argoverse) and achieving state-of-the-art performance on the pointmap estimation task, with the best average rank across all evaluated baselines.
    We further evaluate ViGT on the Occ3D-nuScenes benchmark, where ViGT achieves comparable performance with supervised methods.
    Computer Vision3D computer visionComputer VisionScene analysis and understanding
  26. #338

    Unified Sensor Simulation for Autonomous Driving

    Nikolay Patakin, Arsenii Shirokov, Anton Konushin, Dmitry Senushkin
    In this work, we introduce XSIM, a sensor simulation framework for autonomous driving. XSIM extends 3DGUT splatting with a generalized rolling-shutter modeling tailored for autonomous driving applications. Our framework provides a unified and flexible formulation for appearance and geometric sensor modeling, enabling rendering of complex sensor distortions in dynamic environments.
    We identify spherical cameras, such as LiDARs, as a critical edge case for existing 3DGUT splatting due to cyclic projection and time discontinuities at azimuth boundaries leading to incorrect particle projection.
    To address this issue, we propose a phase modeling mechanism that explicitly accounts for temporal and shape discontinuities of Gaussians projected by the Unscented Transform at azimuth borders.
    In addition, we introduce an extended 3D Gaussian representation that incorporates two distinct opacity parameters to resolve mismatches between geometry and color distributions.
    As a result, our framework provides enhanced scene representations with improved geometric consistency and photorealistic appearance. We evaluate our framework extensively on multiple autonomous driving datasets, including Waymo Open Dataset, Argoverse 2, and PandaSet. Our framework consistently outperforms strong recent baselines and achieves state-of-the-art performance across all datasets.
    Computer Vision3D computer visionComputer VisionEmbodied vision: Active agents, simulationComputer VisionImage and video synthesis and generation
  27. #339

    G-VTM: A Multimodal Vision-Trajectory Model for Generalized Vehicle Trajectory Prediction

    Xinyue Zhang, Letian Gong, Yan Lin, Jinjun Cheng, Junlin Zhang, Guanyu Yao, Shengnan Guo, Youfang Lin, Shaojiang Wang, Huaiyu Wan
    Generalized vehicle trajectory prediction across diverse junctions, including urban intersections and roundabouts, remains a fundamental task in Cooperative Vehicle–Infrastructure Systems (CVIS). This study faces two key challenges: (1) generalization across junctions with heterogeneous map semantics and traffic behavioral patterns, where the former arises from differences in road topologies and traffic regulations, and the latter reflects diverse behavioral intentions of road users; (2) scenario-adaptive interaction modeling, where single-modality trajectory learning captures local spatio-temporal correlations but lacks map constraints and direction-aware interaction contexts. To overcome these challenges, we propose G-VTM, a generalized vision-trajectory model. G-VTM models fine-grained behavioral patterns and relative spatial interaction from the trajectory modality. In the vision modality, G-VTM captures global map semantics while modeling scenario- and direction-aware interaction based on intuitive visual perception. Experiments on multiple real-world datasets collected by unmanned aerial vehicles (UAVs) demonstrate that our method achieves strong generalized performance under heterogeneous traffic conditions. The code is provided at https://github.com/zxyhaclyon/G-VTM.
    Data MiningMining spatial and/or temporal data
  28. #353

    Representation-Aware Modularity: Efficient Cross-Task Generalization for LLMs

    Zheng Gong, Ying Sun, Chao Wang, Xiaohui Huo, Ping Li, Yi Zheng, Zhefeng Wang
    Cross-task generalization (CTG) enables large language models (LLMs) to handle unseen tasks proficiently, enhancing their adaptability in real-world scenarios. However, existing methods relying on per-token dynamic routing to multiple trained LoRA adapters face high computational and GPU memory costs. Recent Representation Fine-Tuning (ReFT) enhances efficiency for single-task adaptation by editing only prefix and suffix token representations. However, the semantic ambiguity of tokens and absence of a self-guided mechanism for parameter selection in unseen tasks limits their application to CTG. To this end, we propose RaMod, a Representation-Aware Modularity framework to extend the ReFT paradigm to CTG through two novel components: (i) Dual-Modular Representation & Parameter Fine-tuning, which manipulates only a strategically chosen subset of hidden representations with modular interventions to guide the model toward solving unseen tasks; and (ii) Asynchronous Orchestrator, which proactively allocates and releases GPU memory for selected interventions, thereby minimizing storage overhead. Extensive experiments demonstrate that RaMod not only achieves superior CTG performance but also substantially reduces the overhead of the latest CTG baseline, achieving 83%, 100%, and 79% reduction in its additional prefill time, generation delays, and memory consumption relative to original LLMs.
    Natural Language ProcessingLanguage generationNatural Language ProcessingLanguage models
  29. #356

    UniPocket: Physics-Aware Geometric Graph Learning with Manifold Completeness for Ligand-Specific Binding Site Prediction

    Kangxin Chen, Jieyu Zhao, Jinli Hu, Min Xie
    Predicting ligand binding sites on protein surfaces requires capturing complex local geometries and satisfying physical constraints. Existing voxel-based methods suffer from high computational costs and rotation sensitivity, while standard point-cloud GNNs often lack geometric completeness—failing to distinguish chiral structures or subtle topological variations. In this work, we propose UniPocket, a novel E(3)-equivariant surface graph neural network. Inspired by recent advances in efficient geometric completeness, UniPocket constructs a Manifold-Aware Surface Encoder that utilizes local surface normals as virtual reference frames to capture complete geometric invariants without expensive high-order tensor products. Furthermore, we introduce a Ligand-Gated Message Passing mechanism to condition the surface features on the chemical semantics of the target ligand, and a Physics-Aware Vector Rejection Module that enforces steric constraints via orthogonal vector decomposition. Experimental results on standard benchmarks (PDBbind, COACH420, Holo4k) demonstrate that UniPocket achieves state-of-the-art performance, validating that geometric completeness and physical inductive biases are key to precise binding site detection.
    Multidisciplinary Topics and ApplicationsBioinformatics
  30. #371

    Rethinking Multimodal Few-Shot 3D Point Cloud Segmentation: From Fused Refinement to Decoupled Arbitration

    Wentao Bian, Fenglei Xu
    In this paper, we revisit multimodal few-shot 3D point cloud semantic segmentation (FS-PCS), identifying a conflict in "Fuse-then-Refine" paradigms: the "Plasticity-Stability Dilemma." In addition, Contrastive Language-Image Pre-training (CLIP)'s inter-class confusion can result in semantic blindness. To address these issues, we present the Decoupled-experts Arbitration Few-Shot SegNet (DA-FSS), a model that effectively distinguishes between semantic and geometric paths and mutually regularizes their gradients to achieve better generalization. DA-FSS employs the same backbone and pre-trained text encoder as the baseline MultiModal Few-Shot SegNet (MM-FSS) to generate text embeddings, which can increase cost-free modalities' utilization rate and better leverage each modality's information space. To achieve this, we propose a Parallel Expert Refinement module to generate each modal correlation. We also propose a Stacked Arbitration Module (SAM) to perform convolutional fusion and arbitrate correlations for each modality pathway. The Parallel Experts decouple two paths: a Geometric Expert maintains plasticity, and a Semantic Expert ensures stability. They are coordinated via a Decoupled Alignment Module (DAM) that transfers knowledge without propagating confusion. Experiments on popular datasets (S3DIS, ScanNet) demonstrate the superiority of DA-FSS over MM-FSS. Meanwhile, geometric boundaries, completeness, and texture differentiation are all superior to the baseline. The code is available at: https://github.com/MoWenQAQ/DA-FSS/.
    Computer Vision3D computer visionComputer VisionMultimodal learningComputer VisionSegmentation, grouping and shape analysisComputer VisionTransfer, low-shot, semi- and un- supervised learningComputer VisionVision, language and reasoning
  31. #392

    D2MDM2: A Brain-Inspired Deep Network Based on DDM Decision-Making Mechanism for Remote Sensing Change Detection

    Kang Zhao, Ye Zhang, Bin Wang, Jianchao Zeng
    Remote sensing change detection (RSCD) aims to identify changed regions in bitemporal images. However, conventional one-step modeling suffers from performance degradation caused by imaging temporal differences (e.g., illumination disturbances, seasonal variations). To address this issue, we formulate the problem as an “adversarial attack and defense” paradigm. Then, instead of traditional adversarial training, we propose a brain-inspired deep network based on the drift diffusion model (DDM) decision-making mechanism for RSCD, which is rooted in cognitive neuroscience. By simulating brain decision processes, the network enhances model robustness significantly. Specifically, we design the multi-stage reasoning unit to simulate DDM’s iterative evidence generation, stochastic representation unit with reparameterization sampling to model stochastic diffusion in decision-making, decision aggregation unit to replicate DDM’s evidence accumulation, and loss-guided unit with multi-loss joint constraints to simulate directional drift. Experimental results verify the model’s superior performance on multiple datasets and their perturbed versions.
    Computer VisionAdversarial learning, adversarial attack and defense methodsComputer VisionRecognition (object detection, categorization)Humans and AIBrain sciences
  32. #414

    MiniST: Unlocking Input Window Length in Traffic Flow Forecasting with Compact Parameters

    Sheng Huang, Guanjun Wang, Jiaming Ma, Binwu Wang, Yang Wang
    Spatiotemporal traffic forecasting currently faces dual challenges: capturing long-range periodic dependencies and managing the computational burden of increasingly complex deep neural network architectures. Mainstream models typically contain millions of parameters and struggle to handle long sequence historical data due to memory bottlenecks. To address these challenges, we present MiniST, a minimalist framework that decouples information content from sequence length by exploiting the intrinsic redundancy of traffic data. Instead of modeling dense point-wise dependencies, MiniST projects high-dimensional traffic history into a compact set of Orthogonal Basis Vectors, effectively distilling dominant traffic modes into a disentangled latent space. Furthermore, we introduce a Template-based Update Module (TUM), which models temporal dynamics by anchoring these latent bases to a set of learnable, time-invariant global templates. Extensive experiments on multiple large-scale datasets, including a network with over 50,000 nodes, demonstrate that MiniST achieves state-of-the-art performance in both short-term and long-term traffic forecasting, outperforming 22 benchmark models with improvements of up to 11.58%, while maintaining minimal parameter size and competitive efficiency. The source code is available at MiniST Repository.
    Data MiningMining spatial and/or temporal data
  33. #419

    Controlling Decision Drift in Multimodal Sentiment Analysis with Missing Modalities

    Chenglizhao Chen, Yuchen Cao, Xinyu Liu, Mengke Song, Guisheng Zhang, Xiaomin Yu
    Multimodal sentiment analysis relies on textual, acoustic, and visual signals, yet real-world data often suffer from modality missing and quality imbalance. Existing methods generate features for modality missing from available ones, but differences in expression mechanisms and sentiment dynamics across modalities may cause the generated features to deviate from true distributions and mislead prediction. In addition, unreliable modalities may dominate fusion, resulting in representation shift across modality combinations and unstable sentiment representations. To address these challenges, we propose a two-level reference alignment framework. The framework introduces stable references at the feature representation and sentiment decision levels to improve robustness under modality missing. First-level reference alignment leverages complete-modality samples to constrain representations and align different modality combinations into a shared sentiment space. Second-level reference alignment enforces cross-modal consistency at the decision level by suppressing unreliable modalities through prototype retrieval and voting. As a result, the framework maintains stable and reliable sentiment predictions under diverse missing-modality patterns. Experiments on CMU-MOSI and CMU-MOSEI show consistent improvements across various missing-modality settings. Under full-modality input, the proposed method achieves state-of-the-art performance, with ACC of 86.28% and 85.88%, and F1 of 86.24% and 85.86%.
    Computer VisionMultimodal learningComputer VisionRecognition (object detection, categorization)Computer VisionVision, language and reasoning
  34. #446

    GUI-ReWalk: Massive Data Generation for GUI Agent via Stochastic Exploration and Intent-Aware Reasoning

    Taoran Lu, Minghao Liu, Musen Lin, Lichen Yuan, Yuhao Chao, Yiwei Liu, Haonan Xu, Rui Pan, Lei Yang, Yu Miao, Zhaojian Li
    Graphical User Interface (GUI) Agents, powered by large language and vision-language models, hold promise for enabling end-to-end automation in digital environments. However, their progress is fundamentally constrained by the scarcity of scalable, high-quality trajectory data. Existing data collection strategies either rely on costly and inconsistent manual annotations or on synthetic generation methods that trade off between diversity and meaningful task coverage. To bridge this gap, we present GUI-ReWalk: a reasoning-enhanced, multi-stage framework for synthesizing realistic and diverse GUI trajectories. GUI-ReWalk begins with a stochastic exploration phase that emulates human trial-and-error behaviors, and progressively transitions into a reasoning-guided phase where inferred goals drive coherent and purposeful interactions. Moreover, it supports multi-stride task generation, enabling the construction of long-horizon workflows across multiple applications. By combining randomness for diversity with goal-aware reasoning for structure, GUI-ReWalk produces data that better reflects the intent-aware, adaptive nature of human-computer interaction. We further train Qwen2.5-VL-7B on the GUI-ReWalk dataset and evaluate it across multiple benchmarks, including Screenspot-Pro, OSWorld/OSWorld-G, UI-Vision, AndroidControl, and GUI-Odyssey. Results demonstrate that GUI-ReWalk enables superior coverage of diverse interaction flows, higher trajectory entropy, and more realistic user intent. These findings establish GUI-ReWalk as a scalable and data-efficient framework for advancing GUI agent research and enabling robust real-world automation.
    Agent-based and Multi-agent SystemsAgent-based simulation and emergenceAIKnowledge Representation and ReasoningAIMachine LearningAgent-based and Multi-agent SystemsMulti-agent learning
  35. #459

    Cross Domain Test Time Scaling: Scale Knowledge and Reasoning on Cross Domains

    Minxi Yan, Yihua Shao, Yanling Pan, Siyu Chen, Hongjuan Pei, Hao Tang, Fei Ma, Jingcai Guo, Nicu Sebe
    Test-time scaling (TTS) has demonstrated remarkable potential in enhancing the reasoning capabilities of Large Language Models (LLMs) and Large Vision-Language Models (LVLMs). However, its application has primarily been limited to domains such as mathematics and programming, owing to their reasoning-intensive nature and the ease of result verification. Its utility in other knowledge-intensive fields, such as medicine and general scientific research, remains underexplored.
    To bridge this gap and unlock the potential of TTS in broader domains, we propose Cross-Domain TTS, a novel framework that enables task-tailored scaling. This framework consists of two key components: a conformal prediction-based cold-start strategy and an information-gain-based dynamic reasoning adjustment. The CP-based cold-start strategy guides the model's initialization during test-time scaling based on conformal prediction theory, while the information-gain-based dynamic reasoning adjustment guides the model's reasoning progress through a progress vector according to the information gain of reasoning steps. We conducted experiments using LLMs and LVLMs on cross-domain benchmarks. Our results demonstrate that the proposed framework consistently improves performance across various domain-specific datasets. For instance, in the medical domain, it achieves an improvement of up to 17% in pass@1 accuracy while reducing inference latency and saving up to 30% in token consumption. Code is available at https://github.com/Yan0613/Cross-Domain-TTS.
    Computer VisionTransfer, low-shot, semi- and un- supervised learningMachine LearningCost-sensitive learningMachine LearningFoundation modelsMachine LearningLearning sparse modelsMachine LearningOpen-World/Open-Set/OOD Learning
  36. #464

    Improving Multimodal Reasoning via Worst Dimension Optimization

    Haocheng Lv, Huaping Zhang, Qiuchi Li, Lei Li, Chunxiao Gao
    Multimodal reasoning requires a path that retains integrity over a wide range of constraints, from visual grounding to logic consistency. However, the current Process Reward Models (PRMs) focus on heuristically defined rewards that equally weigh these factors, which may lead to the concealment of individual dimension failures (such as visual hallucinations) by the dominating factors, without guaranteeing the validity of the reasoning process in general. Therefore, to overcome the limitation, the paper proposes the concept of Multimodal Multi-Dimensional Scalarization Process Reward Modeling (MMS-PRM), a paradigm specifically developed to enforce the worst dimension’s robustness in multimodal reasoning. Specifically, a hierarchical fine-grained reward space is developed to represent the multimodal risks in the reasoning tasks, and a Chebyshev-based Monte Carlo Tree Search (MCTS) algorithm is introduced, in which the primary focus during the path searching is given to the worst-performing dimension. Moreover, a curriculum-based Direct Preference Optimization (DPO) approach is developed to gradually learn the balanced reasoning skills in the policy. The experimental results show that, without the dimension collapse issue, the MMS-PRM approach significantly improves the reliability of the multimodal reasoning performance and reaches competitive results in various challenging tasks. The code is available at https://github.com/leibniz-Man/MMS-PRM
    Computer VisionVision, language and reasoningKnowledge Representation and ReasoningLearning and reasoningNatural Language ProcessingQuestion answering
  37. #467

    Mitigating Collaboration Degeneration in Multi-Agent Code Generation via a Controllable Competitive Collaboration Approach

    Shanzhi Gu, Mingyang Geng, Yihong Dong, Yunxin Mao, Zhaoyang Qu, Hao Zou, Ruochun Jin, Zhipeng Liu, Chuanfu Xu, Haotian Wang
    Empowered by large language models (LLMs), multi-agent systems (MAS) have shown significant potential in code generation by simulating collaborative workflows. However, we identify a collaboration degeneration phenomenon, where one agent dominates while others remain disengaged, occurring in 38.4% of non-routine HumanEval tasks. Surprisingly, introducing competition among LLM agents eliminates this degeneration and sparks innovation in generated code. Inspired by this catfish effect, we posit that competition helps prevent collaboration degeneration. We thus propose C3, a Controllable Competitive Collaboration framework featuring two novel mechanisms: Centralized Auction-Based Competition (CAB) for role allocation via bidding and Decentralized Communication-Aware Competition (DCC) for solution refinement through opponent-aware adaptation.
    Evaluated on 70 complex projects from extended SoftwareDev, C3 reduces collaboration degeneration from 32.4% to 12.8%, while enhancing code innovation and diversity. Human evaluations further confirm C3’s superiority in functionality, maintainability, and robustness over purely collaborative or competitive baselines.
    Agent-based and Multi-agent SystemsCoordination and cooperationAgent-based and Multi-agent SystemsMulti-agent learning
  38. #487

    SAFformer: Improving Spiking Transformer via Active Predictive Filtering

    Zequan Xie, Weiming Zeng, Yunhua Chen, Sichang Lin, Tongyang Chen, Jinsheng Xiao
    Spiking Neural Networks (SNNs) offer notable advantages in biological plausibility and energy efficiency, making them promising candidates for building low-power Transformers. However, existing Spiking Transformers largely adhere to a passive reactive paradigm, which struggles to focus on task-relevant information and incurs substantial computational overhead when processing redundant visual data. To overcome this fundamental yet underexplored limitation, we propose SAFformer, a novel Spiking Transformer architecture based on an active predictive filtering paradigm. Inspired by the brain’s predictive coding mechanism, SAFformer actively suppresses predictable signals and focuses on salient visual features. Extensive experiments show that SAFformer establishes new state-of-the-art performance on CIFAR-10/100 and CIFAR10-DVS. Remarkably, on ImageNet-1K, it achieves 80.44% Top-1 accuracy with only 26.58M parameters and an energy consumption of 5.88 mJ, demonstrating an exceptional balance between accuracy and efficiency.
    Computer VisionEfficiency and OptimizationHumans and AICognitive modelingHumans and AICognitive systemsMachine LearningAttention models
  39. #509

    System 1&2 Synergy via Dynamic Model Interpolation

    Chenxu Yang, Qingyi Si, Chong Tian, Xiyu Liu, Dingyu Yao, Chuanyu Qin, Zheng Lin, Weiping Wang, Jiaqi Wang
    Training a unified language model that adapts between intuitive System 1 and deliberative System 2 remains challenging due to interference between their cognitive modes. Recent studies have thus pursued making System 2 models more efficient. However, these approaches focused on output control, limiting what models produce. We argue that this paradigm is misaligned: output length is merely a symptom of the model's cognitive configuration, not the root cause. In this work, we shift the focus to capability control, which modulates how models think rather than what they produce. To realize this, we leverage existing Instruct and Thinking checkpoints through dynamic parameter interpolation, without additional training. Our pilot study establishes that linear interpolation yields a convex, monotonic Pareto frontier, underpinned by representation continuity and structural connectivity. Building on this, we propose DAMI (DynAmic Model Interpolation), a framework that estimates a query-specific Reasoning Intensity lambda(q) to configure cognitive depth. For training-based estimation, we develop a preference learning method encoding accuracy and efficiency criteria. For zero-shot deployment, we introduce a confidence-based method leveraging inter-model cognitive discrepancy. Experiments on five mathematical reasoning benchmarks demonstrate that DAMI achieves higher accuracy than the Thinking model while remaining efficient, effectively combining the efficiency of System 1 with the reasoning depth of System 2.
    Natural Language ProcessingApplicationsNatural Language ProcessingLanguage generationNatural Language ProcessingLanguage models
  40. #535

    EVENTTSF: Event-Aware Non-Stationary Time Series Forecasting

    Yunfeng Ge, Ming Jin, Yiji Zhao, Hongyan Li, Bo Du, Chang Xu, Shirui Pan
    Time series forecasting is vital in diverse sectors such as energy and transportation, where non-stationary dynamics are deeply intertwined with external events in other modalities such as texts. However, incorporating natural language-based external events to improve non-stationary forecasting remains largely unexplored, as most approaches still rely on a single modality, resulting in limited contextual knowledge and model underperformance. Enabling fine-grained multimodal interactions between temporal and textual data is challenged by two fundamental issues: (1) the gap in modeling interactions among discrete external events and continuous time series in a unified framework; (2) classical uniform diffusion timestep ignores event-induced non-stationary variability, leading to imbalanced denoising difficulty across diffusion stages. In this work, we propose event-aware non-stationary time series forecasting EventTSF, an autoregressive diffusion framework that integrates historical time series and textual events via step-wise diffusion. To mitigate the imbalanced denoising difficulty of uniform timestep sampling, EventTSF uses an event-aware flow-matching timestep conditioned on event semantics. Extensive experiments on 7 synthetic and real-world datasets show that EventTSF outperforms 12 non-stationary time series forecasting baselines, achieving average gains of 41.3% in probabilistic forecasting and 27.5% in deterministic forecasting across all evaluation metrics.
    Machine LearningTime series and data streams
  41. #552

    Bootstrapping Video Interaction Generation with Synthetic State Transitions

    Jiho Jang, Jin-Young Kim, Nojun Kwak, Kyungjune Baek
    While recent video generative models can synthesize high-fidelity videos, they struggle to portray plausible physical interactions and the resulting state transitions, a critical bottleneck for applications in robotics and VR/AR. To address this, we introduce a framework to generate a scalable synthetic dataset of controllable interactions.
    Our pipeline leverages a structured taxonomy and state-of-the-art image editing models to create explicit 'start' and 'end' state images, which serve as visual anchors for the interaction. To generate a seamless video utilizing these anchors, we propose State-Guided Sampling (SGS), a novel sampling technique that mitigates artifacts common in naive conditional generation. Furthermore, we develop and validate a new automated evaluation system that aligns with human judgments to ensure data quality.
    Experiments show that fine-tuning a base model on our dataset significantly enhances its ability to generate plausible interactions. The dataset, code, and evaluation tools will be released.
    Computer VisionImage and video synthesis and generationComputer VisionVideo analysis and understanding
  42. #560

    Learning Heterogeneous Global Local Frequency Dependencies in Diffusion-Based Image Compression

    YuBing Luo, Zekai Ji, Jia Qin, Zhihang Chen, Tengyue Guo, Pinle Qin, Rui Chai, Jianchao Zeng
    Diffusion-based image compression has exhibited robust performance. However, most existing methods primarily emphasize the spatial domain, causing frequency dependencies to be learned only implicitly. We revisit this problem from a frequency perspective and observe pronounced heterogeneity between global and local frequency dependencies. Global frequency patterns describe the energy distribution over the entire image, whereas patch-level frequency relations govern the coupling of local details. These two forms of dependency differ substantially in both scale and semantic level, yet most methods overlook this distinction. To address this issue, we propose FMDiff, a diffusion-based image compression method that learns heterogeneous global and local frequency dependencies by combining Fourier and Mamba to jointly model both global and local frequency correlations. The core of FMDiff is the dual-branch FMBlock. In the frequency branch, features are decomposed into magnitude and phase, then fed into Mamba separately. Its scanning mechanism captures cross-frequency coupling within each patch while aggregating long-range frequency context along the sequence. Magnitude-phase interactive modulation is then used to explicitly restore spectral information corrupted by compression. The spatial branch provides high-level semantic constraints and is fused with the frequency branch during diffusion reconstruction. Extensive experiments show that FMDiff consistently improves performance across multiple datasets.
    Computer VisionImage and video synthesis and generationComputer VisionLow-level Vision
  43. #566

    BAMFair: Barycenter Aligned Mediation for Fairness Across Multiple Sensitive Attributes

    Hengyu Yue, Fan Wang, Weiming Liu, Yuwen Liu, Lianyong Qi, Haolong Xiang, Xiaolong Xu, Xuyun Zhang, Shichao Pei, Qiang Ni
    Achieving fairness in machine learning models while maintaining high accuracy is an important but complex task, especially when handling multiple sensitive attributes. Traditional fairness methods often struggle to eliminate bias within subgroups divided by sensitive attributes. Several key challenges have been identified in this context: (1) Multiple sensitive attribute scalability challenge, where methods fail to ensure fairness as the number of sensitive attributes increases, despite scenarios with multiple sensitive attributes being prevalent in real-world applications; (2) Multiple objective optimization conflict challenge, where simultaneously optimizing for accuracy, fairness, and other relevant objectives leads to conflicting gradient updates, causing suboptimal performance. To address these challenges, we propose BAMFair, a Barycenter Aligned Mediation framework for fairness across multiple sensitive attributes. It comprises two core modules: a Global Barycentric Alignment (GBA) module and a Nash Fairness Mediator (NFM) module. Specifically, GBA innovatively introduces a global fair barycenter and minimizes the distances from subgroups divided by sensitive attributes to it, providing a scalable and efficient solution for fairness optimization across multiple sensitive attributes. Subsequently, NFM negotiates an agreement among inconsistent gradient updates between different objectives. Extensive experiments on four real-world datasets validate that BAMFair outperforms state-of-the-art methods in scenarios with multiple sensitive attributes.
    AI Ethics, Trust, FairnessFairness and diversityData MiningCollaborative filteringData MiningRecommender systems
  44. #577

    Temporal-Synergistic Policy Optimization for Unsupervised Low-Light Image Enhancement

    Yuanfei Bao, Dong Li, Jie Huang, Xingbo Wang, Xueyang Fu
    Diffusion models show significant potential for low-light image enhancement. However, this task requires satisfying human perceptual preferences and content fidelity transcending simple brightness and color improvement. Existing methods rely on heuristic physical priors or incorporate perceptual metrics directly into timestep-wise training objectives. Such proxy constraints fail to provide reliable trajectory-level guidance for perceptual alignment. Furthermore, they often lead to artifacts or unnatural visual effects in this ill-posed inverse problem. To address these issues, we propose an unsupervised low-light enhancement framework based on Group Relative Policy Optimization (GRPO), which utilizes perceptual preferences to directly optimize the diffusion policy. We introduce a Sliding Window Hybrid ODE-SDE Sampling strategy that confines stochasticity to dynamic sub-intervals, thereby achieving efficient coarse-to-fine exploration and precise advantage attribution. To meet strict fidelity constraints, we construct a Synergistic Perception-Fidelity Reward and introduce an Independent Advantage Estimation strategy to mitigate signal collapse and gradient suppression in multi-objective optimization. Furthermore, we design a Temporal-Aware Dynamic Weighting Mechanism that adaptively adjusts perceptual weights across denoising stages to balance visual enhancement with structural preservation. Extensive experiments on multiple real-world benchmarks demonstrate that our method effectively improves the perceptual performance of existing generative models, yielding results that better align with human aesthetics.
    Computer VisionLow-level Vision
  45. #617

    CDO-GIA: A Robust Textual Gradient Inversion Attack Against Federated Language Models via Continuous-Discrete Optimization

    Jiajie Wang, Weibo Xu, Ruichen Xia, Jiahao Nie
    Gradient inversion attacks (GIAs) have shown that shared gradients in federated learning leak private training data. However, current textual GIAs fail at large batch sizes, as simply increasing the batch size can serve as a stable defense against such attacks. In this paper, we propose CDO-GIA, a novel textual gradient inversion attack via continuous-discrete optimization that extends the maximum effective batch size from 4 to 8 for reconstructing text from gradients. In the continuous optimization phase, CDO-GIA incorporates a dynamic weight decay mechanism to alleviate embedding homogenization caused by embedding regularization at large batch sizes, thereby enhancing unigram reconstruction accuracy. Moreover, our method introduces a tabu beam search mechanism guided by the language model prior during discrete optimization. This design facilitates fine-grained exploration of high-dimensional token spaces via precise token order adjustment, thus reconstructing semantically coherent sequences. By alternating optimization between continuous and discrete phases, CDO-GIA effectively doubles the practical attack limit of prior textual GIAs. Extensive experiments are conducted on three binary text classification datasets. The experimental results indicate that CDO-GIA surpasses all baseline methods. With a batch size of 8, it achieves performance gains of 25.01%, 155.64%, and 22.72% over the baselines in terms of Rouge-1, Rouge-2, and Rouge-L scores, respectively.
    AI Ethics, Trust, FairnessSafety and robustnessNatural Language ProcessingInformation extractionSearchMixed discrete/continuous search
  46. #618

    LogicFusion: Differentiable Logical Rule Learning for Cancer Driver Gene Identification

    Bang Chen, Lijun Guo, Wentao He, Guang Cao, Rong Zhang
    Identifying cancer driver genes (CDGs) requires integrating heterogeneous biological networks, each encoding distinct mechanistic insights into tumorigenesis. Existing multi-network methods typically fuse views at the feature level or enforce uniform representations, which obscures network-specific signals and limits interpretability. To address this, we propose LogicFusion, a novel differentiable framework that treats each biological network as an independent probabilistic evidence source and explicitly learns how to combine them using logical reasoning. Specifically, LogicFusion maps the output of each view into probabilistic space as independent evidence, and learns a set of logical rules in a differentiable manner. These rules adaptively select and fuse these view evidences through logical conjunction (AND) and disjunction (OR). The final prediction is then obtained via gene-specific weighting. LogicFusion enables both high-accuracy CDG identification and interpretable attribution of predictions to specific networks. Experiments on pan-cancer and cancer-specific datasets show that LogicFusion consistently outperforms state-of-the-art methods, while providing transparency in multi-view genomic reasoning through learned logical rules.
    Multidisciplinary Topics and ApplicationsBioinformaticsMachine LearningNeuro-symbolic methods/Abductive LearningMachine LearningMulti-view learning
  47. #622

    One-Step Self-Aligned Anchor Learning for Multi-View Clustering

    Miao Jia, Zijian Chen, Xingchen Hu, Jiyuan Liu, Huan Chen, Siwei Wang, Jincai Huang
    Multi-view anchor graph clustering has emerged as a high-efficiency paradigm of multi-view learning. However, how to design an effective anchor alignment mechanism within this framework remains an open challenge, which is formally termed the Anchor-Unaligned Problem (AUP). Current research fails to adequately address two pivotal aspects of this challenge: first, the construction of anchor graphs and the alignment of anchors are two independent stages, overlooking their potential synergistic reinforcement; second, selecting anchors from different views as an alignment baseline often renders the clustering performance highly sensitive to the baseline choice. To address these issues, we propose a unified framework termed One-Step Self-Aligned Anchor Learning for Multi-View Clustering (OSAA-MVC). Departing from conventional two-stage strategies, we integrate anchor alignment and anchor graph construction into a joint optimization process, thereby enabling their mutual reinforcement to improve clustering performance. To avoid the baseline selection issue, we introduce a novel mixed anchor strategy, which effectively bypasses the necessity of manual baseline selection while simultaneously capturing both view-specific and cross-view information. This strategy can stabilize clustering performance at a consistently high level. Extensive experiments demonstrate the superior efficiency and effectiveness of our proposed method compared to state-of-the-art competitors. The code is available at https://github.com/Jiamiao2024/OSAA-MVC.
    Machine LearningClusteringMachine LearningMulti-view learningMachine LearningUnsupervised learning
  48. #623

    Multi-Scale Residual Graph Learning with Contextual Memory for Sleep Staging

    Xiaodong Yang, Jieying Hu, Dongxin Liang
    Sleep is a vital physiological process, and its quality directly impacts human health. Although deep learning-based automated sleep staging methods have achieved significant progress, they remain limited in fully mining spatio-temporal graph dependencies. Existing methods face challenges in capturing implicit spatial topology among multi-channel signals and efficiently fusing multi-level features in long-sequence modeling. To address these challenges, this paper proposes a novel hybrid deep learning framework for sleep staging that progressively and explicitly integrates spatial topology modeling, cross-level feature fusion, and global temporal aggregation. By decoupling shallow and deep spatial features and hierarchically organizing feature interactions, the proposed framework effectively captures implicit spatial dependencies while maintaining computational efficiency in long-sequence modeling. Extensive experimental results on multiple sleep staging datasets (83.1% accuracy, 80.1% F1-score on ISRUC-S1, 84.0% accuracy, 83.8% F1-score on ISRUC-S3, and 85.9% accuracy, 83.7% F1-score on Sleep-EDF-153) demonstrate that the proposed method achieves competitive accuracy with significantly reduced computational latency, highlighting its strong potential for mobile and real-time sleep monitoring applications.
    Humans and AIBrain sciencesHumans and AICognitive modelingHumans and AIHuman-computer interaction
  49. #625

    Robust Contrastive Graph Clustering with Adaptive Local-Global Integration

    Lei Zhang, Fubo Sun, Haipeng Yang, Zhong Guan, Likang Wu
    Graph clustering is essential in graph analysis for revealing structural patterns and node communities. Despite recent advances in self-supervised contrastive learning that have improved clustering via structural and attribute signals, existing methods still struggle to flexibly capture high-order local structures and often overlook global semantics in complex graphs. These limitations lead to suboptimal node representations, especially in real-world graphs with fragmented structures and ambiguous cluster boundaries. To address these limitations, a contrastive graph clustering framework is proposed to jointly integrate multi-scale local structures with global semantics via attention mechanisms. At the local level, GNN-based topological signals extracted from multiple propagation depths are adaptively fused through attention-based weighting to capture multi-scale neighborhood features. At the global level, semantic prototypes derived from dynamically evolving cluster centers are adaptively aggregated through attention to guide node representations and enhance inter-cluster separability. The model is trained under a dual-view contrastive learning paradigm with a hybrid objective that combines instance-level and structure-aware losses to improve representation robustness and discrimination. Experiments on eight real-world graph datasets demonstrate that our method achieves competitive clustering performance. Code is available at https://github.com/vege12138/w2.
    Data MiningMining graphsData MiningMining text, web, social mediaMachine LearningMulti-view learning
  50. #633

    Open-Set Test-Time Adaptation in VLMs via Noise-Immune Self-Purification

    Mingxu Feng, Fengqiang Wan, Yang Yang
    Unlike test-time adaptation (TTA), open-set TTA (OSTTA) aims to robustly adapt to in-distribution (ID) domain shifts while suppressing the detrimental impact of out-of-distribution (OOD) samples encountered at test time. To this end, we propose a novel OSTTA approach named Noise-Immune Self-Purification (NISP). By exploiting the zero-shot priors of pre-trained vision-language models (VLMs), NISP advances from coarse pseudo-labeling to fine-grained, noise-resilient adaptation. Technically, we first introduce a dual-cluster Bayesian Gaussian mixture model to fit VLM-derived scores, achieving coarse pseudo-ID/OOD separation via posterior-risk thresholding. Subsequently, NISP constructs a noise-immune fine-grained adaptation where the adapter enforces consensus-discrepancy constraints to refine coarse pseudo-labels and suppress noise propagation. We then devise the Jaccard Consistency Score for OOD discrimination. Overall, the coarse-to-fine pipeline enables rigorous self-purification and robust online adaptation. Theoretically, we show that consensus-discrepancy losses mitigate the deleterious effects of noise. Empirically, NISP achieves state-of-the-art results across multiple OSTTA benchmarks, validating its efficacy. The code is available at https://github.com/njustkmg/IJCAI26-NISP.
    Machine LearningOpen-World/Open-Set/OOD LearningMachine LearningUnsupervised learning
  51. #642

    Coarse-to-Fine Latent Guidance: A Multi-Scale Diffusion Transformer for Traffic Flow Forecasting

    Zetao Li, Silin Zhou, Zheng Hu, Shimin Cai, Tao Zhou
    Accurate traffic flow prediction is fundamental to Intelligent Transportation Systems (ITS). However, traffic dynamics exhibit inherent multi-scale heterogeneity, where stable global trends are often masked by stochastic local fluctuations. Existing methods struggle to reconcile these conflicting resolutions, leading to sub-optimal forecasting. To address this, we propose the Multi-Scale Spatial-Temporal Diffusion Transformer (MS-STDT). Deviating from previous diffusion approaches that treat generation as a monolithic task, we reframe the forward diffusion process as an intrinsic temporal coarse-graining operation. By leveraging the data's multi-scale hierarchy as a structural anchor, we introduce a coarse-to-fine latent guidance strategy that enables the model to reconstruct stable global trends before refining fine-grained details. This ensures physical consistency and generative stability without requiring external labels. Extensive experiments across six real-world datasets confirm that MS-STDT performs the best, demonstrating significant improvements in predictive accuracy and zero-shot robustness against sensor failures. We provide code and data at https://github.com/ZetaoLiPhD/MS-STDT.
    Data MiningMining spatial and/or temporal data
  52. #676

    LLM-Summarized Interest Distillation for Multimedia Recommendation

    Meng Jian, Zhuoyang Xia, Haolun Fan, Lifang Wu
    Recommendation system technology faces the fundamental challenge of data sparsity. Although incorporating auxiliary textual data works to enrich interactions, lengthy and unstructured item descriptions often fail to depict user interests, as they primarily describe item attributes. Furthermore, the sparse interactions provide incomplete interest patterns, leading to biased recommendations. To address these issues, this work proposes an LLM-summarized interest distillation (SID) model to enrich interactions by interest summarization and retrieval augmentation. Interest summarization leverages a large language model to improve user interest descriptions and filter out interest-irrelevant details, generating compact, user-centric interest texts. Retrieval augmentation utilizes semantic neighbors as multiple teachers to adaptively transfer their multimodal collaborative knowledge to augment behaviorally sparse student users, thereby enriching interest learning for recommendation. It enhances interactions both qualitatively, through improved interest descriptions, and quantitatively, through expanded semantic neighbors. Extensive experiments show the superior performance of SID over the state-of-the-art models, demonstrating the effectiveness of interest summarization and retrieval augmentation in improving recommendations.
    Data MiningCollaborative filteringData MiningInformation retrievalData MiningRecommender systems
  53. #680

    A3fford-HOI: Anatomy-Aligned Affordance Disentanglement for Fine-grained and Generalizable Hand Object Interaction

    Xurui Hu, Xingqun Qi, Bingkun Yang, Chen Su, Yiwei Ru, Yin Junhui, Muyi Sun, Man Zhang
    Hand Object Interaction (HOI) generation provides an efficient solution for virtual reality simulation and embodied AI deployment. Recent studies have explored instruction-driven HOI synthesis, yet they overlooked fine-grained interactive contact and struggled with robust generalization in data-scarce scenarios. In addition, existing datasets are frequently constrained by limited object categories and monotonous motion patterns. To address these limitations, we first integrate a large-scale HOI dataset, namely HOI-X, which contains diverse objects, rich motion patterns, and part-level contact annotations. Along with HOI-X, we propose A3fford-HOI, a fine-grained and generalizable Hand Object Interaction framework with Anatomy-Aligned Affordance Disentanglement. Specifically, we define a Disentangle-2-Interact paradigm in A3fford-HOI. Considering the various interactions of distinct hand anatomical structures, in the Disentangle stage, we devise an Anatomy-Aligned Affordance Predictor, which leverages the plausible external physical priors in 3D multimodal large language models (MLLMs) for affordance disentanglement prediction. Then, the disentangled affordance is fed into the Interact stage, i.e., the Affordance Driven Interactor. The interactor integrates textual instructions, object geometric features, and the MLLM-enhanced affordance via a Diffusion Transformer (DiT) for generalizable HOI generation. Extensive experiments demonstrate that A3fford-HOI maintains superior generation quality and contact plausibility for fine-grained and generalizable hand object interaction.
    Computer Vision3D computer visionComputer VisionMotion and tracking
  54. #685

    Geodesic Expert Routing for Unbiased Knowledge Distillation in Recommendation

    Xuan Zhang, Rongchuan Wei, Chunyu Wei, Hongxing Yuan, Yushun Fan
    Knowledge distillation has become a prevalent technique for deploying efficient recommender systems, enabling lightweight student models to approximate the performance of larger teachers. However, we identify a critical issue: distillation systematically amplifies popularity bias, as student models inherit and intensify the popularity-driven shortcuts encoded in teachers trained on interaction data dominated by popular items. To address this limitation, we propose GUIDE (Geodesic aware Unbiased Instructive Distillation with Experts), a collaborative distillation framework that incorporates domain-specific debiasing experts alongside the global teacher. GUIDE tackles two key challenges in this paradigm. First, for expert routing, we introduce Spherical Expert Alignment, which conducts expert-student matching on the spherical manifold with geodesic distance optimization, eliminating magnitude-induced bias and ensuring stable gradient flow. Second, for context fusion, we design a Meta-Debiasing Gate that dynamically arbitrates teacher-expert influence based on real-time user-item context through end-to-end meta-learning. Extensive experiments on multiple real-world datasets demonstrate that GUIDE significantly mitigates popularity bias while preserving recommendation accuracy, with state-of-the-art trade-offs among efficiency, accuracy, and fairness.
    Data MiningRecommender systemsMachine LearningDeep learning architectures
  55. #686

    Falsdo: Benchmarking Artifact-Controlled Multimodal Fake News Verification via Failure-Aligned Auditing

    Bowen Chen, Jun Yin, Lele Cao, Zheyuan Zhan, Can Wang
    Recent generative AI renders multimodal misinformation structurally harder to detect, making reliable detection dependent on semantic verification grounded in verifiable evidence. However, current benchmarks often fail to isolate true semantic checking from superficial shortcut exploitation. We introduce Falsdo, a diagnostic benchmark designed to make robustness and auditability identifiable. Constructed via a heterogeneous web-mining pipeline, Falsdo provides instruction-conditioned grounding and formalizes a counterfactual artifact-control protocol. This stress test reveals a critical failure: when low-level generation traces are equalized, detector performance degrades, indicating reliance on artifacts rather than robust verification. To enable reproducible diagnosis, we propose DREA, a failure-aligned evidence-auditing baseline. DREA specifies evidence acquisition with explicit noise control and implements constrained channel-wise auditing with reliability-aware late fusion. Experiments demonstrate that Falsdo reliably exposes shortcut dependence, while DREA improves instruction-level grounding (Joint-F1) and minimizes degradation under artifact-controlled settings.
    Data MiningMining text, web, social mediaNatural Language ProcessingLanguage groundingNatural Language ProcessingResources and evaluation
  56. #702

    Joint Multi-Modal Multi-Interest Profiling and Preference-Grounded Reasoning for Explainable Recommendation

    Anqi Wang, Jianye Xie, Weiming Liu, Rong Jiang, Lianyong Qi, Haolong Xiang, Xiaolong Xu, Wenmin Lin, Yang Zhang, Xiaokang Zhou
    Explainable Recommendation (ER) aims to enhance recommendation transparency and prediction accuracy by providing faithful and persuasive explanations. However, Multi-Modal Multi-Interest Explainable Recommendation (MMER) is particularly challenging in two aspects: effectively utilizing diverse multi-modal information to construct reliable semantic evidence and precisely identifying the most relevant user interest to generate faithful explanations. Most previous methods fail to effectively exploit visual information to provide reliable evidence and primarily rely on a single unified representation of user preferences, making it difficult to distinguish diverse user interests and generate faithful explanations. To fill this gap, we propose Joint Multi-Modal Multi-Interest Profiling and Preference-Grounded Reasoning (PRIME) for solving the MMER problem. Specifically, PRIME first organizes items into interest-aware clusters to facilitate interest disentanglement. Based on these clusters, it constructs explicit multi-modal multi-interest user profiles and a comprehensive global item profile to capture fine-grained preferences and distinguish diverse user interests. Furthermore, PRIME performs preference-grounded reasoning by retrieving the most relevant user interest for each item, enabling faithful explanations and accurate predictions. Our experiments on three real-world datasets demonstrate that PRIME outperforms the state-of-the-art methods.
    Data MiningRecommender systemsMachine LearningExplainable/Interpretable machine learning
  57. #707

    Learning to Harvest: VR-Guided Expert Behaviour Capture for Decision Modelling in Agricultural Robots

    Yining Lang, Zhaoxin Li, Xiujuan Chai
    While deep learning has significantly enhanced robotic perception in agriculture, autonomous decision-making in dense and occluded environments remains a persistent challenge. This paper proposes a VR-based expert motion capture framework to bridge this gap by integrating high-fidelity virtual environments with human expertise. By reconstructing a realistic tomato greenhouse in Unity and utilizing 27-point motion capture suits, we captured multi-dimensional expert demonstration data encompassing trajectories, postures, and velocities. In the VR environment, "hard cases" featuring frequent collisions can be selectively generated at a higher frequency, allowing experts to provide concentrated demonstrations of specialized postures that markedly accelerate algorithmic learning efficiency. Crucially, the virtual framework enables multiple experts to repetitively harvest identical targets to match optimal trajectories, a process precluded in reality by the destructive nature of harvesting. Moreover, it bypasses biological growth and seasonal constraints, facilitating continuous, high-frequency experimentation independent of crop maturation. These expert data were subsequently integrated into a multi-task reinforcement learning model as experience replay and reward templates. Experimental results demonstrate that our expert-driven algorithm significantly outperforms baseline methods in harvesting efficiency, collision avoidance, and stability, establishing VR-guided motion capture as a robust and scalable paradigm for advancing agricultural robotics.
    Humans and AIApplicationsHumans and AIHuman-AI collaborationRoboticsApplications
  58. #737

    Harmonizing Federated Heterogeneous Optimization via Adaptive Objective Rectification

    Jianrong Lu, Bangwei Li, Zhuoya Gu, Peng Fang, Ziming Zhao, Jianhai Chen
    Federated optimization under data heterogeneity presents a significant challenge, often leading to suboptimal model performance. While numerous methods aim to replicate the ideal performance of centralized training, they frequently fall short in highly heterogeneous settings. In this paper, we introduce HaFedHo, an adaptive objective rectification method that harmonizes local training with the ideal data-centralized objective, requiring minimal modifications to the standard federated learning framework. HaFedHo operates by first decoupling the centralized objective and then employing a dynamic Taylor series expansion to accurately estimate the global objective for each client. Our theoretical analysis shows that the estimation error provably converges to zero as training progresses. Furthermore, extensive experiments on real-world datasets demonstrate that HaFedHo surpasses state-of-the-art methods, including SCAFFOLD, MimeLite, and FedDyn, in both test accuracy and communication efficiency. Notably, HaFedHo maintains its superior performance even with a client participation rate as low as 0.2% in severely heterogeneous environments.
    Knowledge Representation and ReasoningApplicationsMachine LearningFederated learningMachine LearningOptimization
  59. #746

    VGAS: Value-Guided Action-Chunk Selection for Few-Shot Vision-Language-Action Adaptation

    Changhua Xu, En Yu, Junyu Xuan, Jie Lu
    Vision-Language-Action (VLA) models bridge multimodal reasoning with physical control, but adapting them to new tasks with scarce demonstrations remains unreliable. While fine-tuned VLA policies often produce semantically plausible trajectories, failures frequently arise from unresolved geometric ambiguities, where near-miss actions lead to divergent execution outcomes under limited supervision. We study few-shot VLA adaptation from a generation-selection perspective and propose a novel framework, VGAS (Value-Guided Action-chunk Selection). It performs inference-time best-of-N selection to identify action chunks that are both semantically faithful and geometrically precise. Specifically, VGAS employs a fine-tuned VLA as a high-recall proposal generator and introduces Q-Chunk-Former, a geometrically grounded Transformer critic to resolve fine-grained geometric ambiguities. In addition, we propose Explicit Geometric Regularization (EGR), which shapes a discriminative value landscape to preserve action ranking resolution among near-miss candidates while mitigating value instability under scarce supervision. Experiments and theoretical analysis demonstrate that VGAS consistently improves success rates and robustness under limited demonstrations and distribution shifts. Our code is available at https://github.com/Jyugo-15/VGAS
    Machine LearningFew-shot learningMachine LearningOffline reinforcement learningRoboticsLearning in robotics
  60. #749

    TCDA: Thread-Constrained Discourse-Aware Modeling for Conversational Sentiment Quadruple Analysis

    Xinran Li, Xinze Che, Yifan Lyu, Zhiqi Huang, Xiujuan Xu
    Conversational Aspect-based Sentiment Quadruple Analysis (DiaASQ) needs to capture the complex interrelationships in multiple rounds of dialogues. Existing methods usually employ simple Graph Convolutional Networks (GCN), which introduce structural noise and fail to consider the temporal sequence of the dialogues, or use standard RoPE, which implicitly captures relative distances in a flat sequence but cannot clearly separate the token-level syntactic order from the utterance-level progression, and may suffer from the Distance Dilution problem. To address these issues, we propose a new framework that combines Thread-Constrained Directed Acyclic Graph (TC-DAG) and Discourse-Aware Rotary Position Embedding (D-RoPE). Specifically, TC-DAG filters out cross-thread noise based on thread constraints, maintains global connectivity through root anchoring, and incorporates the temporal sequence of the dialogues. D-RoPE aligns multi-layer semantics using dual-stream projection and multi-scale frequency signals, captures thread dependencies using tree-like distances, and alleviates the token-level Distance Dilution problem by incorporating utterance-level progressions. Experimental results on two benchmark datasets demonstrate that our framework achieves state-of-the-art performance.
    Natural Language Processing: Dialogue and interactive systems · Natural Language Processing: Named entities · Natural Language Processing: Sentiment analysis, stylistic analysis, and argument mining
  61. #760

    SpecBridge: Spectral Structure Alignment and Transitive Bridging for 3D–2D–Text Pre-Training

    Dong Wang, Jie Jiang, Weidong Min, Lixin Zhan, Xinpeng Zhao, Ze Zhang
    Open-vocabulary 3D understanding aims to align 3D representations with a unified vision-language semantic space. However, existing methods suffer from the challenge of structural asymmetry caused by sparse observations and holistic geometries. Additionally, the inherent semantic chasm between discrete coordinates and abstract natural language hinders cross-modal mapping. This work introduces SpecBridge, a 3D-2D-Text pre-training framework that leverages CLIP priors as a foundational bridge to connect three modalities by synergizing spectral graph theory with transitive semantic learning. The method consists of two main sub-modules. First, Spectral Eigen-Modal Alignment is presented to construct cross-modal correlations within the intrinsic eigen-spectral space via Laplacian eigendecomposition. By aligning low-frequency geometric harmonics with CLIP-guided observation inference, it maintains structural consistency between 3D geometries and 2D projections. Second, a Transitive Spectral-Semantic Alignment method is developed to establish a 3D→2D→Text propagation chain across the CLIP priors bridge. It distills dense CLIP priors into 3D representations through transitive distillation, effectively mitigating the semantic chasm between geometry and language. Extensive experiments confirm that the proposed SpecBridge demonstrates state-of-the-art performance by overcoming challenges triggered by modality discrepancies.
    Computer Vision: 3D computer vision · Machine Learning: Multi-modal learning · Machine Learning: Representation learning
  62. #772

    HFFN-ID: A Hierarchical Feature Fusion Network with Bi-Phase Subject ID Modulation for EEG Mel-Spectrogram Reconstruction

    Chi Huang, Zhaohu Liu, Yong Peng, Wanzeng Kong
    High-fidelity reconstruction of mel-spectrograms from EEG signals remains a formidable challenge, primarily due to the inherent inter-subject variability of neural patterns and the semantic gap between heterogeneous feature representations of these two modalities. To alleviate both issues, this paper proposes a Hierarchical Feature Fusion Network with bi-phase subject IDentifier modulation (HFFN-ID) for reconstructing mel-spectrograms from EEG signals. The HFFN-ID framework contributes two main components: a Binary-Phase Subject-ID Modulation (BiPSM) mechanism for explicit subject conditioning and a Condition-guided Hierarchical Fusion (CHF) component for dynamic multi-layer feature synthesis, which respectively address inter-subject variability in neural patterns and heterogeneous cross-modality feature fusion. Moreover, a speech envelope feature-based pre-training strategy is incorporated to initialize the parameter space, inspired by the shared low-level representations across different speech features. On the SparrKULee dataset, HFFN-ID establishes a new benchmark with a Pearson correlation coefficient of 0.0723, representing a substantial 44% relative improvement over the existing baseline. Importantly, HFFN-ID is highly efficient, achieving a 38.7% parameter reduction and a 2.5× training speedup. These results highlight the effectiveness of subject-conditioned and hierarchically fused architectures for advancing high-fidelity neural speech decoding.
    Humans and AI: Human-computer interaction · Humans and AI: Cognitive modeling · Humans and AI: Intelligent user interfaces
  63. #803

    UniSAGE: Unifying Static and Dynamic Attributes with Hyper-Structure

    Taoran Fang, Yan Deng, Chunping Wang, Yang Wang, Lei Chen, Yang Yang
    With the rapid growth of digital data, real-world applications increasingly involve hierarchical information that combines static attributes with dynamic records. Modeling such heterogeneous data in a unified and generalizable manner remains challenging. Existing approaches often rely on extensive manual design, are tightly coupled to specific data schemas, and typically process static and dynamic attributes in isolation, thereby overlooking their implicit interactions. We propose UniSAGE, a unified framework for modeling data with both static and dynamic attributes. UniSAGE constructs a global attribute graph that represents hierarchical and temporal relationships in a unified structure. To ensure representational consistency, it introduces two orthogonal parameter subspaces that jointly support static aggregation and dynamic reasoning within a shared semantic space. Building on these unified representations, UniSAGE further enables task-specific interaction between static and dynamic attributes via a lightweight hyper-structure mechanism. UniSAGE is fully automated, robust to evolving data schemas, and capable of capturing complex cross-attribute dependencies. Extensive experiments on multiple public benchmarks and a real-world financial behavior dataset demonstrate that UniSAGE consistently outperforms existing methods, achieving performance improvements of over 10% on several tasks. Our code is available at https://github.com/zjunet/UniSAGE.
    Data Mining: Mining graphs · Data Mining: Mining heterogenous data · Data Mining: Mining semi-structured data
  64. #804

    A₃B₂: Adaptive Asymmetric Adapter for Alleviating Branch Bias in Vision-Language Image Classification with Few-Shot Learning

    Yiyun Zhou, Zhonghua Jiang, Wenkang Han, Kunxi Li, Mingjing Xu, Chang Yao, Jingyuan Chen
    Efficient transfer learning methods for large-scale vision–language models (e.g., CLIP) enable strong few-shot transfer, yet existing adaptation methods follow a fixed fine-tuning paradigm that implicitly assumes a uniform importance of the image and text branches, which has not been systematically studied in image classification. Through extensive analysis, we reveal a Branch Bias issue in vision–language image classification: adapting the image encoder does not always improve performance under out-of-distribution settings. Motivated by this observation, we propose A₃B₂, an Adaptive Asymmetric Adapter that alleviates Branch Bias in few-shot learning. A₃B₂ introduces Uncertainty-Aware Adapter Dampening (UAAD), which automatically suppresses image-branch adaptation when prediction uncertainty is high, enabling soft and data-driven control without manual intervention. Architecturally, A₃B₂ adopts a lightweight asymmetric design inspired by mixture-of-experts with Load Balancing Regularization. Extensive experiments on three few-shot image classification tasks across 11 datasets demonstrate that A₃B₂ consistently outperforms 11 competitive prompt- and adapter-based baselines.
    Computer Vision: Multimodal learning · Computer Vision: Transfer, low-shot, semi- and un- supervised learning
  65. #831

    Guiding Team Objectives with Individual Policies: A Two-Stage Model Aggregation Framework for Partially Cooperative MARL

    Wanting Liu, Baoxi Wang, Pengyi Li, Lu Jiang, Chengwei Zhang
    In Partially Cooperative Markov Games (PCMG), agents need to learn effective cooperation under individual rewards, and the key challenge lies in leveraging these rewards to optimally enhance team benefits. Model aggregation (MA) is a promising solution due to its simplicity and efficiency, yet it suffers from instability caused by the volatility of individual policies trained with individual rewards and the non-asymptotic nature of aggregation updates. To address these challenges, we propose Two-Stage Model Aggregation (TSMA), a novel multi-policy MA framework for multi-agent reinforcement learning (MARL) that leverages individual policies to enhance team cooperation. Specifically, to enhance the stability and expressiveness of individual policies, each TSMA agent trains two individual policies (trained using the same individual reward) with parameter-level variation and combines them via adaptive weighted aggregation. The aggregated individual policy is then integrated with the team policy (trained using team reward), with integration weights dynamically adjusted over training.
    Additionally, L2 regularization is applied during data-driven training to align parameter distributions across policies, mitigating instability from non-asymptotic aggregation. As a general framework, TSMA is compatible with standard actor-critic MARL algorithms. We integrate TSMA with MAPPO and MADDPG, and evaluate them on challenging PettingZoo benchmarks, demonstrating significant improvements in PCMG. The code is available at: https://github.com/RL-DLMU/TSMA.
    Agent-based and Multi-agent Systems: Coordination and cooperation · Agent-based and Multi-agent Systems: Multi-agent learning
  66. #834

    Bridging the Objective Gap: A Unified Pre-Training Framework for Few-Shot Medical Image Segmentation

    Shoupeng Chen, Yiming Miao, Limei Peng, Pin-Han Ho
    Few-shot medical image segmentation relies on dense, boundary-sensitive prototype matching, yet common pre-training objectives mainly optimize global alignment or reconstruction, creating an objective gap that hurts boundary delineation and increases adaptation cost. This raises the question: how can representations be pre-trained to be intrinsically matchable for episodic FSS while requiring minimal adaptation-induced re-organization? We introduce a Drift-Gap diagnostic to quantify intrinsic dense-matching misalignment and adaptation-induced feature drift. Guided by this lens, we propose BOG-PRETRAIN, combining Reliability-Gated Alignment to mitigate noisy report supervision, Semantic-Guided MIM to emphasize boundary-informative regions, and Dual Consistency Regularization to stabilize episodic metric geometry. Across five benchmarks (1/4/16-shot), BOG-PRETRAIN improves mean Dice by +15.5/+15.9/+12.5 points over the best prior methods and reduces mean HD95 by 0.9/4.6/10.1; it achieves the lowest Drift (0.052 vs. 0.155 baseline) and Gap_PT (0.256), with ablations confirming the components' complementarity.
    Computer Vision: Biomedical image analysis · Computer Vision: Segmentation, grouping and shape analysis · Machine Learning: Few-shot learning
  67. #837

    Transferable Attacks on Open-Vocabulary Video Instance Segmentation via Dual-Objective Triggers

    Minghao Shou, Kesen Wang, Tong Zhang, Han Bao, Zonghui Wang
    Open‑vocabulary video instance segmentation (OV‑VIS) couples spatial‑temporal reasoning with language grounding, yet its adversarial robustness has remained unexplored.
    We present the Dual-Objective Triggers (DOT), the first transferable attack on OV-VIS that simultaneously exploits the vision–language coupling and temporal coherence. DOT deploys a Dual Semantic Perturbation Module that overlays two complementary triggers: a Semantic Suppression Trigger erases the alignment between the true object and the query, while a Plausible Replacement Trigger steers the tracker toward a phantom trajectory that is visually plausible and text‑consistent. To amplify cross‑model transferability without sacrificing perceptual fidelity, we introduce Phase‑Guided Adversarial Training, which injects perturbations primarily in the phase spectrum while blending amplitudes with clean references. Extensive experiments on four state‑of‑the‑art OV‑VIS implementations demonstrate that DOT reduces mAP by up to 69.3% and raises attack success rate by up to 98%, outperforming the strongest baselines by a factor of 1.6× on average, while maintaining a PSNR of 52.49 dB, thus exposing critical security vulnerabilities and laying a foundation for future research on robust and trustworthy vision–language systems.
    Computer Vision: Adversarial learning, adversarial attack and defense methods · Computer Vision: Video analysis and understanding · Computer Vision: Vision, language and reasoning · Machine Learning: Adversarial machine learning · Machine Learning: Multi-modal learning
  68. #847

    MLDA: Test-Time Multi-Level Adaptation with Dynamic Alignment for Compositional Zero-Shot Learning

    Miaoge Li, Yu Liu, Jingcai Guo
    Compositional Zero-Shot Learning (CZSL) aims to recognize novel attribute–object compositions by leveraging shared knowledge from seen primitives. While recent works have begun to incorporate test-time adaptation to mitigate performance degradation caused by label-space distribution shifts, they suffer from primitive-level knowledge underutilization and cross-modal semantic asymmetry, hindering robust understanding of novel compositions. To address these issues, we introduce MLDA, a test-time adaptation framework for CZSL that jointly derives multi-level prototypes and performs dynamic vision–language alignment. Specifically, we construct and progressively update composition- and primitive-level prototype sets to accumulate structured semantic knowledge. By augmenting test inputs with diverse visual perspectives and enriched compositional descriptions, MLDA leverages optimal transport to achieve consistent matching of semantically salient information across modalities. Furthermore, composition activation scores are dynamically updated over the test stream to emphasize informative compositions and suppress irrelevant ones. Extensive experiments are conducted to verify the effectiveness of the proposed method. The code is available at https://github.com/keepgoingjkg/MLDA.
    Computer Vision: Transfer, low-shot, semi- and un- supervised learning · Machine Learning: Cost-sensitive learning · Machine Learning: Foundation models · Machine Learning: Learning sparse models · Machine Learning: Open-World/Open-Set/OOD Learning
  69. #856

    StructureBench: A Unified Benchmark Suite for Multi-Scenario Structured Generation Tasks with On-Device Models

    Xiaokun Xiong, Zhengjie Xu, Junyi Chen, Shihao Bai, Ruihao Gong, Xianglong Liu
    Structured output generation is increasingly critical for real-world AI systems, particularly in on-device settings where small language models (0.5B–8B parameters) must produce machine-executable outputs under strict latency and privacy constraints. Although constrained decoding provides formal guarantees of structural validity without retraining, its effectiveness across different tasks, models, and constraint formalisms remains insufficiently understood. We introduce StructureBench, a comprehensive benchmark for structured generation on edge devices, covering JSON and tool-call generation, code synthesis, mathematics, science, and domain-specific languages, and evaluating over 11 on-device language and vision–language models spanning 0.5B–8B parameters. StructureBench systematically compares prompt-based generation with three widely used constrained decoding frameworks—Outlines, XGrammar, and Guidance—and adopts decoupled metrics to separately assess structural validity and semantic correctness. Our experiments show that constrained decoding consistently enforces syntactic validity, but does not reliably improve semantic accuracy and may even degrade performance for smaller models or complex grammars. These findings reveal clear task- and model-dependent boundaries for effective constrained decoding and highlight the need for scenario-aware adaptation in on-device structured generation. We release StructureBench at https://github.com/Str-Ben/StructureBench.
    Constraint Satisfaction and Optimization: Constraint satisfaction · Natural Language Processing: Applications · Natural Language Processing: Information extraction · Natural Language Processing: Language generation · Natural Language Processing: Tools
  70. #862

    EyeCue: Driver Cognitive Distraction Detection via Gaze-Empowered Egocentric Video Understanding

    Lang Zhang, JinYi Yoon, Matthew Corbett, Abhijit Sarkar, Bo Ji
    Driver cognitive distraction is a major cause of road collisions and remains difficult to detect. Unlike manual or visual distraction, cognitive distraction occurs when the driver's attention is diverted by thoughts unrelated to driving, even when the driver appears visually attentive and exhibits no explicit physical movements. In this work, we propose EyeCue, a gaze-empowered egocentric video understanding framework, to detect driver cognitive distraction. A key insight is that cognitive distraction manifests in the interaction between eye gaze and visual context. To capture this interaction, EyeCue integrates eye gaze with egocentric video to enable context-aware modeling of the driver's attention over time. Furthermore, to tackle the limited scale and diversity of existing datasets, we introduce CogDrive, a comprehensive multi-scenario dataset that augments four existing driving datasets with cognitive distraction annotations. Through extensive evaluations on CogDrive, we show that EyeCue achieves the highest accuracy of 74.38%, outperforming 11 baselines from 6 model families by over 7%. Notably, EyeCue can achieve an accuracy of over 70% across various driving scenarios (different road types, times of day, and weather conditions) with strong generalizability. These results highlight the importance of modeling gaze-context interactions and the effectiveness of cross-modal interaction modeling for multimodal cognitive distraction detection. Our code and CogDrive dataset resources are available at https://github.com/langzhang2000/EyeCue.
    Humans and AI: Cognitive modeling · Machine Learning: Multi-modal learning · Multidisciplinary Topics and Applications: Transportation
  71. #871

    Bridging Inter-View and Client Heterogeneity: Federated Multi-View Clustering Under Non-IID Data

    Jiazhen Wang, Xinyue Chen, Shuaiyu Liu, Zican He, Yazhou Ren, Yi Wang, Shuyin Xia
    Federated multi-view clustering (FedMVC) has been widely used to discover latent structures in distributed multi-view data, but most methods assume independent and identically distributed (IID) data. In practice, non-IID distributions with partial and imbalanced categories cause clients to learn biased, local representations, leading to model bias and unstable federated training performance. To address these issues, we propose a FedMVC framework called Bridging Inter-View and Client Heterogeneity: Federated Multi-View Clustering under Non-IID Data (H²-FedMVC), which tackles both inter-view and client heterogeneity in mixed-view settings. We propose a leader-guided learning mechanism to address inter-view heterogeneity by enhancing complementary and discriminative feature learning, and an expert adaptive aggregation strategy to handle client heterogeneity by prioritizing well-matched experts in global updates. Theoretical analysis and experiments demonstrate superior performance in heterogeneous mixed-view FedMVC settings, with code provided in the supplementary materials.
    Machine Learning: Clustering · Machine Learning: Federated learning · Machine Learning: Multi-view learning
  72. #878

    LLMs as Parametric Knowledge Sources for Knowledge Graph Completion

    Deyu Chen, Qiyuan Li, Jinguang Gu, Meiyi Xie, Hong Zhu
    Knowledge graphs (KGs) serve as crucial symbolic knowledge sources for many downstream applications but often suffer from incompleteness. Recent efforts have attempted to integrate pre-trained language models and KGs for the knowledge graph completion (KGC) task. However, existing methods introduce substantial model complexity or rely on prompt-based solutions that inject priors at a shallow level, which do not exploit the factual knowledge inherently encoded in model parameters. To address these issues, we propose a novel framework that treats LLMs as parametric knowledge sources for KGC. Our framework performs LLM knowledge elicitation to extract factual knowledge from the model’s internal representations. Then, a cross-granularity representation alignment method transforms sentence-level representations into entity-level representations and aligns them within a unified space. Finally, a dynamic learning schedule balances alignment and expressiveness throughout training. Extensive experiments on multiple benchmark datasets show that our proposed method can be integrated with diverse KGC baselines and consistently improves link prediction in both standard and lifelong settings.
    Knowledge Representation and Reasoning: Learning and reasoning · Knowledge Representation and Reasoning: Semantic Web · Natural Language Processing: Applications
  73. #888

    Meta-Cognitive Resonance Label Correction for Instance-Dependent Noise

    Gaoxia Jiang, Jie Su, Senyu Hou, Jia Zhang, Wenjian Wang
    Training with instance-dependent label noise poses a critical challenge in deep learning, severely impeding model generalization. Existing methods often struggle to break the vicious cycle where corrupted labels degrade feature representations, which in turn impairs noise identification. To overcome this, we propose the Meta-Cognitive Resonance Label Correction (MecReC) framework. For feature representation, MecReC structurally decouples representation learning from label correction, leveraging a pre-trained model as a frozen anchor to ensure feature robustness. For label correction, we introduce a novel meta-cognitive resonance mechanism that aggregates multiple complementary signals to comprehensively assess label reliability. These collaborative signals are fed to a lightweight meta-weight network, which automatically learns whether and how to correct each label. Furthermore, our personalized confidence thresholding strategy guarantees a tighter error bound for label purification. Extensive experiments demonstrate that MecReC consistently outperforms SOTA methods, exhibiting superior robustness particularly under extreme noise conditions. Code and additional details are available at https://github.com/JieSu272/MecReC.
    Machine Learning: Classification · Machine Learning: Meta-learning · Machine Learning: Weakly supervised learning
  74. #907

    Progressive Subexpression Reuse in Symbolic Regression: Insights from RL-based Search and a Genetic Programming Realization

    Xiangdong Wu, Wenjun Wu, Bingrun Chen, Junle Wang, Zhaoxin Fan, Haoyi Zhou, Pengju Zhang, Rongye Shi
    Symbolic regression (SR) aims to recover compact and interpretable mathematical expressions from data.
    Genetic programming (GP) directly searches over symbolic structures, but its population dynamics can make it difficult to reliably preserve and accumulate useful subexpressions.
    In contrast, reinforcement learning (RL)-based SR has shown strong empirical performance, suggesting that learned sampling dynamics may capture useful regularities in symbolic search.
    Motivated by this contrast, we analyze expressions sampled during RL training and identify a recurring pattern, termed progressive subexpression reuse, where useful simple subexpressions emerge early, become increasingly frequent, and support the formation of more complex structures.
    Based on this observation, we propose Reinforcement Genetic Programming (RGP), a purely GP-based and non-RL framework that explicitly realizes a stage-wise retention–reintroduction loop through a dynamically maintained subexpression pool and pool-guided population initialization.
    Experiments on standard SR benchmarks show that RGP matches or outperforms strong baselines without RL policy training, suggesting that progressive subexpression reuse is an effective mechanism for SR search. We release our code at https://github.com/wuxiangdong586-max/RGP.
    Machine Learning: Regression · Machine Learning: Reinforcement learning · Search: Evolutionary computation
  75. #931

    EKFEdit: Extended Kalman Filter for Training-Free Flow-Based Image Editing

    Chengfu Zou, Yaochen Li, Yi Han, Wenlong Zhou, Jia Shu
    Despite recent advances in training-free text-guided image editing using pretrained rectified flow models, achieving a balance between text adherence and background preservation remains a challenge. Existing methods typically treat image editing as an open-loop trajectory generation task, adopting either an inversion-based or inversion-free paradigm. However, neither can perfectly achieve this goal. Specifically, inversion-based approaches accumulate inversion errors without feedback, whereas inversion-free methods exhibit unstable trajectories due to the lack of constraints. To address this, we propose EKFEdit, a novel framework that formulates image editing as a sequential state estimation problem in a nonlinear dynamic system, which models inversion-free editing trajectories as the system’s prior dynamics and leverages an Extended Kalman Filter to adaptively fuse observations from external editing algorithms. EKFEdit enables adaptive trajectory correction, effectively combining the complementary strengths of the different paradigms. We instantiate this framework into two concrete approaches (EKFEdit-RF and EKFEdit-DNA) via different observation methods, respectively. Extensive experiments demonstrate that our approaches achieve new state-of-the-art performance compared to previous methods.
    Code is available at https://github.com/baiyeweiguang/EKFEdit.
    Computer Vision: Image and video synthesis and generation · Machine Learning: Generative models
  76. #943

    TransAlpha: Lightweight Design Empowers Stock Return Forecasting

    Xiao Yang
    In intraday stock return forecasting, existing Transformer methods face high computational complexity and do not explicitly handle noisy predictors. We propose TransAlpha, a lightweight Transformer variant tailored for cross-sectional data to advance end-to-end forecasting methods with three novel components (Cross-Sectional Denoising for signal purification, Temporal Attention Gating for trend weighting, and Smart Pooling to avoid information loss) along with a multi-component hybrid loss function. We validate on full A-share market stock data and four key segments (CSI 300/500/1000/2000) with 400 proprietary alpha factors at 15-minute frequency. TransAlpha outperforms state-of-the-art baselines in predictive power and portfolio profitability, indicating that tailored Transformer models have significant practical value in quantitative trading.
    Machine Learning: Applications · Machine Learning: Deep learning architectures · Machine Learning: Supervised Learning · Multidisciplinary Topics and Applications: Finance
  77. #950

    Multi-view Regression Clustering via Low-Rank Manifold Decomposition

    Xiaowei Zhao, Xinyu Kou, Yan Chen, Linrui Xie, Qiang Zhang, Liang Du
    Similarity-graph-based multi-view clustering is effective in capturing non-Gaussian cluster structures, yet it suffers from an inherent conflict between scalability and geometry-consistent cluster assignment. In particular, cluster labels are typically inferred via secondary clustering rather than being learned directly from the underlying data geometry. Although anchor-based extensions alleviate the computational burden, they introduce the challenge of anchor-number selection.
    To address these limitations, we propose Multi-view Clustering via Manifold Decomposition (MvMD), which directly infers cluster labels from multi-view data without explicit similarity graph construction or anchor selection. MvMD reformulates multi-view clustering as a unified multi-view regression problem, where cluster labels are optimized as model variables. To improve robustness against missing small clusters, a global balance regularization is incorporated. Meanwhile, local geometric consistency and the low-rank structure of cluster assignments are jointly enforced through a symmetric matrix factorization scheme, which is efficiently realized via a truncated SVD-based low-rank approximation.
    Extensive experiments on benchmark datasets validate the effectiveness and efficiency of the proposed MvMD. The code is available at https://github.com/Vince-Doit/MvMD.
    Machine Learning: Clustering · Machine Learning: Multi-view learning · Machine Learning: Unsupervised learning
  78. #956

    FedENC: Federated Adaptive CLIP Model via Energy Alignment and Neural Collapse

    Yihang Wu, Ahmad Chaddad, Sarah A. Alkhodair
    Federated Learning (FL) faces two major challenges when deploying large vision-language models such as the contrastive language-image pretraining (CLIP) model, namely high computation/communication overhead and performance degradation due to data heterogeneity. To address these, we propose FedENC, a novel federated adaptive CLIP framework. FedENC keeps the CLIP encoder frozen and, to reduce computation and communication overhead, introduces a lightweight communicable global expert adaptation module for learning general knowledge across clients and two local experts for client-specific adaptation. To mitigate data heterogeneity, we introduce neural collapse (NC) alignment to encourage the model to learn unified and balanced image feature representations. To further improve performance, we propose a cosine and Kullback–Leibler (KL) divergence based alignment function to improve the prediction consistency between the global and local experts. Finally, we ensemble their outputs for robust inference. Extensive experiments on six datasets (4 natural + 2 medical) in challenging settings (e.g., domain shift) show that FedENC improves the test accuracy (e.g., by +5.37% on DomainNet compared to LoRA) with reasonable communication load (e.g., 15.2× lower than LoRA, with 3.3GB GPU memory usage). The code is available at https://github.com/AIPMLab/FedENC.
    Computer Vision: Multimodal learning · Computer Vision: Recognition (object detection, categorization) · Machine Learning: Classification · Machine Learning: Federated learning · Machine Learning: Foundation models
  79. #959

    Stability and Generalization for Decentralized Markov SGD

    Jiahuan Wang, Ziqing Wen, Ping Luo, Dongsheng Li, Tao Sun
    Stochastic gradient methods are central to large-scale learning, yet their generalization theory typically relies on independent sampling assumptions. In many practical applications, data are generated by Markov chains and learning is performed in a decentralized manner, which introduces significant analytical challenges. In this work, we investigate the stability and generalization of decentralized stochastic gradient descent (SGD) and stochastic gradient descent ascent (SGDA) under Markov chain sampling. Leveraging a stability-based framework, we characterize how Markovian dependence and decentralized communication jointly influence generalization behavior. Our analysis captures the effects of network topology, Markov chain mixing properties, and primal–dual dynamics. We establish non-asymptotic generalization bounds for both algorithms, extending existing results on Markov stochastic gradient methods to decentralized and minimax settings.
    Machine Learning: Learning theory · Machine Learning: Optimization
  80. #964

    Revealing Modular Gradient Noise Imbalance in LLMs: Calibrating Adam via Signal-to-Noise Ratio

    Ziqing Wen, Zhouyang Liu, Jiahuan Wang, Ping Luo, Li Shen, Dongsheng Li, Tao Sun
    The impressive performance of large language models (LLMs) arises from their massive scale and heterogeneous module composition. However, this structural heterogeneity poses significant optimization challenges. While adaptive optimizers such as Adam(W) provide per-parameter adaptivity, they do not explicitly account for module-level gradient heterogeneity, often resulting in slower convergence, suboptimal performance, or occasional training instability. Existing approaches typically rely on manually tuned module-specific learning rates or specific optimization strategies, which are computationally costly and difficult to generalize across tasks or models. To establish a more principled approach, we first analyze the noise-damping behavior of Adam in high-noise modules and introduce Module-wise Learning Rate Scaling via SNR (MoLS). MoLS estimates module-level SNRs to scale Adam updates, allowing automated module-wise learning rate allocation without manual tuning. Empirical results across multiple LLM training benchmarks demonstrate that MoLS improves convergence speed and generalization, achieving performance comparable to carefully tuned module-specific learning rates, while remaining compatible with memory-efficient training algorithms.
    Machine LearningAttention modelsMachine LearningDeep learning architecturesMachine LearningOptimizationMachine LearningRepresentation learning
  81. #975

    Hybrid-Adaptive Thread Tuning to Mitigate Simulation Execution Bottlenecks in High-Performance Reinforcement Learning Inference

    Jiming Su, Hantao Hua, Lujia Yin, Yiping Yao, Feng Zhu
    In simulation-in-the-loop decision-making systems, reinforcement learning (RL) inference is often constrained by simulator-side execution overhead, where workloads are highly dynamic and sensitive to runtime thread configurations. Existing multithreaded strategies struggle to match thread resources before or during execution, causing resource contention, scheduling overhead, and reduced throughput. Through empirical analysis, we identify the ratio of task execution time to scheduling time as the key factor determining the optimal thread count. Building on this insight, we propose AutoThread, a hybrid adaptive thread-tuning method for mitigating simulation bottlenecks in RL inference. AutoThread employs a Physics-Informed Neural Operator (PINO) as a thread-count predictor and incorporates a finite-source M/M/1 queueing model to constrain and guide prediction, enabling fast and accurate estimation under dynamic workloads. It further performs load-aware online fine-tuning to compensate for prediction errors and refine resource allocation. Experiments show that AutoThread improves average speedup by 18.4% over static strategies, achieves average throughput of 1.7× and 1.8× that of XGBoost and Reinforcer, respectively, and reduces execution time by up to 83.8% compared with state-of-the-art methods. Our code and dataset are publicly available at https://github.com/suchenjm/AutoThread.
    Agent-based and Multi-agent SystemsEngineering methods, platforms, languages and toolsMachine LearningKnowledge-aided learningMachine LearningMultiagent Reinforcement LearningMachine LearningReinforcement learningPlanning and SchedulingScheduling
  82. #978

    Active Arbitration: Decoupling Spatio-Temporal Duality for Efficient Traffic Forecasting

    JiaJun Yu, Fang Yuan, Guang-Yong Chen, Min Gan
    Spatio-temporal forecasting is inherently bottlenecked by a phenomenon we term Spatio-Temporal Duality: the paradoxical coexistence of time-varying physical lags and instantaneous semantic synchrony. Constrained by passive coupling, existing architectures often employ indiscriminate spatial aggregation that fails to dynamically arbitrate interaction intensity based on traffic states, forcing models to rely on redundant deep stacking to approximate complex dynamics. We address this issue by introducing Time Arbitrated Spatial GNN (TAS-GNN), an efficient framework that arbitrates spatial interactions through time. Our model leverages deep temporal semantics to dynamically and proactively manage the aggregation of spatial domains, effectively decoupling physical connectivity from semantic relevance. In addition, to keep the model simple, we introduce spectral decoupling via the discrete wavelet transform (DWT) to capture multiscale dependencies with minimal overhead. Experiments on three real-world datasets (PEMS04, 07, 08) show that TAS-GNN not only outperforms current baseline models in accuracy, but also significantly improves inference speed while reducing the number of parameters.
    Machine LearningAttention modelsMachine LearningGeometric learningMachine LearningSequence and graph learningMachine LearningTime series and data streams
  83. #981

    PTF-Net: Pseudo-Temporal Feature Fusion Network for Bi-Temporal Semantic Change Detection

    Xin Li, Xin Dong
    Bi-temporal Semantic Change Detection (SCD) is a fundamental analytical framework for capturing state transitions across various domains, with remote sensing being a key application area in this work. However, the prevailing paradigm of comparing two discrete temporal snapshots suffers from inherent limitations due to temporal discontinuity. These limitations lead to pronounced radiometric inconsistencies and pseudo-change noise in bi-temporal SCD. To address these limitations, we propose the Pseudo-Temporal Feature Fusion Network (PTF-Net), which reconceptualizes bi-temporal SCD as continuous pseudo-temporal evolution modeling. This novel paradigm facilitates the recovery of the missing temporal context between observation points. At its core lies an innovative Non-Linear Style Interpolation Mechanism, which projects discrete bi-temporal observations onto a smooth latent manifold to simulate plausible surface evolution dynamics. The mechanism synthesizes a sequence of semantically coherent intermediate representations, which inherently bridge the temporal gap and disentangle style variations from semantic changes. To fully exploit this synthesized sequence, we design a dedicated Temporal Dynamics Perception Branch. This branch employs efficient temporal interactions to robustly discriminate structural mutations from transient noise, effectively acting as a semantic filter. Extensive experiments on three public datasets reveal that PTF-Net consistently outperforms state-of-the-art methods, achieving comprehensive superiority across mIoU, Fscd, and SeK metrics.
    Computer VisionSegmentation, grouping and shape analysis
  84. #984

    LFS: Learnable Frame Selector for Event-Aware and Temporally Diverse Video Captioning

    Lianying Chao, Linfeng Yin, Peiyu Ren, Yifan Jiang, Qiaoyu Ren, Dingcheng Shan, Jingcheng Pang, Sijie Wu, Xubin Li, Kai Zhang
    Video captioning models convert frames into visual tokens and generate descriptions with large language models (LLMs). Since encoding all frames is prohibitively expensive, uniform sampling is the default choice, but it enforces equal temporal coverage while ignoring the uneven distribution of events. This motivates a Learnable Frame Selector (LFS) that selects temporally diverse and event-relevant frames. LFS explicitly models temporal importance to balance temporal diversity and event relevance, and employs a stratified strategy to ensure temporal coverage while avoiding clustering. Crucially, LFS leverages caption feedback from frozen video-LLMs to learn frame selection that directly optimizes downstream caption quality. Additionally, we identify a gap between existing benchmarks and human cognition. We therefore introduce ICH-CC, built from questions carefully designed by annotators to reflect a human-consistent understanding of video. Experiments indicate that LFS consistently improves detailed video captioning across two representative community benchmarks and ICH-CC, achieving up to 2.0% gains on VDC and over 4% gains on ICH-CC. Moreover, we observe that captions enhanced with LFS lead to improved performance on video question answering. Overall, LFS provides an effective and easy-to-integrate solution for detailed video captioning.
    Computer VisionVideo analysis and understanding
  85. #987

    MonoPure: Multi-Component Purification via Disentangled, Projective Representations for Monocular 3D Object Detection

    Yeon Woo Cho, Jung Woo Cheon, Seung-hyeok Back, Seok Bong Yoo
    Monocular 3D object detection is a cost-efficient alternative to multisensor systems, yet it remains fragile to multi-component adversarial attacks that perturb the image and tamper with camera calibration. Compounded distortions degrade 3D reasoning by disrupting the correspondence between the 3D geometry and 2D image plane. To address this problem, this work proposes MonoPure, a monocular 3D object detection framework that performs multi-component purification via disentangled and projective representations. MonoPure incorporates a disentangled purification and segmentation module that purifies the image data, with a target-region probability map steering diffusion-based purification to focus on task-relevant regions. In addition, MonoPure presents a 3D detection decoder that integrates 2D skeleton keypoints as object-level spatial cues, enabling occlusion-robust 3D detection. Finally, a projective calib-purification module restores compromised intrinsics by iteratively minimizing the reprojection error between projected 3D boxes and calibration-invariant 2D detection boxes. The experiments confirm that MonoPure outperforms prior detectors under multi-component attacks and occlusion.
    Computer Vision3D computer visionComputer VisionAdversarial learning, adversarial attack and defense methodsComputer VisionRecognition (object detection, categorization)Multidisciplinary Topics and ApplicationsReal-time systems
  86. #988

    Robust, Generalizable Proactive Face-swapping Defense via Semantic Gradient Divergence

    Seung-hyeok Back, Do Hyun Ki, Juwan Kim, Seok Bong Yoo
    The rapid progress of identity-feature-based face-swapping technology has raised concerns about impersonation and privacy violations. Although proactive defenses aim to block identity extraction at the source, existing methods suffer from perceptible visual artifacts, poor generalization across diverse deepfake models, and vulnerability to post-processing techniques (e.g., diffusion purification, image compression, and transformations). This work proposes a robust, generalizable proactive face-swapping defense via semantic gradient divergence (SGD-Guard) to address these challenges. It introduces an integrated feature gallery that uses CLIP features and a generalized identity feature, obtained by iteratively refining heterogeneous identity features into a homogeneous representation. This framework facilitates our semantic distortion attack by leveraging consensus weighting to target specific facial attributes within a CLIP-identity joint embedding space, disrupting deepfake generation while preserving visual fidelity. Furthermore, to ensure robustness against purification and post-processing, this method incorporates a module that prioritizes critical transformations by exploiting directional discrepancies. Comprehensive experiments demonstrate that the method effectively defends against diverse face-swapping models with high cross-model transferability.
    Computer VisionAdversarial learning, adversarial attack and defense methodsComputer VisionImage and video synthesis and generationComputer VisionTransparency, accountability, fairness and privacy
  87. #993

    Retrieval-Guided Completion Hashing with Token–Patch Alignment for Incomplete Cross-Modal Retrieval

    Zhixi Luo, Zhenqiu Shu
    Cross-modal hashing has been extensively adopted in cross-modal retrieval tasks owing to its high storage efficiency and fast retrieval capability. However, in practical applications, multimodal data often suffer from missing modalities, which cause semantic incompleteness and thus severely impair both cross-modal alignment and hashing-based retrieval performance. To address these issues, this paper proposes a novel incomplete cross-modal retrieval framework, called retrieval-guided completion hashing with token–patch alignment (RGCH-TPA). Specifically, it first constructs a dynamic memory bank based on the features of complete samples and uses available modalities to retrieve and locate similar regions of incomplete samples. It thus achieves intra-modal reconstruction of missing data, conditioned on the available modalities, using retrieved expert signals. Moreover, a token–patch level cross-modal alignment mechanism is introduced to model the interaction between text tokens and image patches, while global representations are jointly integrated to further enhance the discriminability and robustness of the learned cross-modal representations. Extensive evaluations on three benchmark datasets demonstrate that the proposed RGCH-TPA method consistently outperforms other state-of-the-art incomplete cross-modal hashing methods across missing-modality ratios. The source code is available at https://github.com/szq0816/RGCH-TPA.
    Computer VisionImage and video retrievalComputer VisionMultimodal learning
  88. #1034

    Label Confidence Recovery with High-order Label Correlation in Partial Multi-label Learning

    Yuanchao Dai, Ximing Li, Ming-Kun Xie, Xurui Li, Changchun Li
    Partial multi-label learning (PML) addresses weakly-supervised scenarios where each instance is associated with a candidate label set containing both ground-truth and noisy labels. Existing PML methods primarily focus on instance-level features or pairwise label correlations for disambiguation. Building on recent insights that exploiting pairwise label correlations improves disambiguation, we observe that high-order label correlations provide even stronger disambiguation evidence in partial settings because ground-truth labels with high-order correlations frequently co-occur, while noise labels produce inconsistent combinations that rarely repeat. To exploit this property, we propose REHC-PML (Label Confidence REcovery with High-order Label Correlations in Partial Multi-label Learning), which mines frequent high-order co-occurrences from candidate sets to identify global high-order label correlations, selects instance-relevant correlations via Gumbel-Softmax pruning, and propagates their evidence to constituent labels for confidence recovery. Self-training iteratively refines pseudo-labels and trains the classifier. Extensive experiments on both real-world and UCI datasets demonstrate the effectiveness of REHC-PML.
    Machine LearningWeakly supervised learning
  89. #1044

    HiCD: Hyperbolic Insight Through Decomposed Educational Graphs for Long-Tailed Cognitive Diagnosis

    Shengwei Ji, Wenli Wang, Yongqiang Xie, Fei Liu, Yonghui Yang
    Cognitive diagnosis (CD) aims to infer students' mastery of knowledge concepts from their response behaviors and constitutes a core component of intelligent education and personalized learning. However, existing graph-based CD models struggle to handle the pronounced long-tail distributions in educational data, where most students and concepts interact with only a limited number of exercises, resulting in suboptimal representation learning and poor generalization to low-frequency instances. To address this challenge, we propose HiCD (Hyperbolic insight for Cognitive Diagnosis), a novel hyperbolic model that embeds students, exercises, and concepts into non-Euclidean space. By exploiting the exponential representational capacity of hyperbolic geometry, HiCD naturally captures hierarchical and sparse structures, effectively alleviating long-tail bias while enhancing embedding expressiveness. A key contribution of HiCD is a hyperbolic diagnostic function that operates directly on the manifold, avoiding Euclidean approximations and preserving geometric consistency. Moreover, HiCD decomposes the educational graph into three semantically distinct subgraphs and assigns each a dedicated curvature, enabling adaptive geometric modeling of heterogeneous relations. Extensive experiments on multiple benchmark datasets demonstrate that HiCD consistently improves diagnostic accuracy and robustness, particularly under severe long-tail scenarios. The source code is available at: https://github.com/CyberXie/HiCD.
    Data MiningMining graphsHumans and AICognitive modelingHumans and AIComputer-aided educationHumans and AIPersonalization and user modeling
  90. #1054

    Online Self-Calibration Against Hallucination in Vision-Language Models

    Minghui Chen, Chenxu Yang, Hengjie Zhu, Dayan Wu, Qingyi Si, Zheng Lin
    Large Vision-Language Models (LVLMs) often suffer from hallucinations, generating descriptions that include visual details absent from the input image. Recent preference alignment methods typically rely on supervision distilled from stronger models such as GPT. However, this offline paradigm introduces a Supervision-Perception Mismatch: the student model is forced to align with fine-grained details beyond its perceptual capacity, learning to guess rather than to see. To obtain reliable self-supervision for online learning, we identify a Generative-Discriminative Gap within LVLMs, where models exhibit higher accuracy on discriminative verification than open-ended generation. Leveraging this capability, we propose Online Self-CAlibRation (OSCAR), a framework that integrates Monte Carlo Tree Search with a Dual-Granularity Reward Mechanism to construct preference data and iteratively refines the model via Direct Preference Optimization. Extensive experiments demonstrate that OSCAR achieves state-of-the-art performance on hallucination benchmarks while preserving general multimodal capabilities.
    Computer VisionMultimodal learningNatural Language ProcessingApplicationsNatural Language ProcessingLanguage models
  91. #1058

    Leveraging Over-Parameterization to Improve the Verifiability of Neural Networks

    Andrea Gimelli, Luca Oneto, Armando Tacchella
    Over-parameterized neural networks, i.e., models with excess capacity that can fit training data exactly, have demonstrated superior generalization performance compared to classical models with balanced capacity. Nevertheless, their deployment in safety-critical domains remains severely constrained by their susceptibility to, e.g., natural perturbations and adversarial manipulations. Verification techniques can solve such problems, but the computational cost of these methods often scales poorly, specifically when applied to large models. In this work, we demonstrate that over-parameterization can be exploited not merely to enhance generalization, but also to mitigate neuron instability, one of the parameters affecting the efficiency of verification. Our experimental findings suggest that over-parameterization may serve as a crucial mechanism for reconciling the long-standing trade-off between generalization and verifiability of neural networks.
    AI Ethics, Trust, FairnessSafety and robustnessMachine LearningDeep learning architectures
  92. #1061

    A Structural-Analysis-Based Information Fusion for Multi-Modal Cross-View Geo-Localization

    Xu Yan, Xiaoran Zhang, Ziwei Shi, Yu Zang, Weiquan Liu, Cheng Wang
    Cross-view geo-localization (CVGL) aims at localizing a ground-level query by retrieving its corresponding match from a database of geo-tagged satellite images. Existing multi-modal CVGL methods lack a structured design in the fusion stage, limiting their ability to fully exploit the information from multiple modalities. To overcome this limitation, we propose a Structural-Analysis-Based fusion principle that guides the design of network architecture. Following this principle, we present Decoupled Query Fusion (DQF), a novel fusion module that decouples feature interactions through role-specific learnable queries. These learnable queries aggregate features into distinct slots. Specifically, modality-specific queries capture unique information from each modality, while cross-modal queries extract redundant and synergistic information between different modalities. To better discriminate positive samples at varying geographic distances, we further propose Geo-aware Circle Loss, which adaptively weights supervision signal based on the geographical distance between samples and anchors. Extensive experiments on large-scale benchmarks demonstrate that our method significantly outperforms state-of-the-art approaches. Comprehensive ablation studies further validate the effectiveness of each component. Our code and models will be made publicly available.
    Computer Vision3D computer visionComputer VisionMultimodal learning
  93. #1065

    Spot-Adaptive Structural Rectification for Spatially Resolved Transcriptomics Data Clustering

    Huanjia Zhao, Shanghui Deng, Shunfan Li, Kun Sun, Weiqing Yan, Chang Tang
    Spatially resolved transcriptomics integrates gene expression with spatial coordinates to decode tissue microenvironments. Existing methods predominantly utilize graph structures to model relationships between spots. However, their performance is bottlenecked by the reliability of gene feature graph, facing the following hurdles: (1) ubiquitous housekeeping genes cause high-expression spots to densely connect with heterogeneous spots, leading to a skewed graph structure; (2) information reduction during Highly Variable Gene (HVG) selection results in the loss of intrinsic local structures. To address these challenges, we propose a Spot-Adaptive Structural Rectification method, called SASR. Specifically, SASR employs a hyperspherical expansion constraint that projects gene expression profiles onto a unit hypersphere to maximize angular distances, effectively separating spots falsely clustered by high total counts. Simultaneously, a topological consistency constraint repairs structural fractures caused by HVG selection via aligning latent embeddings with the local structures of the raw full-gene space. The complementary synergy balances angular discriminability with topological fidelity for accurate clustering. Experiments demonstrate that SASR effectively corrects structural biases and surpasses state-of-the-art methods in spatial clustering.
    Machine LearningFeature extraction, selection and dimensionality reductionComputer VisionRepresentation learningMachine LearningClustering
  94. #1068

    Three Minds, One Student: Online Multi-Teacher Knowledge Distillation for Multimodal Recommenders

    Hangtong Xu, Yuanbo Xu, En Wang
    Existing multimodal recommendation models integrate different modalities using complex fusion mechanisms (e.g., attention) or multi-stage processes (e.g., early or late fusion). However, attention-based adaptive fusion is prone to shortcut learning, where dominant collaborative signals (ID) can overshadow other modalities. This dominance affects the entire fusion process: early fusion often amplifies biases driven by identity, while late fusion struggles to extract preference-relevant signals from misaligned modalities, even with alignment regularization. To address these issues, we propose Multi-Teacher Single-Student Online Distillation for Multimodal Recommendation (MTS2-4MM), which reframes the multimodal recommendation task from direct fusion to controllable knowledge transfer. Specifically, we construct multiple teachers to specialize in complementary perspectives, and a unified student distills their guidance via objectives at both the ranking and representation levels. This design explicitly controls modality contributions and improves robustness to modality noise and misalignment. Furthermore, we design a Modality-specific Preference Extractor to explicitly extract user preferences across different modalities equally. Extensive experiments across five real-world datasets demonstrate that MTS2-4MM consistently outperforms state-of-the-art baselines, achieving improvements of up to 7.22%.
    Data MiningCollaborative filteringData MiningInformation retrievalData MiningRecommender systems
  95. #1080

    HardSecBench: Benchmarking the Security Awareness of LLMs for Hardware Code Generation

    Qirui Chen, Jingxian Shuai, Shuangwu Chen, Shenghao Ye, Zijian Wen, Xufei Su, Jie Jin, Jiangming Li, Jun Chen, Xiaobin Tan, Jian Yang
    Large language models (LLMs) are increasingly used for hardware and firmware code generation, but existing studies primarily evaluate functional correctness while largely overlooking security. However, LLM-generated code that appears functionally sound may embed security flaws that could cause catastrophic damage after deployment. This critical research gap motivates us to design a benchmark for assessing security awareness under realistic specifications. In this work, we introduce HardSecBench, a benchmark with 924 tasks spanning Verilog Register Transfer Level (RTL) and firmware-level C, covering 76 hardware-relevant Common Weakness Enumeration (CWE) entries. Each task includes a structured specification, a secure reference implementation, and executable tests. To automate artifact synthesis, we propose a multi-agent pipeline that decouples synthesis from verification and grounds evaluation in execution evidence, enabling reliable evaluation. We evaluate diverse LLMs and find that they often satisfy functional requirements while leaving security risks unaddressed. We also find that security results vary with prompting. These findings highlight pressing challenges and offer actionable insights for future advancements in LLM-assisted hardware design. Our data and code are available at https://github.com/chenqirui2002/HardSecBench.
    AI Ethics, Trust, FairnessSafety and robustnessMultidisciplinary Topics and ApplicationsAI hardware
  96. #1100

    RODIS: Robust Diffusion Solver to Dataset Quality in Combinatorial Optimization

    Hui Yuan, Zhigang Hua, Zihao Li, Qi Xu, Weilin Cong, Yan Xie, Taihui Li, Rong Jin, Shuang Yang, Bo Long
    Combinatorial optimization (CO) problems have widespread applications in science and engineering, but they present significant computational challenges. Recent advancements in generative models, particularly diffusion models, have shown promise in bypassing traditional optimization solvers by directly generating near-optimal solutions.
    However, existing diffusion solvers are highly sensitive to the quality of the training dataset, particularly the number of problem instances for training and the optimality of their labels. Notably, we observe an exponential scaling law between the performance of diffusion-based solvers and the number of near-optimally labeled instances needed. When training instances are scarce or sub-optimally labeled, diffusion-based solvers suffer significant performance degradation.
    To enhance the robustness of diffusion solvers to dataset quality, we propose RODIS, a robust diffusion solver for combinatorial optimization capable of learning from sub-optimally labeled instances. RODIS follows a two-stage generate-then-decode framework, integrating an objective-guided diffusion model, further reinforced by classifier-free guidance, to produce solutions that surpass the optimality of the training dataset.
    Experiments demonstrate the improved robustness of RODIS compared to the diffusion-based solver baseline on a range of combinatorial optimization benchmark tasks such as TSP (Traveling Salesman Problem) and MIS (Maximum Independent Set).
    Machine LearningGenerative modelsAIConstraint Satisfaction and OptimizationAIMachine Learning
  97. #1110

    Credit Fairness: Online Fairness in Shared Resource Pools

    Seyed Majid Zahedi, Rupert Freeman
    We study repeated allocation of shared resources among agents with time-varying demands and capped linear utilities. In this setting, independently maximizing the minimum utility in each round satisfies sharing incentives (agents weakly prefer participating in the mechanism to not participating), strategyproofness (agents have no incentive to misreport their demands), and Pareto efficiency. However, this max-min mechanism can lead to large disparities in the total resources received by agents, even when they have the same average demand. We introduce credit fairness, a property that, together with Pareto efficiency, strengthens sharing incentives by ensuring that agents who lend resources in early rounds are able to recoup them in later rounds. Credit fairness can be achieved in conjunction with either Pareto efficiency or strategyproofness individually, but we show that, under anonymity, it cannot be achieved together with both. We propose a mechanism that is credit fair and Pareto efficient, and evaluate it in a computational resource-sharing setting.
    Agent-based and Multi-agent SystemsResource allocationGame Theory and Economic ParadigmsFair divisionGame Theory and Economic ParadigmsNoncooperative games
  98. #1138

    Opinion Maximization in Social Networks: An Inverse Optimization Perspective

    Yilu Liu, Bo Xue, Yiming Yao, Qingfu Zhang
    Given a social network 𝒢 with n nodes and m edges, the opinion maximization (OM) problem aims at identifying k (k ≪ n) opinion leaders to maximize their opinion propagation in 𝒢. Despite its significance and prevalence, existing OM methods often struggle to strike a satisfactory balance between theoretical performance and practical efficiency. This paper provides an inverse optimization perspective for OM by reformulating it as an opinion minimization (OMin) problem, whose objective function has a succinct expression and desirable properties. Then, we introduce a bounded estimation scheme (BES) that can estimate the objective function of OMin with bounded error in Õ(kn+m) expected time, thereby making the optimization for OMin more efficient. To further enhance the optimization efficiency, a decomposition-based decremental estimation algorithm (DDEA) with a near (1-1/e) approximation ratio in Õ(kmn) expected time is specifically designed for OMin. Extensive experiments on real-world social networks of varying scales demonstrate the effectiveness of BES and the superiority of DDEA.
    Constraint Satisfaction and OptimizationConstraint optimization problemsSearchCombinatorial search and optimisation
  99. #1144

    Progressive Adversarial Multi-View Alignment for Unsupervised Embedded Feature Selection with Linear Complexity

    Shixuan Zhou, Yi Xiang, Haoxiang Qin, Haining Wang, Yukuan Ma, Han Huang
    Standard unsupervised multi-view feature selection (UMFS) methods for large datasets exhibit limitations in modeling the competition between cross-view alignment and intra-view diversity, resulting in suboptimal solutions and high computational costs. This challenge is exacerbated by the diverse signals inherent in complex samples, which tend to mask shared patterns, thus complicating the pursuit of an optimal trade-off. In this work, we propose a Progressive Adversarial Feature Selection (ProAd-FS) framework for large-scale multi-view learning, which formulates this static trade-off objective as a dynamic competitive process. Specifically, ProAd-FS develops an adversarial decoupled architecture that assigns these competing objectives to distinct inter-view and intra-view games, guided by a robust gated curriculum to prioritize the learning of underlying structures. The entire framework exhibits linear computational complexity in both sample size and feature dimension, and embeds structural sparse regularization for end-to-end optimization. Extensive experimental results show that ProAd-FS achieves state-of-the-art performance while remaining highly scalable, providing an effective solution for large-scale UMFS tasks.
    Machine LearningAdversarial machine learningMachine LearningFeature extraction, selection and dimensionality reductionMachine LearningMulti-view learning
  100. #1182

    Cooperative Multi-View Graph Learning via High-Rank Tensor Specificity

    Ling Ma, Qiyu Zhong, Songxuan Shi, Hao Wei, Shunjie Yang, Junbo Lian, Qiuru Hai, Lianjin Yu, Xiangning Zeng, Yi Shan, Zhen Yang, Gengyu Lyu
    Graph-based multi-view clustering, with its ability to mine potential associations between samples, has attracted extensive attention. To capture high-order correlations, tensor-based frameworks have been introduced to model multiple graphs jointly. Although these methods have achieved promising performance, they mainly focus on stacking consistency graphs with low-rank constraints while overlooking high-order specificity within diversity graphs, and the exploration of fine-grained diversity information among different samples across views remains insufficient. To address these issues, we propose High-Rank Tensor Specificity Induced Cooperative Multi-View Graph Learning (HTS-CMGL). Specifically, we first obtain the consistency graphs and diversity graphs from multi-view data, which are further reconstructed into the consistency tensor and the diversity tensor, respectively. Subsequently, a novel Enhanced Tensor Rank (ETR) is imposed on the consistency tensor; ETR is a tighter approximation of the tensor rank and is more robust to noise when exploring high-order consistency. Meanwhile, we design a new Tensor High-Rank Logarithmic Norm (THLN) on the diversity tensor. By leveraging a unique high-rank constraint mechanism, THLN not only actively preserves high-rank and informative features, but also simultaneously captures view-level and sample-level diversity information. Extensive experiments on multiple datasets demonstrate the effectiveness of our proposed method on clustering multi-view data.
    Machine LearningClusteringMachine LearningMulti-view learningMachine LearningUnsupervised learning
  101. #1185

    GDAs-OT: A Prediction Method of Gene-Disease Associations Based on Optimal Transport for Identifying Genes Related to Immune-Related Adverse Events

    Ruhao Liu, Suixue Wang, Hang Yu, Peng Li, Qingchen Zhang
    Immune Checkpoint Inhibitors (ICIs) represent a cornerstone of modern cancer immunotherapy. However, their clinical application is frequently accompanied by immune-related Adverse Events (irAEs) of diverse severity. Predicting Gene-Disease Associations (GDAs) is crucial for identifying the related genes that cause irAEs but remains experimentally expensive in the context of cancer immunotherapy. While existing graph neural network-based methods provide solutions to this problem, they often overlook disease-disease and disease-gene associations that may exist but have not yet been identified, leading to contaminated representations. To address these limitations, we propose Gene-Disease Associations based on Optimal Transport (GDAs-OT), a novel framework for GDA prediction. GDAs-OT constructs gene-gene, disease-disease and gene-disease relationships as graphs. Optimal transport is used to identify potential disease-disease associations to expand the disease-disease graph structure, and to select high-confidence negative pairs in the gene-disease graph. By mapping nodes into a refined embedding space, the processed gene-disease graph guides node representation learning, providing robust node representations. Comprehensive experiments demonstrate that GDAs-OT outperforms state-of-the-art methods across most evaluation metrics. In addition, GDAs-OT successfully identifies potential risk genes for irAEs, providing a computational foundation for understanding the mechanisms of immunotherapy toxicity. Code is available at https://github.com/RuhaoLiu/GDAs-OT.
    Multidisciplinary Topics and ApplicationsBioinformaticsMultidisciplinary Topics and ApplicationsHealth and medicineMultidisciplinary Topics and ApplicationsLife sciences
  102. #1196

    PCEvo: Path-Consistent Molecular Representation via Virtual Evolutionary

    Kun Li, Longtao Hu, Jiajun Yu, Yida Xiong, Hongzhi Zhang, Jiameng Chen, Xiantao Cai, Jia Wu, Wenbin Hu
    Molecular representation learning aims to learn vector embeddings that capture molecular structure and geometry, thereby enabling property prediction and downstream scientific applications. In many AI for science tasks, labeled data are expensive to obtain and therefore limited in availability. Under the few-shot setting, models trained with scarce supervision often learn brittle structure–property relationships, resulting in substantially higher prediction errors and reduced generalization to unseen molecules. To address this limitation, we propose PCEvo, a path-consistent representation method that learns from virtual paths through dynamic structural evolution. PCEvo enumerates multiple chemically feasible edit paths between retrieved similar molecular pairs under topological dependency constraints. It transforms the labels of the two molecules into stepwise supervision along each virtual evolutionary path. It introduces a path-consistency objective that enforces prediction invariance across alternative paths connecting the same two molecules. Comprehensive experiments on the QM9 and MoleculeNet datasets demonstrate that PCEvo substantially improves the few-shot generalization performance of baseline methods. The code is available at https://github.com/DrugD/PCEvo.
    Multidisciplinary Topics and ApplicationsBioinformatics
  103. #1210

    M-LoRA: Efficient Serving for Concurrent LoRA Adapters with Memory-Aware Speculative Scheduler on Single GPU

    Shaolong Li, Xiang Yang, Qi Qi, Haifeng Sun, Zirui Zhuang, Bo He, Wanyi Ning, Jingyu Wang
    Low-Rank Adaptation (LoRA) is a popular approach that enables large language models (LLMs) to quickly adapt to domain-specific tasks by adding lightweight trainable adapters. Existing multi-LoRA serving systems typically exploit parameter sharing to serve hundreds of LoRA models with a single base model. However, most systems rely on first-come-first-serve (FCFS) scheduling, which can incur severe queuing delay and lead to excessive adapter memory usage, squeezing KV cache space and reducing concurrency and throughput. To address these challenges, we propose M-LoRA, a memory-aware multi-LoRA serving system that reduces queuing delay and improves throughput through efficient request scheduling guided by fine-grained memory modeling. M-LoRA consists of three components: (1) Multi-LoRA Length Predictor, which estimates the output length and KV-cache demand of each request; (2) Memory-aware Speculative Scheduler, which dispatches requests based on predicted KV-cache and adapter memory requirements to maximize throughput under a fixed GPU memory budget; and (3) Recovery Mechanism, which corrects mispredictions while preserving generation quality. Compared to state-of-the-art LoRA serving systems, M-LoRA reduces average per-token latency by 87% and improves throughput by up to 1.7x.
    Machine LearningDeep learning architecturesPlanning and SchedulingScheduling
  104. #1213

    MP2D: Constrained Monte Carlo Tree-Guided Diffusion for Multi-Objective Protein Sequence Design

    Zitai Kong, Yifan Dong, Yixuan Wu, Zhaokang Liang, Jian Wu, Hongxia Xu
    Designing functional protein sequences that satisfy multiple desired properties is a core research focus of protein engineering. Prior methods are either unable to handle numerous, often conflicting, properties or do so inefficiently. We propose Multi-Property Protein Diffusion (MP2D), a unified framework for multi-objective protein sequence optimization that integrates conditional discrete diffusion with constrained MCTS and global iterative refinement. MP2D formulates diffusion denoising as a constrained sequential decision-making process and employs MCTS to explore diverse denoising trajectories guided by Pareto-based rewards. A global iterative refinement strategy further enables repeated remasking and re-optimization of candidate sequences, while a dynamic Pareto constraint prevents candidate bloat and maintains balanced trade-offs across objectives. We evaluate MP2D on two challenging multi-objective protein design tasks: antimicrobial peptide and protein binder optimization, involving four to five conflicting properties. Experimental results demonstrate that MP2D consistently outperforms existing multi-objective baselines, achieving robust and balanced improvements across all objectives without retraining generative models. These results highlight MP2D as a practical and scalable solution for multi-objective functional protein design.
    Multidisciplinary Topics and ApplicationsBioinformaticsMultidisciplinary Topics and ApplicationsLife sciences
  105. #1227

    QFlash: Bridging Quantization and Memory Efficiency in Vision Transformer Attention

    Sehyeon Oh, Yongin Kwon, Jemin Lee
    FlashAttention improves efficiency through tiling, but its online softmax still relies on floating-point arithmetic for numerical stability, making full quantization difficult.
    We identify three main obstacles to integer-only FlashAttention: (1) scale explosion during tile-wise accumulation, (2) inefficient shift-based exponential operations on GPUs, and (3) quantization granularity constraints requiring uniform scales for integer comparison.
    To address these challenges, we propose QFlash, an end-to-end integer FlashAttention design that performs softmax entirely in the integer domain and runs as a single Triton kernel.
    On seven attention workloads from ViT, DeiT, and Swin models, QFlash achieves up to 6.73x speedup over I-ViT and up to 8.69x speedup on Swin, while reducing energy consumption by 18.8% compared to FP16 FlashAttention, without sacrificing Top-1 accuracy on ViT/DeiT and remaining competitive on Swin under per-tensor quantization.
    Our code is publicly available at https://github.com/EfficientCompLab/qflash.
    Computer VisionEfficiency and Optimization
  106. #1238

    TextGaze: Prompting Gaze Target Estimation with Textual Scene Cues

    Junhui She, Fei Wang, Kun Li, Yiqi Nie, Yuxin Liu, Zhangling Duan, Xun Yang
    Gaze target estimation aims to infer the position of a person's gaze within a scene. Within mainstream design logic, multi-branch methods require extra supervision and annotations, while streamlined designs prioritize low-level visual saliency over true gaze intent. The former leads to a high annotation burden and hinders domain transfer, whereas the latter causes misalignment between predicted attention and actual gaze targets. To address this issue, we propose TextGaze, a unified cross-modal architecture that leverages a Large Vision-Language Model (LVLM) as scalable semantic guidance to balance the two design paradigms. The model extracts visual features from a frozen encoder and utilizes an LVLM to obtain gaze-aligned textual cues. We design a transformer-based fusion module with hierarchical text supervision to preserve task semantics. Lightweight decoding heads enable the joint prediction of gaze heatmaps and in-/out-of-frame status. We evaluate our method on four mainstream datasets, and the results show competitive performance across key metrics with robust cross-dataset generalisation without extra fine-tuning. Overall, we provide a streamlined alternative to traditional designs and highlight the potential of LVLMs as accessible auxiliary guidance for gaze estimation. All contents are available at: https://github.com/idremo/TextGaze-IJCAI2026.
    Computer VisionScene analysis and understandingHumans and AIHuman-computer interactionMachine LearningMulti-modal learning
  107. #1246

    WBMCF: Robust Spatiotemporal Forecasting of Terrorism Fatalities with Multi-Scale Time–Frequency Fusion

    Zhenkai Qin, Baozhong Wei, Huan Zeng, Caifeng Gao, Ziqian Lin
    Reliable forecasting of cross-regional terrorism fatalities is essential for early warning and security planning. However, real-world terrorism data are characterized by extreme sparsity, abrupt volatility, and weak, rapidly shifting non-Euclidean spatial relationships, posing substantial challenges for existing spatiotemporal forecasting models in learning stable multi-scale dependencies. To address these challenges, we propose a robust spatiotemporal forecasting framework, termed WBMCF. In the temporal dimension, a single-level Discrete Wavelet Transform (DWT) is introduced to stabilize temporal structures, and a Bidirectional Gated Mamba module is employed to capture stage-dependent long-range temporal dependencies; additionally, in the spatial dimension, a topology-agnostic Convolutional Feed-Forward Network (ConvFFN) is incorporated to model cross-regional dependencies via implicit feature coupling, thereby eliminating reliance on fixed spatial topologies. Extensive experiments on five regional subsets of the Global Terrorism Database (GTD) demonstrate that WBMCF achieves stable and competitive forecasting performance under sparse and noisy conditions. The source code is available at https://github.com/weibaozhong/WBMCF.
    Data MiningMining spatial and/or temporal dataMachine LearningDeep learning architecturesMachine LearningTime series and data streams
  108. #1253

    Mutually Reinforced Fair Multi-View Clustering

    Jiaxin Wang, Xiaoxu Tan, Jieren Cheng, Junwei Zheng, Wenxuan Tu
    Fair multi-view clustering aims to exploit complementary information across views to improve clustering quality, while ensuring unbiased outcomes with respect to sensitive groups. Existing methods typically separate multi-view clustering from fairness optimization into distinct stages. However, such a decoupled design provides limited synergy between the two objectives, which often makes progress on one objective detrimental to the other. To address this challenge, we propose Mutually Reinforced Fair Multi-View Clustering (MR-FMVC), which optimizes fairness alignment and clustering iteratively to enable coordinated optimization. Specifically, we first disentangle view representations into common and private components to mitigate fairness-related discrepancies across views. Subsequently, the optimization proceeds by interleaving cross-group matching in the common space with updating cluster centroids in the joint representation space, ultimately assigning matched samples to consistent clusters. Extensive experiments on four fairness datasets demonstrate that MR-FMVC achieves a superior trade-off between clustering performance and fairness.
    Machine LearningClusteringMachine LearningMulti-view learning
  109. #1285

    Communication-Efficient Federated Video Human Activity Recognition via Discriminative Subspace Discovery

    Allassan Tchangmena A Nken, Susan McKeever, Peter Corcoran, Ihsan Ullah
    Federated learning (FL) enables privacy-preserving video human activity recognition without centralizing data. Despite its potential, FL remains fundamentally constrained by the high communication overhead incurred during training.
    Existing methods, such as model quantization, gradient sparsification and low-rank approximation, aim to mitigate this overhead but often at the cost of model expressivity, leading to reduced classification accuracy.
    In this work, we take a representation-centric perspective and show that for video-based activity recognition using pretrained embeddings, class-discriminative information lies in a very low-dimensional subspace that preserves linear separability between activity classes. Motivated by this observation, we propose Federated discriminative subspace discovery, a framework that collaboratively identifies and operates within such a subspace.
    Instead of communicating full classifier updates, each client first projects its local video embeddings using a shared random projection matrix. It then computes the covariance of the projected embeddings and transmits this statistic to the server for aggregation. The server reconstructs a global covariance matrix, performs singular value decomposition, and selects the top-k eigenvectors that define the global class-discriminative subspace. Subsequent communication occurs only in this compact subspace. Extensive experiments on UCF101, HMDB51, and Toyota-SmartHome demonstrate that learning and sharing the classifier in the discovered subspace reduces communication costs by up to 61%, with a moderate drop in activity recognition accuracy compared to the full model.
    Computer Vision3D computer visionComputer VisionAction and behavior recognitionComputer VisionEfficiency and OptimizationMachine LearningFederated learning
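
    The pipeline described in the abstract of #1285 (shared random projection, per-client statistic, server-side aggregation, top-k eigenvectors) can be illustrated with a minimal sketch. This is not the authors' implementation. All function names, the dimensions, and the synthetic data are hypothetical. Two simplifying assumptions are made: the transmitted statistic is the uncentered second-moment matrix (so that a plain sum across clients is exact), and the eigenvectors are found by power iteration with deflation rather than a library SVD.

```python
import random

def project(rows, proj):
    """Project each D-dim row to d dims with the shared matrix proj (D x d)."""
    d = len(proj[0])
    return [[sum(r[i] * proj[i][j] for i in range(len(r))) for j in range(d)]
            for r in rows]

def second_moment(rows):
    """Uncentered d x d second-moment matrix: sum of outer products of the rows."""
    d = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]

def aggregate(stats):
    """Server side: element-wise sum of the matrices sent by the clients."""
    d = len(stats[0])
    return [[sum(s[i][j] for s in stats) for j in range(d)] for i in range(d)]

def top_k_eigvecs(sym, k, iters=500):
    """Top-k eigenvectors of a symmetric PSD matrix (power iteration + deflation)."""
    d = len(sym)
    a = [row[:] for row in sym]
    rng = random.Random(0)
    vecs = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue estimate for the unit vector v.
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(d)) for i in range(d))
        vecs.append(v)
        # Deflate: remove the component just found before searching for the next.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return vecs

# Hypothetical setup: 3 clients, 20 embeddings each, D=8 projected to d=4, k=2.
rng = random.Random(0)
D, d, k = 8, 4, 2
proj = [[rng.gauss(0, 1) / d ** 0.5 for _ in range(d)] for _ in range(D)]
clients = [[[rng.gauss(0, 1) for _ in range(D)] for _ in range(20)] for _ in range(3)]

client_stats = [second_moment(project(c, proj)) for c in clients]  # sent to server
global_stat = aggregate(client_stats)                              # server aggregate
basis = top_k_eigvecs(global_stat, k)                              # shared subspace
```

    In this sketch each client uploads a d x d matrix once, independent of how many embeddings it holds, which is the kind of saving the abstract attributes to communicating only within a compact subspace.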
  110. #1291

    Abstracting the Indistinguishable in ASP

    Zeynep G. Saribatur, Markus Hecher, Johannes K. Fichte
    Answer Set Programming (ASP) is a popular knowledge representation and reasoning framework with countless applications for modeling and solving combinatorial problems. With increasingly large applications, identifying crucial details and collapsing irrelevant or indistinguishable information becomes crucial for a human in the loop. This idea led to the investigation of different notions of abstraction, which relate to forgetting and projection. Very recently, clustering formalisms have been designed that also preserve program dependencies. However, crucial computational properties, such as finding abstractions, remained entirely unexplored to date. In this work, we examine the computational complexity of existing abstraction techniques based on clustering (faithful and uniform abstractions). Besides checking, we tackle the crucial question of constructing abstractions, by also proposing a novel syntactic operator to achieve uniform abstractions, when possible. Moreover, we investigate whether symmetry detection can be used to determine abstractable parts in a program, and give insights on the difference of interchangeability and indistinguishability, by introducing a relaxation of the uniform abstraction condition to capture semantic indistinguishability. We observe that this notion enables us to obtain intuitive faithful abstractions, while symmetry does not. We explore properties needed to reach abstractions from syntactic symmetry.
    Knowledge Representation and ReasoningComputational complexity of reasoningKnowledge Representation and ReasoningLogic programmingKnowledge Representation and ReasoningNon-monotonic reasoning
  111. #1304

    Learning Well-Structured Logits: Leveraging Vision–Language Complementarity for Open-World Test-Time Adaptation

    Jia-Qi Lin, Yinghua Yao, Chang-Dong Wang, Yuangang Pan
    Open-world test-time adaptation (OWTTA) is increasingly studied for its ability to adapt models at inference time in the presence of both domain discrepancy and semantic variance. Existing methods typically rely on either discriminative models or vision-language models (VLMs) alone, leaving their complementarity in OWTTA underexplored. In this paper, we propose a general framework that explicitly leverages the complementary strengths of discriminative models and VLMs to enable robust open-world test-time adaptation. We first define well-structured logits for OWTTA and provide a theoretical analysis that reveals the complementary properties of logits from discriminative models and VLMs. Building on a unified formulation, we then decompose OWTTA into two coupled modules: confidence-calibrated filtering (CCF), which provides an estimate of in-distribution membership, and semantic complementarity adaptation (SCA), which gives the refined predictions through complementary logit fusion. Extensive experiments across multiple benchmarks empirically confirm the logit complementarity between discriminative models and VLMs, and show that our method effectively leverages it to consistently improve performance.
    Computer VisionMachine learning for visionComputer VisionRepresentation learningComputer VisionTransfer, low-shot, semi- and un- supervised learning
  112. #1326

    Gradient Enhancement Task Aware Post-training Quantization

    Yihua Shao, Yan Gu, Minxi Yan, Siyu Chen, Haiyang Liu, Ziyang Yan, Yongjia Li, Yan Wang, Qun Song, Hao Tang, Haotong Qin, Jingcai Guo, Nicu Sebe
    The huge parameters and extensive training corpora of Large Language Models (LLMs) empower them to tackle complex tasks. Yet, their massive size incurs substantial inference costs, hindering deployment on resource‑constrained edge devices. Low‑bit quantization provides a way to compress models and improve their practicality for edge-side deployment. Nonetheless, many edge applications do not require the full generalization abilities of LLMs and instead only depend on their knowledge in specific domains. This paper introduces Gradient Enhancement Task Aware Post-training Quantization, i.e., GTAQ, to address the generalization issue. Concretely, GTAQ locates task‑critical weights via gradient‑based validation and retrieval, and then amplifies these salient weights. For that purpose, GTAQ preserves and strengthens weights related to the target tasks, enabling task‑aware uniform‑bit quantization. We extensively evaluate the LLaMA family of language models on WikiText, C4, and MMLU. Our experiments show that GTAQ consistently delivers notable performance improvements across tasks, surpassing both general‑purpose and task‑aware quantization baselines. At the same time, GTAQ yields more than 3.5× speedup in model inference and substantially cuts storage requirements. Code is available at https://github.com/YihuaJerry/GTAQ.git.
    Computer VisionTransfer, low-shot, semi- and un- supervised learningMachine LearningCost-sensitive learningMachine LearningFoundation modelsMachine LearningLearning sparse modelsMachine LearningOpen-World/Open-Set/OOD Learning
  113. #1335

    In-Context Reinforcement Learning via Communicative World Models

    Fernando Martinez, Tao Li, Yingdong Lu, Juntao Chen
    Reinforcement learning (RL) agents often struggle to generalize to new tasks and contexts without updating their parameters, mainly because their learned representations and policies are overfit to the specifics of their training environments. To boost agents' in-context RL (ICRL) ability, this work formulates ICRL as a two-agent emergent communication problem and introduces CORAL (Communicative Representation for Adaptive RL), a framework that learns a transferable communicative context by functionally separating latent representation learning from control. In CORAL, an Information Agent (IA) is pre-trained as a world model on a diverse distribution of tasks. Its objective is not direct return maximization, but world modeling and distilling its understanding into concise messages. The emergent communication protocol is shaped by a novel Causal Influence Loss, which measures the effect that the message has on the next action. During deployment, the previously trained IA serves as a fixed contextualizer for a new Control Agent (CA), which learns to solve tasks by interpreting the provided communicative context. Our experiments demonstrate that this approach enables the CA to achieve significant gains in sample efficiency and successfully perform zero-shot adaptation with the help of pre-trained IA in diverse online and offline environments, validating the efficacy of learning a transferable communicative representation.
    Agent-based and Multi-agent SystemsAgent communicationAgent-based and Multi-agent SystemsMulti-agent learningMachine LearningMultiagent Reinforcement LearningMachine LearningOffline reinforcement learningMachine LearningReinforcement learning
  114. #1345

    Model-Based Reinforcement Learning in Discrete-Action Non-Markovian Reward Decision Processes

    Alessandro Trapasso, Luca Iocchi, Fabio Patrizi
    Many practical decision-making problems involve tasks whose success depends on the entire system history, rather than on achieving a state with desired properties. Markovian Reinforcement Learning (RL) can be inadequate for such tasks, while modeling them as non-Markovian reward decision processes (NMRDPs) enables agents to handle temporal dependencies.
    Existing approaches offer limited formal guarantees on both (near-)optimality and sample efficiency. We address both issues with QR-max, a novel model-based algorithm for NMRDPs with discrete actions that factorizes Markovian transition learning from non-Markovian reward handling via reward machines. To our knowledge, this is the first model-based RL algorithm for discrete-action NMRDPs that leverages this factorization to obtain PAC convergence to ε-optimal policies with polynomial sample complexity.
    We then extend QR-max to continuous state spaces with Bucket-QR-max, a SimHash-based discretizer that preserves the same factorized structure and achieves fast and stable learning without manual gridding or function approximation. We experimentally compare our method with state-of-the-art model-based RL approaches on environments of increasing complexity, showing substantially improved sample efficiency and greater robustness in finding optimal policies.
    Machine LearningLearning theoryMachine LearningModel-based and model learning reinforcement learningMachine LearningReinforcement learning
  115. #1359

    Fast and Generalizable AI-Generated Image Detection via Model-Agnostic Feature Reconstruction

    Qinghui He, Haifeng Zhang, Bo Liu, Yang Wei
    Generative models continue to advance rapidly in both fidelity and diversity, posing increasing challenges for reliably distinguishing generated images from real ones. Existing reconstruction-based detection methods often rely on assumptions tied to specific generative models, which leads to limited cross-model generalization and substantial computational overhead. In this paper, we propose General Feature Reconstruction Error (GFRE), a fast and generalizable detection paradigm that leverages reconstruction behavior in a general-purpose representation space. Our key insight is that real and generated images exhibit consistently different reconstruction stability when projected into universal visual representations. Instead of tracing generator-specific artifacts, GFRE employs a lightweight autoencoder to model the reconstructability of image representations, producing a reconstruction signal that is inherently generator-agnostic and transferable across diverse generative processes. Extensive experiments on images synthesized by 18 different generative models demonstrate that GFRE consistently outperforms existing state-of-the-art methods, achieving a 4.70% improvement in detection accuracy and an 8.20% gain in cross-model generalization. Moreover, by avoiding costly generator inversion and diffusion-based reconstruction, GFRE reduces reconstruction error extraction time by up to 150x, enabling efficient and scalable deployment.
    Computer VisionRecognition (object detection, categorization)
  116. #1361

    From Neural Collapse to Label-Limited Evolving Streams: Geometry-Constrained Learning Under Dynamic Class Imbalance

    Hongliang Wang, Hongyuan Liu, Qirui Hao, Yun Sing Koh, Qinli Yang, Junming Shao
    Learning from label-limited streams presents significant challenges, particularly when coupled with concept drift and dynamic class imbalance. Existing works often struggle to maintain a discriminative feature space under these constraints, biasing decision boundaries toward majority classes or outdated concepts. To address this, we propose a novel framework named Neural collapse Inspired Label-limited Evolving stream learning (NILE). Instead of utilizing learnable classifiers, NILE exploits Neural Collapse (NC) geometry to explicitly construct a Simplex Equiangular Tight Frame (ETF) as a fixed classifier, ensuring maximal inter-class separability to guide feature discriminability. To maintain this discrimination with limited labels, NILE employs hybrid active learning to prioritize uncertain and minority samples, and an NC-adapted semi-supervised mechanism to enhance representation learning. NILE further updates the classifier by dynamically adjusting the ETF structure, enabling continual adaptation to drifting concepts. Extensive experiments show that NILE can effectively guide features toward the NC state, significantly outperforming state-of-the-art baselines in nonstationary environments.
    Data MiningMining data streamsData MiningClass imbalance and unequal costMachine LearningSemi-supervised learningMachine LearningActive learning
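
    The fixed Simplex Equiangular Tight Frame (ETF) classifier mentioned in the abstract of #1361 has a standard closed form, shown in the minimal sketch below. This is the textbook construction, not the NILE code. The function names are hypothetical, and for simplicity the feature dimension is assumed equal to the number of classes (the general form applies an additional orthogonal map into a larger feature space).

```python
import math

def simplex_etf(num_classes):
    """Return num_classes unit vectors forming a simplex ETF in R^num_classes.

    Entry (i, j) is sqrt(K / (K - 1)) * (delta_ij - 1 / K); the matrix is
    symmetric, so its rows and columns are the same class vectors.
    """
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    return [[scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
            for i in range(k)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def classify(feature, etf):
    """Fixed, non-learnable classifier: index of the ETF vector with the top score."""
    scores = [dot(feature, w) for w in etf]
    return max(range(len(scores)), key=scores.__getitem__)

etf = simplex_etf(4)
```

    Every pair of distinct class vectors has inner product -1/(K-1), the most negative value that K unit vectors can share simultaneously, which is why the abstract describes the ETF as giving maximal inter-class separability.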
  117. #1364

    Generating High-Diversity Synthetic Tabular Data via Less-Constrained Prior

    Sanghun Park, Jaesung Lim, Jong-June Jeon, Seunghwan An
    Generating high-quality synthetic tabular data, regarding fidelity, diversity, and utility, is crucial for many practical purposes. Recent tabular data synthesis methods, including two-stage generative modeling approaches, have achieved this with nearly perfect fidelity. However, we observe that there remains room for improving diversity, and we show that this limitation arises from an additional constraint imposed on the latent support.
    Our main contribution is that we effectively eliminate this redundant constraint by directly deriving the objective function from the KL-divergence between the ground-truth density and the generative model used for synthetic sample generation. We empirically demonstrate that our model, which relies on a single prior distribution, significantly improves the quality of the synthetic data, especially in terms of diversity. Our implementation code is available at https://anonymous.4open.science/r/SPT-1EE2/.
    Machine LearningGenerative modelsMachine LearningUnsupervised learning
  118. #1367

    Continual Unsupervised Domain Adaptation for Cardiac Image Segmentation with Style-Adapting Generative Replay and Prototype Consolidation

    Shun Xiang, Yan Yi, Haiyong Chen, Yuanquan Wang, Yining Wang
    Continual unsupervised domain adaptation (UDA) for multi-domain cardiac image segmentation is crucial for clinical deployment under strict privacy constraints, where data from different centers cannot be stored or revisited. However, existing methods still suffer from catastrophic forgetting and training instability caused by noisy pseudo-labels and representation drift. In this study, we propose a novel continual UDA framework that enforces dual-level alignment at the input and feature levels. At the input level, the Domain-Style Adapting Generative Replay (DSA-GR) module employs trajectory-consistent diffusion distillation and style adaptation to synthesize high-fidelity, domain-consistent replay images without storing raw data. At the feature level, the Cross-domain Prototype-guided Feature Consolidation (CPFC) mechanism mitigates representation drift by maintaining a cross-domain prototype memory bank that aggregates features from the current and replay streams. These prototypes act as robust anchors for prototype-guided contrastive learning, rectifying the feature space and suppressing pseudo-label drift. Extensive experiments on two public datasets demonstrate that our framework consistently outperforms state-of-the-art methods. Code will be released at https://github.com/hlyf-xs/CUDA_Seg.
    Computer VisionBiomedical image analysisComputer VisionSegmentation, grouping and shape analysisComputer VisionTransfer, low-shot, semi- and un- supervised learning
  119. #1375

    Sharp-Wave Ripples Learning: A Bio-Inspired Incremental Learning Method

    Yi Sun, Xiaochang Hu, Wenzhuo Zhang, Zhiwei Wang, Jiting Li, Jian Li, Xu Xin, Jia Jun
    The primary challenge in incremental learning for deep neural networks (DNNs) lies in balancing memory stability with learning plasticity as they incrementally acquire new knowledge. Existing incremental learning methods typically require storing portions of old datasets or relying on specialized network architectures to achieve learning without forgetting. Recent neuroscience findings reveal that hippocampal sharp-wave ripples play critical roles in consolidating neocortex memory by internally triggering the reactivation of spontaneous neural circuits without external stimulus. Inspired by this biological mechanism, we propose Sharp-Wave Ripples Learning (SWRL), a novel incremental learning framework composed of two functionally complementary modules: a neocortex-like network for long-term memory and a hippocampus-like model for rapid new-knowledge encoding. SWRL continuously integrates newly acquired knowledge from the hippocampus-like module into the neocortex-like network via a meta-weighted memory integration mechanism guided by internally generated SWR-like signals. Our framework is task-agnostic, architecture-agnostic, and exemplar-free. Experimental results on multiple public benchmarks demonstrate that SWRL achieves superior incremental learning performance under various scenarios across diverse DNN architectures, requiring no access to previous data or modifications to the base architecture. The code and appendix are both released at https://github.com/SYVAE/SWRL.
    Machine LearningClassificationMachine LearningIncremental learningMachine LearningMulti-task and transfer learningMachine LearningSupervised Learning
  120. #1379

    Manifold-Constrained Adversarial Training for Long-Tailed Robustness via Geometric Alignment

    Guanmeng Xian, Ning Yang, Philip S. Yu
    Adversarial training is effective on balanced datasets, but its robustness degrades under long-tailed class distributions, where tail classes suffer high robust error and unstable decision boundaries. We propose Manifold-Constrained Adversarial Training (MCAT), a unified framework that enforces the semantic validity of adversarial examples by penalizing deviations from class-conditional manifolds in feature space, while promoting balanced geometric separation across classes via an ETF-inspired regularization. We provide theoretical results that link geometric separation to lower bounds on adversarially robust margins, and show that manifold-constrained adversarial risk upper-bounds robust risk on high-density semantic regions. Extensive experiments on standard long-tailed benchmarks demonstrate consistent improvements in overall, balanced, and tail-class adversarial robustness. The code and appendix are available at https://github.com/yneversky/MCAT.
    Machine LearningAdversarial machine learningMachine LearningTrustworthy machine learning
  121. #1389

    When and How to Adapt: Subject Shifts Detection and Prototype-Guided Correction for Online EEG Decoding

    Shaoqi Zhang, Xiyuan Jin, Xiaojun Ning, Yilin Chen, Jing Wang
    Online EEG decoding is pivotal for real-world Brain-Computer Interfaces (BCIs) but confronts significant challenges arising from continuous distribution shifts, including both inter- and intra-subject variations. Existing Unsupervised Continual Domain Adaptation (UCDA) methods typically rely on rigid hard boundaries such as subject switch labels or fixed batch sizes to trigger adaptation. Meanwhile, some online EEG decoding methods that utilize detected shifts as soft boundaries are ill-suited for unsupervised scenarios and lack effective distribution alignment strategies. To address these issues, we propose a novel Prototype-driven online EEG Decoding framework (PRED). PRED incorporates a Subject Shift Detector (SSD) based on policy stability to reliably identify latent domain shifts without supervision, thereby constructing adaptive soft boundaries. Furthermore, we design a Prototype-guided Shift Correction (PSC) mechanism that leverages a multi-granular memory structure to guide distribution alignment while preserving semantic stability. Experiments on three public datasets confirm that PRED achieves superior plasticity and stability, and demonstrates significant futurity, i.e., the capability of forward transfer to a subject's future samples.
    Humans and AIApplicationsHumans and AIBrain sciencesHumans and AIHuman-computer interaction
  122. #1404

    A Foundation Model for Zero-Shot Logical Rule Induction

    Yin Jun Phua
    Inductive Logic Programming (ILP) learns interpretable logical rules from data. Existing methods are transductive: their learned parameters are bound to specific predicates and require retraining for each new task. We introduce Neural Rule Inducer (NRI), a pretrained model for zero-shot rule induction. Rather than encoding literal identities, NRI represents literals using domain-agnostic statistical properties such as class-conditional rates, entropy, and co-occurrence, which generalize across variable identities and counts without retraining. The model consists of a statistical encoder and a parallel slot-based decoder. Parallel decoding preserves the permutation invariance of logical disjunction; an autoregressive decoder would instead impose an arbitrary clause order. Product T-norm relaxation makes rule execution differentiable, allowing end-to-end training on prediction accuracy alone. We evaluate NRI on rule recovery, robustness to label noise and spurious correlations, and zero-shot transfer to real-world benchmarks, and we believe this work opens up the possibility of foundation models for symbolic reasoning. Code and the reference checkpoint are available at https://github.com/phuayj/neural-rule-inducer. An extended version with full appendices is available at https://arxiv.org/abs/2605.04916.
    Knowledge Representation and ReasoningLearning and reasoningKnowledge Representation and ReasoningLogic programmingMachine LearningFoundation modelsMachine LearningNeuro-symbolic methods/Abductive Learning
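The product T-norm relaxation mentioned in the NRI abstract above can be illustrated with a minimal sketch. It assumes the standard product T-norm (conjunction as a product) and its dual probabilistic sum (disjunction as `1 - prod(1 - x)`); the function names, the clause encoding, and the example rule are hypothetical and are not taken from the paper or its repository.

```python
from math import prod

def soft_and(values):
    """Product T-norm: conjunction of truth degrees in [0, 1]."""
    return prod(values)

def soft_or(values):
    """Probabilistic sum, the T-conorm dual to the product T-norm."""
    return 1.0 - prod(1.0 - v for v in values)

def soft_dnf(literals, clauses):
    """Evaluate a DNF rule on soft truth degrees.

    literals: list of truth degrees in [0, 1]
    clauses:  list of clauses; each clause is a list of
              (literal_index, negated) pairs (hypothetical encoding)
    """
    clause_values = []
    for clause in clauses:
        degrees = [(1.0 - literals[i]) if negated else literals[i]
                   for i, negated in clause]
        clause_values.append(soft_and(degrees))
    return soft_or(clause_values)

# Hypothetical rule: (x0 AND NOT x1) OR x2
rule = [[(0, False), (1, True)], [(2, False)]]
crisp = soft_dnf([1.0, 0.0, 0.0], rule)   # Boolean inputs give Boolean output
soft = soft_dnf([0.9, 0.2, 0.1], rule)    # soft inputs give a degree in [0, 1]
```

On Boolean inputs the relaxation agrees with ordinary logic, and because both operations are products, the output is differentiable in every literal. The disjunction does not depend on clause order (up to floating-point rounding), which is the permutation invariance the abstract attributes to parallel decoding.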
  123. #1406

    COAL: Counterfactual and Observation-Enhanced Alignment Learning for Discriminative Referring Multi-Object Tracking

    Shukun Jia, Shiyu Hu, Yipei Wang, Ximeng Cheng, Yichao Cao, Xiaobo Lu
    Referring Multi-Object Tracking (RMOT) faces a fundamental structural contradiction between the high-discriminability demand and the sparse semantic supervision. This mismatch is particularly acute in highly homogeneous scenarios that require fine-grained discrimination over complex compositional semantics. However, under sparse supervision, models overfit to salient yet insufficient cues, leading to shortcut learning and degraded compositional discrimination. To resolve this, we propose COAL (Counterfactual and Observation-enhanced Alignment Learning), a framework that advances RMOT beyond isolated structural optimization through knowledge regularization. First, we introduce Explicit Semantic Injection (ESI) via a VLM to densify the observation space and enhance instance discriminability. Second, leveraging LLM reasoning, we propose Counterfactual Learning (CFL) to augment supervision, enforcing strict attribute verification for robust compositional recognition. These strategies are unified within a Hierarchical Multi-Stream Integration (HMSI) architecture, which distills external knowledge into domain-specific discriminative representations. Experiments on Refer-KITTI and Refer-KITTI-V2 benchmarks validate COAL's efficacy. Notably, it surpasses the state-of-the-art by 7.28% HOTA on the highly challenging Refer-KITTI-V2. These results demonstrate the effectiveness of knowledge regularization for resolving the sparsity–discriminability paradox in RMOT.
    Computer VisionMotion and trackingComputer VisionMultimodal learningComputer VisionVision, language and reasoning
  124. #1420

    Focus-LIME: Surgical Interpretation of Long-Context Large Language Models via Proxy-Based Neighborhood Selection

    Junhao Liu, Haonan Yu, Zhenyu Yan, Xin Zhang
    As Large Language Models (LLMs) scale to handle massive context windows, achieving surgical feature-level interpretation is essential for high-stakes tasks like legal auditing and code debugging. However, existing local model-agnostic explanation methods face a critical dilemma in these scenarios: feature-based methods suffer from attribution dilution due to high feature dimensionality, which prevents them from providing faithful explanations. In this paper, we propose Focus-LIME, a coarse-to-fine framework designed to restore the tractability of surgical interpretation. Focus-LIME utilizes a proxy model to curate the perturbation neighborhood, allowing the target model to perform fine-grained attribution exclusively within the optimized context. Empirical evaluations on long-context benchmarks demonstrate that our method makes surgical explanations practical and provides faithful explanations to users.
    Machine LearningExplainable/Interpretable machine learningNatural Language ProcessingInterpretability and analysis of models for NLP
  125. #1422

    AMR-LLM: Knowledge-Enhanced Multi-Modal Automatic Modulation Recognition via Large Language Models

    Shen Hu, Yuhua Qian, Xinyan Liang, Zikun Jin, Jiaqian Zhang, Jiangfeng Zhang
    Existing multi-modal automatic modulation recognition (AMR) methods primarily focus on exploiting multi-view representations of raw signal data to improve performance, but still struggle to effectively model and exploit high-level human prior knowledge. Although recent studies attempt to introduce large language models (LLMs) to integrate the textual modality, most of them merely treat LLMs as feature extractors, without further exploiting the LLM's potential. To this end, we propose a knowledge-enhanced multi-modal automatic modulation recognition framework based on large language models (AMR-LLM). Specifically, we first design a knowledge-based AMR instruction construction mechanism to activate the LLM's potential for signal perception, which designates the LLM as a signal domain expert, introduces human prior knowledge as a supplement, and exploits the unique physical information of each data sample as connection guidance. Then, to enable the LLM to directly perceive digital signals, we introduce a progressive multi-modal fusion strategy that maps signal features into the space of LLMs. Experimental results on multiple benchmark datasets demonstrate that AMR-LLM achieves approximately 5% higher accuracy than current state-of-the-art methods. Moreover, it achieves the current domain performance with only 30% of the data.
    Machine LearningApplicationsMachine LearningMulti-modal learningMachine LearningMulti-view learning
  126. #1432

    Strikingness-Aware Evaluation for Temporal Knowledge Graph Reasoning

    Rikui Huang, Shengzhe Zhang, Wei Wei
    Temporal Knowledge Graph Reasoning (TKGR) aims at inferring missing (especially future) events from historical data. Current evaluation in TKGR uniformly weights all events, ignoring that most are trivial repetitions, which overestimates the true reasoning ability. Therefore, the rare outstanding events, whose prediction demands deeper reasoning, should be distinguished and emphasized. To this end, we propose a strikingness-aware evaluation framework, which introduces a rule-based strikingness measuring framework (RSMF) to quantify event strikingness by comparing its expected occurrence with peer events derived from temporal rules. Strikingness is then integrated as a weighting factor into metrics like weighted MRR and Hits@k. Experiments on four TKG benchmarks reveal: 1) All representative models perform worse as event strikingness increases, 2) Path-based methods excel on low-strikingness events and representation-based ones on high-strikingness events, 3) We design an ensemble method whose gains stem from fitting trivial events rather than reasoning improvement. Our framework provides a more rigorous evaluation, refocusing the field on predicting outstanding events.
    Knowledge Representation and ReasoningApplicationsKnowledge Representation and ReasoningLearning and reasoningKnowledge Representation and ReasoningPreference modelling and preference-based reasoningKnowledge Representation and ReasoningQualitative, geometric, spatial, and temporal reasoningKnowledge Representation and ReasoningSemantic Web
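To make the idea of strikingness-weighted metrics in the abstract above concrete, here is a minimal sketch. It assumes each test event has a rank and a non-negative strikingness weight, and that the metrics are normalized by the total weight; the paper's exact definitions may differ, and the function names and sample values are hypothetical.

```python
def weighted_mrr(ranks, weights):
    """Mean reciprocal rank where each event counts in proportion to its weight."""
    total = sum(weights)
    return sum(w / r for r, w in zip(ranks, weights)) / total

def weighted_hits_at_k(ranks, weights, k):
    """Weighted share of events whose correct answer is ranked in the top k."""
    total = sum(weights)
    return sum(w for r, w in zip(ranks, weights) if r <= k) / total

# Hypothetical example: two trivial events ranked well, one striking event ranked poorly.
ranks = [1, 2, 10]
uniform = [1.0, 1.0, 1.0]
striking = [0.1, 0.1, 0.8]   # assumed strikingness scores

mrr_uniform = weighted_mrr(ranks, uniform)
mrr_striking = weighted_mrr(ranks, striking)
hits3_uniform = weighted_hits_at_k(ranks, uniform, 3)
hits3_striking = weighted_hits_at_k(ranks, striking, 3)
```

With uniform weights these reduce to the usual MRR and Hits@k. With the assumed strikingness weights, the poorly ranked striking event dominates and both scores drop, which is the overestimation effect the abstract describes.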
  127. #1444

    Local Geometry Improves Explanation Robustness for Graph Neural Networks

    Mengting Diao, Li Sun, Sen Su
    Graph Neural Networks (GNNs) are widely used in high-stakes decision-making, where explanation robustness is crucial. Yet, even minor structural perturbations can drastically change explanations without changing predictions. Existing methods primarily characterize structural perturbations by their edit distance or budget, and enforce robustness through uniform constraints or averaging mechanisms over this perturbation set, which fail to capture the inherently direction-dependent effects of structural changes.
    In this paper, we propose GeoXGNN (Local Geometry Improves Explanation Robustness for Graph Neural Networks), a robust explanation framework that explicitly models the directional sensitivity of structural perturbations from a geometric perspective. GeoXGNN introduces local, direction-aware metrics in the node representation space to characterize how perturbations along different directions affect node embeddings and explanation outcomes. Based on the learned local metrics, GeoXGNN further enforces directional constraints, encouraging explanations to remain robust along high-sensitivity directions while preserving explanatory fidelity.
    Extensive experiments on multiple node-level and graph-level benchmark datasets demonstrate that GeoXGNN significantly improves explanation robustness over existing methods.
    AI Ethics, Trust, FairnessExplainability and interpretabilityData MiningMining graphs
  128. #1464

    Multi-View Alignment and Denoising via Center-Guided Spectral Diffusion

    Jiayuan Wang, Jie Lian, Yongquan Shi, Jielong Lu, Zhiyuan Lai, Shiping Wang
    Multi-view learning aims to enhance performance by integrating information from multiple sources. While different views offer complementary perspectives, extracting consistent and discriminative representations remains a significant challenge due to discrepancies in representation and the presence of noise. Existing methods typically separate the denoising and alignment processes, mapping the denoised heterogeneous views to a shared subspace for alignment. This means that noise may propagate or even amplify during the alignment stage, ultimately leading to suboptimal solutions. To address this issue, we propose center-guided spectral diffusion, which replaces traditional alignment with generative modeling. This method avoids noise propagation in the multi-view alignment process and prevents multi-view features from being aligned to the noise subspace during the fusion process. Specifically, we first perform unconditional diffusion in both the feature space and a low-rank spectral space to learn a stable consensus center anchor. This anchor is then used to condition a guided generative diffusion process, enabling the model to generate more consistent and realistic sample representations. By combining conditional and unconditional diffusion, the proposed method alleviates the noise amplification problem commonly found in traditional alignment methods. Experimental results show that the proposed method outperforms state-of-the-art methods on several datasets.
    Machine LearningMulti-view learning
  129. #1489

    SirenFNO: Efficient and Full Frequency Learning of Fourier Neural Operators

    Pengqing Shi, Jie Yin, Stephen Tierney, Junbin Gao
    Fourier neural operators (FNOs) are effective and efficient surrogates for approximating solutions of PDEs and generalize across discretizations. However, owing to the reliance on frequency truncation to maintain learning efficiency of FNOs, empirical studies suggest that FNOs exhibit spectral bias toward low-frequency information, which may hinder the learning capability especially for certain PDEs with strong high-frequency oscillations. To address this limitation, we propose SirenFNO, a novel framework that leverages sinusoidal representation networks (SIRENs) to learn implicit neural representations and performs mode-wise kernel parameterization. Our SIREN parameterization learns a full-grid spectrum with a constant and discretization-independent parameter count, thereby eliminating the need for frequency truncation. We further extend SirenFNO with functional tensor decompositions to enhance parameter and learning efficiency. Empirical results show that our SirenFNO consistently outperforms FNO with approximately 4 to 15 times parameter reductions with preserved discretization invariance, and our functional decomposition variants obtain performance improvements with a maximum of 73 times fewer parameters across multiple PDE benchmarks.
    Machine LearningDeep learning architecturesMultidisciplinary Topics and ApplicationsPhysical sciences
  130. #1518

    Rethinking Multi-Modal Point Cloud Completion: Query-Aware Gating Attention and Gramian Volume Alignment

    Jiye Liang, Haoyu Wang, Chenghao Fang, Kaixuan Yao
    Multi-modal point cloud completion aims to recover complete 3D geometric structures from partial observations by integrating auxiliary data. Although image-guided techniques are well-established, the potential of natural language as a source of high-level semantic cues remains under-explored. Therefore, effectively introducing natural language and synergizing it with the image modality to assist 3D structure recovery is key to breaking current performance bottlenecks. To address this, we propose Query-Aware Gating Gramian Alignment (QGGA), a framework for multi-modal completion. QGGA employs a novel query-dependent bidirectional attention mechanism to dynamically filter information flows between modalities and adaptively regulate fusion intensity. To facilitate alignment among the three heterogeneous modalities, we utilize the volume of the parallelotope spanned by multi-modal local features as a consistency metric, constraining geometric alignment by minimizing this volume. Finally, the fused and aligned features are fed into a coarse-to-fine decoder to first predict a coarse point cloud and then progressively recover the complete shape through stepwise upsampling and refinement. Furthermore, we construct ShapeNet-ViPC-Desc, a benchmark dataset enriched with detailed text descriptions. Experimental results demonstrate that QGGA achieves new state-of-the-art (SOTA) performance on this benchmark. The source code is freely accessible at https://github.com/whysq/QGGA.
    Computer Vision3D computer visionComputer VisionMultimodal learningComputer VisionSegmentation, grouping and shape analysis
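The volume-based consistency metric in the QGGA abstract above rests on a standard fact: the volume of the parallelotope spanned by a set of vectors equals the square root of the determinant of their Gram matrix. The sketch below illustrates only that computation; the function names and toy vectors are hypothetical, and how the paper normalizes features or turns the volume into a loss is not specified in the abstract.

```python
from math import sqrt

def gram_matrix(vectors):
    """Matrix of pairwise inner products between the given vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]

def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0  # numerically singular: the vectors are linearly dependent
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def parallelotope_volume(vectors):
    """Volume spanned by the vectors: sqrt(det(Gram)), clamped at zero for rounding."""
    return sqrt(max(determinant(gram_matrix(vectors)), 0.0))

# Hypothetical unit-norm features for three modalities (point cloud, image, text).
orthogonal = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
aligned = [[1.0, 0.0, 0.0, 0.0]] * 3

vol_orthogonal = parallelotope_volume(orthogonal)
vol_aligned = parallelotope_volume(aligned)
```

Mutually orthogonal unit features span the largest volume, while identical features collapse it to zero, so minimizing the volume pulls the three modality features toward a common direction.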
  131. #1529

    Rethinking Convolutional Networks for Attribute-Aware Sequential Recommendation

    Shereen Elsayed, Ngoc Son Le, Ahmed Rashed, Lars Schmidt-Thieme
    Attribute-aware sequential recommendation entails predicting the next item a user will interact with based on a chronologically ordered history of past interactions, enriched with item attributes. Existing methods typically leverage self-attention mechanisms to aggregate the entire sequence into a unified representation used for next-item prediction. While effective, these models often suffer from high computational complexity and memory consumption, limiting their ability to process long user histories. This constraint restricts the model's capacity to fully capture long-term user preferences. In some scenarios, modeling item interactions purely through attention may also not be the most effective approach to extract sequential patterns. In this work, we propose ConvRec, an alternative method with linear computational and memory complexity that employs convolutional layers in a hierarchical, down-scaled fashion to generate compact, yet expressive sequence representations. To further enhance the model's ability to capture diverse sequential patterns, each layer aggregates the neighboring items gradually to reach a comprehensive sequence representation. Extensive experiments on four real-world datasets demonstrate that our approach outperforms state-of-the-art sequential recommendation models, highlighting the potential of convolution-based architectures for efficient and effective sequence modeling in recommendation systems. Our implementation code and datasets are available at https://github.com/ismll-research/ConvRec.
    Data MiningInformation retrievalData MiningRecommender systemsMachine LearningConvolutional networks
  132. #1538

    Information-Needs-Guided Virtual Knowledge Graph Enrichment via Large Language Models

    Lin Ren, Guohui Xiao, Guilin Qi, Wenjie Du, Yishuai Geng, Haohan Xue, Zhiyan Yue, Mingxuan Li, Marco Di Panfilo, Davide Lanti, Kamal Hamaz, Linfang Ding
    Virtual Knowledge Graphs (VKGs) provide an effective solution for data integration by mapping heterogeneous data sources to a unified ontology.
    However, existing VKG construction frameworks primarily focus on one-shot construction, which often results in partial data coverage and support for only initial information needs.
    After the VKG has been deployed, new information needs will inevitably arise over time.
    Therefore, enriching VKGs to support evolving information needs remains an expert-intensive iterative task.
    In this work, we formulate the task of Information-Needs-Guided VKG Enrichment (IN-VKGE), and propose an iterative framework that leverages large language models to assess whether information needs can be supported using SPARQL execution feedback and generate ontology and mapping enrichment proposals.
    Experiments on two real-world VKGs show that our approach outperforms existing paradigms, and produces enrichment proposals that receive high expert ratings for effectively resolving the identified information needs.
    Knowledge Representation and ReasoningDescription logics and ontologiesKnowledge Representation and ReasoningSemantic Web
  133. #1551

    scLLM-DSC: LLM-Knowledge Enhanced Cross-Modal Deep Structural Clustering for Single-Cell RNA Sequencing

    Ping Xu, Pengjiang Li, Tian Du, Zaitian Wang, Jiawei Gu, Zhiyuan Ning, Ziyue Qiao, Pengfei Wang, Yuanchun Zhou
    Clustering is fundamental to scRNA-seq analysis, serving as a cornerstone for identifying cell populations and resolving tissue heterogeneity. However, existing methods focus on mining numerical statistical patterns, suffering from semantic agnosticism by neglecting the intrinsic biological functions encoded by genes. While Large Language Models (LLMs) offer promising semantic capabilities, their direct adaptation to cell clustering is hindered by the structural mismatch between generative pre-training objectives and discriminative downstream tasks. To bridge this gap, we propose scLLM-DSC, a novel LLM-Knowledge Enhanced Cross-Modal Deep Structural Clustering framework. Diverging from data-driven paradigms, scLLM-DSC establishes a semantically-grounded representation by synergizing two views: a Knowledge-Driven Semantic View derived from NCBI gene priors and contextualized Cell2Sentence embeddings, and a Structure-Aware Topological View extracted via a graph-guided encoder. Crucially, we introduce a cross-modal contrastive alignment mechanism to enforce consistency between biological semantics and transcriptomic features within a unified latent space. Extensive benchmarks demonstrate that scLLM-DSC significantly outperforms eleven state-of-the-art baselines in clustering accuracy.
    Data MiningApplicationsMachine LearningApplicationsMultidisciplinary Topics and ApplicationsBioinformatics
  134. #1555

    LECDPR: LLM Enhancement and Concept-Document Interactive Modeling for Prerequisite Relation Prediction

    Kui Xiao, Lele Zheng, Xiaoxue He, Miao Zhang, Zhifang Huang, Yan Zhang
    Accurate prediction of prerequisite relations among concepts is important for course planning and intelligent tutoring systems. Existing text-based methods are frequently contaminated by noise such as redundant phrasing, ambiguous sentences, and domain-specific colloquialisms. Moreover, previous methods based on graph structures may have failed to capture the complex interactions between concepts and documents, and an information gap exists between the two approaches. Therefore, we propose an LLM Enhancement and Concept-Document Interactive Modeling for Prerequisite Relation Prediction (LECDPR) model. Concretely, LECDPR prompts an LLM in two stages to distill concept descriptions and contextual evidence from documents, resulting in semantic space embeddings of concepts. Then, it models the complex interactive relations between concepts and documents through three propagation mechanisms to produce relational space embeddings, aligns the two embeddings through random masking and contrastive learning, and adaptively fuses the resulting embeddings. Finally, the fused concept representations are used for more accurate prerequisite relation prediction. Extensive experiments on the public UCD, Lecture Bank, and MOOC datasets show that LECDPR outperforms ten baselines on ACC, AUC, and F1. Our code is available at https://github.com/lecdpr/LECDPR.
    Machine LearningActive learningMachine LearningBenchmarksMachine LearningDeep learning architectures
  135. #1564

    DGCPath: Distribution-Aware Generative Contrastive Framework for Self-supervised Path Representation Learning

    Sean Bin Yang, Hao Miao, Zongyi Xu, Jilin Hu, Xiangmeng Wang, Hua Lu, Bin Yang, Christian S. Jensen
    Due to the proliferation of vehicle trajectory data from advanced sensing technologies, path representation learning has become a pivotal task in intelligent transportation systems. Although existing self-supervised approaches work to some extent, their dependence on deterministic contrastive learning paradigms and handcrafted view augmentation strategies inherently restricts cross-scenario generalization capabilities. To address these limitations, we present DGCPath, an innovative Distribution-aware Generative Contrastive learning framework for Path representation. This architecture establishes a synergistic connection between generative modeling and distributional contrastive learning, enabling the acquisition of robust and transferable feature embeddings. Specifically, our framework incorporates: (1) a diffusion-based view generator that autonomously produces semantically coherent yet diversified trajectory views from Gaussian noise distributions; (2) a variational contrastive mechanism enforcing latent feature alignment at the distribution level, transcending conventional instance-wise consistency; and (3) a novel generative cross-supervision module that reinforces view-level consistency through cross-view reconstruction learning. Comprehensive evaluations on three real-world trajectory datasets demonstrate DGCPath's superior performance over state-of-the-art baselines in two distinct downstream tasks, validating its enhanced generalization capacity and representation effectiveness.
    Data MiningMining spatial and/or temporal data
  136. #1565

    Neural Projection Fusion for Sliced Wasserstein on the Hypersphere

    Hongliang Zhang, Jianjun Qian, Lei Luo, Jian Yang
    The Spherical Sliced Wasserstein (SSW) distance has emerged as a fundamental tool for comparing probability measures on the hypersphere, with broad applications spanning geology, medical imaging, computer vision, and representation learning. However, its reliance on uniformly sampling a large number of projection directions from the unit hypersphere can lead to suboptimal performance, as many directions are weakly informative and fail to capture salient differences between distributions. We address this limitation through Fusion Stereographic Spherical Sliced Wasserstein (FS3W), a data-adaptive divergence that incorporates distribution-dependent information into projection selection. FS3W uses a neural slicing fusion mechanism to refine prior directions, steering them toward projections aligned with discriminative geometric structures in the data. This adaptive approach produces more expressive and accurate characterizations of distributional discrepancies on the hypersphere. We evaluate FS3W across five diverse tasks, including gradient flows, Earth density estimation, and self-supervised representation learning. Empirical results consistently demonstrate that FS3W outperforms existing spherical optimal transport–based methods.
    Machine LearningLearning theoryMachine LearningOptimizationMachine LearningRepresentation learning
  137. #1567

    VLM-AR3L: Vision-Language Models for Absolute and Relative Rewards in Reinforcement Learning

    Kuan-Chen Chen, Winston Chen, Wei-Fang Sun, Min-Chun Hu
    Designing effective reward functions remains a major challenge in reinforcement learning (RL), particularly in open-ended environments where task goals are abstract and difficult to quantify. In this work, we present VLM-AR3L, a framework that leverages Vision-Language Models (VLMs) to provide both absolute and relative rewards for RL. VLM-AR3L interprets an agent’s visual observations in the context of a natural language task goal, and learns both absolute and relative rewards from VLM-generated preference labels. The absolute reward model predicts scalar evaluations for individual states, while the relative reward model compares consecutive observations to infer progress or regression toward the task goal. Their integration combines the stability of state-based evaluation with the robustness of comparative supervision. We evaluate VLM-AR3L across benchmarks spanning classic control, manipulation, and open-world embodied tasks, with a particular focus on Minecraft given its visual complexity and long-horizon decision-making requirements. Experimental results show that VLM-AR3L consistently outperforms prior VLM-based reward learning methods. Videos and code are available on the project website: https://vlm-ar3l.github.io/.
    Computer VisionEmbodied vision: Active agents, simulationMachine LearningFoundation modelsMachine LearningLearning preferences or rankingsMachine LearningPartially observable reinforcement learning and POMDPsMachine LearningReinforcement learning
  138. #1577

    GraphSculptor: Sculpting Pre-training Core Sets for Graph Self-supervised Learning

    Chuang Liu, Zelin Yao, Xueqi Ma, Luzhi Wang, Mukun Chen, Pinghua Xu, Wenbin Hu
    Graph self-supervised learning (SSL) typically relies on large-scale unlabeled datasets, heavily inflating computational costs. However, empirical evidence suggests that these datasets contain substantial redundancy: our analysis reveals that uniformly subsampling 50% of graphs retains over 96% of downstream performance. To exploit this redundancy, we introduce GraphSculptor for pre-training coreset construction. Unlike methods dependent on additional training-time signals or limited solely to topological statistics, GraphSculptor provides a label-free solution that constructs coresets via two complementary perspectives: intrinsic structure and contextual semantics. Concretely, structural diversity is quantified using intrinsic graph statistics, yielding a structural feature vector for each graph, while semantic diversity is captured by utilizing a pre-trained language model to encode descriptions generated via graph-to-text. GraphSculptor integrates these signals into a unified metric space and performs cluster-aware selection to preserve joint structural–semantic diversity. We further derive a theoretical bound on the loss gap between coreset and full-data pre-training, offering theoretical motivation for our selection formulation. Extensive experiments demonstrate that GraphSculptor effectively "sculpts" the dataset: a 10% coreset achieves 99.6% of full-data performance while reducing pre-training time by nearly 90%, offering a scalable solution for data-efficient graph pre-training.
    Data MiningMining graphs
  139. #1581

    PDFlow: Popularity-Debiased Flow Matching for Sequential Recommendation

    Zican Yang, Zeyu Li, Cong He, Xiaotao Wu, Guanfeng Liu, Pengpeng Zhao
    Generative models have emerged as a powerful paradigm in sequential recommendation due to their superior distribution modeling. However, long-tail data distributions inevitably induce popularity bias, as iterative generation trajectories gravitate toward dense clusters of popular items. Current debiasing methods struggle to enforce consistent step-wise constraints, rendering them ineffective for the multi-step process of generative recommendation. To address this challenge, we propose a method called Popularity-Debiased Flow Matching for sequential recommendation (PDFlow). Specifically, grounded in the flow matching framework, PDFlow integrates a residual popularity vector field beyond the main flow to model the popularity debiasing flow at each step. This enables real-time trajectory intervention to adjust the generative path, preventing the model from overemphasizing popular items. Subsequently, we incorporate a cross-popularity alignment loss, using co-occurrence patterns within the same sequence to align features of both popular and tail items, thereby mitigating popularity-driven distribution separation. Jointly, trajectory correction prevents popularity overfitting while cross-popularity alignment ensures latent fairness, effectively mitigating popularity bias. Extensive experiments on four real-world datasets demonstrate the effectiveness of PDFlow in improving long-tail coverage and overall recommendation performance. All code and datasets are available at https://github.com/eqmll/PDFlow.
    Data MiningRecommender systems
  140. #1591

    Adversarial Optimization Scheme for Threat Detection Based on Hierarchical Oracle Supervision

    Yiqing Luo, Mingshu He, Xiaojuan Wang
    Conventional threat detection methodologies encounter substantial limitations in real-world complex network settings, owing to their dependence on single-dimensional analysis, vulnerability to adversarial perturbations, and propensity for overfitting. This study introduces an adversarial training optimization framework that incorporates hierarchical label encoding and prompt learning, designed to enhance model robustness and generalization in threat detection. The framework first establishes a hierarchical structure of attack scenarios and types, leveraging a graph attention network to encode semantic and structural dependencies among labels. Subsequently, the classification task is reformulated as a masked language modeling problem through prompt learning, enabling effective semantic alignment. Furthermore, a novel adversarial training mechanism is proposed, which utilizes local hierarchical information as a potent regularization signal. Within a game-theoretic architecture comprising a generator, an encoder, and a discriminator, the encoder is steered to integrate authentic hierarchical priors and produce high-fidelity oracle representations. This process encourages the generator to implicitly assimilate sample-specific hierarchical knowledge during adversarial learning, thereby improving the model's resilience to noisy inputs and its capacity to detect infrequent attacks. Evaluation results show that this method achieves an accuracy of 0.9972–1.0000 in complex mixed scenarios for scenario detection and 0.9987–0.9999 for attack category detection, while exhibiting strong robustness against interference.
    Machine LearningAdversarial machine learningMachine LearningDeep learning architecturesMachine LearningMulti-label learningMachine LearningRobustnessMultidisciplinary Topics and ApplicationsSecurity and privacy
  141. #1593

    No More Shortcuts: Network Traffic Anomaly Detection via Bidirectional Prediction

    Xinglin Lian, Chengtai Cao, Fanglin Yu, Ting Zhong, Fan Zhou
    Network Traffic Anomaly Detection (NTAD), particularly under zero-positive settings, is a critical task in cybersecurity. Existing zero-positive NTAD approaches primarily rely on reconstruction-based pipelines. Nevertheless, these methods are susceptible to an identical shortcut issue, where models indiscriminately reconstruct both normal and anomalous inputs, leading to anomaly overgeneralization. To address this limitation, we propose BiPred, the first prediction-based detection paradigm for NTAD. BiPred symmetrically divides each network traffic sample into two segments and reformulates the NTAD task as a bidirectional prediction across these two segments. Unlike reconstruction-based methods, our BiPred not only eliminates shortcut learning by design but also establishes explicit bidirectional contextual dependencies, making it more sensitive to anomalous traffic. Moreover, we develop a novel Residual Scanning Mamba block that encodes multi-view contextual information to support bidirectional prediction. A residual fusion mechanism is proposed for the multi-view scanning Mamba to suppress the accumulation of inter-view redundancy. This design prevents representation degradation and provides multi-view contextual details for prediction. Extensive experiments demonstrate the superiority of our BiPred paradigm, highlighting a new research direction for NTAD. Code is available at https://github.com/ikun0124/BiPred.
    Data MiningAnomaly/outlier detectionMachine LearningMulti-view learningMachine LearningTime series and data streams
  142. #1595

    DIRECT: Decentralized Intention and Latent Rule Emergence for Multi-Agent Cooperation Under Intermittent Communication

    Enguang Yao, Qidong Liu, Jialu Sun, Kaibo Huang, Mingliang Xu
    Most of the existing decentralized multi-agent systems facilitate coordination by achieving consensus. However, reaching global consensus among a large number of agents in complex settings is challenging because resolving a local disagreement often triggers cascading adjustments across the system. This problem is further exacerbated under intermittent communication. In the field of transportation, human drivers usually rely on intentions (e.g., turn signals) in conjunction with traffic rules (such as "yielding to through traffic") to achieve efficient cooperation, instead of establishing consensus with all surrounding vehicles. Inspired by this observation, we propose the Intention ⊗ Latent Rule paradigm to facilitate coordination beyond consensus, which is instantiated by DIRECT (Decentralized Intention and Latent Rule Emergence Cooperation). Within this framework, agents forecast their intentions based on current actions, and the latent rule inference module derives latent-rule representations from available information to regulate competing intentions, enabling cooperative action selection even when communication is disrupted. Extensive experiments on challenging StarCraft II micromanagement and Multi-Agent Particle Environments demonstrate that DIRECT consistently outperforms state-of-the-art methods, achieving more robust coordination across a range of communication-degraded scenarios. Our code is available at https://github.com/AAI-ZZU/DIRECT-master.
    Agent-based and Multi-agent SystemsCoordination and cooperationAgent-based and Multi-agent SystemsMulti-agent learningMachine LearningMultiagent Reinforcement Learning
  143. #1597

    Learning Tree Automata with Term Rewriting

    Jakub Kopystiański, Jan Otop
    We present an extension of the Angluin-style learning algorithm for tree automata that incorporates deductive inference.
    The learning algorithm is provided with a term rewriting system that specifies properties of the target tree language (e.g., the order of subtrees under a symbol f is irrelevant).
    This term rewriting system is used to infer answers to some queries, which reduces the query complexity of the learning algorithm.
    We present examples of rewrite systems that express natural properties of tree-structured data, which yield a significant reduction in the number of queries.
    Knowledge Representation and ReasoningAutomated reasoning and theorem provingKnowledge Representation and ReasoningLearning and reasoning
  144. #1624

    AdaGeM: Adaptive Geometric Learning on Manifolds for Hyper-Relational Knowledge Graphs

    Ke-Jia Chen, Yuzhe Hu, Yu Ni, Linfeng Liu
    Hyper-relational Knowledge Graphs (HKGs) extend traditional knowledge graphs by introducing high-order dependencies, where auxiliary qualifiers provide detailed information. However, modeling such intricate topologies (e.g., mixtures of hierarchies, cycles, and chains) is challenging. Existing methods suffer from structural distortion due to reliance on a single geometric space and overlook dynamic semantics across entity roles. To address these issues, we propose AdaGeM (Adaptive Geometric Learning on Manifolds), an innovative space-adaptive framework that integrates geometric learning with a role-aware Transformer. First, we propose an adaptive geometric hypergraph encoder that projects entities into a product manifold and design a topology-aware gating mechanism to dynamically select optimal geometric spaces for each entity. Second, we develop a role-aware Transformer equipped with role-specific projections and micro-structural bias injection to refine embeddings by distinguishing entity semantics across different roles and focusing on valid n-ary interactions, respectively. Extensive experiments on benchmark datasets demonstrate that AdaGeM outperforms state-of-the-art baselines in entity/relation prediction tasks, with ablation studies validating the necessity of multi-manifold modeling for heterogeneous HKG structures.
    Data MiningKnowledge graphs and knowledge base completion
  145. #1625

    Lookahead Branching for Neural Network Verification

    Liam Davis, Duo Zhou, Huan Zhang, Guy Katz, Clark Barrett, Haoze Wu
    In this work, we investigate the effect of lookahead branching strategies in neural network verification. We present a general recipe to integrate lookahead into any branch-and-bound verifier and demonstrate how one of the current state-of-the-art branching heuristics, FSB, can be viewed as a special instantiation of the lookahead branching strategy. We also describe how, in addition to improving the quality of branching decisions, lookahead can generate additional lemmas that accelerate verification. We instantiate the method in two representative branch-and-bound-based verifiers (Marabou and α-β-CROWN), and demonstrate that lookahead leads to consistent speedups in verification time and up to 57% more solved instances. Code is available at https://github.com/ai-ar-research/lookahead-branching.
    AI Ethics, Trust, FairnessSafety and robustnessAI Ethics, Trust, FairnessTrustworthy AI
  146. #1627

    MathCritique: Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision

    Zhiheng Xi, Dingwen Yang, Jixuan Huang, Jiafu Tang, Xin Guo, Guanyu Li, Yiwen Ding, Wei He, Boyang Hong, Shihan Dou, WenYu Zhan, Xiao Wang, Xiaowei Shi, Yitao Zhai, Rongxiang Weng, Jingang Wang, Rui Zheng, Tao Ji, Tao Gui, Zuxuan Wu, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Yu-Gang Jiang
    Training critique models to provide useful feedback for actor models is an effective approach in scalable oversight, especially for complex tasks like math reasoning. However, current research lacks suitable datasets for effectively training critique models and integrating them in a principled way at both test time and training time. To bridge this gap, we first propose AutoMathCritique, an automated and scalable framework for collecting critique data. Using it, we create MathCritique-76k, a dataset of 76,321 instances paired with step-level feedback, which is then used to train critique models for generating natural-language feedback. We show that critique models consistently improve the actor’s performance, particularly on difficult queries at test time, with larger gains as inference compute scales. Building on the findings, we propose a critique-in-the-loop self-improvement method that incorporates critique-based supervision into the actor’s self-training process. Extensive experiments demonstrate that this method improves the actor’s exploration efficiency and solution diversity, especially on challenging queries, leading to a stronger actor model. Our code and datasets are at https://mathcritique.github.io/.
    Natural Language ProcessingLanguage generationNatural Language ProcessingLanguage models
  147. #1629

    ReDi-FM: Frozen Foundation Model for Continual Test-Time Adaptation in Medical Image Segmentation

    Jianhang Ji, Zhiming Cheng, Jianxiang Zhao, Tingyu Wang, Bingtao Ma, Yuhan Gao, Zuobin Ying, Shuai Wang
    Continual test-time adaptation (CTTA) adapts a pre-trained medical segmentation model online to an unlabeled target stream whose distribution changes over time. However, most existing CTTA methods rely on pseudo-labeling and self-supervised objectives, which inevitably yield noisy supervision under domain shifts. To mitigate this limitation, we introduce off-the-shelf Vision Foundation Models (VFMs) as external knowledge sources. Zero-shot VFMs are insufficient for medical segmentation because they lack medical semantics, yet they contain rich and heterogeneous generic knowledge. To exploit such external knowledge, we propose Reciprocal Distillation with a Frozen Foundation Model (ReDi-FM), a novel framework with two core components: using structure-aware prompts from the source model to guide the frozen VFM to generate target-adapted supervision, and distilling the resulting knowledge back into the adapting model. For robust distillation, we introduce two complementary objectives: uncertainty-driven hard distillation for precise guidance in ambiguous regions, and hard class-balanced soft distillation for richer supervision of under-represented and challenging structures. A consensus-aware gate further stabilizes adaptation when the two teachers disagree. Extensive experiments on multi-domain medical segmentation benchmarks demonstrate that ReDi-FM outperforms state-of-the-art CTTA methods. Code is available at https://github.com/M4cheal/ReDi-FM.
    Computer VisionBiomedical image analysisComputer VisionSegmentation, grouping and shape analysisComputer VisionTransfer, low-shot, semi- and un- supervised learning
  148. #1645

    Online Goal Recognition Using Path Signature and Dynamic Time Warping

    Douglas Tesch, Nathan Gavenski, Leonardo Rosa Amado, Odinaldo Rodrigues, Felipe Meneguzzi
    Online goal recognition in continuous domains poses two central challenges: efficiently encoding large trajectories and effectively comparing them.
    Recent work addresses these challenges by using custom state-space representations and metrics to compare observations against hypotheses.
    However, these approaches often overlook well-established encoding techniques used in other domains that offer substantial advantages.
    This paper introduces a novel method for online goal recognition that leverages path signatures, a compact, expressive representation of rough path theory that captures key semantic features of trajectories in an efficient way, enabling more meaningful comparisons between them.
    Experiments show that our method consistently outperforms the state of the art in predictive accuracy and online planning efficiency, while remaining competitive offline.
    Planning and SchedulingActivity and plan recognitionPlanning and SchedulingApplications
  149. #1650

    EMMS: Evidential Multi-Label Multi-Dimensional Selection

    Li Yang, Yanyong Huang, Jinyuan Chang, Ou Zheng, Minbo Ma, Xiaoyi Jiang
    Multi-label data often contain high-dimensional features, outlier instances, and noisy labels, all of which can lead to the curse of dimensionality and decreased performance in downstream tasks. Although numerous data reduction methods have been developed, existing approaches face two major limitations: 1) existing methods typically select features, instances, or labels independently, without considering how noise or redundancy in one dimension may negatively influence the selection of others; 2) there are very few feature and instance co-selection methods that commonly assume label annotations are free of noise, which is seldom true in practice. To address these issues, we propose Evidential Multi-Label Multi-Dimensional Selection (EMMS), which jointly performs feature, instance, and label selection on multi-label data. EMMS introduces a dual projection mechanism with sparsity constraints that transforms high-dimensional data first into a latent space and then into the label space. Simultaneously, projection residuals are explicitly modeled to facilitate the identification of representative instances, enabling unified selection across features, instances, and labels. Moreover, EMMS employs evidence theory to fuse instance-level and label-level evidence, thereby enhancing the reliability of the learned labels and reducing the influence of noisy labels, which in turn promotes multi-dimensional selection. Extensive experiments demonstrate that EMMS consistently outperforms state-of-the-art methods.
    Machine LearningFeature extraction, selection and dimensionality reduction
  150. #1673

    TS-PEFT: Unveiling Token-Level Redundancy in Parameter-Efficient Fine-Tuning

    Dabiao Ma, Ziming Dai, Zhimin Xin, Shu Wang, Jian Yang, Haojun Fei
    Current Parameter-Efficient Fine-Tuning (PEFT) methods typically operate under an implicit assumption: once a target module is selected, every token passing through it contributes equally to the downstream task and requires a parameter update. In this paper, we challenge this convention by revealing a pervasive token-level redundancy in the fine-tuning of large models (LMs). We propose TS-PEFT, a theoretical framework utilizing proximal optimization that acts as a dynamic probe to identify token-level redundancy during the fine-tuning process. Extensive experiments demonstrate that indiscriminately updating all tokens is not only computationally superfluous but often introduces optimization noise. Surprisingly, by discarding 30%-70% of token updates, TS-PEFT consistently matches or exceeds the performance of dense baselines such as LoRA and DoRA. Our in-depth analysis shows that the learned token-level sparsity is a superior indicator of module importance compared to traditional weight criteria, providing a novel data-driven perspective on the intrinsic adaptation mechanism of LMs.
    Machine LearningLearning sparse modelsMachine LearningMulti-task and transfer learningNatural Language ProcessingInterpretability and analysis of models for NLP
  151. #1677

    PAAL: Pattern-Anchor Alignment for Continual Knowledge Graph Embedding Under Structural Distribution Shift

    Yue Jian, Lin Li, Kaize Shi, Junwei Zhou, Yu Yang
    Continual knowledge graph embedding (CKGE) has gained popularity for managing dynamic knowledge graphs. Unlike general graph continual-learning approaches, CKGE focuses on retaining triple-level knowledge, thereby overcoming the inability of static models to accommodate continuously arriving facts. However, as new fact triples come in, real-world knowledge graphs often experience structural distribution shifts: the underlying topology and the way facts interconnect evolve over time. This presents a challenge that the structural distribution of historical data may diverge from that of emerging data, creating a distributional mismatch that complicates the adaptation to new trends. To address this, we propose Pattern-Anchor Alignment (PAAL), which introduces relation-level structural anchors to explicitly model these structural shifts. By quantifying the alignment between the structural distribution of emerging triples and historical anchors, PAAL implements an alignment-modulated optimization strategy. It utilizes a calibrated gating mechanism to precisely regulate the intensity of historical constraints based on structural stability. Experimental results indicate the effectiveness of PAAL, showing average H@1 improvements of 11.77% and 9.29% on pattern-shift and standard CKGE benchmarks, respectively. Our code is available at https://github.com/cangjie553/PAAL.
    Knowledge Representation and ReasoningLearning and reasoning
  152. #1701

    Information Entropy for LLM-generated Text Detection

    Xiaoquan Yi, Haozhao Wang, Jingcai Guo
    With the explosive growth of text generated by large language models (LLMs) on the internet, technologies for detecting LLM-generated text are increasingly important. However, existing technologies often require frequent access to LLMs during detection, significantly restricting their application in scenarios where only the text is accessible. To tackle this challenge, we identify that human-written texts exhibit higher entropy than LLM-generated texts. Motivated by this, we propose a novel method named IED, which leverages the Information Entropy for generative text Detection. Specifically, IED first transforms the given text into a vector of which each dimension represents the information entropy of each word, and then adopts a classifier to conduct the detection. Further, we propose IEGD which adopts the information gain to construct the vector by enhancing the differentiation between human-written and LLM-generated texts. To evaluate the effectiveness of the proposed method, we construct a new large-scale dataset comprised of generated texts using various LLMs (LLaMA-7B, GPT-NeoX, etc.). Extensive experiments show that the proposed methods achieve improvement over state-of-the-art methods by up to 10.81%.
    Computer VisionTransfer, low-shot, semi- and un- supervised learningMachine LearningCost-sensitive learningMachine LearningFoundation modelsMachine LearningLearning sparse modelsMachine LearningOpen-World/Open-Set/OOD Learning
  153. #1712

    Breaking the Trade-off: Orthogonal Semantic Decoupling for Generalizable and Fair Deepfake Detection

    Zhongyu Shi, Siyu Peng, Yimin Kang, Ruiyang Xia, Yizhi Fang, Weiping Wen, Zeyu Gu, Sai Cheng
    Deepfake detection faces dual challenges in real-world deployment: cross-domain generalization and demographic fairness. Existing approaches often struggle with a trade-off between these goals. Generalization-oriented detectors can over-rely on demographic shortcuts, while fairness constraints tend to steer optimization away from the most discriminative decision boundary. To address this, we propose Orthogonal Semantic Decoupling (OSD), a framework that decouples demographic semantics from forgery cues. Specifically, we perform Singular Value Decomposition on the pretrained weights of a vision-language model, freezing the principal semantic subspace while learning parameter-efficient low-rank experts in the residual subspace. The experts comprise (1) Demographic Semantic Experts, a set of experts specialized via hard sampling and routed based on the similarities between image embeddings and text embeddings of predefined descriptions; and (2) a Universal Forgery Expert, which captures forgery features transferable across domains and demographics. Extensive experiments across multiple benchmarks demonstrate that our approach outperforms state-of-the-art methods in both generalization and fairness, breaking the trade-off. The code is available at https://github.com/sonder-lin/osd-deepfake-detection.
    AI Ethics, Trust, FairnessBiasAI Ethics, Trust, FairnessFairness and diversityAI Ethics, Trust, FairnessSafety and robustness
  154. #1721

    AHead-VLN: A Hierarchical Reasoning Framework Guided by Foresight for Human-Aware Navigation

    Junyi Zhang, Ziwen Luo, Yinzhe Zhou, Xinghe Chu, Zhaoming Lu
    In human-aware vision-and-language navigation (HA-VLN), agents need to follow instructions while navigating safely among people in crowded environments. Existing methods often struggle to effectively transform forward-looking risk prediction into a basis for decision-making, resulting in intelligent agents being prone to collisions in crowded scenarios and taking overly cautious actions, thereby reducing navigation efficiency. Moreover, it is difficult for high-level decisions based on social norms to be stably implemented in dynamic crowd scenarios, which can easily lead to behavioral jitter and planning deviation. In this work, we propose AHead-VLN, a navigation framework that incorporates forward-looking risk information in the reasoning process. Specifically, we design the path risk association (PRA) module, which compiles future risks into structured risk events aligned with each candidate path, making risk information decision-ready for large language model (LLM) reasoning. To balance executability and decision stability, we further design the hierarchical LLM planner (HLP) with two specialized components. The "Brain" LLM makes route choices based on instructions and risks, while the "Cerebellum" LLM prepares executable action scripts for candidate routes and retrieves the appropriate script for execution. Experiments on the HA-VLN benchmark show that AHead-VLN improves success while enhancing safety compared to strong baselines.
    RoboticsApplicationsRoboticsHuman robot interactionRoboticsRobotics and vision
  155. #1732

    Incentive-Compatible Diffusion Combinatorial Auctions

    Haotian Zhu, Miao Li, Bin Li, Dengji Zhao
    The diffusion auction is an emerging auction model which leverages social networks to recruit more buyers, thereby improving auction revenue and allocation efficiency. Previous studies on diffusion auctions have primarily focused on the single-parameter domain, where each buyer's valuation function is represented by a single private value. However, due to the complex interdependencies between allocations and network structures, few studies have examined the combinatorial auction domain, in which buyers' valuation functions exhibit multidimensionality. In this paper, we fully characterize implementable diffusion mechanisms within the combinatorial auction domain for the first time. Based on the characterization, we identify a class of allocation policies for each of which the optimal payment, namely the one maximizing the seller's revenue, can be derived in closed form. Furthermore, we apply the established theory to design a practical diffusion combinatorial auction, named the Exhausted-Reference Pricing Mechanism, which is incentive-compatible, individually rational, and weakly budget balanced.
    Game Theory and Economic ParadigmsAuctions and market-based systemsGame Theory and Economic ParadigmsMechanism designMultidisciplinary Topics and ApplicationsEconomics
  156. #1748

    Beyond Pipeline Mimicry: Expert-level Aesthetic ISP via Reward-Guided Flow

    Tong Qiao, Kepeng Xu, Gang He, Zhenyang Liu, Wenxin Yu
    Recent deep learning-based ISP methods are primarily constrained by mimicking fixed camera pipelines, consequently struggling to achieve expert-level aesthetic quality. To this end, we propose AesISP, the first expert-level aesthetic ISP framework formulated as a reward flow model. Addressing the ill-posed nature of ISP, AesISP establishes a Privileged Prior Distillation paradigm. By leveraging expert-image-guided latent proxies, we decompose the intractable task into a tractable, proxy-driven learning process. Subsequently, we propose MeanFlow++, which reformulates the flow matching objective via target-prediction parameterization and a Progressive Temporal Curriculum. By evolving from learning instantaneous to long-range average velocities, this mechanism rectifies transport trajectories and uses deterministic mapping to impose explicit constraints, anchoring the generation trajectory to the expert manifold. Finally, we introduce a multi-dimensional reward-driven reinforcement learning approach. Leveraging Group Relative Policy Optimization (GRPO) to balance trade-offs across dimensions such as color and lighting, it steers the model to converge precisely on expert-level aesthetic standards. Experiments demonstrate that AesISP outperforms state-of-the-art methods in both quantitative metrics and aesthetic quality.
    Computer VisionApplications and SystemsComputer VisionImage and video retrievalKnowledge Representation and ReasoningApplications
  157. #1758

    Grammar-State Aware Beam Search for Enhancing Structural Diversity in LLM Generation

    Hantao Hua, Jiming Su, Yiping Yao, Feng Zhu
    In many large language model (LLM) applications, generating structurally diverse candidate outputs is crucial for downstream decision-making and reasoning. However, conventional beam search allocates search budget at the token-prefix level, so multiple beam slots may be occupied by lexically different but structurally similar hypotheses. To address these limitations, we propose Grammar-State Aware Beam Search (GSA-Beam). Our method first organizes search at the constraint state level rather than individual prefix level, enabling explicit control over structural diversity. It further incorporates a dynamic state-level beam regulation mechanism that adaptively adjusts beam allocation based on the branching of constraint states, efficiently exploring the feasible structured solution space. Experiments on JSON-schema constrained generation show that GSA-Beam improves explicit state coverage and final structural diversity while preserving schema validity; across several LLM backbones, it reduces reference latency by 22.5% on average compared with standard constrained beam search. Code is available at https://github.com/Paradozile/GSABeam.
    Natural Language ProcessingLanguage generationNatural Language ProcessingLanguage modelsNatural Language ProcessingApplicationsMachine LearningGenerative modelsMachine LearningStructured prediction
  158. #1759

    Band Together: Untargeted Adversarial Training with Multimodal Coordination Against Evasion-Based Promotion Attacks

    Guanmeng Xian, Ning Yang, Philip S. Yu
    Multimodal recommender systems exploit visual and textual signals to alleviate data sparsity, but this also makes them more vulnerable to evasion-based promotion attacks. Existing defenses are largely limited to single-modal settings and mainly focus on poisoning-based threats, leaving evasion-based threats underexplored. In this work, we first identify a cross-modal gradient mismatch under the multi-user promotion setting, where visual and textual perturbations are optimized in inconsistent directions due to the dominance of distinct user groups. This phenomenon dilutes the attack effectiveness and leads robust training to underestimate worst-case risks. To address this issue, we propose Untargeted Adversarial Training with Multimodal Coordination (UAT-MC). UAT-MC tackles the challenge of unknown targeted items in evasion-based attacks (as opposed to poisoning-based attacks) by treating all items as potential targets, and introduces a gradient alignment mechanism to explicitly correct this mismatch. This design ensures synchronized perturbations across modalities, thereby maximizing adversarial strength for robust training. Extensive experiments demonstrate that UAT-MC significantly improves robustness against promotion attacks while maintaining acceptable recommendation performance under the defense–accuracy trade-off. Code is available at https://github.com/gmXian/UAT-MC.
    Data MiningRecommender systemsMachine LearningAdversarial machine learningMachine LearningMulti-modal learning
  159. #1805

    Perturbation Matters in Time Series Forecasting: A Wave-attention-aware Transformer Method

    Yiming Wang, Yiqing Su, Ximing Li, Changchun Li, Bing Wang
    Time series forecasting (TSF) refers to a fundamental task of predicting future sequential data based on historical observations. One representative category of TSF methods is transformer-based approaches, which translate time series into token sequences (i.e., as raw texts) before applying well-established transformers. In this paper, we conducted extensive preliminary experiments to evaluate their stability against various noises, and we empirically observed an interesting phenomenon: applying noises with appropriate levels to training time series can promote the predictive accuracy of transformer-based TSF methods. We analyze this phenomenon from the perspective of token attention and find that one basic reason is that applying noises with appropriate levels can make token attention weights wavy, which is more consistent with the basic characteristic of time series data. Motivated by this phenomenon and analysis, we apply perturbations and propose a sub-objective with respect to perturbations, constraining token attention weights to be wavy. Accordingly, we propose a novel TSF method, namely Wave-Attention-aware TransformER (WATER). We empirically evaluate WATER across the commonly used benchmark datasets, and experimental results indicate that it consistently outperforms the existing TSF methods.
    Machine LearningTime series and data streams
  160. #1814

    Disentangling Coarse and Fine Latent Dynamics for Probabilistic Time Series Forecasting

    Changze Zhou, Ruichu Cai, Shengbin Nie, Juntao Fang, Jie Qiao, Zijian Li
    Probabilistic time series forecasting seeks to quantify the uncertainty of future observations. While recent works introduce latent variables to alleviate the spurious dependencies caused by hidden confounders, thereby reducing overly wide confidence intervals, simply incorporating latent factors is not sufficient. When the underlying latent dynamics evolve at multiple temporal scales, existing methods may entangle temporally coarse and fine latent dynamics, which introduces spurious latent transitions and in turn amplifies predictive uncertainty. Therefore, disentangling temporally coarse and fine latent dynamics is essential for achieving sharper and more reliable probabilistic forecasting results. Building on this insight, we propose COFE (COarse And FinE latent dynamics disentanglement), a variational autoencoder–based framework that models the temporal distribution by disentangling and modeling the rapidly changing and slowly varying latent dynamics simultaneously. In particular, the proposed COFE harnesses the independence of estimated noises between adjacent latent states as well as a sparsity constraint on estimated noise to disentangle latent dynamics across different scales effectively. More specifically, we show that the multi-scale latent dynamics are disentangled with rigorous theoretical guarantees. Extensive experiments on 15 benchmark datasets, compared against 12 state-of-the-art baselines, demonstrate that COFE achieves superior uncertainty quantification, validating its effectiveness in a wide range of real-world scenarios. Code is available at https://github.com/polars8948/COFE
    Machine LearningTime series and data streams
  161. #1835

    Bi-CoG: Bi-Consistency-Guided Self-Training for Vision-Language Models

    Rui Zhu, Song-Lin Lv, Zi-Kang Wang, Lan-Zhe Guo
    Exploiting unlabeled data through semi-supervised learning (SSL) or leveraging pre-trained models via fine-tuning are two prevailing paradigms for addressing label-scarce scenarios. Recently, growing attention has been given to combining fine-tuning of pre-trained vision-language models (VLMs) with SSL, forming the emerging paradigm of semi-supervised fine-tuning. However, existing methods often suffer from model bias and hyperparameter sensitivity, due to reliance on prediction consistency or pre-defined confidence thresholds. To address these limitations, we propose a simple yet effective plug-and-play methodology named Bi-Consistency-Guided Self-Training (Bi-CoG), which assigns high-quality and low-bias pseudo-labels, by simultaneously exploiting inter-model and intra-model consistency, along with an error-aware dynamic pseudo-label assignment strategy. Both theoretical analysis and extensive experiments over 14 datasets demonstrate the effectiveness of Bi-CoG, which consistently and significantly improves the performance of existing methods.
    Machine LearningMulti-modal learningMachine LearningSemi-supervised learning
  162. #1852

    Automated Approach for Solving Infinite-state Polynomial Reachability Games

    Krishnendu Chatterjee, Ehsan Kafshdar Goharshady, Mehrdad Karrabi, Maximilian Seeliger, Đorđe Žikelić
    Reachability games are two-player games played on a graph, where the objective of REACH player is to reach the target set whereas the objective of SAFE player is to stay away from the target set. Reachability games have important applications in artificial intelligence and reactive synthesis, and many of these applications give rise to infinite-state reachability games. In this paper, we study turn-based reachability games on infinite-state graphs defined over valuations of a finite set of real variables. We consider the problem of determining the existence of and computing a winning strategy for REACH player. Our contributions are twofold. First, we propose ranking certificates for reachability games, a sound and complete proof rule for proving that REACH player has a winning strategy from the specified initial state. Second, we consider polynomial reachability games, where transitions and objectives are described by polynomial constraints over real variables, and propose a fully automated algorithm for computing a winning strategy for REACH player together with a formal correctness witness in the form of a ranking certificate. The algorithm is sound, semi-complete, and runs in sub-exponential time. Our experiments demonstrate the ability of our method to solve challenging examples from the literature that were out of the reach of existing methods. Specifically, for the classical Cinderella-Stepmother game, we are able to compute an optimal winning strategy for an arbitrary precision parameter for the first time.
    Agent-based and Multi-agent SystemsFormal verification, validation and synthesisPlanning and SchedulingPlanning algorithms
  163. #1855

    Fast Algorithms for Lexicographic Inference

    Jonas Klein, Matthias Thimm
    We present two new algorithms for the problem of lexicographic inference from conditional knowledge bases. These algorithms are based on SAT and MaxSAT encodings of the underlying problem of lexicographic comparisons of classical interpretations and require only a polynomial number of SAT solver calls. In our experimental evaluation we show that our new algorithms significantly outperform the state of the art.
    Knowledge Representation and ReasoningNon-monotonic reasoning
  164. #1894

    Progressive Reasoning with Primitive Correction for Compositional Zero-Shot Learning

    Ziyi Chen, Haoyan Shi, Sunhan Xu, Congyan Lang
    Compositional Zero-Shot Learning (CZSL) aims to combine known attributes and objects as primitives for recognizing previously unseen attribute-object pairs. Prior works either predict attributes and objects independently, missing their strong contextual dependency, or use unidirectional conditional modeling (e.g., object-guided attribute prediction), which is prone to error propagation. We propose PRPC, a Progressive Reasoning framework with Primitive Correction, which explicitly models the bidirectional dependency between attributes and objects via step-wise inference. PRPC performs mutual correction of primitives to suppress prediction errors in earlier steps. Specifically, we formulate CZSL as a structured, Q&A-style Chain-of-Thought reasoning process and constrain the MLLM to follow predefined semantic steps to generate intermediate decisions. To further enhance the reliability and logical consistency of intermediate reasoning, we introduce reinforcement learning post-training with a GRPO-based objective, providing step-level rewards aligned with the progressive inference procedure. Extensive experiments on three CZSL benchmarks demonstrate that PRPC achieves state-of-the-art performance, validating the effectiveness of progressive reasoning and bidirectional correction for robust compositional generalization.
    Computer VisionRecognition (object detection, categorization)Computer VisionTransfer, low-shot, semi- and un- supervised learning
  165. #1897

    Privacy-Preserving Reinforcement Learning with One-Sided Feedback

    Lin Cong, Guangyan Gan, Hanzhang Qin, Zhenzhen Yan
    We study reinforcement learning (RL) in multi-dimensional continuous state and action spaces with one-sided feedback, where the agent receives partial observations of the state and obtains reward information for only a subset of the state-action space at each time step. This setting introduces substantial challenges in both learning efficiency and privacy preservation. To address these challenges, we propose POOL, a novel privacy-preserving RL algorithm. We conduct a comprehensive theoretical analysis of POOL, deriving a sample complexity bound of O~((1+E_rho) H^3 alpha^-2), which matches the known lower bounds for non-private RL. Here, E_rho denotes the privacy parameter, H is the time horizon, and alpha is the optimality-gap parameter. Our findings show that it is possible to enforce strong privacy guarantees while maintaining high learning efficiency, marking a significant step toward practical, privacy-aware RL in multi-dimensional environments with one-sided feedback.
    AI Ethics, Trust, FairnessOtherMachine LearningReinforcement learningMultidisciplinary Topics and ApplicationsOther
  166. #1910

    Geometry-Aware Riemannian Residual Displacement for Cross-Broad-Domain Graph Anomaly Detection

    Senyong Wang, Chen Zhu, Zihao Yao, Yaying Zhang
    Cross-Domain Graph Anomaly Detection supports the transfer of knowledge to unknown targets. Nevertheless, current approaches frequently struggle in the Cross-Broad-Domain paradigm which is characterized by two significant discrepancies: feature heterogeneity and structural disparity. Such diverse geometric topologies result in substantial manifold mismatch, as the traditional dependence on linear residuals in flat Euclidean space induces considerable geometric distortion when accommodating various topological patterns. We propose a novel framework termed Geometry-aware Riemannian Anomaly DEtector (GRADE) to tackle these difficulties. To address the semantic gap, we develop a domain-based contrastive learning technique combined with target-adaptive prompt tuning, which dynamically adjusts source representations to align with the target distribution while maintaining local contexts. To tackle the manifold mismatch, we present the Riemannian Residual Displacement (RRD), a curvature-adaptive metric that maps node representations onto multi-curvature Riemannian manifolds. By quantifying geodesic deviations in curved spaces, RRD corrects geometric distortions and identifies anomaly patterns that remain invariant to topological alterations. Additionally, a geometry-aware prototype alignment approach secures these residuals to data-independent priors for resilient inference. Comprehensive experiments on 11 benchmark datasets across 4 distinct domains demonstrate that GRADE significantly outperforms state-of-the-art baselines in both zero-shot and few-shot settings.
    Data MiningAnomaly/outlier detectionData MiningMining graphsMachine LearningLearning graphical models
  167. #1920

    Semantically Labelled Automata for Multi-Task Reinforcement Learning with LTL Instructions

    Alessandro Abate, Giuseppe De Giacomo, Mathias Jackermeier, Jan Křetínský, Maximilian Prokop, Christoph Weinhuber
    We study multi-task reinforcement learning (RL), a setting in which an agent learns a single, universal policy capable of generalising to arbitrary, possibly unseen tasks. We consider tasks specified as linear temporal logic (LTL) formulae, which are commonly used in formal methods to specify properties of systems, and have recently been successfully adopted in RL. In this setting, we present a novel task embedding technique leveraging a new generation of semantic LTL-to-automata translations, originally developed for temporal synthesis. The resulting semantically labelled automata contain rich, structured information in each state that allow us to (i) compute the automaton efficiently on-the-fly, (ii) extract expressive task embeddings used to condition the policy, and (iii) naturally support full LTL. Experimental results in a variety of domains demonstrate that our approach achieves state-of-the-art performance and is able to scale to complex specifications where existing methods fail.
    Agent-based and Multi-agent SystemsFormal verification, validation and synthesisKnowledge Representation and ReasoningLearning and reasoningKnowledge Representation and ReasoningReasoning about actionsMachine LearningMulti-task and transfer learningMachine LearningReinforcement learning
  168. #1925

    Going Beyond the Edge: Distributed Inference of Transformer Models on Ultra-Low-Power Wireless Devices

    Alexander Gräfe, Ding Huo, Vincent de Bakker, Johannes Berger, Marco Zimmerling, Sebastian Trimpe
    Transformer models are rapidly becoming a cornerstone of modern Internet of Things (IoT) applications, yet their computational and memory demands far exceed the capabilities of a single typical ultra-low-power IoT device. We present CATS, a framework for distributed transformer inference on ultra-low-power wireless devices, enabling multiple devices to collaboratively execute models far larger than what a single device can sustain. At its core, CATS is a communication-aware distributed transformer inference scheme co-designed across transformer partitioning, wireless communication and training. It employs SomeGather, a new pruned communication primitive that selectively broadcasts activation columns to reduce communication bandwidth and RAM usage without sacrificing model accuracy. Building on SomeGather, we design a partitioning method that exploits this primitive for efficient model parallelism. To cope with unreliable wireless communication, CATS employs message-dropout during training, which mimics packet losses and yields models that are robust to message loss during inference. In real-world experiments, we show that CATS brings distributed transformer inference to ultra-low-power wireless devices for the first time, with deployments on up to 16 devices that collaboratively execute transformer models up to 14 times larger than what a single device can run.
    Agent-based and Multi-agent SystemsAgent communicationAgent-based and Multi-agent SystemsEngineering methods, platforms, languages and tools
  169. #1942

    Off-Policy Evaluation and Learning for Survival Outcomes Under Censoring

    Kohsuke Kubota, Mitsuhiro Takahashi, Yuta Saito
    Optimizing survival outcomes, such as patient survival or customer retention, is a critical objective in data-driven decision-making. Off-Policy Evaluation (OPE) provides a powerful framework for assessing such decision-making policies using logged data alone, without the need for costly or risky online experiments in high-stakes applications. However, typical estimators are not designed to handle right-censored survival outcomes, as they ignore unobserved survival times beyond the censoring time, leading to systematic underestimation of the true policy performance. To address this issue, we propose a novel framework for OPE and Off-Policy Learning (OPL) tailored for survival outcomes under censoring. Specifically, we introduce IPCW-IPS and IPCW-DR, which employ the Inverse Probability of Censoring Weighting technique to explicitly deal with censoring bias. We theoretically establish that our estimators are unbiased and that IPCW-DR achieves double robustness, ensuring consistency if either the propensity score or the outcome model is correct. Furthermore, we extend this framework to constrained OPL to optimize policy value under budget constraints. We demonstrate the effectiveness of our proposed methods through simulation studies and illustrate their practical impacts using public real-world data for both evaluation and learning tasks.
    Constraint Satisfaction and OptimizationConstraint optimization problemsData MiningApplicationsMachine LearningCausalityMachine LearningEvaluation
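    The abstract above names Inverse Probability of Censoring Weighting (IPCW) combined with importance weighting but does not give the estimator itself. The following is a generic sketch of how the two weights are commonly combined, and the paper's actual IPCW-IPS definition may differ. The record fields, `target_policy`, and `censor_survival` (the probability of remaining uncensored up to a given time, assumed known here) are hypothetical names introduced for illustration.

```python
def ipcw_ips_value(records, target_policy, censor_survival):
    """Generic IPCW-weighted importance-sampling estimate of mean survival time.

    records: list of dicts with hypothetical keys
        context, action, logging_prob, time, event (1 = event observed, 0 = censored).
    target_policy(action, context): probability the evaluated policy picks `action`.
    censor_survival(t): assumed-known probability of being uncensored up to time t.
    """
    total = 0.0
    for r in records:
        # Standard inverse-propensity weight: target policy over logging policy.
        importance = target_policy(r["action"], r["context"]) / r["logging_prob"]
        # Censored records contribute zero; observed events are up-weighted
        # by the inverse probability of having stayed uncensored.
        ipcw = r["event"] / censor_survival(r["time"])
        total += importance * ipcw * r["time"]
    return total / len(records)
```

    In practice the censoring distribution would have to be estimated (for example with a Kaplan-Meier fit on the censoring times), which this sketch leaves out.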
  170. #1951

    TriCLIP: Tri-stage Calibration of Prompts, Evidence and Fusion for Robust Biomedical Vision–Language Models

    Fuxian Sui, Yongping Du, Deyi Li
    Large visual-language models demonstrate exceptional transferability in biomedical image analysis, yet their robustness remains challenging in generalization scenarios such as few-shot learning. This limitation stems from an overly entangled prompt space, causing instability in the conditional distribution, coupled with saliency bias that drives attention to dominate key regions, thereby restricting the coverage of visual representations. Furthermore, cross-modal fusion often struggles to extract critical information signals from redundant alignments. To address these structural bottlenecks, we propose TriCLIP, a decomposition-based framework designed to reconstruct the entire information flow across prompt conditioning, visual evidence extraction, and cross-modal fusion. Prompt Distribution Regularization enhances robustness to linguistic variations by optimizing class-conditional random context perturbations for continuous prompts. Feedback-driven Counterfactual Masking constructs a feedback-driven reverse saliency view that suppresses dominant evidence to promote complementary cue learning and mitigate attention bias. Finally, Selective Regulation Fusion regulates fusion selectivity through paired reinforcement-suppression principles, amplifying information correspondence while suppressing redundant channels. Extensive experiments on diverse biomedical benchmarks indicate that the proposed TriCLIP consistently outperforms existing methods.
    Computer VisionBiomedical image analysisComputer VisionMultimodal learning
  171. #1960

    Semantics-Guided Representation Learning for Long-Tailed ECG Classification

    Fengyi Guo, Ying An, Jianxin Wang
    Deep learning has achieved remarkable success in automated electrocardiogram (ECG) diagnosis. However, the long-tailed distribution of real-world ECG data remains a critical challenge. Existing methods mainly operate at the signal feature level and fail to exploit semantic relationships among diagnostic labels, which limits their ability to effectively recognize rare categories. To address this issue, we propose a Semantics-Guided Representation Learning (SGRL) framework, designed to leverage semantic priors to guide feature learning and thereby correct feature bias under long-tailed distributions. Specifically, we propose a Label Semantic Prompt Adaptation (LSPA) strategy, which introduces ECG-calibrated learnable parameters into label prompts to generate adaptive label semantic embeddings under a semantic-consistency constraint, thereby overcoming the limitations of coarse-grained label prompts in characterizing fine-grained intra-class morphological variations. Subsequently, we propose Semantic-Guided Representation Decorrelation (SGRD) that fuses ECG embeddings with adaptive label semantics and suppresses inter-class semantic correlations to obtain discriminative class semantic anchors, thereby improving tail-class separability. Extensive experiments on multiple public datasets demonstrate the significant superiority of SGRL in long-tailed ECG classification tasks. Our code is available on https://github.com/gfywudi/SGRL.
    Multidisciplinary Topics and ApplicationsHealth and medicineMachine LearningRepresentation learningMachine LearningMulti-modal learning
  172. #1969

    FlowOCT: Wavelet Flow Matching for OCT-to-OCTA Translation

    Ze Xiong, Dehui Qiu, Liguo Deng, Longfei Zhou, Zhetao Xu, Fa Zhang, Xiaohua Wan
    Retinal angiography provides critical vascular information, yet optical coherence tomography angiography (OCTA) acquisition remains slower and less accessible than conventional structural OCT. A key open question is how angiographic signals can be recovered. To address the issues of artifacts and structural blurring in existing generative methods, we propose a wavelet flow matching-based model, FlowOCT. This method operates directly on OCT wavelet coefficients, learning frequency-specific velocity fields to map them to the OCTA wavelet distribution. To ensure anatomical accuracy of the vasculature, we introduce constraints such as depth-wise contrastive alignment and two-dimensional projection consistency. Given the sparse nature of vascular signals in OCTA data, a structure-focused loss function and corresponding evaluation metrics are designed. Experimental results show that FlowOCT outperforms existing methods in terms of both image quality and perceptual similarity. Downstream diagnostic tasks further validate its superior authenticity and generalization capability. Code is available at https://github.com/cnu-medilab/FlowOCT.
    Computer VisionBiomedical image analysisComputer VisionLow-level VisionComputer VisionMultimodal learning
  173. #1977

    Beyond Homophily: Spectrum-Based Graph Pre-Training and Cluster-Augmented Prompt Tuning

    Tingting Li, Yonghao Li, Xiangkun Wang, Sixiang Chen, Boyang Fan, Lingfei Ren, Hao Yu, Xin Yang
    Graph pre-training and prompt tuning provide an effective route to label-efficient node classification by learning transferable backbones and adapting them with lightweight prompts. However, existing pre-train-and-prompt pipelines often generalize poorly across graphs with diverse homophily due to two fundamental limitations: (i) spectral bias, where learned representations overemphasize a single frequency band, and (ii) prompt misalignment, where spatial-domain prompts cannot explicitly modulate critical spectral components and may even disrupt frequency-aware backbones. In this paper, we propose SCA-GPPT, a spectrum-aware framework that unifies Spectrum-based Graph Pre-training and Cluster-Augmented Prompt Tuning. First, we present a systematic spectral analysis of graph prompting, revealing the inherent limitations of spatial-domain prompts in capturing attenuated high-frequency signals. Second, we explicitly learn complementary low- and high-frequency spectral filters via parameterized Chebyshev polynomials and employ band-aware contrastive learning to model diverse spectral patterns during pre-training. For downstream adaptation, we freeze the pre-trained backbone and introduce cluster-augmented prompts that directly adapt Chebyshev coefficients while enforcing global structural consistency, enabling stable few-shot transfer across varying homophily regimes. Extensive experiments on real-world benchmarks demonstrate that SCA-GPPT consistently outperforms strong baselines under both transductive and inductive settings. Our code is available at https://anonymous.4open.science/r/SCA-GPPT-17B2.
    Data MiningMining graphsData MiningNetworksMachine LearningFew-shot learning
  174. #1978

    Toward a More Discriminative Learnware Paradigm via Explicitly Distinctive Specification

    Wenlu Yang, Wei Chen, Zhenan He
    The learnware paradigm aims to construct a dock system that maintains a collection of learnwares, each consisting of a well-established model coupled with a specification, thereby enabling users to directly reuse existing models to solve their tasks without training models from scratch. As a core component of the paradigm, the specification sketches the model’s properties and establishes reusability between the learnware and user tasks. Existing methods have demonstrated promising performance by designing specifications that characterize the task distributions mastered by well-established models. However, in complex real-world scenarios, task distributions often overlap, thereby weakening the uniqueness of specifications and obscuring the reusability relationships between learnwares and tasks. In this paper, we design an Explicitly Distinctive Specification (EDS) that enforces specification uniqueness, improving the discriminative capability of the learnware paradigm and avoiding ambiguity caused by overlapping task distributions. This specification method explicitly leverages distributional discrepancies between specifications to strengthen the uniqueness of the model properties sketched during the submitting stage and subsequently exploits the enhanced distributional discriminability in the deploying stage to improve learnware reusability. Extensive experiments demonstrate the effectiveness and strong performance of the proposed EDS within the learnware paradigm under complex task scenarios.
    Machine LearningLearnware/model reuse/transfer learning
  175. #1984

    NaVQA: Mitigating Silent Failures in Question Answering over Virtual Knowledge Graph

    Guohui Xiao, Haohan Xue, Lin Ren, Yishuai Geng, Guilin Qi, Shenyu Zhang, DeHao Guo, Marco Di Panfilo, Davide Lanti, Linfang Ding
    Virtual Knowledge Graphs (VKGs) provide unified access to legacy relational data sources through a high-level ontology modeling a domain of interest. The content of the ontology elements (classes and properties) is virtually mapped to underlying data sources through declarative mappings. The standard approach to interacting with a VKG system is to use SPARQL as the query language. However, the complexity of SPARQL creates a high entry barrier for normal users. In this paper, we study the VKG-QA task, which enables users to interact with the VKGs through a natural language (NL) interface by translating their questions into SPARQL queries. One critical challenge in such translation is silent failures, where a syntactically correct SPARQL query returns empty results because it involves ontology elements that lack mappings to the underlying data. To address this, we propose NaVQA (Navigation-based VKG Question Answering), a framework leveraging Large Language Models that (1) identifies and indexes the active ontology elements based on the mapping specifications, and (2) iteratively constructs a query graph from the NL question using the active ontology elements and their relations to synthesize SPARQL. Experiments on three real-world VKGs show that NaVQA significantly mitigates silent failures and outperforms existing baselines.
    Knowledge Representation and ReasoningDescription logics and ontologiesKnowledge Representation and ReasoningSemantic WebNatural Language ProcessingQuestion answering
  176. #1986

    Scalable Algorithms for Approximate DNF Model Counting

    Paul Burkhardt, David G. Harris, Kevin T. Schmitt
    Model counting of Disjunctive Normal Form (DNF) formulas is a critical problem in applications such as probabilistic inference and network reliability. For example, it is often used for query evaluation in probabilistic databases. Due to the computational intractability of exact DNF counting, there has been a line of research into a variety of approximation algorithms. We develop a new Monte Carlo approach with an adaptive stopping rule and short-circuit formula evaluation. We prove it achieves Probably Approximately Correct (PAC) learning bounds and has asymptotically improved time and randomness complexity compared to previous methods. We also show experimentally that it outperforms prior algorithms by at least three orders of magnitude in running time and scalability.
    Constraint Satisfaction and OptimizationConstraint satisfactionConstraint Satisfaction and OptimizationSolvers and toolsUncertainty in AIBayesian networksUncertainty in AIInferenceUncertainty in AIProbabilistic programming
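    For readers unfamiliar with Monte Carlo DNF counting, the classical Karp-Luby estimator gives a sense of the baseline such work builds on. This is a minimal sketch of that textbook estimator with a fixed sample budget, not the adaptive-stopping algorithm described in the abstract. The clause encoding (a dict from variable index to required truth value) is an assumption made for illustration.

```python
import random


def karp_luby_dnf_count(clauses, n_vars, n_samples, seed=0):
    """Estimate the number of satisfying assignments of a DNF formula.

    clauses: list of dicts mapping a variable index to its required bool value.
    """
    rng = random.Random(seed)
    # A clause fixing k variables is satisfied by 2^(n_vars - k) assignments.
    weights = [2 ** (n_vars - len(c)) for c in clauses]
    total = sum(weights)
    hits = 0
    for _ in range(n_samples):
        # Pick a clause proportionally to its weight, then draw a uniform
        # assignment among those that satisfy it.
        i = rng.choices(range(len(clauses)), weights=weights)[0]
        assignment = [rng.random() < 0.5 for _ in range(n_vars)]
        for var, val in clauses[i].items():
            assignment[var] = val
        # Count the sample only if clause i is the first satisfied clause,
        # so every satisfying assignment is counted exactly once in expectation.
        first = next(
            j for j, c in enumerate(clauses)
            if all(assignment[v] == b for v, b in c.items())
        )
        if first == i:
            hits += 1
    return total * hits / n_samples
```

    For the formula (x0) OR (x1) over two variables, the true count is 3, and the estimate should land close to that value with a few thousand samples.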
  177. #1988

    MedVCoT: Bridging the Modality Gap in Medical VQA Through Latent Visual Reasoning

    Bo Xu, Quanhao Zhu, Boling Zhu, Chenyuan Wang, Liang Zhao, Hongfei Lin, Feng Xia
    With the rising demand for trustworthy AI in clinical practice, strong interpretability is now as critical a requirement as accuracy. However, the modality gap for medical visual question answering is quite severe when continuous visual signals are forcibly projected into discrete text space for reasoning, and the loss of necessary diagnostic information leads to low precision and black-box opacity. To address this problem, we propose MedVCoT, which incorporates latent visual reasoning into the medical visual question answering (VQA) domain. Rather than merely integrating modules, MedVCoT utilizes the specialized expertise of MedSAM to train a large vision-language model so that it can autonomously generate consistent and continuous latent visual tokens within Visual Chain-of-Thought. This mechanism forces the model to explicitly "see" the lesion in the latent space before formulating a textual diagnosis, ensuring answers are causally rooted in verifiable visual evidence rather than statistical hallucination. We achieve this through a progressive 3-stage training procedure: medical feature alignment, visual reasoning learning with the generated latent tokens, and instruction tuning for complex clinical scenarios. Extensive experiments show that MedVCoT can achieve state-of-the-art performance on multiple benchmarks, outperforming other methods by large margins. Meanwhile, it provides pixel-level segmentation masks to validate its diagnostic reasoning. Our demo is available at https://zhuqh19.github.io/MedVCoT.
    Computer VisionBiomedical image analysisComputer VisionInterpretability and transparencyMultidisciplinary Topics and ApplicationsBioinformaticsMultidisciplinary Topics and ApplicationsHealth and medicine
  178. #1992

    PDAR-RSITR: A Progressive Decoupling-Aggregation-Refinement Framework for Remote Sensing Image-Text Retrieval

    Shuhuai Wang, Songwei Pei, Bingfeng Liu, Duo Chai, Yuanzhou Huang, Jia Liu, Qian Li, Shangguang Wang
    Remote Sensing Image-Text Retrieval (RSITR) aims to achieve precise retrieval between remote sensing images and textual descriptions. However, existing methods neglect the multi-dimensional cognitive attributes inherent in remote sensing data and struggle to handle them simultaneously, leading to suboptimal retrieval performance. In this paper, we propose a novel Progressive Decoupling-Aggregation-Refinement framework for RSITR (PDAR-RSITR) to comprehensively capture multi-dimensional cognitive attributes. Specifically, we adapt Multi-dimensional Cognitive Decoupling to learn features across cognitive attributes of different dimensions via multi-process clustering. Subsequently, we utilize Salient Cognitive Aggregation to select and aggregate the most salient attributes via dynamic routing. Furthermore, we propose Expert Collaborative Refinement to enhance critical cross-modal relationships via three complementary expert perspectives. Extensive experimental results demonstrate that PDAR-RSITR significantly outperforms existing state-of-the-art methods across multiple metrics.
    Natural Language ProcessingInformation retrieval and text mining
  179. #1995

    Towards Fine-Grained Code-Switch Speech Translation with Semantic Space Alignment

    Yan Gao, Yazheng Yang, Zhibin Lan, Yidong Chen, Min Zhang, Daimeng Wei, Derek F. Wong, Jinsong Su
    Code-switching (CS) speech translation (ST) aims to translate speech that alternates between multiple languages into a target language text, posing significant challenges due to the complexity of semantic modeling and the scarcity of CS data. Previous studies mainly rely on the models themselves to implicitly learn semantic representations and resort to costly manual annotations. To mitigate these limitations, we propose enhancing Large Language Models (LLMs) with a Mixture-of-Experts (MoE) speech projector composed of language expert groups, where each group specializes in the semantic space of a specific language for fine-grained speech feature modeling. A language-specific loss and an intra-group load balancing loss are jointly introduced to guide efficient token routing across and within expert groups. Furthermore, we introduce a multi-stage training paradigm that utilizes readily available automatic speech recognition (ASR) and monolingual ST data, facilitating speech-text alignment and improving translation performance. To bridge the data gap for smooth domain transfer, a transition loss is employed to improve adaptation to CS scenarios. Extensive experiments on widely used datasets demonstrate the effectiveness and generality of our approach, achieving average improvements of 0.86 BLEU and 0.93 COMET over SeamlessM4T, with maximum improvements of 1.49 BLEU and 1.41 COMET across different test sets. Our code and supplementary appendices are available at https://github.com/XMUDeepLIT/CSST-SSA.
    Natural Language ProcessingMachine translation and multilingualityNatural Language ProcessingSpeech
  180. #1998

    Causal Newton Optimization: Online Calibration with Iterative Local Linear Modeling and Newton Updates

    Daigo Fujiwara, Tomonori Izumitani, Shohei Shimizu
    Optimization in industrial systems often involves calibrating from a semi-optimized state, where global exploration methods like Reinforcement Learning (RL) or Bayesian Optimization (BO) are inefficient or unsafe. We propose Causal Newton Optimization (CNO), an online algorithm that iteratively calibrates inputs under a known causal graph but unknown structural equations. CNO estimates local linear causal effects via additive interventions and employs a log-linear variance regression to robustly guide Newton-based updates. Evaluations on synthetic systems and a chemical plant simulator demonstrate that CNO achieves the best balance between objective improvement and robustness. While traditional PID control suits standard dynamical systems, CNO significantly outperforms RL and BO in complex structural causal models, providing the robust stability vital for safety-critical real-world applications.
    Uncertainty in AICausality, structural causal models and causal inferenceUncertainty in AISequential decision making
  181. #2007

    Counterfactual Reasoning for Responsibility Attribution in Probabilistic Multi-Agent Systems

    Chunyan Mu, Muhammad Najib
    Responsibility allocation, that is, determining the extent to which agents are accountable for outcomes, is a fundamental challenge in the design and analysis of multi-agent systems. In this work, we model such systems as concurrent stochastic multi-player games and introduce a notion of retrospective (backward) counterfactual responsibility, which quantifies an agent's accountability for outcomes resulting from a given strategy profile. To allocate responsibility among agents, we utilise the Shapley value and formally show that this method satisfies key desirable properties, including fairness and consistency. Building on this foundation, we propose a formal framework that supports both verification and strategic reasoning in responsibility-aware multi-agent systems. Furthermore, by adopting Nash equilibrium as the solution concept, we demonstrate how to compute stable strategy profiles in which agents trade off responsibility against expected reward.
    Agent-based and Multi-agent SystemsFormal verification, validation and synthesisAIAgent-based and Multi-agent Systems
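    The Shapley value mentioned in the abstract above averages each agent's marginal contribution over all orderings of the agents. The sketch below computes it exactly for a small set of players. The characteristic function used in the test is a toy assumption, not the paper's counterfactual responsibility measure, and the function name is hypothetical.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a number."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}
```

    Enumerating all orderings is only feasible for a handful of agents, so larger systems would need sampling or structure-specific shortcuts.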
  182. #2013

    A Versatile Framework for Formula-Based Enforcement and Synthesis in Abstract Argumentation

    Andreas Niskanen, Jean-Guy Mailly, Yannis Dimopoulos, Pavlos Moraitis
    Argumentation dynamics provides techniques for revising argumentation theories in real-world domains (e.g., an autonomous medical diagnostic agent). Within this context, enforcement in abstract argumentation has become a prominent topic. Enforcement aims to modify an argumentation framework to satisfy given acceptability conditions while minimizing change from the original framework. Motivated by the need to address both syntactic and semantic notions of change, we propose formula-based enforcement, a generic framework that strictly generalizes existing approaches, by additionally covering cases they cannot handle, including semantic change. We analyze its complexity under central argumentation semantics, obtaining results from NP-completeness to completeness for the third level of the polynomial hierarchy. For second-level complete variants, we present an exact procedure based on MaxSAT solving and counterexample-guided abstraction refinement (CEGAR), and evaluate it empirically.
    Knowledge Representation and ReasoningArgumentation
  183. #2036

    LEMD: Latent Environment Extrapolation and Message Disentanglement for Dynamic Graph Under Distribution Shift

    Xiaoran Wei, Chen Zhao, Minglai Shao, Xintao Wu, Zhong Chen, Wenjun Wang, Qin Tian, Chang Liu
    Dynamic graph neural networks (DyGNNs) are widely used to model evolving interactions, but may fail under data distribution shift. Existing dynamic graph domain generalization approaches yield suboptimal results because their interventions are limited and unreliable and their disentanglement is insufficient. We formalize a message sufficiency causal view: a node representation is fully mediated by its received message multiset. Building on this perspective, we propose Latent environment Extrapolation and Message Disentanglement (LEMD), a novel robust representation learning framework for dynamic graph domain generalization. A message extrapolation mechanism under soft uncertainty constraints is proposed to obtain diverse counterfactual message distributions. Causal information is fully disentangled from the messages to suppress shortcuts via a recoverable evolving disentanglement module. We further provide rigorous theoretical analysis and proofs to support the effectiveness of LEMD. Across all six datasets and two tasks, LEMD consistently improves over state-of-the-art dynamic graph generalization baselines under distribution shift, with a best relative improvement of 7.7% over the second-best baseline. The code of LEMD is available at https://github.com/W-WuJi/LEMD.
    Data MiningAnomaly/outlier detectionData MiningKnowledge graphs and knowledge base completionData MiningMining graphsData MiningMining spatial and/or temporal dataMachine LearningRobustness
  184. #2037

    Salient-Residual Decoupled Multi-View Learning for Clustering

    Gaokai Wang, Yazhou Ren, Fengyu Zhang, Jie Xu, Chaoning Zhang, Zhen Long, Ce Zhu
    Multi-view clustering aims to utilize information from multiple feature representations to uncover underlying data structures. Most existing methods emphasize learning a consensus representation by enforcing consistency across views. However, those structures that cannot be directly incorporated into the clustering space receive little attention, and are often discarded as noise in practice. In this paper, we argue that such information can be informative signals for the clustering process and should be explicitly modeled rather than suppressed or ignored. Specifically, we propose salient-residual decoupled multi-view learning for clustering, SRDMVC, introducing a novel decomposition-fusion iterative optimization, which separates the feature space into a salient space and a residual subspace effectively and fuses them using a novel attention mechanism. The residual subspace can capture the deep-level structure of view information, making positive contributions to the clustering process, under proper constraints. Without assuming stronger view alignment or complementarity, SRDMVC enhances cluster discrimination, avoids spurious consensus, and alleviates representation degradation. Extensive experiments on benchmark datasets demonstrate the superior performance of the method we propose.
    Machine LearningClusteringMachine LearningMulti-view learning
  185. #2042

    Multiagent Stochastic Shortest Path Problem

    Martin Jonáš, Antonín Kučera, Vojtěch Kůr, Jan Mačák, Vojtěch Řehák
    We introduce and study the multi-agent stochastic shortest path (MSSP) problem, in which k agents strive to reach a target state, aiming to minimize the expected time to reach the target by any agent. We analyze the computational and strategy-complexity of the problem in both autonomous and coordinated settings, and we design efficient strategy-synthesis algorithms. The algorithms are experimentally evaluated on instances of increasing size against natural baselines.
    Agent-based and Multi-agent SystemsCoordination and cooperationAgent-based and Multi-agent SystemsMulti-agent planning
  186. #2052

    The Computational Complexity of Almost Stable Clustering with Penalties

    Farnam Mansouri, Sandra Zilles, Kamyar Khodamoradi
    We investigate the complexity of stable (or perturbation-resilient) instances of k-Means and k-Median clustering problems in metrics with small doubling dimension. While these problems have been extensively studied under multiplicative perturbation resilience in low-dimensional Euclidean spaces, we adopt a more general notion of stability, termed "almost stable", known from the literature as (alpha,epsilon)-perturbation resilience. Additionally, we extend our results to k-Means/k-Median with penalties, where each data point is either assigned to a cluster centre or incurs a penalty.
    We show that certain special cases of almost stable k-Means/k-Median (with penalties) are solvable in polynomial time. To complement this, we also examine the hardness of almost stable instances and (1 + 1/poly(n))-stable instances of k-Means/k-Median (with penalties), proving super-polynomial lower bounds on the runtime of any exact algorithm under the widely believed Exponential Time Hypothesis (ETH).
    Machine LearningClusteringMachine LearningLearning theory
  187. #2065

    From Standard to Robust: A Universal Framework for Continual Adversarial Defense

    Qian Wang, Hefei Ling, Yingwei Li, Qihao Liu, Ning Yu
    Continual adversarial defense (CAD) aims to defend target models against continuously emerging attacks. However, existing CAD methods typically rely on maintaining large amounts of replay data or multiple expert modules to mitigate catastrophic forgetting, or suffer from reduced robustness against adversarial examples or degraded performance on clean images. As a result, they often fail to provide clear advantages over standalone robust models. To address these limitations, we formulate four fundamental principles for CAD: (1) continual adaptation to new attacks without catastrophic forgetting, (2) few-shot adaptation, (3) memory-efficient adaptation, and (4) high classification accuracy on both clean and adversarial data. Guided by these principles, we explore and integrate cutting-edge techniques from continual learning, few-shot learning, and ensemble learning, and propose Universal Continual Adversarial Defense (UCAD), a universal framework that enables both standard and robust models to perform effective defense under the CAD setting. Extensive experiments validate the effectiveness of UCAD against multi-stage adversarial attacks and demonstrate significant improvements over a wide range of baseline methods. Moreover, we observe that as the number of encountered attacks increases, UCAD becomes increasingly robust, consistently enhancing the defense capability of existing robust models until saturation.
    Machine LearningAdversarial machine learningMachine LearningApplicationsMachine LearningFew-shot learningMachine LearningIncremental learning
  188. #2071

    Not All Timesteps Matter Equally: Selective Alignment Knowledge Distillation for Spiking Neural Networks

    Kai Sun, Peibo Duan, Yongsheng Huang, Guowei Zhang, Benjamin Smith, Nanxu Gong, Levin Kuhlmann
    Spiking neural networks (SNNs), which are brain-inspired and spike-driven, achieve high energy efficiency. However, a performance gap between SNNs and artificial neural networks (ANNs) still remains. Knowledge distillation (KD) is commonly adopted to improve SNN performance, but existing methods typically enforce uniform alignment across all timesteps, either from a teacher network or through inter-temporal self-distillation, implicitly assuming that per-timestep predictions should be treated equally. In practice, SNN predictions vary and evolve over time, and intermediate timesteps need not all be individually correct even when the final aggregated output is correct. Under such conditions, effective distillation should not force every timestep toward the same supervision target, but instead provide corrective guidance to erroneous timesteps while preserving useful temporal dynamics. To address this issue, we propose Selective Alignment Knowledge Distillation (SeAl-KD), which selectively aligns class-level and temporal knowledge by equalizing competing logits at erroneous timesteps and reweighting temporal alignment based on confidence and inter-timestep similarity. Extensive experiments on static image and neuromorphic event-based datasets demonstrate consistent improvements over existing distillation methods. The code is available at https://github.com/KaiSUN1/SeAl.
    Humans and AIBrain sciencesHumans and AICognitive modelingHumans and AICognitive systems
  189. #2081

    Feature Selection via Information Decomposition and Orthogonality Constraint for Partial Multi-Label Data

    Yao Zhang, Jun Tang
    Partial Multi-Label Feature Selection (PMLFS) integrates partial multi-label learning and Multi-Label Feature Selection (MLFS) within a unified optimization framework. Owing to the presence of noisy labels, it is more practical and challenging compared to traditional MLFS. However, existing methods often focus solely on label disambiguation while neglecting to distinguish the contributions of features to ground-truth labels and noisy labels. To address this, we propose a novel MLFS method based on information decomposition and orthogonality constraints, named PML-IDOC. The method partitions partial label information into ground-truth label information and noisy label information, imposing a local orthogonality constraint between ground-truth and noisy labels to prevent the latter from interfering with the former. Unlike previous approaches that constrain the low-rank property and sparsity of noisy labels, our method constructs separate mapping relationships from instances to ground-truth labels and noisy labels. By enforcing global orthogonality between the two coefficient matrices, it ensures low correlation between them, thus achieving the decoupling of feature contributions to ground-truth and noisy labels. Extensive experimental results demonstrate the effectiveness of PML-IDOC. The code is available at https://github.com/yunbao520/PML-IDOC
    Machine LearningFeature extraction, selection and dimensionality reductionMachine LearningMulti-label learningMachine LearningWeakly supervised learning
  190. #2100

    DAHGT-CCI: Dynamic Heterogeneous Graph Transformer for Spatial Transcriptomics Cell-Cell Interaction Inference

    Weiliang Huo, Shuo Yu, Qingchen Zhang
    Spatial transcriptomics has significantly advanced tissue biology and makes it possible to study the spatial interactions of cells in the microenvironment of complex tissues. However, accurately inferring intercellular communication from these data remains challenging due to the need to effectively integrate spatial topology with high-dimensional gene expression information. To address this challenge, we propose DAHGT-CCI, a novel method for inferring intercellular communication from spatial transcriptomics data. By fusing multimodal heterogeneous features to construct a unified node representation and introducing a meta-relation-aware dynamic attention mechanism, DAHGT-CCI can adaptively learn the weights of edges in the graph, thereby achieving superior precision and fine-grained inference of intercellular interactions. We evaluated the performance of DAHGT-CCI on six different spatial transcriptomics datasets. In these benchmarks, DAHGT-CCI outperformed existing graph-based learning methods and cell-cell communication inference models in terms of accuracy, AUROC, recall, and F1 score. These results demonstrate that DAHGT-CCI can more accurately reconstruct cell communication networks in complex tissue microenvironments, offering an indispensable computational tool for studying developmental processes, disease mechanisms, and potential therapeutic targets from a spatially resolved perspective.
    Machine LearningSupervised LearningMultidisciplinary Topics and ApplicationsBioinformaticsMultidisciplinary Topics and ApplicationsHealth and medicine
  191. #2104

    What Affects the Stability of Tool Learning? An Empirical Study on the Robustness of Tool Learning Frameworks

    Chengrui Huang, Zhengliang Shi, Yuntao Wen, Xiuying Chen, Peng Han, Shen Gao, Shuo Shang
    Tool learning methods have enhanced the ability of large language models (LLMs) to interact with real-world applications.
    Many existing works fine-tune LLMs or design prompts to enable LLMs to select appropriate tools and correctly invoke them to meet user requirements. However, previous works have observed that the performance of tool learning varies across tasks, datasets, training settings, and algorithms. Without an understanding of the impact of these factors, this variation can lead to inconsistent results, inefficient model deployment, and suboptimal tool utilization, ultimately hindering the practical integration and scalability of LLMs in real-world scenarios. Therefore, in this paper, we explore the impact of both internal and external factors on the performance of tool learning frameworks. Through extensive experiments on two benchmark datasets, we draw several conclusions for future work, including the observation that LLMs can benefit significantly from increased trial and exploration. We believe our empirical study provides a new perspective for future tool learning research.
    Agent-based and Multi-agent SystemsAgent communicationAgent-based and Multi-agent SystemsApplicationsData MiningApplications
  192. #2105

    Robust Scheduling Against Machine Failures

    Zhenwei Liu, Guochuan Zhang, Yifan Zhao
    We study a robust scheduling problem on identical machines in which machines may fail after the initial assignment. The goal is to compute an initial schedule together with a recovery strategy that minimizes the post-failure makespan, under the constraint that jobs on available machines cannot be moved.

    We consider two failure models: a strong adversary, which selects the failed machines, and a weak adversary, which selects only the number of failures. For up to k failures, we give algorithms with robustness ratios 1.618 for the strong adversary and 1.5 for the weak adversary. For the single-failure case, k = 1, we obtain best possible ratios 1.387 and 1.281, respectively, matching our lower bounds.
    Planning and SchedulingSchedulingGame Theory and Economic ParadigmsNoncooperative gamesAgent-based and Multi-agent SystemsResource allocationPlanning and SchedulingTheoretical foundations of planning
  193. #2110

    First Mathematical Runtime Analyses of Multi-Objective Evolutionary Algorithms for Multi-Valued Decision Variables

    Mingfeng Li, Zheng Cheng, Weijie Zheng, Benjamin Doerr
    Problems defined on binary decision spaces have been intensively studied in the theory of multi-objective evolutionary algorithms (MOEAs). In contrast, no mathematical runtime analyses exist so far for MOEAs dealing with decision variables that take a finite number \(r>2\) of values, despite the prevalence of such problems in practice. In this work, we begin to fill this research gap. We analyze how the classic SEMO algorithm with unit-strength local mutation computes the Pareto front of an \(r\)-valued counterpart of the classic OneMinMax benchmark. For the expected number of function evaluations until the Pareto front is covered by the population of this MOEA, we prove an upper bound of \(O(n^2 r^2 \log n)\) and a near-tight lower bound of \(\Omega(n^2 r (r + \log n))\). We can close the small remaining gap between these two bounds by considering a variant of the algorithm that accepts only strictly better solutions; for this variant, we show an upper bound of \(O(n^2 r (r + \log n))\), matching our lower bound (which also holds for this variant).
    Our results suggest that classic MOEAs encounter no significant additional difficulties when dealing with multi-valued decision variables. However, significantly more advanced tools may be required to obtain tight bounds for algorithms with more complex population dynamics.
    SearchEvolutionary computation
  194. #2111

    Uncertainty-Aware Spatial-Frequency Registration and Fusion for Infrared and Visible Images

    Xingyuan Li, Haoyuan Xu, Xingyue Zhu, Jun Ma, Zhiying Jiang, Jinyuan Liu, Yang Zou
    Infrared and Visible Image Fusion (IVIF) has shown promise in visual tasks under challenging environments, but fusion under unregistered conditions faces inherent misalignments. Current approaches either predict the deformation parameters in a coarse-to-fine manner (i.e., coarse registration followed by fine registration) or estimate the deformation fields at multiple scales for registration. Though straightforward, they overlook the cumulative errors in registration, which contaminate the fusion stage and severely deteriorate the resulting images. We introduce the Spatial-Frequency Registration and Fusion (SFRF) framework, which incorporates uncertainty estimation and infrared thermal radiation distribution consistency into a unified pipeline to handle the error accumulation for robust registration and fusion across both spatial and frequency domains. Specifically, SFRF constructs a Multi-scale Iterative Registration (MIR) framework that iteratively refines the deformation field across scales, leveraging uncertainty estimation at each stage to mitigate error accumulation and enhance alignment accuracy dynamically. To ensure the accurate alignment of infrared thermal distributions during registration, thermal radiation distribution consistency is employed as a frequency-domain supervisory signal, promoting global consistency in the frequency domain. Based on the spatial-frequency alignment, SFRF further adopts a Dual-branch Spatial-Frequency Fusion (DSFF) module, which incorporates spatial geometric features and frequency distribution information to reconstruct visually appealing images. SFRF achieves impressive performance across diverse datasets. Code is available at github.com/xhhaoyan/SFRF.
    Computer VisionLow-level Vision
  195. #2112

    Stabilizing Welfare-Maximizing Decisions via Endogenous Transfers

    Joshua Kavner
    Many multiagent systems depend on collective decisions made by self-interested agents, which raises deep questions about coalition formation and stability. We study social choice with endogenous, outcome-contingent transfers, where agents voluntarily form contracts that redistribute utility depending on the collective decision, allowing for fully strategic coalition formation. We show that under consensus rules, individually rational strong Nash equilibria (IR–SNE) always exist, implementing welfare-maximizing outcomes with feasible transfers, and provide a simple, efficient algorithm to construct them. For more general anonymous, monotonic, and resolute rules, we identify necessary conditions for viable deviations, significantly limiting the possibility of destabilizing coalitions. By bridging cooperative and noncooperative perspectives, our approach shows that transferable utility can achieve core-like stability, restoring efficiency and budget balance even where classical impossibility results generally apply. Overall, this framework offers a practical and robust way to coordinate large-scale strategic multiagent systems.
    Agent-based and Multi-agent SystemsMulti-agent planningGame Theory and Economic ParadigmsComputational social choiceGame Theory and Economic ParadigmsMechanism design
  196. #2140

    Efficient Time Series Clustering from Multiscale Reservoir Dynamics with Granular-Ball Anchoring Graph Optimization

    Yifan Wang, Lifeng Shen, Shuyin Xia, Yi Wang
    Time-series clustering remains challenging due to the inherent trade-off between clustering effectiveness and computational efficiency. Similarity-based methods often suffer from quadratic complexity caused by pairwise distance computations, while deep learning–based approaches typically rely on costly iterative training and a large number of trainable parameters. In this paper, we propose MSRGC-Net, an efficient time-series clustering framework that integrates multiscale reservoir computing, granular-ball-based anchoring graph construction, and consensus learning. MSRGC-Net adopts a training-free reservoir computing paradigm to extract multiscale temporal representations from raw time series without backpropagation, significantly reducing computational overhead. To capture the intrinsic structure of the resulting representations, granular-ball computing is employed to adaptively model data distributions via density-consistent regions, yielding compact and robust anchor graph representations. Furthermore, a consensus-based anchoring graph optimization strategy is introduced to effectively align multiscale reservoir representations and integrate complementary information across temporal scales. Extensive experiments on widely used univariate and multivariate benchmark datasets demonstrate that MSRGC-Net consistently outperforms state-of-the-art methods in clustering performance while maintaining superior computational efficiency.
    Machine LearningClustering
  197. #2141

    scGTN: Deep Siamese Graph Transformer Network for Single-cell RNA Sequencing Clustering

    Jinke Wu, Yifan Wang, Siyu Yi, Caiyang Yu, Ziyue Qiao, Nan Yin, Jiancheng Lv, Wei Ju
    Single-cell RNA sequencing (scRNA-seq) plays a pivotal role in characterizing gene expression at the cellular level, enabling the identification of cell types and advancing the understanding of cellular heterogeneity. Despite the significant progress in scRNA-seq data clustering, we argue that current methods always ignore the sparsity and noise, as well as the complex intercellular structural information inherent in scRNA-seq data. To this end, we propose a novel single-cell RNA-seq clustering framework via a deep Siamese Graph Transformer Network (termed scGTN), which explicitly integrates gene expression profiles and intercellular structural dependencies for cell clustering. In particular, we formulate scRNA-seq data as a graph and construct two augmented graph views that serve as dual views to capture complementary intercellular information. Then, a Siamese graph transformer network is employed to explicitly incorporate shortest-path information and node-wise distances for capturing richer structural relationships between cells. Finally, we employ an optimal transport strategy to guide the cell clustering in a self-supervised manner. Extensive experiments on multiple benchmark scRNA-seq datasets demonstrate that our scGTN consistently outperforms existing methods. Our code is available at https://github.com/W-RMSL/scGTN.
    Data MiningMining graphsMultidisciplinary Topics and ApplicationsBioinformatics
  198. #2145

    Collateral Damage Constrained Backdoor Attacks on Graph Neural Networks

    Di Jin, Zechuan Zhang, Bingdao Feng, Xiaobao Wang, Dongxiao He, Zhen Wang
    Graph Neural Networks (GNNs) are vulnerable to backdoor attacks, where models behave normally on clean data but exhibit targeted misclassifications once specific triggers are activated. Existing backdoor attacks on GNNs mainly focus on enhancing trigger stealthiness or diversifying attack paradigms. However, these methods overlook a fundamental property of GNNs: trigger-induced malicious signals inevitably propagate through graph neighborhoods, causing unintended mispredictions on clean nodes, i.e., Collateral Damage. To address this issue, we propose the Collateral Damage Constrained Graph Backdoor Attack (CDCA), a novel framework that explicitly controls malicious diffusion. Specifically, the proposed method combines neighborhood-aware target node selection with a self-constrained trigger generation strategy to suppress trigger-induced propagation by enforcing prediction consistency on clean K-hop neighboring nodes. Extensive experiments on real-world datasets demonstrate that the proposed method remains effective while significantly reducing collateral damage.
    Data MiningMining graphsMachine LearningAdversarial machine learning
  199. #2148

    A Unified Prompt for Enhancing Heterogeneous Graph Pre-training via Edge-based Message Passing

    Fengyu Yan, Xiaobao Wang, Qianhua Tang, Dongxiao He, Di Jin
    Inspired by natural language processing prompt learning, recent heterogeneous graph prompt-tuning methods have been developed to better align pre-trained models with downstream tasks. However, existing heterogeneous prompt methods primarily focus on holistic framework design, causing prompts to heavily depend on specific pre-trained models and thus limiting generalization. To address this, we focus on the common encoder module shared across pre-trained models and the multi-relational edge structure unique to heterogeneous graphs, proposing a novel heterogeneous graph prompt-tuning method named HGMRP. It simultaneously resolves the issue of inconsistent objectives during prompt optimization when messages propagate across different relational edges. By assigning learnable prompt vectors to different types of edges, HGMRP integrates multi-relational prompts directly into the message-passing process. Furthermore, we extend single-relation prompts into a composite structure that includes both relation-specific and shared components, thereby enhancing expressive capability. Extensive experiments conducted on four widely used heterogeneous graph datasets show that HGMRP can adapt to various types of pre-trained models and significantly outperforms existing heterogeneous prompt methods, validating its effectiveness and superiority.
    Data MiningMining graphsData MiningMining heterogenous data
  200. #2149

    Shot-Conditioned Vision-Language Adaptation for Effective Harmful Content Detection from Online Short Videos

    Shuai Xu, Zao Qiu, Xuelin Zhu, Yicong Li
    Short video harmful content detection aims to automatically identify diverse anomalies from user-generated media. This task presents unique challenges due to frequent editing cuts and highly variable anomaly densities, limiting the effectiveness of traditional surveillance-based approaches. Moreover, existing Vision-Language Model-based approaches typically rely on rigid instance selection mechanisms that fail to adapt to the unpredictable duration of anomalies in such unconstrained videos. To address these issues, we propose SVLA, a Shot-conditioned Vision-Language Adaptation framework, for effectively detecting harmful content in online short videos. Our approach introduces a novel π-adaptive strategy to dynamically estimate shot-level anomaly density, replacing rigid selection with calibrated supervision. Furthermore, we employ a shot-conditioned temporal encoder to respect video hierarchy and adopt a dual-path contextual adapter to resolve semantic ambiguity. To benchmark this task, we construct a new dataset (SVA) covering more genuine online short videos that involve seven anomaly categories. Experiments on the SVA dataset demonstrate that SVLA achieves state-of-the-art performance and outperforms its competitors across diverse scenarios. Code and datasets are available at: https://github.com/xushuai7/IJCAI-SVLA.
    Computer VisionImage and video retrievalData MiningAnomaly/outlier detectionHumans and AIApplications
  201. #2156

    Amortized Multi-Objective Optimization Across Tasks with Generative Solution Modeling

    Tingyang Wei, Jiao Liu, Abhishek Gupta, Chin Chun Ooi, Puay Siew Tan, Yew-Soon Ong
    Many real-world applications require solving families of expensive multi-objective optimization problems (EMOPs) under varying operational conditions. This can be formulated as parametric expensive multi-objective optimization problems (P-EMOPs) where each task parameter defines a distinct optimization instance. Current multi-objective Bayesian optimization methods have been widely used for finding finite sets of Pareto optimal solutions for each task. However, P-EMOPs present a fundamental challenge: the continuous task parameter space can contain infinitely many distinct problems, each requiring separate expensive evaluations. To address this, we propose learning an inverse model to amortize the multi-objective optimization cost across the continuous task-preference space, enabling direct solution prediction for any query without the need for expensive re-evaluation. This paper introduces a novel parametric multi-objective Bayesian optimizer that learns this inverse model by alternating between (1) generative solution sampling via conditional generative models and (2) acquisition-driven search leveraging inter-task synergies. This approach enables effective optimization across multiple tasks and ultimately achieves direct solution prediction for unseen parameterized EMOPs without re-evaluations. We theoretically justify the faster convergence by leveraging inter-task synergies through task-aware Gaussian processes. Empirical studies on synthetic and real-world benchmarks further verify the effectiveness of the proposed parametric optimizer.
    SearchEvolutionary computationSearchSearch and machine learningUncertainty in AISequential decision making
  202. #2160

    HyLoVQA: Dynamic Hypernetwork-Generated Low-Rank Adaptation for Continual Visual Question Answering

    Yiran Wang, Chenyi Xiong, Ziyue Qin, Miao Zhang, Kui Xiao, Zhifei Li
    Continual Visual Question Answering (VQA) requires learning from non-stationary streams of visual inputs and questions while preserving past knowledge. Most prior methods adapt by updating a largely shared parameter set. This often leads to cross-level task interference, hindering accurate adaptation to the current task and object. To address this limitation, we propose HyLoVQA. It maintains a drift-resilient memory bank of anchors. The bank stores the content of visual objects and textual tasks, and they are updated using current input features. Conditioned on retrieved anchors, a hypernetwork generates lightweight Low-Rank Adaptation (LoRA) adapters. This ensures parameter efficiency, allowing the model to adapt to each task and object dynamically. Additionally, we formulate an alignment loss that aligns semantic discrepancies in the feature space with functional changes in the parameter space, thereby constraining LoRA adapters to remain focused on the current task and object. Extensive experiments on VQA v2 and NExT-QA under both standard and compositional settings demonstrate the superiority of HyLoVQA over prior state-of-the-art methods. The code is available at https://github.com/HubuKG/HyLoVQA.
    Computer VisionImage and video retrievalComputer VisionVision, language and reasoning
  203. #2170

    How Well Do Large-Scale Chemical Language Models Transfer to Downstream Tasks?

    Tatsuya Sagawa, Ryosuke Kojima
    Chemical Language Models (CLMs) pre-trained on large-scale molecular data are widely used for molecular property prediction. However, the common belief that increasing training resources, such as model size, dataset size, and training compute, improves both pre-training loss and downstream task performance has not been systematically validated in the chemical domain. In this work, we evaluate this assumption by pre-training CLMs while scaling training resources and measuring transfer performance across diverse molecular property prediction (MPP) tasks. We find that while pre-training loss consistently decreases with increased training resources, downstream task performance shows limited improvement. Moreover, alternative metrics based on the Hessian or loss landscape also fail to estimate downstream performance in CLMs. We further identify conditions under which downstream performance saturates or degrades despite continued improvements in pre-training metrics, and analyze the underlying task-dependent failure modes through parameter space visualizations. These results expose a gap between pre-training-based evaluation and downstream performance, and emphasize the need for model selection and evaluation strategies that explicitly account for downstream task characteristics. The code is available at https://github.com/sagawatatsuya/MolScaleTransfer.
    Machine LearningApplicationsMachine LearningLearnware/model reuse/transfer learningNatural Language ProcessingLanguage models
  204. #2173

    MeteGS: Meteorology-Guided Gaussian Splatting for Scene Rendering and Recovery in Adverse Weather Conditions

    Sha Fan, Xinhua Shan, Mingyu Liang, Ningjie Bao, Wei Liu, Ying Fu
    3D Gaussian Splatting enables efficient, high-fidelity novel view synthesis with explicit Gaussians and differentiable rendering. However, adverse weather introduces rain streaks and droplets as well as volumetric scattering, producing view-dependent, spatially varying degradations that break the clean multi-view consistency assumption and lead to geometric drift and unstable appearance. Existing approaches are largely confined to the 2D image domain and seldom model the 3D degradation formation process, limiting controllable weather synthesis and consistent restoration in 3DGS. To our knowledge, we are the first to incorporate real meteorological observations as an external prior and to propose a 3D-level unified rendering–restoration framework for joint deraining and dehazing within a single 3DGS pipeline. Precipitation is mapped to interpretable parameters controlling the density and morphology of rain Gaussians, while an atmospheric-scattering extinction strength drives 3D fog generation for reproducible weather modeling. We adopt a closed-loop dual-branch optimization where the rendering branch fits degraded observations to capture weather degradation patterns, and the restoration branch regularizes scene Gaussians toward a clean domain with multi-scale perceptual consistency, suppressing artifacts and improving cross-view detail fidelity. Experiments across diverse scenes and weather conditions show that our method consistently outperforms strong baselines on all three metrics, with a particularly notable average PSNR improvement of 1.78 dB.
    Computer Vision3D computer vision
  205. #2187

    Single-Round Clustered Federated Learning via Data Collaboration Analysis for Non-IID Data

    Sota Sugawara, Yuji Kawamata, Akihiro Toyoda, Tomoru Nakayama, Yukihiko Okada
    Federated Learning (FL) enables distributed learning across multiple clients without sharing raw data. When statistical heterogeneity across clients is severe, Clustered Federated Learning (CFL) can improve performance by grouping similar clients and training cluster-wise models. However, most CFL approaches rely on multiple communication rounds for cluster estimation and model updates, which limits their practicality under tight constraints on communication rounds. We propose Data Collaboration-based Clustered Federated Learning (DC-CFL), a single-round framework that completes both client clustering and cluster-wise learning, using only the information shared in DC analysis. DC-CFL quantifies inter-client similarity via total variation distance between label distributions, estimates clusters using hierarchical clustering, and performs cluster-wise learning via DC analysis. Experiments on multiple open datasets under representative non-IID conditions show that DC-CFL achieves accuracy comparable to multi-round baselines while requiring only one communication round. These results indicate that DC-CFL is a practical alternative for collaborative AI model development when multiple communication rounds are impractical. Our source code is publicly available at https://github.com/soutasuga/DC-CFL.
    Machine LearningFederated learningMachine LearningClusteringMachine LearningClassification
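    The clustering step named in the abstract above (total variation distance between label distributions, followed by hierarchical clustering) can be illustrated with a minimal sketch. The abstract does not say which linkage or stopping rule DC-CFL uses, nor how label distributions are obtained from DC analysis, so the average linkage, the `threshold` parameter, and the example client data below are all assumptions made for illustration.

```python
from itertools import combinations


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def cluster_clients(label_dists, threshold):
    """Average-linkage agglomerative clustering on TV distance.

    Merging stops once the closest pair of clusters is farther apart than
    `threshold` (a hypothetical stopping rule chosen for this sketch).
    """
    clusters = [[i] for i in range(len(label_dists))]

    def linkage(a, b):
        # Average pairwise TV distance between members of two clusters.
        dists = [total_variation(label_dists[i], label_dists[j])
                 for i in a for j in b]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Four hypothetical clients over three classes, forming two label-skew groups.
clients = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.1, 0.2, 0.7],
]
groups = cluster_clients(clients, threshold=0.3)
```

    With these inputs the two label-skew groups end up in separate clusters; a cluster-wise model would then be trained per group.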
  206. #2191

    Temporal Smoothness Doubly Robust Learning for Debiased Knowledge Tracing

    Peilin Zhan, Wei Chen, Weilin Chen, Shuyi Pan, Ruichu Cai
    Knowledge Tracing (KT) is fundamental to intelligent education systems, yet relies on educational logs that are selectively observed. The non-random nature of exercise recommendations and student choices inevitably induces severe selection bias. Most existing KT methods neglect this issue, training on observed logs using standard empirical risk, which yields biased mastery estimates and accumulates errors in subsequent recommendations. To address this, we introduce a doubly robust (DR) formulation for KT that integrates a propensity model with an error imputation model, theoretically guaranteeing unbiasedness if either model is accurate. Beyond unbiasedness, in the sequential setting of KT, we identify that the estimator's performance is compromised by variance-dependent stochastic deviations that accumulate over time, thereby causing training instability and limiting performance. To mitigate this, we derive a generalization bound that explicitly characterizes the impact of estimator variance and identifies temporal smoothness as a key factor in controlling it. Building on these theoretical insights, we propose the Temporal Smoothness Doubly Robust (TSDR) framework. TSDR jointly optimizes the KT predictor and the imputation model with a smoothness regularizer, effectively reducing variance while preserving the unbiasedness guarantee of DR. Experiments on multiple real-world benchmarks demonstrate that TSDR consistently enhances various state-of-the-art KT backbones, underscoring the vital role of principled bias correction in KT.
    Humans and AIComputer-aided educationMachine LearningApplicationsMachine LearningLearning theoryMultidisciplinary Topics and ApplicationsEducation
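    As a rough illustration of the doubly robust idea mentioned in the abstract above, the sketch below implements the standard DR estimator of mean prediction error: an imputed error on every entry plus an inverse-propensity-weighted correction on observed entries. The `smoothness_penalty` function is a hypothetical stand-in (squared first differences of imputed errors over time); the abstract does not specify TSDR's actual regularizer, and all names and numbers here are assumptions.

```python
def dr_estimate(errors, imputed, observed, propensities):
    """Doubly robust estimate of the mean prediction error over all entries.

    errors[i]       true error; only used where observed[i] == 1
    imputed[i]      imputation model's guess of the error
    observed[i]     1 if the interaction was logged, else 0
    propensities[i] estimated probability that entry i is observed
    """
    total = 0.0
    for e, e_hat, o, p in zip(errors, imputed, observed, propensities):
        # Imputed error everywhere, plus an inverse-propensity-weighted
        # correction on the entries that were actually observed.
        total += e_hat + o * (e - e_hat) / p
    return total / len(errors)


def smoothness_penalty(imputed):
    """Hypothetical temporal smoothness term: squared first differences."""
    return sum((b - a) ** 2 for a, b in zip(imputed, imputed[1:]))


# Illustrative values only.
true_errors = [0.2, 0.5, 0.1, 0.4]
observed = [1, 0, 1, 0]
wrong_propensities = [0.9, 0.3, 0.6, 0.2]

# If the imputation model is exact, the estimate equals the true mean
# error even though the propensities are arbitrary.
estimate = dr_estimate(true_errors, true_errors, observed, wrong_propensities)
penalty = smoothness_penalty(true_errors)
```

    The example exercises one half of the "either model is accurate" property: with exact imputation, the correction term vanishes and the propensities no longer matter.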
  207. #2195

    Snap2Review: Vision-Grounded Retrieval and Pairwise Preference Alignment for Personalized Reviews Generation

    Honggyan Xu, Lei Zou, Dexiang Zhao, Xin Fan, Chuanwen Luo
    Personalized review generation is vital for e-commerce engagement, yet integrating user-provided images while maintaining distinct user personas remains a critical challenge. Existing methods often struggle to bridge the semantic gap between objective visual signals and subjective linguistic patterns, frequently resulting in hallucinations or homogenized, impersonal content. To address these limitations, we propose Snap2Review, a novel framework that harmonizes cross-modal retrieval with fine-grained preference learning. First, we optimize a cross-modal retriever that maps visual features directly to semantically related historical reviews, using these interactions as precise anchors to strictly ground the generation. Second, we introduce Tri-DPO, which employs a stratified negative sampling strategy with varying difficulty levels, ranging from obvious errors to subtle stylistic mismatches, to force the model to discern fine-grained user preferences. By distinguishing these fine-grained nuances, our model moves beyond generic praise to accurately mimic the user's authentic writing style. Extensive experiments on real-world datasets demonstrate that Snap2Review significantly outperforms strong baselines in both relevance and personalization metrics.
    Data MiningInformation retrievalData MiningRecommender systemsKnowledge Representation and ReasoningPreference modelling and preference-based reasoningMachine LearningMulti-modal learningNatural Language ProcessingLanguage generation
  208. #2199

    Joint Medical Image Enhancement and Segmentation with Diffusion-based Symbiotic Information Interaction

    Ying Chen, Jinyue Li, Qiankun Li
    Image quality is critical for accurate medical diagnosis. However, MRI, CT, and ultrasound images are often of low resolution and quality due to cost constraints, complicating the visualization of key anatomical structures and lesions. While such limitations are common in practice, traditional methods treat image enhancement as a separate preprocessing step, failing to fully leverage its potential synergy with image segmentation. To address this, we propose DiSIINet (Diffusion-based Symbiotic Information Interaction Network), which is built on the principle that enhancement and segmentation should mutually reinforce each other in a unified model. Based on Denoising Diffusion Implicit Models (DDIM), DiSIINet integrates an enhancement branch and a segmentation branch. These branches interact through a novel Symbiotic Information Interaction (SII) module, which facilitates dynamic, feature-level information exchange via cross-attention during the reverse diffusion process. This design enables both tasks to iteratively improve each other. The DDIM backbone ensures high-quality output and efficient inference through deterministic sampling. Experiments on multi-modal medical datasets (MRI, CT, ultrasound) show that DiSIINet achieves significant performance improvements compared to sequential or independent enhancement and segmentation approaches. The code is available at: https://github.com/Reconsider80/DiSIINet.
    Computer VisionBiomedical image analysis
  209. #2212

    Dual-Topology Learning with Adaptive Anchors for Multi-View Clustering

    Chenglong Zhang, Chao Zhang, Junhao Zhang, Junyi Guan, Xianzhong Zhou, Bo Wang, Huaxiong Li
    As a prominent paradigm for large-scale unsupervised learning, anchor-based multi-view clustering aims to reveal the latent structures across heterogeneous data representations with high efficiency. Despite achieving some progress, existing methods typically suffer from the following two limitations. Firstly, they rely on pre-constructed anchors or rigid constraints (e.g., orthogonality), whereas the intrinsic topological correlations among anchors are completely discarded. Secondly, most existing methods either perform clustering directly on the bipartite graph or treat different views equally, thereby failing to capture the global structural information. To this end, the Dual-tOpology learning with adapTive Anchors (DOTA) is proposed, which not only learns the sample-anchor relationship (i.e., bipartite graph) but also preserves the topology structure among anchors, significantly enhancing the discriminability of learned representation while preserving the underlying data manifold. By integrating an adaptive view-weighting strategy to balance view contributions, DOTA derives discriminative global sample embeddings by propagating spectral information through the bipartite graph. Extensive experiments on benchmark datasets demonstrate the superiority of DOTA.
    Machine LearningClusteringMachine LearningMulti-view learningMachine LearningUnsupervised learning
  210. #2239

    World4V2X: A Consistency-driven World Model for Robust V2X Cooperative Perception

    Rui Wang, Shuai Wang, Xiangyi Qin, Ze Yu, Xiaojun Tan
    Cooperative perception allows agents to extend perceptual capabilities through inter-agent communication. However, most existing methods still adopt a frame-wise paradigm, which limits the exploitation of spatio-temporal consistency in dynamic scenes. Therefore, when observations are partially occluded or degraded, these methods are unable to leverage historical context for compensation, leading to unstable perception and reduced detection accuracy. To address these challenges, we propose World4V2X, the first world model framework tailored for V2X cooperative perception. The proposed method first introduces a spatial observability modeling module that defines spatial consistency boundaries to distinguish reliable regions from uncertain ones, thereby enabling spatial consistency modeling over multi-agent heterogeneous observations. Building upon this, we construct a consistency-guided world modeling module. Specifically, a temporal consistency evolution mechanism leverages features from historical and current frames to model the state evolution of the scene, capturing environmental dynamics and producing temporal consistency beliefs. Meanwhile, a consistency-guided deterministic reconstruction mechanism exploits spatial boundaries and temporal beliefs to perform diffusion-based refinement for robust cooperative perception. Extensive experiments on the OPV2V and V2XSet datasets demonstrate that World4V2X achieves state-of-the-art perception performance across diverse V2X scenarios.
    Computer VisionRecognition (object detection, categorization)Multidisciplinary Topics and ApplicationsTransportationRoboticsPerception
  211. #2240

    FreSH: Frequency-Segmented Hierarchical Multi-Expert Framework for Multivariate Time Series Classification

    Pingping Liu, Muyao Wang, Zijian Zhang, Tongshun Zhang, Hao Miao, Guorui Xie, Qingliang Li, Qiuzhan Zhou
    Multivariate Time Series Classification (MTSC) demands models that can effectively capture complex temporal patterns across multiple scales while remaining computationally efficient. However, existing approaches generally struggle to reconcile fine-grained representation learning with computational efficiency, especially under class imbalance and real-world constraints. In this paper, we present FreSH, a Frequency-Segmented Hierarchical Multi-Expert Framework designed to address these challenges. FreSH introduces a new perspective for MTSC by enabling adaptive, multi-scale analysis of temporal signals, allowing different aspects of the data to be modeled in a complementary and coordinated manner. By combining localized specialization with holistic context modeling, FreSH achieves strong representational capacity without incurring excessive computational overhead. An adaptive fusion strategy further enhances flexibility, enabling the model to dynamically emphasize the most informative components of the input. In addition, we incorporate a more robust optimization objective that improves learning stability across varying sample difficulties and class distributions. Extensive evaluations on 30 UEA benchmark datasets and real-world vibration data demonstrate that FreSH consistently outperforms state-of-the-art methods in classification accuracy, while substantially reducing model size and improving efficiency. The implementation code is publicly available at https://github.com/Wangmy2120/FreSH00.
    Data MiningMining spatial and/or temporal dataMachine LearningClassification
  212. #2248

    Dual-Process Distribution Calibration: Bridging Slow-Fast Thinking for Few-Shot Learning

    Yuchen Liu, Weining Weng, Lingxing Chen, Qianzhong Chen, Shiyang Li, Yuan Ma, Yang Gu
    Artificial intelligence models typically perform well on large-scale datasets, yet their effectiveness tends to degrade in real-world scenarios with scarce data, such as medical diagnostics. In contrast, humans can learn and reason effectively from few examples. Even when novel objects differ significantly from prior observations, humans can quickly infer category distributions accurately. Inspired by this, we explore how to leverage human-inspired cognitive mechanisms to improve the distribution calibration in few-shot learning models. Based on the "fast-slow thinking" dual-process theory, we propose a novel cognition-inspired few-shot learning framework. It mimics human cognition when handling novel information: it first rapidly screens relevant knowledge through "fast thinking" and then infers inter-class relationships and calibrates distributions via "slow thinking". Specifically, the fast-thinking stage employs a gating mechanism to quickly match input samples with known base-class prototypes, activating relevant candidate knowledge. In the slow-thinking stage, the model aggregates inter-class edge embeddings into a summary relation graph and then applies divergent Gaussian sampling to generate multiple relation graphs representing different association strengths, thereby achieving distribution calibration for few-shot classes. Experiments on public few-shot benchmarks and medical image datasets show competitive performance. Visualization analyses further reveal that our framework exhibits meaningful interpretability.
    Humans and AICognitive modelingHumans and AIBrain sciencesMachine LearningFew-shot learningMachine LearningVariational Inference
  213. #2258

    SCD-MVC: Stable Conditional Diffusion for Multi-view Clustering

    Jinli Ma, Chenkai Guo, Renda Han, Renxiang Guan, Siwei Wang, Ke Liang, Xiaoyu Cui, Dayu Hu
    Multi-view clustering has garnered significant attention for its ability to integrate heterogeneous data and uncover underlying categorical structures. However, prevailing autoencoder-based methods often yield indiscriminative embeddings, rendering them prone to trivial solutions. While diffusion models have shown promise in modeling complex data distributions, their generative processes remain inherently unstable. More critically, the resulting latent representations often lack sufficient intra-class compactness and inter-class separability, leading to suboptimal clustering performance. In this paper, we propose SCD-MVC, a Stable Conditional Diffusion-driven framework for Multi-View Clustering. Specifically, it integrates conditional diffusion generation into the latent encoder to generate a target view conditioned on the remaining views, thereby facilitating robust joint distribution modeling. To align the generative process with clustering objectives, we introduce a clustering-guided semantic alignment module. This module regulates the diffusion path to enforce the learning of representations that are both view-consistent and discriminative. Furthermore, we introduce a temporal consistency regularization that enforces alignment between representations of adjacent diffusion steps. This stabilizes training and guides the optimization toward clustering-friendly representations. Extensive experiments on nine benchmark datasets demonstrate that SCD-MVC consistently outperforms eight state-of-the-art methods, achieving performance gains ranging from 6.68% to 13.09%. The source code is available at https://github.com/Knighttt0011/SCD-MVC.
    Machine LearningClustering
  214. #2265

    Beyond Uniform Updates: Drift Pattern Aware Online Time Series Forecasting Under Delayed Feedback

    Xingwang Li, Fei Teng, Cong Zhou, Qiang Duan
    Online time series forecasting relies on continual updates to cope with concept drift. In multi-step forecasting, however, the ground truth for an H-step prediction arrives only after H steps, so a delayed residual entangles persistent drifts with transient shocks and seasonal fluctuations. Existing methods typically apply a uniform update rule to all delayed errors, which can overreact to noise and under-adapt to real drift. We view delayed residuals as compressed observations of latent drift over the horizon, and propose PADRE, a drift-evidence-driven framework that converts each delayed feedback event into a context-sensitive adaptation decision. PADRE builds a Tri-View drift representation from (i) the current input context, (ii) recent delayed-residual trajectories, and (iii) retrieved typical types of drift, and quantifies their agreement via a geometric consistency measure. To exploit recurring drift structure, PADRE mines an offline Pattern Bank of regime-conditioned residual prototypes and retrieves a soft pattern prompt online. A lightweight prompt-guided policy outputs a continuous update gate that scales the backbone’s gradient step, enabling decisive updates under coherent evidence and conservative updates otherwise. Experiments on five real-world benchmarks and three backbones demonstrate consistent gains under delayed feedback, reducing MSE by 6%–15% relative to recent baselines and improving update stability across horizons.
    Machine LearningMulti-view learningMachine LearningTime series and data streams
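    The idea of a continuous gate that scales a gradient step according to how well several drift views agree can be sketched as follows. This is not PADRE's learned policy: the abstract does not define its geometric consistency measure or gate, so the mean pairwise cosine similarity, the sigmoid mapping, and the `sharpness` and `midpoint` parameters are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def update_gate(views, sharpness=5.0, midpoint=0.5):
    """Map agreement among drift views to a gate in (0, 1).

    Agreement is the mean pairwise cosine similarity (an assumed measure).
    """
    pairs = [(i, j) for i in range(len(views)) for j in range(i + 1, len(views))]
    agreement = sum(cosine(views[i], views[j]) for i, j in pairs) / len(pairs)
    return 1.0 / (1.0 + math.exp(-sharpness * (agreement - midpoint)))


def gated_step(params, grads, gate, lr=0.1):
    """Plain gradient step whose size is scaled by the gate."""
    return [w - lr * gate * g for w, g in zip(params, grads)]


# Hypothetical view embeddings: one coherent set, one conflicting set.
coherent = [[1.0, 0.0], [1.0, 0.1], [0.9, 0.0]]
conflicting = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]

gate_hi = update_gate(coherent)       # close to 1: take a decisive step
gate_lo = update_gate(conflicting)    # close to 0: update conservatively
new_params = gated_step([1.0], [2.0], gate=0.5)
```

    When the views point in similar directions the gate approaches 1 and the update is nearly a full gradient step; when they disagree the gate shrinks the step toward zero.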
  215. #2287

    DiSGMM: A Method for Time-varying Microscopic Weight Completion on Road Networks

    Yan Lin, Jilin Hu, Shengnan Guo, Christian S. Jensen, Youfang Lin, Huaiyu Wan
    Microscopic road-network weights represent fine-grained, time-varying traffic conditions obtained from individual vehicles. An example is travel speeds associated with road segments as vehicles traverse them. These weights support tasks including traffic microsimulation and vehicle routing with reliability guarantees.
    We study the problem of time-varying microscopic weight completion. During a time slot, the available weights typically cover only some road segments. Weight completion recovers distributions for the weights of every road segment at the current time slot.
    This problem involves two challenges: (i) contending with two layers of sparsity, where weights are missing at both the network layer (many road segments lack weights) and the segment layer (a segment may have insufficient weights to enable accurate distribution estimation); and (ii) achieving a weight distribution representation that is closed-form and can capture complex conditions flexibly, including heavy tails and multiple clusters.

    To address these challenges, we propose DiSGMM, which combines sparsity-aware embeddings with spatiotemporal modeling to leverage sparse known weights alongside learned segment properties and long-range correlations for distribution estimation. DiSGMM represents distributions of microscopic weights as learnable Gaussian mixture models, providing closed-form distributions capable of capturing complex conditions flexibly. Experiments on two real-world datasets show that DiSGMM can outperform state-of-the-art methods.
    Data MiningMining spatial and/or temporal dataMultidisciplinary Topics and ApplicationsTransportation
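    To make concrete what a closed-form Gaussian mixture representation offers, the sketch below evaluates the density and cumulative distribution of a one-dimensional mixture using only the error function. It illustrates the representation only, not DiSGMM's learning procedure; the mixture parameters and the speed example are hypothetical.

```python
import math


def gmm_pdf(x, weights, means, stds):
    """Density of a 1-D Gaussian mixture at x."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, stds)
    )


def gmm_cdf(x, weights, means, stds):
    """P(X <= x) for a 1-D Gaussian mixture, via the error function."""
    return sum(
        w * 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
        for w, m, s in zip(weights, means, stds)
    )


# Hypothetical speed distribution (km/h) on one road segment: a congested
# cluster around 15 and a free-flow cluster around 50.
weights, means, stds = [0.3, 0.7], [15.0, 50.0], [4.0, 6.0]
p_slow = gmm_cdf(30.0, weights, means, stds)  # probability of speed <= 30
```

    Because both functions are closed-form, quantities such as the probability of falling below a speed threshold can be computed directly, which is the kind of query that reliability-aware routing needs; multiple clusters are captured by the separate components.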
  216. #2307

    PILO: Principal Component-based Implicit Regularization with Low-rank Optimization for Robust Transfer Learning

    Shuaihe Liu, Qiugang Zhan, Guisong Liu, Tai-Xiang Jiang
    Adapting large, adversarially pre-trained models to specialized domains via transfer learning is a promising path toward building secure AI systems. However, a critical challenge arises when fine-tuning on limited downstream data: models often suffer from catastrophic forgetting of robustness, where the generalizable robust features from pre-training are erased, leading to severe robust overfitting. To address this, we first establish the underlying principle that adversarial vulnerability is not diffuse, but is concentrated in a low-dimensional subspace defined by the principal components of the model's weight matrices. We then introduce Principal component-based Implicit regularization with Low-rank Optimization (PILO), a novel parameter-efficient framework that operationalizes this insight. PILO employs a principled decoupling of learning objectives: a spectrally-guided adversarial branch, initialized via Singular Value Decomposition (SVD), performs a surgical update on the identified vulnerable subspace to enhance robustness. This is complemented by a transient, standard low-rank branch that gently adapts the model to natural data, preserving clean accuracy. This dual-branch design acts as a form of implicit spectral regularization, anchoring the model to its robust pre-trained state. Extensive experiments show that PILO significantly outperforms state-of-the-art full-parameter and parameter-efficient methods in robust accuracy across multiple benchmarks, all while reducing trainable parameters by up to 71%. Our work thus establishes a new, more effective paradigm for robust transfer learning through principled and targeted parameter optimization.
    Computer VisionAdversarial learning, adversarial attack and defense methodsMachine LearningAdversarial machine learning
  217. #2327

    DSSG: Dual-Stream Semantic Guidance for Source-Fully-Free Adaptation of Vision-Language Models

    Weiwei Xiang, Shun Peng, Guangyi Xiao, Hao Chen, Lei Yang
    Source-Fully-Free Domain Adaptation (SFF-DA) has emerged as a strategic paradigm to adapt Vision-Language Models (VLMs) without any access to source data or task-specific source models. However, we identify a critical "Dual Semantic Drift" that hinders this process: static drift caused by the stagnation of frozen hand-crafted templates, and dynamic drift resulting from noisy, instance-level generated captions. These issues lead to a severe stability-plasticity dilemma, manifesting as semantic misalignment and class collapse.
    To address this, we propose DSSG (Dual-Stream Semantic Guidance), an end-to-end framework that reconciles fine-grained plasticity with global stability. Our core contribution is the Dual Semantic Guidance (DSG) module, which integrates a captioning stream for detailed domain nuances with a template stream to anchor global categorical consistency. Furthermore, a Dynamic Cross-Modal Knowledge Distillation (CMKD) module is introduced to adaptively optimize teacher-student alignment across modalities. Extensive experiments demonstrate that DSSG significantly outperforms current state-of-the-art methods across multiple benchmarks, providing a robust solution for the unsupervised transfer of foundation models. The code is available at https://github.com/mrmenand/DSSG.
    Computer VisionMultimodal learningComputer VisionTransfer, low-shot, semi- and un- supervised learning
  218. #2328

    A Parallel Framework for the Maximum Common Induced Subgraph Problem

    Jieyu Wu, Quan Zhang, Yiyuan Wang, Shiwei Pan, Jian Gao
    Finding the maximum common induced subgraph (MCIS) between two graphs is a well-known NP-hard problem. While sequential MCIS algorithms have been extensively studied, parallel computing has emerged as an important direction for further performance enhancement with the advancement of computing resources. In this paper, we propose a parallel MCIS framework integrating a dynamic task decomposition method guided by search information and a novel pruning strategy based on shared information. The experimental results demonstrate that the algorithm enhanced by our framework achieves superior performance over both state-of-the-art sequential and existing parallel algorithms. Extensive results further demonstrate the wide scalability and generality of our framework, and the effectiveness of our strategies.
    SearchCombinatorial search and optimisationSearchHeuristic search
  219. #2342

    Computing Coverage-Based Prime Implicant Explanations for Tree-Based Models

    Gilles Audemard, Sylvie Coste-Marquis, Pierre Marquis, Mehdi Sabiri, Nicolas Szczepanski
    Coverage-based prime implicant explanations are formal explanations offering a number of valuable assets, especially in terms of faithfulness and generality. Unfortunately, deriving a coverage-based prime implicant explanation for an instance is computationally hard in the general case (the problem of identifying such an explanation being at the second level of the polynomial hierarchy).
    In this paper, we focus on the computation of a coverage-based prime implicant explanation for an instance given a tree-based model. We show that the specific nature of the domain theory linking the Boolean conditions in such models makes the problem computationally easier. We present a greedy algorithm to derive coverage-based prime implicant explanations when dealing with a tree-based model. We also present an empirical evaluation showing that this algorithm is efficient enough to be used in many practical cases.
    Machine LearningClassificationMachine LearningExplainable/Interpretable machine learning
  220. #2356

    Compact Modeling in Constraint Programming with Hybrid Tables

    Christophe Lecoutre, Mouny Samy Modeliar, Gilles Audemard, Nicolas Paris, Nicolas Szczepanski
    Hybrid tables, also referred to as 'smart' in the literature, represent a valuable modeling technique within Constraint Programming (CP). These tables allow us to handle disjunctive cases (constraining expressions) in a compact and structured way, by authorizing tuples (table entries) to contain simple unary and binary arithmetic restrictions (similar to internal constraints).
    In this paper, we show the practical interest of using hybrid tables for planning-like combinatorial puzzles, when the transition from one state to the next can be encoded by a single hybrid table.
    Experimental results show that these hybrid models exhibit greater compactness and efficiency in solving compared to their conventional counterparts.
    Constraint Satisfaction and OptimizationConstraint optimization problemsConstraint Satisfaction and OptimizationConstraint programmingConstraint Satisfaction and OptimizationConstraint satisfactionConstraint Satisfaction and OptimizationModeling
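    A small sketch may help show why tuples with unary and binary restrictions are compact. The encoding below is a hypothetical in-memory representation invented for this example (it is not the syntax of any CP solver or modeling format): an entry is a wildcard `"*"`, an integer, or a pair `(op, operand)` where the operand is either a constant or the name of another variable in the scope.

```python
import operator

OPS = {"eq": operator.eq, "ne": operator.ne, "lt": operator.lt,
       "le": operator.le, "gt": operator.gt, "ge": operator.ge}


def entry_holds(entry, value, assignment):
    """Check one table entry against the value of its variable."""
    if entry == "*":                  # wildcard: any value is allowed
        return True
    if isinstance(entry, int):        # ordinary entry: exact value
        return value == entry
    op, operand = entry               # restriction, e.g. ("ge", 3) or ("lt", "y")
    if isinstance(operand, str):      # binary: compare with another variable
        operand = assignment[operand]
    return OPS[op](value, operand)


def satisfies(scope, table, assignment):
    """True if at least one hybrid tuple accepts the assignment."""
    return any(
        all(entry_holds(e, assignment[var], assignment)
            for var, e in zip(scope, row))
        for row in table
    )


# z = max(x, y) with two hybrid tuples, instead of one ordinary tuple
# per (x, y) pair of domain values.
scope = ("x", "y", "z")
max_table = [
    ("*", ("le", "x"), ("eq", "x")),   # y <= x and z = x
    ("*", ("gt", "x"), ("eq", "y")),   # y >  x and z = y
]

ok = satisfies(scope, max_table, {"x": 3, "y": 5, "z": 5})
bad = satisfies(scope, max_table, {"x": 3, "y": 5, "z": 3})
```

    Each row is one disjunctive case, so the table grows with the number of cases rather than with the size of the variable domains.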
  221. #2358

    Learning to Route LLMs from Implicit Cost-Performance Preferences via Meta-Learning

    Jiahao Zeng, Ming Tang, Ningning Ding
    Large language models (LLMs) present a trade-off between performance and cost, where more powerful models incur greater expense. LLM routing aims to mitigate expenses while maintaining performance by sending queries to the most suitable model. However, existing methods do not perform well across differing user cost-performance preferences. To address this gap, we introduce a novel perceptive LLM routing paradigm for personalized and user-centric cost-performance optimization, which efficiently learns users' implicit preferences from only a few interactions. To handle the challenge of heterogeneous user needs, we formulate preference profiles as a set of distinct tasks in a contextual bandit setting and propose MetaRouter, a meta-learning framework designed for preference-aware LLM routing. Experimental results show that MetaRouter outperforms strong baselines on both in-distribution and out-of-distribution tasks. Furthermore, it exhibits high efficiency in learning user preferences, robustness to changes in the routable LLMs, and scalability to multi-model routing.
    Natural Language ProcessingApplicationsNatural Language ProcessingLanguage models
  222. #2392

    Absorbing Gradient Conflicts: Modeling Semantic Variance via Kent Distributions for Cross-Modal Hashing

    Hengjie Zhu, Dayan Wu, Zihao Zhang, Xinze Liu, Jingxuan Yu, Peng Fu, Zheng Lin, Weiping Wang
    Supervised proxy-based deep cross-modal hashing has become the dominant paradigm for large-scale retrieval. However, prevalent methods model class proxies as deterministic points in the embedding space. This rigid assumption causes severe gradient conflicts in multi-label scenarios, where label co-occurrence leads to gradient contention and optimization collapse. To resolve this, we propose Kent-based Distributional Proxy Hashing (KDPH), a novel framework that shifts proxy representation from static points to flexible anisotropic Kent distributions on the hypersphere. Unlike point proxies that must shift their positions to accommodate conflicting gradients, KDPH absorbs these conflicts by dynamically adjusting its directional variance. This allows the proxy to maintain a stable semantic mean direction while stretching to cover diverse label correlations. Furthermore, to ensure stable training of these geometric parameters, we derive a tailored loss function incorporating the Cayley transform to enforce strict orthogonality. To the best of our knowledge, KDPH is the first framework to successfully introduce Kent distributions into cross-modal hashing. Experiments on three benchmark datasets demonstrate that KDPH mitigates proxy collapse and chaotic oscillation, and significantly outperforms state-of-the-art methods. Code is available at https://github.com/Senmo996/KDPH-official-code.
    Computer VisionImage and video retrievalComputer VisionMultimodal learningComputer VisionRepresentation learning
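    For readers unfamiliar with the Cayley transform mentioned in the abstract above, the short derivation below shows why it yields an orthogonal matrix from any real skew-symmetric parameter matrix. This is the standard textbook form; the abstract does not state how KDPH parametrizes its matrices or builds its loss, so the symbols $A$ and $Q$ are assumptions used only for this illustration.

```latex
% Cayley transform of a real skew-symmetric matrix A (A^\top = -A).
% I + A is invertible: (I + A)x = 0 implies x^\top x + x^\top A x = \|x\|^2 = 0.
\begin{align*}
Q &= (I - A)(I + A)^{-1}, \\
Q^\top &= (I + A^\top)^{-1}(I - A^\top) = (I - A)^{-1}(I + A), \\
Q^\top Q &= (I - A)^{-1}(I + A)(I - A)(I + A)^{-1} \\
         &= (I - A)^{-1}(I - A)(I + A)(I + A)^{-1} = I,
\end{align*}
% where the last line uses that (I + A) and (I - A) commute.
```

    Optimizing the unconstrained entries of $A$ therefore keeps $Q$ exactly orthogonal at every step, without a separate projection or penalty.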
  223. #2421

    Label Distribution Imputation and Bias-Corrective Representation Learning for Incomplete and Imbalanced Label Distribution

    Xiangcheng Sun, Miaogen Ling, Han Qin, Guohua Lv, Yongbiao Gao
    Label Distribution Learning (LDL) represents each instance with a label distribution, but these distributions are often incomplete in practice due to high annotation costs and annotators’ cognitive burden. Recent Incomplete Label Distribution Learning (InLDL) methods leverage global and local correlations to impute missing values. However, imputation errors remain systematic and non-uniform across labels. In the more challenging scenario of imbalanced label distributions, such non-uniform errors disproportionately harm low-degree labels, skewing learning toward dominant labels. To address this more complex challenge, we propose Label Distribution Imputation and Bias-Corrective Representations (LDIBR) for incomplete and imbalanced label distribution learning. LDIBR performs instance-adaptive imputation conditioned on instance features and the binary observation mask. It learns a prior, a reliability-gated correction, and entry-wise fusion weights to produce a normalized imputed distribution. Additionally, LDIBR decomposes the imputed distributions into a shared low-rank core and an instance-specific residual. The core captures stable global degree patterns, while the residual isolates sample-specific deviations and imputation noise. Comprehensive experiments across multiple benchmark datasets with a 50% missing ratio validate both the effectiveness and robustness of the LDIBR approach. The code is available at https://github.com/wenhuihji/LDCBR.
    Machine LearningMulti-label learning
  224. #2441

    An Information-theoretic Propagation Denoising and Fusion Framework for Fake News Detection

    Mengyang Chen, Lingwei Wei, Wei Zhou, Songlin Hu
    Incomplete propagation data significantly hinders robust fake news detection. Recent approaches leverage large language models to simulate missing user interactions via role-playing, thereby enriching propagation with synthetic signals. However, such synthetic propagation data is intrinsically unreliable, and directly fusing it can yield biased representations and limit detection performance. In this paper, we alleviate the unreliability of synthetic propagation from the mutual information perspective and propose a novel information-theoretic propagation denoising and fusion (InfoPDF) framework to learn effective representations from both real and synthetic propagation. Specifically, we first generate attribute-specific synthetic propagation using large language models. Then we model each synthetic propagation graph as a probabilistic latent distribution to guide reliability-aware adaptive fusion with real propagation. During training, we design a mutual information-based objective to learn compressed and task-sufficient propagation representations. It jointly suppresses noisy signals across attribute-specific synthetic propagation, maintains consistency between real and synthetic propagation representations, and ensures task sufficiency for fake news detection and attribute prediction. Experiments on three real-world datasets show that InfoPDF consistently achieves superior performance across various fake news detection tasks. Further analysis demonstrates that InfoPDF can estimate attribute-level reliabilities and learn more discriminative propagation representations.
    Data MiningAnomaly/outlier detectionData MiningApplicationsData MiningMining text, web, social mediaData MiningNetworksNatural Language ProcessingApplications
  225. #2444

    Mitigating Backdoors via Decoy Shortcuts and Knowledge Decoupling

    Zixuan Zhu, Rui Wang, Lihua Jing, Jinwen Zhong
    Backdoor attacks pose a serious threat to deep neural networks, especially when training relies on third-party data, allowing adversaries to inject malicious behaviors through data poisoning. In this work, we reveal that backdoor behaviors tend to be absorbed by a simpler parallel branch when jointly trained with the main network. Motivated by this insight, we propose Trapping and Removing (TR), a simple yet effective training-time defense that introduces a lightweight shortcut branch as a “honeypot” to trap backdoor knowledge. After training, backdoors can be removed by discarding the shortcut, without requiring any additional data. To further enhance backdoor isolation while maintaining benign performance, we design a knowledge decoupling strategy with entropy-based weight assignment, encouraging poisoned samples to flow through the honeypot while guiding the main network to focus on benign learning. In addition, we introduce an automatic shortcut generation strategy to improve generalization across model architectures. Extensive experiments on four benchmark datasets and five model architectures demonstrate that our approach effectively mitigates a wide range of backdoor attacks while preserving performance on benign data. Code: github.com/Zixuan-Zhu/TR.
    Computer VisionAdversarial learning, adversarial attack and defense methodsComputer VisionRecognition (object detection, categorization)Multidisciplinary Topics and ApplicationsSecurity and privacy
  226. #2446

    Learning to Solve and Optimize by Evolving Code

    Veronika Semmelrock, Benedetta Strizzolo, Francesco Zuccato, Gerhard Friedrich, Patrick Rodler, Konstantin Schekotihin
    Combinatorial and optimization problems are fundamental to many industrial AI applications. Solving large-scale real-world instances of such problems typically requires careful problem formalization, specialized solvers, and expert-designed heuristics.
    Thus, experts need to specify not only *what* solutions are, but also *how* they are derived.

    By introducing the tool CheckMate, we show that algorithm generation via code evolution represents a paradigm shift by eliminating the need to formulate the *how*. CheckMate solely relies on the *what*. Specifically, a formal specification ensures solutions' correctness and enables systematic performance evaluation of the generated programs, while a natural language description guides the evolutionary process.

    The effectiveness of our method is demonstrated on selected problems from two industrial domains: configuration and scheduling. In all cases, the evolved algorithms consistently outperform state-of-the-art solvers. This underscores the potential of formal methods in guiding code evolution for automatically solving complex real-world problems.
    Knowledge Representation and ReasoningApplicationsKnowledge Representation and ReasoningLearning and reasoningKnowledge Representation and ReasoningLogic programmingSearchSearch and machine learning
  227. #2453

    From Compression to Construction: Pseudo Neighbor Augmentation Sampling for Dynamic Link Prediction

    Zhigang Yu, Hao Yan, Changjun Fan, Senzhang Wang
    Dynamic link prediction aims to predict whether two nodes will interact at a future time point in a dynamic graph based on their historical interactions. Existing sampling-based methods, which can be viewed as compressors, generally select a subset of one-hop neighbors from the entire interaction sequence for efficiency. This inevitably discards rich context such as high-order structural information and temporal patterns, and these methods often underperform on nodes with sparse interaction histories. To this end, we propose a novel context-construction-based approach named PeNS that adopts a Pseudo Neighbor Augmentation Sampling Strategy for more accurate dynamic link prediction. Specifically, PeNS augments neighbor sequences with three complementary types of pseudo neighbors, injecting structural and temporal cues to provide rich context information, especially in sparse-neighbor scenarios. We further introduce a Streaming Neighbor Memory Module for efficient pseudo neighbor construction and a dual-stream fusion mechanism for robust representation learning. Extensive experiments demonstrate the effectiveness and generalizability of PeNS.
    Machine LearningSequence and graph learningData MiningMining graphsMachine LearningRepresentation learning
  228. #2455

    LURE: Latent Space Unblocking for Multi-Concept Reawakening in Diffusion Models

    Mengyu Sun, Ziyuan Yang, Andrew Beng Jin Teoh, Junxu Liu, Haibo Hu, Yi Zhang
    Concept erasure aims to suppress sensitive content in diffusion models, but recent studies show that erased concepts can still be reawakened, revealing vulnerabilities in erasure methods. Existing reawakening methods mainly rely on prompt-level optimization to manipulate sampling trajectories, neglecting other generative factors, which limits a comprehensive understanding of the underlying dynamics. In this paper, we model the generation process as an implicit function to enable a comprehensive theoretical analysis of multiple factors, including textual conditions, model parameters, and latent states. We theoretically show that perturbing each factor can reawaken erased concepts. Building on this insight, we propose a novel concept reawakening method: Latent space Unblocking for concept REawakening (LURE), which reawakens erased concepts by reconstructing the latent space and guiding the sampling trajectory. Specifically, our semantic re-binding mechanism reconstructs the latent space by aligning denoising predictions with target distributions to reestablish severed text-visual associations. However, in multi-concept scenarios, naive reconstruction can cause gradient conflicts and feature entanglement. To address this, we introduce Gradient Field Orthogonalization, which enforces feature orthogonality to prevent mutual interference. Additionally, our Latent Semantic Identification-Guided Sampling (LSIS) ensures stability of the reawakening process via posterior density verification. Extensive experiments demonstrate that LURE enables simultaneous, high-fidelity reawakening of multiple erased concepts across diverse erasure tasks and methods.
    AI Ethics, Trust, FairnessSafety and robustnessComputer VisionImage and video synthesis and generation
  229. #2460

    VGA-BenchV2: An Expanded Unified Benchmark and Multi-Model Framework for Evaluating Video Aesthetics and Generation Quality

    Longteng Jiang, Dandan Zheng, Qianqian Qiao, Heng Huang, Huaye Wang, Yihang Bo, Bao Peng, Jingdong Chen, Jun Zhou, Xin Jin
    The rapid advancement of AIGC video generation calls for evaluation frameworks that move beyond technical fidelity and incorporate human-centered aesthetic assessment. Existing benchmarks often overlook fine-grained perceptual qualities such as visual aesthetics, artistic style, and human preference. To address this limitation, we introduce VGA-BenchV2, an extended human-aligned benchmark and optimization framework for jointly evaluating and improving video generation quality and aesthetic value.

    Built upon VGA-Bench, VGA-BenchV2 preserves the original fine-grained taxonomy with two primary dimensions—Aesthetic and Generation—and 52 sub-dimensions. Guided by this taxonomy, we curate 1,016 diverse prompts and collect over 60,000 videos generated by 12 mainstream video generation models. More importantly, VGA-BenchV2 substantially expands human-labeled supervision by adding 36,000 task-level annotations, including 16,200 for aesthetic quality, 13,200 for aesthetic tagging, and 6,600 for generation quality, corresponding to 13.46×, 11.15×, and 1.55× scale-ups over VGA-Bench, respectively.

    Leveraging this enlarged annotation corpus, we develop a hybrid evaluator architecture consisting of VAQA-Net for continuous aesthetic scoring and two Qwen-based Large Vision-Language Model evaluators, VTag-Net and VGQA-Net, for aesthetic tagging and generation quality assessment. Extensive experiments demonstrate strong alignment with human judgments across diverse generation models. Beyond evaluation, VGA-BenchV2 further introduces an evaluation-to-optimization pipeline, where the learned aesthetic evaluator serves as a reward model for reinforcement learning-based generator fine-tuning. This closes the loop from benchmark construction and human supervision to automated evaluation and model optimization, enabling video generators to improve not only in realism but also in aesthetic quality and human preference alignment. Resources are available at https://huggingface.co/datasets/BestiVictoryLab/VGA-Bench.
    Computer VisionVideo analysis and understandingAIComputer Vision
  230. #2468

    StreamTimer: Efficient Inference for Long-Context Time Series Transformers

    Xiyu Meng, Yuhan Wu, Canran Xiao, Yabo Dong, Duanqing Xu
    Time series forecasting (TSF) plays a vital role across various domains such as finance, energy, healthcare, and meteorology. Currently, most deep-learning-based TSF methods operate with a fixed lookback window. This practice stems from the high compute and memory costs of long contexts, as well as the standard use of sliding windows, and it creates a trade-off: making the window larger reduces the number of training samples, which can harm stability and generalization, whereas keeping the window small prevents the model from using long history during inference. We propose an inference-only streaming autoregressive framework that replaces repeated full-context recomputation with a one-time context warmup and incremental decoding, enabling efficient long-history forecasting without retraining. While straightforward attention caching is brittle for time series due to distribution shifts and noisy or redundant histories, we address these issues with cache-consistent normalization and selective memory under a fixed cache budget. Across diverse benchmarks, our approach substantially reduces inference latency with no or marginal accuracy loss, and often improves performance when longer lookbacks are beneficial.
    Data MiningMining spatial and/or temporal dataMachine LearningTime series and data streams
  231. #2479

    Manifold-Aware Point Cloud Completion via Geodesic-Attentive Hierarchical Feature Learning

    Jianan Sun, Dongzhihan Wang, Zhangqi Huang, Mingyu Fan
    Point cloud completion seeks to recover geometrically consistent shapes from partial or sparse 3D observations. Although recent methods have achieved reasonable global shape reconstruction, they often rely on Euclidean proximity and overlook the intrinsic nonlinear geometric structure of point clouds, resulting in suboptimal geometric consistency and semantic ambiguity. In this paper, we present a manifold-aware point cloud completion framework that explicitly incorporates nonlinear geometry information throughout the feature learning pipeline. Our approach introduces two key modules: a Geodesic Distance Approximator (GDA), which estimates geodesic distances between points to capture the latent manifold topology, and a Manifold-Aware Feature Extractor (MAFE), which utilizes geodesic-based k-NN groupings and a geodesic-relational attention mechanism to guide the hierarchical feature extraction process. By integrating geodesic-aware relational attention, our method promotes semantic coherence and structural fidelity in the reconstructed point clouds. Extensive experiments on benchmark datasets demonstrate that our approach consistently outperforms state-of-the-art methods in reconstruction quality. The code is available online at https://anonymous.4open.science/r/Manifold-Aware-Point-Cloud-Completion--F380/README.md.
    Computer Vision3D computer visionComputer VisionRepresentation learningComputer VisionSegmentation, grouping and shape analysisRoboticsPerception
  232. #2511

    Learning Minimally Rigid Graphs with High Realization Counts

    Oleksandr Slyvka, Jan Rubeš, Rodrigo Alves, Jan Legerský
    For minimally rigid graphs, the same edge-length data can admit multiple realizations (up to translations and rotations). Finding graphs with exceptionally many realizations is an extremal problem in rigidity theory, but exhaustive search quickly becomes infeasible due to the super-exponential growth of the number of candidate graphs and the high cost of realization-count evaluation. We propose a reinforcement-learning approach that constructs minimally rigid graphs via 0- and 1-extensions, also known as Henneberg moves. We optimize realization-count invariants using the Deep Cross-Entropy Method with a policy parameterized by a Graph Isomorphism Network encoder and a permutation-equivariant extension-level action head. Empirically, our method matches the known optima for planar realization counts and improves the best known bounds for spherical realization counts, yielding new record graphs.
    Machine LearningReinforcement learningMachine LearningSequence and graph learningMachine LearningEvolutionary learning
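
The abstract of paper #2511 above builds minimally rigid graphs through 0- and 1-extensions (Henneberg moves). As a minimal sketch of those two moves in the planar case, the Python below grows a graph from a single edge and checks the edge count |E| = 2|V| - 3 that every minimally rigid graph in the plane satisfies. The function names are hypothetical, and the uniformly random choice of moves is only a stand-in for the learned policy the paper describes; realization counts are not computed here.

```python
import random


def zero_extension(edges, n, u, v):
    """0-extension: join a new vertex n to two distinct existing vertices."""
    assert u != v and 0 <= u < n and 0 <= v < n
    return edges | {frozenset((u, n)), frozenset((v, n))}, n + 1


def one_extension(edges, n, u, v, x):
    """1-extension: delete edge {u, v}, then join new vertex n to u, v and x."""
    removed = frozenset((u, v))
    assert removed in edges and x not in (u, v) and 0 <= x < n
    added = {frozenset((u, n)), frozenset((v, n)), frozenset((x, n))}
    return (edges - {removed}) | added, n + 1


def random_minimally_rigid_graph(num_vertices, seed=0):
    """Grow a graph from one edge using randomly chosen Henneberg moves."""
    rng = random.Random(seed)
    edges, n = {frozenset((0, 1))}, 2
    while n < num_vertices:
        if n >= 3 and rng.random() < 0.5:
            # Pick an existing edge to subdivide and a third vertex to attach.
            u, v = rng.choice(sorted(sorted(e) for e in edges))
            x = rng.choice([w for w in range(n) if w not in (u, v)])
            edges, n = one_extension(edges, n, u, v, x)
        else:
            u, v = rng.sample(range(n), 2)
            edges, n = zero_extension(edges, n, u, v)
    return edges, n


graph_edges, graph_size = random_minimally_rigid_graph(8)
# Each move adds one vertex and a net of two edges, so |E| = 2|V| - 3 holds.
assert len(graph_edges) == 2 * graph_size - 3
```

Because every move adds one vertex and two edges net, a construction sequence of this kind stays inside the class of minimally rigid graphs, which is what makes it usable as an action space for a search procedure.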
  233. #2522

    Cycles in Liquid Democracy: A Game-Theoretic Justification

    Markus Brill, Rachael Colley, Anne-Marie George, Grzegorz Lisowski, Georgios Papasotiropoulos, Ulrike Schmidt-Kraepelin
    A common criticism of liquid democracy within the relevant academic literature is that delegation cycles can occur, seemingly resulting in unused voting power. Yet, practitioners argue that delegation cycles are not only unproblematic but are even formed intentionally by participants. To bring theory closer to reality, we introduce a model that captures this strategic behavior under uncertainty. We study the existence, structure and quality of Nash equilibria, revealing that delegation cycles naturally emerge. To complement these findings, we perform computational experiments using best-response dynamics.
    Game Theory and Economic ParadigmsComputational social choiceGame Theory and Economic ParadigmsNoncooperative games
  234. #2531

    Synthesis and Verification of Transformer Programs

    Hongjian Jiang, Matthew Hague, Philipp Rümmer, Anthony W. Lin
    C-RASP is a simple programming language that was recently shown to capture concepts expressible by transformers. In this paper, we develop new algorithmic techniques for automatically verifying C-RASPs. To this end, we establish a connection to the verification of synchronous dataflow programs in Lustre, which enables us to exploit state-of-the-art model checkers utilizing highly optimized SMT-solvers. Our second contribution addresses learning a C-RASP program in the first place. To this end, we provide a new algorithm for learning a C-RASP from examples using local search. We demonstrate the efficacy of our implementation on benchmarks of C-RASPs from the literature, in particular in connection with the following applications: (1) transformer program optimization, and (2) constrained learning of transformer programs (based on a partial specification).
    Knowledge Representation and ReasoningAutomated reasoning and theorem provingKnowledge Representation and ReasoningKnowledge representation languagesKnowledge Representation and ReasoningLearning and reasoningMachine LearningAttention models
  235. #2546

    Unintended Consequences: Updating Causal Models

    Joseph Y. Halpern, Evan Piermont, Marie-Louise Vierø
    We examine how causal beliefs affect an agent's choices and how feedback on those choices leads to updated causal beliefs. Building on the structural-equations framework for modeling causality, we first examine the general problem of updating causal beliefs in the face of novel (and possibly inexplicable) data. We model an agent who is uncertain of the true causal model, and therefore entertains a probabilistic belief over the set of possible models. We then consider how causal beliefs influence choices by building a model of agency and utility on top of the usual structural-equations framework. Using these two components, we propose a notion of steady state, where the feedback received from an agent's optimal action, given her current beliefs about the true causal model, can be rationalized by those beliefs.
    Knowledge Representation and ReasoningBelief changeKnowledge Representation and ReasoningCausalityKnowledge Representation and ReasoningReasoning about actions
  236. #2563

    ResearchEnvBench: Benchmarking Agents on Environment Synthesis for Research Code Execution

    Yubang Wang, Chenxi Zhang, Bowen Chen, Zezheng Huai, Zihao Dai, Xinchi Chen, Yuxin Wang, Yining Zheng, Jingjing Gong, Xipeng Qiu
    Autonomous agents are increasingly expected to support scientific research, and recent benchmarks report progress in code repair and autonomous experimentation. However, these evaluations typically assume a pre-configured execution environment. Constructing such an environment requires resolving complex software dependencies, aligning hardware and framework versions, and configuring distributed execution, yet this capability remains largely unbenchmarked. We introduce ResearchEnvBench, a benchmark for environment synthesis in research code execution. Given a research repository, documentation, and a target execution setting, agents must construct an environment that successfully executes at runtime. Evaluations on diverse research repositories reveal a substantial gap in current SOTA agents, with failures dominated by incomplete dependency resolution and brittle version coupling. ResearchEnvBench provides a realistic testbed for advancing autonomous agents toward reproducible scientific research.
    Agent-based and Multi-agent SystemsApplications
  237. #2596

    Approximate Strategyproofness in Approval-Based Budget Division

    Haris Aziz, Patrick Lederer, Jeremy Vollen
    In approval-based budget division, the task is to allocate a divisible resource to the candidates based on the voters' approval preferences over the candidates. For this setting, Brandl et al. (2021) have shown that no distribution rule can be strategyproof, efficient, and fair at the same time. In this paper, we aim to circumvent this impossibility theorem by focusing on approximate strategyproofness. To this end, we analyze the incentive ratio of distribution rules, which quantifies the maximum multiplicative utility gain of a voter by manipulating. While it turns out that several classical rules have a large incentive ratio, we prove that the Nash product rule (NASH) has an incentive ratio of 2, thereby demonstrating that we can bypass the impossibility of Brandl et al. by relaxing strategyproofness. Moreover, we show that an incentive ratio of 2 is optimal within three natural classes of rules and that the positive result for the Nash product rule even holds when voters may report arbitrary concave utility functions. Finally, we complement our results with an experimental analysis.
    Game Theory and Economic ParadigmsComputational social choiceGame Theory and Economic ParadigmsMechanism design
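
The incentive ratio mentioned in the abstract of paper #2596 above can be written out as follows. This is a sketch in assumed notation, not the paper's own: $A = (A_1, \dots, A_n)$ is a profile of approval sets, $f(A)$ is the distribution returned by the rule, $p(x)$ is the budget share of candidate $x$, and a voter's utility is taken to be the total share on her approved candidates. How profiles with zero truthful utility are handled is left open here.

```latex
u_i(p) = \sum_{x \in A_i} p(x),
\qquad
\mathrm{IR}(f) = \sup_{A,\; i,\; A_i'}
  \frac{u_i\bigl(f(A_i', A_{-i})\bigr)}{u_i\bigl(f(A)\bigr)}
```

Under this reading, a strategyproof rule has incentive ratio 1, and an incentive ratio of 2 means that no voter can more than double her utility by misreporting.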
  238. #2598

    Towards Defining, Measuring and Leveraging Creativity in AI Applications

    Jingyi Yang, Alexander Tuzhilin
    Despite extensive research, computational creativity still lacks a conceptually grounded, practically relevant, and empirically applicable formal definition across contexts such as art, scientific research, and games. Existing approaches to creativity often focus on proposing informal conceptual definitions without providing formal computational concepts. Alternatively, they propose formal definitions without carefully examining their underlying assumptions or validating these definitions in specific application settings. To address these problems, this paper proposes several formal postulates of creativity grounded in prior psychological work on creativity, especially as related to Boden’s three core criteria of creativity: newness, value, and surprise. Furthermore, we propose a conceptual mathematical definition of creativity that aligns with these postulates. We also validate our proposed creativity formula in two applications: determining the creativity of particular chess moves, and assessing the creativity of paintings by artists such as Picasso and Renoir. Experimental results demonstrate that our formula of creativity successfully captures creative patterns across both the chess and the art domains. This suggests that creativity can be effectively characterized through a unified mathematical structure based on three fundamental criteria, namely newness, value, and surprise, whose specific measurements are adapted to the particular domains.
    Humans and AICognitive modelingMultidisciplinary Topics and ApplicationsArts and creativity
  239. #2613

    PURE: Purging Unrelated Representations for Content-Agnostic Forgery Detection

    Xinyu Wu, Dong Li, Minglai Shao, Xintao Wu, Zhong Chen, Chen Zhao
    Existing AI-generated image (AIGI) detectors perform well in-domain but degrade severely under distribution shift. We observe that this failure is mainly caused by content shortcuts, where detectors spuriously couple forgery artifacts with semantic content, such as object categories or demographic attributes, learning content–label correlations instead of generalizable forgery patterns. To address this issue, we propose PURE (Purging Unrelated Representations for Content-Agnostic Forgery Detection), which achieves content-agnostic detection through two complementary components: a Causal Semantic Generative (CSG) mechanism that disentangles semantic representations from forgery-irrelevant nuisance factors, and a Gaussian Mixture Model (GMM)-based prototype alignment module that suppresses category-specific content bias. Extensive experiments on CIFAKE, GenImage, and AlFace show that PURE achieves superior generalization under spurious correlation reversal. The code is available at https://github.com/wuxinyu519/PURE.
    Computer VisionMultimodal learningComputer VisionMachine learning for visionMachine LearningClassificationComputer VisionRepresentation learningMachine LearningCausality
  240. #2621

    Semi-supervised Clustering via Adversarially Enhanced Intent Propagation

    Wentao Zhong, Ruina Bai, Jingjing Xue, Ying Nie, Ruizhang Huang
    Semi-supervised clustering (SSC) enables personalized clustering under limited user supervision. Given the sparsity of initial user intents, constraint propagation has been proposed as a powerful approach to explore and generate new performance-enhancing constraints. However, existing methods struggle to handle a large number of “gray samples” that deviate from initial supervision signals and exhibit ambiguous semantics with indistinct boundaries. Compared with “distinct samples” that closely match the supervision, gray samples typically contain richer latent semantics, and accurately identifying their relational types can significantly improve clustering performance. To address this challenge, we propose Adversarially Enhanced Propagation-driven Intent-aware Clustering (AEPIC). Specifically, we design an Adversarially Enhanced Constraint Propagation (AECP) mechanism that leverages global adversarial learning over dual relational links to identify gray samples and expand them into meaningful pseudo-user intents. In addition, an intent-aware regularization strategy integrates these pseudo-user intents into representation learning and clustering optimization, further improving clustering performance. Experiments on five benchmark datasets demonstrate that, under sparse supervision, AEPIC consistently outperforms state-of-the-art semi-supervised clustering methods.
    Natural Language ProcessingInformation retrieval and text miningMachine LearningClusteringMachine LearningDeep learning architecturesMachine LearningFeature extraction, selection and dimensionality reduction
  241. #2630

    A Local-Rotation-Driven Global Consistency Framework with Dual-View Decoding for Semi-Supervised Medical Image Segmentation

    Zhen Yang, Dongshuai Zhang, Yunliang Qi, Guidong Zhang, Shouliang Li, Shuai Wu
    Medical image segmentation remains challenging, especially in semi-supervised scenarios where limited annotations are expected to support both accurate boundary delineation and coherent anatomical structures. We propose LR-GCF, a Local-Rotation-Driven Global Consistency Framework that couples strong local geometric perturbations with global consistency regularization in a dual-view architecture. Given an image, LR-GCF transforms it into different variants through patch-wise jigsaw-rotated perturbations. These variants are decoded by parallel Swin Transformer decoders, and consistency between them is regularized via cross pseudo supervision. A rotation prediction head is designed to predict the rotation of each local patch. This makes locally discriminative representations robust to rotations and helps the model recognize global anatomical structure under strong perturbations. To further stabilize global modeling, we introduce a rotation-aligned attention consistency loss that encourages the inter-patch correlation across different variants to be as similar as possible. Moreover, we design a Position Encoding Duplicating scheme to propagate positional encodings from the low-resolution token map to high-resolution token maps, which helps alleviate the misalignment problem in rotation-aligned attention consistency. Extensive experiments on multiple medical image segmentation benchmarks demonstrate that LR-GCF consistently achieves superior performance under limited supervision, by jointly optimizing local detail extraction and stable global anatomical modeling. Source code is available at https://github.com/shuaiaihang/LR-GCF.
    Computer VisionBiomedical image analysisComputer VisionSegmentation, grouping and shape analysisComputer VisionTransfer, low-shot, semi- and un- supervised learning
  242. #2631

    Sprint or Delve: A Distribution-Aware Approach to Efficient Reasoning

    Zehui Ling, Deshu Chen, Hongwei Zhang, Yifeng Jiao, Xin Guo, Zenglin Xu, Yuan Cheng
    Reasoning chains in Large Language Models (LLMs) often exhibit heavy-tailed length distributions, yet existing efficiency methods rely on suboptimal linear penalties that suppress complex reasoning, limiting both accuracy and generalization. To address this, we first empirically observe that reasoning lengths are well approximated by a log-normal distribution, and provide an intuitive explanation for this phenomenon. Based on this insight, we propose the Powered Length Penalty (PLP), an adaptive regularizer that penalizes redundancy in short sequences while gradually reducing penalties for longer sequences, preserving deep reasoning. Trained solely on the elementary GSM8K dataset, PLP significantly improves reasoning efficiency by reducing inference costs on GSM8K and MATH500, while simultaneously enhancing accuracy on the challenging AIME2024 benchmark. Furthermore, PLP transfers effectively to diverse domains, including MMLU and GPQA. These results suggest that modeling reasoning length distributions and adapting penalties accordingly can mitigate the typical trade-off between efficiency and performance, enabling more reliable and cost-effective reasoning across tasks.
    Natural Language ProcessingLanguage modelsNatural Language ProcessingOtherNatural Language ProcessingQuestion answering
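
The abstract of paper #2631 above reports that reasoning lengths are well approximated by a log-normal distribution. As a minimal sketch of what fitting such a model involves, the Python below estimates the two log-normal parameters by maximum likelihood, which amounts to taking the mean and population standard deviation of the log-lengths. The function names and the sample lengths are hypothetical, and the Powered Length Penalty itself is not reproduced because the abstract does not give its formula.

```python
import math
import statistics


def fit_lognormal(lengths):
    """Maximum-likelihood log-normal fit: mean and std of the log-lengths."""
    logs = [math.log(x) for x in lengths]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs)  # MLE uses the 1/n (population) variance
    return mu, sigma


def lognormal_summary(mu, sigma):
    """Median and mean implied by the fitted parameters."""
    return {
        "median": math.exp(mu),
        "mean": math.exp(mu + sigma ** 2 / 2),
    }


# Hypothetical token counts of sampled reasoning chains (illustration only).
sample_lengths = [120, 150, 180, 210, 260, 340, 520, 900, 2400]
mu_hat, sigma_hat = fit_lognormal(sample_lengths)
summary = lognormal_summary(mu_hat, sigma_hat)
# A mean above the median reflects the heavy right tail of long chains.
assert summary["mean"] > summary["median"]
```

The gap between the fitted mean and median is one simple way to see why a penalty that grows linearly with length bears hardest on the rare long chains in the tail.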
  243. #2647

    Multi-Semantic Aware Self-Supervised Learning for Multi-Label Node Classification

    Jiayu Zhang, Jitao Zhao, Dongxiao He, Cuiying Huo, Zhiyong Feng
    Graph self-supervised learning aims to mine intrinsic signals from graph data itself to train models. It enables the acquisition of high-quality representations without manual annotations, making it suitable for various label-scarce scenarios and thus garnering substantial interest. Existing graph self-supervised methods typically assume that each node possesses a single semantic meaning. However, many real-world entities exhibit multiple semantics, where a single node may simultaneously belong to multiple categories. Since existing approaches are mainly designed for single-semantic settings, they often struggle to accurately capture the multiple semantics within nodes. Meanwhile, existing multi-label graph learning methods mainly depend on extensive manual annotations, which are costly and of limited applicability. To address this, we make a bold attempt to extend graph self-supervised learning to multi-label graphs. This is particularly challenging, as the lack of labels makes it difficult to identify multiple semantics, let alone determine the number of underlying categories for nodes. To this end, we propose a Multi-semantic Aware Self-Supervised pretraining method (MASS) for multi-label graphs. Specifically, we propose a multi-pseudo-label decomposition mechanism, enabling adaptive learning of multiple semantics within nodes without annotations. Extensive results validate the effectiveness of MASS, which achieves performance competitive with supervised baselines.
    Data MiningMining graphs
  244. #2651

    Stability Under Valuation Updates in Coalition Formation

    Fabian Frank, Matija Novaković, Rene Romen
    Coalition formation studies how to partition a set of agents into disjoint coalitions based on their preferences. In this paper, we consider the class of additively separable hedonic games and study the setting in which agents' valuations of each other evolve over time. Since rearranging coalitions can be costly, our goal is to find a stable partition close to the current one, where distance is measured by the number of agents that must change coalition.
    We study this problem for four stability notions based on single-agent deviations: Nash, individual, contractual Nash, and contractual individual stability. For all four, deciding whether a close stable partition exists is NP-complete, even under severe restrictions on the valuations and even when only a single valuation changes. On the positive side, we give polynomial-time algorithms for the two contractual notions under restricted symmetric valuations, and show that over long sequences of updates, these algorithms maintain stability with constant average reconfiguration cost.
    Game Theory and Economic ParadigmsCooperative gamesGame Theory and Economic ParadigmsComputational social choice
  245. #2655

    Action-Gradient Monte Carlo Tree Search for Non-Parametric Continuous (PO)MDPs

    Idan Lev-Yehudi, Michael Novitsky, Moran Barenboim, Ron Benchetrit, Vadim Indelman
    Online planning in continuous state, action, and observation spaces remains challenging for autonomous systems. While Monte Carlo Tree Search (MCTS) scales effectively via sampling, most continuous (PO)MDP solvers do not exploit gradient-based action optimization. We propose Action-Gradient MCTS (AGMCTS), a framework that combines global tree search with local gradient-based action refinement, while maintaining consistent value estimates. We provide three key theoretical contributions: (1) an action score gradient theorem for particle belief states; (2) the Multiple Importance Sampling (MIS) Tree that supports frequent action-branch updates by reusing prior samples without introducing estimator drift; and (3) tractable action score gradients for smooth generative models using the Area Formula. Empirical results demonstrate that AGMCTS outperforms state-of-the-art sample-based solvers in multiple challenging continuous MDP and POMDP benchmarks.
    Machine LearningModel-based and model learning reinforcement learningPlanning and SchedulingPlanning under uncertaintyPlanning and SchedulingPOMDPs
  246. #2656

    Empirical Evidence and Analysis of a Critical Pitfall in Reward Learning from Human Feedback

    Taha Shaheen, Stephen G. West, Yu Zhang
    Reward learning via human feedback is a crucial capability for beneficial AI. Current methods are built on decision-making theories that assume a matched dynamics model between the learning agent and the feedback provider. However, humans often form imperfect internal dynamics models, and their feedback reflects these misconceptions. While this relationship has long been hypothesised, its manifestation in sequential decision-making remains largely an assumption. Our work provides the first comprehensive empirical investigation of this relationship through a randomized controlled trial (N=211). We followed a two-stage design where we first initialized the participants' understanding of the dynamics in a grid-world navigation domain and then manipulated it using text-based instructions. Causal mediation analysis revealed that humans' internal models play a mediating role in feedback behaviour. We show that this relationship is invariant across visual contexts and is robust to three common feedback types: pairwise preferences, trajectory corrections, and off-switch interventions. These findings confirm a critical limitation of current reward learning methods and establish the missing psychological foundation for approaches that incorporate dynamics understanding.
    Agent-based and Multi-agent SystemsHuman-agent interactionHumans and AICognitive modelingHumans and AIHuman-AI collaborationRoboticsHuman robot interactionUncertainty in AIDecision and utility theory
  247. #2661

    QuesRecAgent: A Dual-Loop Multi-Agent Question Recommender for Enhancing Knowledge Mastery

    Zhifeng Wang, Jialiang Shen, Yulin Hou, Xiaoxue Liu
    Adaptive Learning Systems (ALS), which can provide personalized teaching at scale, play a key role in promoting educational equity. However, existing question recommendation methods suffer from data sparsity in cold-start scenarios and lack the ability to reason about long-term teaching. To address these challenges, we propose QuesRecAgent, a multi-agent framework that combines Large Language Models (LLMs) with the Reflexion mechanism for adaptive question recommendation. Unlike traditional black-box methods, QuesRecAgent adopts a Dual-Loop State Decoupling architecture: the inner loop rapidly updates beliefs through simulated interaction, while the outer loop performs authoritative diagnosis and meta-strategy reflection to optimize the teaching strategy. Furthermore, supported by a Topological Knowledge Graph (TKG) and the theory of the Zone of Proximal Development (ZPD), QuesRecAgent provides high-quality learning guidance even for students with limited interaction records. Our experiments on three real-world datasets show that QuesRecAgent outperforms state-of-the-art baselines in total knowledge gain and other metrics, providing a robust and interpretable solution for promoting high-quality personalized education.
    Humans and AICognitive modelingHumans and AIComputer-aided educationAgent-based and Multi-agent SystemsAgent societiesMachine LearningKnowledge-aided learning
  248. #2673

    Adversarial Masked Graph Modeling for Robust Graph Autoencoders

    Qiqi Zhang, Chuanjin Liu, Gen Liu, Chao Li, Zhongying Zhao
    Graph Masked AutoEncoders (GMAEs) have achieved notable success in diverse downstream tasks. However, their superior performance usually relies on reliable and unperturbed input graphs. In real-world scenarios, graphs are vulnerable to adversarial attacks (e.g., perturbations on node features or topology). As a result, existing GMAEs tend to overfit these perturbed inputs, leading to poor robustness. Motivated by this, we propose an adversarial masked graph modeling approach for robust graph autoencoders, named ArmorGAE. Specifically, we first design a dual-consistency graph augmentation method that enriches the graph topology to mitigate structural vulnerabilities. Theoretically, we establish a connection between ArmorGAE and contrastive learning, showing that our augmentation strategy in GMAEs is equivalent to effectively expanding the set of high-quality positive samples in contrastive learning, providing richer and more reliable supervisory signals. We then present an adversarial masked autoencoder to mitigate model overfitting to the perturbed graph. The generator and discriminator are jointly trained under an adversarial paradigm, where the former aims to recover the underlying true data distribution, while the latter distinguishes reconstructed and real edges. Experimental results on five datasets demonstrate that ArmorGAE outperforms state-of-the-art methods in terms of classification accuracy on clean graphs and adversarial robustness under 16 distinct adversarial scenarios. The code is publicly available at https://github.com/ZZY-GraphMiningLab/ArmorGAE.
    Machine LearningAdversarial machine learningMachine LearningRobustnessMachine LearningSelf-supervised Learning
  249. #2689

    Uni-RS: A Spatially Faithful Unified Understanding and Generation Model for Remote Sensing

    Weiyu Zhang, Yuan Hu, Yong Li, Yu Liu
    Unified remote sensing multimodal models exhibit a pronounced spatial reversal curse: although they can accurately recognize and describe object locations in images, they often fail to faithfully execute the same spatial relations during text-to-image generation, where such relations constitute core semantic information in remote sensing. Motivated by this observation, we propose Uni-RS, the first unified multimodal model tailored for remote sensing, to explicitly address the spatial asymmetry between understanding and generation. Specifically, we first introduce explicit Spatial-Layout Planning to transform textual instructions into spatial layout plans, decoupling geometric planning from visual synthesis. We then impose Spatial-Aware Query Supervision to bias learnable queries toward spatial relations explicitly specified in the instruction. Finally, we develop Image–Caption Spatial Layout Variation to expose the model to systematic geometry-consistent spatial transformations. Extensive experiments across multiple benchmarks show that our approach substantially improves spatial faithfulness in text-to-image generation, while maintaining strong performance on multimodal understanding tasks such as image captioning, visual grounding, and VQA.
    Computer VisionImage and video synthesis and generationComputer VisionVision, language and reasoningNatural Language ProcessingLanguage models
  250. #2698

    EnzyPGM: Pocket-conditioned Generative Model for Substrate-specific Enzyme Design

    Zefeng Lin, Zhihang Zhang, Weirong Zhu, Tongchang Han, Xianyong Fang, Tianfan Fu, Xiaohua Xu
    Designing enzymes with substrate-binding pockets is a critical challenge in protein engineering, as catalytic activity depends on the precise interaction between pockets and substrates. Currently, generative models dominate functional protein design but cannot model pocket-substrate interactions, which limits enzyme generation with precise catalytic environments. To address this issue, we propose EnzyPGM, a unified framework that jointly generates enzymes and substrate-binding pockets conditioned on functional priors and substrates, with a particular focus on learning accurate pocket–substrate interactions. At its core, EnzyPGM includes two main modules: a Residue-atom Bi-scale Attention (RBA) that jointly models intra-residue dependencies and fine-grained interactions between pocket residues and substrate atoms, and a Residue Function Fusion (RFF) that incorporates enzyme function priors into residue representations. Also, we curate EnzyPock, an enzyme–pocket dataset comprising 84,336 enzyme–substrate pairs across 1,036 four-level enzyme families. Extensive experiments demonstrate that EnzyPGM achieves state-of-the-art performance on EnzyPock. Notably, EnzyPGM reduces the average binding energy by 0.47 kcal/mol over EnzyGen, showing its superior performance on substrate-specific enzyme design. The code is available at https://github.com/John-Lin98/EnzyPGM.
    Multidisciplinary Topics and ApplicationsBioinformaticsMultidisciplinary Topics and ApplicationsLife sciences
  251. #2711

    IC3-Evolve: Proof-/Witness-Gated Offline LLM-Driven Heuristic Evolution for IC3 Hardware Model Checking

    Mingkai Miao, Guangyu Hu, Ziyi Yang, Hongce Zhang
    IC3, also known as property-directed reachability (PDR), is a commonly used algorithm for hardware safety model checking. It checks whether a state transition system complies with a given safety property. IC3 returns either UNSAFE (indicating a property violation) with a counterexample trace, or SAFE with a checkable inductive invariant as the proof of safety. In practice, the performance of IC3 is dominated by a large web of interacting heuristics and implementation choices, making manual tuning costly, brittle, and hard to reproduce.
    This paper presents IC3-Evolve, an automated offline code-evolution framework that utilizes an LLM to propose small, slot-restricted and auditable patches to an IC3 implementation. Crucially, every candidate patch is admitted only through proof-/witness-gated validation: SAFE runs must emit a certificate that is independently checked, and UNSAFE runs must emit a replayable counterexample trace, preventing unsound edits from being deployed. Since the LLM is used only offline, the deployed artifact is a standalone evolved checker with zero ML/LLM inference overhead and no runtime model dependency. We evolve on the public hardware model checking competition (HWMCC) benchmark and evaluate the generalizability on unseen public and industrial model checking benchmarks, showing that IC3-Evolve can reliably discover practical heuristic improvements under strict correctness gates.
    Agent-based and Multi-agent SystemsFormal verification, validation and synthesisConstraint Satisfaction and OptimizationApplicationsConstraint Satisfaction and OptimizationSatisfiabilityConstraint Satisfaction and OptimizationSolvers and tools
  252. #2727

    Beyond the Clouds: Reliable and Cloud-Aware Spatiotemporal Fusion via Adversarial Regression Wavelets

    Sichen Lu, Mingfei Li, Juanjuan Jing, Junhua Yu, Lei Yang, Boyang Nie, Jinsong Zhou
    Spatiotemporal fusion (STF) bridges the gap between temporal and spatial resolutions in satellite imagery, enabling effective monitoring of Earth's surface dynamics. However, existing methods rely on cloud-free reference images, a constraint that fails in realistic, cloud-prone scenarios. To overcome this, we propose the Cloud-Aware Wavelet Generative Adversarial Network (CLAW-GAN), a novel framework for high-fidelity reconstruction under cloud-contaminated conditions. CLAW-GAN introduces Regression Wavelet Analysis (RWA) to decouple spectral backgrounds from structural details. While a change-aware gated mechanism accounts for land-cover changes, the Frequency-Separated Fusion (FSF) module then independently integrates these components. To ensure visual realism, a multi-scale discriminator operates in the wavelet domain, enforcing consistency across high-frequency subbands to minimize artifacts. Evaluated on the newly introduced Global Cloud-shrouded Agricultural Regions (GCAR) benchmark and the simulated Daxing dataset, CLAW-GAN achieves state-of-the-art performance and demonstrates superior robustness across varying cloud coverage.
    Computer VisionAdversarial learning, adversarial attack and defense methodsComputer VisionImage and video synthesis and generationComputer VisionLow-level VisionHumans and AIComputational sustainability and human wellbeing
  253. #2736

    IPSM-Bench: A New Intermediate Phase Segmentation Benchmark in Microstructure Images of Zinc-Based Absorbable Biomaterials

    Jinglin Xu, Shangyan Zhao, Jiabo Wang, Xinghong Mu, Yulong Lei, Jiacheng Zhang, Hongbo Sun, Yageng Li
    Zinc-based alloys are indispensable emerging absorbable metallic biomaterials, and their macroscopic performance is governed by microstructural characteristics. Intermediate phases—key microstructural constituents—are pivotal in regulating mechanical and functional properties. However, intermediate phase segmentation in zinc alloy microstructures faces formidable challenges: scarce annotated datasets, low contrast, difficulty detecting small targets, and heterogeneous morphologies. To this end, we construct IPSM-Bench, the largest high-quality dataset for zinc-alloy intermediate phase segmentation. Furthermore, we propose SCoP-SAM, a new Spatial Context Prior-guided SAM method that leverages the gradient structure and grayscale properties of intermediate phases to capture spatial context priors and incorporates them into the entire SAM encoding-decoding process, improving segmentation performance. Based on the proposed IPSM-Bench, we establish a new benchmark for intermediate phase segmentation to systematically evaluate state-of-the-art (SOTA) methods and advance research on zinc alloy microstructure analysis. Extensive experiments on IPSM-Bench and additional public alloy benchmarks demonstrate that our SCoP-SAM not only achieves SOTA performance for zinc-alloy intermediate phase segmentation but also generalizes remarkably well to other alloy scenarios.
    Computer VisionSegmentation, grouping and shape analysisMultidisciplinary Topics and ApplicationsOther
  254. #2747

    TaylorMoDe-GS: Taylor-Driven Gaussian Splatting Motion Model for Multi-View Dynamic Scene Deblurring

    Xiaofeng Quan, Junzhe Wan, Chao Cai, Yifan Zuo, Xiaoshui Huang, Yuming Fang
    While 3D Gaussian Splatting (3DGS) has excelled in dynamic scene reconstruction, it struggles with multi-view object motion blur, where view-dependent non-uniform blur violates fundamental multi-view geometric constraints. Existing methods fail to balance complex motion fitting with physical consistency across different views. To address this challenge, we propose TaylorMoDe-GS, the first 3DGS framework tailored for multi-view dynamic object deblurring. Specifically, we shift the modeling paradigm from displacement fitting to velocity-driven modeling. This is achieved by analytically deriving instantaneous 3D velocities via Taylor series. To map 3D physical motion onto the 2D image plane, we propose a velocity splatting technique. Building upon this, we introduce a neural Peano remainder network to compensate for high-frequency non-linear dynamics, effectively resolving the conflict between physical priors and fitting flexibility. Combined with a learnable physical blur synthesis mechanism, our framework ensures rigorous spatiotemporal and view consistency. Extensive experimental results validate that our method achieves notable advancements in restoring high-fidelity dynamic scenes and physically consistent motion fields, effectively addressing the challenges of multi-view dynamic object deblurring. The code is released at https://github.com/Chiffin-0816/TaylorMoDe-GS.
    Computer Vision3D computer visionComputer VisionImage and video synthesis and generationComputer VisionLow-level Vision
  255. #2753

    RIVS: Mitigating Hallucination in Large Vision-Language Models via Representation Intervention on Visual Grounding Shift

    Xuanyu Yin, Xiaoye Qu, Wei Wei
    Large Vision-Language Models (LVLMs) demonstrate powerful generative capabilities yet remain prone to object hallucinations. Most existing methods mitigate this issue through training or decoding strategies, but provide limited exploration of how hallucinations arise from internal representations during generation.
    In this work, we study hallucination from the perspective of dynamic representation shift during generation and propose Representation Intervention based on Visual Grounding Shift (RIVS). Specifically, we first design an adaptive threshold-based method to identify visual attention drop points within the generation, finding that hallucinated tokens usually occur with decay of visual attention. Based on this observation, we construct a hallucination-related subspace from representation differences around these points on a small calibration set, without constructing contrast samples or extra supervision.
    During inference, we leverage the resulting hallucination-related subspace to perform an online projection-based intervention on intermediate hidden states to suppress the hallucination-related directions, mitigating hallucinations while preserving language quality.
    Our RIVS is training-free and computationally efficient. Experiments on four hallucination and two reasoning datasets demonstrate that RIVS consistently reduces hallucinations in both long and short sequence generation tasks.
    AI Ethics, Trust, FairnessBiasAI Ethics, Trust, FairnessTrustworthy AIAI Ethics, Trust, FairnessOther
  256. #2754

    Generalization Analysis for Adversarial Vision Transformer

    Ziwen Jiang, Chang Cao, Han Li, Hong Chen, Rushi Lan
    Vision Transformers (ViTs) exhibit notable susceptibility to adversarial attacks, presenting a significant challenge for their deployment in security-sensitive applications. Despite their considerable empirical successes, a rigorous theoretical foundation for the adversarial generalization behavior of ViTs has not been adequately established. To address this limitation, we leverage empirical Rademacher complexity to analyze the mechanism of perturbation accumulation through deep ViT layers. We establish a high-probability generalization bound for ViTs in classification tasks under adversarial settings. Our theoretical framework elucidates the roles of several factors in mitigating perturbation effects, including norm regularization of weight matrices (in both MLP and attention modules) and depth-wise propagation constraints on layer-wise norms. Extensive experiments on benchmark datasets corroborate our theoretical insights, bridging the gap between ViT architecture design and adversarial robustness.
    Computer VisionMachine learning for visionMachine LearningAdversarial machine learningMachine LearningRobustness
  257. #2756

    MultiGeo: Predicting Drug-Target Affinity via Adaptive Multi-Conformation Ensemble Learning

    Ruida Zeng, Cheng Guo, Yajie Meng, Xunkun Cheng, Zhiwei Xu, Xiangzheng Fu, Pan Zeng, Shuting Jin, Junlin Xu
    Predicting drug–target affinity (DTA) is central to drug discovery, yet most deep learning models rely on a single static protein structure, neglecting the conformational heterogeneity that underlies many binding mechanisms. We propose MultiGeo, a DTA prediction framework that explicitly leverages multiple protein conformations rather than a single snapshot. For each target, MultiGeo starts from an ensemble of structure predictions and uses confidence scores to adaptively select a small set of high-quality, diverse conformers. A hierarchical structural encoder then extracts multi-scale geometric features from these conformers, which are aggregated by a GRU to obtain an ensemble-level representation. To avoid indiscriminately mixing noisy or redundant views, we introduce a disagreement-aware gating mechanism that adaptively fuses this ensemble representation with the dominant structure only when the additional conformers provide complementary information. Finally, ligand–target interactions are modeled via a block-wise cross-attention module that captures multi-perspective dependencies between ligand and protein features. Extensive experiments on multiple DTA benchmarks demonstrate that MultiGeo consistently outperforms state-of-the-art baselines, showing that explicitly encoding conformational diversity yields more accurate and robust affinity prediction.
    Multidisciplinary Topics and ApplicationsBioinformatics
  258. #2759

    Detect, Attend and Extract: Keyword Guided Target Speaker Extraction

    Haoyu Li, Yu Xi, Yidi Jiang, Shuai Wang, Kate Knill, Mark Gales, Haizhou Li, Kai Yu
    Target speaker extraction (TSE) aims to extract the speech of a target speaker from mixtures containing multiple competing speakers. Conventional TSE systems predominantly rely on speaker cues, such as pre-enrolled speech, to identify and isolate the target speaker. However, in many practical scenarios, clean enrollment utterances are unavailable, limiting the applicability of existing approaches. In this work, we propose DAE-TSE, a keyword-guided TSE framework that specifies the target speaker through distinct keywords they utter. By leveraging keywords (i.e., partial transcriptions) as cues, our approach provides a flexible and practical alternative to enrollment-based TSE. DAE-TSE follows the Detect-Attend-Extract (DAE) paradigm: it first detects the presence of the given keywords, then attends to the corresponding speaker based on the keyword content, and finally extracts the target speech. Experimental results demonstrate that DAE-TSE outperforms standard TSE systems that rely on clean enrollment speech. To the best of our knowledge, this is the first study to utilize partial transcription as a cue for specifying the target speaker in TSE, offering a flexible and practical solution for real-world scenarios. Our code (https://github.com/GnafiY/DAE-TSE) and demo page (https://gnafiy.github.io/DAE-TSE_demo) are now publicly available.
    Natural Language ProcessingSpeech
  259. #2761

    Harmonizing Real-Time Constraints and Long-Horizon Reasoning: An Asynchronous Agentic Framework for Dynamic Scheduling

    Shijie Cao, Yuan Yuan, Jing Liu
    The Dynamic Flexible Job Shop Scheduling Problem (DFJSP) necessitates a trade-off between instant reaction to stochastic disturbances and global optimization of production goals. Conventional priority rules are insufficiently flexible to handle complex disruptions, whereas learning-based approaches often compromise interpretability or fail to generalize across problem scales. Although Large Language Models (LLMs) offer advanced reasoning capabilities to bridge this gap, their substantial inference latency is incompatible with the millisecond-level decision cycles of industrial control systems. To resolve this conflict, we introduce RACE-Sched, an asynchronous agent-based framework that decouples policy execution from logical reasoning via a dual-stream architecture. The Reactive Stream executes low-latency symbolic heuristics to enable real-time dispatching, while the parallel Deliberative Stream leverages an LLM to synthesize, validate, and evolve these rules. Candidate rules undergo rigorous testing in a sandbox and are deployed via atomic updates, ensuring safety without blocking the control loop. Additionally, a semantic rule repository indexes validated heuristics for retrieval-based initialization which enhances transferability across problem scales. Extensive evaluations on GEN-Bench, MK-Bench, and JMS-Bench demonstrate that RACE-Sched outperforms leading Deep Reinforcement Learning and other LLM-based baselines. This approach harmonizes real-time constraints with long-horizon reasoning to achieve superior solution quality and robust adaptation to dynamic events.
    Planning and SchedulingLearning in planning and schedulingPlanning and SchedulingPlanning under uncertaintyPlanning and SchedulingPOMDPsPlanning and SchedulingReal-time planningPlanning and SchedulingScheduling
  260. #2771

    Label Enhancement via Cross-View Fusion and Mixed Graph Propagation

    Mengjiao Kai, Chao Tan, Yanda Wang, Juanna Zhai, Kang Wu, Ningkang Peng, Yanhui Gu
    Label Distribution Learning (LDL) effectively addresses label ambiguity by modeling the degree to which each label describes an instance. A key challenge in LDL is Label Enhancement (LE): recovering label distributions from logical labels. Existing LE methods typically treat logical labels as supervisory signals and learn a direct mapping from features to label distributions. However, they fail to fully exploit the rich information encoded in logical labels, limiting their performance. We propose CVMG (Cross-View Fusion and Mixed Graph-based Label Enhancement), a novel approach that addresses this limitation through two key innovations. First, we employ a cross-attention mechanism to integrate logical labels and features, leveraging their complementary information to generate enriched feature representations. Second, we construct a mixed dependency graph that captures both instance-level relationships from enhanced features and category-level dependencies from logical labels. Label distributions are then recovered through propagation over this graph. Extensive experiments on 13 real-world datasets demonstrate that CVMG significantly outperforms state-of-the-art methods, validating the effectiveness of our approach.
    Machine LearningMulti-label learningData MiningExploratory data miningMachine LearningAttention modelsConstraint Satisfaction and OptimizationConstraint satisfaction
  261. #2780

    S²-VLA: State-Space Guided Vision-Language-Action Models for Long-Horizon Manipulation

    Zhipeng Xie, Zongyi Han, Xiangyi Wei, Shiliang Sun, Yang Li, Jing Zhao
    Vision-Language-Action (VLA) models have demonstrated strong capabilities in robotic manipulation, but their performance degrades significantly in long-horizon tasks due to cumulative error propagation. This limitation largely arises from static feature fusion mechanisms that rely on fixed weights to combine visual, language, and action representations, preventing the model from adapting to different phases of task execution. To address this limitation, we propose S²-VLA, a framework that introduces a State-Space Guided Adaptive Attention (SSGAA) mechanism. SSGAA maintains a belief state that tracks task progression and generates dynamic gating weights to adaptively fuse information from three complementary sources: visual features for spatial perception, task intents for high-level task planning, and temporal action sequences for execution consistency. This adaptive fusion allows the model to shift its focus throughout task execution, aligning with the evolving requirements of different task stages. Despite its compact 2B parameter size, S²-VLA consistently outperforms larger 7B-scale models and achieves state-of-the-art performance on long-horizon manipulation benchmarks, including LIBERO and SimplerEnv, highlighting the importance of adaptive feature fusion for long-horizon robotic manipulation.
    RoboticsLearning in robotics
  262. #2785

    Detoxifying Large Language Models via Localized Feature Editing with Sparse Autoencoders

    Yuhu Shang, Xiang Cheng, Dianyun Wang, Yimeng Ren, Xuexiong Luo, Hang Yang, Huijia Wu, Zhaofeng He
    The powerful generative capabilities of Large Language Models (LLMs) also pose significant risks, underscoring the need for effective detoxification methods to ensure safer deployment. Due to the polysemantic nature of LLM neurons, recent neuron intervention methods inevitably entangle unrelated concepts, compromising generation quality and interpretability. Sparse Autoencoders (SAEs) have opened new horizons for decomposing model activations into monosemantic features, offering interpretability and targeted feature-level steering. Empirical findings reveal that, despite capturing interpretable features, indiscriminate interventions on toxicity-related features expose the fragility of LLMs, achieving toxicity mitigation at the cost of degraded fluency. Building upon this finding, we propose DeLFE, a lightweight controlled detoxification approach that identifies specific toxic features across model layers and performs targeted interventions on them. DeLFE learns toxicity subspaces from label-guided SAE feature subsets to characterize toxic vs. non-toxic activation patterns. When auto-completing a response token by token, DeLFE tracks the toxicity-triggering risks and steers toxic features away from the subspace via a flow-matching feature transformation. We further design three feature-level strategies that adjust intervention timing and strength to reconstruct the target model’s original activations. Extensive experiments demonstrate that our method achieves strong detoxification effectiveness while maintaining high generation quality across models of varying sizes and diverse base LLMs.
    AI Ethics, Trust, FairnessExplainability and interpretabilityAI Ethics, Trust, FairnessSafety and robustnessNatural Language ProcessingInterpretability and analysis of models for NLPNatural Language ProcessingLanguage generationNatural Language ProcessingLanguage models
  263. #2798

    Toward Synthesizability-Aware Multi-Step Retrosynthetic Planning

    Yujie Chen, AJie Lin, Tengfei Ma, Shu Wu, Leyi Wei, Yiping Liu, Xiangxiang Zeng
    Multi-step retrosynthetic planning aims to decompose target molecules into available starting materials by iteratively invoking single-step prediction models within a search algorithm. The success of retrosynthetic planning depends on the joint guidance of single-step reasoning and global search across steps. However, most existing frameworks make step-wise decisions based only on the current molecular state, without explicitly modeling synthesizability signals that reflect long-range reachability. In this work, we propose GuideRetro, a synthesizability-aware framework for multi-step retrosynthetic planning that integrates global synthesizability knowledge into step-wise retrosynthetic prediction. GuideRetro learns transferable knowledge from large-scale reaction networks by modeling the evolution of synthetic complexity along reaction pathways. During planning, a route-aware synthesis state modeling module combines the evolving retrosynthetic route with retrieved global signals to guide reactant generation at each step. Experiments on benchmark datasets show that GuideRetro achieves state-of-the-art performance. The integration of global synthesizability knowledge and route-aware modeling improves planning accuracy and search efficiency under realistic retrosynthetic settings. The code is available at: https://github.com/L-AJ/GuideRetro.
    Multidisciplinary Topics and ApplicationsOtherMultidisciplinary Topics and ApplicationsPhysical sciences
  264. #2802

    PerFlow: Physics-Embedded Rectified Flow for Efficient Reconstruction and Uncertainty Quantification of Spatiotemporal Dynamics

    Hao Zhou, Rui Zhang, Han Wan, Hao Sun
    Reconstructing PDE-governed fields from sparse and irregular measurements is challenging because the problem is ill-posed. Deterministic surrogates trained on dense fields struggle with limited measurements and uncertainty quantification. Generative models, by learning distributions over spatiotemporal fields, can better handle sparsity and uncertainty. However, existing generative approaches enforce data consistency and PDE constraints simultaneously via sampling-time gradient guidance, resulting in slow and unstable inference. To this end, we propose PerFlow, a Physics-embedded rectified Flow for efficient sparse reconstruction and uncertainty quantification of spatiotemporal dynamics. PerFlow decouples observation conditioning from physics enforcement, performing guidance-free conditioning by feeding observations into rectified-flow dynamics while embedding hard physics via a constraint-preserving projection (e.g., incompressibility or conservation). Theoretically, we establish invariance guarantees to ensure that trajectories remain on the physics-consistent manifold throughout sampling. Experiments on various PDE systems demonstrate competitive reconstruction accuracy with sound physics consistency, while enabling efficient conditional sampling (e.g., 50 steps) and up to 320x faster inference than 2000-step guided diffusion baselines.
    Multidisciplinary Topics and ApplicationsPhysical sciences
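    The invariance idea in the PerFlow abstract can be illustrated with a toy sketch. It assumes a linear conservation constraint (the state's total stays fixed) and a hand-written velocity field; `project_zero_sum`, `toy_velocity`, and the target values are hypothetical stand-ins, not the paper's learned model or its projection operator. Projecting each velocity onto the zero-sum subspace keeps every Euler step on the constraint set.

```python
def project_zero_sum(v):
    """Project a velocity onto the subspace where components sum to zero."""
    mean = sum(v) / len(v)
    return [vi - mean for vi in v]


def sample(x0, velocity, steps=50):
    """Euler integration from t=0 to t=1 with a projected velocity."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = project_zero_sum(velocity(x, t))  # hard constraint on the update
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(x, t):
    """Hypothetical field pulling toward a target that ignores the constraint."""
    target = [3.0, 1.0, 0.0, -1.0]
    return [ti - xi for ti, xi in zip(target, x)]


x0 = [0.5, 0.25, 0.25, 0.0]  # total is 1.0
x1 = sample(x0, toy_velocity)
print(sum(x0), sum(x1))  # the totals agree up to floating-point error
```

    Because the update direction always sums to zero, the conserved total cannot drift regardless of the step count, which is the discrete analogue of staying on a physics-consistent manifold.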
  265. #2818

    Keep Experts Diverse: A Task-Aware MoE for Multi-Task Traffic Analysis

    Jiadong Fu, Jiang Fang, Jiyan Sun, Laile Xi, Shangyuan Zhuang, Liru Geng, Yinlong Liu, Zhiqiang Lv
    Network traffic analysis is crucial for maintaining the security of networks. Yet deploying accurate models on edge nodes remains challenging due to protocol diversity, complex traffic behaviors, and stringent resource constraints. Although recent deep learning models achieve strong performance, their dense architectures incur high inference latency, making them unsuitable for edge deployments.
    Mixture-of-Experts offers a promising solution through conditional computation, yet existing methods overlook latent inter-task relations and suffer from severe expert load imbalance, leading to expert collapse and suboptimal efficiency.

    To address these limitations, we propose TAMoE, a novel Task-Aware Mixture-of-Experts framework for multi-task network traffic analysis.
    TAMoE integrates task semantics into both routing and representation learning through a novel Task-Aware Routing mechanism, enabling the router to dynamically select experts conditioned on both traffic features and task information.
    In addition, to stably train usable and scalable multi-task models, we design a collaborative multi-task training strategy that encompasses both mixed dataset construction and a joint training process.
    Experiments on six public benchmarks show that TAMoE achieves state-of-the-art results. Deployment on an NVIDIA® Jetson AGX Orin edge node demonstrates that TAMoE reduces inference latency by 60.6% compared to dense models and lowers the Gini Coefficient from 0.391 to 0.275 relative to Multi-Router MoE, achieving more balanced expert utilization.
    Data MiningApplicationsData MiningNetworksMultidisciplinary Topics and ApplicationsSecurity and privacy
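    The TAMoE abstract reports expert balance as a Gini coefficient. As a reminder of what that metric measures, here is a minimal sketch using the standard sorted-values formula; the per-expert load counts are hypothetical, and the paper's exact computation may differ. A value of 0 means perfectly even expert utilization, and larger values mean more skew.

```python
def gini(loads):
    """Gini coefficient of non-negative loads (0 = perfectly balanced)."""
    xs = sorted(loads)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n, with i starting at 1
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


balanced = gini([25, 25, 25, 25])  # every expert sees the same traffic
skewed = gini([70, 10, 10, 10])    # one expert dominates
print(balanced, skewed)
```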
  266. #2822

    FedFINFO: A General Full-Informativeness Federated Graph Learning from Open Cross-Domain Data

    Wan Zhang, Xiaoqian Jiang, Ye Wang, Zhiqiang Xu, Jing Zhang
    Open cross-domain federated graph learning facilitates collaborative learning among clients from distinct graph domains while preserving privacy. However, severe structure and feature heterogeneity in open scenarios exacerbates the multiplicative amplification of structural and feature noise within graph neural network encoders, leading to significant negative transfer. Existing methods, which address only partial heterogeneity, are insufficient. We theoretically reveal this amplification mechanism and propose a novel learning method, FedFINFO, which leverages full-informativeness shared knowledge. Specifically, we construct an aligned and general feature space integrating node semantic and structural roles. We propose a counterfactual multi-view framework to explicitly learn a structure masker for extracting consensus subgraphs. These jointly suppress the mutual amplification of noise at the source. We also propose a history-aware adaptive multi-channel aggregation mechanism to enable dynamic and personalized sharing driven by channel similarity while preventing collaborative oscillation. FedFINFO effectively reconciles common knowledge sharing with personalized knowledge retention, significantly mitigating negative transfer. Extensive experiments under cross-dataset and cross-domain settings demonstrate the superiority of our method.
    Machine LearningFederated learningMachine LearningSequence and graph learningData MiningMining graphs
  267. #2854

    Proportional Selection in Networks

    Georgios Papasotiropoulos, Oskar Skibski, Piotr Skowron, Tomasz Wąs
    We address the problem of selecting k representative nodes from a network, aiming to simultaneously achieve two objectives: identifying the most influential nodes and ensuring that the selection proportionally reflects the diversity within the network. We propose a general approach to accomplish this by combining ideas from network science and computational social choice. Notably, our algorithms depend only on the connections between nodes and do not utilize any additional information that would explicitly identify groups of nodes. We analyze them theoretically, and demonstrate their effectiveness through a series of experiments.
    Game Theory and Economic ParadigmsComputational social choice
  268. #2869

    DanceStyleCam: Style-Based 3D Multi-Style Dance Camera Movement Synthesis

    Xiaoying Huang, Sanyi Zhang, Xirui Wang, Qin Zhang, Long Ye
    Fully automatic camera movement directly affects the artistic quality of a dance performance, particularly its visual expression, alongside choreography and music. Current studies mainly focus on synthesizing camera movements conditioned on dance and music, but they overlook the camera movement style, which is an essential factor for artistic and visual coherence. In this paper, we introduce DanceStyleCam, a unified framework that incorporates style consistency into dance camera movement synthesis across diverse styles. Specifically, a style-aware feature learning module is proposed to map dance style information into compact embeddings, facilitating stable and discriminative style learning. To further guarantee that the generated camera movements remain faithful to the target style, we propose a style-consistent adversarial training scheme, which guides the model to learn better style-consistent representations. In addition, we enrich the DCM dataset with diverse camera movement style annotations. Extensive experiments demonstrate that DanceStyleCam outperforms state-of-the-art methods in both generation quality and style consistency. The project page is: https://anonymous.4open.science/r/DanceStyleCam.
    Computer VisionImage and video synthesis and generation
  269. #2878

    Vector-Quantized Discrete Latent Factors Meet Financial Priors: Dynamic Cross-Sectional Stock Ranking Prediction for Portfolio Construction

    Namhyoung Kim, Jae Wook Song
    Predicting cross-sectional stock returns is challenging due to low signal-to-noise ratios and evolving market regimes. Classical factor models offer interpretability but limited flexibility, while deep learning models achieve strong performance yet often underutilize financial priors. We address this gap with PRISM-VQ (PRior-Informed Stock Model with Vector Quantization), a dynamic factor framework that integrates expert prior factors, vector-quantized discrete latent factors learned from cross-sectional structure, and a structure-conditioned Mixture-of-Experts to generate time-varying factor loadings. Vector quantization acts as an information bottleneck that suppresses noise while capturing robust market structure, with discrete codes serving both as latent factors and as routing signals for temporal expert specialization. Experiments on CSI 300 and S&P 500 show consistent improvements in cross-sectional return prediction and portfolio performance over strong baselines while preserving interpretability. Our code is available at https://github.com/finxlab/PRISM-VQ.
    Machine LearningApplicationsMultidisciplinary Topics and ApplicationsFinance
  270. #2892

    Content-style Disentanglement Guided Representation Learning for Deep Incomplete Multi-view Clustering

    Enze Ji, Meng Liu, Yuzhe Li, Zhikui Chen
    Deep incomplete multi-view clustering methods can mine patterns of incomplete multi-view data without labels, gaining great attention in various domains. However, current methods focus on aligning view-specific representations from available samples with complete views to learn view-invariant information for missing data recovery, ignoring rich view-complementary information that benefits data imputation. Meanwhile, such view-invariant learning often leads to the problem of dominant view dependency, in which models tend to over-rely on views with clear clustering structures and neglect weaker ones, causing a performance bottleneck. Therefore, we propose a content-style disentanglement guided representation learning method for incomplete multi-view clustering (CSMVC). Specifically, CSMVC designs a view-specific dual-representation learning architecture that extracts view-invariant content representations and view-specific style representations from self-supervised reconstructions of each view with the guidance of the Hilbert-Schmidt independence criterion. Then, it performs a content-centered style-modulated representation imputation mechanism to infer missing data by fully utilizing inter-view complementary and consistent information. Meanwhile, it devises a cross-view consensus structure mining strategy to capture the clustering structure with intra-cluster compactness and inter-cluster separability from attention-based fusion representations that adaptively aggregate content representations and style representations of each view. Finally, comprehensive experiments show that CSMVC outperforms state-of-the-art methods.
    Machine LearningClusteringMachine LearningDeep learning architecturesMachine LearningMulti-view learning
  271. #2893

    DiffVec: Diffusion Model for Trajectory Vector Recovery

    Jiaqi Duan, Shengwei Tian, Long Yu, Xiangfu Meng, Ya Zhang
    The increasing availability of trajectory data is often hampered by sparsity and noise. Existing trajectory recovery methods are further limited by either information loss from coordinate discretization into location IDs, or the inefficiency of conventional diffusion models that require a lengthy denoising process from pure noise. To address these challenges, we propose DiffVec, a novel and efficient diffusion framework for free-space trajectory recovery that operates directly on continuous coordinate data. The backbone of our framework is MVformer, a Transformer-based architecture that models motion vectors—the differences between consecutive coordinates—to better capture the local motion patterns of movement. This model is trained within our ResTraj diffusion paradigm, which commences the denoising process from a structured, interpolated prior to significantly reduce sampling steps. Extensive experiments on the Geolife and Porto datasets demonstrate that DiffVec significantly outperforms state-of-the-art baselines across MAE, MSE, and NDTW.
    Data MiningFrequent pattern miningData MiningKnowledge graphs and knowledge base completionData MiningMining spatial and/or temporal data
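    The DiffVec abstract defines motion vectors as the differences between consecutive coordinates. A minimal sketch of that representation and its inverse follows; the function names and the sample points are hypothetical, and this is not the MVformer model itself.

```python
def to_motion_vectors(points):
    """Differences between consecutive (x, y) coordinates."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])]


def from_motion_vectors(start, vectors):
    """Rebuild the trajectory by accumulating vectors from a start point."""
    path = [start]
    for dx, dy in vectors:
        x, y = path[-1]
        path.append((x + dx, y + dy))
    return path


trajectory = [(0, 0), (1, 2), (3, 3), (6, 3)]
vectors = to_motion_vectors(trajectory)
restored = from_motion_vectors(trajectory[0], vectors)
print(vectors)   # [(1, 2), (2, 1), (3, 0)]
print(restored)  # [(0, 0), (1, 2), (3, 3), (6, 3)]
```

    The representation is lossless given the starting point, so a model can work on local displacements and still recover absolute coordinates.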
  272. #2899

    I-EDI: Robust Self-Evolution Agents via Verifiable Counterfactual Simulation

    Runze Fan, Yong Li
    Current self-evolution methods mostly chase recall: they learn from any trajectory that ends with the right answer, even when the path relies on a "lucky guess". Such reasoning appears valid in-distribution but often fails the moment the task shifts slightly. We argue that robust evolution implies Structural Invariance: a reasoning path is valid only if its core dependency graph remains isomorphic under counterfactual perturbations. I-EDI enforces this with Verifiable Counterfactual Simulation (VCS). Instead of treating data generation as mere "sample and filter," we treat it as an intervention. Unlike standard rejection sampling, VCS acts as a structural stress test: it accepts a reasoning trace only if it remains valid across generated counterfactuals (e.g., numerical perturbations, context shifts). While this strict filtering reduces the volume of training data (trading recall for precision), our experiments on MATH and GSM8K show it effectively prevents hallucination accumulation, achieving superior out-of-distribution generalization compared to baselines.
    Agent-based and Multi-agent SystemsAgent communicationAgent-based and Multi-agent SystemsAgent theories and modelsAgent-based and Multi-agent SystemsMulti-agent learningNatural Language ProcessingApplications
  273. #2902

    GenID: A Generalizable Physical-Layer Device Identification Method for Out-of-Distribution Environments

    Yawei Zhang, Lanting Fang, Di Yao, Kaiyu Feng, Shuliang Wang
    Physical-layer device identification serves as a critical security mechanism for mitigating cyberattacks and enhancing network resilience. However, its deployment in modern full-duplex Ethernet networks faces two major challenges: (1) the presence of Out-of-Distribution (OOD) data caused by temporal distribution shift and diverse local transmitters, and (2) the difficulty of environment-agnostic device identification, where the unique features of target devices are often obscured in mixed signals. To address these challenges, we propose GenID, which consists of two core components. First, GenID constructs prototypes to emulate the full communication environment through a two-stage process: offline and online. Based on the prototype context, it then introduces an invariant feature extraction method to disentangle mixed signals and improve generalization under OOD conditions. Extensive experiments demonstrate that GenID significantly outperforms state-of-the-art baselines, achieving an increase of at least 7.94% in Macro-F1 score under OOD environments. Our code is available at https://github.com/zyw2004/GenID_IJCAI2026.
    Computer VisionRepresentation learningMachine LearningOpen-World/Open-Set/OOD LearningData MiningAnomaly/outlier detectionMachine LearningFeature extraction, selection and dimensionality reductionMachine LearningTime series and data streams
  274. #2904

    Equilibrium Refinements Improve Subgame Solving in Imperfect-Information Games

    Ondřej Kubíček, Viliam Lisý, Tuomas Sandholm
    Subgame solving is a technique for scaling algorithms to large games by locally refining a precomputed blueprint strategy during gameplay. While straightforward in perfect-information games where search starts from the current state, subgame solving in imperfect-information games must account for hidden states and uncertainty about the opponent's past strategy. Gadget games were developed to ensure that the improved subgame strategy is robust against any possible opponent's strategy in a zero-sum game. Gadget games typically contain infinitely many Nash equilibria. We demonstrate that while these equilibria are equivalent in the gadget game, they yield vastly different performance in the full game, even when facing a rational opponent. We propose gadget game sequential equilibria as the preferred solution concept. We introduce modifications to the sequence-form linear program and counterfactual regret minimization that converge to these refined solutions with only mild additional computational cost. Additionally, we provide several new insights into the surprising superiority of the resolving gadget game over the max-margin gadget game. Our experiments compare different Nash equilibria of gadget games in several standard benchmark games, showing that our refined equilibria consistently outperform unrefined Nash equilibria, and can reduce the exploitability of the overall strategy by more than 50%.
    Game Theory and Economic ParadigmsNoncooperative gamesMultidisciplinary Topics and ApplicationsGame playingSearchGame playingSearchLocal searchSearchSearch and machine learning
  275. #2937

    Manifold-Aligned Rectification Flow for Denoised Social Recommendation

    Qing Meng, Zhufu Song, Huiyu Min, Qian Huang, Pengcheng Zhang
    Social recommendation leverages social relations to enhance user preference modeling. However, real-world social networks are often noisy and unreliable, where misleading social relationships introduce anisotropic perturbations into user representations. Existing denoising approaches, including heuristic filtering and generative reconstruction, struggle to produce social embeddings that are well aligned with user preferences, limiting their effectiveness in downstream recommendation tasks. To address this challenge, we propose Manifold-Aligned Rectification Flow (MARF), a flow matching based framework that explicitly rectifies noisy social representations toward preference-aligned embeddings. MARF jointly integrates social relations and user-item interactions to construct a preference-aware manifold as the target distribution, guiding the learning of a continuous vector field that captures preference-oriented transformations in the social domain. Through this learned transport process, MARF effectively bridges the gap between the social and preference domains, yielding robust and discriminative user representations. Extensive experiments demonstrate that our proposed model consistently outperforms state-of-the-art social recommendation methods, particularly under sparse and noisy conditions, validating its effectiveness and robustness.
    Data MiningCollaborative filteringData MiningRecommender systems
  276. #2962

    AnoMamba: Aligning Reconstruction with Time Series Anomaly Detection via Selective Global Dependency Modeling

    Junqi Chen, Xu Tan, Jie Chen, Susanto Rahardja
    Reconstruction-based frameworks are widely adopted in Time Series Anomaly Detection (TSAD), assuming that models reconstruct normal behavior well but yield larger errors on anomalies. However, in unsupervised TSAD, minimizing reconstruction loss alone often breaks this assumption. Models tend to overfit local patterns for trivial reconstruction and fail to capture the global dependencies that characterize normal behavior. Consequently, anomalies that violate global dependencies can also be reconstructed well, leading to a misalignment between reconstruction and detection. To address this challenge, we propose AnoMamba, a novel TSAD framework that aligns reconstruction with anomaly detection by enhancing global dependency modeling. Specifically, AnoMamba employs patch embedding to reduce local redundancy and introduces a Mamba variant with global step-size reweighting to select meaningful global dependencies. The reweighting process is guided by multi-scale long-tail priors, which adaptively balance global and local dependencies to mitigate overfitting in the unsupervised setting. Extensive experiments on both univariate and multivariate benchmarks demonstrate that AnoMamba consistently outperforms state-of-the-art methods in both accuracy and efficiency, while offering interpretability through its hidden attention map.
    Data MiningAnomaly/outlier detectionMachine LearningDeep learning architecturesMachine LearningTime series and data streams
  277. #2974

    Online Contract Design

    Elad Lavi, Hadas Shachnai, Inbal Talgam-Cohen
    We initiate the study of online contracts, which integrate the game-theoretic considerations of economic contract theory with the algorithmic and informational challenges of online algorithm design. Our starting point is the classic online setting with preemption, in which a hiring principal faces a sequence of adversarial agent arrivals. Upon arrival, the principal must decide whether to tentatively accept the agent to their team, and whether to dismiss previous tentative choices. Dismissal is irrevocable, giving the setting its online decision-making flavor. In our setting, the agents are rational players: once the team is finalized, a game is played where the principal offers contracts (performance-based payment schemes), and each agent decides whether or not to work. Working agents reward the principal, and the goal is to choose a team that maximizes the principal's utility. Our main positive result is a 1/2-competitive algorithm when agent rewards are additive, which matches the best-possible competitive ratio. Our algorithm is randomized and this is necessary, as we show that no deterministic algorithm can attain a bounded competitive ratio. Moreover, if agent rewards are allowed to exhibit the combinatorial structure known as XOS, even randomized algorithms might fail. En route to our competitive algorithm, we develop the technique of balance points, which can be useful for further exploration of online contracts in the adversarial model.
    Game Theory and Economic ParadigmsMechanism design
  278. #2985

    BEACON: Budget-Efficient Discovery of Policy Violations in Large Language Models via Cognitive-Guided Monte Carlo Tree Search

    Xinyi Huang, Jie Wang, Pengrui Xiang, Yifan Wang, Yu Fu, Jinduo Liu
    Systematic safety evaluation of large language models must uncover diverse policy violations under tight query budgets. However, most red-teaming methods optimize attack success rate and repeatedly probe a narrow set of vulnerabilities, yielding redundant failures and leaving rarer yet critical violation categories unexplored. Under fixed budgets, such inefficient exploration delays the first discovery and limits category coverage. To address these limitations, we propose the Budget-Efficient Adaptive Cognitive Offense Navigator (BEACON), a budget-aware safety testing framework that uses Cognitive-Guided Monte Carlo Tree Search to navigate the violation search space under fixed budgets. BEACON innovatively approaches safety testing as a budget-constrained failure discovery process, aiming to identify diverse safety violations as early as possible within a fixed query budget. It also provides an efficiency-oriented evaluation perspective that measures early discovery and harm category coverage under budget constraints. Experiments on standard benchmarks and frontier LLMs show that BEACON discovers failures earlier and achieves higher coverage across policy violation categories. These results underscore the value of evaluating safety testing through discovery efficiency rather than attack success rate alone. Warning: This paper contains examples of harmful language and images, and reader discretion is recommended.
    Multidisciplinary Topics and ApplicationsSecurity and privacyNatural Language ProcessingLanguage modelsNatural Language ProcessingResources and evaluationSearchApplications
  279. #2991

    Mitigating Tool Overuse for LLMs via Active Knowledge Boundary Probing

    Zhaoyu Yang, Wenjun Ke, Yuanyao Li, Yuchen Liu, Junqi Xu, Peng Wang, Hengyuan Xu
    Tool-augmented methods aim to enhance the reasoning capabilities of large language models (LLMs) by invoking external tools, and can be broadly categorized into training-free and training-based methods. Training-free methods can directly instruct LLMs to invoke external tools, but they exhibit limited generalization in dynamic environments. In contrast, training-based methods use a teacher model to generate tool-use trajectories and fine-tune a student model to distill the trajectories that can generalize across tasks. However, they still face two major challenges: (1) Knowledge boundary mismatch: teacher trajectories may contain steps that the student cannot execute without tools because the two models' knowledge boundaries differ. (2) Tool overuse propagation: the student inherits the misaligned tool-use strategy from the teacher model. To address these challenges, we propose RADAR, an automated framework for knowledge boundary discovery and tool overuse mitigation. First, RADAR performs semantic anchoring to select representative seeds. Second, simulated annealing probing is employed to actively explore the student model’s tool-use boundary and acquire sparse, verified student-aware labels. Third, it globally propagates these labels and rewrites conflicts, yielding deployment-aligned decisions with fewer unnecessary tool calls. We conduct experiments on multiple benchmarks; compared to the SOTA method, RADAR reduces average tool use by 52.7% while improving accuracy by 3.2% across three models.
    Natural Language ProcessingInterpretability and analysis of models for NLPNatural Language ProcessingResources and evaluationNatural Language ProcessingTools
  280. #2992

    Joint Neural Architecture Search and Token Pruning for Efficient Visual Tracking

    Yihong Chen, Shuo Wang, Jiayao Zheng, Yongqiang Bai
    Recently, transformer-based trackers have become the leading approach, surpassing traditional CNN-based trackers in accuracy. However, their high computational demands hinder their deployment on edge platforms, necessitating efficient solutions. To address this, we propose a novel Neural Architecture Search (NAS) framework designed for transformer-based trackers, named NASTrack. This framework incorporates token pruning to optimize both transformer block structures and the layer-wise token keeping ratio, striking a balance between performance and efficiency. To handle the larger search space introduced by the keeping ratio, we propose a blacklist strategy and a matching-based distillation driven by the Token Overlap Ratio (TOR). Our method discovers hundreds of high-performing trackers, with FLOPs ranging from 1G to 18G. The searched trackers consistently outperform existing efficient state-of-the-art trackers such as CompressTracker and LiteTrack under comparable computational budgets. The code and models are available at https://github.com/Cyhoon84/NASTrack.git.
    Computer VisionEfficiency and OptimizationComputer VisionMotion and tracking
  281. #3007

    Taming Treewidth DP with Modulators: A General Booster for Graph Heuristics

    Jialiang Li, Aneta Neumann, Frank Neumann, Hung Nguyen, Mingyu Guo
    Treewidth is a fundamental graph invariant that quantifies how tree-like
    a given graph is. It is extensively used with dynamic programming to
    design fixed-parameter tractable algorithms for many NP-hard graph
    combinatorial optimization problems. However, despite broad theoretical
    applicability, treewidth dynamic programming (TDP) does not scale in
    practice beyond graphs with very small treewidth. Rather than applying
    TDP as a standalone technique, we demonstrate that TDP can serve as a
    broadly applicable enhancer for a wide range of graph combinatorial
    optimization algorithms. Our framework leverages the concept of
    treewidth modulators, which refer to vertex sets whose removal
    significantly reduces the treewidth. We further propose an empirically
    efficient procedure for generating such treewidth modulators. To enhance
    an algorithm 𝒜, we use 𝒜 to heuristically make decisions on the
    modulator vertices, after which the remaining decisions outside the
    treewidth modulators become scalable for TDP.

    To demonstrate the general applicability of our proposed framework, we
    experimented with three classic graph combinatorial optimization models:
    Maximum Independent Set, Minimum Vertex Cover, and Max Cut. We apply TDP
    to enhance algorithms across diverse paradigms, including evolutionary
    search, greedy heuristics, and graph-neural-network-based heuristics.
    For all combinations of optimization models and base algorithms, TDP
    significantly improves performance over the original methods. In many
    settings, TDP-enhanced greedy heuristics are competitive with, and
    sometimes clearly outperform, state-of-the-art commercial solvers.
    SearchCombinatorial search and optimisationSearchHeuristic searchSearchSearch and machine learning
  282. #3015

    Bridge: A Cross-Modal Learning Framework for Unified Semantic Representation in Noisy Communication

    Liang Chen, Yanze Huang, Limei Lin, Xiaoding Wang, Wei Lou, Jie Wu, Sun-Yuan Hsieh
    Multimodal semantic communication systems face a critical challenge in extracting and aligning semantic features across heterogeneous modalities within a unified representation space, particularly under noisy transmission conditions. To address this, we propose Bridge, a cross-modal learning framework that integrates video, audio, and text into a unified semantic space through feature disentanglement and contrastive alignment. Bridge separates modality-specific and modality-invariant representations, enhancing both intra-modal precision and inter-modal generalization. During decoding, it leverages heterogeneous foundation models to preserve semantic fidelity under channel noise. Extensive experiments on multiple multimodal benchmarks under varying SNR levels demonstrate that Bridge maintains high semantic consistency and reconstruction quality, improving image reconstruction by approximately 16.3% in low-SNR regimes. Our work provides a practical and theoretically grounded framework for next-generation multimodal semantic communication systems.
    Machine LearningRepresentation learning
  283. #3018

    BotVA: Combating Social Bots via Variational Feature Augmentation and Adversarial Graph Learning

    Longlong Zhang, Xi Wang, Hongyi Nie, Zeqing Zhang, Huixiang Zhang, Hongping Wang, Yang Liu
    Social bots threaten online platforms by spreading disinformation and manipulating public discourse. Graph neural networks have emerged as effective tools for bot detection by modeling user interactions, yet two fundamental challenges limit their practical deployment: severe class imbalance where bots constitute a small minority of users, and camouflaged edges where bots forge deceptive connections to humans to evade detection. Class imbalance causes decision boundaries to shift toward the majority class, while camouflaged edges corrupt neighborhood aggregation through spurious message passing. We present BotVA, a unified framework addressing both challenges through variational feature augmentation and adversarial graph learning. Our approach makes three key contributions: (i) a conditional variational autoencoder that models minority class distributions and synthesizes semantically coherent features, effectively expanding minority support in representation space; (ii) an adversarial training paradigm where a generator simulates camouflage by injecting deceptive edges while a graph transformer discriminator with semantic attention learns to identify and downweight such perturbations; and (iii) a two-stage training strategy that pretrains the variational module before alternating generator-discriminator optimization to ensure stable convergence. Experiments on three benchmarks demonstrate that BotVA achieves state-of-the-art accuracy and F1-score, exhibits strong robustness under camouflage perturbations, and maintains competitive performance with limited supervision.
    Data MiningApplicationsData MiningMining text, web, social mediaMachine LearningAdversarial machine learning
  284. #3047

    Context-Aware Multi-Agent Coordination: Learning Correlated Equilibria Under Situational Constraints

    Libo Zhang, Zhirui Zeng, Yang Chen, Jiamou Liu
    Effective multi-agent coordination requires aligning incentives while adhering to complex requirements. However, real-world systems often impose situational constraints, context-dependent requirements triggered only under specific conditions, which challenge standard Correlated Equilibria (CE) solutions. We propose Situational-Constrained Density-Based Correlated Equilibria (SC-DBCE), a novel concept in Markov Games that formalizes situational constraints as logic implications. To solve this, we introduce Situational-Constrained Correlated Policy Iteration (SC-CPI), a reinforcement learning algorithm employing a smooth Log-Sum-Exp mechanism for constraint optimization. Evaluations on multi-agent games, smart grids, and warehouse robotics demonstrate that SC-CPI consistently outperforms baselines in both equilibrium quality and constraint adherence. To our knowledge, this is the first method learning CE under situational constraints.
    Agent-based and Multi-agent SystemsCoordination and cooperationAgent-based and Multi-agent SystemsMulti-agent learning
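    The SC-CPI abstract mentions a smooth Log-Sum-Exp mechanism for constraint optimization but does not say how it is applied. The sketch below shows only the generic operator: Log-Sum-Exp as a differentiable stand-in for the maximum of several constraint violations. The temperature `tau` and the violation values are hypothetical, and the paper's actual formulation may differ.

```python
import math


def smooth_max(values, tau=0.1):
    """Temperature-scaled Log-Sum-Exp: a smooth upper bound on max(values)."""
    m = max(values)  # subtract the max before exponentiating for stability
    return m + tau * math.log(sum(math.exp((v - m) / tau) for v in values))


violations = [0.2, -1.0, 0.5]  # hypothetical per-constraint violation scores
approx = smooth_max(violations, tau=0.1)
print(max(violations), approx)
```

    The result always lies between `max(values)` and `max(values) + tau * log(n)`, so lowering `tau` tightens the approximation at the cost of sharper gradients.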
  285. #3054

    VaryBalance: Detecting LLM-Generated Text Through Variation

    Xuecong Li, Xiaohong Li, Qiang Hu, Yao Zhang, Junjie Wang
    Detecting text generated by large language models (LLMs) is crucial but challenging. Existing detectors depend on impractical assumptions, such as white-box settings, or rely solely on text-level features, leading to imprecise detection. In this paper, we propose a simple but effective and practical LLM-generated text detection method, VaryBalance. The core observation behind VaryBalance is that, compared to LLM-generated texts, there is a greater difference between human texts and their rewritten versions via LLMs. Leveraging this observation, VaryBalance quantifies this difference through mean standard deviation and distinguishes human texts from LLM-generated texts. Comprehensive experiments demonstrate that VaryBalance outperforms the state-of-the-art detector, Binoculars, by up to 34.3% in terms of AUROC, and maintains robustness against multiple generating models and languages.
    Natural Language ProcessingInterpretability and analysis of models for NLPNatural Language ProcessingLanguage modelsNatural Language ProcessingText classification
  286. #3056

    MOBO: A Merging-Oriented Bi-Level Optimization Framework for Class Incremental Learning

    Siyu Zhang, Wen Wang, Wenju Sun, Qingyong Li, Yangliao Geng
    Class-Incremental Learning (CIL) aims to enable models to sequentially learn new tasks while retaining knowledge from previous ones. Recently, merging-based pre-trained CIL methods have gained significant attention due to their competitive performance and high inference efficiency. However, most existing approaches decouple training from merging, neglecting the compatibility among task-specific adapters. This incompatibility introduces severe conflicts during integration, resulting in catastrophic forgetting and notable performance degradation. To address this limitation, we propose MOBO, a Merging-Oriented Bi-Level Optimization framework that synergizes the optimization of the current task model with the performance of the final merged model. At the upper level, we introduce a global loss that anticipates the merged model's behavior, guiding the current task parameters toward a solution space that facilitates effective merging. At the lower level, we employ a dynamic weighted merging strategy to optimize merging coefficients and update the merged model. By alternately optimizing the task-specific and merged models, MOBO effectively mitigates the performance loss caused by incompatible adapter integration. Comprehensive experiments on CIFAR-100, CUB-200, ImageNet-R, ImageNet-A, and VTAB demonstrate the superiority of our approach, highlighting its robustness and scalability, particularly on long task sequences.
    Machine LearningClassificationMachine LearningIncremental learning
  287. #3060

    ULE-MWC: An UnLocking-Enhanced Lookahead Framework for Exact Maximum Weight Clique Search

    Mingming Jin, Chu-Min Li, Kun He
    The Maximum Weight Clique Problem (MWCP) is NP-hard and is typically solved exactly within a Branch-and-Bound (BnB) framework, where the quality of upper bounds critically determines the pruning efficiency. Recent solvers enhance independent-set-based bounds using MaxSAT reasoning. However, their effectiveness is limited by two inherent limitations: (i) independent sets in conflicts are discarded after a single use, preventing their reuse in subsequent reasoning; and (ii) conflicts are detected in a fixed order regardless of their potential contribution to bound tightening. To address these issues, we propose two complementary techniques. The UnLocking Mechanism (ULM) preserves conflict cores and enables the reuse of independent sets in subsequent reasoning, while the Benefit-Enhanced Strategy (BES) steers the MaxSAT reasoning process toward conflicts and independent sets with higher pruning potential. By integrating ULM and BES, we develop a new solver, ULE-MWC (UnLocking Enhanced MWC). Experimental results on standard benchmarks show that ULE-MWC consistently yields tighter upper bounds and outperforms state-of-the-art exact solvers in both runtime and search efficiency.
    Constraint Satisfaction and OptimizationConstraint optimization problemsConstraint Satisfaction and OptimizationSolvers and tools
  288. #3061

    Raise One and Infer Three: Toward Reasoning- and Memory-Augmented Diffusion Policy Generalization

    Yihang Zhu, Yuxuan Wang, Tong Li, Jiexi Yan, Xu Yang, Cheng Deng
    Diffusion policy has shown impressive performance in robotic manipulation tasks while struggling with out-of-distribution shifts and limited demonstrations. Recent advances primarily focus on improving geometric or perceptual representations for diffusion policy. However, these approaches rely heavily on instantaneous observations, making them vulnerable when visual inputs deviate from the training distribution. Our key insight is that robust generalization in diffusion policy requires both structured reasoning over skill compositions and an explicit memory mechanism for accumulating and reusing learned policy knowledge. To this end, we propose "raise one and infer three" diffusion policy (ROITDP), a novel approach that introduces two complementary mechanisms. First, we propose a reasoning mechanism built upon the Chain-of-Skill Noise Watermark, which encodes skill-level reasoning graph representations into the initial noise space, thereby enabling temporally coherent multi-step reasoning throughout the diffusion process under distribution shifts. We further introduce a memory mechanism in the form of a Self-evolving Policy Knowledge Bank that discretizes spatio-temporally refined trajectories into reusable skill primitives via a VQ-VAE architecture, providing adaptive memory to guide and refine action generation. Extensive experimental results on both simulated and real-world environments demonstrate the superiority and robustness of our method.
    Computer VisionMultimodal learningRoboticsManipulation
  289. #3070

    Bridging Feature-structural Homophily and Long-range Heterogeneity for Self-supervised Heterogeneous Graph Learning

    Minda Chen, Yujie Mo, Junkai Huang, Guoqiu Wen, Xiaofeng Zhu
    Self-supervised heterogeneous graph learning has achieved promising results in diverse applications but still faces two issues: (i) existing methods focus on either feature similarity or meta-path to capture homophily, neglecting their inherent complementarity; (ii) existing methods rely on meta-paths to capture interactions among same-type nodes, which may introduce noise and inherently exclude long-range cross-type interactions. To address these issues, we first propose a self-expressive solver that captures the complementary homophily between meta-paths and node features to obtain homophilous representations. Meanwhile, we design separate path encoders to model diverse interactions, thus explicitly including cross-type interactions while mitigating noise via adaptive fusion. Theoretical analysis verifies that homophilous representations exhibit a high-order grouping effect to capture complementary homophily, while path encoders possess adaptive smoothness capabilities to filter noise. Extensive experiments on diverse datasets, including a large-scale dataset, demonstrate the superiority of the proposed method.
    Data MiningMining graphsData MiningMining heterogeneous dataMachine LearningSelf-supervised LearningMachine LearningRepresentation learning
  290. #3100

    From Language to Segmentation: Collaborative Category-Guided Unsupervised Camouflaged Object Detection with SAM3

    Huafeng Chen, Yueming Lyu, Caifeng Shan
    Camouflaged Object Detection (COD) aims to segment objects that are hidden within complex backgrounds. Due to the low visual contrast of camouflaged objects, annotations are costly, motivating unsupervised COD (UCOD) to eliminate labeling expenses. Most UCOD methods follow the “MLLMs + other foundation models + SAM” paradigm, which relies on spatial interactions that are unreliable in camouflaged scenarios, leading to fundamental performance bottlenecks. In this paper, we propose a novel UCOD framework that leverages SAM3 through category-level interaction with MLLMs, bypassing unreliable spatial prompts. To address SAM3’s sensitivity to category granularity, we introduce Fine-grained Category Query, guiding MLLMs to generate full-granularity category chains for robust category prompting. To mitigate suboptimal segmentation caused by high camouflage and background confusion, we propose Semantic–Geometric Dual Confirmation, which jointly validates segmentation masks from semantic and spatial perspectives. Furthermore, we introduce Semantic–Geometric Reasoning Injection, which injects critical semantic and geometric cues into MLLMs to refine category reasoning and progressively correct segmentation errors under extreme camouflage or MLLM hallucinations. Extensive experiments show that our method significantly outperforms existing UCOD approaches and achieves performance comparable to weakly supervised COD.
    Computer VisionScene analysis and understandingComputer VisionTransfer, low-shot, semi- and un- supervised learning
  291. #3118

    STCBN-EC: A Spatio-Temporal Constrained Bayesian Causal Network for Multimodal Brain Effective Connectivity Learning

    Zhihao Su, Junzhong Ji, Minqi Yu, Jinduo Liu
    Brain effective connectivity (EC) characterizes directional causal interactions among brain regions. However, learning stable and directionally explicit EC networks from multimodal data remains challenging. In practice, functional magnetic resonance imaging (fMRI) and electroencephalography (EEG) differ substantially in spatio-temporal resolution and noise characteristics. Existing methods often rely on manual spatio-temporal alignment and tend to recover only partial causal structures under high noise and Bayesian equivalence classes. To address these challenges, we propose a spatio-temporal constrained Bayesian causal network for multimodal brain effective connectivity learning (STCBN-EC). First, STCBN-EC constructs an anatomically guided EEG–fMRI spatial mapping and derives a unified spatio-temporal representation through slice-level alignment and adaptive modality fusion. Then, a Bayesian causal network is employed to model nonlinear inter-regional dependencies, where uncertainty-driven surrogate scoring is used to evaluate candidate structures. Finally, multimodal representation learning and EC structure estimation are jointly optimized via a gradient-free global optimization strategy. Experiments on simulated and real EEG–fMRI datasets demonstrate that STCBN-EC outperforms state-of-the-art methods and effectively captures state-dependent directional interactions among brain regions.
    Data MiningApplicationsKnowledge Representation and ReasoningApplicationsKnowledge Representation and ReasoningCausalityMachine LearningApplications
  292. #3123

    Unsupervised Anomaly Detection in Dynamic Graphs via Compatibility Modeling and Boundary Learning

    Jiachi Luo, Shameng Wen, Ziyang Qiu, Chen Jiang, Qi Huang, Yuxing Tian, Aiwen Jiang
    Anomaly detection in dynamic graphs is essential for monitoring evolving systems such as transaction networks and online platforms. Yet existing methods remain limited in realistic edge-stream settings: snapshot-based approaches discretize continuous interactions and miss fine-grained temporal signals, while many continuous-time models rely on scarce node/edge attributes and often fail to explicitly assess whether a destination is compatible with a source’s recent context. Moreover, under extreme class imbalance and the lack of anomaly labels, unsupervised detectors frequently yield ill-defined normality criteria and ambiguous decision boundaries. We propose BAD, an unsupervised framework for anomaly detection in continuous-time dynamic graphs. BAD adopts a minimalistic design that represents nodes with learnable identity embeddings and performs pairwise compatibility modeling via cross-attention between each destination node and the source’s recent neighbors, enabling direct characterization of context-dependent deviations without requiring attributes. To obtain a principled separating boundary, BAD further integrates normalizing flows to model the distribution of normal interactions and derive likelihood-based anomaly scores. Extensive experiments on four real-world datasets demonstrate that BAD consistently performs well, highlighting its robustness under feature-scarce and label-scarce conditions.
    Data MiningAnomaly/outlier detectionData MiningMining graphs
  293. #3133

    A Sampling-Based Relaxation Approach to Contextual Inverse Optimization

    Yasunari Hikima, Naoyuki Kamiyama, Shinsaku Sakaue, Taira Tsuchiya
    Decision-making pipelines increasingly rely on prediction models whose outputs serve as inputs to downstream optimization problems. Decision-Focused Learning (DFL) has emerged as a promising approach to training such models by directly optimizing decision quality rather than predictive accuracy alone. While most existing DFL methods assume a complete-information setting in which the ground-truth optimization parameters are observed, this paper studies Contextual Inverse Optimization (CIO), an incomplete-information setting in which only the resulting solutions are observed.
    Prior work on CIO has proposed learning algorithms based on optimality conditions for linear programs, as well as methods that repeatedly solve inverse optimization problems to handle integer programs, often incurring a substantial computational burden. In this paper, we propose a learning algorithm for general optimization problems with linear objective functions that eliminates the need to solve inverse optimization problems. The proposed method learns prediction models by solving a Relaxed Inverse Optimization Problem (RIOP), constructed based on feasible solutions randomly sampled from the feasible region, thereby reducing the computational overhead associated with existing CIO methods. Numerical experiments demonstrate that our method achieves competitive performance in terms of regret compared with existing methods, while offering improved computational efficiency for certain classes of downstream optimization problems.
    Machine LearningOptimization
  294. #3147

    Learning Local Feature Masks with Variational Information Bottleneck

    Lu Sun, Jun Sakuma
    Instance-wise feature selection (IWFS) identifies informative features for each instance, improving generalization by discarding irrelevant information and enhancing interpretability through personalized explanations. Most IWFS methods adopt a selector-predictor architecture, where a selector generates instance-specific masks to guide prediction. This often leads to co-adaptation, in which the selector encodes label information into the mask, resulting in spurious correlations and unfaithful explanations. Existing methods also struggle to capture diverse local patterns, which is critical for IWFS under heterogeneous sparsity. We propose VIBMask, a unified IWFS framework based on the variational information bottleneck. VIBMask mitigates co-adaptation by penalizing mutual information between unselected features and the label, and improves expressivity via an ensemble of diverse selectors that capture heterogeneous sparse patterns. We further derive a novel variational lower bound for discrete masks, enabling efficient end-to-end training through reparameterization. Experiments on synthetic and real datasets show that VIBMask consistently outperforms state-of-the-art IWFS methods in both predictive accuracy and informative feature discovery.
    Machine LearningExplainable/Interpretable machine learningMachine LearningFeature extraction, selection and dimensionality reductionMachine LearningLearning sparse modelsMachine LearningVariational Inference
  295. #3157

    Multi-Agent Non-Discriminatory Contracts

    Ke Ding, Bo Li, Ankang Sun
    We study multi-agent contracts, in which a principal delegates a task to multiple agents and incentivizes them to exert effort. Prior research has mostly focused on maximizing the principal’s utility, often resulting in highly disparate payments among agents. Such disparities among agents may be undesirable in practice, for example, in standardized public contracting or worker cooperatives where fairness concerns are essential. Motivated by these considerations, our objective is to quantify the tradeoff between maximizing the principal's utility and equalizing payments among agents, which we call the price of non-discrimination. Our first result is an almost tight bound on the price of non-discrimination, which scales logarithmically with the number of agents. This bound can be improved to a constant by allowing some relaxation of the non-discrimination requirement. We then provide a comprehensive characterization of the tradeoff between the level of non-discrimination and the loss in the optimal utility.
    Game Theory and Economic ParadigmsAuctions and market-based systemsGame Theory and Economic ParadigmsComputational social choice
  296. #3160

    G-SalAlignMamba: Geometry-Aware Vision Mamba for Dual-Modal Salient Object Detection

    Haixiao Gao, Yimin Zheng, Mengke Song, Linyou Xiao, Tian-Tian Zhang, Zhi-Ri Tang
    Recently, Visual State Space Models offer powerful global modeling for Dual-modal Salient Object Detection (SOD). However, they are still constrained by three inherent limitations: first, Mamba's strict reliance on sequential ordering makes it sensitive to cross-modal geometric misalignment, where spatial shifts disrupt token correspondence; second, general indiscriminate scanning treating all tokens equally may lead to signal dilution, where sparse foreground features are overwhelmed by background noise; third, conventional decoders rely on implicit upsampling, causing boundary degradation during resolution recovery. To address these challenges, we propose G-SalAlignMamba, a geometry-aware framework tailored for dual-modal SOD. We introduce Geometry-Aware Encoding with explicit alignment to correct spatial shifts, Semantics-Informed Refinement to prevent signal dilution by prioritizing foregrounds, and Structure-Preserving Decoding that integrates explicit alignment with unsupervised boundary refinement. Extensive experiments show that G-SalAlignMamba achieves state-of-the-art performance on RGB-D and RGB-T benchmarks with favorable efficiency (30.41 FPS, 83.80M parameters). The code is available at https://github.com/PC1-99/G-SalAlignMamb.git.
    Computer VisionMultimodal learningComputer VisionScene analysis and understanding
  297. #3162

    Fairness and Stability in Allocation-Induced Hedonic Games

    Ayşe Mutlu Derya
    Understanding when fair allocation mechanisms lead to stable coalition structures is fundamental for designing robust multi-agent systems. We investigate the compatibility between stable allocations in transferable utility (TU) games and the stability of coalition structures in induced hedonic games, where agents' preferences over coalitions are derived from payoffs assigned by a fixed allocation rule applied to all subgames of the TU game. We analyze FX-FE strong Nash stability (SNS) in induced hedonic games, which captures a strong form of robustness under free-exit and free-entry conditions. We show that any efficient allocation rule ensuring core membership for the grand coalition induces a hedonic game that guarantees the existence of an FX-FE strong Nash stable partition. Examining the Shapley value, we further show that when it lies outside the core, the resulting hedonic game may or may not possess FX-FE strongly Nash stable partitions, highlighting a sensitive interaction between Shapley-based fairness and coalition-level stability. Our framework bridges SHAP methodology from explainable AI with hedonic coalition theory, providing theoretical foundations for understanding when fair allocation mechanisms shape coalition-level incentives in a way that ensures strategic stability in team formation and multi-agent systems.
    AI Ethics, Trust, FairnessExplainability and interpretabilityAI Ethics, Trust, FairnessSafety and robustnessAI Ethics, Trust, FairnessValuesGame Theory and Economic ParadigmsCooperative gamesGame Theory and Economic ParadigmsMechanism design
  298. #3193

    ClinAlign: Clinical Workflow Aligned Memory Retrieval for Radiology Report Generation

    Lihong Qiao, Shiyi Gao, Yucheng Shu, Bin Xiao, Weisheng Li
    Automated radiology report generation aims to create clear and clinically correct diagnostic reports from medical images. Existing retrieval enhancement methods primarily focus on reusing textual knowledge, neglecting the crucial role of local visual pattern memory in clinical diagnosis. Furthermore, cross-modal retrieval lacking explicit clinical semantic constraints can easily introduce irrelevant pathological information, thereby reducing the clinical effectiveness of the generated reports. To address these challenges, we propose ClinAlign, a memory-based retrieval framework aligned with the clinical diagnostic workflow. On the visual side, we construct a disease-aware visual memory bank and enhance local patch representations through the proposed Memory-based Patch Pattern Augmentation (MPPA), thereby improving the perception and discrimination of pathological regions. On the textual side, we construct a disease-aware textual memory bank and introduce Classification-Guided Prompt Augmentation (CGPA), where disease state predictions are converted into structured diagnostic prompts to provide explicit semantic guidance for textual memory retrieval. Extensive experiments on two medical report generation benchmarks, MIMIC-CXR and IU X-Ray, demonstrate the effectiveness and practical value of our proposed method.
    Computer VisionBiomedical image analysisComputer VisionMultimodal learning
  299. #3200

    Resisting Label Drift: Real-Time Multi-View Clustering with Semantic Consistency

    Qi Liu, Suyuan Liu, Hao Tan, Yangfan Du, Bowen Zhang, Wenpeng Lu, Xinwang Liu
    Real-time clustering of dynamic multi-view data streams is a critical yet challenging task in open-world applications. While several methods have been proposed to address this task, most of them extract features incrementally but fail to output instant clustering results for the current batch. In addition, they neglect semantic consistency, causing the cluster labels of identical concepts to drift unpredictably due to independent processing. To address these limitations, we propose a real-time, semantically consistent incremental multi-view clustering framework. Specifically, we construct a compact historical knowledge base via an adaptive diversity-aware selection mechanism, which guides the clustering of incoming data, enabling immediate inference without accessing the full history. Furthermore, we introduce a semantic alignment strategy based on consensus centers to ensure robust label consistency over time. Extensive experiments demonstrate the effectiveness and efficiency of our proposed method.
    Data MiningMining data streamsMachine LearningClusteringMachine LearningMulti-view learningMachine LearningOnline learning
  300. #3201

    Bounding Acceptability Degrees and Eliciting Initial Weights in Gradual Argumentation

    Nir Oren, Bruno Yun
    Many semantics for abstract weighted argumentation assume that each argument is associated with a numerical initial weight. Eliciting these initial weights poses several challenges: (1) accurately providing a specific numerical value is often difficult, and (2) individuals frequently confuse initial weights with acceptability degrees in the presence of other arguments. We therefore propose an elicitation pipeline that allows a user to specify their believed final acceptability degree intervals for each argument. We can determine which portion (if any) of these intervals is rational, refining the intervals, or restoring rationality when the intervals are irrational. This allows us to ultimately identify possible initial weights for each argument.
    Knowledge Representation and ReasoningArgumentation
  301. #3204

    Adaptive GoGI-Skip: Coupling Goal-Gradient Importance with Dynamic Uncertainty for Efficient Reasoning

    Ren Zhuang
    Chain-of-Thought (CoT) prompting trades inference speed for reasoning accuracy. Existing compressors force a compromise as static gradient techniques treat tokens independently, severing sequential logic, while uncertainty-based pruning ignores the final answer. We introduce Adaptive GoGI-Skip, a framework that resolves this tension by non-linearly coupling Goal-Gradient Importance (GoGI) with Adaptive Dynamic Skipping (ADS). GoGI quantifies each token's functional contribution to answer correctness via gradient sensitivity. ADS leverages runtime entropy to dynamically modulate the GoGI threshold, preserving low-gradient tokens essential for structural coherence at high-uncertainty junctions. Trained on 7,472 MATH traces, our policy transfers zero-shot to AIME, GPQA, and GSM8K, reducing token volume by >45% and accelerating inference up to 2.0x without accuracy loss. These results suggest that thinking-optimal compression demands synergy between teleological goals and epistemic uncertainty.
    Knowledge Representation and ReasoningLearning and reasoningMachine LearningOptimizationNatural Language ProcessingLanguage modelsNatural Language ProcessingLanguage generation
  302. #3219

    Bounded Fitting for Expressive Description Logics

    Maurice Funk, Jean Christoph Jung, Tom Voellmer
    Bounded fitting is an attractive paradigm for learning logical formulas from labeled data examples that offers PAC-style generalization guarantees and can often be implemented leveraging SAT solvers. It has been successfully applied to learning concepts of the description logic ALC. We study bounded fitting for learning concepts in expressive description logics that extend ALC with inverse roles, qualified number restrictions, and feature comparisons. We investigate under which conditions bounded fitting keeps its favorable theoretical properties in this setting, and implement it using a SAT solver. We compare our implementation against state-of-the-art concept learners with encouraging results, demonstrating that it is a practical approach to expressive concept learning.
    Knowledge Representation and ReasoningDescription logics and ontologiesKnowledge Representation and ReasoningLearning and reasoningMachine LearningLearning theoryMachine LearningSupervised Learning
  303. #3222

    Splitting Meanings: A Unified View on Paraconsistency and Inconsistency Measurement

    Yakoub Salhi
    We propose a modular logical framework for both reasoning with and measuring inconsistency in propositional knowledge bases. The framework extends propositional logic with markers attached to occurrences of atoms and interpreted as pointers to worlds. This allows different occurrences of the same atom to be evaluated in different contexts unless they share a marker. We define paraconsistent entailment relations via abnormality functions that associate each model with a set of deviations from a preferred behavior. Each entailment relation is then defined with respect to models that are minimal under set inclusion. We show that suitable markings and abnormality functions capture several existing forms of inconsistency-tolerant reasoning, including entailment based on maximal satisfiable subsets and the minimally inconsistent Logic of Paradox. We also show how a range of inconsistency measures can be expressed in the same setting. Thus our framework provides a uniform basis for diverse approaches to inconsistency handling and measurement.
    Knowledge Representation and ReasoningNon-monotonic reasoningKnowledge Representation and ReasoningReasoning about knowledge and beliefKnowledge Representation and ReasoningOther
  304. #3271

    Discrete Cosine Transform Based Decorrelated Attention for Vision Transformers

    Hongyi Pan, Emadeldeen Hamdan, Xin Zhu, Ahmet Enis Cetin, Ulas Bagci
    Self-attention is central to the success of Transformer architectures; however, learning the query, key, and value projections from random initialization remains challenging and computationally expensive. In this paper, we propose two complementary methods that leverage the Discrete Cosine Transform (DCT) to enhance the efficiency and performance of Vision Transformers. First, we address the initialization problem by introducing a simple yet effective DCT-based initialization strategy for self-attention, where projection weights are initialized using DCT coefficients. This structure-preserving approach consistently improves classification accuracy on the CIFAR-10 and ImageNet-1K benchmarks. Second, we propose a DCT-based attention compression technique that exploits the decorrelation properties of the frequency domain. By observing that high-frequency DCT coefficients typically correspond to noise, we truncate high-frequency components of the input patches, thereby reducing the dimensionality of the query, key, and value projections without sacrificing accuracy. Experiments on Swin Transformer models demonstrate that the proposed compression method achieves a substantial reduction in computational overhead while maintaining comparable performance. Code: https://github.com/NUBagciLab/DCT-Transformer.
    Machine LearningClassificationMachine LearningDeep learning architectures
  305. #3280

    Are White Knights Worth the Trouble? Reconciling POCL and Partial-Order Plans for Plan Optimization

    Harrison Oates, Pascal Bercher
    Partial order causal link (POCL) planning offers rich structural representations for plan optimization. However, under the classical threat definition, POCL plans form a strict subset of valid partial-order (PO) plans. This gap limits existing POCL optimization techniques to a restricted subspace of PO solutions. We revisit 'white knight' threat semantics, which permit a causal link to persist across a threat provided the deleted condition is re-established by an intermediate producer. We prove that under this definition, the POCL and PO solution spaces become equivalent. By extending a MaxSAT-based optimization framework, we demonstrate that white knights are theoretically necessary for completeness and practically advantageous: they yield plans with significantly fewer ordering constraints in domains with complex causal interference while maintaining computational tractability on standard benchmarks.
    Planning and SchedulingTheoretical foundations of planning
  306. #3282

    From Gridworlds to Warehouses: Adapting Lightweight One-shot Multi-Agent Pathfinding for AGVs

    Hiroki Nagai, Keisuke Okumura
    Multi-agent pathfinding (MAPF) under one-shot planning is a core component of warehouse automation, yet classical formulations typically assume four-connected 2D grids with unit-time moves in four directions. To fill reality gaps while still being tractable with discrete combinatorial search, this work proposes a more practical counterpart tailored to differential-drive AGVs. We term this multi-agent warehouse pathfinding (MAWPF), characterized by four constraints: (i) agent actions are restricted to straight motion and in-place rotation; (ii) rotations require multi-step costs; (iii) acceleration and deceleration are considered; and (iv) follower collisions are prohibited to prevent rear-end crashes. To solve MAWPF efficiently, we adapt representative suboptimal MAPF algorithms (PP, LNS2, PIBT, and LaCAM) and conduct comprehensive benchmarking. Our experiments reveal that PP and LNS2 struggle to solve instances with many agents, while PIBT-based approaches achieve preferable scalability with increased solution cost. We believe that these constitute an important step toward adapting classical gridworld MAPF to operational warehouse setups.
    Agent-based and Multi-agent SystemsMulti-agent planningRoboticsMotion and path planningSearchHeuristic search
  307. #3306

    mcGCN: Learning Continuous Graph Dynamics for Multi-Channel Fusion

    Na Song, Zihan Fang, Weidong Zhang, Zehua Jia, Shiping Wang
    Multi-channel data fusion is essential for capturing comprehensive representations in complex systems. While graph convolutional networks have demonstrated remarkable efficacy, existing fusion paradigms primarily rely on discrete architectures governed by first-order propagation. These models are typically confined to discrete message-passing mechanisms, which makes it difficult to characterize the continuous evolution of underlying system dynamics. To address these limitations, we propose a multi-channel Graph Continuous Network (mcGCN), a novel framework for multi-channel data fusion. By formulating the information propagation as a second-order partial differential equation on graphs, mcGCN transitions from discrete layer-wise updates to a continuous dynamical system. Specifically, mcGCN integrates a multi-channel encoding module with continuous feature dynamics to initialize and evolve the latent node representations over static graph topologies. This physical analogy enables more robust information flow and effectively mitigates the performance degradation typically associated with deep graph architectures. Extensive experiments on diverse benchmark datasets demonstrate that our method outperforms state-of-the-art baselines, validating its effectiveness and robustness.
    Data MiningMining graphsMachine LearningClassification
  308. #3310

    Safe Multi-Objective Linear Bandits with Hierarchical Preferences

    Bo Xue, Mengxia He, Yilu Liu, Ji Cheng, Zhe Zhao, Qingfu Zhang
    Multi-objective bandits with hierarchical preferences and safety constraints are central to many real-world decision-making tasks such as healthcare treatment planning and safe autonomous control, where multiple objectives must be optimized according to their priorities while ensuring safety requirements are satisfied. In this paper, we study a multi-objective stochastic linear bandit framework that incorporates hierarchical preferences together with safety constraints, requiring the learner to remain competitive with respect to a known baseline policy. We consider two practically motivated safety models: (i) cumulative constraints, which require the cumulative performance to exceed the baseline, and (ii) stage-wise constraints, which impose this requirement at each time step. We propose two algorithms, LexUCB-C and LexTS-S, designed for the cumulative and stage-wise settings, respectively. We establish regret bounds showing that both algorithms achieve performance comparable to existing single-objective safe linear bandit methods, while simultaneously optimizing multiple objectives. In addition to theoretical guarantees, we develop a carefully designed experimental framework that captures the interaction between hierarchical preferences and safety constraints. Experiments on synthetic and real-world datasets validate our theory and demonstrate the effectiveness of the proposed methods.
    Constraint Satisfaction and OptimizationConstraint optimization problemsAIUncertainty in AIMachine LearningLearning theory
  309. #3314

    Unsupervised Graph-Level Anomaly Detection via Multi-granular Graph Structure Learning

    Ge Zhang, Huimei Li, Guohao Sun, Xiu Fang, Xixun Lin, Xiaobao Wang, Pengfei Jiao, Liang Yang
    Graph-level anomaly detection (GLAD) aims to identify graphs that deviate from the majority in a dataset of graphs. Existing methods typically adopt either a global aggregation perspective that summarizes nodes within a graph into a representation vector, or a subgraph-oriented perspective which regards certain local subgraphs as indicators of anomalous properties. However, both paradigms operate from a fixed perspective and may fail to identify anomalous graphs whose discriminative characteristics manifest at multiple levels of structure granularity, where each level features the coarsened graphs at a specific granularity. In this paper, we propose M-GLAD, an unsupervised GLAD method via multi-granular graph structure learning. M-GLAD is grounded in the Prototype-guided Multi-granular Information Bottleneck (PMIB) principle. PMIB aims to measure the mutual information between coarsened graphs at the specific level of structure granularity and the learnable granularity-specific prototypes that summarize normal patterns of graphs at each granularity. The multiple levels of structure granularity of graphs and prototypes are abstracted by a graph structure abstraction module. This formulation enables granularity-aware anomaly scoring that considers anomalous characteristics across different levels of structure granularity. Extensive experiments on eight real-world graph datasets demonstrate that M-GLAD achieves superior performance over competitive baselines.
    Data MiningAnomaly/outlier detectionData MiningMining graphs
  310. #3323

    GRASP: Enhancing Zero-Shot Multi-Modal Anomaly Detection via Geometric Refinement and Semantic Prompting

    Zhongbin Sun, Yuze Cui, Yong Zhou
    Zero-shot multimodal anomaly detection is critical for identifying structural defects that are often invisible to traditional RGB images, particularly in scenarios lacking target domain samples. However, existing methodologies face two significant impediments: the prohibitive computational overhead caused by relying on multi-view rendering for depth processing, and the inability of static text prompts to adapt to fine-grained local anomalies. To address these challenges, a unified zero-shot multimodal anomaly detection framework GRASP is proposed. Firstly, a Frequency Domain Enhancement module is introduced to replace costly rendering with spectral transformations, directly synthesizing high-fidelity depth images for efficient utilization. Secondly, a Prompt-Conditioned Variational module is designed to bridge the semantic gap by grounding global textual descriptions into local visual nuances. Finally, a Dual Cross-Injection Alignment module is proposed to enable robust feature fusion for enhancing anomaly classification performance, while a Pyramid Anomaly Map Recalibration module further refines anomaly localization across multiple scales. Extensive experiments on MVTec 3D-AD and Eyecandies demonstrate that GRASP establishes a new state-of-the-art, yielding substantial improvements of 3.0 points in I-AUROC and 2.7 points in AUPRO compared to the SOTA method.
    Computer VisionMultimodal learningComputer VisionRecognition (object detection, categorization)Computer VisionTransfer, low-shot, semi- and un- supervised learning
  311. #3336

    FedUP: One-Shot Federated Unlearning via Centroid-Guided Plug-in Filters

    Feihong Nan, Zhengyi Zhong, Pan Wang, Weidong Bao, Xiongtao Zhang, Quan Wen, Ji Wang
    Federated unlearning (FU) is critical for complying with legal mandates like the right to be forgotten in decentralized systems, yet current methods face a persistent dilemma between non-target knowledge loss and high request latency. To resolve these issues, we propose FedUP, a one-shot federated unlearning framework utilizing lightweight pluggable filters that act as a "knowledge funnel" to screen out target data while preserving original model performance. By freezing original model parameters and training filters at the server side using differentially private (DP)-protected class centroid samples, FedUP bypasses the need for multi-round client-server communication and complex retraining, reducing unlearning latency from minutes to mere seconds. Additionally, the framework's pluggable architecture ensures inherent reversibility, enabling the seamless restoration of forgotten knowledge by simply removing the filters. Extensive experiments on diverse image and text tasks demonstrate that FedUP effectively reduces non-target knowledge loss and achieves superior unlearning precision and efficiency across various scenarios. Code is available at: https://github.com/suows/FedUP-code.
    Machine LearningFederated learning
  312. #3349

    Inversely Learning Transferable Rewards via Abstracted States

    Yikang Gui, Prashant Doshi
    Inverse reinforcement learning (IRL) has made significant progress in recovering reward functions from expert demonstrations. However, a key challenge remains: how to extract reward functions that generalize across related but distinct tasks. In this paper, we address this by focusing on transferable IRL, learning intrinsic rewards that can drive effective behavior in unseen but structurally aligned environments. Our method leverages a variational autoencoder to learn an abstract representation of the state space shared across multiple source tasks. This abstracted space captures high-level features that are invariant across tasks, enabling the learning of a unified abstract reward function. The learned reward is then used to train policies in a separate, previously unseen target task without requiring new demonstrations in the target task. We evaluate our approach on multiple environments from Gymnasium and AssistiveGym, demonstrating that the learned abstract rewards consistently support successful policy learning in novel task settings.
    Machine LearningReinforcement learningRoboticsLearning in robotics
  313. #3357

    Two-Stage Fine-Grained Trajectory Generation Constrained by Road Networks

    Zewu Lv, Zipei Fan, Zhiwen Zhang, Xuan Song
    Trajectory generation is a pivotal technique for mitigating data sparsity, but existing methods struggle to simultaneously achieve strict road network alignment and capture realistic movement characteristics. To bridge this gap, we propose RNTrajGen, a two-stage fine-grained trajectory generation framework constrained by road networks. Specifically, we first develop a Road Network Knowledge-Enhanced Encoder (RNKEE) to provide semantically rich representations for trajectory generation. By leveraging graph attention networks, RNKEE encodes static prior knowledge of the road network while integrating self-supervised learning to extract latent movement patterns from real-world trajectories. Subsequently, RNTrajGen follows a two-stage generation strategy: it first generates a sequence of road segments to ensure strict topological alignment, and then infers the moving ratios of trajectory points along road segments to capture fine-grained movement characteristics. Extensive experiments on two real-world datasets demonstrate that RNTrajGen significantly outperforms state-of-the-art baselines, and the generated trajectories exhibit high utility in downstream prediction tasks. Our code is available at https://github.com/jkzh986/RNTrajGen.
    Data MiningMining spatial and/or temporal dataMachine LearningRepresentation learningMachine LearningSequence and graph learning
  314. #3360

    Reducing Bias and Variance: Generative Semantic Guidance and Bi-Layer Ensemble for Image Clustering

    Feijiang Li, Zhenxiong Li, Jieting Wang, Zizheng Jiu, Saixiong Liu, Liang Du
    Image clustering aims to partition unlabeled image datasets into distinct groups. A core aspect of this task is constructing and leveraging prior knowledge to guide the clustering process. Recent approaches introduce semantic descriptions as prior information, most of which rely on matching-based techniques with predefined vocabularies. However, the limited matching space restricts their adaptability to downstream clustering tasks. Moreover, these methods primarily focus on reducing bias to improve performance, frequently overlooking the importance of variance reduction. To address these limitations, we propose GSEC (Image Clustering based on Generative Semantic Guidance and Bi-Layer Ensemble), a framework designed to reduce bias through generative semantic guidance and mitigate variance via ensemble learning. Our method employs Multimodal Large Language Models to generate semantic descriptions and derive image embeddings via weighted averaging. Additionally, a bi-layer ensemble strategy integrates cross-modal information through BatchEnsemble in the inner layer and aligns outputs via an alignment mechanism in the outer layer. Comparative experiments demonstrate that GSEC outperforms 20 state-of-the-art methods across six benchmark datasets, while further analysis confirms its effectiveness in simultaneously reducing both bias and variance.
    Machine LearningClusteringMachine LearningUnsupervised learning
  315. #3363

    Relation-Aware Graph Learning with Mixture-of-Experts Prediction for Cognitive Diagnosis

    Jingwei Qu, Mingze Zhang, Pingshun Zhang, Li Tao, Ying Wang, Zhaofang Yang, Haibin Ling
    Cognitive diagnosis aims to infer students’ concept-level mastery from exercise response logs and exercise-concept associations. Fully leveraging heterogeneous relations and modeling large mastery-difficulty variations remain challenging, especially with a single predictor. To address these challenges, we propose RMCD, a unified cognitive diagnosis model that integrates relation-aware graph learning with Mixture-of-Experts (MoE) prediction. RMCD constructs a heterogeneous relational graph over students, exercises, and concepts with multiple relation types, and learns node and edge representations simultaneously. It derives relation-strength vectors from student-concept and exercise-concept edges to distinguish relation effects and refine node representations. RMCD further introduces an MoE-based prediction head that adaptively combines multiple expert predictors to capture diverse mastery-difficulty discrepancies. Experiments on benchmark datasets demonstrate that RMCD consistently outperforms state-of-the-art cognitive diagnosis methods. Our algorithm is available at https://github.com/swu-qjw-lab/code/tree/main/RMCD.
    Data MiningMining graphsMultidisciplinary Topics and ApplicationsEducation
  316. #3378

    Detection-Explanation-Improvement: A Closed-Loop Framework of Enhancing Anomaly Detection with Counterfactual Explanations

    Peng Zhou, Zhiyong Huang, Yuanting Yan
    Many state-of-the-art anomaly detection models operate as black boxes, limiting interpretability and hindering reliable deployment. While recent advances in explainable artificial intelligence have focused on explaining why individual instances are detected as anomalous, comparatively little attention has been paid to how such explanations can be systematically exploited to improve the detectors themselves. To address this gap, we propose EAD-CE (Enhancing Anomaly Detection with Counterfactual Explanations), a model-agnostic, closed-loop framework that tightly integrates detection, explanation, and improvement. Specifically, given a trained anomaly detector and its detected anomalies, EAD-CE generates minimal and semantically meaningful counterfactual explanations that reveal how targeted feature perturbations influence anomaly scores or decisions. Feature importance inferred from these counterfactual explanations is then used to guide a dynamic feature-weight optimization process, enabling detector refinement without modifying its underlying architecture. Extensive experiments on nine real-world datasets and three anomaly detection models demonstrate that EAD-CE accurately identifies anomaly-driving features, substantially enhances interpretability, and consistently improves detection performance (average AUC increases by 6.9%, up to 23%). Implementation details are provided to support reproducibility.
    Data MiningAnomaly/outlier detectionMachine LearningExplainable/Interpretable machine learning
  317. #3402

    Unlocking More Granular Control of Memory-Efficient LLM Finetuning

    Yezhen Wang, Zhouhao Yang, Fanyi Pu, Kenji Kawaguchi
    Low-rank gradient projection (LoRP) has recently emerged as a memory-efficient alternative to low-rank adapters (LoRA) for finetuning large language models. Existing LoRP methods, however, implicitly fix the projection unit to a single gradient row, leaving the effect of grouping multiple rows (or subdividing a row) largely unexplored. In this work, we systematically investigate the impact of the projection unit on LoRP methods. Specifically, we extend existing LoRP approaches by introducing an additional degree of freedom, projection granularity, beyond the traditional rank hyperparameter. This enables a framework capable of performing Various-grained Low-Rank Projection of gradients, which we term VLoRP. Using VLoRP, we observe that, under an identical memory budget, fine-grained projections consistently deliver superior performance. Moreover, VLoRP requires no extra computation and minimal code changes, effectively providing a no-cost accuracy boost to LoRP. Finally, we provide convergence analysis on VLoRP with either SGD or an Adam-based memory-efficient optimizer, and extensive experiments are conducted to validate our findings, covering tasks such as Commonsense Reasoning, MMLU, and GSM8K.
    Machine LearningMatrix/tensor methodsMachine LearningOptimizationMachine LearningOther
  318. #3419

    Obstacle Avoidance and Trajectory Tracking of Redundant Manipulators with Unknown Physical Parameters Under Multiple Constraints

    Lei Jia, Hui Deng
    Redundant manipulators are broadly used in safety tasks that require precise execution. However, practical factors such as assembly defects and mechanical wear inevitably introduce uncertainties in their physical parameters. Due to insufficient excitation, existing methods fail to achieve high-accuracy physical parameter identification, which seriously affects the accurate operation of multiple safety tasks. Considering that quadratic programming (QP) can integrate desired behaviors with constraints into a unified optimization framework and can be solved online in real time, this paper formulates the collision-free trajectory tracking of redundant manipulators with uncertain structure as a QP problem, which describes the trajectory tracking, obstacle avoidance and joint motion limits simultaneously. To tackle the above issues, we first develop an online physical parameter identification method called regularization and zeroing neural dynamics (RZND) that uses the state information of the manipulator and realizes highly accurate physical parameter identification and real-time Jacobian matrix updates. On this basis, a multi-task quadruple projection neural network (MT-QPNN) solver is proposed that applies projection operators to handle the joint constraints, thereby achieving the trajectory tracking and obstacle avoidance tasks of the redundant manipulator. Experiments verify that the presented RZND method and MT-QPNN solver can accurately identify physical parameters and enable collision-free tracking with superior performance.
    RoboticsApplicationsRoboticsBehavior and controlRoboticsMotion and path planning
  319. #3432

    CoDG-Net: Structure-Guided Style Diffusion and Collaborative Learning to Mitigate Catastrophic Forgetting in Medical Image Domain Generalization

    Yucheng Song, Jincan Wang, Haokang Ding, Zhiqiang Tian, Kangxu Fan, Zhifang Liao
    Domain Generalization (DG) for medical image segmentation is both highly challenging and critically important. However, existing medical DG methods largely overlook the issue of Catastrophic Forgetting (CF): Models often sacrifice their ability to retain source-domain knowledge while pursuing cross-domain robustness. This can directly threaten diagnostic safety in already-deployed clinical scenarios. To address this, we investigate data augmentation strategies and catastrophic forgetting for medical image DG segmentation. First, we propose a structure-guided style diffusion augmentation method. Constrained by anatomical structure consistency in the frequency domain, this method performs cross-domain diffusion on the amplitude spectrum, generating samples with more diverse and broader style coverage to better support domain generalization. Then, we design a collaborative learning network with a dual-branch interactive architecture (CoDG-Net), together with a novel learning bias-guided strategy that adaptively regulates knowledge transfer at both the layer level and the task level, thereby effectively mitigating catastrophic forgetting on the source domain. Experiments and ablation studies on single-source and multi-source medical DG benchmark datasets demonstrate that CoDG-Net not only outperforms existing state-of-the-art methods in target-domain segmentation performance, but also achieves a lower forgetting rate on the source-domain data. The code is available at: https://github.com/wangprocess/CoDG-Net.
    Computer VisionBiomedical image analysisComputer VisionSegmentation, grouping and shape analysisComputer VisionTransfer, low-shot, semi- and un- supervised learning
  320. #3437

    Graph Label Denoising via Neighborhood Agreement–Guided Expectation Maximization

    Dezhi Liu, Richong Zhang, Junfan Chen, Fengbo Tian, Si Chen
    Graph Neural Networks are susceptible to label noise, in which message-passing mechanisms serve as conduits for propagating erroneous supervision. Current mitigation techniques typically recover clean labels via heuristics that lack theoretical grounding, which often leads to ineffective denoising. To tackle the issue, we propose NAEM, a latent label estimation framework that models clean labels as latent variables by integrating neighborhood agreement into the Expectation-Maximization (EM) paradigm. To overcome the posterior collapse problem in standard EM under severe noise, we design a structure-aware E-step that leverages neighborhood agreement as a structural prior. This mechanism acts as a dynamic confidence gate based on local consensus to prevent the model from overfitting to noise. Simultaneously, by explicitly modeling the noise transition matrix in the M-step, NAEM decouples noise dynamics from semantic representation learning. Extensive experiments on five benchmark datasets demonstrate that NAEM consistently outperforms SOTA methods under varying noise conditions, validating the effectiveness of our framework.
    Data MiningMining graphsMachine LearningWeakly supervised learning
  321. #3447

    DaV-Gen: End-to-End Generative Retrieval via Draft-and-Verify

    Meng Zhao, Chunmei Liu, Qinyong Wang
    Mainstream industrial information retrieval systems (e.g., search and recommendation) are usually built upon Multi-Stage Cascade Architectures (MCAs), which balance effectiveness and efficiency through a coarse-to-fine "retrieval-ranking" pipeline. However, the optimization objectives across different stages are substantially inconsistent, propagating or even amplifying the early-stage errors that ultimately degrade the quality of final results. While emerging end-to-end generative models offer a potential solution by unifying the pipeline, their online serving performance is severely hindered by the auto-regressive process inherited from the standard decoder-only structure.

    To bridge this gap, we introduce DaV-Gen, a novel unified solution designed to fundamentally refactor the paradigm for both search and recommendation via a "Draft-and-Verify" mechanism. Inspired by the process used by speculative decoding, our framework redesigns the generation task into two synergistic operations within a single model. During training, the model is concurrently optimized for both candidate drafting and fine-grained verification. This is achieved by a composite loss function that jointly trains the model on two distinct but related objectives: 1) a contrastive loss that structures the embedding space for efficient drafting, and 2) a fusion loss that combines generative likelihood with vector similarity to produce a superior verification score. This integrated training strategy equips the model with dual capabilities. At inference time, it first performs highly efficient vector-based drafting to generate a candidate set, and then verifies these candidates using the more powerful fused scoring function, thereby achieving both the speed of sparse drafting and the precision of advanced generative models within a unified, end-to-end architecture.
    Data MiningCollaborative filteringData MiningRecommender systemsNatural Language ProcessingLanguage generation
  322. #3453

    BEAT2AASIST: BEATs Feature Splitting with Dual-Branch AASIST for Environmental Sound Deepfake Detection

    Sanghyeok Chung, Seungsang Oh, Donggun Kim, Jeongbin You, Il-Youp Kwak, Gaeun Heo, Eujin Kim, Nahyun Lee, Sunmook Choi, Soyul Han
    Recent advances in text-to-audio (TTA) and audio-to-audio (ATA) generation models have enabled the creation of highly realistic environmental sounds, raising growing concerns about malicious audio manipulation in real-world scenarios. To address this emerging threat, the ESDD 2026 Challenge was introduced as the first large-scale benchmark for Environmental Sound Deepfake Detection (ESDD), featuring two tracks that evaluate generalization to unseen generators and robustness under black-box, low-resource conditions. In this paper, we present BEAT2AASIST, an enhanced deepfake detection framework built upon the BEATs-AASIST baseline. Motivated by the observation that token-based audio representations may weaken explicit preservation of structured acoustic cues, the proposed method introduces a dual-branch AASIST architecture that explicitly splits BEATs-derived representations along frequency or channel dimensions. This design enables specialized modeling of complementary spoofing artifacts that may be attenuated in unified representations. To further enrich acoustic features, we incorporate multi-layer fusion strategies that aggregate information from multiple transformer layers using concatenation, CNN-gated, and SE-gated mechanisms. In addition, vocoder-based data augmentation with multiple high-fidelity neural vocoders is employed to enhance robustness against unseen and black-box spoofing attacks. Experimental results on the EnvSDD dataset demonstrate that BEAT2AASIST achieves strong and consistent performance across both challenge tracks. In particular, the proposed approach attains 3rd place in Track 2 and 4th place in Track 1 in the ESDD 2026 Challenge, despite using fewer ensemble components than top-ranked systems. 
    These results suggest that explicit modeling of heterogeneous acoustic subspaces, combined with targeted representation fusion and data augmentation, provides an effective and efficient design strategy for real-world environmental sound deepfake detection. The code is available at https://github.com/ikwak2/BEAT2AASIST.
    Machine LearningClassificationMachine LearningFeature extraction, selection and dimensionality reductionMachine LearningTime series and data streamsMachine LearningDeep learning architectures
  323. #3466

    G2C-MT: Graph-Guided Context Selection for Document-Level Machine Translation

    Baijun Ji, Zixuan Zhou, Xiangyu Duan, Yu Liu, Longbo Sun, Rupu Wei, Bohong Zhao
    Effective document-level machine translation (DocMT) requires capturing long-range discourse dependencies. Recent work has explored retrieval-based and discourse-aware context selection. However, these approaches often lack an explicit mechanism for modeling structured discourse dependencies between distant paragraphs in a document. In this paper, we propose G²C-MT (Graph-Guided Context for Machine Translation), which views DocMT context selection as a structured path discovery problem on a lightweight discourse graph, rather than retrieving unstructured context sets or relying on expensive LLM-based discourse modeling. In detail, we represent each paragraph as a node and model the relationship between each pair of nodes, considering their semantic similarity, adjacency, and keyword overlap. Furthermore, we propose a depth-biased random walk over the graph to sample a backward context path for each target paragraph. The context path is then used to prompt a large language model (LLM) for translation. This framework naturally supports multi-path context sampling, which can improve robustness by aggregating diverse translation candidates for discourse-ambiguous inputs. Experiments conducted across various domains show that G²C-MT outperforms strong baselines on multiple LLMs, including DeepSeek-V3, Gemini-2.5-Flash-lite, and the Qwen-2.5/3 series.
    Natural Language ProcessingMachine translation and multilinguality
  324. #3473

    INSHAPE: Instance-Level Shapelets for Interpretable Time-Series Classification

    Seongjun Lee, Seokhyun Lee, Changhee Lee
    Discovering shapelets, i.e., discriminative temporal patterns within time series, has been widely studied to address the inherent complexity of time-series classification (TSC) and to make model decision-making processes more transparent. However, existing methods primarily focus on population-level shapelets optimized across the entire dataset, which leads to two fundamental limitations: (i) population-level patterns often misalign with instance-specific features, resulting in suboptimal performance and potentially misleading interpretations, and (ii) most methods treat shapelets as independent entities, overlooking important temporal dependencies and interactions among multiple patterns. To address these limitations, we propose INSHAPE, an interpretable TSC framework that discovers variable-length, discriminative temporal patterns specific to each time series. INSHAPE identifies these patterns as non-overlapping segments and models their temporal dependencies, thereby providing clear instance-level interpretations while achieving strong predictive performance. Furthermore, INSHAPE bridges local and global interpretability through a bottom-up approach, aggregating instance-level shapelets into prototypical (population-level) shapelets. Extensive experiments on 128 UCR and 30 UEA benchmark datasets show that INSHAPE consistently outperforms state-of-the-art shapelet-based methods while providing more intuitive and interpretable insights.
    Machine LearningClassificationMachine LearningExplainable/Interpretable machine learningMachine LearningFeature extraction, selection and dimensionality reductionMachine LearningTime series and data streams
  325. #3480

    ASTPKEFormer: Adaptive Spatiotemporal Prior Knowledge Embedding-Induced Transformers for Traffic Data Forecasting

    Wenfeng Zhou, Xiaoyun Xia, Xiangjie Kong, Guojiang Shen, Bin Chen, Fei Wu, Binbin Guo
    Traffic forecasting is fundamentally challenging due to the complex and dynamic spatiotemporal dependencies inherent in road networks. Although existing prediction models achieve reasonable results on this task, Transformer-based models usually rely on simple embedding strategies and do not fully utilize the prior knowledge embedded in traffic patterns and network topology. To address these limitations, we propose ASTPKEformer, a prior knowledge-guided Transformer framework for traffic prediction. Built on a pure Transformer spatiotemporal self-attention mechanism, the model first uses a multi-scale temporal channel alignment module to generate discriminative feature embeddings. Then, it incorporates temporal prior embeddings and spatial graph structure prior embeddings to provide learning guidance for the model in both temporal and spatial dimensions. To further enhance the representation capability, an embedding cross-fusion mechanism is introduced to strengthen the interaction between the prior embeddings and the adaptive spatiotemporal embedding. Extensive experiments on six real-world traffic datasets demonstrate that ASTPKEformer consistently outperforms state-of-the-art (SOTA) baselines, validating its effectiveness and strong generalization ability.
    Data MiningMining spatial and/or temporal dataMachine LearningRepresentation learning
  326. #3488

    IdentityMask: A Robust Face-Centric Privacy Protection Against Unauthorized Personalization of Diffusion Models

    Weiwei Tan, Rui Wang, Lihua Jing, Yanjun Zhang, Runbo Li, Leo Yu Zhang
    Unauthorized personalization based on diffusion models poses a severe and growing threat to digital privacy by enabling the unauthorized replication and exploitation of individual identities. Existing disruption-based defenses primarily add invisible perturbations arbitrarily across the entire image space to disrupt the generation process. However, we reveal that these methods fundamentally overlook the spatio-temporal dynamics of the personalization process, resulting in inefficient optimization that fails to sufficiently disrupt the core identity encoding mechanism. To mitigate these limitations, we propose IdentityMask, a robust protection framework that shifts the paradigm from arbitrary confusion to precise, targeted feature corruption. By anchoring the perturbation on subject-specific semantics and prioritizing the most critical diffusion timesteps, our framework ensures the disruption is maximized precisely where the identity is encoded. Additionally, a novel manifold projection strategy is introduced to embed the adversarial signals into the intrinsic structure of the image, rendering the protection resilient against state-of-the-art purification. Extensive experiments across diverse datasets, personalization techniques, and defense settings demonstrate that IdentityMask consistently outperforms prior state-of-the-art approaches in both protection efficacy and robustness.
    Computer VisionAdversarial learning, adversarial attack and defense methods
  327. #3497

    Efficient Optimization of Fixed-Length Paths

    Martino Ciaperoni, Nikolaos Tziavelis, Panagiotis Karras
    Optimization problems such as Viterbi decoding and V-optimal histogram construction seek a path of exact length L through a state space that minimizes a cost function. These problems are traditionally solved using dynamic programming (DP). A best-first-search (BFS) solution is also applicable, yet requires maintaining a priority queue. In all cases, memory usage grows linearly with both state space size and path length. In this paper, we propose CompactBFS, a framework that limits the growth of the BFS priority queue to space-efficiently determine the exact optimal cost for a fixed-length path and then constructs such a path by a divide-and-conquer strategy that eliminates the memory overhead. We apply CompactBFS to Viterbi decoding, which remains relevant to speech recognition, and V-optimal histogram construction. Our experimental results demonstrate significant gains over state-of-the-art solutions in runtime and memory consumption.
    Constraint Satisfaction and OptimizationConstraint optimization problemsData MiningBig data and scalabilityData MiningMining graphsNatural Language ProcessingSpeechSearchCombinatorial search and optimisation
  328. #3512

    CHoE: Cross-Domain Heterogeneous Graph Prompt Learning via Structure-Conditioned Experts

    Peiyuan Li, Yongqi Huang, Jitao Zhao, Dongxiao He, Di Jin, Weixiong Zhang
    Heterogeneous Graph Prompt Learning (HGPL) has emerged as a promising paradigm for bridging the gap between the objectives of pre-training foundation models and their downstream applications in heterogeneous graph settings. However, existing HGPL methods are primarily designed for in-domain scenarios, whereas real-world deployments often span multiple domains, and the data used for pre-training and downstream tasks may originate from different distributions. Consequently, the applicability of current HGPL approaches is limited to in-domain settings, and their performance typically degrades when application domains shift. To address this serious limitation, we develop CHoE, a cross-domain HGPL method built upon an expert network. During pre-training, we introduce and train structure-conditioned experts, and during prompt tuning, we propose a structure-aware expert routing and load balancing mechanism to select structurally compatible experts for each meta-path view. In addition, we design a prompt-based semantic fusion module to integrate representations across views for downstream prediction. Extensive experiments show that CHoE consistently improves performance in few-shot cross-domain applications, outperforming all baseline approaches.
    Data MiningMining graphs
  329. #3522

    DyG-Seg: Unsupervised 3D Point Cloud Segmentation via Geometric Manifold Rectification

    Benyu Wu, Kun Zhou, Xulun Ye
    Unsupervised 3D semantic segmentation is vital for label-free open-world perception. However, current methods typically struggle with two core limitations: fixed category assumptions that restrict adaptability in complex scenes, and geometric incompleteness caused by irregular point sampling, which degrades the integrity of local geometric descriptors for fine-grained or long-tail objects. To address these, we propose DyG-Seg, a Dynamic Geometric-Semantic Alignment framework. First, targeting the challenge of incomplete geometry, the Geometric Manifold Rectification (GMR) leverages a diffusion model to reconstruct integral object geometry and recover the underlying manifold topology, thereby enhancing feature discriminability. Utilizing this geometric prior, the Dynamic Prototype Contrastive Clustering (DPCC) applies non-parametric Bayesian inference to automatically estimate the optimal number of clusters and generate pseudo-labels. Finally, our Iterative Optimization with Dynamic Constraints integrates Dynamic Class Balancing (DCB) to mitigate long-tail bias. Capitalizing on the recovered manifold topology, we enforce Spatial Geometric Consistency to rectify structural incompleteness and ensure spatially coherent semantic predictions. Extensive experiments on ScanNet and S3DIS demonstrate that DyG-Seg discovers latent semantic structures and significantly outperforms state-of-the-art unsupervised methods.
    Computer VisionSegmentation, grouping and shape analysis
  330. #3523

    TPPMG: Temporal Planning-driven Progressive Motion Generation

    Ruoyu Wang, Can Deng, Xinyi Li, Zhuo Li
    Most text-to-motion models adopt a fixed-length sampling paradigm, treating time as a hyperparameter rather than a decision variable inferred from semantics. This leads to a systematic temporal mismatch between text and generated motion: short prompts suffer from duration trailing, while long prompts exhibit stage-wise semantic collapse.

    To address this issue, we propose TPPMG (Temporal Planning-driven Progressive Motion Generation), a temporally planned progressive framework that follows a pipeline from semantics to temporal structure to motion realization. TPPMG has two key components: (1) a Temporal Planner, which uses an LLM with a duration calibrator to infer stage boundaries and durations from text, yielding an explicit semantics-to-temporal structure mapping; and (2) a Progressive Diffusion Generator, which introduces heterogeneous noise scheduling within fixed windows and a sliding-window mechanism at inference to resolve the mismatch between fixed-length diffusion training and variable-length sampling, thereby generating variable-length motion sequences that match the planned temporal structure.

    We further build three benchmarks: HumanML3D-Short for short texts, and two multi-stage benchmarks HumanML3D-Concat and BABEL-Concat, along with a comprehensive evaluation suite. Experiments show that TPPMG effectively suppresses duration trailing for short texts and simultaneously improves stage-structure accuracy, segment-level semantic consistency, and boundary continuity for long texts.
    Computer Vision3D computer visionComputer VisionAction and behavior recognitionComputer VisionMotion and tracking
  331. #3528

    BehaviorGuard: Online Backdoor Defense for Deep Reinforcement Learning

    Yinbo Yu, Xueyu Yin, Jiadai Wang, Chunwei Tian, Sai Xu, Qi Zhu, Daoqiang Zhang
    Backdoor attacks pose a serious threat to deep reinforcement learning (DRL). Current defenses typically rely on reward anomalies to reverse-engineer triggers and on model fine-tuning to remove backdoors. However, complex trigger patterns undermine their robustness, and fine-tuning entails high costs, limiting practical utility. To this end, we shift the defense focus to trigger-agnostic backdoor output behaviors and propose BehaviorGuard, an online behavior-based backdoor detection and mitigation framework for DRL. Specifically, we find that regardless of the attack, backdoored policies induce consistent shifts in action distributions to ensure reliable activation, leaving detectable traces in high-quantile regions and distribution tails, even in the absence of triggers. Based on this, we design a novel metric that captures behavioral drift in action distributions to identify and suppress backdoor actions at runtime. To our knowledge, this is the first online backdoor defense that counters attacks in both single- and multi-agent DRL. Evaluated across diverse benchmarks with different backdoor attacks, BehaviorGuard consistently surpasses prior methods in both efficacy and efficiency.
    Machine LearningMultiagent Reinforcement LearningMachine LearningReinforcement learningMultidisciplinary Topics and ApplicationsSecurity and privacy
  332. #3537

    Spatially Generalizable Mobile Manipulation via Adaptive Experience Selection and Dynamic Imagination

    Ping Zhong, Liangbai Liu, Bolei Chen, Tao Wu, Jiazhi Xia, Chaoxu Mu, Jianxin Wang
    Mobile Manipulation (MM) involves long-horizon decision-making over multi-stage compositions of heterogeneous skills, such as navigation and picking up objects. Despite recent progress, existing MM methods still face two key limitations: (i) low sample efficiency, due to ineffective use of redundant data generated during long-term MM interactions; and (ii) poor spatial generalization, as policies trained on specific tasks struggle to transfer to new spatial layouts without additional training. In this paper, we address these challenges through Adaptive Experience Selection (AES) and model-based dynamic imagination. In particular, AES makes MM agents pay more attention to critical experience fragments in long trajectories that affect task success, improving skill chain learning and mitigating skill forgetting. Based on AES, a Recurrent State-Space Model (RSSM) is introduced for Model-Predictive Forward Planning (MPFP) by capturing the coupled dynamics between the mobile base and the manipulator and imagining the dynamics of future manipulations. RSSM-based MPFP can reinforce MM skill learning on the current task while enabling effective generalization to new spatial layouts. Comparative studies across different experimental configurations demonstrate that our method significantly outperforms existing MM policies. Real-world experiments further validate the feasibility and practicality of our method. The source code is available at https://csu-hero-lab.github.io/SG-MM Web.
    RoboticsBehavior and controlRoboticsLearning in roboticsRoboticsMotion and path planning
  333. #3538

    Attention as Selection: Semantic-Guided Time Series Forecasting

    Xueyu Luo, Qiang Lu, Sangui Jian, Yangxue Hu, Zhengyu Ying, Ye Yu, Wenxing Lu, Yan Qiao
    Recent advances in time series forecasting (TSF) leverage large language models (LLMs) to provide semantic priors, enabling more robust forecasting under limited training data. However, directly fusing semantic signals into temporal features often induces modality entanglement and obscures local temporal structures when cross-modal correlations are weak, noisy, or inconsistent. To address these issues, we propose Attention as Selection (AAS), a dual-branch framework that decouples temporal and semantic representations. Specifically, we define cross-modal attention as a selection process, where semantic prompts are employed to induce a sparse temporal attribution distribution over temporal positions. This guides the model to focus on critical time steps without interfering with the construction of temporal representations. Furthermore, low-entropy regularization is employed alongside global cross-modal consistency constraints to regulate the selection behavior, ensuring that semantic guidance remains sparse, stable, and aligned with temporal dynamics. Extensive experiments on six real-world datasets demonstrate that AAS outperforms existing methods across various forecasting scenarios. Code is available at https://github.com/VIMLab-hfut/Attention-as-Selection.
    Machine LearningMulti-modal learningMachine LearningTime series and data streams
  334. #3553

    SpeciFuse: Learning Degradation-Type Specificity for Robust Infrared and Visible Image Fusion Under Composite Degradations

    Xuan Li, Zhaoming Feng, Xiang Yuan, Huabing Zhou, Jiayi Ma
    Existing degradation-resistant infrared-visible image fusion methods struggle to effectively handle composite degradations, where multiple degradation types exhibit intricate coupling and mutual interference. To address this challenge, we propose SpeciFuse, an infrared-visible image fusion network that learns degradation-specific representations. By explicitly modeling the specificity of individual degradations through a siamese architecture trained on single-degradation data, our method effectively addresses the conflicts arising from multiple degradations, thereby achieving robust fusion performance that generalizes to diverse composite degradation scenarios. In SpeciFuse, a degradation-type specificity decoupling module is designed to disentangle the feature representations by attenuating their redundant correlations in a latent space. It minimizes off-diagonal elements to enforce independence among the encodings of distinct degradation types, while preserving dominant diagonal elements to maintain shared content information. Furthermore, a degradation-aware gated fusion module leverages these decoupled features to dynamically modulate cross-modal fusion across both channel and spatial dimensions. This facilitates a degradation-conditional fusion process that adaptively suppresses degradation artifacts while preserving beneficial information across varying conditions. In extensive comparisons with state-of-the-art methods, SpeciFuse demonstrates more robust performance under diverse composite degradations. The code is available at https://github.com/xbsj-cool/SpeciFuse.
    Computer VisionMachine learning for visionComputer VisionMultimodal learningComputer VisionRepresentation learning
  335. #3554

    SDFLoRA: Selective Decoupled Federated LoRA for Privacy-preserving Fine-tuning with Heterogeneous Clients

    Zhikang Shen, Jianrong Lu, Haiyuan Wan, Jianhai Chen
    Federated learning (FL) has emerged as a promising paradigm for adapting large language models (LLMs) to distributed data. To mitigate the high communication and memory overhead, parameter-efficient techniques such as Low-Rank Adaptation (LoRA) are widely adopted. However, practical deployments often exhibit rank and data heterogeneity, making direct aggregation of LoRA updates biased and unstable. Existing approaches enforce a unified rank or align heterogeneous updates into a single shared subspace, which undermines personalization and accuracy. Moreover, under differential privacy (DP), adding noise to such mixed updates could perturb client-specific directions that should remain local, resulting in utility loss. To address these issues, we propose Selective Decoupled Federated LoRA (SDFLoRA), a structure-aware LoRA framework that decouples each client update into a shared component for updating and a private component that preserves client-specific semantics. Subspace alignment and aggregation are applied selectively to the shared module, while private modules remain local. We further design low-rank re-compression to maintain a fixed rank budget for the shared module. This structure supports privacy-aware optimization by injecting DP noise exclusively into the shared module. Experiments on multiple benchmarks demonstrate that SDFLoRA outperforms federated LoRA baselines and exhibits strong robustness under privacy constraints.
    Machine LearningFederated learningMachine LearningSupervised LearningNatural Language ProcessingLanguage models
  336. #3567

    UNOP: Physics-Constrained Unsupervised Neural Operator for Long-Horizon PDE Learning on Generalized Geometries

    Xinrui Cheng, Tianqi Zhao, Zhaodong Zhang, Ngai Wong, Zhongjie Wang, Ruihan Hu
    Unsupervised learning of neural operators is constrained by numerical instability, causing predictions to diverge in long-horizon rollouts. To address this, we present a physics-constrained unsupervised neural operator for long-horizon PDE learning on generalized geometries (UNOP). This framework replaces differential constraints with integral consistency for stable, label-free learning. Unlike prior works, UNOP is built upon Latent Integral Physics Embedding (LIPE), which enforces physical consistency through integral constraints. To extend integral formulations to generalized geometries, the Geometry-Agnostic Latent Adapter (GALA) projects them onto a unified latent grid of PDE inputs, providing a regularized domain for spatial integral evaluation. Based on this shared embedding, the Gated Spectral Evolution Operator (GSEO) performs stable temporal integration while retaining spatial regions with sharp gradients and fine-scale structures, with the evolution constrained by the LIPE objective. Experiments on 1D, 2D, and 3D benchmarks show UNOP outperforms state-of-the-art methods, reducing error accumulation by up to 60% in 20-step rollouts. Code is available at https://github.com/chengxinrui/UNOP.
    Machine LearningDeep learning architecturesMachine LearningUnsupervised learningMultidisciplinary Topics and ApplicationsPhysical sciences
  337. #3575

    Training-Free Inference for High-Resolution Sinogram Completion

    Jiaze E, Srutarshi Banerjee, Tekin Bicer, Guannan Wang, Yanfu Zhang, Bin Ren
    High-resolution sinogram completion is critical for computed tomography reconstruction, as missing projections can introduce severe artifacts. While diffusion models provide strong generative priors for this task, their inference cost grows prohibitively with resolution. We propose HRSino, a training-free and efficient diffusion inference approach for high-resolution sinogram completion. By explicitly accounting for spatial heterogeneity in signal characteristics, such as spectral sparsity and local complexity, HRSino allocates inference effort adaptively across spatial regions and resolutions, rather than applying uniform high-resolution diffusion steps. This enables global consistency to be captured at coarse scales while refining local details only where necessary. Experimental results show that HRSino reduces peak memory usage by up to 30.81% and inference time by up to 17.58% compared to the state-of-the-art framework, and maintains completion accuracy across datasets and resolutions.
    Computer VisionApplications and Systems
  338. #3591

    LBA: Textual Hard-Label Adversarial Attack Under Low Query Budgets

    Shixin Guo, Ming Zhong, Xuhong Zhang, Dandan Zhao, Zhe Wang, Bo Zhang, Shouling Ji, Hao Peng
    Generating high-quality adversarial texts with low query budgets remains a challenging problem in the hard-label scenario. Most existing approaches rely on greedy algorithms, where one position in the text is selected for substitution, followed by the substitutions of other positions. This local search approach may fail to discover high-quality adversarial examples and often leads to excessive query costs. Ideally, an optimal adversarial sample would consider all possible position combinations in the text, but exhaustive search is computationally impractical. To address this challenge, we propose a sampling-based method called LBA, which constructs an approximate distribution of high-quality adversarial examples by integrating both prior and posterior knowledge, and utilizes this distribution for sampling. As sampling progresses, posterior knowledge updates the approximate distribution, which in turn guides more effective sampling. Extensive experiments on six language models, ranging from small-scale to large-scale architectures across four datasets, demonstrate that LBA significantly outperforms state-of-the-art baselines on all evaluation metrics. Additionally, LLM-based assessment indicates that LBA generates more semantically preserved and comprehensible adversarial texts.
    AI Ethics, Trust, FairnessSafety and robustnessNatural Language ProcessingLanguage modelsNatural Language ProcessingText classification
  339. #3592

    OJBKQ: Objective-Joint Babai-Klein Quantization

    Xinyu Wang, Ziyu Zhao, Peng Lu, Yu Gu, Xiao-Wen Chang
    Post-training quantization (PTQ) is widely used to compress large language models without retraining. However, many existing weight-only methods rely on heuristic objectives and greedy rounding, leading to noticeable degradation under low-bit quantization. In this work, we introduce OJBKQ (Objective-Joint Babai-Klein Quantization with K-Best Sampling), a layer-wise PTQ method that formulates weight quantization as a joint optimization problem over activations and weights. This formulation results in a multiple-right-hand-side box-constrained integer least squares (BILS) problem in each layer, which is NP-hard. For each column of the weight matrix, we apply an extended Babai nearest-plane algorithm and an extended version of Klein’s randomized Babai algorithm to find the minimum-residual Babai–Klein point, a sub-optimal solution to the BILS problem. Experimental results on large language models show that OJBKQ achieves lower perplexity at 3–4 bits compared to existing PTQ approaches, while maintaining comparable computational cost.
    Constraint Satisfaction and OptimizationConstraint optimization problemsNatural Language ProcessingLanguage models
  340. #3593

    Predicting Context-Aware Transcriptional Responses to Unseen Genetic Perturbation Subject to Interactome Distance Constraints

    Feiyu Ma, Yunfei Zhang, Hau-San Wong, Si Wu
    Predicting responses to genetic perturbation is pivotal for elucidating gene regulatory machinery. However, existing methods often rely on statistical perspectives to model differential expression, overlooking the constraints of the underlying molecular interactome, which renders predictions susceptible to spurious correlations. Based on the fact that a genetic perturbation is a molecular stimulus acting through functional connectivity to reconfigure cellular expression profiles, we propose PertDCR, a framework for context-aware Perturbation response prediction via Distance-Constrained Refinement. Specifically, PertDCR addresses the heterogeneity of perturbation effects by contextualizing the perturbation source with relevant gene programs inferred from the protein-protein interaction network as well as cell representation. To activate biologically valid responses, PertDCR predicts the distance between the perturbation source and highly responsive genes within the PPI network, and dynamically modulates the response intensity based on distance, thereby facilitating accurate perturbation prediction. Extensive experiments demonstrate that PertDCR achieves state-of-the-art performance across diverse unseen genetic perturbations, and the predictions are consistent with underlying biological mechanisms.
    Multidisciplinary Topics and ApplicationsBioinformatics
  341. #3603

    Scale-Invariant Conditional VAE for Coarse-Grained Economic Time-Series Forecasting

    Jianping Zhu, Lei Wang, Yang Chen, Bo Jin, Xiaopeng Wei
    Coarse-grained time series (CGTS) are critical for business and macroeconomic analysis. However, CGTS are typically updated infrequently and contain few observations, so model-centric training on raw data is prone to overfitting and degraded forecast accuracy. To address this, we propose SI-CVAE, a scale-invariant conditional generative model built around variable-length subsequences, and adopt a data-centric "train on synthetic, test on real" paradigm to enhance downstream forecasting. Specifically, we first introduce a variable-length subsequence clustering-and-matching algorithm to capture cross-scale recurring patterns within a series. We then design a frequency-domain conditional VAE with a frequency-domain linear decoder to enable controllable, arbitrary-length sequence synthesis while enforcing scale-invariance constraints. Finally, we develop a time-domain reconstruction strategy with subsequence-conditioned fusion to ensure temporal continuity and spectral consistency across concatenated segments. We conduct extensive experiments on six ship-sales datasets and four U.S. macroeconomic indicators. Results show that replacing or augmenting the original training set with SI-CVAE–generated data yields consistent accuracy gains across multiple forecasting baselines, and that SI-CVAE attains higher synthetic-data quality than state-of-the-art generators on standard metrics.
    Machine LearningTime series and data streams
  342. #3605

    DNFormer: Differential Attention for Graph Transformers

    Zizhen Wang, Dongxiao He, Zhizhi Yu, Kuntharrgyal Khysru, Weixiong Zhang
    Graph Transformers (GTs) have emerged as a powerful paradigm for graph representation learning, leveraging attention mechanisms to enable flexible information exchange. Recent GTs often adopt local attention to restrict computation to neighborhoods, thereby reducing costs and enhancing scalability. However, local attention computation poses a serious issue that significantly reduces the efficacy of the overall approaches – because computation is restricted to local neighborhoods, node pairs that share topological patterns receive similar attention scores, thereby reducing attention’s discriminative power. Consequently, GT-based models typically overemphasize common structural motifs but fail to capture node-specific properties. To address this problem, we propose DNFormer, a novel GT grounded in differential modeling. We introduce a differential local attention mechanism in DNFormer that computes attention scores as differences across multiple topology-aware similarity measures, rather than relying on a single score. This mechanism suppresses shared topological patterns among neighbors and emphasizes relative node-to-node distinctions. As a result, DNFormer mitigates structural dominance in attention aggregation, better capturing node-specific relational patterns while preserving local structural properties. Extensive experiments confirm DNFormer's superior performance over representative GT baselines.
    Data MiningMining graphs
  343. #3609

    gDMC: A Generic Distributed Model Counting Framework via Work-Stealing

    Zhenghang Xu, Minghao Yin, Junping Zhou, Jean-Marie Lagniez
    Propositional Model Counting (#SAT) is essential for probabilistic reasoning but faces scalability limits on single cores. Existing distributed approaches struggle with high initialization overheads (static decomposition) or precision loss and rigid architecture (dynamic solvers like dmc). We propose a novel, generic framework for distributed exact model counting. Leveraging C++ templates, our architecture decouples parallel orchestration from solving logic, enabling state-of-the-art solvers to be parallelized with minimal modification. We implement an adaptive work-stealing strategy that ensures load balancing and guarantees exact results via arbitrary-precision arithmetic. Experiments on competition benchmarks show that our approach achieves near-linear scalability and significantly outperforms existing distributed solvers.
    Constraint Satisfaction and OptimizationConstraint programmingConstraint Satisfaction and OptimizationDistributed constraintsConstraint Satisfaction and OptimizationSolvers and tools
  344. #3613

    Beyond Hard Writes and Rigid Preservation: Soft Recursive Least-Squares for Lifelong LLM Editing

    Xinyu Wang, Sicheng Lyu, Yu Gu, Jerry Huang, Peng Lu, Yufei Cui, Xiao-Wen Chang
    Model editing updates a pre-trained LLM with new facts or rules without retraining while preserving unrelated behavior. In real deployment, edits arrive as long streams, creating a plasticity–stability dilemma: repeated locate-then-edit “hard writes” can accumulate interference over time, while rigid preservation constraints may protect only explicitly constrained directions, allowing past edits or unconstrained behaviors to deviate.
    We propose RLSEdit, a recursive least-squares editor for long sequential editing. RLSEdit formulates editing as an online quadratic optimization with soft constraints, minimizing a cumulative key-value fitting objective together with two regularizers that control deviation from the pre-trained weights and from a designated anchor mapping. This objective admits an efficient Woodbury-based online recursion, with per-edit cost independent of history length and scaling only with the current edit size. We further provide deviation bounds and an asymptotic characterization of the adherence–preservation trade-off in the many-edits regime.
    Experiments on CounterFact and ZsRE across multiple model families show stable scaling to 10K edits, outperforming strong baselines in both edit success and holistic stability, while retaining early edits and preserving general capabilities on GLUE and held-out reasoning/code benchmarks.
    Code will be made available.
    Constraint Satisfaction and OptimizationConstraint optimization problemsKnowledge Representation and ReasoningApplicationsMachine LearningGenerative modelsNatural Language ProcessingLanguage models
  345. #3625

    Post Hoc Extraction of Pareto Fronts for Continuous Control

    Raghav Thakar, Gaurav Dixit, Kagan Tumer
    Agents in the real world must often balance multiple objectives, such as speed, stability, and energy efficiency in continuous control. To account for changing conditions and preferences, an agent must ideally learn a Pareto frontier of policies representing multiple optimal trade-offs. Recent advances in multi-policy multi-objective reinforcement learning (MORL) enable learning a Pareto front directly, but require full multi-objective consideration from the start of training. In practice, multi-objective preferences may arise after a policy has already been trained on a single specialised objective. Existing MORL methods cannot leverage such a pre-trained ‘specialist’ to learn Pareto fronts and avoid incurring the sample costs of retraining. We introduce Mixed Advantage Pareto Extraction (MAPEX), an offline MORL method that constructs a frontier of policies by reusing pre-trained specialist policies, critics, and replay buffers. MAPEX combines evaluations from specialist critics into a mixed advantage signal, and weights a behaviour cloning loss with it to train new policies that balance multiple objectives. MAPEX’s post hoc Pareto front extraction preserves the simplicity of single-objective off-policy RL, and avoids retrofitting these algorithms into complex MORL frameworks. We formally describe the MAPEX procedure and evaluate MAPEX on five multi-objective MuJoCo environments. Given the same starting policies, MAPEX produces comparable fronts at 0.001% of the sample cost of established baselines.
    Agent-based and Multi-agent SystemsOtherMachine LearningReinforcement learningRoboticsLearning in robotics
  346. #3639

    On-Device Realistic Test-Time Adaptation via Bias-Resistant Statistical Alignment

    Haojie Bai, Aiguo Chen, Ruiting Dai, Yijia Rong, Zirui Wang, Jiaxin Liu, Kexin Li, Schahram Dustdar
    Test-Time Adaptation (TTA) aims to adapt pretrained models to unseen test data, which is crucial for resource-constrained edge devices that must handle distribution shifts on the fly without human supervision. However, conventional TTA methods often fail in realistic scenarios characterized by continuous shifts and severe class imbalances, while also incurring prohibitive memory overheads. To enable continuous adaptation to test data on edge devices under a realistic TTA setting, we propose a Bias-Resistant Online Statistical Alignment (BOSA) method. BOSA facilitates unbiased adaptation via a discrepancy-aware statistical alignment mechanism integrated with a class-balanced memory bank. To ensure memory efficiency, we further design a saliency-guided activation sparsification and gradient reconstruction scheme, which drastically reduces memory overhead without sacrificing gradient integrity. Extensive evaluations demonstrate that BOSA achieves superior accuracy with compact memory consumption compared to state-of-the-art methods under realistic TTA settings.
    Computer VisionApplications and SystemsComputer VisionEfficiency and OptimizationComputer VisionMachine learning for vision
  347. #3666

    4DVarGen: A 4D Variational-Inspired Generative Model for Eddy-Resolving Surface Ocean Reconstruction

    Junpeng Huang, Wuxin Wang, Xiaoyong Li, Juan Zhao, Senliang Bao, Di Zhang, Difu Sun
    Sea surface variable reconstruction from sparse observations is a key ocean-science challenge. Traditional methods, such as the four-dimensional variational (4DVar) approach, rely on numerical models for background information, leading to high computational costs. Deep learning methods are more efficient but often fail to capture eddy dynamics, resulting in limited effective resolution. We propose 4DVarGen, a 4DVar-inspired generative framework for reconstructing sea surface variable fields at eddy-resolving scales from sparse remote-sensing observations. 4DVarGen establishes a mathematical equivalence between 4DVar and an observation-guided denoising process. Its key innovation is injecting the observation-likelihood gradient into denoising iterations, driving the generated trajectories to evolve in the direction of minimizing the 4DVar objective function toward a maximum a posteriori solution. Spatiotemporal priors learned by a diffusion model serve as background information, reducing computational costs and mitigating the adverse effects of Gaussian assumptions. Experiments show that 4DVarGen effectively leverages the temporal evolution patterns of sea surface temperature (SST) and sea surface height (SSH), as well as their dynamical mappings learned by the diffusion model, leading to improved reconstruction accuracy and effective resolution. Our model, pretrained on GLORYS12V1 reanalysis data, generates sea surface variable fields guided by real observations, achieving accuracy and effective resolution improvements of 18% and 58%, respectively, compared to GLORYS12V1. This study offers a novel framework for reconstructing Earth system states from sparse observations.
    Computer VisionMultimodal learningMachine LearningApplicationsMultidisciplinary Topics and ApplicationsEnergy, environment and sustainability
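For readers unfamiliar with 4DVar, the abstract's "4DVar objective function" can be read against the textbook strong-constraint form below. This is a sketch in standard data-assimilation notation, not notation taken from the paper: the background state x_b, the covariances B and R_k, the observation operator H_k, and the model propagator M are all assumed symbols.

```latex
% Textbook strong-constraint 4DVar cost (assumed notation, not the paper's):
% a background term plus observation misfits accumulated over the window.
J(\mathbf{x}_0)
  = \tfrac{1}{2}\,(\mathbf{x}_0-\mathbf{x}_b)^{\top}\mathbf{B}^{-1}(\mathbf{x}_0-\mathbf{x}_b)
  + \tfrac{1}{2}\sum_{k=0}^{K}
      \bigl(\mathbf{y}_k-\mathcal{H}_k(\mathbf{x}_k)\bigr)^{\top}
      \mathbf{R}_k^{-1}
      \bigl(\mathbf{y}_k-\mathcal{H}_k(\mathbf{x}_k)\bigr),
\qquad \mathbf{x}_k=\mathcal{M}_{0\to k}(\mathbf{x}_0)

% Bayes' rule splits the posterior score into a prior score and an
% observation-likelihood gradient; under Gaussian observation errors the
% second term is the negative gradient of the observation misfit above.
\nabla_{\mathbf{x}}\log p(\mathbf{x}\mid\mathbf{y})
  = \nabla_{\mathbf{x}}\log p(\mathbf{x})
  + \nabla_{\mathbf{x}}\log p(\mathbf{y}\mid\mathbf{x})
```

Under this reading, a learned diffusion prior would plausibly supply the first score term in place of the Gaussian background term, and the observation-likelihood gradient would be the quantity injected into each denoising step. The exact formulation used by 4DVarGen may differ.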
  348. #3685

    NPRIP: Nucleus-to-Periphery Retrieval-Iterative Prompting for Improved Abstractive Summarization in Low-Resource Mongolian

    Menghan Li, Nier Wu, Yang Liu, Yatu Ji, Shuo Sun
    Large language models often face challenges in low-resource agglutinative language text summarization tasks due to poorly designed prompts, leading to core information dilution, reduced fidelity, and critical information loss caused by the complex grammatical structures of agglutinative languages. For traditional Mongolian, a typical low-resource agglutinative language, this paper proposes Nucleus-to-Periphery Retrieval-Iterative Prompting (NPRIP). This method first guides the model to extract highly condensed semantic core information (events, persons, time, etc.) from the original text. Subsequently, through multiple rounds of self-refinement iteration, it progressively expands peripheral details (background, causes, consequences, secondary facts, etc.). The model performs fact consistency checks, redundancy removal, and fidelity correction on the current draft, achieving gradual improvements in information completeness and fidelity. To enhance Mongolian language representation, we perform parameter-efficient fine-tuning on LLaMA3-8B using the CCMT2019 Mongolian-Chinese parallel corpus, and construct a larger abstractive Mongolian news summarization dataset, MoSum, along with its augmented version. Experiments on traditional Mongolian text summarization tasks demonstrate that our proposed method significantly outperforms multiple baseline models on automatic evaluation metrics including ROUGE-1, ROUGE-2, and ROUGE-L. This validates the effectiveness of core-priority structured iterative prompting in low-resource agglutinative language summarization scenarios.
    Natural Language ProcessingSummarization
  349. #3690

    FunCineForge: A Unified Dataset Pipeline and Model for Zero-Shot Movie Dubbing in Diverse Cinematic Scenes

    Jiaxuan Liu, Yang Xiang, Han Zhao, Xiangang Li, Zhenhua Ling
    Movie dubbing is the task of synthesizing speech from scripts conditioned on video scenes, requiring accurate lip sync, faithful timbre transfer, and proper modeling of character identity and emotion. However, existing methods face two major limitations: (1) high-quality multimodal dubbing datasets are limited in scale, suffer from high word error rates, contain sparse annotations, rely on costly manual labeling, and are restricted to monologue scenes, all of which hinder effective model training; (2) existing dubbing models rely solely on the lip region to learn audio-visual alignment, which limits their applicability to complex live-action cinematic scenes, and exhibit suboptimal performance in lip sync, speech quality, and emotional expressiveness. To address these issues, we propose FunCineForge, which comprises an end-to-end production pipeline for large-scale dubbing datasets and an MLLM-based dubbing model designed for diverse cinematic scenes. The pipeline enables the construction of the first television dubbing dataset, CineDub, which serves as a high-quality foundation for training and evaluation. Building on this, our dubbing model effectively captures multimodal cues and supports complex dubbing scenarios, including monologue, narration, dialogue, and multi-speaker settings. Experiments demonstrate that our approach consistently outperforms state-of-the-art methods in audio quality, word error rate, lip sync, temporal alignment, timbre transfer, and instruction following. Code and demos are available at https://funcineforge.github.io/.
    Natural Language ProcessingSpeechComputer VisionMultimodal learningNatural Language ProcessingApplications
  350. #3697

    Towards Cardinality-Aware Local Search for SAT with Cardinality Constraints

    Shuli Hu, Dian Ling, Jiaqi Li, Minghao Yin
    Satisfiability (SAT) with cardinality constraints arises naturally in many practical applications, where high-level counting requirements coexist with standard Conjunctive Normal Form (CNF) clauses. Translating these constraints into CNF can destroy structural information, limiting the effectiveness of search-based heuristics. In this paper, we propose a cardinality-aware local search framework to solve this problem, denoted as CardSAT-LS. CardSAT-LS integrates a preprocessing phase based on the generalized unit propagation and resolution, a cardinality-sensitive scoring function combining the make-break mechanism and cardinality violation, and an initialization based on fake-backbone variables. Furthermore, CardSAT-LS employs a unified framework that adaptively alternates between flip and swap operators when the search gets trapped in local optima. Finally, we conduct experiments on five public benchmarks from real-world applications as well as the MaxSAT and SAT competitions. Compared with ten state-of-the-art competitors, including SAT, MaxSAT, and PB solvers, CardSAT-LS solves the most instances with the lowest PAR-2 score. Additionally, we integrate CardSAT-LS into exact solvers for phase selection, which leads to significant speedups.
    Constraint Satisfaction and OptimizationSatisfiabilitySearchCombinatorial search and optimisationSearchLocal search
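The PAR-2 score mentioned in the abstract above is a common solver-comparison metric: the mean runtime over all instances, with every unsolved or timed-out instance charged twice the time limit. A minimal sketch follows, assuming that usual definition; the function name, the `None` convention for unsolved instances, and the 300-second limit are illustrative choices and not taken from the paper.

```python
from typing import Optional, Sequence


def par2_score(runtimes: Sequence[Optional[float]], timeout: float) -> float:
    """Penalized average runtime with penalty factor 2 (PAR-2).

    `runtimes` holds one entry per instance: the solving time in seconds,
    or None if the solver gave no answer. Entries above `timeout` are also
    treated as unsolved. Lower scores are better.
    """
    if not runtimes:
        raise ValueError("need at least one instance")
    total = 0.0
    for t in runtimes:
        if t is None or t > timeout:
            total += 2.0 * timeout  # unsolved: charge twice the time limit
        else:
            total += t              # solved: charge the actual runtime
    return total / len(runtimes)


# Hypothetical run: two solved instances, one unsolved, one over the limit.
score = par2_score([10.0, None, 250.0, 400.0], timeout=300.0)
print(score)
```

Because unsolved instances dominate the sum, a solver that solves more instances usually also gets the lower PAR-2 score, which is why the two are often reported together.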
  351. #3699

    LISA: Language-guided Interference-aware Spatial-Frequency Attention for Driver Gaze Estimation

    Jun Ma, Zhenye Yang, Ruichen Zhou, Pei Zhang, Huan Li, Jinpeng Chen
    Driver gaze estimation serves as a fundamental metric for evaluating driver attentiveness in modern monitoring systems. Beyond being vulnerable to sudden lighting changes and sensor noise, spatial-domain models struggle to disentangle authentic gaze cues from irrelevant visual attributes. In this paper, we propose LISA, a Language-guided Interference-aware Spatial-Frequency Attention framework that combines frequency-domain priors with vision-language knowledge. Observing that the amplitude spectrum remains relatively stable even under spatial perturbations, we design a dual-domain fusion mechanism. It integrates stable low-frequency semantics into high-frequency details, employing spatial attention to precisely target ocular regions. To reduce semantic ambiguity, we also introduce a training-time disentanglement strategy. Using a frozen CLIP encoder and orthogonal regularization, we explicitly separate gaze features from appearance interference. Experiments on two benchmarks show that LISA achieves state-of-the-art performance, with significantly improved robustness against occlusions and lighting variations. The code repository is available at https://github.com/Mason-bupt/LISA.
    Data MiningApplicationsMachine LearningMulti-modal learning
  352. #3705

    LISTEN to Your Preferences: An LLM Framework for Multi-Objective Selection

    Adam S. Jovine, Tinghan Ye, Francis Bahk, Jingjing Wang, Matthew Ford, David B. Shmoys, Peter I. Frazier
    Human experts often struggle to select the best option from a large set of items with multiple competing objectives, a process bottlenecked by the difficulty of formalizing complex, implicit preferences. To address this, we introduce LISTEN (LLM-based Iterative Selection with Trade-off Evaluation from Natural-language), an agentic LLM-based framework that treats the LLM as a decision-making agent capable of iteratively refining its internal preference model and taking actions (e.g., proposing utilities or selecting candidates) to maximize alignment with a user's implicit goals. To operate within LLM constraints like context windows and inference costs, we propose two iterative algorithms: LISTEN-U, which uses the LLM to refine a parametric utility function, and LISTEN-T, a non-parametric method that performs tournament-style selections over small batches of solutions. Evaluated on diverse tasks including flight booking, shopping, and exam scheduling, our results show LISTEN-U excels when preferences are parametrically aligned (a property we measure with a novel concordance metric), while LISTEN-T offers more robust performance overall. This work explores a promising direction for steering complex multi-objective decisions directly with natural language, reducing the cognitive burden of traditional preference elicitation.
    Humans and AIPersonalization and user modelingKnowledge Representation and ReasoningPreference modelling and preference-based reasoningNatural Language ProcessingLanguage models
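The tournament-style selection over small batches that the abstract above attributes to LISTEN-T can be illustrated with a generic sketch. This is one plausible reading of the idea, not the authors' algorithm: `tournament_select`, the `pick_best` callback (which stands in for an LLM call that chooses a winner from a batch that fits in its context window), the batch size, and the flight data are all hypothetical.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def tournament_select(
    items: Sequence[T],
    pick_best: Callable[[List[T]], T],
    batch_size: int = 4,
    seed: int = 0,
) -> T:
    """Reduce a large candidate set to one winner via repeated small batches.

    Each round shuffles the surviving candidates, splits them into batches
    of at most `batch_size`, and keeps one winner per batch.
    """
    if not items:
        raise ValueError("need at least one candidate")
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    rng = random.Random(seed)
    pool = list(items)
    while len(pool) > 1:
        rng.shuffle(pool)
        pool = [
            pick_best(pool[i:i + batch_size])
            for i in range(0, len(pool), batch_size)
        ]
    return pool[0]


# Stand-in for the LLM judge: prefer cheap flights with few stops.
def cheapest_with_few_stops(batch):
    return min(batch, key=lambda f: f[0] + 150 * f[1])


# Hypothetical (price, stops) candidates.
flights = [(320, 1), (280, 2), (410, 0), (300, 0), (295, 1), (510, 0), (260, 3)]
winner = tournament_select(flights, cheapest_with_few_stops, batch_size=3)
print(winner)
```

With a judge that applies a consistent ranking, the overall best candidate wins every batch it appears in, so the tournament returns it regardless of shuffling. A real LLM judge may be inconsistent across batches, so that guarantee would not carry over.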
  353. #3708

    Bridging the Biophysical Gap: Holistic Environmental Awareness for 3D Linker Design

    Mengwei Sun, Chengwei Ai, Xiaoyi Liu, Shiqiang Ma, Qiaozhen Meng, Fei Guo
    3D molecular linker design is a critical task in structure-based drug discovery, which requires the precise synthesis of chemical bridges to connect fragments within the constrained environment of a protein binding pocket. Existing methods often suffer from environmental blindness, treating the inter-fragment space as a vacuum and yielding candidates with poor binding affinity or severe steric clashes. To address this, we propose LinkerBridge, an equivariant framework that unifies biochemical semantics with physical constraints through two innovations: a Contextual Interaction-Aware Representation module that internalizes pre-existing biochemical semantics, and a Differentiable Physical Guidance mechanism derived from Van der Waals potentials to steer generation away from collision zones. Extensive evaluations on ZINC, GEOM, and BindingMOAD benchmarks, as well as Hsp90 and JNK3 case studies, demonstrate that LinkerBridge significantly outperforms state-of-the-art methods in real-world drug discovery.
    Machine LearningApplicationsMultidisciplinary Topics and ApplicationsBioinformaticsMultidisciplinary Topics and ApplicationsHealth and medicine
  354. #3719

    CoDA: Co-adaptive Dual-path Alignment for Vision-Language Models

    Yi Zhang, Rui Zhu, Channi Li, Xiaoxu Li, Zhanyu Ma, Jing-Hao Xue
    Parameter-efficient adaptation methods, such as adapters and prompt learning, have become popular for transferring pre-trained vision-language models (VLMs) to downstream tasks. However, under limited supervision during adaptation, overly aggressive cross-modal alignment can distort the intrinsic structure of modality-specific representations, leading to degraded generalization. In this paper, we propose Co-adaptive Dual-path Alignment (CoDA), a new adaptation framework that explicitly disentangles and coordinates cross-modal semantic alignment and intra-modal structural consistency. In CoDA, we also propose a parent class labeling strategy that injects hierarchical semantic priors into textual prompts, further stabilizing alignment under limited supervision. Extensive experiments on eleven benchmark datasets show that CoDA outperforms state-of-the-art parameter-efficient methods, particularly under few-shot learning and distribution-shift scenarios. Our code is available at https://github.com/yizhang-ac/CoDA.
    Computer VisionMultimodal learningComputer VisionTransfer, low-shot, semi- and un- supervised learningMachine LearningFew-shot learning
  355. #3741

    MoTRa: Motion-Aware Target Representation Learning for End-to-End Multi-Object Tracking

    Yuanzhou Huang, Songwei Pei, Shuhuai Wang, Bingfeng Liu, Qian Li, Shangguang Wang
    Multi-object tracking (MOT) has long faced challenges with identity switches, especially for targets with low appearance discriminability and complex motion. Existing end-to-end trackers typically enhance robustness by modeling long-term temporal information across target-level representations, yet these representations remain insufficiently discriminative for spatially and visually similar targets. In this paper, we present MoTRa, a Motion-aware Target Representation learning framework for end-to-end MOT that enriches target-level representations with adaptive motion cues. As a result, each representation adaptively balances motion cues and appearance-related content cues under varying conditions, enhancing its discriminability for tracking. To achieve this, we propose a feature fusion and alignment module that extracts motion cues and adaptively fuses them with content features into target representations. To further regularize the dynamically fused representations, we propose a target-specific contrastive learning strategy that promotes intra-trajectory consistency and inter-target separability. Experimental results demonstrate the effectiveness of our method, achieving competitive performance on multiple benchmarks.
    Computer VisionMotion and trackingComputer VisionRepresentation learningComputer VisionVideo analysis and understanding
  356. #3743

    Scaling Decision-Focused Learning to Large Problems with Lagrangian Decomposition

    Stéphane Eilles-Chan Way, Hugo Percot, Quentin Cappart, Tias Guns, Louis-Martin Rousseau
    Decision-focused learning has shown great promise for addressing predict-then-optimize problems, particularly in the presence of under-specified models. However, its practical deployment is often hindered by high computational costs and limited scalability, as it requires solving a constrained optimization problem for each training instance at every iteration. To address these challenges, we propose a novel framework that incorporates Lagrangian decomposition into the decision-focused learning paradigm. Specifically, we introduce a new surrogate objective along with two loss functions for evaluating and training the underlying prediction model. We further propose two variants of our approach, which offer different trade-offs between computational efficiency and solution quality. Our framework can be seamlessly integrated with standard decision-focused learning methods, including Smart Predict-then-Optimize (SPO+) and Implicit Maximum Likelihood Estimation (IMLE). Through experiments on two standard benchmarks, the multi-dimensional knapsack problem and quadratic portfolio optimization, we demonstrate that our approach achieves competitive performance while remaining amenable to parallelization. In particular, it consistently outperforms traditional decision-focused learning methods on large-scale instances, involving up to eight times more variables than those typically considered in related work. The implementation is available at https://github.com/corail-research/DFL-LD.
    Constraint Satisfaction and OptimizationConstraint optimization problemsConstraint Satisfaction and OptimizationOther
  357. #3748

    Efficient and Exact Global Attention on Latent Summaries for Knowledge Graph Reasoning

    Chenxiao Lin, Lei Wang, Yin Zhang, Wei Liu, Ye Luo, Qingqiang Wu
    Capturing global context through attention is essential for reasoning over knowledge graphs, especially when relevant entities are distant or disconnected. To scale attention to large graphs, recent methods replace Softmax with kernel feature mappings, reducing computational complexity to linear in the number of nodes. While efficient, these approximations tend to produce overly smooth attention scores, which can reduce discrimination between correct triplets and hard negative samples. Moreover, they exhibit scale-dependent shifts in attention entropy, making them sensitive to changes in graph size during inductive inference. In this paper, we introduce LaGR, a novel approach for integrating global information in knowledge graph reasoning. Rather than approximating interactions among all nodes, LaGR compresses the graph into a fixed, compact set of latent summaries and applies exact self-attention within this latent space. This change yields scale-invariant attention and stable performance across diverse data settings. In addition, we propose a node-adaptive residual fusion mechanism that dynamically balances local and global information at the node level, leading to more expressive representations. Extensive experiments on both transductive and inductive benchmarks show that LaGR substantially outperforms state-of-the-art baselines, demonstrating that exact attention over latent summaries is an efficient and effective way to capture global context in knowledge graph reasoning. Our implementation is available at https://github.com/XMU-KG/LaGR.
    Data MiningKnowledge graphs and knowledge base completionKnowledge Representation and ReasoningLearning and reasoning
  358. #3768

    One4Many-StablePacker: An Efficient Deep Reinforcement Learning Framework for the 3D Bin Packing Problem

    Lei Gao, Shihong Huang, Shengjie Wang, Hong Ma, Feng Zhang, Hengda Bao, Qichang Chen, Weihua Zhou
    The three-dimensional bin packing problem (3D-BPP) is widely applied in logistics and warehousing. However, existing learning-based approaches often neglect stability constraints and struggle to generalize across diverse bin dimensions. To address this, we propose a novel deep reinforcement learning framework, One4Many-StablePacker (O4M-SP). The primary advantage of O4M-SP lies in its ability to handle variable bin dimensions in a single training process while explicitly enforcing two types of practical stability constraints: support constraints, which ensure an item's bottom center lies within the convex hull of the underlying contact area, and weight constraints, which restrict the total vertical load on an item to its bearing capacity. Our training method introduces two innovative mechanisms. First, a weighted reward function integrates the loading rate with a novel height difference metric for packing layouts, promoting improved bin utilization via flatter packing configurations. Second, clipped policy gradient optimization with tailored policy drifting mitigates entropy collapse, encouraging exploration at critical decision nodes during packing to prevent premature convergence. Extensive experiments demonstrate that O4M-SP generalizes effectively across diverse bin dimensions and significantly outperforms baseline methods. Furthermore, the framework exhibits strong practical applicability by solving complex scenarios under strict stability constraints.
    Machine LearningReinforcement learningPlanning and SchedulingLearning in planning and schedulingSearchSearch and machine learning
  359. #3779

    ChemKGL: Bridging Knowledge Graphs and Large Language Models for Chemical Multi-Step Reaction Pathway Inference

    Fan Yang, Feiyang Xu, Kun Zhang, Huadong Liang, Pengyang Shao, Xin Li, Le Wu
    Large language models have shown promising potential in chemistry, with prior work exploring molecular recognition, classification, and property prediction. Despite this progress, LLMs are still far from satisfactory on the complex task of chemical multi-step reaction pathway inference, owing to a lack of domain knowledge and a limited ability to maintain consistent multi-step reasoning. To address these challenges, we propose ChemKGL, a novel multi-step reasoning framework enhanced by knowledge graph retrieval. We reformulate the task as a knowledge graph retrieval problem to better control LLM generation, and leverage graph technologies to integrate textual descriptions, reaction conditions, and materials. Based on this reformulation, we design a novel multi-retrieval strategy to improve the performance of reaction pathway inference, which includes forward retrieval, reverse retrieval, and historical retrieval. An entropy-based pruning mechanism is further introduced to alleviate cold-start issues caused by retrieval failure. In addition, we construct a comprehensive chemical knowledge graph and corresponding dataset to support model training and analysis. Finally, we employ multiple evaluation methods to assess our proposed ChemKGL, and the experimental results demonstrate its superiority. Our code and dataset can be obtained at https://github.com/Double-Sail/ChemKGL.
    Natural Language ProcessingApplicationsNatural Language ProcessingLanguage generationNatural Language ProcessingLanguage models
  360. #3791

    Environment-Aware Multiscale Geometric Interaction for Equivariant Molecular Spectral Prediction

    Haoran Li, Weiran Cui, Minghui Li
    Predicting molecular spectra requires modeling 3D conformation and solvent modulation. However, E(3)-equivariant networks based on local message passing exhibit limited sensitivity to long-range geometric dependencies, affecting the discrimination of globally distinct conformers. We introduce the Multiscale Geometric Interaction Layer (MGIL), which integrates global context by augmenting local features with centroid-referenced anchors, geometric moments, and virtual nodes. This design explicitly encodes global anisotropy while maintaining equivariance. Furthermore, we propose a Solvent Field Modulator (SFM) to encode solvent topology for conditional feature adaptation. Experiments demonstrate that MGIL enhances the capture of global structural variations, yielding consistent performance gains across spectral prediction benchmarks while maintaining linear computational efficiency.
    Machine LearningDeep learning architecturesMachine LearningGeometric learningMachine LearningRepresentation learningMachine LearningSequence and graph learning
  361. #3792

    Physically Guided Visual Mass Estimation from a Single RGB Image

    Sungjae Lee, Junhan Jeong, Yeonjoo Hong, Kwang In Kim
    Estimating object mass from visual input is challenging because mass depends jointly on geometric volume and material-dependent density, neither of which is directly observable from RGB appearance. Consequently, mass prediction from pixels is ill-posed and therefore benefits from physically meaningful representations to constrain the space of plausible solutions. We propose a physically structured framework for single-image mass estimation that addresses this ambiguity by aligning visual cues with the physical factors governing mass. From a single RGB image, we recover object-centric three-dimensional geometry via monocular depth estimation to inform volume and extract coarse material semantics using a vision-language model to guide density-related reasoning. These geometry, semantic, and appearance representations are fused through an instance-adaptive gating mechanism, and two physically guided latent factors (volume- and density-related) are predicted through separate regression heads under mass-only supervision. Experiments on image2mass and ABO-500 show that the proposed method consistently outperforms state-of-the-art methods.
    Computer VisionMultimodal learningComputer VisionVision, language and reasoningRoboticsPerception
  362. #3801

    A Novel SAM Coupling Mechanism for SAR Image Segmentation

    Yang Liu, Miao Fu, Yingqi Gao, Lin Lin, Jiarui Li, Rui Liu
    The Segment Anything Model (SAM) provides a new paradigm for visual perception tasks through large-scale pre-training and interactive prompts. However, the quality of mask generation is limited by the difficulty of aligning the optimization direction between the backbone and the prompt, which is especially pronounced in non-natural-domain images with low contrast and indistinct semantic edge structures, such as SAR images. Therefore, this paper proposes a coupling optimization mechanism that integrates the originally independent prompt generation process with the backbone into a closed loop through the Point Impact Decomposition (PID) module to guide the iterative optimization of the prompt. Specifically, this paper introduces the concept of PID in the decoder to quantify the contribution of prompt points to the masks, constructing an interpretable reward variable to drive the reinforcement learning prompt optimizer. This optimizer designs a reward function based on hypothesis testing ideas, achieving iterative updates of prompt points, and providing feedback on the effects of mask generation based on prompt points. Experiments show that this mechanism can provide a prompt that fits image features; its performance exceeds the existing baseline, which effectively improves the generalization ability of SAM on SAR images. Code is available at https://github.com/Fm336/PID-SAM.
    Computer VisionSegmentation, grouping and shape analysisMachine LearningFoundation modelsMachine LearningModel-based and model learning reinforcement learning
  363. #3806

    DSSL-Hash: Dynamic Semantic Structure Learning for Unsupervised Cross-Modal Hashing

    Fan Yang, Tongxuan Pei, Yuanzhi Zhao, Yudong Zhao
    A central bottleneck in unsupervised cross-modal hashing is that both similarity guidance and objective weighting are typically static, while the semantic structure induced during training is inherently dynamic. We present DSSL-Hash, which turns semantic structure learning into an evolving process rather than a one-shot preprocessing step. Specifically, DSSL-Hash distills semantic cues from a vision–language foundation model and constructs a dynamic perception graph via an adaptive similarity matrix, enabling graph propagation to capture higher-order relations beyond pairwise matching. We further reshape Hamming-space learning with a semantic-channel constraint that explicitly regulates distances for partially similar pairs, reducing semantic distortion after binarization. Finally, we develop a Retrieval-Aware Adaptive Scheduler (RAAS) that leverages retrieval feedback to co-adjust modality interactions and objective weights, achieving robust optimization without extensive manual hyper-parameter search. Extensive experiments on MS COCO, NUS-WIDE, and MIRFLICKR-25K across multiple code lengths demonstrate that DSSL-Hash consistently outperforms recent state-of-the-art unsupervised cross-modal hashing methods.
    Computer VisionImage and video retrievalData MiningInformation retrievalMachine LearningMulti-modal learningMachine LearningMulti-view learningNatural Language ProcessingInformation retrieval and text mining
  364. #3817

    Improving Code Translation with Syntax-Guided and Semantic-aware Preference Optimization

    Yuhan Wu, Huan Zhang, Wei Cheng, Chen Shen, Jingyue Yang, Wei Hu
    LLMs have shown immense potential for code translation, yet they often struggle to ensure both syntactic correctness and semantic consistency. While preference-based learning offers a promising alignment strategy, it is hindered by unreliable semantic rewards derived from sparse test cases or restrictive reference translations. We argue that a robust semantic reward for code translation must be derived directly from the source code. In this paper, we propose CTO to improve code translation with syntax-guided and semantic-aware preference optimization. Through contrastive learning, we train a cross-lingual semantic model to directly assess functional equivalence between source and translated code. By formulating code translation as a multi-objective optimization problem, this robust semantic signal is seamlessly unified with compiler-based syntactic feedback within the direct preference optimization framework. Extensive experiments on C++, Java, and Python translations demonstrate that CTO significantly outperforms existing baselines and alternative preference optimization strategies.
    Natural Language ProcessingApplicationsNatural Language ProcessingMachine translation and multilinguality
  365. #3826

    Learning to Remove Coupled Rain and Mist from Single Degradation Priors

    Yan Zhang, Yuxin Feng, Zhe Huang, Fan Zhou, Zhuo Su
    Existing image deraining models struggle to handle complex real-world scenarios with rain-mist coupled degradation. The core challenge stems from the high visual similarity and spatial coupling between rain streaks and mist, making it difficult to accurately model their joint degradation patterns. Consequently, both specialized deraining models and all-in-one models for multiple degradations are ineffective in rain-mist coexistence scenarios. To address these challenges, we propose a novel Rain-Mist Removal (RMR) framework. It effectively utilizes single degradation priors from existing deraining and dehazing datasets to model the joint degradation, thereby achieving effective rain streak removal while preserving background structures obscured by mist. To enhance the generalization to real-world scenarios, we leverage text prompts trained in the CLIP perceptual space to drive the generated results toward real samples. Extensive experiments demonstrate that the proposed RMR outperforms state-of-the-art methods in rain-mist coexistence scenarios.
    Computer VisionLow-level Vision
  366. #3831

    VFM-Dynamo: Accurate Non-rigid Motion Identification via Vision Foundation Model for Self-supervised Monocular Depth Estimation

    Qianqian Du, Hui Yin, Xingyu Miao, Zhengyin Liang
    Accurate identification of non-rigid motion is crucial for geometric validity in self-supervised monocular depth estimation (MDE), yet it remains challenging for current methods. Inspired by Human Visual Perception (HVP), we present VFM-Dynamo, an efficient self-supervised MDE framework that disambiguates non-rigid objects by combining vision foundation models (VFMs) such as Grounding DINO and SAM with a coarse-to-fine motion identification strategy: 1) Integrating visual-textual feature dependencies, we use text prompts to activate Grounding DINO's global understanding and detect potential object bounding boxes. 2) We then estimate a coarse motion mask from reprojection error and photometric consistency to separate static from dynamic content. This mask provides coarse guidance for constraining rigid flow in static regions through two analogous estimates aligned by a consistency loss, and it drives a non-dynamical suppression module that removes boxes associated with static objects. 3) The remaining boxes are used to prompt SAM to produce refined motion identification, which in turn guides the joint optimization of depth and optical flow. Additionally, due to foreground–background mixing, standard interpolation-based upsampling often produces boundary artifacts. We introduce a learnable neighbor affinity interpolation (LNAI) module, which directly upsamples depth to full resolution and can be seamlessly integrated as a plug-and-play component. Experiments on a series of benchmarks demonstrate that the proposed framework achieves state-of-the-art performance among unsupervised methods. Code is available at https://github.com/Qianqian3764/VFM-Dynamo.
    Computer Vision3D computer visionComputer VisionImage and video synthesis and generationComputer VisionMotion and trackingComputer VisionVideo analysis and understandingComputer VisionVision, language and reasoning
  367. #3847

    Survival Fully-Collapsed Copula Mixed Membership Blockmodel

    Yanxing Song, Richard Yi Da Xu
    Survival models on networks describe the time-evolving dynamics of complex systems, but existing frameworks suffer from two key limitations: First, many past approaches implicitly assume independent censoring, which can introduce substantial bias. Second, point-process-based models often emphasize only sending behavior and overlook shared latent structure between interacting nodes. Recent work has incorporated copulas into deep survival models to mitigate censoring bias, but these methods do not capture topological role structure. Another line of research combines survival analysis with the Mixed-Membership Stochastic Blockmodel (MMSB) to model heterogeneity via role-specific hazard rates; however, it typically assumes structural independence and therefore misses important correlations between roles. To bridge these gaps, we apply an improved Copula–MMSB mechanism and propose the first Survival Fully-Collapsed Copula Mixed Membership Blockmodel. Our framework jointly models time-to-event dynamics and complex dependencies among latent roles. On the Enron benchmark, it surpasses state-of-the-art survival blockmodels in both predictive likelihood and structural interpretability. Additionally, the model admits an exact closed-form discrete distribution over role pairs, which enables a fully collapsed Gibbs sampler and avoids the slow mixing and poor convergence common in non-conjugate inference. In simulations, this yields a 100× improvement in sampling efficiency over semi-collapsed baselines.
    Machine LearningBayesian learningMachine LearningProbabilistic machine learningMachine LearningSequence and graph learning
  368. #3848

    Dual-View Self-Supervised Pre-Training for Expert Finding

    Mingqiao Zhang, Hongtao Liu, Yinghui Wang, Yumeng Wang, Qiyao Peng
    Expert finding plays a crucial role in Community Question Answering platforms by routing questions to the most suitable answerers. The key challenge lies in learning high-quality representations for questions and experts. Most existing methods rely on limited supervised signals such as expert–question interactions, and typically focus on modeling only one side of the expert–question pair, suffering from data sparsity and incomplete representation. In this paper, we propose a Dual-view Self-Supervised pre-training framework for Expert Finding (SSEF) that simultaneously pre-trains expert and question representations from large-scale unlabeled data. Specifically, on the expert view, we design a self-supervised module with two data-augmentation strategies, namely historical behavior cropping and reordering, and optimize expert representations via contrastive learning over augmented sequences of historically answered questions. On the question view, we apply analogous augmentation strategies to capture intrinsic semantic differences among questions. The two view-specific modules are unified through multi-task learning with shared PLM parameters, enabling the model to capture latent semantic relatedness across views. Extensive experiments on six real-world CQA datasets demonstrate that SSEF consistently outperforms existing methods, and further analysis confirms its effectiveness under zero-shot settings and its transferability to other models.
    Data MiningRecommender systemsNatural Language ProcessingInformation retrieval and text mining
  369. #3864

    InterLight: Leveraging Intrinsic Illumination Priors for Low-Light Image Enhancement

    Ziqi Wang, Xu Zhang, Laibin Chang, Shi Chen, Jiaqi Ma, Huan Zhang
    Low-Light Image Enhancement (LLIE) has long been a challenging problem in low-level vision, as insufficient illumination often leads to low contrast, detail loss, and noise. Recent studies show that deep learning-based Retinex theory can effectively decouple illumination and reflectance. However, existing methods frequently suffer from over-enhancement or color distortion, and often assume uniform noise or ideal lighting. To address these limitations, we propose InterLight, a novel framework that systematically excavates and operationalizes intrinsic illumination priors for LLIE. Our core insight is that robust enhancement requires not just estimating illumination, but constructing an illumination-aware pipeline. We first inject sensor-level illumination-response priors via physics-guided augmentation, then represent the degradation through adaptive prompts conditioned on the scene's latent illumination state. This explicit representation directly guides a luminance-gated intrinsic memory mechanism to selectively compensate for information loss, prioritizing reconstruction in dark regions while preserving fidelity in bright ones. Finally, the entire process is regularized by a self-supervised consistency objective that distills illumination-invariant features. By deeply exploiting intrinsic illumination priors, our method achieves clearer textures and more visually coherent enhancement results. Extensive experiments across multiple benchmarks demonstrate the effectiveness of our approach. Code is available at: https://github.com/House-yuyu/InterLight.
    Computer VisionLow-level Vision
  370. #3874

    MarCon: Max-Margin Contrastive Learning for Imbalanced Domain Adaptation Semantic Segmentation

    Yibo Wang, Ruikang Xu, Guangcheng Zhu, Cheng Peng, Haobo Wang, Runze Wu, Minmin Lin, Changjie Fan
    Unsupervised Domain Adaptation for Semantic Segmentation (UDA-SS) has seen significant progress in recent years. Existing UDA-SS approaches mostly adopt a pseudo-labeling scheme to adapt the model to the target domain, but they often overlook the inherent long-tailed data distribution in segmentation. We find that scarce tail samples can lead to representation collapse for tail classes and further degrade the quality of pseudo-labels. To address these challenges, we propose MarCon, a framework that mitigates the long-tailed problem from both feature representation and pseudo-labeling perspectives. Specifically, to explicitly learn a Maximum-Margin Distribution, we derive a reformulated pixel-level contrastive learning objective by modeling feature distributions with the von Mises-Fisher (vMF) distribution. It enforces strict margins to enhance intra-class compactness and inter-class separability, preventing tail classes from being overwhelmed by head classes. Furthermore, to mitigate label noise, we introduce a Reliability-aware Filter (RaF) based on the vMF-derived metrics, which performs adaptive class-wise pixel reliability assessment to identify unreliable pixels and attenuate their contribution to model training, thereby mitigating confirmation bias. Extensive experiments on GTA → Cityscapes and SYNTHIA → Cityscapes demonstrate that MarCon consistently outperforms current leading methods across various transformer-based architectures.
    Machine LearningWeakly supervised learning
  371. #3877

    FedCIGAR: A Personalized Reconstruction Approach for Federated Graph-Level Anomaly Detection

    Yunfeng Zhao, Yixin Liu, Qingfeng Chen, Shiyuan Li, Yue Tan, Shirui Pan
    Graph-level anomaly detection (GLAD) is crucial for ensuring the reliability of graph-driven applications by identifying abnormal graphs that deviate from the majority. Considering the privacy concerns in distributed scenarios, federated graph-level anomaly detection (FedGLAD) has emerged as a promising solution to enable collaborative detection without sharing raw data. However, existing methods suffer from poor generalization due to the reliance on unrealistic synthetic anomalies and insufficient personalization capabilities under data heterogeneity. To address these challenges, we propose a novel Federated graph-level anomaly detection approach with Cluster-adaptIve GAted Reconstruction (FedCIGAR). Specifically, we design a reconstruction-based paradigm trained on normal graphs to avoid synthetic data. Furthermore, we introduce a client-side node contribution gating mechanism and a server-side sliding window-based clustering strategy to tackle data heterogeneity. Extensive experiments demonstrate that FedCIGAR achieves superior performance and robustness compared to state-of-the-art methods.
    Data MiningAnomaly/outlier detectionData MiningMining graphsMachine LearningFederated learningMachine LearningUnsupervised learning
  372. #3885

    A Fast and Unified Partial Dependence Plot Algorithm

    Ron Wettenstein, Alexander Nadel, Udi Boker
    Partial Dependence Plots (PDPs) visualize how changes in a single feature affect the average model prediction. They are widely used in practice to interpret decision tree ensembles and other machine learning models. Joint-PDPs extend this idea to pairs of features, revealing their combined effect. Partial Dependence Interaction Values (PDIVs) measure feature interactions. The Any-Order-PDIVs task computes these interactions for every feature subset across all rows of the dataset.

    We introduce WOODELF++, a unified and efficient approach for computing all these useful explainability tools on decision tree ensembles, building on WOODELF, an algorithm for efficient SHAP computation. By deriving suitable metrics over pseudo-Boolean functions, WOODELF++ can compute PDPs (exact and approximate), Joint-PDPs, and Any-Order-PDIVs in a unified framework. Our method delivers substantial complexity improvements over the state of the art, including an exponential gain for Any-Order-PDIVs. Additionally, we introduce and efficiently compute Full PDPs, which leverage the model’s split thresholds to faithfully capture its behavior across all possible feature values.

    WOODELF++ is implemented in pure Python and supports GPU acceleration. On a dataset with 400,000 rows, WOODELF++ computes PDP and Joint-PDP up to 6x faster than the state of the art and up to five orders of magnitude faster than scikit-learn. For Any-Order-PDIVs, the gap is even larger: WOODELF++ computes all interaction values in 5 minutes, while the state of the art is estimated to require over 1,000,000 years.
    Machine LearningEnsemble methodsMachine LearningExplainable/Interpretable machine learningMachine LearningGame Theory
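    The definition of a partial dependence curve given in the WOODELF++ abstract can be illustrated with the naive computation that such algorithms aim to speed up. The sketch below is not WOODELF++; it is the brute-force baseline (fix one feature to a grid value in every row, then average the predictions). The names `partial_dependence` and `toy_model` and the sample data are hypothetical and chosen only for illustration.

```python
import numpy as np

def partial_dependence(predict, X, feature, grid):
    """Brute-force PDP: for each grid value, overwrite one feature
    column in every row and average the model's predictions."""
    X = np.asarray(X, dtype=float)
    curve = []
    for value in grid:
        X_mod = X.copy()              # leave the original data untouched
        X_mod[:, feature] = value     # force the feature to the grid value
        curve.append(float(np.mean(predict(X_mod))))
    return np.array(curve)

def toy_model(X):
    # Hypothetical model: a single split on feature 0 plus a linear
    # contribution from feature 1.
    return np.where(X[:, 0] > 0.5, 2.0, 0.0) + X[:, 1]

X = np.array([[0.2, 1.0],
              [0.7, 3.0],
              [0.9, 5.0]])
curve = partial_dependence(toy_model, X, feature=0, grid=[0.0, 1.0])
print(curve)
```

    This baseline needs one full pass of model evaluations over the dataset per grid value, which is the cost that specialized tree-ensemble algorithms try to avoid.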
  373. #3903

    Propagating Unsafe Actions in LLM Controlled Multi-Robot Collaboration via Single Robot Compromise

    Zhen Huang, Zhihuang Liu, Mengxuan Luo, Weishang Wu, Zhiping Cai
    Large language models (LLMs) are increasingly used as general planners in embodied intelligence, enabling high-level coordination and low-level task planning for both single-robot and multi-robot collaboration. This increasing reliance on embodied LLM planners also raises critical security concerns, since misaligned or manipulated instructions can be translated into physical actions. Prior work has studied such threats in single-robot settings, while security risks in LLM-controlled multi-robot collaboration, especially those propagated through inter-robot communication, remain largely unexplored. To bridge this gap, we propose a novel attack paradigm for multi-robot systems in which the adversary interacts with only a single entry robot. The compromised robot then propagates malicious intent through peer communication, leading to coordinated unsafe actions across the system. Our evaluation, covering the high-risk dimensions of dereliction of duty, privacy compromise, and public safety hazards, reveals a persistent safety alignment gap in multi-robot planners. We quantify this process with three metrics: obedience, infectiousness, and stealthiness. Experiments demonstrate both persistent attacker control and rapid propagation: obedience reaches 1.00 in the strongest cases, and infectiousness rises to 0.90. Notably, the attack is highly efficient, requiring as few as 3.0 rounds to compromise all the robots while maintaining a stealthiness score of 0.81. Such risks are amplified when robots must resolve trade-offs in critical situations, such as emergencies or conflicts of rights, because the coordination mechanism can unintentionally allow adversarial instructions to override safety requirements. The code is available at https://github.com/TheFatInsect/InfectBot.
    AI Ethics, Trust, FairnessSafety and robustnessAI Ethics, Trust, FairnessTrustworthy AIRoboticsMulti-robot systems
  374. #3908

    GBD-DRP: Generalizable Bias-Disentangled Learning for Drug Response Prediction with Missing Modalities

    Biyang Zeng, Shikui Tu
    Predicting the response of anticancer drugs for specific cancer cell lines remains an important challenge in cancer treatment, largely due to the diversity among cell lines. Existing machine learning methods usually perform well on cell lines seen during training, but their performance often drops when applied to new cell lines. Many approaches rely heavily on cell-line specific features, which limits their ability to generalize. At the same time, methods that focus mainly on generalization often fail to capture differences in drug response across distinct cellular environments. Some studies improve prediction accuracy by introducing additional and more detailed cell line features. However, these features are difficult to collect and are frequently missing, especially for newly studied cell lines. To address these issues, we propose a generalizable bias-disentangled framework for drug response prediction. Our method separates drug-related patterns that are shared across cell lines from representations that reflect cell-line specific environments. We further isolate real but non-generalizable biases that arise from noise or unobserved factors. This design helps preserve accurate predictions for specific cell lines while improving performance on unseen ones. In addition, the bias-related branch supports counterfactual consistency learning when some modalities are missing, allowing the model to handle incomplete data. Experimental results show that GBD-DRP achieves state-of-the-art performance, and additional tests on unseen cell lines and missing-modality settings indicate good generalization in practice.
    Machine LearningRepresentation learningMachine LearningSupervised LearningMultidisciplinary Topics and ApplicationsBioinformatics
  375. #3940

    Pivot-Centric Trajectory Prediction: Bridging Long Horizons via Dynamical Guidance

    Xiucong Zhao, Jindong Tian, Hao Miao
    Forecasting the precise future motion of surrounding agents is essential for reliable autonomous vehicles. However, as the demand for longer prediction horizons increases, existing endpoint-completion or iterative-refinement methods increasingly struggle with weak guidance and compounding errors. To tackle the long-horizon prediction challenge, we propose Pivot-Centric Trajectory Prediction (PCTP). By introducing "pivots" and focusing on predicting pivot points along extended trajectories, we divide the long-term prediction task into short-term sub-tasks at various scales. Specifically, PCTP decouples long-term trajectory prediction into two processes: pivot prediction and pivot-based trajectory refinement. The pivot prediction process uses global map context and agent-to-agent interactions to identify these "pivot points", while the pivot-based trajectory refinement process focuses on local map details and refines the short-term trajectory based on the predicted "pivot points". Compared with existing methods, PCTP provides more intermediate guidance while reducing compounding errors. Moreover, PCTP is a flexible approach that can be integrated into most state-of-the-art trajectory prediction models. Experimental results show that PCTP improves the prediction accuracy of leading models on both Argoverse I and Argoverse II datasets with minimal impact on model size. Specifically, PCTP combined with QCNet outperforms all published ensemble-free methods on the Argoverse II leaderboard at submission.
    Data MiningMining spatial and/or temporal dataAIMachine Learning
  376. #3958

    Tab-semiSL: Tabular Data-Driven Semi-Supervised Learning to Identify Factors Associated with Immune-Related Adverse Events

    Ruhao Liu, Suixue Wang, Hang Yu, Peng Li, Qingchen Zhang
    Immune Checkpoint Inhibitors (ICIs) have become a major therapeutic strategy in cancer treatment. However, widespread ICI use can cause mild-to-severe immune-related Adverse Events (irAEs). Identifying factors associated with irAEs helps assess the risk of irAE occurrence during ICI treatment. Nevertheless, the lack of sufficient irAE-related clinical tabular data has slowed progress in this field. Thus, we propose Tabular data-driven semi-Supervised Learning (Tab-semiSL), a novel Semi-Supervised Learning (SSL) method specifically designed for tabular data. Tab-semiSL consists of a pre-training phase and an SSL phase. Specifically, in the pre-training phase, we use the collected dataset of 4,817 cancer patients without irAE labels to train an auto-encoder. In the SSL phase, the predictor is fine-tuned using the collected dataset of 237 irAE-labeled cancer patients who were treated with ICIs. We assign class-specific thresholds and use samples as pseudo-labeled data only when their maximum confidence exceeds the threshold, with the thresholds dynamically adjusted during training to improve reliability. We tested Tab-semiSL against three traditional machine learning approaches and five semi-supervised or self-supervised methods using two public datasets plus the collected clinical data, and it outperformed them on most metrics. SHapley Additive exPlanations (SHAP) analysis revealed ten key factors linked to irAEs. Code and Supplementary Material are available at https://github.com/RuhaoLiu/Tab-semiSL
    Multidisciplinary Topics and ApplicationsBioinformaticsMultidisciplinary Topics and ApplicationsHealth and medicineMultidisciplinary Topics and ApplicationsLife sciences
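    The pseudo-label selection rule described in the Tab-semiSL abstract (keep a sample only when its maximum predicted confidence exceeds the threshold of its predicted class) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `select_pseudo_labels`, the probabilities, and the threshold values are hypothetical, and the thresholds are held fixed here because the abstract does not specify how they are adjusted during training.

```python
import numpy as np

def select_pseudo_labels(probs, class_thresholds):
    """Return (row indices, pseudo-labels) for samples whose top
    predicted probability exceeds the threshold of the predicted class."""
    probs = np.asarray(probs, dtype=float)
    thresholds = np.asarray(class_thresholds, dtype=float)
    labels = probs.argmax(axis=1)          # predicted class per sample
    confidence = probs.max(axis=1)         # maximum confidence per sample
    keep = confidence > thresholds[labels] # compare against per-class threshold
    return np.flatnonzero(keep), labels[keep]

# Hypothetical predictions for three unlabeled samples, two classes.
probs = [[0.9, 0.1],
         [0.6, 0.4],
         [0.3, 0.7]]
# Hypothetical thresholds: class 0 is stricter than class 1.
indices, pseudo = select_pseudo_labels(probs, class_thresholds=[0.8, 0.65])
print(indices, pseudo)
```

    The second sample is dropped because its confidence of 0.6 does not exceed the class-0 threshold of 0.8, while the third is kept under the looser class-1 threshold.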
  377. #3962

    Efficient Minimization of Decision-DNNF Circuits via Semantic Hashing and Provenance Tracking

    Armin Biere, Jean-Marie Lagniez, Emmanuel Lonca
    Knowledge Compilation transforms propositional formulas into tractable structures like decision-DNNF to support efficient reasoning. However, these representations often suffer from exponential size, and standard minimization via SAT sweeping is computationally prohibitive for large instances. In this paper, we propose a scalable minimization framework for decision-DNNF that eliminates the need for SAT solvers. We introduce a semantic hashing technique leveraging polynomial-time model counting to rapidly filter redundancies, followed by a polynomial-time verification strategy based on CNF projection. Our experimental evaluation demonstrates that this approach efficiently compresses decision-DNNF circuits while avoiding the bottleneck of NP-hard equivalence checks.
    Knowledge Representation and ReasoningKnowledge compilation
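    One common way to obtain a semantic fingerprint from polynomial-time counting on a decision-DNNF is to evaluate the circuit's weighted model count at a random point in a finite field: equivalent sub-circuits always receive the same value, while non-equivalent ones collide only with small probability. The sketch below illustrates that general idea under stated assumptions; it is not claimed to be the scheme used in the paper. The tuple-based node encoding, the name `semantic_hash`, and the fixed weights are hypothetical (in practice the weights would be drawn at random), and equal hashes only nominate candidates that still require a verification step.

```python
P = (1 << 61) - 1  # prime modulus (2^61 - 1 is a Mersenne prime)

TRUE = ("true",)
FALSE = ("false",)

def semantic_hash(node, weights, memo=None):
    """Weighted model count modulo P, with weight w for a positive
    literal and (1 - w) for its negation, so no smoothing is needed."""
    if memo is None:
        memo = {}
    key = id(node)                 # shared sub-circuits are evaluated once
    if key in memo:
        return memo[key]
    kind = node[0]
    if kind == "true":
        value = 1
    elif kind == "false":
        value = 0
    elif kind == "and":            # decomposable conjunction: multiply
        value = 1
        for child in node[1:]:
            value = value * semantic_hash(child, weights, memo) % P
    elif kind == "dec":            # decision node: if var then high else low
        _, var, high, low = node
        w = weights[var] % P
        value = (w * semantic_hash(high, weights, memo)
                 + (1 - w) * semantic_hash(low, weights, memo)) % P
    else:
        raise ValueError(f"unknown node kind: {kind!r}")
    memo[key] = value
    return value

x1 = ("dec", 1, TRUE, FALSE)
x2 = ("dec", 2, TRUE, FALSE)
and_a = ("dec", 1, x2, FALSE)      # x1 AND x2 as a decision on x1
and_b = ("and", x1, x2)            # x1 AND x2 as a decomposable AND
or_c = ("dec", 1, TRUE, x2)        # x1 OR x2

weights = {1: 3, 2: 5}             # fixed here only for reproducibility
print(semantic_hash(and_a, weights), semantic_hash(and_b, weights))
```

    The two structurally different encodings of the same conjunction hash to the same value, whereas the disjunction hashes to a different one, so only the first pair would be passed on for verification.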
  378. #3976

    MindCopilot: Towards Formalizing and Evaluating Granular Human-LLM Co-Writing

    Youqing Fang, Yinhao Tang, Yanan Sun, Jiangning Liu, Ziyi Wang, Xun Zhao, Bin Liu, Weiming Zhang, Kuikun Liu, Wenwei Zhang, Kai Chen
    Recent writing assistants are increasingly shifting from passive, prompt-driven interaction to proactive, suggestion-based completion, which integrates localized continuations into the writing flow and reduces coordination burden. However, existing evaluations simply focus on output quality, failing to capture how users accept, edit, or repair suggestions in real-time interaction, and thus obscuring the true usability of proactive co-writing systems. To address this gap, we adopt a sequential, behavior-centered view of interactive writing and formalize co-writing as a Human-in-the-Loop Markov Decision Process, modeling writing as an interaction shaped by user acceptance and editing decisions. Based on this formulation, we introduce the Co-Writing Fidelity Suite, an interaction-aware metric suite that captures both user–assistant alignment and cognitive editing effort, including Hierarchical Acceptance Rate and Knowledge-aware Editing Distance. We conduct a large-scale simulation study across 16 writing domains, using 1,688 controlled continuation queries sampled from different writing stages. Our analysis reveals systematic effects of interaction structure on acceptance behavior and editing cost. A follow-up user study with 30 participants confirms that these behavioral patterns align with real user experience. Together, our findings demonstrate that interaction-aware evaluation provides insights beyond output-only metrics and informs the design of more effective proactive writing assistants.
    Agent-based and Multi-agent SystemsCoordination and cooperationHumans and AIHuman-AI collaborationHumans and AIHuman-computer interaction
  379. #3980

    Empowering Self-Balance of Deep Information Bottleneck for Multimodal Clustering

    Youwei Wang, Qiwei Miao, Haichuan Fang, Yangdong Ye, Shizhe Hu
    Multimodal clustering (MMC) focuses on learning consistent representations through fusing discriminative features from each modality in an unsupervised fashion. Recently, information bottleneck-based MMC methods transform the representation learning into a non-redundant multimodal feature puzzle process. However, they depend on manually setting the trade-off parameter $\beta$ to compress task-irrelevant features while preserving task-relevant features, which severely impairs final clustering results. In addition, manually setting $\beta$ overlooks both the information quality balance and the information quantity balance, hindering the learning of compact and meaningful representations for discovering cluster patterns. In this work, we propose a novel $\beta_{qq}$-guided information bottleneck ($\beta_{qq}$ IB) for addressing the aforementioned problems. The core of $\beta_{qq}$ IB is a self-balance mechanism that consists of a $\beta_{quality}$ component and a $\beta_{quantity}$ component. The $\beta_{quality}$ component aims to achieve an information quality balance by modeling the relation between task characteristics and data sample scale. Meanwhile, the $\beta_{quantity}$ component aims to achieve an information quantity balance by modeling the dynamic changes of data scale within the deep variational architecture. Experiments on six benchmark datasets demonstrate the superiority of the $\beta_{qq}$ IB method. To the best of our knowledge, this is the first work to investigate self-balance learning in IB-based multimodal clustering.
    Machine LearningClusteringMachine LearningMulti-modal learningMachine LearningMulti-view learningMachine LearningUnsupervised learning
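    For context, the role of the trade-off parameter mentioned in the abstract above can be seen in the standard information bottleneck objective as it is commonly written for deep variational models. This is the textbook form, not the paper's $\beta_{qq}$ formulation; the symbols are assumptions for illustration, with $X$ the input, $Y$ the task-relevant variable, $Z$ the learned representation, and $\theta$ the encoder parameters.

```latex
\max_{\theta}\; I(Z; Y) \;-\; \beta\, I(Z; X)
```

    A larger $\beta$ penalizes information retained about the input more heavily (stronger compression), while a smaller $\beta$ favors preserving task-relevant information, which is why a manually fixed value can be a poor fit across datasets.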
  380. #4020

    ViSA-Gait: Leveraging Vision Foundation Models for Semantic Anchored Gait Recognition

    Xiangru Li, Deqiang Yin, Yifan Xie, Guojian Li, Zebang Cheng, Fei Ma
    Gait recognition has achieved remarkable success in constrained environments, yet its performance often degrades significantly in cross-domain and cross-vertical-view scenarios. This is primarily due to the fact that domain-specific silhouette geometry causes models to overfit to extrinsic geometric characteristics rather than learning generalizable motion patterns. To address this issue, we present ViSA-Gait, which utilizes Vision Foundation Models (VFMs) as a “Semantic Compass” for stable universal human body knowledge guidance, enabling a lightweight backbone to robustly capture fine-grained gait dynamics. Specifically, we introduce a token-based distillation mechanism where a Spatial Token Learner (STL) and a Temporal Token Learner (TTL) filter dense VFM features into motion-consistent descriptors. These descriptors are then adaptively injected into a lightweight 3D-CNN backbone via a Gated Cross-Attention (GCA) mechanism, functioning as “Semantic Anchors” that regularize the feature space and guide the model to focus on intrinsic body motion. Extensive experiments demonstrate that ViSA-Gait achieves SOTA performance on cross-domain and cross-vertical-view benchmarks while remaining competitive within-domain, offering a new perspective on bridging the gap between geometry-based analysis and universal semantic understanding.
    Computer VisionLow-level VisionComputer VisionMachine learning for visionComputer VisionRecognition (object detection, categorization)Computer VisionRepresentation learning
  381. #4040

    OrienDiffusion: Taming Diffusion Towards Fine-grained Portrait Matting

    Zhengzheng Tian, Wei Ma, Hongbin Zha
    Recently, diffusion models have been applied to reformulate the regressive portrait matting task as a gradual denoising-driven generative process, alleviating the inherent limitations of discriminative models via iterative error correction. However, due to the lack of high-quality ground-truth mattes, existing generative methods for portrait matting still suffer from severe detail degradation, especially in structurally delicate hair strands and pixels with ambiguous colors and context. To address these challenges, we propose OrienDiffusion, a diffusion-based framework tailored for fine-grained portrait matting. Specifically, we construct a Hair Orientation Field to conditionally guide the denoising process, constraining the generation toward authentic structural consistency in hair regions. Furthermore, instead of relying on overly encoded deep features or ambiguous original color cues as existing methods do, we develop a Pixel Color-anchored Local Structure Embedding approach, which models alpha matte transitions as local structural variations, achieving accurate alpha estimation in pixels with ambiguous colors or context. Extensive experiments on multiple portrait matting datasets demonstrate that OrienDiffusion achieves state-of-the-art performance in these challenging matting regions. Code is available at https://github.com/xtz0001/OrienDiffusion.
    Computer VisionImage and video synthesis and generationComputer VisionSegmentation, grouping and shape analysis
  382. #4043

    Uniform Interpolation Closure for Branching Time Temporal Logics

    Renyan Feng, Yisong Wang, Mingsen Deng, Erman Acar
    Computation Tree Logic (CTL) is a fundamental formalism for specifying and reasoning about the behavior of non-terminating systems. Because it lacks the uniform interpolation (UI) property, CTL is usually limited in modular reasoning and system abstraction. This paper studies uniform interpolation in CTL fragments with restricted temporal operators from the perspective of knowledge forgetting, aiming to identify the minimal logical extensions of these fragments that have the UI property (referred to as their uniform interpolation closure). Our results demonstrate that: (1) CTL(X), the fragment of CTL allowing only the “neXt” operator, enjoys the UI property; (2) the UI closure of the fragment CTL(F<, X) is the bisimulation-invariant fragment of quantified CTL in prenex normal form, where F< is the operator “next Future”; (3) CTL(F<) fails to possess Craig interpolation, and every extension of CTL(F<) that possesses the Craig interpolation property is necessarily an extension of CTL(U), which contains only the “Until” operator. These findings provide a precise characterization of the expressive power required to achieve modularity in CTL.
    Knowledge Representation and ReasoningAutomated reasoning and theorem provingKnowledge Representation and ReasoningKnowledge representation languagesKnowledge Representation and ReasoningQualitative, geometric, spatial, and temporal reasoning
  383. #4048

    AgentRetro: Agent-Enhanced Molecular Spectral Domain Generalization Framework for Retrosynthesis Prediction Under Mixed OOD Shifts

    Jiahang Shen, Hongxin Xiang, Yiping Liu
    Retrosynthesis prediction is a cornerstone of drug discovery, enabling the synthesis of novel therapeutic candidates. However, current deep learning models falter when navigating the unexplored chemical space essential for innovation. In these realistic scenarios, models face a challenging mixed out-of-distribution (OOD) shift: they must handle both novel molecular scaffolds (covariate shift) and unseen reaction templates (label shift), leading to severe performance degradation. To address this, we propose AgentRetro, a framework integrating spectral domain generalization with Large Language Model (LLM) agents. It features a Spectral Domain Enhancer (SDE) that partitions chemical space via spectral clustering to capture diverse domain structures, and a Multi-Knowledge Agent (MKA) that enriches data with LLM-generated OOD reactions and textual rationales. These components are unified by a dual training strategy: meta-learning for template adaptability and adversarial learning for structural invariance. Crucially, our analysis reveals that explicitly decoupling structural invariance from mechanistic adaptability is essential for overcoming the interference between conflicting distribution shifts. Experiments show AgentRetro consistently improves state-of-the-art baselines, notably boosting EditRetro’s OOD Top-10 accuracy from 41.7% to 50.1%. The source code is available at https://github.com/jiahangshen/IJCAI26-AgentRetro.
    Agent-based and Multi-agent SystemsApplicationsMachine LearningAdversarial machine learningMachine LearningMeta-learningMachine LearningOpen-World/Open-Set/OOD LearningMultidisciplinary Topics and ApplicationsBioinformatics
  384. #4068

    MLLM-ITM: Multimodal Large Language Model Promotes Inverse Tone Mapping

    Jingchao Peng, Thomas Bashford-Rogers, Haitao Zhao, Kurt Debattista
    High dynamic range (HDR) imaging is crucial for capturing real-world lighting conditions. HDR imaging is traditionally achieved either by fusing multiple exposure frames or via inverse tone mapping from a single SDR image. However, the multi-exposure HDR method is prone to motion-induced artefacts and imposes demanding hardware requirements, limiting its practical applicability. Traditional inverse tone-mapping techniques primarily rely on pixel-wise regression methods, which ignore semantic scene contexts and thus frequently introduce halo artefacts and structural distortions. To address this limitation, this paper proposes MLLM-ITM, a novel inverse tone-mapping framework incorporating multimodal large language models (MLLMs). MLLM-ITM utilizes cross-modal features extracted from a frozen MLLM to simultaneously encode visual features and semantic understanding. These features are integrated into a downstream HDR reconstruction backbone through lightweight adapters, enabling content-aware dynamic-range expansion. Furthermore, the decoupled design between MLLM and HDR backbone avoids costly fine-tuning of MLLMs and remains model-agnostic, allowing effortless substitution with emerging multimodal architectures. Extensive experiments on public benchmarks demonstrate that the proposed MLLM-ITM achieves state-of-the-art performance compared with existing inverse tone-mapping methods, highlighting the effectiveness of cross-modal semantic priors in enhancing HDR imaging performance.
    Computer VisionImage and video synthesis and generationComputer VisionLow-level VisionComputer VisionMachine learning for visionComputer VisionMultimodal learning
  385. #4069

    A Faster Deterministic Algorithm for Kidney Exchange via Representative Set

    Kangyi Tian, Mingyu Xiao
    The Kidney Exchange Problem is a prominent challenge in healthcare and economics, arising in the context of organ transplantation. It has been extensively studied in artificial intelligence and optimization. In a kidney exchange, a set of donor-recipient pairs and altruistic donors are considered, with the goal of identifying a sequence of exchanges—comprising cycles or chains starting from altruistic donors—such that each donor provides a kidney to the compatible recipient in the next donor-recipient pair. These exchanges create a network of transplants aimed at maximizing the total number, t, of successful transplants. Due to constraints in medical resources, limits are often imposed on the lengths of these cycles and chains. Recently, this problem was deterministically solved in O* (14.34ᵗ) time (IJCAI 2024). In this paper, we introduce the representative set technique for the Kidney Exchange Problem, showing that the problem can be deterministically solved in O* (6.855ᵗ) time.
    Constraint Satisfaction and OptimizationConstraint optimization problemsGame Theory and Economic ParadigmsAuctions and market-based systemsGame Theory and Economic ParadigmsComputational social choice
  386. #4114

    Mutual Information Guided Reinforcement Learning for Ambiguous Label Disambiguation

    Jinfu Fan, Jiangnan LI, Xiaohui Zhong, Kangrui Ren, Linqing Huang
    Partial-label learning (PLL) addresses challenging scenarios where each instance is associated with a set of candidate labels, only one of which is the true label. Most existing PLL methods rely on static disambiguation heuristics, which are prone to error propagation when label ambiguity is high. To address this issue, we propose a novel mutual information guided reinforcement learning framework for partial label disambiguation (MR-PLL). In this framework, label disambiguation is formulated as a sequential Markov decision process, where an agent dynamically decides whether to retain, modify, or abstain from correcting ambiguous labels. To ensure reliable decision making, we introduce a mutual information guided gating mechanism that adaptively adjusts the confidence of soft label propagation according to the dependency between feature representations and labels. The abstention mechanism allows the model to postpone uncertain decisions, resulting in more reliable disambiguation. Furthermore, we design an information weighted reward term during the actor-critic process to gradually improve the label disambiguation ability of the policy, and provide theoretical analysis on convergence and bias reduction. Experiments on benchmark and real datasets verify the effectiveness of the proposed algorithm.
    Machine LearningClassificationMachine LearningClusteringMachine LearningMulti-label learningMachine LearningWeakly supervised learning
  387. #4141

    The Sword, Shield, and Achilles’ Heel: Characterizing the Linguistic Inductive Bias of Large Language Models for Spatial Reasoning in Navigation Planning

    Xudong Zhang, Jian Yang, Shengkai Wang, Jiangpeng Tian, Shaowen Chen, Xian Wei, Ke Li, Xiong You
    Large Language Model (LLM)-based navigation systems have commonly constructed explicit spatial representations (e.g., topological graphs, semantic raster maps) and translated them into textual descriptions as LLMs’ inputs. However, the linguistic structures of such text-based spatial representations and the choices of contextual features (e.g., topology, geometry) they contain are often treated as neutral engineering decisions rather than key factors that shape LLMs' behavior. To address this gap, we propose a dual-interventional framework that disentangles linguistic structures from different contextual cues to evaluate the linguistic inductive bias of LLMs for navigation planning. In the framework, representation intervention varies linguistic format and the degree of linguistic compression, clarifying when linguistic representations support or inhibit navigation planning; context intervention, combined with contextual feature combination and conflict probing, explicitly clarifies the preferences and weaknesses of LLMs when processing different contextual cues. Experiments across diverse spatial reasoning tasks and multiple model scales reveal a consistent pattern: topological information is a sturdy shield and the backbone of robust planning; linguistic format is a double-edged sword whose effect depends on model size, task demands, and the compression level; and semantic information is a fatal Achilles' heel, since incorrect semantic cues can systematically derail the planning process. Overall, our study shows that effective text-based spatial representations in LLM-based navigation should preserve topological integrity, calibrate representational compression to model capacity, and ensure semantic correctness, rather than simply adopting a single representation.
    AI Ethics, Trust, FairnessBiasHumans and AICognitive systemsKnowledge Representation and ReasoningQualitative, geometric, spatial, and temporal reasoningNatural Language ProcessingNatural language semanticsRoboticsMotion and path planning
  388. #4142

    DCIL: DropConnect-based Causal Imitation Learning Under Environment Heterogeneity

    Xinyu Zhang, Shihua Li, Rongjie Liu
    In many real-world scenarios, agents operate across heterogeneous environments where the underlying dynamics and data-generating processes vary. Standard reinforcement learning and imitation learning methods often fail in such settings as they typically assume stationarity and learn policies that overfit to environment-specific correlations. A key challenge is the presence of spurious correlations as observed states often contain both causal and non-causal features, with the latter introducing environment-specific biases that undermine generalization. To address this problem, we propose DropConnect-based Causal Imitation Learning (DCIL), a novel offline imitation learning framework designed to identify and exploit stable causal mechanisms across diverse environments. DCIL introduces a gradient alignment constraint that encourages the policy to align with causal structures shared across training environments. To further mitigate overfitting to spurious correlations, DCIL involves DropConnect-based regularization, injecting stochastic perturbations into network weights to simulate parameter uncertainty and reduce reliance on unstable features. We evaluate DCIL on synthetic benchmarks derived from OpenAI Gym control tasks, where non-causal features exhibiting spurious correlations are explicitly injected to simulate environmental heterogeneity. Experimental results show that DCIL outperforms state-of-the-art imitation learning baselines, achieving superior generalization to unseen environments. These findings highlight the importance of incorporating causal reasoning and structured regularization into policy learning for robust performance under environment shift.
    Machine LearningCausalityMachine LearningOffline reinforcement learning
  389. #4185

    MVRNet: A Multi-View Refinement Network for Accurate Recognition of Challenging Intracranial Aneurysms in Enhanced 3D CTA Images

    Peiying Li, Yongchang Liu, Shikui Tu, Lei Xu
    Intracranial aneurysms are life-threatening and require accurate, timely detection. Traditional manual diagnosis by radiologists can be subjective, leading to misdiagnoses, while existing deep learning approaches struggle with small aneurysms or cases complicated by surrounding tissues. In this paper, we present the Multi-View Refinement Network (MVRNet), a framework that delivers clinically meaningful performance gains on the most challenging intracranial aneurysm cases. It directly addresses real-world diagnostic difficulties, particularly for small, obscured, or ambiguously bounded lesions that conventional methods often miss. In particular, we propose a feature enhancement technique to obtain boundary-clear CTA, an informative supplement to the bone-free CTA, to further reduce the impact of noise and enhance the robustness of IA recognition. Moreover, we develop a multi-view encoder with 2D slicing in different directions to mitigate tissue occlusion effects, coupled with a progressively refined decoder that iteratively corrects uncertain predictions, ensuring precise localization and segmentation. Experimental results indicate that MVRNet improves the F1-score by 24% and the IoU by 16% in comparison with the state-of-the-art methods. Remarkably, previous methods struggled with challenging cases, while our method still achieves satisfactory results, demonstrating a >2× improvement over prior art on challenging test sets. Our code has been released at https://github.com/TheResearchWorks/MVRNet.
    Multidisciplinary Topics and ApplicationsBioinformaticsMultidisciplinary Topics and ApplicationsHealth and medicine
  390. #4208

    Info-Driven Zero-Cost Proxy: Rethinking Vision Transformer Architecture Evaluation via Information Quantification

    Yue Yang, Jiacheng Wang, Zhenkai Yang, Menglan Hu, Gaoyang Liu, Bo Xu, Tianyue Zheng, Kai Peng
    Neural Architecture Search (NAS) automates the design of Vision Transformer (ViT) architectures. However, the high computational cost of training-based methods has made training-free, zero-cost proxies a key research direction. While existing proxies can estimate model potential, they fail to capture critical information transmission characteristics due to the black-box nature of neural networks. Consequently, they suffer from high computational latency and weak correlation with ground-truth performance. Although information transmission determines performance, it has not been effectively quantified or utilized. This limitation remains the core bottleneck for current zero-cost proxies. In this paper, we propose Info-NAS, a zero-cost proxy based on architectural information. This method achieves the first structured quantification of information transmission in ViT. Specifically, it evaluates architectures using three core components: global information volume, local information gradient, and global consistency. Info-NAS derives proxy scores solely from architectural parameters, requiring no forward or backward propagation. Consequently, the calculation and search processes are highly efficient. Moreover, we construct the ViT-Info-Bench dataset to facilitate correlation analysis and algorithm evaluation. Experimental results demonstrate that Info-NAS significantly reduces search overhead while achieving superior ranking accuracy. With a single evaluation requiring only 1.8 ms, Info-NAS outperforms existing methods in both efficiency and performance.
    Machine LearningAutomated machine learning
  391. #4218

    DeepSTE: Deep Spectral Temporal Embeddings for Dynamic Graph Representation Learning

    Qiang Huang, Ke Liu, Renjie Gong, Sijing Zhang, Hao Wang, Shanshan Feng, Xiao Yan, Jiawei Jiang
    Temporal embeddings play a crucial role in dynamic graph neural networks (DGNNs) by capturing the temporal dynamics of interactions. However, existing Random Fourier Feature (RFF)-based methods in DGNNs directly sample Fourier frequencies from a fixed, data-independent distribution $p(\omega)$, neglecting the temporal characteristics of dynamic graphs and thereby limiting representational capacity. We propose DeepSTE, a deep spectral temporal embedding framework for dynamic graphs. DeepSTE learns RFF representations via Monte Carlo importance sampling with a tractable proposal distribution $q(\omega)$ (e.g., Gaussian) to approximate the feature map of a shift-invariant or positive-definite kernel whose latent spectral density is analytically intractable. DeepSTE adopts a data-dependent scale parameter $\eta$, estimated from interaction intervals, to construct the frequency proposal distribution $q(\omega)$ reflecting time–frequency uncertainty. The frequency DNN $f_{\boldsymbol{\omega}}$ and the importance-weighting DNN $g_{\mathbf{w}}$, initialized from $q(\omega)$, are jointly optimized to model the importance-sampled spectral representation and learn adaptive temporal features. Experiments indicate the effectiveness of DeepSTE with average improvements of 2.32% and 1.32% on the tasks of dynamic link prediction and node classification, respectively, and reveal insights such as temporal embedding decay and accelerated convergence.
    Data MiningMining graphsData MiningMining spatial and/or temporal data
  392. #4257

    Syntactic Structure-Guided Visual Grounding with Subject-Centric Feature Enhancement and Verification

    Jiepeng Cai, Zhen Xu, Tiesong Zhao, Hau-San Wong, Si Wu
    Visual grounding aims to localize target objects based on natural language descriptions, and the core challenge lies in the cross-modal gap, which is partly caused by the significant differences in semantic structure between language and vision. Existing methods typically rely on holistic sentence-level semantic representations to modulate visual features, while overlooking the inherent structure of textual prompts. In this work, we propose a Syntactic Structure-guided Visual Grounding framework, referred to as SSVG. Specifically, to inject syntactic priors into unstructured visual representations, we design a semantic structure-based feature refinement module to adaptively modulate subject-centric and contextual visual features. To perform cross-modal alignment, we further incorporate a visual semantic consistency verification module, which leverages a subject-aware contrastive learning strategy to constrain and verify the semantic correspondence between the visual prediction and subject-level textual representation, thereby enhancing the model's robustness against semantically similar distractors. Extensive experiments demonstrate that our approach consistently outperforms state-of-the-art methods, and detailed analyses verify the effectiveness of each component.
    Computer VisionMultimodal learningComputer VisionRecognition (object detection, categorization)Computer VisionSegmentation, grouping and shape analysis
  393. #4266

    Dynamic Multi-Path Retrieval for Knowledge-based Visual Question Answering

    Zeyu Song, Yimin Deng, Yuxin Zhang, Guoshuai Zhao, Chengxu Liu, Jialie Shen, Xueming Qian
    Knowledge-based Visual Question Answering (KB-VQA) requires models to answer visual questions by reasoning over external knowledge beyond the given image. Existing approaches suffer from two main limitations. First, candidate knowledge is often retrieved in a single modality, either textual or visual, which prevents effective use of heterogeneous and complementary evidence. Second, current approaches typically employ fixed fusion weights, ignoring the varying importance of modalities for different queries. Consequently, irrelevant evidence is introduced while critical knowledge may be overlooked. To address these issues, we propose Dynamic Multi-Path Retrieval for KB-VQA (DMRAG). Our framework retrieves candidates through multiple retrieval paths that capture complementary visual and semantic cues. It then performs Question-Adaptive Gated Fusion (QGF) to balance contributions from different modalities according to the query’s information need. The fused candidates are further refined via multimodal rearrangement to support accurate answer generation. Experiments on the E-VQA and InfoSeek datasets show that DMRAG improves both retrieval recall and answer accuracy over prior methods, demonstrating its effectiveness for KB-VQA. Our code is available at https://github.com/qwqq335/DMRAG.
    Computer VisionImage and video retrievalComputer VisionMultimodal learningComputer VisionVision, language and reasoning
  394. #4269

    SplitScaling: Adaptive Scaling for Disaggregated LLM Serving Against Traffic Bursts via DRL

    Wei Xiao, Xuefeng Huang, Weijia Shi, Baokang Zhao
    The disaggregated Prefill-Decode (PD) architecture has emerged as a prominent paradigm for efficient Large Language Model inference serving. However, resource management remains a critical challenge, particularly under the dual burstiness of real-world scenarios, characterized by volatile fluctuations in both request arrival rates and Prompt-to-Response ratios. Existing rule-based heuristics often fail to accurately identify system bottlenecks, leading to severe resource misallocation and Service Level Objective (SLO) violations. To address this, we propose a Deep Reinforcement Learning-based auto-scaling framework tailored for the PD architecture. By modeling the resource allocation problem as a Markov Decision Process, our framework enables the agent to capture non-linear load dynamics, thereby achieving decoupled and precise scaling for prefill and decode pools. Furthermore, to mitigate Head-of-Line blocking caused by scaling latency, we design an immediate rescheduling mechanism that migrates queued tasks to newly ready nodes in real time. Experimental results driven by Azure bursty load traces demonstrate that our framework significantly reduces computational costs by 25.2% and 28.3% compared to the Static configuration and HeteroScale, respectively, while strictly adhering to SLOs.
    Machine LearningReinforcement learningPlanning and SchedulingLearning in planning and scheduling
  395. #4284

    Fairly Dividing Non-identical Random Items: Just Sample or Match

    Aprup Kale, Navya Garg, Rucha Kulkarni
    We study the question of existence and fast computation of fair and efficient allocations of indivisible resources among agents with additive valuations. As such allocations may not exist for arbitrary instances, we ask if they exist for typical or random instances, meaning when the utility values of agents for the resources are drawn from certain distributions. In this paper, we extend the previously studied formal models of this problem to non-identical items. We assume that every item is associated with a distribution U_j, and every agent's utility value for the item is drawn independently from U_j. We show that envy-free fair and maximum social welfare efficient allocations exist with high probability in the asymptotic setting, meaning when the number of agents n and items m are large. Further, we show that when m = Ω(n log n), then by only sampling O(log m) or O((log m)^2) utility values per item instead of all the n, we can compute these allocations in Õ(m) time. Finally, we simulate our algorithms on randomly generated instances and show that even for small instances, we suffer small multiplicative losses in the fairness and efficiency guarantees and converge to fully optimal guarantees quickly.
    Game Theory and Economic ParadigmsFair divisionAIGame Theory and Economic ParadigmsMachine LearningGame TheoryMultidisciplinary Topics and ApplicationsEconomics
  396. #4290

    Phi-Actor-Critic: Steering General-Sum Games to Pareto-Efficient Correlated Equilibria

    Wongyu Lee, Francesco Lelli, Omran Ayoub, Massimo Tornatore
    Real-world multi-agent systems, from traffic coordination to resource allocation, are often modeled as general-sum games where individual incentives conflict with collective welfare. In these settings, the central challenge is not merely finding an equilibrium, but selecting socially desirable outcomes among many suboptimal Nash equilibria.
    Standard deep multi-agent reinforcement learning (MARL) methods struggle with this problem, as value-decomposition approaches are constrained by monotonicity assumptions and policy-gradient methods often converge to stable but socially inefficient equilibria.
    To address this limitation, we propose Phi-Actor-Critic (Phi-AC), a framework that leverages swap regret minimization to steer learning toward high-welfare correlated equilibria (CE). To make counterfactual regret estimation tractable in deep MARL, Phi-AC employs a centralized attention critic that predicts vector-valued regrets in a single forward pass, avoiding computationally expensive counterfactual simulations. We further introduce a Lagrangian-based equilibrium selection mechanism that optimizes social welfare while enforcing stability through regret constraints.
    Experiments on matrix games, Multi-Agent Particle Environments (MPE), and the Melting Pot Harvest scenario demonstrate that Phi-AC learns efficient and stable coordination strategies across diverse mixed-motive settings while maintaining high collective return and competitive fairness.
    Agent-based and Multi-agent SystemsAgent communicationAgent-based and Multi-agent SystemsAgent theories and modelsGame Theory and Economic ParadigmsComputational social choiceGame Theory and Economic ParadigmsMechanism design
  397. #4298

    MemoVAD: Resource-Efficient Video Anomaly Detection via Dynamic Semantic Memory in Edge Computing Scenarios

    Guo Li, Jiandian Zeng, Yang Li, Zihao Peng, Ke Chen, Tian Wang
    Deploying Video Anomaly Detection (VAD) in real-world surveillance faces a fundamental tension between the demand for high-level semantics to ensure effectiveness and the limited computational resources of edge devices. Vision–Language Models (VLMs) provide rich open-vocabulary semantics, but their latency and computational cost preclude on-device deployment. To address this challenge, we propose MemoVAD, an edge–cloud collaborative framework that selectively incorporates VLM semantics into streaming VAD. MemoVAD runs most inference on the edge with a lightweight detector and a causal Temporal Context Encoder (TCE) to model temporal dependencies. Specifically, we introduce an Uncertainty-Aware Gating (UAG) policy grounded in Subjective Logic to model perceived uncertainty and query the cloud-based VLM only for high-uncertainty and semantically novel clips. In addition, a Dynamic Semantic Memory (DSM) is designed to cache VLM-verified prototypes for efficient retrieval, enabling the edge model to progressively incorporate VLM-level semantics via a semantic adapter. Experiments on the UCF-Crime and XD-Violence datasets on a real edge device show that MemoVAD substantially reduces communication overhead while surpassing state-of-the-art performance. The demo video is available at: https://memovad2026.github.io/.
    Multidisciplinary Topics and ApplicationsReal-time systemsMultidisciplinary Topics and ApplicationsSensor networks and smart citiesMultidisciplinary Topics and ApplicationsUbiquitous computing systems
  398. #4299

    RegionCache: Semantic-Aware Region Reuse for Efficient Multi-Turn Image Generation

    Peizheng Li, Xin Ai, Hanyuan Liu, Qiange Wang, Yanfeng Zhang
    Real-world image generation generally requires multi-turn editing, where users iteratively refine a small region while the majority of the image remains stable across turns. Despite this strong region-level stability, existing diffusion transformer (DiT)–based editing pipelines recompute the entire image at each turn, incurring substantial redundant computation on unchanged regions. Moreover, caching methods in existing DiT acceleration frameworks ignore semantic correspondence across prompts, leading to either unnecessary recomputation or unsafe reuse that degrades editing quality. To this end, we propose RegionCache, a semantic-aware reuse framework for multi-turn image editing that selectively reuses diffusion states from unchanged regions. RegionCache identifies reusable regions by detecting semantic overlap between consecutive prompts and localizing their spatial support via cross-attention maps, and employs an adaptive reuse schedule that dynamically determines how long cached regions can be safely reused based on prompt semantic similarity and contextual consistency. Experiments on PixArt-alpha show that RegionCache achieves 1.43x–2.55x end-to-end inference speedup while maintaining comparable image quality.
    Computer VisionImage and video synthesis and generation
  399. #4303

    QiMeng-VPID: Verification-Grounded Port-Level Iterative Decomposition for Complex Verilog Generation

    Hongguang Wang, Jiaming Guo, Rui Zhang, Zerun Li, Di Huang, Pengwei Jin, Zidong Du, Xing Hu, Qi Guo, Yunji Chen
    While Large Language Models (LLMs) have shown promise in translating natural-language specifications to Register-Transfer Level (RTL) designs, they often fail on complex, port-rich IPs. Existing frameworks typically separate generation from debugging, relying on static decomposition and iterative repair, which is hard to verify and yields unstable, inefficient maintenance. In this work, we propose VPID, a multi-agent framework for generating complex Verilog that achieves monotonic functional improvement. Given RTL’s inherent concurrency and the alignment between verification and port-level behavior, our key insight is to treat ports as verifiable boundaries for dynamic decomposition, enabling fine-grained analysis to pinpoint root causes and guide targeted debugging. Specifically, we implement a behavior locking mechanism that preserves the behavior of verified ports to ensure monotonicity, preventing regressions in correct functionalities. To accelerate convergence, we introduce an experience-guided refinement strategy that distills historical waveform mismatches into constraints, guiding the targeted debugging for the unverified ports. Together with precise Abstract Syntax Tree (AST)-based code extraction, these mechanisms enable an efficient incremental generation workflow. Experiments on RealBench demonstrate that VPID outperforms both one-pass general-purpose LLMs and existing agent-based frameworks in both syntax and functional correctness, presenting a robust approach for automating Verilog generation for complex RTL designs.
    Agent-based and Multi-agent SystemsApplicationsNatural Language ProcessingApplications
  400. #4306

    Beyond the Common Ranking Property: Stable and Efficient Outcomes in Hedonic Games with Top-Coalition Properties

    Bugra Caskurlu, Ali Eser
    Hedonic games are a central model of coalition formation, yet most of their general subclasses are marked by negative results: stable outcomes often fail to exist, and deciding their existence is typically computationally hard. The most notable exception is the class of hedonic games with the common ranking property (HGCRP), a nontrivial subclass that guarantees both the existence and tractability of stable partitions. The common ranking property implies the top-coalition property, which in turn implies the weak top-coalition property, forming a natural hierarchy of increasingly general domains. In this paper, we extend the frontier of strong existence and tractability results beyond the HGCRP to hedonic games with the top-coalition property (HGTCP) and hedonic games with the weak top-coalition property (HGWTCP). We show that every HGWTCP instance admits a strong individually stable (SIS) partition that can be computed in polynomial time, and prove that our polynomial-time algorithm generates a partition that is both SIS and Pareto optimal (PO) if preferences are strict. These results show that strong stability and tractability results persist well beyond the common ranking framework. We further show that a contractually Nash stable (CNS) partition may fail to exist even in HGTCP, revealing a sharp contrast with HGCRP, where the existence of partitions that are both CNS and PO is guaranteed. Taken together, these results provide a comprehensive characterization of stability and efficiency in top-coalition-based hedonic games, mapping how stability guarantees evolve from HGCRP to HGWTCP.
    Agent-based and Multi-agent SystemsCoordination and cooperationGame Theory and Economic ParadigmsComputational social choiceGame Theory and Economic ParadigmsCooperative games
  401. #4335

    FedAMB: Adaptive Modality Balancing for Dominance-Robust Multimodal Federated Distillation

    Seungjin Han, Juyeob Lee, Sangmin Lee, Eunil Park
    Federated knowledge distillation (Fed-KD) exchanges distilled predictions on a shared proxy dataset, often reducing communication and accommodating heterogeneous client architectures. However, in multimodal federated learning (MFL), modality dominance can bias local optimization and contaminate the aggregated global teacher, degrading both multimodal accuracy and robustness to missing modalities, especially under non-IID heterogeneity. We propose Federated Adaptive-Modality Balancing (FedAMB). At the client side, Selective-Modality Regulation (SMR) models dominance as a state-dependent phenomenon and intervenes only when it destabilizes training, strengthening weak modalities without over-regularization. At the server side, Component-wise Modality Distillation (CMD) regulates how aggregated knowledge is transferred to each modality branch, preventing the propagation of fusion-biased teachers while directly improving unimodal representations. Experiments on CREMA-D, AVE, and UR-FUNNY show that FedAMB consistently improves multimodal accuracy and missing-modality robustness.
    Machine LearningClassificationMachine LearningFederated learningMachine LearningMulti-modal learning
  402. #4338

    Diversity of Extensions in Abstract Argumentation

    Johannes K. Fichte, Markus Hecher, Yasir Mahmood, Zhengjun Wang
    Argumentation is an important topic of AI for modeling and reasoning about arguments. In abstract argumentation, we consider directed graphs, so-called argumentation frameworks (AF), that express conflicts between arguments. The semantics is defined by the notion of extensions, which are sets of arguments that satisfy particular relationship conditions in the AF. Usually, standard reasoning in argumentation does not reveal how far apart extensions are.
    We introduce a quantitative notion of diversity of extensions based on the symmetric difference and provide a systematic complexity classification. Intuitively, diversity captures whether extensions of a framework (accepted viewpoints) differ only marginally or represent fundamentally incompatible sets of arguments. We study whether an AF admits k-diverse extensions, whether it admits k-diverse extensions covering specific arguments, and how to compute the largest k for which an AF admits k-diverse extensions. We outline a prototype and provide an evaluation for computing diversity levels.
    Knowledge Representation and ReasoningArgumentationKnowledge Representation and ReasoningComputational complexity of reasoning
  403. #4348

    FLASH: Flexible Learning of Adaptive Sampling from History in Temporal Graph Neural Networks

    Or Feldman, Krishna Sri Ipsit Mantri, Carola-Bibiane Schönlieb, Chaim Baskin, Moshe Eliasof
    Aggregating temporal signals from historic interactions is a key step in future link prediction on dynamic graphs. However, incorporating long histories is resource-intensive. Hence, temporal graph neural networks (TGNNs) often rely on historical neighbor sampling heuristics such as uniform sampling or recent-neighbor selection. These heuristics are static and fail to adapt to the underlying graph structure. We introduce FLASH, a learnable and graph-adaptive neighborhood selection mechanism that generalizes existing heuristics. FLASH integrates seamlessly into TGNNs and is trained end-to-end using a self-supervised ranking loss. We provide theoretical evidence that commonly used heuristics hinder TGNN performance, motivating our design. Extensive experiments across multiple benchmarks demonstrate consistent and significant performance improvements for TGNNs equipped with FLASH.
    Machine LearningDeep learning architecturesMachine LearningGeometric learning
  404. #4372

    Energy-Regularized Spatial Masking: A Novel Approach to Enhancing Robustness and Interpretability in Vision Models

    Tom Devynck, Djamel Bouchaffra, Nadjib Lazaar, Mustapha Lebbah, Bilal Faye, Hanane Azzag
    Deep convolutional neural networks achieve remarkable performance by exhaustively processing dense spatial feature maps, yet this brute-force strategy introduces significant computational redundancy and encourages reliance on spurious background correlations. As a result, modern vision models remain brittle and difficult to interpret. We propose Energy-Regularized Spatial Masking (ERSM), a novel framework that reformulates feature selection as a differentiable energy minimization problem. By embedding a lightweight Energy-Mask Layer inside standard convolutional backbones, each visual token is assigned a scalar energy composed of two competing forces: an intrinsic Unary importance cost and a Pairwise spatial coherence penalty. Unlike prior pruning methods that enforce rigid sparsity budgets or rely on heuristic importance scores, ERSM allows the network to autonomously discover an optimal information-density equilibrium tailored to each input. We validate ERSM on convolutional architectures and demonstrate that it produces emergent sparsity, improved robustness to structured occlusion, and highly interpretable spatial masks, while preserving classification accuracy. Furthermore, we show that the learned energy ranking significantly outperforms magnitude-based pruning in deletion-based robustness tests, revealing ERSM as an intrinsic denoising mechanism that isolates semantic object regions without pixel-level supervision. Code is available at https://github.com/Tom-Dvk/ERSM.
    AI Ethics, Trust, FairnessExplainability and interpretabilityComputer VisionRecognition (object detection, categorization)Machine LearningAttention modelsMachine LearningConvolutional networksMachine LearningFeature extraction, selection and dimensionality reduction
  405. #4379

    Provably Sub-Linear Two-Timescale NeuroEvolution with Online Plasticity

    Shishen Lin, Yixin Chen
    NeuroEvolution of Augmenting Topologies is a widely used NeuroEvolution algorithm for learning neural network architectures and weights for control tasks. However, standard offline optimisation searches for connection strengths directly, which can scale poorly in high-dimensional weight spaces and more difficult continuous control problems. Hybrid methods that combine neuroevolution with online learning can address this challenge, but their theoretical properties remain underexplored.

    This paper gives a first regret analysis for a general NeuroEvolutionary Online Learning (NEOL) framework, which decouples learning into two timescales: an outer loop for architecture search and an inner loop for online weight adaptation via reward-modulated plasticity. Under mild conditions, we prove that NEOL achieves sublinear regret. Empirically, under fixed interaction budgets on four standard control benchmarks, a NEAT-based NEOL implementation achieves higher final fitness and lower variance than pure NEAT, and is competitive with strong reinforcement-learning baselines on several tasks. The results are supported by Wilcoxon rank-sum tests and ablation studies. Overall, the findings show that online plasticity can improve the sample efficiency and robustness of two-timescale neuroevolution. Code is available at https://github.com/boobaa2001/NeuroEvolution Online Learning NEOL.
    SearchEvolutionary computationSearchSearch and machine learningSearchMeta-reasoning and meta-heuristicsSearchLocal searchSearchHeuristic search
  406. #4393

    BEVFormer++: Temporal Amplified BEVformer with Explicit Parameter Prediction for Automatic Trajectory Prediction

    Jiabin Fang, Xu Zhang, Zhuoming Ding, Xuan Liu, Meifang Zhang, Jin Yuan, Yuyi Wang
    Vision-based trajectory prediction with BEV representations has achieved promising results, yet existing methods often suffer from limited temporal modeling and insufficient characterization of motion dynamics. To address these issues, we propose a temporally enhanced framework with explicit motion parameter prediction. Specifically, we introduce BEVFormer++, which leverages multi-view images and BEV features from multiple preceding timesteps to generate more robust BEV representations, along with BEV differential features to capture temporal variations. Moreover, we propose a motion-parameter-decoupled tracking module that explicitly estimates velocity, acceleration, and heading angle, providing informative motion cues for trajectory prediction. Extensive experimental results demonstrate that our method outperforms state-of-the-art approaches and can be seamlessly integrated into existing vision-based frameworks, consistently yielding performance improvements.
    Computer VisionAction and behavior recognitionComputer VisionMotion and trackingComputer VisionMultimodal learning
  407. #4403

    Mask-Guided Hybrid Triggers for Robust Clean-Label Backdoor Attacks

    Shengye Pang, Xiangyu Ji, Jungang Yang, Song Yang, Guobing Zou
    Clean-label backdoor attacks pose significant security threats to deep neural networks by injecting triggers without altering ground-truth labels. However, existing methods face a fundamental dilemma: sample-agnostic triggers are robust but easily detectable, while sample-specific triggers offer superior stealthiness but suffer from limited effectiveness due to feature suppression. To bridge this gap, we propose a new backdoor trigger framework called Mask-Guided Hybrid Trigger (MGHT). MGHT uses an adaptive mask to allocate spatial regions of the hybrid trigger between a sample-agnostic anchor for reliable memorization and a sample-specific camouflage for perceptual and semantic consistency. To prevent the optimization from greedily relying on a single trigger component, we further propose a Synergy-driven Co-optimization Strategy with a margin-based Synergy Loss. This ensures that the hybrid trigger is more effective and robust than either component alone. Extensive experiments on benchmark datasets demonstrate that MGHT achieves competitive performance, attaining over 99% ASR on CIFAR-10 and CelebA and showing strong effectiveness on higher-resolution benchmarks, while maintaining high visual quality (PSNR > 30 dB) and robustness to mainstream backdoor defenses.
    AI Ethics, Trust, FairnesSafety and robustnessAI Ethics, Trust, FairnesTrustworthy AI
  408. #4414

    Sufficient Decision Proxies for Decision-Focused Learning

    Noah Schutte, Grigorii Veviurko, Krzysztof Postek, Neil Yorke-Smith
    When solving optimization problems under uncertainty with contextual data, utilizing machine learning to predict the uncertain parameters' values is a popular and effective approach. Decision-focused learning (DFL) aims at learning a predictive model such that decision quality, instead of prediction accuracy, is maximized. Common practice is to predict a single scenario representing the uncertain parameters, implicitly assuming that there exists a deterministic problem approximation (proxy) that allows for optimal decision-making. The opposite has also been considered, where the underlying distribution is estimated with a parameterized distribution. However, little is known about when either choice is valid. This paper investigates for the first time problem properties that justify using a certain decision proxy. Building on these properties, we present alternative decision proxies for DFL, with little or no compromise on the complexity of the learning task. We show the effectiveness of the presented approaches in experiments on continuous and discrete problems, as well as problems with uncertainty in the objective function and in the constraints.
    Constraint Satisfaction and OptimizationConstraint optimization problemsConstraint Satisfaction and OptimizationMixed discrete and continuous optimizationMachine LearningCost-sensitive learningMachine LearningRegressionMachine LearningSupervised Learning
  409. #4426

    Learning Quantitative Automata Modulo Theories

    Eric Hsiung, Nathan Tsoi, Swarat Chaudhuri, Joydeep Biswas
    We introduce QUINTIC, a general algorithm for actively learning quantitative automata from preferences. Quantitative automata evaluate input sequences by applying a valuation function (such as sum, product, or average) to the output labels of the states visited. Such models naturally arise in use cases ranging from probabilistic verification and sequence classification to sequential decision making. However, existing learning approaches, such as variants of L*, weighted‐automata learning algorithms, and active learning preference‐driven methods, either assume finite output alphabets or restrict the valuation function to particular forms. QUINTIC utilizes a symbolic observation table and applies deductive reasoning with the assistance of an SMT solver to identify the correct minimal state, transition, and state label combination of the quantitative automaton. The deductive reasoning relies on the minimal combination of theories determined by the valuation function and output alphabet. Consequently, QUINTIC has completeness, minimalism, and query complexity guarantees, and learns quantitative automata across finite, integer, and rational outputs. Our extensive experiments show how QUINTIC scales under weak or strong feedback, and alternative MaxSMT objectives.
    Constraint Satisfaction and OptimizationConstraint satisfactionKnowledge Representation and ReasoningPreference modelling and preference-based reasoningMachine LearningActive learningMachine LearningSymbolic methods
  410. #4428

    Stability and Efficiency in Hedonic Project Games

    Jaber Valizadeh, Dongmo Zhang, Omar Mubin
    We introduce Hedonic Project Games, a model in which agents choose projects with divisible rewards while holding subjective preferences over coalition composition. This framework captures a fundamental trade-off absent from existing models: agents care simultaneously about who they collaborate with and what they work on. We study three stability notions: classical Nash Stability and two refinements, Joining Stability and Leaving Stability, which account for the welfare of both the deviating agent and the affected coalition members. We evaluate the efficiency of stable outcomes using the Price of Anarchy and Price of Stability, comparing the social welfare of stable outcomes to that of an optimal allocation. While stable outcomes may not exist in general, we identify broad and natural preference classes in which stability and efficiency improve significantly. In particular, under monotonic-decreasing preferences in coalition size, Nash and joining stability coincide and are guaranteed to exist, whereas leaving stability may fail. Under per-capita non-decreasing preferences, socially optimal outcomes are always Nash stable and coincide with leaving stability, although equilibrium inefficiency remains unbounded. Experiments on synthetic and real-world data support the theoretical efficiency results.
    Agent-based and Multi-agent SystemsAgent theories and modelsAgent-based and Multi-agent SystemsCoordination and cooperationAgent-based and Multi-agent SystemsResource allocationGame Theory and Economic ParadigmsCooperative gamesGame Theory and Economic ParadigmsNoncooperative games
  411. #4430

    Explaining Jailbreaks: Structured and Interpretable Safety Assessment for Large Language Models

    Sunghee Dong, Sungwon Yi, Kangmin Bae, Jaeyoon Kim
    Large Language Models (LLMs) remain highly vulnerable to jailbreak attacks, yet existing evaluations rely primarily on outcome-level metrics such as Attack Success Rate (ASR), providing limited insight into how and why safety failures occur. We propose an explanation-aware safety framework that augments binary harmfulness detection with structured, human-interpretable explanations capturing severity, strategies, trigger spans, rationales, and derived safety factors. To enable scalable and consistent supervision, we introduce a human-LLM hybrid annotation and canonicalization pipeline. We then fine-tune a compact model to generate canonical explanations alongside harmfulness decisions. Across both seen and unseen benchmark settings, our method improves robustness and explanation fidelity. In jailbreak defense evaluation, our approach reduces ASR to 0.44% on Vicuna-7B and 1.30% on GPT-3.5, outperforming existing defense baselines while also achieving the lowest StrongREJECT scores. Beyond outcome-level gains, the model more accurately recovers diagnostic attributes (e.g., attack strategy, trigger spans, and safety factors) than strong general-purpose LLM baselines. Overall, explanation-aware learning exposes diagnostic dimensions that ASR alone cannot capture and provides a more faithful and actionable foundation for robust LLM safety assessment.
    AI Ethics, Trust, FairnesExplainability and interpretabilityAI Ethics, Trust, FairnesSafety and robustnessAI Ethics, Trust, FairnesTrustworthy AINatural Language ProcessingLanguage models
  412. #4439

    Object-Centric Alignment and Anchor Distillation for Weakly Supervised Referring Expression Comprehension

    Yi Tian, Cheng Yang, Qingbao Huang
    Weakly supervised Referring Expression Comprehension (WREC) aims to localize referred objects using only image-text pairs without box-level annotations. Existing one-stage methods predominantly rely on anchor-level alignment, which suffers from two fundamental limitations: (1) anchors represent local visual patches rather than holistic objects, and (2) they lack the capability to model inter-object relations. To address these issues, we propose OCAAD, an Object-Centric Alignment and Anchor Distillation framework. Our key insight is that different self-attention heads in DINOv2 naturally attend to distinct semantic regions, effectively capturing object-level information. Building on this, OCAAD introduces two complementary modules: (1) Anchor-Object Distillation Module (AODM), which bridges the semantic gap between anchors and objects by transferring object-level knowledge from attention heads to anchors via overlap-aware contrastive learning; and (2) Intra-Modal Relation Consistency (IRC), which explicitly models inter-object relations by enforcing the relational structure among linguistic entities to match that of their visual counterparts. Extensive experiments on RefCOCO, RefCOCO+, and RefCOCOg demonstrate that OCAAD achieves new state-of-the-art performance, validating the effectiveness of object-centric alignment for WREC. Code is available at https://github.com/VILAN-Lab/OCAAD.
    Computer VisionVision, language and reasoningMachine LearningWeakly supervised learning
  413. #4452

    StressEval: Failure-Driven Dynamic Benchmarking for Knowledge-Intensive Reasoning in Large Language Models

    Yongrui Chen, Yangyang Ma, Xiaoying Huang, Shenyu Zhang, Huajun Chen, Haofen Wang, Guilin Qi
    Static benchmarks for LLMs are increasingly compromised by contamination and overfitting, especially on knowledge-intensive reasoning tasks. While recent dynamic benchmarks can alleviate staleness, they often increase difficulty at the expense of answerability and controllability. In this paper, we propose StressEval, a failure-driven data synthesis framework that turns observed model failures into dynamic, challenging, and controllable test instances. StressEval consists of three stages: (i) it constructs a semi-structured difficulty card that identifies the failed reasoning step and its root cause; (ii) it applies a dual-perspective instance-synthesis method that targets both knowledge gaps and reasoning breakdowns while preserving the underlying difficulty factors; and (iii) it applies a gating mechanism to retain only grounded, unambiguous instances. Seeding from multiple knowledge-intensive reasoning datasets, we employ StressEval to build Dynamic-OneEval, a focused suite of challenging dynamic benchmarks. Across several state-of-the-art LLMs, Dynamic-OneEval yields substantially larger performance drops than the original benchmarks while retaining explicit difficulty factors, enabling more actionable iteration.
    Knowledge Representation and ReasoningLearning and reasoningNatural Language ProcessingApplicationsNatural Language ProcessingQuestion answering
  414. #4458

    STAR: Spatio-Temporal Attention Rebalancing for Eco-Friendly Autonomous Mobility-on-Demand Systems

    Jungeun Lee, Seungjae Baek, Seongjae Lee, Sunhwi Kim, Jeong hwan Jeon
    Existing Autonomous Mobility-on-Demand (AMoD) systems and demand-responsive algorithms are designed to adaptively respond to traffic demands, enhancing traffic service quality and profitability. However, such approaches may exacerbate traffic congestion by concentrating vehicles in high-demand areas. In addition, repeated braking, acceleration, and low-speed driving in traffic jams lead to substantial increases in emissions. Moreover, existing Deep Reinforcement Learning (DRL)-based approaches fail to address the instability of cooperative learning in environments with highly variable rewards, such as carbon emissions. To address these issues, we propose the Spatio-Temporal Attention Rebalancing (STAR) framework, leveraging an encoder-decoder architecture. In the proposed framework, we maximize the number of requests served and minimize CO₂ emissions, while ensuring high service quality by limiting maximum travel delays and waiting times for passengers. To evaluate the proposed framework, we conduct simulations in realistic urban traffic scenarios. Experimental results demonstrate that our framework consistently outperforms existing baselines. In a large-scale scenario with a fleet of one hundred vehicles, our approach improves the service rate by up to 10.1% and reduces CO₂ emissions per passenger by up to 69.6% compared to the baseline methods. The proposed framework and baselines are publicly available at https://github.com/2jungeuni/eco-friendly-fleet-rebalancing.
    Agent-based and Multi-agent SystemsCoordination and cooperationAgent-based and Multi-agent SystemsMulti-agent learningPlanning and SchedulingLearning in planning and scheduling
  415. #4459

    Learnable Data Augmentation and Contrastive Pre-training for Temporal Link Prediction

    Canghong Jin, Jiafeng Zhao, Feng Xu, Tongya Zheng, Zemin Liu, Lina Wei, Mingli Song
    Link prediction is a foundational task in temporal graphs. While temporal graph neural networks exhibit commendable performance, they are often criticized for providing inadequate representations, especially under limited data. Contrastive learning has been introduced as a solution for graph pre-training to mitigate data scarcity. However, the data augmentation techniques on which they depend frequently lead to suboptimal augmentations, manifesting either as over-augmentation or under-augmentation.
    In light of these challenges, we investigate the under-explored domain of contrastive learning of temporal graph transformers and propose a novel model that employs a dual-view graph transformer. This transformer is fine-tuned to extract sequences tailored for specific target nodes, encapsulating both spatial and temporal perspectives. To optimize the contrastive learning of this dual-view transformer, we put forth an innovative, learnable data augmentation method. This technique, which involves selective masking of elements within dual-view sequences, generates superior augmentations, thereby amplifying the potency of the contrastive learning approach.
    Extensive experiments on five public temporal network datasets demonstrate that our model can consistently outperform all baselines, especially in small training set conditions.
    Data MiningMining graphsData MiningMining heterogenous data
  416. #4461

    Profiling the Voice: Speaker-Specific Phoneme Fingerprinting for Speech Deepfake Detection

    Jun Xue, Tong Zhang, Zhuolin Yi, Yihuan Huang, Yi Chai, Yiyang Zhang, Yanzhen Ren
    The rapid advancement of generative AI has made audio deepfakes increasingly indistinguishable from authentic human vocals, posing significant threats to persons-of-interest (POI) such as public figures. Current detection systems primarily rely on generic, black-box models that fail to capture speaker-specific idiosyncratic traits and lack interpretability. In this paper, we propose Phoneme-based Voice Profiling (PVP), a novel personalized defense framework. By shifting the detection paradigm from macro-utterance analysis to micro-phonetic modeling, PVP captures the unique acoustic distributions underlying a POI’s habitual articulatory patterns. Specifically, our framework models speaker-specific phonetic realizations using lightweight Gaussian Mixture Models (GMMs) estimated solely from bona fide reference speech. This design enables data-efficient profiling and robust generalization to previously unseen spoofing attacks without requiring heavy spoof-specific training. Furthermore, we introduce the first large-scale Chinese POI deepfake dataset to benchmark speaker-specific detection. Experimental results demonstrate that PVP significantly outperforms state-of-the-art generic detectors in POI spoofing scenarios, achieving substantial EER reductions while providing fine-grained, phoneme-level interpretability for forensic analysis. Code and data are available at: https://github.com/JunXue-tech/PVP
    Natural Language ProcessingSpeech
  417. #4463

    A Unified Knowledge Embedded Reinforcement Learning-based Framework for Generalized Capacitated Vehicle Routing Problems

    Wen Wang, Xiangchen Wu, Liang Wang, Hao Hu, Xianping Tao
    The Capacitated Vehicle Routing Problem (CVRP) is a fundamental NP-hard problem with broad applications in logistics and transportation. Real-world CVRPs often involve diverse objectives and complex constraints, such as time windows or backhaul requirements, motivating the development of a unified solution framework. Recent reinforcement learning (RL) approaches have shown promise in combinatorial optimization, yet they rely on end-to-end learning and lack explicit problem-solving knowledge, limiting solution quality. In this paper, we propose a knowledge-embedded framework inspired by the Route-First Cluster-Second heuristics. It incorporates knowledge at two levels: (1) decomposing CVRPs into the route-first and cluster-second subproblems, and (2) leveraging dynamic programming to solve the second subproblem, whose results guide the RL-based constructive solver to solve the first subproblem. To mitigate partial observability caused by problem decomposition, we introduce a unified history-enhanced context processing module. Extensive experiments show that this framework achieves superior solution quality compared with state-of-the-art learning-based methods, with a smaller gap to classical heuristics, demonstrating strong generalization across diverse CVRP variants.
    Planning and SchedulingApplicationsPlanning and SchedulingLearning in planning and scheduling
  418. #4484

    Bridging the Gap Between Gaussian Splatting and SLAM: A Geometric Gaussian Field-based Gaussian Splatting SLAM System

    Jiasheng He, Wenbin Zhu, Xiaobin Liu, Jing Yuan, Yuhang Wei
    Recent works in Gaussian Splatting (GS) SLAM highlight the importance of geometric structure. However, existing methods often rely on either 2D or 3D Gaussian primitives, lacking the balance between geometry and appearance, thus failing to precisely model spatial structures. Furthermore, current frameworks either solely utilize SLAM to provide poses for GS or optimize poses and all Gaussian parameters indiscriminately, which neglects geometric consistency constraints essential for robust localization, creating a gap between GS and SLAM. In this paper, a tightly coupled SLAM system based on a Geometric Gaussian Field (GGF) is proposed. First, GGF combines explicit spatial structures with implicit neural residuals to adaptively regulate the Gaussian morphologies, seamlessly transitioning between 2D and 3D Gaussian primitives, ensuring precise spatial modeling. Second, a Geometric Consistency-Guided Refinement (GCGR) strategy is introduced, which exploits depth and normal maps rendered by GGF to construct critical constraints for accurate pose estimation. Simultaneously, updated poses and spatial information are fed back to refine Gaussian morphology, establishing mutual enhancement and tight coupling between GS and SLAM. Extensive experiments demonstrate that GGF-SLAM outperforms state-of-the-art methods in tracking accuracy, photorealistic rendering, and geometric reconstruction.
    Computer Vision3D computer visionComputer VisionMachine learning for visionRoboticsLocalization, mapping, state estimationRoboticsPerceptionRoboticsRobotics and vision
  419. #4487

    Riemannian Graph Convolutional Network for Skeleton-Based Two-Person Interaction Recognition

    Rui Wang, Zihao Bi, Chen Hu, Xiaoning Song, Xiao-Jun Wu, Nicu Sebe, Ziheng Chen
    In the field of skeleton-based human action recognition, Graph Convolutional Networks (GCNs) have become a dominant framework. However, existing GCN-based approaches often treat the sequences of two-person interaction as separate entities, ignoring the inherent semantic dependencies and spatial correlations between interacting subjects. Furthermore, high-order skeleton representations naturally exhibit non-Euclidean structures, and Euclidean deep learning models are inherently limited in explicitly capturing and preserving such geometric information. As a countermeasure, we propose a Riemannian Graph Convolutional Network (RGCN) that operates on Symmetric Positive Definite (SPD) manifolds. Specifically, we model high-order skeletal statistics via Gaussian embedding and propose a Riemannian network to capture inter-subject interactions and global correlations. The proposed RGCN is instantiated under three SPD geometries, and its effectiveness is validated through extensive experiments on three interaction benchmarks. The results show that RGCN provides a competitive and geometrically grounded alternative for skeleton-based interaction recognition.
    Computer VisionAction and behavior recognitionComputer VisionMachine learning for visionMachine LearningDeep learning architecturesMachine LearningGeometric learning
  420. #4495

    Addressing Downward Memory Loss in Hierarchical GNN Forecasters Through Memory-Buffered Decoding

    Thomas Bailie, S. Karthik Mukkavilli, Varvara Vetrova, Yun Sing Koh
    Accurate spatio-temporal forecasting requires modeling interactions across multiple spatial and temporal scales. Existing Graph Neural Network (GNN) forecasters primarily operate at a single local scale, limiting their ability to capture global processes that govern system dynamics. Hierarchical GNNs (HGNNs) aim to address this limitation by learning multiscale representations, but in practice, they often fail to preserve global information during coarse-to-fine propagation. We identify this limitation as downward memory loss, where global trends learned at coarse resolutions diminish before influencing fine-grained predictions, leading to misaligned local dynamics. We propose HiGFlow, an HGNN forecaster that explicitly preserves multiscale trends through a self-updating memory buffer that integrates into the coarse-to-fine information flow. This design maintains global contextual signals as an inductive bias throughout hierarchical decoding. We provide a theoretical analysis indicating that when decoding, HiGFlow preserves multiscale trends where residual-based architectures may fail to do so. Empirically, HiGFlow achieves substantially lower MAE and RMSE than state-of-the-art forecasting models across multiple benchmark datasets, demonstrating the importance of explicit global memory in multiscale spatio-temporal forecasting. Our implementation is available at https://github.com/TB862/HiGFlow.
    Machine LearningLearning graphical modelsMachine LearningMulti-view learningMachine LearningSequence and graph learningMachine LearningTime series and data streams
  421. #4497

    BIP Revisited: Enabling Efficient Dominance Pruning for Distributed Constraint Optimization

    Xiangshuang Liu, Ziyu Chen
    Bound-Independent Pruning (BIP) is a powerful technique for accelerating tree-based complete search algorithms for Distributed Constraint Optimization Problems (DCOPs) by pruning dominated regions of the search space using only local information. However, BIP fundamentally relies on constructing a context-dependent dominating space through joint optimization over local constraints, incurring exponential computational and storage complexity that limits its practical scalability. To address this limitation, we propose DR-BIP, a decoupled and sound reformulation of BIP that enables efficient identification of dominated subspaces without explicitly computing the dominating space. By reformulating dominated-subspace detection into pairwise dominance tests and decoupling them across independent constraints, DR-BIP eliminates joint optimization and reduces both computational and storage complexity from exponential to polynomial. The resulting dominance components can be precomputed and reused across context changes, further improving efficiency. We formally prove the correctness of DR-BIP and integrate it into several state-of-the-art tree-based complete DCOP algorithms. Extensive empirical evaluation on standard benchmarks confirms that DR-BIP consistently outperforms BIP, significantly reducing both computation time and communication overhead.
    Agent-based and Multi-agent SystemsCoordination and cooperationConstraint Satisfaction and OptimizationDistributed constraints
  422. #4499

    History Doesn’t Repeat, but Its Patterns Echo: A Parallel Pairwise Negative-Sampling Framework for Temporal Link Prediction

    Yongchun Jiang, Heng Zhang, Jian Gao, Xin Zheng
    Temporal link prediction with temporal graph neural networks (TGNNs) is increasingly used to model spatio-temporal dependencies in temporal graphs and to forecast future interactions among entities. Existing sampling-based training methods typically rely on random negative sampling and pointwise loss formulations, which often lead to suboptimal convergence and limited generalization due to low-quality negative samples. We propose ATNSF, a temporal graph learning framework with a hybrid negative sampling strategy that uses a portion of historical edges as hard negatives. For efficiency, we design an asynchronous parallel training pipeline for scalable optimization and introduce a pairwise sampled softmax loss that contrasts each positive instance with a batch of negatives to learn more discriminative representations. Finally, we theoretically show that jointly designing the loss function and negative sampling strategy is crucial for improving performance and generalization. Extensive experiments across six temporal graph datasets demonstrate that ATNSF improves the average AP from 0.724 to 0.827 (+0.103). Remarkably, it also accelerates training by 1.37× to 6.85×, achieving a 2.57× geometric mean speedup. The source code of this paper can be found at https://github.com/yongqiu-star/ATNSF.
    Data MiningBig data and scalabilityData MiningMining graphsData MiningParallel, distributed and cloud-based high performance mining
  423. #4502

    TTS-Design: Test-Time Compute Scaling for Structure-Guided Protein Design

    Zizhe Jin, Yizhen Zheng, Huan Yee Koh, Jiaxin Ju, Jiapu Wang, Shirui Pan
    Generating protein sequences that reliably fold into target structures is a central challenge in computational biology and protein design. Progress in protein inverse folding (PIF), however, is fundamentally constrained by the scarcity of high-quality structural data, which limits the effectiveness of existing models. In this work, we propose TTS-Design, a test-time compute scaling framework that enhances protein sequence design without retraining models or relying on larger training data. The core of TTS-Design is a dedicated inverse folding reward model named IF-RM, which quantitatively evaluates structure–sequence compatibility and enables effective selection among multiple candidate sequences at inference time. By integrating IF-RM into existing PIF models, TTS-Design leverages additional computation during the inference stage to explore and refine candidate solutions. Extensive experiments on CATH 4.2 and CATH 4.3 demonstrate that TTS-Design consistently improves sequence recovery and structural reliability across different backbone models, without retraining or increasing model size. Case studies on real-world proteins further confirm its practical effectiveness. Overall, our results highlight test-time compute scaling as a promising and effective paradigm for advancing protein inverse folding and other scientific problems under data-limited scenarios.
    Multidisciplinary Topics and ApplicationsBioinformaticsMultidisciplinary Topics and ApplicationsHealth and medicineAIMachine LearningMachine LearningSequence and graph learning
  424. #4503

    Uncertainty-Guided Adaptive Local Adversarial Perturbation for Reliable Molecular Property Prediction

    Xuxin Zhao, Wen Zhang, Shichao Liu
    Molecular property prediction is a fundamental task with wide-ranging applications in chemistry, materials science, and drug discovery. Beyond predictive accuracy, the reliability of model predictions, particularly their uncertainty awareness, is critical for high-stakes downstream applications. However, most existing methods predominantly optimize performance metrics while overlooking prediction reliability, often producing results that are both inaccurate and overconfident. To address this challenge, we propose an uncertainty-guided framework that integrates adaptive local adversarial perturbation with evidential deep learning for reliable molecular property prediction. The proposed method identifies sensitive molecular regions through gradient-based sensitivity analysis and applies locally adaptive perturbations whose magnitudes are dynamically modulated by predictive uncertainty. This approach explicitly challenges the model to preserve predictive stability under structure-aware perturbations, thereby enhancing its sensitivity to critical molecular substructures. Furthermore, we introduce an evidential calibration strategy that aligns uncertainty with the prediction discrepancy between clean and adversarial molecular views, enabling the learned evidence to faithfully reflect prediction stability. Extensive experiments on benchmark molecular property prediction datasets demonstrate that our approach achieves state-of-the-art predictive performance while providing more reliable uncertainty estimates than widely used uncertainty quantification methods. These results highlight the potential of the proposed framework as a reliable tool for risk-sensitive virtual screening and molecular decision-making. Our code is available at https://github.com/Ubehind/ugap.
    Machine LearningAdversarial machine learningMachine LearningRepresentation learningMultidisciplinary Topics and ApplicationsBioinformaticsUncertainty in AIApplications
  425. #4512

    MLLMs Get It Right, Then Get It Wrong: Tracing and Correcting Late-Layer Textual Bias

    Xingming Li, Ao Cheng, Qiyao Sun, Xixiang He, Xuanyu Ji, Runke Huang, Qingyong Hu
    When vision contradicts text, multimodal large language models (MLLMs) consistently favor text, even when images provide clear evidence otherwise. This bias poses risks for applications requiring visual grounding, yet its cause remains unclear. In this paper, we uncover a surprising finding: models often get it right initially, forming correct vision-based predictions in their intermediate layers, before changing their minds and favoring text in the final output. We call this "late-layer textual override". The visual information is encoded; it simply does not survive to the output. More intriguingly, we find that how predictions change reveals whether they're correct: 85% of failures shift toward text, while 89% of successes shift toward vision. This directional signature enables a simple but powerful intervention: when we detect a confident visual prediction being suppressed, we restore it. We propose CALRD (Conflict-Aware Layer Reference Decoding), a training-free method that recovers overridden predictions at inference time. Experiments across five MLLMs of varying architectures demonstrate up to 9.4% absolute improvements on conflict benchmarks while largely preserving standard performance, without training or external knowledge. It recovers what the model already knew but failed to preserve.
    Computer VisionVision, language and reasoningNatural Language ProcessingInterpretability and analysis of models for NLPNatural Language ProcessingLanguage models
  426. #4523

    Contrastive Flow Matching for Sequential Recommendation

    Wangyu Jin, Yang Xue, Wenwen Xia, Hongliang He, Guanfeng Liu, Pengpeng Zhao
    Flow Matching (FM), as a highly promising generative paradigm, has recently been introduced into sequential recommendation. However, the lack of explicit inter-user supervision in existing FM-based methods may lead to an averaged flow phenomenon, where the model tends to produce similar velocity directions for different users during item generation. This greatly undermines the model’s ability to generate personalized items. To address this issue, we propose Contrastive Flow Matching (ConFM) for sequential recommendation, aiming to inject inter-user separation signals so as to preserve the discriminability of users’ velocity directions. Specifically, considering the common design where velocities are derived from predicted item representations, we introduce a Representation Contrastive Regularizer in the representation space, which forces the predicted item embeddings of different users to be separated from each other. In addition, we introduce a Velocity Contrastive Regularizer in the induced velocity space, which directly encourages each user’s induced velocity to stay far from other users’ velocity targets, thereby better aligning the generation process with the user’s own interests. Extensive experiments on four datasets demonstrate that ConFM consistently outperforms strong baselines, and it also achieves notable improvements on long-tail items.
    Data MiningRecommender systems
  427. #4532

    JURIS: Bringing the Jury System to Multi-Agent Summarization

    Ziling Li, Junwei Zhang, Yixuan Yang, Yuqiang Han, Xiaolin Li
    Agents implemented using Large Language Models (LLMs) offer novel solutions for tasks such as document summarization. However, single-agent systems, limited by their own knowledge base, often exhibit information omissions or inconsistencies with the facts, while multi-agent systems lacking effective organization can fall into the trap of social cognitive defects such as flattery and premature consensus. Inspired by the U.S. jury system, we propose JURIS, a judicial-decision-inspired multi-agent collaborative framework. Specifically, the document summarization process simulates a lawsuit scenario: multiple generator agents condense the target text into multiple sentences from different perspectives, much like lawyers on both sides providing different evidence; multiple decision-making agents consider the overall picture based on defense and voting mechanisms to select candidate sentences, much like a jury adopting different evidence; finally, a chief editor agent compiles a summary of all selected sentences, much like a secretary summarizing a plan and a judge giving a final verdict. Extensive experiments on in-domain news and cross-domain benchmarks demonstrate that JURIS consistently and significantly outperforms strong single-agent and multi-agent baselines across automatic evaluation metrics, multi-dimensional quality assessments, and human judgments, validating the effectiveness of decision-driven structured multi-agent collaboration for document summarization.
    Agent-based and Multi-agent SystemsMulti-agent planningAgent-based and Multi-agent SystemsNormative systemsAgent-based and Multi-agent SystemsOther
  428. #4537

    Mixture of Clustering Experts with Dual Consistency for Multi-View Clustering

    Daidai Zhu, Yang Zhao, Dandan Ma, Ganchao Liu, Zhiyu Jiang
    Multi-view clustering aims to discover shared semantic structures from multiple complementary data views to improve clustering performance. However, most existing methods rely on a single clustering representation for each semantic cluster, which limits the ability to model complex cluster structures and diverse patterns in real-world data.
    To address this limitation, we propose a Mixture of Clustering Experts with Dual Consistency (MoCE-DC). MoCE-DC introduces a set of learnable clustering experts that characterize the same semantic cluster from multiple perspectives within a unified prototype space. A gating mechanism is employed to adaptively select experts for each sample. MoCE-DC decomposes complex semantic clusters into multiple collaborative sub-structures, significantly enhancing the ability to model intricate intra-cluster diversity.
    To further ensure cross-view semantic consistency, we propose a dual-level alignment mechanism that enforces prediction consistency across views while guiding clustering assignments toward a more discriminative direction. In addition, a gating balance regularization strategy is introduced to mitigate expert collapse and promote balanced expert utilization. Extensive experiments on multiple public multi-view clustering benchmarks demonstrate that MoCE-DC outperforms state-of-the-art methods.
    Machine LearningClusteringMachine LearningMulti-view learningMachine LearningSelf-supervised Learning
  429. #4553

    PID-Controlled Constrained RL for Hub-based Joint Pricing, Dispatching, and Routing with Service Guarantees

    Pengfei Du, Yucen Gao, Bin Wang, Xiaochun Yang
    Joint optimization of pricing, dispatching, and routing is critical for hub-based mobility services but challenging due to complex decision couplings and strict service guarantees, such as Order Response Rate (ORR). Conventional constrained reinforcement learning often struggles in this mixed continuous-combinatorial action space, suffering from oscillatory behavior in Lagrangian dual variables and unstable constraint satisfaction. To address this, we propose PID-SACA, a unified framework that integrates an entropy-regularized actor-critic policy for continuous pricing and dispatching assisted by an embedded routing solver for execution-aware feedback. Crucially, we adapt the PID control mechanism to the Lagrangian dual update process. This approach leverages proportional, integral, and derivative feedback to dampen oscillations caused by stochastic gradient variance, ensuring robust long-term constraint enforcement. We provide theoretical analysis on the boundedness of dual variables, and experiments on publicly available large-scale mobility datasets demonstrate that PID-SACA significantly outperforms baselines, achieving high revenue with stable service compliance. Code: https://github.com/jerry0375/PID-SACA
    Constraint Satisfaction and OptimizationConstraint optimization problemsMachine LearningOptimizationPlanning and SchedulingSearch in planning and scheduling
  430. #4579

    From Traits to Roles: Consensus-Guided Composition of Orthogonal Experts for Cooperative MARL

    Yewei Zhou, Bin Zhang, Ying Zhou, Xuri Ge, Dapeng Li, Hangyu Mao, Pengjie Ren, Zhiwei Xu
    Parameter sharing is a central design choice in cooperative multi-agent reinforcement learning, yet it fundamentally conflicts with the need for role specialization in heterogeneous cooperative environments. Existing role-based methods typically learn monolithic role representations, which often suffer from gradient interference and fail to capture the compositional structure of complex behaviors. Inspired by Trait Theory, we propose DEcompose and COnstruct Roles (DECOR), a framework that models agent roles as dynamic compositions of orthogonal behavioral traits. DECOR introduces an orthogonal Mixture-of-Experts architecture to decompose behaviors into independent traits, mitigating destructive gradient interference under parameter sharing, and a group-consensus guided mechanism to extract team-level tactical intents that guide role composition. Experiments on multiple benchmarks demonstrate that DECOR consistently improves sample efficiency and overall performance over existing related methods.
    Agent-based and Multi-agent SystemsMulti-agent learning
  431. #4583

    Dual-Channel Hybrid Graph Neural Network for Mobility Social Relationship Inference

    Liangkun Chen, Xiang Li, Guiyuan Jiang, Zhongying Zhao, Junyu Dong, Yanwei Yu
    Inferring latent social ties from large-scale spatiotemporal mobility traces is a foundational AI task with broad applicability. Existing hypergraph-based methods often model higher-order relations by treating hyperedges as static snapshots, thus failing to capture the temporal dynamics and co-evolution of user interactions. Meanwhile, many approaches still struggle to distinguish stable social gatherings from transient noisy co-occurrences. To address these challenges, we propose a novel Dual-Channel Hybrid Graph Neural Network (HyGNN) that jointly models temporal dynamics and high-order structural dependencies. The framework consists of two complementary components: (1) a temporal meeting graph channel that decomposes multi-user interactions into ordered snapshots and aligns trajectory evolution through message propagation, and (2) a gathering hypergraph channel that uses a structure-aware encoder with homogeneity-guided aggregation to filter noise in high-order co-occurrences. A mutual-information-aware fusion module integrates both views while preserving their distinct semantics. Extensive experiments across three real-world datasets show that HyGNN consistently outperforms state-of-the-art baselines in PRAUC and ROCAUC metrics, and robustly captures the nonlinear interplay between mobility and sociality. The code is available at https://github.com/KunLiangChen/HyGNN.
    Data MiningMining graphsData MiningMining spatial and/or temporal data
  432. #4588

    Bounding the Inefficiency of Risk-Averse Selfish Routing with Nonadditive CVaR

    Zhaoqi Zang, David Z. W. Wang
    Modern AI-driven systems rely on large populations of autonomous agents that make decentralized routing decisions under uncertainty, where rare but severe tail latency can critically degrade quality of experience, safety, and reliability. To model agents’ aversion to such tail latency, we study nonatomic selfish routing games in which agents minimize the Conditional Value-at-Risk (CVaR) of path latency. CVaR explicitly captures both the likelihood and severity of tail latency, but its inherent nonadditivity across network edges poses a fundamental challenge. We address it by identifying a worst-case dependence structure—tail risk concentration—under which tail latency across network edges is synchronized. We show that CVaR penalizes this dependence structure and becomes additive under worst-case tail dependence, which enables tight inefficiency analysis. To quantify the resulting inefficiency induced by risk-aversion, we adopt the price of risk aversion (PRA), defined as the worst-case ratio between the total system cost at a risk-averse equilibrium and that at a risk-neutral equilibrium. We show that, for arbitrary latency functions and general network topologies, the PRA under CVaR admits a tight upper bound that grows linearly with both the network size and the maximum edge-level upper-tail cost. We further prove that this bound is tight by constructing a family of Braess-type networks that achieve a matching lower bound. These results provide the first tight worst-case inefficiency bound for CVaR-based selfish routing and offer insights into how tail-risk-averse decision making by autonomous agents amplifies congestion externalities in large-scale multi-agent systems.
    Agent-based and Multi-agent SystemsAgent theories and modelsGame Theory and Economic ParadigmsComputational social choiceGame Theory and Economic ParadigmsNoncooperative gamesPlanning and SchedulingPlanning under uncertaintyUncertainty in AIDecision and utility theory
  433. #4589

    When Evidence Falls Short: Router-Guided Fake News Detection with Pattern Augmentation

    Yujing Wang, Xiaobao Wang, Yiqi Dong, Yueheng Sun, Di Jin, Dongxiao He
    With the growing complexity of online information, trustworthy fake news detection has become increasingly critical. Although Large Language Models (LLMs) exhibit a strong ability to leverage factual evidence for verification, they remain highly vulnerable to unreliable, noisy, or scarce evidence, undermining robustness in real-world scenarios. Given the generalizability of deceptive patterns in fake news, we consider such patterns as a complementary signal under insufficient evidence during factual verification. However, due to LLMs' lack of expertise in deception-specific patterns, realizing such effective collaboration remains challenging. To address these issues, we propose a Router-Guided Fake News Detection Framework with Pattern Augmentation (RGPA). Specifically, we introduce a hierarchical routing mechanism including a case router and an external evidence router. It guides news to appropriate reasoning paths adaptively based on a multi-dimensional quality assessment, prioritizing high-quality evidence while mitigating noise. Furthermore, we design an expert model to capture deceptive features and integrate them into LLMs' reasoning, enabling a synergy of factual verification and pattern awareness under evidence-scarce scenarios. Extensive experiments on two real-world datasets demonstrate that RGPA significantly outperforms existing approaches.
    Data MiningMining graphsData MiningMining text, web, social media
  434. #4590

    COMPACT: Common-token Optimized Model Pruning Across Channels and Tokens

    Eugene Kwek, Wenpeng Yin
    Improving the memory efficiency, throughput, and serving cost of large language models (LLMs) is critical for edge deployment, interactive applications, and sustainable inference. Pruning is a promising approach, but existing methods have limitations: width pruning disrupts the standard transformer architecture and requires custom inference code, while depth pruning causes abrupt accuracy drops. Moreover, many approaches that work well for LLMs fail to preserve performance on small language models (SLMs). We propose COMPACT, which jointly prunes (i) rare vocabulary to shrink embedding layers and (ii) FFN intermediate channels using common-token–weighted activations aligned with the post-pruning token distribution. COMPACT inherits strengths of both depth and width pruning, such as deployment-friendliness, scale-adaptivity, competitive pruning speed, and strong inference performance. Experiments on several LLM families (0.5B–70B) show state-of-the-art downstream performance, with substantial improvements in inference throughput and GPU memory. Project code is at https://github.com/ekwek1/COMPACT.
    Natural Language ProcessingLanguage models
  435. #4597

    Causal Path Alignment: Anchoring the Optimization Trajectory for Controllable In-Parameter Knowledge Editing

    Xiyu Liu, Zhengxiao Liu, Naibin Gu, Zheng Lin, Weiping Wang
    Knowledge editing is pivotal for efficiently updating the parametric memory of Large Language Models (LLMs), enabling them to function as evolving agents in dynamic environments. However, mainstream in-parameter knowledge editing approaches suffer from Subject-Dominant Memory Interference: modifying a specific fact inadvertently corrupts the broader structural knowledge associated with the same subject within LLMs. We diagnose the root cause as a shortcut learning pathology, where the optimization objective overfits subject representations while bypassing the essential relational context. To rectify this, we propose Causal Path Alignment (CPA), a principled framework designed to anchor the optimization trajectory to valid causal pathways. CPA enforces parameter updates to route through relation-aware intermediate states, thereby preventing the erasure of contextual dependencies. Experimental results across diverse LLM backbones demonstrate that CPA consistently eliminates the shortcut, significantly improving relation specificity while exhibiting minimal side-effects. Moreover, CPA serves as a model-agnostic plug-in for existing editors, paving the way for reliable and trustworthy in-parameter knowledge editing.
    Natural Language ProcessingApplicationsNatural Language ProcessingInterpretability and analysis of models for NLP
  436. #4619

    HyperXRec: Unifying Preference Clusters and LLM Experts for Robust Explainable Recommendations

    Xue Han, Zhiwen Luo, Zhixiang Li, Nizar Bouguila, Weifeng Su, Wentao Fan
    Explainable recommendation is crucial for building user trust, yet producing natural-language rationales that faithfully reflect the underlying decision process remains challenging. Most LLM-based explainable recommenders incorporate collaborative signals through shallow prompting or lightweight adapters, which often yields generic explanations that are only loosely grounded in preference evidence, particularly when interactions are sparse. To address this gap, we propose HyperXRec, a novel framework that unifies preference modeling and explanation generation by integrating hyperspherical latent clustering with a cluster-guided mixture-of-experts (MoE) inside an LLM. Specifically, HyperXRec employs a dual-VAE with a von Mises-Fisher mixture prior to learn robust hyperspherical user/item embeddings, resulting in stable and semantically coherent clusters even under sparse interaction settings. These clusters are subsequently leveraged to guide fine-grained expert routing in the LLM, encouraging explanations that are both personalized and explicitly aligned with the learned preference structure. Experiments on multiple benchmarks demonstrate that HyperXRec consistently outperforms strong baselines in explanation fidelity, personalization, and robustness in cold-start scenarios.
    Data MiningRecommender systemsMachine LearningClusteringMachine LearningExplainable/Interpretable machine learning
  437. #4620

    Efficient Algorithms for Influence Maximization in General Models and Observed Cascades

    Fabian Spaeh, Themistoklis Haris, Alina Ene, Huy L. Nguyen
    We study influence maximization in general stochastic models, the observed cascades model, and the independent cascade (IC) model. For general stochastic models with only black-box sample access, we introduce a low-adaptivity optimization framework that improves sample complexity and running time over Sadeh et al. (2020) and is instrumental to all our results. We further introduce an adaptive algorithm guided by empirical variance, avoiding pessimistic worst-case bounds. Combining our optimization framework with sketching, we obtain the first algorithm with provable guarantees and nearly-linear running time for influence maximization on observed cascades, optimal up to logarithmic factors. For IC, we prove a novel tail bound replacing a factor n with 𝜏 (the number of diffusion steps) in sample complexity, improving over prior work when 𝜏 is small, as is common due to small-world phenomena. Experiments confirm substantial speedups while maintaining solution quality.
    Agent-based and Multi-agent SystemsResource allocationData MiningMining graphsSearchCombinatorial search and optimisation
  438. #4623

    Explaining, Verifying and Aligning Semantic Hierarchies in Vision-Language Model Embeddings

    Gesina Schwalbe, Mert Keser, Moritz Bayerkuhnlein, Edgar Heinert, Annika Mütze, Marvin Keller, Sparsh Tiwari, Georgii Mikriukov, Diedrich Wolter, Jae Hee Lee, Matthias Rottmann
    Vision-language model (VLM) encoders such as CLIP enable strong retrieval and zero-shot classification in a shared image–text embedding space, yet the semantic organization of this space is rarely inspected. We present a post-hoc framework to explain, verify, and align the semantic hierarchies induced by a VLM over a given set of child classes. First, we extract a binary hierarchy by agglomerative clustering of class centroids and name internal nodes by dictionary-based matching to a concept bank. Second, we quantify plausibility by comparing the extracted tree against human ontologies using efficient tree- and edge-level consistency measures, and we evaluate utility via explainable hierarchical tree-traversal inference with uncertainty-aware early stopping (UAES). Third, we propose an ontology-guided post-hoc alignment method that learns a lightweight embedding-space transformation, using UMAP to generate target neighborhoods from a desired hierarchy. Across 13 pretrained VLMs and 4 image datasets, our method finds systematic modality differences: image encoders are more discriminative, while text encoders induce hierarchies that better match human taxonomies. Overall, the results reveal a persistent trade-off between zero-shot accuracy and ontological plausibility and suggest practical routes to improve semantic alignment in shared embedding spaces.
    AI Ethics, Trust, FairnessBiasAI Ethics, Trust, FairnessExplainability and interpretabilityAI Ethics, Trust, FairnessTrustworthy AIComputer VisionStructural and model-based approaches, knowledge representation and reasoningMachine LearningClustering
  439. #4648

    Text-Graph Synergy: A Bidirectional Verification and Completion Framework for RAG

    Jiarui Zhong, Hong Cai Chen
    Retrieval-Augmented Generation (RAG) has become a core paradigm for enhancing factual grounding and multi-hop reasoning in Large Language Models (LLMs). Traditional text-based RAG often retrieves logically irrelevant pseudo-evidence, while graph-based RAG is frequently hindered by search-time pruning, which may discard potentially valid reasoning paths. Existing hybrid approaches primarily adopt simple evidence concatenation or unidirectional enhancement, which fails to address the fundamental "Information Island" problem caused by asymmetric reasoning flows between unstructured text and structured graphs. We propose TGS-RAG, a unified framework for Text-Graph Synergistic enhancement. TGS-RAG introduces a bidirectional mechanism: (i) a Graph-to-Text channel that employs a Global Voting strategy from visited graph nodes to re-rank and refine textual evidence, filtering out semantic noise; and (ii) a Text-to-Graph channel that utilizes the Memory-based Orphan Entity Bridging algorithm. This algorithm utilizes textual cues to proactively resurrect valid but previously pruned reasoning paths from the search history without additional database overhead. Experimental results on multiple multi-hop reasoning benchmarks demonstrate that TGS-RAG significantly outperforms state-of-the-art baselines, achieving a superior balance between retrieval precision and computational efficiency.
    Natural Language ProcessingInformation retrieval and text miningNatural Language ProcessingQuestion answeringData MiningKnowledge graphs and knowledge base completionNatural Language ProcessingInformation extraction
  440. #4650

    Counterfactual Estimation via Temporal-Aware Intervention Networks

    Xin Wang, Chi Luo, Shengfei Lyu, Yi Wan, Xiren Zhou, Xiangyu Wang, Huanhuan Chen
    Accurate estimation of time-varying treatment effects is crucial for optimizing interventions in personalized medicine. However, observational data often contains complex confounding bias and temporal complexities, making counterfactual estimation challenging. We propose Counterfactual Estimation via Temporal-Aware Intervention Networks (TAIN), a novel model that introduces an Intervention-aware Functional Convolution kernel to emphasize the role of treatments and capture complex temporal treatment interactions. TAIN addresses confounding bias from a domain generalization perspective, approximating the unknown target domain using adversarial examples and incorporating Sharpness-Aware Minimization to derive a generalization bound. This approach is more suitable for longitudinal settings compared to existing methods inspired by domain adaptation techniques due to inherent differences between static and longitudinal contexts. Experiments on simulated datasets demonstrate TAIN's superior performance compared to state-of-the-art models for counterfactual estimation over time.
    Knowledge Representation and ReasoningCausalityMachine LearningCausalityUncertainty in AICausality, structural causal models and causal inference
  441. #4693

    Sparsification Under Siege: Dual-Level Defense Against Poisoning in Communication-Efficient Federated Learning

    Zhiyong Jin, Runhua Xu, Chao Li, Yizhong Liu, Jianxin Li, James Joshi
    Gradient sparsification, while mitigating communication bottlenecks in Federated Learning (FL), fundamentally alters the geometric landscape of model updates. We reveal that the resultant high-dimensional orthogonality renders traditional Euclidean-based robust aggregation metrics mathematically ambiguous, creating a sparsity-robustness trade-off that adversaries exploit to bypass detection. To resolve this structural dissonance, we propose SafeSparse, a consensus restoration framework that decouples defense into topological and semantic dimensions. Unlike prior art that treats sparsification and security orthogonally, SafeSparse introduces: (1) a Structure-Aware Calibration mechanism utilizing Jaccard similarity to filter topological outliers induced by index poisoning; and (2) a Directional Semantic Alignment module employing density-based clustering on update signs to neutralize magnitude-invariant attacks. Theoretically, we establish convergence guarantees for SafeSparse. Extensive experiments across multiple datasets and attack scenarios demonstrate that SafeSparse recovers up to 25.7% global accuracy under coordinated poisoning, effectively closing the vulnerability gap in communication-efficient FL.
    AI Ethics, Trust, FairnessSafety and robustnessAI Ethics, Trust, FairnessTrustworthy AI
  442. #4700

    Streamlining Long-Chain Reasoning via Differentiable Hierarchical Fusion

    Chuangen Gao, Wenlun Zhang, Shang Wang, Shuyang Gu
    While large-scale reasoning models have achieved remarkable performance gains by scaling test-time computation through extended Chain-of-Thought (CoT) sequences, their practical utility is severely constrained by protracted inference lengths and high computational latency. In this paper, we present Differentiable Hierarchical Fusion (DHF), a novel framework that merges reasoning models with efficient base models via differentiable optimization to produce concise, accurate outputs. We introduce a dual-factor adaptive weighting mechanism to capture intra-block (Attention vs. MLP) variance and inter-block (shallow vs. deep layers) importance hierarchies, thereby addressing key limitations of static merging heuristics. Specifically, DHF optimizes fusion coefficients using gradient descent on a loss function that jointly minimizes cross-entropy (for accuracy) and response length (for conciseness). Furthermore, we enhance an outlier-aware initialization strategy to seed coefficients based on activation density and construct multi-calibration datasets to improve the model’s generalization ability. Comprehensive evaluations on Qwen and LLaMA models across six reasoning benchmarks show that DHF reduces the average response length by 55% while boosting accuracy by 2.8%–6% compared to existing state-of-the-art merging methods.
    Natural Language ProcessingLanguage modelsNatural Language ProcessingResources and evaluation
  443. #4712

    Interactive All-in-One Image Restoration and Fusion

    Bing Cao, Qiang Zhang, Xingxin Xu, Pengfei Zhu
    Supervised infrared-visible image fusion (IVIF) often overfits limited training distributions, creating a critical generalization gap under open-world degradations (rain, haze, low light, noise, blur). To address this issue, we propose AIR-Fusion, a parameter-efficient adaptation of a frozen, restoration-capable latent diffusion backbone for degraded IVIF without full fine-tuning, transferring restoration priors for robust fusion. A Cross-Modal Bridging Adapter (CMBA) aligns infrared cues and textual instructions with the frozen diffusion conditioning space and injects them into multi-scale denoising features to steer instruction-guided restoration-aware fusion. In addition, a Trajectory-Constrained Rectifier (TCR) regularizes stochastic sampling via a pixel-latent closed loop, rectifying intermediate predictions with source-referenced structures and re-encoding them to stabilize the denoising trajectory and recover fine details suppressed by latent compression. Experiments across multiple datasets and degradation settings show consistent improvements in restoration quality and fusion fidelity, with strong generalization under complex and compounded degradations.
    Computer VisionMachine learning for visionComputer VisionMultimodal learning
  444. #4742

    CASE-Net: Deep Spatio-Temporal Representation Learning via Causal Attention and Channel Recalibration for Multivariate Time Series Classification

    Fan Zhang, Yating Cui, Hua Wang
    Multivariate time series (MTS) classification is foundational to pervasive computing and financial analysis, yet existing multi-scale paradigms are often constrained by suboptimal representation fidelity. We identify two critical bottlenecks: temporal non-causality in standard encoders that induces temporal confounding in non-stationary dynamics, and the absence of explicit channel saliency mechanisms that allows noise to contaminate the latent space. To address these challenges, we propose the Causal Attention and Spatio-temporal Encoder Network (CASE-Net), an architecture designed for structural manifold pre-conditioning. CASE-Net synergizes a Causal Temporal Encoder, which enforces physical arrow-of-time constraints via masked self-attention and causal convolutions, with an Adaptive Channel Recalibration module functioning as an information bottleneck to suppress detrimental noise. Comprehensive evaluations across six heterogeneous domains demonstrate that CASE-Net establishes new state-of-the-art benchmarks on four tasks, achieving a peak accuracy of 98.6% on the AWR dataset and superior robustness in non-stationary regimes.
    Data MiningMining spatial and/or temporal dataMachine LearningDeep learning architecturesMachine LearningTime series and data streams
  445. #4743

    From Diversity to Uniformity: Cross-modal Time Series Modeling with Dependent Channel Grouping

    Minjun Cao, Hao Miao, Wentao Zhang, Senzhang Wang
    Emerging foundation models have spurred growing interest in task-unspecific time series modeling, which can accommodate data from diverse domains and support various tasks. However, most existing methods still suffer from poor adaptability and generalization across cross-domain time series with varying feature dimensionalities. This study achieves unified time series modeling to transcend task and data boundaries, inspired by the recent advances in visual foundation models. We propose a universal cross-modal Time series modeling method named TimeIG, featuring Tailored Temporal Imaging and Dependent Channel Grouping. Specifically, TimeIG adopts the novel dependent channel grouping method to project heterogeneous time series into hierarchical spaces to identify representative features and learn inter-dependencies across variables. Then, a tailored temporal imaging method is proposed to generate carefully hand-crafted images from time series, facilitating complementary temporal-spatial feature extraction. Finally, a dual-branch module is designed to simultaneously model the sequential and visual data, enabling cross-modal alignment via the proposed Cross-Modal Attention and Dynamic Weighted-Averaging. Extensive experiments on real-world cross-domain time series datasets show that TimeIG consistently outperforms existing SOTA methods. The code of TimeIG is publicly available at https://github.com/missCmj/TimeIG.
    Data MiningMining spatial and/or temporal dataAIData Mining
  446. #4746

    Unrestricted Targeted Deep Hashing Attack via Contrastive Latent Diffusion

    Fan Yang, Chuan Ma, Yuhui Zheng, Xiaobo Shen, Joey Tianyi Zhou
    Deep hashing is widely used for large-scale image retrieval but remains vulnerable to adversarial examples, raising practical security concerns. Existing targeted adversarial attacks on deep hashing typically rely on lp-norm constrained perturbations, which struggle to balance attack effectiveness and imperceptibility, often requiring perceptible noise and limiting their practicality in real-world retrieval scenarios. We propose UTDHA, the first unrestricted targeted attack for deep hashing models using contrastive-guided latent diffusion. UTDHA generates adversarial examples with a latent diffusion model and performs optimization in the latent space rather than the pixel space, enabling semantic manipulation while preserving image naturalness. Through contrastive guidance, the attack pulls adversarial examples toward the target label while pushing them away from non-target labels. Meanwhile, UTDHA enforces structural and perceptual consistency, producing adversarial examples that are both imperceptible and visually natural. Extensive experiments on three benchmarks demonstrate that UTDHA outperforms existing targeted adversarial attack baselines for deep hashing models in both attack effectiveness and imperceptibility.
    Computer VisionImage and video retrievalComputer VisionAdversarial learning, adversarial attack and defense methods
  447. #4754

    PhyRE-Net: A Physics-Anchored Network with Active Spatial Gating for Industrial Infrared Thermal Forecasting

    Shuotong Yang, Haitao Zhang
    Spatiotemporal forecasting of infrared thermal fields is a critical technology for the predictive maintenance of industrial electrical control equipment. However, many existing methods are designed under index-based temporal representations and global regression objectives, which are not well aligned with schedule-driven, non-stationary industrial regimes and the long-tailed spatial distribution of safety-critical hotspots. To address these issues, this paper proposes PhyRE-Net, a physical-time-anchored network architecture featuring active spatial gating. Specifically, we first design the multi-period resolution rotary positional embeddings mechanism within the backbone, which injects absolute temporal features into attention geometry to align predicted phases with non-stationary production schedules. We then devise a reinforcement learning-driven spatial gating mechanism to dynamically modulate feature responses, actively shifting the computational focus from the dominant background to sparse but critical thermal anomalies. Extensive experiments on a large-scale real-world dataset covering seven types of heterogeneous electrical equipment show that PhyRE-Net achieves consistent improvements over strong baselines across all evaluated metrics. The source code is available at https://github.com/YST-10/PhyRE-Net.
    Machine LearningApplicationsMachine LearningTime series and data streamsMachine LearningReinforcement learning
  448. #4758

    Similarity-Guided Structural Matching Learning for Graph Dataset Condensation

    Yiyang Zhang, Yutong Ye, Yingbo Zhou, Nan Zhang, Xiang Lian, Mingsong Chen
    As graph repositories grow in scale and diversity, training Graph Neural Networks (GNNs) becomes computationally demanding. However, existing graph condensation methods often fail to retain the intrinsic structural patterns of the original graphs, which are essential in graph-based learning. Therefore, these methods suffer from limited performance and poor generalization in downstream tasks due to the loss of structural information. To address this, we propose Similarity-guided Structural Matching Learning for Graph Dataset Condensation (SSGDC), which efficiently reduces repository size while maintaining both task performance and structural information. Our approach introduces a similarity-based graph selector to identify high-quality subsets for condensation. The condensed graphs are optimized using a dual-objective loss that combines gradient-matching for task alignment with a metric-learning loss for structural preservation within the selected subset. This ensures that the condensed dataset retains both task-relevant information and the essential relational topology that supports GNN training and enhances generalization. Experiments demonstrate that our method achieves higher accuracy and better structure retention across varying condensation ratios, highlighting the critical role of structural preservation in graph dataset condensation.
    Machine LearningApplications
  449. #4768

    Active Diffusion-Based Inference for Ill-Posed Inverse Problems Under Incomplete Priors

    Jitao Xu, Nobuo Sato, Yaohang Li
    Many scientific and engineering applications require estimating unknown parameters from experimentally observable data -- an inverse problem that is inherently challenging due to nonlinearity, noise, and ill-posedness. In this paper, we propose an active diffusion-based inverse problem solver. A diffusion model is trained to learn the mapping between the parameter space and the observable space. By iteratively detecting and correcting model misspecification through posterior uncertainty, the method discovers and learns the correct region of parameter space, even when initial training bounds exclude the true parameters. This provides a principled, Bayesian justification for adaptive domain augmentation and ensures robust inference for inverse problems under incomplete prior knowledge. We demonstrate the effectiveness of our inverse solver for a toy inverse problem with infinite solutions, and for the parameterization of the quantum correlation functions to event observables in a Quantum Chromodynamics analysis of nucleon structure.
    Machine LearningActive learningMachine LearningBayesian learningMachine LearningGenerative modelsMultidisciplinary Topics and ApplicationsPhysical sciences
  450. #4779

    ReSyn: A Generalized Recursive Regular Expression Synthesis Framework

    Seongmin Kim, Hyunjoon Cheon, Su-Hyeon Kim, Yo-Sub Han, Sang-Ki Ko
    Existing Programming-By-Example (PBE) systems often rely on simplified benchmarks that fail to capture the high structural complexity of real-world regexes, such as deeper nesting and frequent Unions. To overcome the resulting performance drop, we propose ReSyn, a synthesizer-agnostic divide-and-conquer framework that decomposes complex synthesis problems into manageable sub-problems. We also introduce Set2Regex, a parameter-efficient synthesizer capturing the permutation invariance of examples. Experimental results demonstrate that ReSyn significantly boosts accuracy across various synthesizers, and its combination with Set2Regex establishes a new state of the art on a challenging real-world benchmark. The complete source code, datasets, and pre-trained model checkpoints are publicly available at https://github.com/mrseongminkim/ReSyn.
    Knowledge Representation and ReasoningAutomated reasoning and theorem provingKnowledge Representation and ReasoningComputational complexity of reasoningKnowledge Representation and ReasoningKnowledge representation languagesKnowledge Representation and ReasoningLearning and reasoning
  451. #4798

    Trajectory-Consistent Denoising Diffusion Codebook Models for Zero-Shot High-Fidelity Image Compression at Ultra-Low Bitrates

    Fang Zhang, Linli Xu
    Denoising Diffusion Codebook Models (DDCM) have emerged as a promising framework for zero-shot image compression by replacing stochastic sampling with discrete selection from a reproducible Gaussian codebook. By greedily picking noise vectors that best match the target image, DDCM encodes the generative trajectory into a compact sequence of indices for image compression. However, the target-guided noise selection provides only coarse guidance as the discrepancy between the denoiser's output and the target image is substantial. This discrepancy inevitably results in trajectory drift, necessitating massive codebooks to span the enlarged search space and extensive inference steps to gradually compensate for the deviation. In this paper, we propose the Trajectory-Consistent Diffusion Codebook Model (TC-DDCM) to address these inefficiencies. Rather than approximating the distant target, TC-DDCM navigates the generative process by strictly tracking a pre-computed inversion trajectory. This paradigm shift yields two critical advantages: 1) By minimizing the deviation to the proximal reference state along the trajectory, the search space is drastically shrunk, resulting in significantly smaller codebooks. 2) By aligning the generative steps with inversion methods, TC-DDCM achieves optimal reconstruction in a few inference steps. Extensive experiments demonstrate that TC-DDCM significantly outperforms state-of-the-art zero-shot methods in rate-distortion-perception performance at ultra-low bitrates, making zero-shot diffusion-based compression practical.
    Computer VisionEfficiency and OptimizationComputer VisionLow-level VisionComputer VisionRepresentation learning
  452. #4800

    Fairness k-Submodular Maximization Subject to Matroid Constraint

    Tan D. Tran, Canh V. Pham, Phuong N. H. Pham
    Fairness k-submodular maximization has attracted increasing interest due to its broad relevance in artificial intelligence and machine learning. However, most existing work is limited to monotone objectives or simple size constraints, while the non-monotone setting with richer constraints remains largely unexplored. In this paper, we first introduce a constant-ratio approximation algorithm for the problem under a general non-monotone objective function and a matroid constraint. Our approach is built upon a two-stage algorithmic framework. Specifically, we first develop an algorithm that guarantees feasibility with respect to upper fairness bounds only. We then show how this algorithm can be systematically extended to simultaneously enforce fairness bounds, while preserving provable approximation guarantees. Comprehensive experiments on standard benchmark datasets demonstrate that our algorithm achieves high-quality objective values while maintaining a favorable balance between fairness guarantees and query efficiency, consistently outperforming existing state-of-the-art methods.
    Constraint Satisfaction and OptimizationConstraint learning and acquisitionConstraint Satisfaction and OptimizationConstraint optimization problemsConstraint Satisfaction and OptimizationConstraint satisfactionMachine LearningLearning theoryMachine LearningOptimization
  453. #4820

    GPD: Guided Progressive Distillation for Fast and High-Quality Video Generation

    Xiao Liang, Yunzhu Zhang, Linchao Zhu
    Diffusion models have achieved remarkable success in video generation; however, the high computational cost of the denoising process remains a major bottleneck. Existing approaches have shown promise in reducing the number of diffusion steps, but they often suffer from significant quality degradation when applied to video generation. We propose Guided Progressive Distillation (GPD), a framework that accelerates the diffusion process for fast and high-quality video generation. GPD introduces a novel training strategy in which a teacher model progressively guides a student model to operate with larger step sizes. The framework consists of two key components: (1) an online-generated training target that reduces optimization difficulty while improving computational efficiency, and (2) frequency-domain constraints in the latent space that promote the preservation of fine-grained details and temporal dynamics. Applied to the Wan2.1 model, GPD reduces the number of sampling steps from 48 to 6 while maintaining competitive visual quality on VBench. Compared with existing distillation methods, GPD demonstrates clear advantages in both pipeline simplicity and quality preservation.
    Computer VisionImage and video synthesis and generation
  454. #4825

    MGRec: Structure-Grounded Medication Recommendation via Condition-Aware Molecular Representation Learning

    Jinke Feng, Wenjie Du
    Medication recommendation plays a critical role in clinical decision-making by supporting personalized and safe treatment planning. Existing methods rely heavily on historical co-occurrence patterns and primarily optimize discrete prescription prediction objectives, limiting generalization in rare or emerging disease settings. We propose MGRec, a framework that shifts learning from discrete prescription prediction to condition-aware modeling in a continuous molecular representation space. MGRec treats molecular structure as the primary modeling target and employs a Therapeutic--Safety Factorized Conditional Variational Autoencoder to disentangle therapeutic and safety-related factors in a condition-aware molecular latent space. The model infers treatment-relevant molecular representations conditioned on current patient-specific clinical context, which are mapped to clinically approved medications for final recommendation. To improve clinical safety, we further introduce a DDI (drug-drug interaction)-guided latent regularization to integrate drug interaction knowledge at the representation level. Experiments on two real-world benchmarks demonstrate that MGRec achieves state-of-the-art accuracy and reduced interaction risk, particularly in data-sparse scenarios.
    Data MiningRecommender systemsMachine LearningApplicationsMultidisciplinary Topics and ApplicationsHealth and medicine
  455. #4842

    Instance-Aligned Semantic Reconstruction for Incomplete Multi-View Clustering

    Weiqing Yan, Yongteng Du, Peng Song, Chang Tang
    Incomplete multi-view clustering (IMVC) aims to exploit complementary information from multiple views with missing observations. Recent diffusion-based approaches have shown promise for view completion; however, they often fail to capture instance-aligned global correlations across views and suffer from inefficient inference and loosely coupled optimization. In this paper, we propose IASR, an Instance-Aligned Semantic Reconstruction framework for IMVC. IASR formulates missing-view recovery as a cross-view token alignment generator, in which noisy targets, observed views, and timestep embeddings are jointly represented as tokens and interact to capture long-range cross-view dependencies throughout the denoising trajectory. To stabilize representation learning under generative noise, we further introduce a stable–active dual-encoder representation architecture with a generative-adaptive contrastive learning strategy that tightly couples view completion and clustering. Extensive experiments on eight benchmark datasets demonstrate that IASR consistently outperforms state-of-the-art IMVC methods, especially under high missing-rate settings, while achieving improved inference efficiency.
    Machine LearningClusteringMachine LearningDeep learning architecturesMachine LearningMulti-view learning
  456. #4864

    Less Is More: Proportional Memory-Guided Differential-Attention MIL for Whole-Slide Image Classification

    Hongpeng Yang, Yingxin Chen, Shiqiang Ma, Fei Guo
    Whole-slide images (WSIs) provide gigapixel-scale visual evidence for cancer diagnosis, yet diagnostically relevant regions are typically sparse and embedded within large amounts of weakly informative tissue. Existing multiple instance learning methods often aggregate all patches by using attention mechanisms or sequence modeling, leading to redundant computation and limited adaptability to slides with highly variable sizes and lesion burdens. We propose the Proportional Memory-guided Differential-attention Multiple Instance Learning (PMDMIL) network, a scalable MIL framework that follows a “less is more” principle for WSI classification. PMDMIL retains a fixed ratio of patches, pre-filters noise with a class-wise memory bank, and fuses the sparse survivors through differential attention to amplify subtle yet decisive morphological differences. Experiments on five public WSI benchmarks demonstrate that PMDMIL consistently outperforms representative MIL and sequence-based baselines across multiple evaluation metrics. Notably, on the DHMC dataset, our model achieves competitive performance while processing only 10% of patches, indicating that effective instance selection is more important than exhaustive aggregation for WSI classification.
    Computer VisionBiomedical image analysisData MiningOtherKnowledge Representation and ReasoningApplicationsMachine LearningApplicationsSearchOther
  457. #4883

    Continuous Test-Time Adaptation via Dual Alignment

    Boyuan Zhang, Jie Pan, Shuai Yang, Lichuan Gu
    Continual test-time adaptation (CTTA) seeks to adapt a source-pretrained model to continuously shifting target distributions. Recent advances in this field either minimize prediction entropy on streaming target samples, which is efficient but prone to error accumulation, or rely on teacher–student pseudo-labeling, which provides more stable predictions at the cost of high computational overhead. To alleviate these issues, we propose Dual Alignment (D-Align), a method that jointly optimizes correlation alignment and masked consistency alignment for stable and efficient online adaptation. Specifically, correlation alignment constructs a reliable pseudo-source domain by selecting resilient target samples, maintaining their diversity via a dynamic memory bank, and aligning nonlinear feature relations through image-level visual domain prompt–driven alignment, enabling effective feature alignment without costly backpropagation. Masked consistency alignment enforces prediction consistency across multiple masked views of each target sample, promoting robust and discriminative target representations. Experiments on multiple standard benchmarks verify the effectiveness of D-Align under dynamic domain shifts in comparison with state-of-the-art methods.
    Machine LearningLearnware/model reuse/transfer learningMachine LearningRepresentation learning
  458. #4885

    SpatialSV: Internalizing Interpretable 3D Spatial Awareness in MLLMs via Task-Oriented Visual Supervision

    Jiayu Tang, Yuchen Zhou, Chao Gou
    Unlocking the spatial intelligence of multimodal large language models (MLLMs) is crucial for understanding and interacting with the 3D world. Prevailing approaches typically inject spatial priors via external tools, which impose significant inference overhead, or rely on latent feature distillation, which remains uninterpretable and lacks fine-grained geometric constraints. To address these issues, we propose SpatialSV, a framework designed to internalize robust 3D spatial awareness within MLLMs while simultaneously offering inherent interpretability. Deviating from passive feature imitation, SpatialSV employs task-oriented visual supervision, compelling the model to actively lift its 2D visual features into explicit 3D representations, including depth maps, camera poses, and point clouds. Crucially, this 2D-to-3D lifting process provides a transparent window into the model’s representations: the resulting 3D reconstructions serve as an intuitive proxy for visualizing and diagnosing the quality of the model’s intrinsic spatial knowledge. Extensive experiments across multiple benchmarks and models demonstrate the effectiveness of SpatialSV in enhancing and interpreting MLLMs’ spatial intelligence. Furthermore, the framework exhibits strong generalization in semi-supervised settings, validating its potential to leverage unlabeled visual data for scalable, interpretable spatial representation learning.
    Computer Vision3D computer visionComputer VisionMultimodal learningComputer VisionRepresentation learning
  459. #4900

    DivCon-NeRF: Diverse and Consistent Ray Augmentation for Few-Shot NeRF

    Ingyun Lee, Jae Won Jang, Seunghyeon Seo, Nojun Kwak
    Neural Radiance Field (NeRF) has shown remarkable performance in novel view synthesis but requires numerous multiview images, limiting its practicality in few-shot scenarios. Ray augmentation has been proposed to alleviate overfitting caused by sparse training data by generating additional rays. However, existing methods, which generate augmented rays only near the original rays, exhibit pronounced floaters and appearance distortions due to limited viewpoints and inconsistent rays obstructed by nearby obstacles and complex surfaces. To address these problems, we propose DivCon-NeRF, which introduces novel sphere-based ray augmentations to significantly enhance both diversity and consistency. By employing a virtual sphere centered at the predicted surface point, our method generates diverse augmented rays from all 360-degree directions, facilitated by our consistency mask that effectively filters out inconsistent rays. We introduce tailored loss functions that leverage these augmentations, effectively reducing floaters and visual distortions. Consequently, our method outperforms recent few-shot NeRF approaches on the Blender, LLFF, and DTU datasets. Furthermore, DivCon-NeRF demonstrates strong generalizability by effectively integrating with both regularization- and framework-based few-shot NeRFs.
    Computer Vision3D computer vision
  460. #4902

    GeoMind: Explicit Spatial Reasoning via Dual-Reference Geometric Modeling

    Xing Wei, Aoxiang Tian, Shaofan Liu, Jiansheng Peng, Chong Zhao, Xiang Bi, Yang Lu, Benhong Zhang, Fan Yang
    While Vision-Language Models (VLMs) excel at semantic understanding, they struggle to comprehend 3D spatial relationships from limited views. Their reliance on implicit geometric encoding often leads to severe hallucinations and inconsistencies in spatial reasoning tasks. To address this, we introduce GeoMind, a model-then-reason framework that employs a single LLM to autoregressively generate an explicit Geometric Description Language (GDL) map, serving as a grounded context to derive the final answer. This intermediate GDL map provides an explicit and queryable world representation. Leveraging this explicit representation, we enforce a strict referential constraint, compelling the model to ground reasoning solely on the instantiated entities to ensure referential integrity and auditability. Specifically, we lift multi-view observations into object-centric tokens using frozen geometric priors and instance masks. The LLM is trained via a two-stage curriculum with programmatic supervision to generate the GDL map as a prerequisite for answering. On five spatial understanding benchmarks in both image and video settings, GeoMind delivers average accuracy gains of +6.9% (2B) and +9.8% (8B) over Qwen3-VL baselines. Our results suggest that explicit geometric grounding enables robust spatial reasoning without human annotation, providing a scalable and practical route to stronger spatial intelligence in large VLMs.
    Computer VisionVision, language and reasoningHumans and AICognitive modeling
  461. #4922

    Boosting Knowledge Transfer and Retention with Brain-inspired Multi-View Incremental Learning

    Yuhong Chen, Zihan Fang, Huifeng Yin, Yujie Wu, Qi Xu, Lei Deng, Shiping Wang, Mingkun Xu
    Traditional multi-view learning models are primarily designed for static datasets with fixed views. However, in dynamic incremental view environments, this approach inevitably leads to view forgetting, where the introduction of new views weakens previously acquired knowledge. In contrast, the human brain exhibits remarkable memory retention and knowledge transfer capabilities when receiving objects described from different perspectives, with past experience further supplementing new insights. Inspired by underlying neural processing mechanisms, we propose a novel view incremental learning framework named Hebbian View Orthogonal Projection (HVOP). HVOP constructs a knowledge transfer space, where gradient updates are projected onto the orthogonal complement of historical representations, thereby mitigating interference between old and new views. By further incorporating recursive lateral connections and Hebbian learning rules, the proposed model imparts brain-like dynamic adaptability to the learning process, enhancing knowledge transfer and integration, thereby enabling stable knowledge transfer under evolving views. We validate HVOP on node classification tasks, demonstrating its superior performance in both knowledge retention and transfer compared to traditional methods. The results highlight the efficacy of biologically inspired mechanisms in mitigating the view forgetting phenomenon.
    Humans and AICognitive modeling
  462. #4929

    Adaptive TD-Lambda for Cooperative Multi-agent Reinforcement Learning

    Yue Deng, Zirui Wang, Yin Zhang
    TD($\lambda$) in value-based MARL algorithms or the Temporal Difference critic learning in Actor-Critic-based (AC-based) algorithms synergistically integrate elements from Monte-Carlo simulation and Q function bootstrapping via dynamic programming, which effectively addresses the inherent bias-variance trade-off in value estimation. Based on that, some recent works link the adaptive $\lambda$ value to the policy distribution in the single-agent reinforcement learning area. However, because of the large joint action space from multiple agents and the limited transition data in Multi-agent Reinforcement Learning, the policy distribution is infeasible to calculate statistically. To solve the policy distribution calculation problem in MARL settings, we employ a parametric likelihood-free density ratio estimator with two replay buffers instead of calculating statistically. The two replay buffers of different sizes respectively store the historical trajectories that represent the data distribution of the past and current policies. Based on the estimator, we assign Adaptive TD($\lambda$), ATD($\lambda$), values to state-action pairs based on their likelihood under the stationary distribution of the current policy. We apply the proposed method on two competitive baseline methods, QMIX for value-based algorithms, and MAPPO for AC-based algorithms, over SMAC benchmarks and Gfootball academy scenarios, and demonstrate consistently competitive or superior performance compared to other baseline approaches with static $\lambda$ values.
    Agent-based and Multi-agent SystemsCoordination and cooperationAgent-based and Multi-agent SystemsMulti-agent learning
  463. #4942

    Disentangled Hypergraph Network with Implicit Structure Learning for Mobility Social Relationship Inference

    Jingjing Zhu, Xiang Li, Dongliang Chen, Haobing Liu, Yuan Cao, Yanwei Yu
    Inferring social relationships from users' mobile data holds significant value for personalized recommendations. Most methods model user interactions based on co-occurrence records, achieving impressive success in capturing social signals. However, despite these advancements, current techniques still face three primary challenges. First, compressing higher-order co-occurrences into a single representation space easily leads to semantic entanglement. Second, relying on deterministic topology overlooks unobserved homophily. Third, the intrinsic characteristics of the users themselves are frequently ignored. To address these challenges, we propose a Disentangled Hypergraph Network with Implicit Structure Learning for mobility social relationship inference, named DHISL. Specifically, it first captures individual spatiotemporal characteristics through feature-guided representation initialization, and then adopts a dual-branch framework: the explicit branch models social and spatiotemporal preference signals, while the implicit branch adaptively learns a sparse channel-wise relational structure via differentiable structure learning. Furthermore, we introduce dual constraints in a multi-channel disentangled space to achieve intra-channel consistency and inter-channel exclusivity. Experiments on four real-world datasets demonstrate that DHISL outperforms state-of-the-art models, achieving average improvements of 6.34% in ROCAUC and 6.39% in PRAUC. The source code is available at https://github.com/Tilamisu-zz/DHISL.
    Data MiningMining graphsData MiningMining spatial and/or temporal data
  464. #4944

    Learning Multi-Indicator Weights for Data Selection: A Joint Task-Model Adaptation Framework with Efficient Proxies

    Jingze Song, Zihao Chen, Wenqing Chen, Zibin Zheng
    Data selection is a key component of efficient instruction tuning for large language models, as recent work has shown that data quality often matters more than data quantity. Accordingly, prior studies have introduced various multi-dimensional heuristics to evaluate and filter instruction data. However, most existing methods rely on static task-agnostic and model-agnostic weighting schemes, which overlook the varying requirements of specific downstream tasks and the differing pre-existing capabilities of models. In this paper, we propose a framework for learning multi-indicator weights that jointly adapts data selection to both the downstream task and the specific model. Our method identifies optimal weight configurations without full-scale fine-tuning by utilizing in-context learning (ICL) signals on compact tiny-validation sets. These signals serve as efficient performance proxies that ensure high-fidelity evaluation at minimal computational cost. Experiments across multiple benchmarks and model families, including Mistral, Qwen, and Llama, show that the approach achieves performance comparable to or exceeding full-dataset tuning while using only 30% of the training samples on GSM8K. Furthermore, our analysis reveals a trade-off between semantic diversity and logical complexity in reasoning tasks, highlighting the necessity of joint task-model adaptation.
    AINatural Language ProcessingMachine LearningOptimizationNatural Language ProcessingLanguage modelsNatural Language ProcessingQuestion answeringNatural Language ProcessingResources and evaluation
  465. #4958

    Simulated Ignorance Fails: A Systematic Study of LLM Behaviors on Forecasting Problems Before Model Knowledge Cutoff

    Zehan Li, Yuxuan Wang, Ali El Lahib, Ying-Jieh Xia, Frederick Pi
    Evaluating LLM forecasting capabilities is constrained by a fundamental tension: prospective evaluation offers methodological rigor but prohibitive latency, while retrospective forecasting (RF)—evaluating on already-resolved events—faces rapidly shrinking clean evaluation data as SOTA models possess increasingly recent knowledge cutoffs. Simulated Ignorance (SI), prompting models to suppress pre-cutoff knowledge, has emerged as a potential solution. We provide the first systematic test of whether SI can approximate True Ignorance (TI). Across 470 competition-level questions and 9 models, we find that SI fails systematically: (1) cutoff instructions leave a 52% performance gap between SI and TI; (2) chain-of-thought reasoning fails to suppress prior knowledge, even when reasoning traces contain no explicit post-cutoff references; (3) reasoning-optimized models exhibit worse SI fidelity despite superior reasoning trace quality. These findings demonstrate that prompts cannot reliably "rewind" model knowledge. We conclude that RF on pre-cutoff events is methodologically flawed; we recommend against using SI-based retrospective setups to benchmark forecasting capabilities.
    Natural Language ProcessingLanguage modelsMachine LearningEvaluationMachine LearningBenchmarksNatural Language ProcessingInterpretability and analysis of models for NLPAI Ethics, Trust, FairnessTrustworthy AI
  466. #4963

    Sentiment-aware Rating-based Recommendation via Semantic-enhanced Item Alignment

    Yingjie Chen, Xiang Li, Dongliang Chen, Guoqing Chao, Zhongying Zhao, Yanwei Yu
    Leveraging review texts to mine deep user preferences is vital for recommendation. However, existing methods neglect the positive-negative counteraction and rely on noisy hard sentiment thresholds. Furthermore, the feature density asymmetry causes dense semantic features to overwhelm sparse collaborative signals. To address these issues, we propose the SENSE model for fine-grained sentiment-aware and structurally aligned recommendation. We employ a confidence-aware soft sentiment assignment and a sentiment exchange mechanism to quantify fine-grained preferences and explicitly model sentiment interaction. For feature fusion, we construct a collaborative-semantic graph and incorporate adaptive data augmentation. Finally, via anchor-guided cross-view alignment, we use the reliable behavior view as a topological anchor to force semantic alignment, thereby preserving semantic richness while preventing collaborative signals from being overwhelmed.
    Through experiments on four datasets, we verify that SENSE outperforms state-of-the-art methods. It achieves average improvements of 19.26% in MSE for rating prediction, 13.41% in Recall@10 and 13.89% in NDCG@10 for Top-N recommendation. Our source code is available at https://github.com/yingjie160/SENSE.
    Data MiningRecommender systems
  467. #4966

    H²SCAN: Adaptive Time Series Representation Learning via Heterogeneous Hypergraph Structure-aware Contrasts

    Biao Chen, Zijie Tang, Junhua Fang, Feng Lu, Lang Zhang, Pengpeng Zhao
    Learning universal representations for time series is fundamental for diverse downstream tasks. However, current approaches largely rely on handcrafted data augmentations, which may distort intrinsic temporal dynamics and structural regularities. In addition, most static representation learning frameworks struggle to cope with the non-stationary nature of real-world time series. To address these issues, we propose Heterogeneous Hypergraph Structure-aware Contrastive Adaptive Network (H²SCAN), a novel augmentation-free framework that derives contrastive supervision directly from graph topology. Specifically, H²SCAN constructs a heterogeneous hypergraph with three node types to capture multi-scale temporal characteristics. Building upon this representation, a meta-adaptation network is introduced to dynamically reweight heterogeneous hyperedges, enabling the model to adapt to distribution shifts in real-time. Finally, a structure-aware contrastive learning objective is employed to align latent representation similarity with the intrinsic hypergraph topology. Experiments on multiple benchmarks and cloud Kafka cluster datasets demonstrate that H²SCAN outperforms existing methods by modeling high-order multi-domain dependencies and preserving the semantic integrity of time series data.
    Data MiningMining heterogenous dataData MiningMining spatial and/or temporal dataData MiningParallel, distributed and cloud-based high performance miningKnowledge Representation and ReasoningQualitative, geometric, spatial, and temporal reasoningMachine LearningSelf-supervised Learning
  468. #4972

    Revisiting Hypernetwork in Model Heterogeneous Personalized Federated Learning

    Chen Zhang, Husheng Li, Xiang Liu, Linshan Jiang, Danxin Wang
    Recent personalized federated learning research focuses on heterogeneous models across clients. However, existing methods often rely on external data, model decoupling, and partial learning, which makes them sensitive to settings. In contrast, we revisit hypernetworks and leverage their strong generalization ability to propose the first practical method for personalized federated learning. We first propose a model heterogeneous personalized federated learning framework based on hypernetworks, MH-pFedHN, which quantifies clients with different architectures using customized embedding vectors and then generates client-specific model parameters through a server-side hypernetwork. Besides the shared feature extractor, our hypernetwork consists of multiple heads, where clients with similar numbers of parameters are assigned the same number of customized embedding vectors and consequently share the same head. This design enables knowledge sharing across different architectures and reduces the computation of parameter generation. To further enhance the hypernetwork’s learning and generalization, we propose MH-pFedHNGD, which introduces a lightweight yet effective plug-in global model. Our framework requires no external data and does not disclose client model architectures, thereby effectively ensuring security and demonstrating great potential. Experiments across various models and tasks demonstrate that our approach outperforms standard baselines and exhibits strong generalization performance. Our code is available at https://github.com/DangDang1895/MH-pFL
    Machine LearningFederated learning
  469. #4987

    ADC-GNN: Adaptive Dual-level Collaborative Graph Neural Networks for Graph Classification

    Wan Tang, Lu Bai, Lixin Cui, Ming Li, Hangyuan Du, Jing Li
    Most existing Graph Neural Networks (GNNs) rely on node-level message passing or attention mechanisms to propagate and extract useful information. Although recent advances attempt to move beyond purely node-level propagation by constructing high-level representations, these approaches are often constrained by pre-computed substructures or unidirectional bottom-up aggregations. Consequently, high-level structural semantics cannot effectively feed back to guide node representation learning, limiting the collaborative optimization between fine-grained features and macroscopic structural semantics. To address these limitations, we propose a novel Adaptive Dual-level Collaborative GNN (ADC-GNN) associated with an adaptive dual-level collaborative mechanism. We commence by introducing a set of global, learnable latent prototypes as high-level semantic references, and then employ a relaxed Sinkhorn algorithm to establish differentiable, non-collapsing assignments between nodes and prototypes. Based on these assignments, the ADC-GNN constructs high-level representations and enables interactions among them. We show that the ADC-GNN can inject the learned high-level information back into the node level, forming a closed-loop, bidirectional optimization process. Experiments demonstrate the superior performance of the proposed ADC-GNN on graph classification.
    Data MiningMining graphs
  470. #4991

    A Unified Spectral-Spatial Framework for GNNs: Balancing Over-Smoothing and Over-Squashing

    Xinya Qin, Lu Bai, Lixin Cui, Ming Li, Hangyuan Du, Jing Li
    Over-smoothing (OSM) and over-squashing (OSQ) are two fundamental phenomena that limit the performance of Graph Neural Networks (GNNs), yet a unified spectral-spatial understanding of these phenomena remains underexplored. In this paper, we adopt polynomial spectral filters as an analytical tool to establish a unified spectral-spatial framework for graph convolution and systematically characterize the effect of the polynomial order k on information propagation in GNNs. Within this framework, we reveal an intrinsic trade-off induced by the polynomial order. Specifically, higher-order filters enhance spectral expressiveness and alleviate OSM caused by the dominant low-frequency components. However, they also expand the spatial receptive field, thereby intensifying information compression and increasing the risk of OSQ. Based on this analysis, we provide a principled guideline for selecting the polynomial order and propose a Quadratic Spectral Graph Convolution Network (QS-GCN) for graph classification. Experiments demonstrate the effectiveness and robustness of the proposed method.
    Data MiningMining graphs
  471. #4992

    StreamMTS: Towards Streaming Multivariate Time Series Forecasting

    Binwu Wang, Jiaming Ma, Yudong Zhang, Pengkun Wang, Zhengyang Zhou, Xu Wang, Yang Wang
    Current mainstream research in multivariate time series (MTS) prediction often assumes that all data is static. However, real-world MTS data typically arrives continuously in a streaming manner, which we refer to as streaming MTS. The statistical characteristics and spatiotemporal graph topology of this data evolve over time, presenting two key challenges: the model's ability to adapt to new data distributions and the enhancement of cross-domain generalization capabilities. In this paper, we propose a streaming MTS prediction framework. We begin by designing a lightweight spatiotemporal causal learning model that captures generalizable causal spatiotemporal features from a decoupling perspective. Next, we introduce a framework to enhance the model's streaming learning capability, leveraging the adaptability of continual learning while strengthening cross-domain representation abilities. Specifically, we reformulate continual learning as a multi-task learning problem and present a multi-task optimization algorithm that identifies a set of Pareto-optimal solutions to address the inherent stability-plasticity dilemma in continual learning. Finally, we propose a topology-aware feature propagation strategy that disseminates well-trained node embedding features to unseen graph structures, thereby improving the model's cross-domain generalization. Results on real-world datasets demonstrate that our model achieves a 14.40% improvement in performance, along with 18× and 51× enhancements in efficiency and memory usage, respectively.
    Data MiningMining spatial and/or temporal data
  472. #5019

    Safety-Aware Shared Autonomy via World-Model Constrained Planning

    Zechen Shi, Jiawei Ye, Zeyang Liu, Xingyu Chen, Xuguang Lan
    Safety-aware shared autonomy aims to enable an autonomous agent to collaborate with a human operator, completing tasks under safety constraints while maximally preserving human intent. However, existing methods often underperform in long-horizon tasks that require balancing task performance under safety constraints with the degree of intervention—particularly when accurately predicting future unsafe events is critical. This limitation largely stems from their reliance on the current state alone to decide when to intervene and how to modify actions. We propose World Model Assisted Safety Planning (WASP), a model-predictive shared autonomy framework for explicit safety constraint satisfaction. We first formulate a safety-aware world model, and leverage its online predictive reasoning to decide when to intervene by jointly assessing prospective safety violations and degradation in task performance, thereby intervening only when necessary while enforcing safety constraints at decision time. Once intervention is triggered, WASP plans a short-horizon residual correction using world model rollouts, filters out unsafe candidates, and executes the least-deviating correction among the remaining high-return options in a receding-horizon loop. Experiments across diverse vision-based safety-critical domains show that WASP substantially reduces safety violations while preserving task performance and reducing unnecessary interventions over prior shared autonomy baselines.
    Humans and AIHuman-AI collaborationHumans and AIHuman-computer interactionMachine LearningApplications
  473. #5040

    RM-Distiller: Exploiting Generative LLM for Reward Model Distillation

    Hongli Zhou, Hui Huang, Wei Liu, Chenglong Wang, Xingyuan Bu, Lvyuan Han, Fuhai Song, Muyun Yang, Wenhao Jiang, Hailong Cao, Tiejun Zhao
    Reward models (RMs) play a pivotal role in aligning large language models (LLMs) with human preferences. Due to the difficulty of obtaining high-quality human preference annotations, distilling preferences from generative LLMs has emerged as a standard practice. However, existing approaches predominantly treat teacher models as simple binary annotators, failing to fully exploit the rich knowledge and capabilities for RM distillation. To address this, we propose RM-Distiller, a framework designed to systematically exploit the multifaceted capabilities of teacher LLMs: (1) Refinement capability, which synthesizes highly correlated response pairs to create fine-grained and contrastive signals. (2) Scoring capability, which guides the RM in capturing precise preference strength via a margin-aware optimization objective. (3) Generation capability, which incorporates the teacher's generative distribution to regularize the RM to preserve its fundamental linguistic knowledge. Extensive experiments demonstrate that RM-Distiller significantly outperforms traditional distillation methods both on RM benchmarks and reinforcement learning-based alignment, proving that exploiting multifaceted teacher capabilities is critical for effective reward modeling. To the best of our knowledge, this is the first systematic research on RM distillation from generative LLMs.
    Natural Language ProcessingLanguage models
  474. #5046

    SMLDR: Spectral Memory Learner with Dual-Retrieval for Time Series Forecasting

    Zhenxin Li, Longquan Liao, Wenchang Zhang, Jiaying Zhang, Kaiwen Wei, Jiang Zhong, Linjiang Zheng
    Time series forecasting aims to predict future values using historical observations, which is crucial for many practical applications with complex temporal dynamics. Recent frequency-domain forecasting methods have utilized spectral representations to model periodicity, but they usually rely on an implicit assumption: periodic components and aperiodic residuals are orthogonal in the frequency domain. However, in practice, this assumption often does not hold due to issues such as spectral leakage, superposition of multiple periods, and gradual periodic changes. This paper re-examines frequency-domain modeling from the perspective of spectral non-identifiability and proposes a Spectral Memory Learner with Dual Retrieval (SMLDR) for time series forecasting. Instead of modeling periodicity in the frequency dimension, SMLDR represents periodic components as reusable spectral prototypes stored in learnable memory. The dual retrieval mechanism extracts historical periodic components and predicts them in the frequency domain, while a lightweight time-domain branch is used to model aperiodic dynamics to predict future residuals. Extensive experiments on multiple time series forecasting benchmarks show that SMLDR consistently achieves excellent performance, and the learnable memory can capture complex periodicity.
    Machine LearningTime series and data streamsMachine LearningStructured predictionAIData Mining
  475. #5058

    Efficient Parallel Algorithms with Linear Queries for Non-Monotone Submodular Maximization

    Canh V. Pham, Tan D. Tran, Dung T. K. Ha, My T. Thai
    In this work, we propose the first constant-approximation algorithms,
    $\mathsf{LinAst}$ and $\mathsf{LinAtg}$, which simultaneously achieve optimal
    query complexity $O(n)$ and adaptive complexity $O(\log n)$ for non-monotone
    submodular maximization under a cardinality constraint $k$ over a ground set of
    size $n$. Specifically, compared with existing algorithms that attain the best
    known adaptive complexity of $O(\log n)$, our approach preserves this adaptivity
    while reducing the query complexity from $O(n \log k)$ to $O(n)$ and improving
    the approximation ratio from $0.172-\epsilon$ to $0.193-\epsilon$.
    Our algorithms are built upon $\mathsf{LinAdapt}$, which achieves a constant
    approximation ratio with $O(\log n)$ adaptive rounds and linear query
    complexity by requiring only $O(1)$ candidate guesses of the optimal value.
    We further introduce the $\mathsf{BoostAdapt}$ algorithm, which improves the
    approximation guarantee to $0.25-\epsilon$ with $O(\log n \log k)$ adaptive
    complexity and $O(n \log k)$ query complexity, based on a novel staggered greedy
    threshold framework that alternately constructs two disjoint solution sets over
    $O(\log k)$ sequential rounds. Extensive experiments on standard benchmark
    datasets demonstrate that our algorithms consistently outperform
    state-of-the-art methods in terms of solution quality, query complexity, and running time.
    Constraint Satisfaction and OptimizationConstraint optimization problemsConstraint Satisfaction and OptimizationConstraint satisfactionMachine LearningOptimizationMachine LearningTheory of deep learning
  476. #5060

    FairTCD: Dual-Teacher Temporal Contrastive Distillation for Twofold Fair Dynamic Graph Embedding

    Yuxuan Gu, Yicong Li, Weiwei Yuan, Tianzi Zang, Donghai Guan, Jason J. Jung, Jie Zhao, Yongbo Ma
    Fair dynamic graph embedding is crucial for real-world systems, such as recommendation and social networks. Prior studies impose a single-axis fairness formulation, treating attribute and structural bias as separable artifacts. This overlooks their coupling relationship, under which debiasing along one axis can induce cross-axis amplification. This coupling further introduces opposing gradient constraints under joint optimization, leading to optimization conflicts. Furthermore, the evolution of dynamic graphs causes shifts in bias distribution, leading to unstable optimization and exacerbating these conflicts. To address these issues, we propose FairTCD, a Fair Dual-Teacher Temporal Contrastive Distillation framework. FairTCD employs two adversarial fairness teachers to decouple attribute and structural fairness representations. To reconcile dynamic conflicts between two fairness objectives, we introduce a temporal contrastive distillation to induce consistency between attribute and structural fairness representations across time while retaining temporal semantics. A unified student model distills complementary knowledge from both teachers to achieve twofold fairness. Experiments on three real-world benchmarks demonstrate that FairTCD preserves performance while improving twofold fairness metrics by at least 4.53%.
    AI Ethics, Trust, FairnessBiasData MiningMining graphs
  477. #5068

    Self-Improving Autonomous Vehicles via Real-World Reinforcement Learning

    Daehyeok Kwon, Seung-Woo Seo, Sang-Hyun Lee
    End-to-end autonomous driving systems have demonstrated advantages over traditional modular systems. Despite this progress, these end-to-end systems still struggle to be deployed in real-world driving environments, as they inevitably encounter undertrained scenarios in which autonomous vehicles may take unsafe actions. Reinforcement Learning (RL) provides a theoretical framework for addressing this challenge by enabling autonomous vehicles to self-improve: continuously collecting additional scenarios and learning from them. However, training autonomous vehicles with RL is not straightforward in the real world. Collecting real-world driving data involves costly interactions with the environment, and significant human intervention is required both to prevent autonomous vehicles from entering unsafe states and to reset them for subsequent episodes. In this paper, we introduce a novel real-world RL algorithm that allows autonomous vehicles to collect informative scenarios and learn from them with minimal human intervention. Our algorithm considers the learning progress of autonomous vehicles to identify informative scenarios and abort episodes before they enter unsafe states. To evaluate our algorithm, we introduce challenging urban driving tasks that require autonomous vehicles to reset themselves to initial states. The experimental results show that our real-world RL algorithm outperforms baselines with much less human intervention.
    Machine LearningReinforcement learningMultidisciplinary Topics and ApplicationsTransportationRoboticsLearning in robotics
  478. #5071

    Fin-PRM: A Domain-Specialized Process Reward Model for Financial Reasoning in Large Language Models

    Jie Zhu, Yuanchen Zhou, Shuo Jiang, Junhui Li, Lifan Guo, Feng Chen, Chi Zhang
    Process Reward Models (PRMs) supervise intermediate reasoning steps in large language models (LLMs), but existing PRMs are mainly trained on general-domain data and struggle with the structured, symbolic, and fact-sensitive nature of financial reasoning. Financial tasks require not only correct final answers but also verifiable intermediate steps grounded in domain knowledge. In this paper, we propose Fin-PRM, a domain-specialized, trajectory-aware PRM for financial reasoning that jointly models step-level correctness and trajectory-level coherence, producing binary supervision signals for both local and global reasoning quality. To support reliable supervision, we construct a high-quality financial reasoning dataset of 3K trajectories, where step- and trajectory-level labels are automatically derived from multi-source reward signals, including Monte Carlo rollouts, LLM-based evaluation, and explicit financial knowledge verification. Fin-PRM defines a unified ranking score that integrates step- and trajectory-level rewards, enabling consistent use across multiple settings. We evaluate Fin-PRM in three scenarios: (1) offline trajectory selection for supervised fine-tuning, (2) reward-guided Best-of-N inference for test-time scaling, and (3) process-aware reward shaping for reinforcement learning. Experiments on financial reasoning benchmarks, including CFLUE and FinQA, show that Fin-PRM consistently outperforms general-purpose PRMs and strong baselines. Our project resources will be available at https://github.com/aliyun/qwen-dianjin.
    Natural Language ProcessingLanguage modelsNatural Language ProcessingOtherNatural Language ProcessingResources and evaluation
  479. #5073

    LoCo: Low-Rank Compositional Rotation Fine-Tuning

    An Nguyen, Jaesik Choi, Anh Tong
    Parameter-efficient fine-tuning (PEFT) has emerged as a critical technique for adapting large-scale foundation models across natural language processing and computer vision. While existing methods such as low-rank adaptations achieve parameter efficiency via low-rank weight updates, they are limited in their ability to preserve the geometric structure of pretrained representations. We introduce Low-rank Compositional Orthogonal fine-tuning (LoCO), a novel PEFT method that constructs orthogonal transformations through low-rank skew-symmetric matrices and compositional rotation chains. We propose an approximation scheme that enables fully parallel computation of compositional rotations, making the approach practical for high-dimensional feature spaces. Our method keeps computational complexity low while maintaining orthogonality with controlled approximation error. We validate LoCO across diverse domains, including diffusion transformer fine-tuning, vision transformer adaptation, and language model adaptation. Our approach demonstrates superior or competitive performance compared to both existing orthogonal and non-orthogonal baselines.
    Machine LearningGenerative modelsMachine LearningLearnware/model reuse/transfer learningNatural Language ProcessingLanguage generation
  480. #5074

    SCOPE: Safety-Constrained Online Preview Enforcement for Efficient Encirclement in Multi-UAV Pursuit-Evasion

    Cheng Chen, Weiwei Yuan, Xiaozhen Lu, Yicong Li, Jiale Zhang
    Unmanned aerial vehicle swarms in pursuit-evasion require encirclement efficiency while maintaining safety constraints, facing a critical safety-efficiency trade-off. Existing safe multi-agent reinforcement learning (MARL) methods often yield either unsafe task policies or conservative policies. This is challenging since forward feasible safety under dense interactions requires online safety preview. To address this issue, we propose Safety-Constrained Online Preview Enforcement (SCOPE), a MARL-based algorithm that balances encirclement efficiency and safety by short-horizon preview and safety enforcement. SCOPE learns an online safety-preview dynamics model that rolls out future trajectories to inform the checking of encirclement decisions. Using a preview-fused actor-critic, SCOPE exploits short-horizon previews for efficient encirclement and less unsafe behavior. Hierarchical safety enforcement performs safety look-ahead and online action correction to maintain forward feasible safety. Experiments show that SCOPE better balances encirclement efficiency and safety than safe MARL baselines, maintaining average encirclement time while reducing the average agent cost by 65.6%. Code is available at https://github.com/98177qdn/SCOPE.
    Agent-based and Multi-agent SystemsMulti-agent learningAI Ethics, Trust, FairnessSafety and robustnessMachine LearningMultiagent Reinforcement LearningRoboticsMulti-robot systems
  481. #5088

    CasUGC: Aligning User-Generated Comment Evolution with Cascade Dynamics for Popularity Prediction

    Yifan Meng, Xigang Sun, Anran Zhang, Jiaqi Jiang, Jiahui Jin
    Accurately predicting the popularity of information cascades facilitates the development of social media network applications. During the cascade propagation process, user-generated comments continually evolve, thereby substantially affecting overall information popularity.
    However, existing methods primarily focus on learning structural–temporal cascade dynamics while neglecting the modeling of user-generated comment evolution, leading to suboptimal predictions.
    In this paper, we propose a novel framework, CasUGC, that aligns Cascade dynamics with User-Generated Comment evolution for popularity prediction.
    Specifically, we develop a dual-granularity alignment strategy that bridges the representation gap between comment semantics and structural–temporal dynamics at both the user and cascade levels. Building on this alignment, we further design a dynamic cascade feature generation module to produce temporally enhanced cascade embeddings.
    To further address the inherent uncertainties in both alignment and prediction, we introduce two diffusion-based components: a cascade-level diffusion network that learns a latent distribution mapping between semantic and structural–temporal representations, and a second diffusion process that captures the stochasticity of propagation dynamics for robust prediction.
    Extensive experiments conducted on four real-world social media datasets demonstrate the superiority of CasUGC over state-of-the-art methods.
    Multidisciplinary Topics and ApplicationsWeb and social networks
  482. #5101

    Proactive Federated Unlearning: Sensitivity-Guided Sparse Adaptation on Key Layers

    Jinshan Lai, Fengchun Zhang, Yunyuan Wang, Dongfen Li, Yang Zhang, Xiong Li, Ruijin Wang
    Driven by privacy regulations, federated unlearning (FU) aims to remove the influence of specific clients or samples from a trained federated model, approximating the behavior of retraining from scratch without the target data. However, existing FU methods are largely reactive: retraining-based solutions are accurate but prohibitively expensive, while parameter-based approaches are more efficient yet may cause irreversible knowledge damage and catastrophic forgetting. We introduce PFU-SKLA, a proactive FU framework that endows models with built-in forgettability. During pretraining, we perform Orthogonality-guided Representation Disentanglement (ORD) to learn a robust, disentangled feature space that reduces cross-client/class interference. Additionally, we use Dynamic Sensitivity-based Key Layer Identification (DS-KLI) to identify sensitive layers that hold target knowledge. During unlearning, we propose a sparse, lightweight adaptation strategy that precisely erases target knowledge by inserting sparse adapters into the identified critical layers while preserving non-target knowledge by freezing the backbone. Extensive experiments across multiple datasets and three standard unlearning settings show that PFU-SKLA consistently approaches retraining-from-scratch performance, while substantially reducing communication and computation costs compared to state-of-the-art (SOTA) FU methods.
    Machine LearningFederated learning
  483. #5110

    Listen and Count: Expanding the Frontier of Zero-Shot Object Counting to Sound-centric Counting

    Mingjie Wang, Zhuohang Li, Qi Zhang, Jun Zhou, Zili Yi, Minglun Gong
    While class-agnostic object counting has recently evolved from image-exemplar to language-guided paradigms, existing methods are limited by text polysemy and the lack of prompts in audio-sensing scenarios. To overcome these challenges, we introduce a sound-centric counting paradigm, enabling models to "listen and count" using the vivid and discriminative signatures of sound cues. We propose S2ICount, which achieves seamless, fine-grained audio-visual alignment for sound-guided counting. At its core, SoundMamba leverages linear-complexity long-range semantic modelling to align cross-modal features, while the Sound Calibration Module designs a circular scanning mechanism to progressively reconcile modality discrepancies using multi-scale cues from a Stable Diffusion backbone. To enhance saliency awareness, a Multi-tier Hybrid Optimization strategy enforces consistency across magnitude, semantics, and distribution. In addition, we introduce SoundCount, the first large-scale sound-oriented counting dataset, comprising over 2,000 scenes and 1,650 audio sources across 66 object categories. Extensive experiments demonstrate that S2ICount achieves state-of-the-art performance, validating the effectiveness of sound as a robust guidance modality for counting. The dataset, demo and implementation code are made publicly accessible at https://github.com/ZSTU-CV-Lab/S2ICount.
    Computer VisionApplications and SystemsComputer VisionMultimodal learningComputer VisionScene analysis and understanding
  484. #5111

    Towards Efficient and Effective Unimodal Trajectory Representation Learning: A Simple Yet Powerful Approach

    Shaoxuan Gu, Xingyu Zhao, Dongliang Chen, Yuan Cao, Yanwei Yu
    Trajectory representation learning transforms trajectory data into low-dimensional embeddings for downstream analytics. Although trajectory data inherently contains rich spatiotemporal information that remains to be more deeply explored, recent approaches have increasingly favored integrating external multimodal information rather than deeply mining internal trajectory patterns, resulting in the underutilization of the significant potential inherent in spatiotemporal features. To address these issues, we propose SimTRL, a novel unimodal framework specifically designed for road network-based trajectory data. Specifically, we design a bi-directional mamba-traj encoder that leverages the linear computational efficiency of State Space Models to capture global non-causal topological dependencies, overcoming the limitations of traditional causal models. Furthermore, we introduce a context-aware time encoder mechanism to explicitly model dynamic time embeddings. Extensive experiments conducted on two real-world datasets demonstrate that our model achieves average performance gains of 5.19%, 13.91%, 7.2%, and 2.69% across four tasks, respectively. Compared to multimodal models, our approach achieves an average improvement of 9.06% across all evaluation metrics, with a pre-training speedup of 11.2×. The source code of our model is available at https://github.com/sxgu1/SimTRL.
    Data MiningMining spatial and/or temporal data
  485. #5112

    Empowering Precise Embodied Agents with Executable Analytic Concepts as Semantic-Physical Blueprints

    Mingyang Sun, Jiude Wei, Qichen He, Donglin Wang, Cewu Lu, Jianhua Sun
    A core challenge for embodied agents is the "semantic-to-physical gap", the difficulty of mapping symbolic reasoning to precise execution. While Vision-Language Models (VLMs) enhance agent task planning, they often fail in problem classes requiring accurate alignment between functional geometry and physical constraints, such as articulated object manipulation or precision assembly. To address these challenges, we propose GRACE, an agent framework that adopts Executable Analytic Concepts (EAC) as a core knowledge annotation paradigm for object understanding. Under this paradigm, the agent does not merely perceive objects as unstructured visual data; instead, it interprets and annotates them as "Semantic-Physical Blueprints", which provide a structured cognitive representation that encodes geometric primitives, mechanical affordances, and manipulation constraints. Central to our framework is a policy scaffolding mechanism, which allows the agent to dynamically ground VLM-based insights into these instantiated blueprints. This enables the agent to model and resolve complex decision-making tasks involving parameterized object descriptions and constrained motion planning. Experimental results demonstrate that agents utilizing this paradigm achieve high robustness in complex manipulation tasks. Furthermore, we demonstrate that GRACE supports automated concept discovery, offering a scalable and modular solution for high-precision embodied intelligence.
    Agent-based and Multi-agent SystemsAgent theories and modelsAgent-based and Multi-agent SystemsApplicationsAgent-based and Multi-agent SystemsEngineering methods, platforms, languages and toolsRoboticsCognitive robotics
  486. #5144

    Preference-Guided Multi-Policy Optimization for Flexible Job Shop Scheduling

    Inguk Choi, Woo-Jin Shin, Sang-Hyun Cho, Hyun-Jung Kim
    Under the shift toward Industry 4.0, mass-customized manufacturing systems have introduced complex scheduling problems, such as the Flexible Job Shop Scheduling Problem (FJSSP). Recent Deep Reinforcement Learning (DRL)-based heuristics have shown promise, yet existing methods often suffer from two key limitations: they typically rely on single-policy optimization, which limits exploration, and on imprecise reward functions, which fail to accurately reflect decision quality. To address these challenges simultaneously, we propose PGMPO (Preference-Guided Multi-Policy Optimization), a novel learning framework consisting of (1) a simple but effective multi-policy modeling approach that allows a single network to represent multiple decision-makers, and (2) a preference-driven model optimization method that effectively guides policies to learn diverse and specialized problem-solving strategies without the need for explicit reward functions. Experimental results demonstrate that PGMPO substantially boosts the performance of existing neural solvers across several benchmarks.
    Planning and SchedulingLearning in planning and schedulingAIPlanning and SchedulingPlanning and SchedulingSchedulingPlanning and SchedulingSearch in planning and scheduling
  487. #5157

    Towards Fair Graph Learning Without Demographic Supervision

    Zichong Wang, Zhipeng Yin, Mo Sha, Xiaofeng Gao, Xiaoli Li, Wenbin Zhang
    Graph Neural Networks (GNNs) have demonstrated strong predictive performance across a wide range of applications. However, their increasing deployment has raised critical fairness concerns, as these models can inherit and amplify existing biases. Most existing fairness approaches rely on explicit demographic information, either directly available or inferred, to measure and mitigate bias. In real-world settings, however, such information is often unavailable or legally prohibited to infer due to privacy concerns, legal restrictions, or regulatory constraints, which substantially limits the applicability of these methods. To address this challenge, we propose Demographic-Independent Fair Graph Learning (DIFGL), a novel framework for fair graph learning without demographic supervision. DIFGL mitigates group unfairness by minimizing disparities in individual treatment across implicitly identified subgroups, thereby enforcing fairness without requiring explicit demographic information. Extensive experiments on benchmark datasets demonstrate that DIFGL achieves significant improvements in fairness while maintaining competitive predictive performance.
    AI Ethics, Trust, FairnessAI and law, governance, regulationAI Ethics, Trust, FairnessEthical, legal and societal issuesAI Ethics, Trust, FairnessFairness and diversity
  488. #5195

    SeGO: Sensitivity-Aware Golden Optimization for Large-Scale VLM Quantization

    Tianqi Zhao, Xinrui Cheng, Yang Su, Weiyi Lu, Zhaodong Zhang, Zhongjie Wang, Ruihan Hu
    The deployment of Vision-Language Models (VLMs) faces memory and computational bottlenecks because of their massive parameter counts and intensive computations. While Post-Training Quantization (PTQ) can reduce these costs, existing methods often overlook the heterogeneity of multimodal input when applied to VLMs, leading to quantization accuracy degradation. In this paper, we categorize the Transformer linear layers into Expansion Space (responsible for feature expansion) and Projection Space (responsible for feature aggregation) and reveal a cross-modal structural sensitivity asymmetry in VLMs: the Expansion Space is more sensitive to quantization than the Projection Space in both LLM and ViT encoders. Based on this, we propose SeGO, a unified structural sensitivity-aware sparse optimization framework. First, it constructs a search interval for scaling factors using channel-wise statistics of LLM and ViT weights to exclude invalid solutions. Then it applies scaling protection only to the sensitive Expansion Space while directly quantizing the robust Projection Space. Moreover, SeGO uses a Golden-Section Solver to speed up the search for optimal scaling factors. Experiments show that SeGO balances model parameter count, quantization accuracy, and scaling-factor search efficiency on the InternVL2 and LLaVA series. Under the W3/W4A16 configuration, SeGO improves the accuracy of 7B-26B models by 3.8%-17.3% and enhances calibration efficiency by dozens of times.
    Computer VisionVision, language and reasoningMachine LearningDeep learning architecturesMachine LearningOptimization
  489. #5206

    Beyond Client Clustering: Fine-Grained Preference Alignment in Federated RLHF via Self-Evolving Routing

    Ke Wang, Shaojing Fu, Yuchuan Luo, Guilin Deng, Silong Chen, Zheng Yuan, Lin Liu
    Federated Reinforcement Learning from Human Feedback (RLHF) enables the collaborative alignment of Large Language Models (LLMs) while preserving privacy, yet it faces critical bottlenecks arising from data heterogeneity. Existing approaches typically rely on rigid client-level clustering, which overlooks intra-client heterogeneity and fails to adapt to the multifaceted needs of individual users. To address this, we propose FedPrism, a novel framework that shifts alignment granularity from the coarse client level to the precise instance level. Similar to an optical prism dispersing mixed light into distinct spectral components, FedPrism decomposes complex, intra-client heterogeneous data streams by dynamically routing individual samples to specialized experts based on semantic features. Crucially, we devise a Posterior-Guided Performance Distillation (PGPD) mechanism that leverages experts' actual training loss as self-supervision to autonomously refine routing policies without explicit labels. Extensive experiments show that FedPrism not only establishes new state-of-the-art results but also effectively mitigates the negative transfer prevalent in non-IID settings, ensuring superior alignment fidelity.
    Machine LearningFederated learningMultidisciplinary Topics and ApplicationsSecurity and privacy
  490. #5254

    Walking on Ice: Adaptive Gait Control of Humanoid Robots Based on Visual Prediction and Proprioceptive Estimation for Variable Friction Environments

    Yixuan Shen, Rongqiang Zhao, Ruonan Li, Jie Liu
    Humanoid robots have been increasingly deployed in real-world environments such as intelligent manufacturing, healthcare, and agriculture, from indoor spaces to outdoor terrains where friction conditions vary significantly. However, existing approaches based on domain randomization or estimation struggle to handle low-friction gait control and fail to address variable friction conditions, making it difficult to ensure control robustness. To address these challenges, we propose Walking on Ice (WoI), a friction-aware locomotion framework that combines proactive prediction with reactive adaptation for varying surface friction. We introduce a VLM-based friction predictor that enables the policy to take anticipatory actions before friction changes, thereby addressing variable-friction scenarios. Additionally, we design a friction estimator combined with a joint history encoder that perceives proprioceptive states to handle low-friction surfaces. Experimental results demonstrate that WoI improves success rate by 8.5% and reduces slip rate by 5.15% over the state-of-the-art method Denoising World Model Learning (DWL). In particular, in extreme cases, WoI achieves a 98.3% success rate with a 51.0% higher survival rate, 3.20% lower slip rate, and 46.5% lower stuck rate than DWL, demonstrating stable and efficient gait adaptation. Our work provides a robust and efficient gait control framework for stable locomotion in real-world scenarios with dynamic friction conditions.
    RoboticsBehavior and controlRoboticsLearning in roboticsRoboticsMotion and path planning
  491. #5258

    Cross-Paradigm Graph Backdoor Attacks with Promptable Subgraph Triggers

    Dongyi Liu, Jiangtong Li
    Graph Neural Networks (GNNs) are vulnerable to backdoor attacks, where adversaries implant malicious triggers to manipulate model predictions. Existing trigger generators are often simplistic in structure and overly reliant on specific features, confining them to a single graph learning paradigm, such as graph supervised learning, graph contrastive learning, or graph prompt learning. Such paradigm-specific designs lead to poor transferability across different learning frameworks, limiting attack success rates in general testing scenarios. To bridge this gap, we propose Cross-Paradigm Graph Backdoor Attacks with Promptable Subgraph Triggers (CP-GBA), which employs Graph Prompt Learning (GPL) to synthesize transferable subgraph triggers. Specifically, we first distill a compact yet expressive trigger set into a queryable repository, jointly optimizing for class-awareness, feature richness, and structural fidelity. Furthermore, we pioneer the theoretical exploration of GPL transferability under prompt-based objectives, ensuring robust generalization to diverse and unseen test-time paradigms. Extensive experiments across multiple real-world datasets and defense scenarios show that CP-GBA achieves state-of-the-art attack success rates. Code is available at https://github.com/novdream/CP-GBA.
    AI Ethics, Trust, FairnessSafety and robustnessData MiningMining graphs
  492. #5259

    Graph Anomaly Detection via Feature Selection with Local Topological Residuals

    Yazheng Zhao, Nannan Wu, Haoran Yin, Yiming Zhao
    Graph anomaly detection (GAD) aims to identify nodes that exhibit significant deviations from expected structural or attribute patterns, and has garnered increasing attention in recent years. Recent approaches for GAD have predominantly focused on local inconsistency mining, which refers to the difficulty of establishing high similarity relationships between anomalous nodes and their neighbors. While local inconsistency mining requires the incorporation of topological information, the use of Graph Neural Networks (GNNs) to introduce such information tends to homogenize connected nodes, thereby causing the loss of local anomalous signals. To address this challenge, we propose LTRGAD, a two-stage GAD framework that performs feature selection based on local feature-topological residuals (LTR). By processing features separately, LTRGAD effectively introduces topological information while preserving the original local anomalous patterns, enabling more accurate local anomaly detection. Subsequently, global anomaly detection is conducted on the entire graph by leveraging the results from the local detection phase. Extensive experiments on seven benchmark datasets demonstrate the effectiveness of the proposed LTRGAD framework.
    Data MiningAnomaly/outlier detectionData MiningMining graphs
  493. #5269

    Complex-Valued Residual Diffusion with GRPO for Pansharpening

    Zhiyuan Wang, Dong Li, Kaixin Fu, Chunhui Luo, Xueyang Fu
    Pansharpening aims to fuse a high-resolution panchromatic (PAN) image with a low-resolution multispectral (MS) image to generate a high-resolution multispectral output that preserves both fine spatial details and faithful spectral responses. However, enhancing spatial textures without introducing spectral distortion remains challenging due to the inherent spectral–spatial trade-off. To address this issue, we propose a two-stage pansharpening framework that tackles the problem from both modeling and optimization perspectives. In the first stage, we formulate pansharpening as spectral-prior conditioned residual diffusion, where a stable spectral base constrains the generation process and allows the diffusion model to focus on PAN-guided high-frequency details, leading to improved training stability and reduced spectral drift. To better capture the coupled spatial and frequency characteristics of PAN–MS fusion, we adopt a complex-valued denoising network to enhance spectral–spatial interaction modeling. In the second stage, to bridge the gap between distortion-oriented training objectives and practical quality preferences, we introduce Group Relative Preference Optimization (GRPO) to fine-tune the diffusion model using multi-objective preference signals, explicitly balancing spectral fidelity, texture sharpness, and perceptual quality. Extensive experiments on standard benchmarks demonstrate that the proposed method achieves a more favorable trade-off between fidelity and perceptual quality compared to competitive end-to-end and diffusion-based approaches.
    Computer VisionLow-level Vision
  494. #5276

    A Truthful Multiunit Profit-Optimal Mechanism for Synthesizing Social Laws

    Jun Wu, Jian Huang, Chongjun Wang
    Social Law Synthesis (SLS) in strategic environments is a novel multi-unit mechanism design problem, spanning modeling to computational challenges. We derive a method to specify the problem succinctly, reduce payment determination to allocation determination, and design an integer linear programming (ILP)-based algorithm that further reduces allocation to a polynomial-time ILP formulation. This offloads intractability to powerful ILP solvers, yielding a truthful, individually rational, and profit-optimal mechanism.
    Game Theory and Economic ParadigmsAuctions and market-based systemsGame Theory and Economic ParadigmsMechanism design
  495. #5288

    One-stop Multi-modality Image Dehazing-Registration-Fusion Framework via Progressive Task Cooperation

    Jing Li, Peiqi Cao, Bin Yang
    Infrared and visible image fusion (IVF) encounters two challenges: 1) image misregistration, where parallax leads to blurred fusion results, and 2) adverse imaging conditions, which introduce outlier data (such as haze) that significantly degrade image registration and fusion performance. Existing IVF methods fail to account for the joint impact of misregistration and haze, often neglecting these challenges or addressing them in isolation. However, misregistration and haze are intrinsically coupled in the fusion task. For example, haze can obscure structural details, leading to inaccurate registration and ultimately degrading fusion performance. To address this intrinsic coupling, we propose a progressive task-cooperative processing pipeline (dehazing, registration, and fusion) to achieve robust fusion of unregistered images under hazy conditions. To mitigate the challenges posed by architectural complexity and excessive parameter overhead in task-progressive learning, we propose a lightweight model approximation paradigm through hierarchical knowledge distillation. The framework employs a stage-wise distillation optimization strategy that synergistically integrates: 1) primary task-specific distillation for modality-aware feature extraction, and 2) progressive task-cooperative distillation for fusion-oriented representation learning, which can improve fusion robustness for unregistered multi-modal inputs in hazy conditions. Extensive experiments demonstrate that our method achieves significantly superior performance compared to state-of-the-art (SOTA) methods.
    Computer VisionLow-level Vision
  496. #5295

    Inductive Knowledge Graph Wave Networks

    Giuseppe Pirrò
    Inductive knowledge graph reasoning (IKGR) requires generalization to entities and relation types not seen during training. We introduce Knowledge Graph Wave Networks (KGWN), an IKGR approach that learns adaptive per-relation wave operators to control relation-dependent propagation, along with velocity normalization ensuring stability at any propagation depth. Experiments demonstrate state-of-the-art results on inductive benchmarks and competitive transductive performance.
    Data MiningKnowledge graphs and knowledge base completionKnowledge Representation and ReasoningSemantic WebMachine LearningRepresentation learning
  497. #5338

    FedUP: Uncertainty-Aware Personalized Federated Learning via Probabilistic Prototypes

    Furui Qi, Weishan Zhang, Lingzhao Meng, Yuru Liu, Zijun Feng, Yuange Liu, Baoyu Zhang, Daobin Luo, Tao Chen
    Prototype-based federated learning enables efficient knowledge sharing by exchanging class prototypes rather than full model parameters. However, heterogeneous client data and limited local samples increase prototype estimation variance, making many client prototypes unreliable. Existing methods usually treat prototypes as deterministic point estimates and cannot quantify their reliability, which may contaminate global prototypes and cause negative transfer. To address these challenges, we propose FedUP, an uncertainty-aware personalized federated learning framework that models prototypes as probability distributions. FedUP captures both aleatoric and epistemic uncertainty on a probabilistic simplex to guide local training and global aggregation. It further uses global probabilistic prototypes as class-conditional priors for feature augmentation, alleviating client-side data sparsity. Finally, FedUP aggregates prototype distributions through reliability-informed barycenters on the statistical manifold, suppressing unreliable contributions while preserving geometric structure. Experiments on natural and medical benchmarks demonstrate that FedUP consistently outperforms state-of-the-art methods.
    Computer VisionBiomedical image analysisComputer VisionRecognition (object detection, categorization)Machine LearningFederated learning
  498. #5340

    Computing Better Approximate Pure Nash Equilibria in Payoff-maximization Potential Games

    Angelo Fanelli
    Potential games are a fundamental class of games in which pure Nash equilibria are guaranteed to exist, yet computing such equilibria is computationally intractable for several natural subclasses of these games. This has led to extensive research on computing approximate pure Nash equilibria. In this paper, we study payoff-maximization potential games, a class that captures many natural optimization settings. For these games, strong theoretical guarantees are known only for restricted subclasses, most notably $P_d$-Flip games, which belong to the class of constraint satisfaction games. We show that standard approaches based on unilateral improvement moves can fail to provide meaningful approximation guarantees even for natural extensions of constraint satisfaction games. To overcome this limitation, we propose an algorithmic framework based on coordinated strategy changes by small groups of players that computes an approximate pure Nash equilibrium with a provable guarantee depending on a natural parameter of the game. In the special case of $P_d$-Flip games, our framework can be configured to recover existing algorithms, preserving their approximation guarantees and convergence time.
    Game Theory and Economic ParadigmsNoncooperative gamesAIGame Theory and Economic ParadigmsAIConstraint Satisfaction and OptimizationAISearch
  499. #5366

    Gradient-Based Join Ordering

    Tim Schwabe, Maribel Acosta
    Join ordering is the NP-hard problem of selecting the most efficient order in which to evaluate joins (conjunctive, binary operators) in a database query. Because query execution performance critically depends on this choice, join ordering lies at the core of query optimization. Traditional approaches cast this problem as a discrete combinatorial search over binary trees guided by a cost model, but they have trade-offs between effectiveness and efficiency. We show that when the cost model is differentiable, query plans can be continuously relaxed into a soft adjacency matrix that represents a superposition of plans. This continuous relaxation, combined with differentiable constraints that enforce plan validity, enables a gradient-based search for low-cost plans within this relaxed space. Using a Graph Neural Network as the cost model, we demonstrate that this gradient-based approach can find comparable and even lower-cost plans compared to traditional discrete search methods on two different graph datasets. Furthermore, we empirically show that the runtime of this approach scales better than discrete search algorithms. We believe this first step towards gradient-based join ordering can lead to more effective and efficient query optimizers in the future.
    Constraint Satisfaction and OptimizationConstraint optimization problemsKnowledge Representation and ReasoningSemantic Web
  500. #5370

    On the Size Complexity and Decidability of First-Order Progression

    Jens Claßen, Daxin Liu
    Progression, the task of updating a knowledge base to reflect action effects, generally requires second-order logic. Identifying first-order special cases, by restricting either the knowledge base or action effects, has long been a central topic in reasoning about actions. It is known that local-effect, normal, and acyclic actions, three increasingly expressive classes, admit first-order progression. However, a systematic analysis of the size of such progressions, crucial for practical applications, has been missing. In this paper, using the framework of Situation Calculus, we show that under reasonable assumptions, first-order progression for these action classes grows only polynomially. Moreover, we show that when the KB belongs to decidable fragments such as two-variable first-order logic or universal theories with constants, the progression remains within the same fragment, ensuring decidability and practical applicability.
    Knowledge Representation and ReasoningComputational complexity of reasoningKnowledge Representation and ReasoningReasoning about actions
  501. #5450

    DoMoE: Domain-Aware Semantic Expert Prediction for Efficient MoE Inference Under Expert Offloading

    Yao Mu, Fahao Chen, Wenbin Zhu, Mengying Zhao, Zhaoyan Shen, Dongxiao Yu
    Mixture-of-Experts (MoE) large language models improve inference efficiency through sparse expert activation, but deployment on resource-constrained devices remains challenging due to the large expert parameter footprint. Expert offloading mitigates this issue by loading experts on demand, yet its effectiveness critically depends on accurate and efficient expert prediction: inaccurate predictions incur redundant expert transfers, while overly expensive predictors negate latency benefits. Existing trajectory-based methods suffer from low accuracy, whereas semantic-based approaches incur prohibitive similarity-matching overhead. We propose DoMoE, a domain-aware MoE inference system that exploits domain locality in inference workloads. DoMoE organizes routing information into domain-specific expert routing tables, restricts semantic matching to domain-relevant tokens, and explicitly balances prediction accuracy against prediction overhead. Experiments show that DoMoE achieves a 1.32× average throughput improvement and a 1.22× increase in expert hit ratio across multiple MoE models and workloads, enabling efficient and accurate expert routing for practical inference.
    Natural Language ProcessingLanguage modelsPlanning and SchedulingPlanning algorithmsPlanning and SchedulingScheduling
  502. #5452

    Learning with Foresight: Enhancing Neural Routing Policy via Multi-Node Lookahead Prediction

    Xia Jiang, Yaoxin Wu, Yew-Soon Ong, Yingqian Zhang
    Neural policies have shown promise in solving vehicle routing problems due to their reduced reliance on handcrafted heuristics. However, current training paradigms suffer from a fundamental limitation: they primarily focus on next-node prediction for solution construction, resulting in myopic decision-making that undermines long-horizon planning capacity. To this end, we introduce Multi-node Lookahead Prediction (MnLP), a novel training strategy that extends the supervised learning paradigm to predict multiple future nodes simultaneously. We incorporate causal and discardable MnLP modules that operate exclusively during training, enabling models to anticipate multi-step decisions while preserving inference-time efficiency. By incorporating multi-depth auxiliary supervision into the loss function, MnLP equips neural policies with long-range contextual understanding. Experimentally, MnLP outperforms existing training methods, improving the generalization capability of neural policies across various problem sizes, distributions, and real-world benchmarks. Moreover, MnLP can be seamlessly integrated into diverse neural architectures without introducing additional inference overhead.
    Planning and SchedulingApplicationsPlanning and SchedulingLearning in planning and schedulingPlanning and SchedulingRouting
  503. #5454

    MFMSRNet: An Interpretable Multi-frequency and Multi-scale Riemannian Network for Motor Imagery Decoding

    Wenhao Rao, Xujie Zhao, Jianhui Zhao, Bo Du, Feixiang Tang
    Motor imagery (MI) electroencephalography (EEG) decoding has benefited from deep learning, yet methods operate in Euclidean space and behave as opaque black boxes, neglecting the intrinsic geometry of functional brain connectivity. EEG connectivity descriptors, such as phase synchrony and covariance matrices, naturally reside on the manifold of symmetric positive definite (SPD) matrices, where Euclidean operations are geometrically inconsistent and hinder interpretability. This work proposes the Multi-frequency and Multi-scale Riemannian Network (MFMSRNet), an interpretable end-to-end geometry-aware framework for MI EEG decoding on the SPD manifold. The method constructs kernelized phase-locking value (KPLV) functional connectivity (FC) matrices to capture nonlinear phase synchrony while ensuring positive definiteness. An attention-based Riemannian fusion mechanism adaptively integrates information across multiple frequency bands in the tangent space. Furthermore, a multi-scale Riemannian network extracts global, hemispheric, and local connectivity patterns via manifold-preserving bilinear mappings and smooth eigenvalue rectification. Extensive experiments indicate that MFMSRNet yields more expressive and interpretable representations for robust MI decoding, offering a promising solution for reliable brain–computer interface applications. The code is available at https://github.com/Raeno-Rao/MFMSRNet.
    Humans and AIApplicationsHumans and AIBrain sciencesHumans and AICognitive modelingMultidisciplinary Topics and ApplicationsBioinformaticsMultidisciplinary Topics and ApplicationsOther
  504. #5456

    Beyond Implicit Constraint: Explicit Low-Rank Structured Subspace Learning for Fast Attributed Graph Clustering

    Yaoming Cai, Song Liu, Zijia Zhang, You Wu, Xiaobo Liu, Yao Ding, Fei Li
    Attributed graph clustering has achieved remarkable success by synergistically integrating topological structures and node attributes. While subspace learning has emerged as a dominant paradigm for node partitioning, most existing methods rely on implicit low-rank constraints, which often fail to capture complex nonlinear manifolds and suffer from prohibitive computational overhead on large-scale graphs. In this paper, we propose ELSS (Explicit Low-rank Structured Subspace learning), a scalable and robust framework that transcends implicit formulations. Specifically, ELSS learns an explicit and nonlinear low-rank subspace within a graph-structured embedding space, effectively uncovering latent cluster structures. To effectively mitigate the pervasive oversmoothing issue, we introduce a homophily-aware adaptive graph filter, which dynamically calibrates smoothing intensity to preserve discriminative ego-information. Furthermore, to ensure linear scalability, we develop a PageRank-guided structural sampling strategy for anchor-based approximation, which identifies pivotal landmarks based on their global topological prestige. Theoretical analysis guarantees that ELSS effectively mitigates spectral collapse while maintaining a linear complexity. Extensive experiments on diverse benchmarks demonstrate that ELSS consistently delivers superior clustering accuracy over state-of-the-art methods.
    Machine LearningClusteringMachine LearningGeometric learningMachine LearningKernel methodsMachine LearningUnsupervised learning
  505. #5508

    SRJudge: Empowering Large Language Models with Selective Reasoning for Fine-Grained Knowledge Concept Tagging

    Zhiwei Yang, Jiahua Yang, Huiru Lin, Xing Chen, Quanlong Guan
    Knowledge concept tagging aims to assign specific concept or topic labels to educational content, which is essential for both educators and learners in traditional and online teaching practices. Recent work has explored large language models (LLMs) for this task, achieving promising performance. However, LLMs still struggle to select the correct concept from a large-scale candidate set due to the high dimensionality of the decision space. In this paper, we propose a novel three-stage Select-Reason-Judge (SRJudge) framework, which empowers LLMs with selective reasoning capability for fine-grained knowledge concept tagging. Specifically, the Selector in Stage 1 first narrows the candidate concepts to a top-K shortlist by fine-tuning a small language model (SLM), e.g., BERT, since the top-K predictions hit the correct concept in most cases, thereby reducing the decision space of correct candidates. Next, the Stage 2 Reasoner employs a lightweight LLM for refined reasoning over the shortlisted candidates. It further integrates an improved reinforcement learning strategy with a dynamic task-specific reward function and a pruning mechanism to better align with human reasoning preferences. Finally, a larger LLM acts as a judge that evaluates the overall rationality of the reasoning process and its explanations to determine the final output. In addition, we construct two high-quality datasets for further validation, i.e., the biology dataset S_Bio and the physics dataset S_Phy. Experimental results demonstrate that our method consistently outperforms state-of-the-art baselines across benchmark datasets, verifying its effectiveness and superiority. Resources are available at: https://github.com/Nicozwy/SRJudge.
    Natural Language ProcessingLanguage modelsNatural Language ProcessingText classification
  506. #5514

    Benchmarking and Enhancing Relational Diagrams Reasoning for Multimodal Large Language Models

    Tianyu Hong, Peng Wang, Wenjun Ke, Yao He, Hongao Wang
    Multimodal Large Language Models (MLLMs) have achieved strong performance on a wide range of vision–language tasks. However, their capabilities remain unclear in relational-diagram (RD) reasoning, where correct answers must satisfy diagram-defined constraints such as directed dependencies, branching conditions, and prerequisite relations. RDs, including hierarchical diagrams, fishbone diagrams, and flowcharts, require models to not only retrieve nodes and relations, but also maintain valid reasoning paths over multi-step dependency chains. In this paper, we introduce RD-Bench, a large-scale benchmark of verified diagram–question pairs covering three diagram categories and four difficulty levels. RD-Bench evaluates hierarchical localization, branch-condition selection, constraint-guided traversal, step counting, and prerequisite identification. Experiments on representative open-source and proprietary MLLMs reveal a consistent limitation: models perform much better on single-step reasoning than on multi-step dependency reasoning, especially for complex flowcharts with branching logic and long-range prerequisites. To reduce this gap, we propose RD-MCTS, a Monte Carlo Tree Search framework that incorporates diagram-derived structural priors, constraint-consistent state transitions, and inference-time step-level process rewards. RD-MCTS guides search toward dependency-relevant nodes and valid next steps, improving constraint-consistent reasoning on high-difficulty questions.
    Computer VisionMultimodal learningNatural Language ProcessingResources and evaluation
  507. #5522

    Will My Favorite Chases Terminate If Evaluating Conjunctive Queries Does? One Does Not Simply Decide This

    Lucas Larroque, Quentin Manière
    Existential rules are a prominent formalism to enrich a database with knowledge from the domain of interest, but make even basic reasoning tasks on the resulting knowledge base undecidable. To circumvent this, several classes of rules offering various useful properties have been identified. One such class, for instance, contains all sets of rules on which the chase algorithm always terminates, which guarantees the existence of a finite universal model. However, these classes are often abstract rather than concrete: it may be undecidable to check whether a given set of rules belongs to them. Given that the most studied classes of existential rules are designed for reasoning on databases, thus ensuring decidable conjunctive query entailment, we ask: Within a class that supports decidable query entailment, do the usual abstract classes become concrete? We answer in the negative for classes based upon the termination of all classical chase variants and for the bts class.
    Knowledge Representation and ReasoningComputational complexity of reasoningKnowledge Representation and ReasoningDescription logics and ontologiesKnowledge Representation and ReasoningKnowledge representation languages
  508. #5529

    Dual Branch Mutual Teaching for Long-Tailed Partial Label Learning

    Xiangyu Ren, Mingxuan Xia, Guangcheng Zhu, Gengyu Lyu, Haobo Wang, Peng Lu
    In Partial Label Learning (PLL), each instance is associated with a candidate label set, with exactly one label being true. While most studies implicitly assume balanced class distributions, real-world data often exhibit severely imbalanced class distributions, leading to the Long-Tailed Partial Label Learning (LT-PLL) problem. In response, prevailing strategies typically manifest the label space to retrieve more tail samples for improved pseudo-label accuracy. However, hard-to-distinguish tail samples can significantly hinder representation learning and, in turn, affect the quality of pseudo-labels. To address this, we propose a novel Distribution-aware Dual-branch Contrastive Learning framework, DEACON, that decouples the representations of head and tail labels via a dual-branch mutual-teaching design, enabling disambiguation across different shot-level groups with tailored representations. DEACON comprises two modules: (i) an instance-balanced branch trained with a vanilla contrastive objective; (ii) a label-balanced branch trained with a novel reweighted loss that adjusts the contrastive strength through the conditioned von Mises–Fisher density. The two branches mutually teach each other using their pseudo-labels, and we ensemble their predictions to improve overall performance. Extensive experiments on various benchmarks show that our method consistently outperforms state-of-the-art baselines. Code and appendix can be found at https://github.com/YukiNozzzz/DEACON.
    Machine LearningWeakly supervised learning
  509. #5533

    Optimality-preserving Logic-Based Benders Decomposition of Answer Set Programs

    Carmine Dodaro, Antonio Ielo, Marco Maratea, Cinzia Marte, Alice Tarzariol
    Benders Decomposition is a well-known solving technique in Operations Research that decomposes a problem into a master and a subproblem, which interact via "cuts". This technique has been extended to Logic-Based Benders Decomposition (LBBD), solving problems specified by logic-based languages and enabling a wider applicability. However, while Benders Decomposition guarantees optimality, LBBD does not: this property is problem-specific and depends on the defined decomposition and cuts.
    In this paper, we present a theoretical analysis of the conditions under which LBBD, including cuts, can preserve optimality in the context of Answer Set Programming (ASP), a prominent logic-based language in the field of Artificial Intelligence.
    We also introduce a general-purpose algorithm that employs both minimal unsatisfiable subsets and minimal correction subsets to define cuts in a fully automated, problem-independent way. The algorithm preserves the optimality guarantees of Benders Decomposition.
    An empirical evaluation on real-world scheduling instances shows that our approach can find significantly more solutions, and more optimal ones, compared to a standard direct ASP encoding, while also consistently reducing the execution time.
    Knowledge Representation and ReasoningLogic programmingKnowledge Representation and ReasoningNon-monotonic reasoning
  510. #5548

    One Flow Fits All! A Scale-Aware Generative Framework for Diverse Data

    Hubin Cao, Jun Ma, Yusupu Ainiwaer, Hanquan Zhang, Yanjun Qin, Zixuan Wang, Xiaoming Tao
    Real-world systems increasingly require coherent reasoning and generation over diverse data modalities simultaneously. Current generative frameworks rely on complex, multi-stage training, resulting in low efficiency due to iterative inference and high computational cost. They also struggle with unified multimodal representation, failing to balance fine-grained details with global structures and long-term dependencies, which limits generative quality and practical usability. To overcome these limitations, we introduce the Inverse Heat Mean Flow (IHMF), a general-purpose solver that is compatible with a wide range of model backbones. Without requiring pre-training, IHMF directly learns an average velocity field through an inverse heat formulation. By exploiting the inherent scale-space properties of the inverse heat process, IHMF explicitly decomposes multi-scale complexities, thereby simplifying trajectory learning and enabling adaptive topological alignment. Extensive experiments show that IHMF achieves performance competitive with state-of-the-art methods, providing a robust and unified mathematical framework for various generative tasks. The code is publicly available at https://github.com/CHBonline/IHMF.
    Computer VisionImage and video synthesis and generation
  511. #5554

    DASFL: Dynamic Adaptive Split Federated Learning for Heterogeneous Clients

    Aijing Li, Yawen Li, Guanhua Ye, Dandan Liu, Tong Zhao, Zeli Guan
    Split Federated Learning (SFL) has emerged as a pivotal paradigm for privacy-preserving distributed training on resource-constrained edge devices by partitioning neural networks between clients and a server. A critical design choice in SFL is the split layer, which determines the computation distribution and the semantic level of smashed data, directly impacting communication overhead, training latency, and model accuracy.
    Existing SFL methods largely ignore the time-varying nature of edge resources, relying on static resource profiles that lead to suboptimal efficiency and compromise model convergence.
    In this paper, we propose Dynamic Adaptive Split Federated Learning (DASFL), a unified framework that jointly addresses dynamic resource heterogeneity and non-IID data distributions.
    We employ a resource-aware dynamic split layer selection strategy that enables each client to minimize per-round latency by adapting to instantaneous local conditions. We also design an adaptive masked aggregation that robustly synchronizes client updates under split-induced structural heterogeneity.
    Extensive experiments on multiple benchmark models and datasets, conducted under heterogeneous, time-varying resource conditions and non-IID data distributions, demonstrate that DASFL achieves a superior accuracy-efficiency trade-off and faster convergence compared to state-of-the-art SFL baselines.
    Machine LearningFederated learning
  512. #5574

    MELLA: Bridging Linguistic Capability and Cultural Groundedness for Low-Resource Language MLLMs

    Yufei Gao, Jiaying Fei, Nuo Chen, Ruirui Chen, Guohang Yan, Yunshi Lan, Botian Shi
    Multimodal Large Language Models (MLLMs) perform strongly in high-resource languages, yet often produce fluent but culturally "thin" descriptions in low-resource settings. We argue that this failure is not merely a linguistic limitation: culture-specific visual knowledge depends on native visual-textual alignments that translation-centric pipelines rarely provide.
    We present MELLA, a multimodal dataset across eight low-resource languages, designed to jointly support linguistic fluency and cultural groundedness. MELLA uses a dual-source strategy that combines native web image-alt-text pairs for culture-grounded supervision with generated-and-translated image descriptions for linguistically rich supervision, explicitly separating two learning signals often conflated in multilingual multimodal data.
    Through controlled diagnostic fine-tuning on multiple MLLM backbones, we show that MELLA mitigates cultural hallucination by helping models recognize and articulate culturally specific entities overlooked by translation-based adaptation. Our findings highlight data alignment, rather than model modification alone, as a key path toward culturally grounded multimodal understanding in low-resource languages.
    Computer VisionVision, language and reasoning
  513. #5575

    Diffract and Conquer: Hyperspectral Imaging from Any RGB Camera via Optical Encoding and Learning

    Alexey Pronin, Daniil Vladimirov, Andrei Korepanov, Artem Muzyka, Andrey Makarov, Sofya Podtikhova, Egor Ershov, Andrey Rastorguev, Roman Skidanov, Artem Nikonorov
    Hyperspectral imaging (HSI) provides detailed spectral information but is often impractical due to high cost, size, and hardware complexity. While learning-based methods attempt to recover hyperspectral images from RGB sensors, they are constrained by limited spectral measurements and noise. We propose a unified optical–computational approach that converts any standard RGB camera into a snapshot hyperspectral imaging system. Our method introduces a diffractive optical adapter that replaces the conventional lens with a diffractive lens array optimized for spectral encoding. To reconstruct hyperspectral images from the resulting measurements, we design a neural network specialized for diffractively encoded RGB data, capable of compensating for optical distortions and recovering high-quality spectra. The proposed system achieves up to 10 dB improvement in PSNR over RGB-based hyperspectral reconstruction and enhances the performance of existing state-of-the-art models by up to 6 dB when used with the proposed adapter. Our results demonstrate that diffractive optical encoding combined with learned reconstruction enables practical and scalable hyperspectral imaging using commodity RGB cameras.
    Computer VisionApplications and SystemsComputer VisionComputational photographyComputer VisionLow-level Vision
  514. #5583

    About Time: Model-Free Reinforcement Learning with Timed Reward Machines

    Rajarshi Roy, Anirban Majumdar, Ritam Raha, David Parker, Marta Kwiatkowska
    Reward specification plays a central role in reinforcement learning (RL), guiding the agent’s behavior. To express non-Markovian rewards, formalisms such as reward machines have been introduced to capture dependencies on histories. However, traditional reward machines lack the ability to model precise timing constraints, limiting their use in time-sensitive applications. In this paper, we propose timed reward machines (TRMs), which are an extension of reward machines that incorporate timing constraints into the reward structure. TRMs enable more expressive specifications with tunable reward logic, for example, imposing costs for delays and granting rewards for timely actions. We study model-free RL frameworks (i.e., tabular Q-learning) for learning optimal policies with TRMs under digital and real-time semantics. Our algorithms integrate the TRM into learning via abstractions of timed automata and employ counterfactual-imagining heuristics that exploit the TRM's structure to improve search. Experimentally, we demonstrate that our algorithm learns policies that achieve high rewards while satisfying the timing constraints specified by the TRM on popular RL benchmarks.
    Agent-based and Multi-agent SystemsFormal verification, validation and synthesisKnowledge Representation and ReasoningKnowledge representation languagesPlanning and SchedulingLearning in planning and scheduling
  515. #5584

    MA-RWG: A Multi-Agent Framework for Thematically Structuring and Generation of Related Work

    Zhuang Liu, Jian Liu, Chun Kang, Chenbin Zhang, Rui Li, Fanhu Zeng, Yong Dai, Lei Sha
    AI-driven survey generation has advanced rapidly, yet related work generation (RWG) remains relatively underexplored. Unlike surveys that provide broad literature overviews, RWG synthesizes prior studies for a single focal paper, requiring contextual fit, cross-paper comparison, and accurate attribution. To address this gap, we propose MA-RWG, a fully automated multi-agent framework that generates polished related work sections from only a title and abstract. MA-RWG first retrieves high-quality candidate papers through semantic retrieval, optionally enhanced with a diversity-aware term. It then coordinates four specialized agents for summarization, organization, integration, and fact checking, enabling DAG-based taxonomy construction, feedback-guided refinement, and dual-model verification. For evaluation, we introduce a dedicated benchmark for paper-specific related work generation, covering generation quality, citation quality, and claim-level semantic similarity. Experimental results show that MA-RWG outperforms RAG-based baselines and survey-oriented agentic methods on the RWG task. Further ablation and cross-domain experiments demonstrate the soundness and robustness of the proposed framework.
    Natural Language ProcessingLanguage generationAgent-based and Multi-agent SystemsApplicationsNatural Language ProcessingSummarizationNatural Language ProcessingInformation retrieval and text mining
  516. #5596

    Learning Hyperspherical Time–Frequency Representations for Time-Series Out-of-Distribution Detection

    Willian T. Lunardi, Samridha Shrestha, Martin Andreoni
    Out-of-distribution (OOD) detection for time-series data remains underexplored compared to vision and language, with a limited principled understanding of how supervised time-series representations can be leveraged for reliable detection under distributional shifts. This work formulates time-series OOD detection as representation learning with hyperspherical embeddings, where class-conditional structure is induced by a von Mises–Fisher (vMF) likelihood–based objective on the unit sphere. The learned representation combines time- and frequency-domain views of the input signal via domain-specific encoders, integrating them into a joint embedding space for OOD detection. Detection uses distance-based scores over the learned embeddings, including k-nearest neighbors (k-NN) and Mahalanobis scores. We evaluate the approach at scale on the complete UCR and UEA time-series archives under a cross-dataset protocol. Empirical results show consistent improvements under both k-NN and Mahalanobis scoring over strong contrastive-learning and post-hoc baselines in the same setting. Code is available at https://github.com/tiiuae/hypertf-time-series-ood.
    Data MiningAnomaly/outlier detectionMachine LearningOpen-World/Open-Set/OOD LearningMachine LearningRepresentation learning
  517. #5605

    Platform-Aware Mission Planning with Task-Level Contracts

    Stefan Panjkovic, Alessandro Cimatti, Inigo Incer, Andrea Micheli, Stefano Tonetta
    Automated temporal planning is used to synthesize courses of action for a deterministic abstraction of a system; the produced plans are often modeled as schedules of tasks to be executed on the real system under control.
    A key problem with adopting this architecture is how to ensure the plan is executable and goal-reaching on the real system, which can be arbitrarily complex and possibly non-deterministic.

    In this paper, we propose a framework to exploit task-level contracts, expressed as assumptions and guarantees at the beginning and end of tasks, to automatically synthesize plans that are guaranteed to be correct on any system satisfying the contracts. Our framework combines a temporal planner to generate candidate plans and a contract reasoner to instantiate and verify the contracts associated with the plan. If the plan is found invalid, we refine the planning problem until a valid plan is found. We present an experimental evaluation on a realistic case-study and on several synthetic problems, showing the applicability of the approach.
    Planning and SchedulingMixed discrete/continuous planningPlanning and SchedulingPlanning with Incomplete InformationPlanning and SchedulingTheoretical foundations of planning
  518. #5614

    SeMi-LoRA: Enhancing Low-Rank Adaptation via Separation and Mixing

    Zhenfei Yang, Beiming Yu, Peiqin Lin, Yongkang Liu, Deyi Xiong
    Low-Rank Adaptation (LoRA) has become a standard paradigm for parameter-efficient fine-tuning (PEFT). However, its low-rank constraint can limit its capacity to express complex weight updates, leaving a performance gap compared with full fine-tuning. Existing extensions improve expressivity, but they often sacrifice parameter mergeability or rely on sparse, block-diagonal updates that restrict global information flow. We propose SeMi-LoRA (Separation and Mixing LoRA), a novel framework that enables complete high-rank updates while preserving mergeability. SeMi-LoRA decomposes adaptation into channel-wise compression followed by two-stage latent mixing, namely Rank Mix and Channel Mix. Our structural analysis shows that, under a fixed base rank, the rank capacity of the update increases with the number of channels and supports complete updates rather than partial sparse ones. Extensive experiments on commonsense reasoning, mathematical reasoning, natural language understanding, and dialogue generation benchmarks demonstrate that SeMi-LoRA consistently outperforms strong PEFT baselines with favorable parameter efficiency.
    Natural Language ProcessingLanguage models
  519. #5622

    PA-GMAE: A Position-Assigned Graph Masked Autoencoder for Point Cloud Representation Learning

    Haifeng Yang, Lupeng Fang, Jianghui Cai, Jie Wang, Guojiao An, Lihua Hu
    Masked autoencoders have demonstrated excellent performance in point cloud representation learning and received widespread attention. However, due to the unstructured nature of point clouds, existing methods struggle to effectively model both local geometric information and global topological features, while generally neglecting positional information, resulting in insufficient model representational capacity. To address these challenges, we propose a Position-Assigned Graph Masked Autoencoder (PA-GMAE) framework for point cloud representation learning. Specifically, we first explicitly transform the unstructured point clouds into a graph structure and assign it positional information, leveraging the structured properties of graphs to mine the intrinsic correlations among points. During the encoding stage, global contextual information is dynamically injected into node features, endowing the model with global awareness. In the decoding stage, collaborative work across multiple decoding tasks fully captures the global topology and positional dependencies of the point cloud, enhancing the quality of learned representations. Extensive experiments are conducted on three authoritative benchmark datasets, namely ScanObjectNN, ModelNet40, and ShapeNetPart, across typical downstream tasks including point cloud classification and part segmentation. The results demonstrate the effectiveness and superiority of the proposed framework.
    Machine LearningApplicationsMachine LearningSelf-supervised Learning
  520. #5637

    One-Turn Knockout: Traceable and Editable Proxy Unlearning Under Asymmetric Access Constraints

    Ziluowen Luo, Jun Yin, Hao Yan, Ruochen Liu, Ming Cheng, Senzhang Wang
    Machine unlearning (MUL) aims to remove the influence of specific data from a trained model for data privacy and model adaptability. Existing MUL methods mostly assume the internal parameters and the training data of the target model are accessible. Nevertheless, in most practical scenarios, the model provider (MP) and the service operator (SO) are different entities with unequal model access privileges. The MP provides the model, while the SO can only access the model via APIs when handling unlearning requests. Under such an asymmetric access constraint, we propose One-Turn Knockout (OTK), a novel traceable and editable MUL framework based on a model-agnostic and editable proxy. Specifically, OTK first compresses the representation space of the target model into a discrete proxy based on a codebook, with merely one pass of post-training. Each data sample is recorded in the proxy space as a distribution over the codebook tokens, and its contribution to the model prediction can be cumulatively estimated via additive token statistics. Based on the traceable and editable proxy, the SO can instantly handle unlearning requests by (i) estimating the token distribution of the forgotten data, (ii) identifying the causal tokens, and (iii) erasing their contributions without access to the model parameters and training data. The theoretical bounds on the forgetting and retention model performance of OTK are also analyzed. Extensive experiments on diverse learning tasks and model architectures demonstrate the superiority of OTK.
    AI Ethics, Trust, FairnessAccountabilityAI Ethics, Trust, FairnessAI and law, governance, regulationData MiningPrivacy-preserving data mining
  521. #5647

    MV-FAC: Mean–Variance Value Function Factorization for Multi-Robot Mean–Standard Deviation Moving Target Search

    Haoming Chen, Hao Lu, Hongliang Guo, Jiancheng Lv
    This paper studies a risk-sensitive formulation of the multi-robot search problem, termed multi-robot mean-standard deviation search (MuRMSS), in which a team of robots cooperatively searches for a moving target by minimizing a linear combination of the mean and standard deviation of search time. However, the standard deviation term is inherently non-additive, which makes it difficult to estimate and incompatible with canonical multi-robot search algorithms, and prevents the consistent decomposition into individual robot utilities that is essential for scalable multi-robot cooperation. In view of these challenges, we propose MV-FAC, which comprises a mean-variance temporal-difference module that jointly learns the mean and variance of search time, a factorization module that decomposes them into individual utilities, and a decentralized policy optimization module that minimizes each robot’s individual mean-std objective. We further establish and prove the mean-std individual global minimization (MS-IGM) theorem, thereby ensuring consistency between individual- and team-level objectives. Extensive simulation studies on standard multi-robot search benchmarks demonstrate that MV-FAC achieves the best overall mean-std search-time performance. We also validate MV-FAC's practicality by deploying it on a physical multi-robot system for moving target search in a real-world building environment.
    Agent-based and Multi-agent SystemsCoordination and cooperationAgent-based and Multi-agent SystemsMulti-agent learningAgent-based and Multi-agent SystemsMulti-agent planningRoboticsLearning in roboticsRoboticsMulti-robot systems
  522. #5665

    Distributed Learning with Adversarial Gradient Perturbations

    Nawapon Sangsiri, Yufei Tao
    Privacy concerns in distributed learning often lead clients to return intentionally altered gradient information. We consider the problem of learning convex and L-smooth functions under adversarial gradient perturbation, where a client's gradient reply to a server query can deviate arbitrarily from the true gradient subject to a distance bound. Our study focuses on two fundamental questions: (i) what is the smallest achievable sub-optimality gap (i.e., excess error in optimization) under such responses, and (ii) how many queries are sufficient to guarantee a given sub-optimality gap? We establish tight feasibility thresholds on the sub-optimality gap and provide algorithms that achieve these thresholds with provable query complexity guarantees.
    Machine LearningAdversarial machine learningMachine LearningFederated learning
  523. #5672

    Double-Calibration: Towards Reliable LLMs via Calibrating Knowledge and Reasoning Confidence

    Yuyin Lu, Ziran Liang, Yanghui Rao, Wenqi Fan, Fu Lee Wang, Qing Li
    Reliable reasoning in Large Language Models (LLMs) is challenged by their propensity for hallucination. While augmenting LLMs with Knowledge Graphs (KGs) improves factual accuracy, existing KG-augmented methods fail to quantify epistemic uncertainty in both the retrieved evidence and LLMs' reasoning. To bridge this gap, we introduce DoublyCal, a framework built on a novel double‑calibration principle. DoublyCal employs a lightweight proxy model to first generate KG evidence alongside a calibrated evidence confidence. This calibrated supporting evidence then guides a black-box LLM, yielding final predictions that are not only more accurate but also well-calibrated, with confidence scores traceable to the uncertainty of the supporting evidence. Experiments on knowledge-intensive benchmarks show that DoublyCal significantly improves both the accuracy and confidence calibration of black-box LLMs while maintaining low token cost.
    AI Ethics, Trust, FairnessTrustworthy AIKnowledge Representation and ReasoningApplicationsKnowledge Representation and ReasoningLearning and reasoningUncertainty in AIApplications
  524. #5683

    ConDyGNet: Constraint-Guided Dynamic Graph Networks for Multivariate Time Series Forecasting

    Zhenzhou Li, Xiang Li, Zhibin Niu
    Modeling inter-channel dependencies is important for multivariate time series forecasting (MTSF). However, in many cases, inter-channel dependencies are time-varying and subject to noise interference, making it difficult for models to find a balance between structural stability and temporal adaptivity. Existing methods either use a single global static structure, resulting in insufficient sensitivity to temporal changes, or use local statistical correlations to construct dependencies, but local correlations are prone to introducing noise, which may further amplify the impact of noise during propagation. To address these issues, we propose a Constraint-Guided Dynamic Graph Network (ConDyGNet), whose core idea is "global basis, dynamic weights". Specifically, ConDyGNet learns a low-rank global basis as a shared structural constraint and generates patch-wise basis mixing weights to construct dynamic propagation graphs. This maintains topological consistency while allowing local adaptation and reducing the influence of local noise. We conducted extensive experiments on eight public benchmark datasets and multiple forecasting horizons, demonstrating that ConDyGNet can learn more robust time-varying inter-channel dependencies and achieve state-of-the-art forecasting accuracy. The code is available at https://github.com/constli67/ConDyGNet.
    Machine LearningTime series and data streamsMachine LearningSequence and graph learningData MiningMining data streams
  525. #5704

    From Discrete to Continuous: Progressive Hybrid-Distributional Learning for Gradual Emotion Transitions

    Yunhe Xie, Yang Li
    Modeling emotional dynamics in multi-turn dialogues remains challenging due to emotion shifts, where emotional states evolve gradually rather than change abruptly. Existing methods often rely on one-hot supervision, which fails to reflect the progressive nature of human emotions, particularly for neutral-valence emotions with subtle trajectories. To address this limitation, we propose a pseudo soft-label guided Progressive Hybrid-Distributional (PHD) learning framework. PHD reconstructs discrete labels into hybrid distributional representations that encode inter-emotion relations and mixed emotional tendencies. Based on these representations, a progressive training strategy is introduced to guide the model from learning blended emotional states toward the standard one-to-one prediction objective. Furthermore, we design a Graded Contrastive Learning mechanism that replaces rigid binary allocation with graded supervision to alleviate label conflicts. Experiments on benchmark datasets demonstrate that PHD consistently improves diverse baseline models, yielding average w-F1 gains of 1.29%. Our framework highlights the critical importance of modeling emotional dynamics as a gradual, distributional process.
    Humans and AIPersonalization and user modelingNatural Language ProcessingDialogue and interactive systemsMachine LearningMulti-label learningNatural Language ProcessingSentiment analysis, stylistic analysis, and argument mining
  526. #5710

    Attribution-based Explanations for Markov Decision Processes

    Paul Kobialka, Andrea Pferscher, Francesco Leofante, Erika Ábrahám, Silvia Lizeth Tapia Tarifa, Einar Broch Johnsen
    Attribution techniques explain the outcome of an AI model by assigning a numerical score to its inputs. So far, these techniques have mainly focused on attributing importance to static input features at a single point in time, and thus fail to generalize to sequential decision-making settings. This paper fills this gap by introducing techniques to generate attribution-based explanations for Markov Decision Processes (MDPs). We give a formal characterization of what attributions should represent in MDPs, focusing on explanations that assign importance scores to both individual states and execution paths. We show how importance scores can be computed by leveraging techniques for strategy synthesis, enabling the efficient computation of these scores despite the non-determinism inherent in an MDP. We evaluate our approach on five case studies, demonstrating its utility in providing interpretable insights into the logic of sequential decision-making agents.
    AI Ethics, Trust, FairnessExplainability and interpretabilityUncertainty in AISequential decision making
  527. #5737

    CE-VFAL: A Novel Framework for Communication-Efficient Vertical Federated Adversarial Learning

    Tianxing Man, Jinjie Fang, Ganyu Wang, Yu Bai, Zhaogeng Liu, Bin Gu, Yi Chang
    Vertical Federated Learning (VFL) involves multiple participants collaborating to train machine learning models on distinct feature sets from the same data samples.
    This training paradigm with distributed updating focuses on secure and efficient communication.
    Nevertheless, the trained models exhibit heightened vulnerability to adversarial attacks during inference, which can provoke misclassification.
    Adversarial Training (AT), which involves exposing models to intentionally crafted misleading examples during training, is widely regarded as the most effective method for enhancing model robustness.
    However, the significant communication costs entailed by generating such examples within the VFL context pose an open challenge to developing a Vertical Federated Adversarial Learning (VFAL) framework.
    To this end, we introduce a Communication-Efficient Vertical Federated Adversarial Learning framework, named CE-VFAL.
    The CE-VFAL framework incorporates the lazy propagation principle, confining most propagations to client models during adversarial updates and thereby reducing the frequency of client-server interactions.
    Moreover, CE-VFAL seamlessly integrates Zeroth Order Optimization (ZOO) into communication, effectively reducing communication load by transmitting the loss difference derived from the raw and perturbed embeddings for multiple point estimation.
    Furthermore, our theoretical analysis establishes a sublinear convergence rate by bounding the errors caused by multi-source approximate gradients.
    Extensive experiments confirm that CE-VFAL delivers robust performance while significantly reducing communication costs.
    AI Ethics, Trust, FairnessSafety and robustnessMachine LearningFederated learning
  528. #5759

    Parameterized and Streaming Algorithms for Euclidean Fair k-Center Clustering

    Zeyu Lin, Chaoqi Jia, Longkun Guo, Chao Chen
    Motivated by the growing importance of fairness in machine learning, fair $k$-center clustering has attracted considerable research attention as a fundamental problem. In this problem, a dataset is partitioned into $m$ disjoint groups, and the objective is to select $k$ data points as centers, subject to upper bounds on the number of centers chosen from each group, so as to minimize the maximum distance between any data point and its assigned center. Focusing on Euclidean spaces, which are ubiquitous in machine learning applications, we first develop a parameterized approximation algorithm for Euclidean fair $k$-center with an approximation ratio of $2.732$. By incorporating this algorithm as a post-processing stage into a one-pass streaming framework for large-scale data, we obtain an approximation ratio of $4.464$. These ratios can be further improved to $2.414$ and $3.828$, respectively. To ensure polynomial-time complexity, we further design a one-pass streaming algorithm with an approximation ratio of $4.732$, improving upon the state-of-the-art ratio of $5$ for general metric spaces and demonstrating that the streaming fair $k$-center problem admits better approximations in Euclidean spaces than in general metrics. Finally, extensive experiments show that our methods significantly outperform state-of-the-art approaches in terms of clustering accuracy.
    Machine LearningClusteringAIConstraint Satisfaction and OptimizationAI Ethics, Trust, FairnessFairness and diversityConstraint Satisfaction and OptimizationConstraint optimization problems
  529. #5761

    SpatialV2A: Visual-Guided High-fidelity Spatial Audio Generation

    Yanan Wang, Linjie Ren, Zihao Li, Junyi Wang, Tian Gan
    While video-to-audio generation has achieved remarkable progress in semantic and temporal alignment, most existing studies focus solely on these aspects, paying limited attention to the spatial perception and immersive quality of the synthesized audio. This limitation stems largely from current models' reliance on mono audio datasets, which lack the binaural spatial information needed to learn visual-to-spatial audio mappings. To address this gap, we introduce two key contributions: we construct BinauralVGGSound, the first large-scale video-binaural audio dataset designed to support spatially aware video-to-audio generation; and we propose an end-to-end spatial audio generation framework guided by visual cues that explicitly models spatial features. Our framework incorporates a visual-guided audio spatialization module that ensures the generated audio exhibits realistic spatial attributes and layered spatial depth while maintaining semantic and temporal alignment. Experiments show that our approach substantially outperforms state-of-the-art models in spatial fidelity and delivers a more immersive auditory experience, without sacrificing temporal or semantic consistency. All datasets, code, and model checkpoints are available at: https://github.com/renlinjie868-web/SpatialV2A.
    Computer VisionVideo analysis and understandingComputer VisionMultimodal learningComputer VisionImage and video synthesis and generation
  530. #5784

    Learning Kernelized Hypothesis for Hidden Confounder Detection

    Yikai Chen, Haotian Wang, Yunxin Mao, Xinpeng Lv, Shaowu Yang, Kun Kuang, Xinwang Liu, Wenjing Yang
    Detecting hidden confounding is crucial for reliable causal analysis from observational data, as it directly determines which downstream causal inference method should be deployed. Inspired by the theory of higher-order regression, recent sample-efficient hypothesis testing strategies overcome the restrictive requirement of multiple heterogeneous data environments. Despite their progress on single-environment confounder detection, such methods suffer from an intrinsic flaw: the structural functions of the causal models must be specified a priori (as linear or specific kernel functions). By contrast, real-world data exhibit diverse, unknown forms of structural functions, leaving an important and challenging gap between the theory of higher-order regression and practical confounder detection. In this paper, we contribute a Bi-level Kernel Confounder Detection (BiKCD) framework that learns an adaptive kernelized space of structural functions. Subsequently, BiKCD constructs a hypothesis test by comparing coefficients from the higher-order regression and classical ordinary least squares in the learned kernelized space. Finally, the test is calibrated to ensure valid inference under adaptivity. Theoretically, we establish an oracle-type risk bound for the selected structural space over a candidate kernel family, together with Type-I error control for the downstream test. Extensive experiments on synthetic and real-world datasets demonstrate the effectiveness of the proposed BiKCD.
    Machine LearningCausality
  531. #5812

    A Lightweight Traffic Map for Efficient Anytime LaCAM

    Bojie Shen, Yue Zhang, Zhe Chen, Daniel Harabor
    Multi-Agent Path Finding (MAPF) seeks collision-free paths for teams of agents and has a wide range of practical applications. LaCAM*, an anytime configuration-based solver, currently represents the state of the art. Recent work has explored using guidance paths to steer LaCAM* toward configurations that avoid traffic congestion, thereby improving solution quality. However, existing approaches rely on Frank–Wolfe–style optimisation that repeatedly invokes single-agent search before executing LaCAM*, which creates a large computational overhead in large-scale problems. The guidance path is also static, so it helps only in finding the first solution in LaCAM*. To overcome these problems, we propose a new approach that exploits LaCAM*’s ability to construct a dynamic, lightweight traffic map during its search. Experiments show that our method achieves higher solution quality than state-of-the-art guidance-path approaches in two variants of MAPF problems.
    Agent-based and Multi-agent SystemsMulti-agent planningSearchHeuristic search
  532. #5817

    NEST: Tackling Dataset-Level Distribution Shifts via Regime-Oriented Mixture-of-Experts

    Lanhao Li, Bingshu Xie, Lijun Sun, Xin Xue, Haoyi Zhou, Jianxin Li
    Accurate long-term forecasting in complex systems is frequently compromised by dataset-level distribution shifts, where diverse underlying behavioral modes and evolving system states drive the dynamic multivariate time-series. While existing methods predominantly focus on local temporal shifts, they fail to explicitly model the global structural challenge where datasets are composites of distinct operational regimes. In this paper, we propose NEST, a specialized framework designed to model and recompose these evolving structures through a two-phase dense MoE architecture. NEST first facilitates structural specialization by partitioning the dataset into distinct operational regimes through unsupervised clustering in a principled moment-entropy space. We introduce a regime-oriented router mechanism that generates initial expert weights based on temporal content, subsequently refined through geometric modulation to regime centroids. Crucially, rather than acting as monolithic predictors, individual experts function as specialized kernels that capture regime-specific dynamics by evolving unique variate-attention patterns. Extensive evaluations on diverse benchmarks, including heterogeneous network traffic and physical phenomena, demonstrate that NEST consistently achieves state-of-the-art performance. Our code and datasets are available at https://github.com/Aaralshin/NEST.
    Data MiningMining spatial and/or temporal dataMachine LearningSequence and graph learningMachine LearningTime series and data streams
  533. #5841

    Speeding Up the NSGA-II via Dynamic Population Sizes

    Benjamin Doerr, Martin S. Krejca, Simon Wietheger
    Multi-objective evolutionary algorithms (MOEAs) are among the most widely and successfully applied optimizers for multi-objective problems. However, to store many optimal trade-offs (the Pareto optima) simultaneously, MOEAs are typically run with a large population of solution candidates. This slows down the algorithm and renders the choice of the population size a crucial design decision. In this work, we aim to overcome these difficulties by proposing the dynamic NSGA-II, a variant of the well-known NSGA-II that starts with a small initial population and doubles it after a user-specified number 𝜏 of function evaluations, up to a maximum size of 𝑁ₘₐₓ. We prove that the dynamic NSGA-II with optimal parameters computes the Pareto front of the OneMinMax benchmark of size 𝑛 with high probability in O(𝑛 log² 𝑛) function evaluations, which is considerably faster than the Θ(𝑛² log 𝑛) runtime of the static NSGA-II with optimal parameters. For the OneJumpZeroJump benchmark with gap size 𝑘, we show a runtime of O(𝑛^𝑘 log² 𝑛), improving upon the known runtime of Θ(𝑛^(𝑘 + 1)). We also propose a variant that uses the initial population size for a longer period and achieves slightly better performance. Finally, we show that a simple concurrent-run strategy turns our dynamic NSGA-II variants into parameter-less algorithms that exceed the above runtimes only by a logarithmic factor and hence still outperform the static NSGA-II by a factor of Ω̃(𝑛).
    SearchEvolutionary computation
  534. #5845

    PointGP: Geometry-Primed Attention for Point Cloud Analysis

    Yong Yang, Jianming Huang, Mengyuan Ge, Chunyang Huang, Bingbing Hu, Junfeng Yao
    Transformer-based architectures have demonstrated strong performance in 3D point cloud understanding, yet many existing methods generate attention weights mainly from semantic feature similarity. In deep networks, feature-centric attention may become less selective as point features are progressively smoothed, a behavior associated with feature homogenization and rank collapse, which can weaken the structural discrimination of local aggregation. We propose PointGP, a geometry-primed framework that uses rectified local geometric topology as the primary cue for attention generation. PointGP introduces a Semantic-Guided Manifold Rectifier to predict feature-conditioned local coordinate offsets, and a Dual-Stream Geometric Kernel to compute attention logits from both raw and rectified geometric cues. By reducing reliance on explicit query-key feature matching while implicitly incorporating semantic guidance through geometric rectification, PointGP provides an effective and efficient mechanism for local point aggregation. Experiments across five benchmarks covering classification, part segmentation, and indoor scene segmentation show that PointGP achieves competitive accuracy with strong parameter and computational efficiency compared with representative strong baselines.
    Computer Vision3D computer visionComputer VisionMachine learning for visionComputer VisionScene analysis and understanding
  535. #5854

    PRIME: A Decoupled Multi-agent Actor-Critic for Multi-view Clustering

    Jing Gao, Xinxin Liu, Peng Li, Jianing Zhang, Meng Liu, Qingchen Zhang
    Deep multi-view clustering has drawn considerable attention in various domains, owing to its remarkable performance in learning patterns from the complementary information of multi-view data. However, previous methods encounter two challenges. First, they use a single pre-defined clustering strategy to perceive diverse structures in multi-view data that follow heterogeneous distributions, and so fail to fully capture intricate complementary structures. Second, they rely on either feature fusion or result fusion in clustering, which cannot fully integrate complementary information. Therefore, a decoupled multi-agent actor-critic (PRIME) is proposed by formulating multi-view clustering as a partially observable Markov game, which establishes a dynamic trial-and-error assignment process of samples in a decentralized-perception, centralized-learning framework to adaptively learn the optimal clustering strategy. In PRIME, an actor leverages the policy gradient paradigm to independently implement a Markov decision process of data partition in a view, which fully explores data structures to tailor a clustering sub-policy for that view. Meanwhile, a critic uses the value function paradigm to centrally guide the Markov game among actors in different views, which constructs both feature and result fusion to progressively enhance complementary knowledge integration for robust clustering results. Extensive experiments on 6 benchmark datasets verify the superiority of PRIME against 10 methods.
    Machine LearningMulti-view learning
  536. #5857

    Transforming and Encoding FTS for SAT Solving: What Helps, What Hurts

    João Filipe, Álvaro Torralba, Gregor Behnke
    Factored tasks are a classical planning representation that extends SAS+ with limited forms of disjunctive preconditions, conditional effects, and angelic nondeterminism. This added expressiveness allows for a more compact representation of tasks than traditional formalisms such as STRIPS or SAS+, and supports a wide range of task transformations. However, existing planning approaches for factored tasks have been limited to heuristic search methods.

    In this work, we investigate how to encode factored tasks in SAT. We propose several ways to encode the tasks, focusing on different strategies for translating the factored transition relation into propositional logic. We also analyze how to exploit parallelism at various levels in this setting and study the impact of common task transformations on the performance of SAT-based planners.
    Constraint Satisfaction and OptimizationSatisfiabilityPlanning and SchedulingActivity and plan recognition
  537. #5873

    Stay in Character, Stay Safe: Dual-Cycle Adversarial Self-Evolution for Role-Playing Agents

    Mingyang Liao, Yichen Wan, Shuchen Wu, Chenxi Miao, Xin Shen, Weikang Li, Yang Li, Deguo Xia, Jizhou Huang
    LLM-based role-playing has rapidly improved in fidelity, yet stronger adherence to persona constraints commonly increases vulnerability to jailbreak attacks, especially for risky or negative personas.
    Most prior work mitigates this issue with training-time solutions (e.g., data curation or alignment-oriented regularization).
    However, these approaches are costly to maintain as personas and attack strategies evolve, can degrade in-character behavior, and are typically infeasible for frontier closed-weight LLMs.
    We propose a training-free Dual-Cycle Adversarial Self-Evolution framework with two coupled cycles.
    A Persona-Targeted Attacker Cycle synthesizes progressively stronger jailbreak prompts, while a Role-Playing Defender Cycle distills observed failures into a hierarchical knowledge base of (i) global safety rules, (ii) persona-grounded constraints, and (iii) safe in-character exemplars.
    At inference time, the Defender retrieves and composes structured knowledge from this hierarchy to guide generation, producing responses that remain faithful to the target persona while satisfying safety constraints.
    Extensive experiments across multiple proprietary LLMs show consistent gains over strong baselines on both role fidelity and jailbreak resistance, and robust generalization to unseen personas and attack prompts.
    Agent-based and Multi-agent SystemsMulti-agent learning
  538. #5874

    Open-Vocabulary Object 6D Pose Estimation via Modulated Textual Semantics

    Zixuan Sun, Hui Shuai, Qingshan Liu
    Estimating the 6D pose of novel objects without CAD models or video sequences remains a challenging problem. Recent works explore text-driven approaches to address this challenge in an open-vocabulary manner. However, these methods typically treat text embeddings as static priors, which lack the flexibility to adapt to specific visual scenes, thereby limiting reliable cross-view correspondences. In this paper, we propose a framework that modulates static textual embeddings into adaptable semantic guidance for RGB-D open-vocabulary 6D pose estimation. Specifically, we introduce a Text Feature Modulation (TFM) module that uses learnable queries to transform fixed text embeddings into adaptive semantic representations. These representations are explicitly calibrated by a Foreground-Grounded Semantic Alignment (FGSA) strategy, which suppresses background distractions via contrastive learning. Furthermore, we design a Semantic-Anchored Cross-view Fusion (SACF) module that aggregates image features from the reference and query views using modulated semantics as an invariant anchor. Finally, we predict object segmentation and establish semantic-aware correspondences, leveraging modulated textual semantics to filter mismatches for robust 6D pose estimation. Extensive experiments on standard benchmarks demonstrate that our method achieves superior performance.
    Computer Vision3D computer visionComputer VisionScene analysis and understanding
  539. #5896

    Bridging the Semantic Gap: Leveraging LLMs for Hierarchical Interest Evolution in Sequential Recommendation

    Yifan Cao, Rui Wu, Xiang Wang, Wei Peng, Wei Xu, Caihong Sun, Jian Zeng
    Accurate user behavior modeling is fundamental to the prediction of click-through rates (CTR) in industrial recommendation systems and online advertising. Traditional discriminative models, which rely on isolated ID features, struggle to capture the evolving nature of user intents across multiple channels due to the inherent semantic gap. While Large Language Models (LLMs) offer rich semantic understanding to bridge this gap, their direct application faces two main challenges: modality misalignment between continuous semantic representations and discrete item IDs, and prohibitive computational costs that make them unsuitable for real-time inference. To address these issues, we propose the Hierarchical Semantic Interest Evolution Network (HSIEN), a novel generative-discriminative framework. HSIEN leverages an LLM to construct a Markovian state-space model that recursively captures and updates user interests from multi-channel behavior sequences. It then employs a Semantic Mixture-of-Experts (SMoE) strategy to aggregate multi-channel signals into hierarchical intent representations, which are encoded as compact dynamic semantic vectors for integration with downstream discriminative models. Extensive experiments on a public dataset and a large-scale industrial dataset demonstrate that HSIEN significantly alleviates modality misalignment and enhances CTR prediction performance through feature complementarity. This work provides a practical and efficient pathway for leveraging LLMs in large-scale recommendation scenarios.
    Data MiningRecommender systemsMachine LearningFeature extraction, selection and dimensionality reductionMachine LearningSequence and graph learning
  540. #5904

    URFPert: Unrolled Regulatory Flow Networks for Out-of-Distribution Single-Cell Perturbation Prediction

    Xiaoqi Sheng, Jiawen Liu, Yutong Li, Sankar Mondal, Jiaming Liang, Tinghe Zhang, Hongmin Cai
    Single-cell perturbation screens enable systematic discovery of gene regulatory mechanisms, yet the exponential expansion of perturbation space makes comprehensive experimentation impractical. Although in silico predictors have been increasingly proposed to address this challenge, most existing methods either assume static regulatory priors or learn unconstrained data-driven representations. As a result, they often suffer from poor adaptation to state-dependent shifts and weak out-of-distribution generalization. In response, we propose URFPert, an unrolled regulatory flow network tailored for single-cell perturbation prediction. Specifically, URFPert is composed of three fundamental modules: (i) a Regulatory Structure Unrolling module, which employs an iterative unrolling optimization algorithm to infer state-dependent gene regulatory networks (GRNs); (ii) a Heterogeneous Graph Flow Network, which incorporates the inferred GRNs into a conditional velocity field to yield state-consistent and perturbation-specific trajectories; and (iii) a Discriminative Optimization Strategy, which combines Mean Squared Error (MSE) with Maximal Coding Rate Reduction (MCR²) to promote discriminative perturbation responses and mitigate training confounders. Comprehensive evaluations on five single-cell perturbation datasets demonstrate that URFPert outperforms state-of-the-art methods in unseen perturbation regimes, providing a powerful tool for interpreting regulatory mechanisms. The project code is publicly available at https://github.com/Jiniretttt/URFPert.
    Machine LearningRegressionMultidisciplinary Topics and ApplicationsBioinformaticsMultidisciplinary Topics and ApplicationsHealth and medicineMultidisciplinary Topics and ApplicationsLife sciences
  541. #5913

    IWCNet: Intervention-based Weight Calibration Network for Multi-center Multi-modality MRI Lesion Segmentation

    Ronghui Qi, Wenlong Song, Wanyin Shi, Chenchu Xu
    Multi-modality MRI lesion segmentation leverages complementary contrasts across modalities to better characterize lesion morphology and tissue alterations, thereby improving lesion delineation and reducing misdiagnosis. Existing correlation-driven methods estimate fusion weights by converting cross-modality correlation scores into adaptive fusion weights. However, in multi-center settings, domain shift changes cross-modal correlation patterns, which breaks the learned correlation-to-weight mapping and leads to modality-weight bias that amplifies center-specific shortcuts. To address this issue, we propose an Intervention-based Weight Calibration Network (IWCNet) that models domain shift as a controllable embedding and leverages explicit interventions to mitigate its impact on fusion weight estimation. IWCNet constructs original and intervened branches by performing interventions on the domain embedding, measuring inter-branch prediction effects to calibrate fusion weights. IWCNet further introduces Operator-level Domain Modulation (OLDM) to construct structure-consistent domain-variant feature pairs via operator-level domain intervention, and Effect-Driven Consistency Calibration (EDCC) to jointly leverage intervention effects and cross-branch prediction consistency to calibrate weights. Extensive experiments on multi-center multi-modality MRI datasets demonstrate that IWCNet substantially improves segmentation accuracy and robustness under domain shift.
    Computer VisionBiomedical image analysisComputer VisionSegmentation, grouping and shape analysis
  542. #5922

    DELTA: Disentangled Hierarchical Interaction and Adaptive Adjustment for Emotion and Intent Understanding in Multimodal Conversations

    Shenjie Jiang, Xiangfeng Liu, Xianghua Li, Xuecheng Wu
    Emotion and intent joint understanding in multimodal conversations (MC-EIU) aims to infer emotion and intent information by leveraging the semantic dependencies among multimodal data. However, prior works largely ignore redundancy interference and suffer from insufficient interaction and inter-task noise propagation due to their reliance on shallow mechanisms. To overcome these limitations, we propose a novel framework named Disentangled Hierarchical Interaction and Adaptive Adjustment (DELTA) for MC-EIU. We first design a Disentangled Feature Denoising module based on orthogonal decomposition to effectively filter out noise and redundancy from heterogeneous data. Second, we propose a Dual-Level Adaptive Adjustment mechanism to dynamically optimize learning dynamics from both modality and instance perspectives. Furthermore, we introduce a Hierarchical Task-Bottleneck Interaction (HTBI) module, which employs a set of progressively compressed bottleneck tokens as a communication hub. This design can effectively simulate the multi-round iterative interaction between emotion and intent tasks. Experiments on the benchmark MC-EIU bilingual dataset demonstrate that our framework significantly outperforms state-of-the-art baselines in both emotion and intent tasks.
    Data MiningApplicationsKnowledge Representation and ReasoningApplicationsMachine LearningApplicationsSearchApplications
  543. #5936

    BAG-Net: Bidirectional Receptive-Field Graph Network for Two-View Correspondence Pruning

    Leyi Wang, Hao Chen, Changcai Yang
    Learning reliable two-view correspondences is essential for geometric computer vision applications. Existing graph-based pruning methods typically aggregate information unidirectionally, capturing only whether a correspondence receives neighborhood support while ignoring its structural contribution during aggregation. This makes it difficult to distinguish locally clustered outliers from globally critical inliers. To overcome this limitation, we propose a Bidirectional Aggregation Graph Network (BAG-Net), which introduces a bidirectional aggregation perspective for correspondence pruning. Specifically, we design Bidirectional Dynamic Graph Convolution (BiDGC) that jointly models forward consistency aggregation and reverse structural dependency, enabling the network to identify key inliers that are widely referenced during feature propagation. To further enhance robustness, we design a Stage-wise Graph Expansion-Contraction (SGEC) architecture that combines multi-scale receptive field evolution with Stage-wise Squeeze-and-Excitation (SSE) and Deep-Guided Modulation (DGM) to stabilize feature propagation across network stages. Extensive experiments on the YFCC100M and SUN3D datasets demonstrate that BAG-Net significantly outperforms existing SOTA methods in both outlier rejection and camera pose estimation, achieving mAP5° of 66.10% and 78.88% on YFCC100M in known and unknown scenes, respectively. Source code: https://github.com/LeyiWang13/BAG-Net.
    Computer VisionMachine learning for visionComputer VisionOther
  544. #5943

    Learning to Sparsify Stochastic Linear Bandits

    Zhengmiao Wang, Ming Chi, Zhi-Wei Liu, Lintao Ye, Carla Fabiana Chiasserini
    This paper addresses the problem of learning to sparsify stochastic linear bandits, where a decision-maker sequentially selects actions from a high-dimensional space subject to a sparsity constraint on the number of nonzero elements in the action vector. The key challenge lies in minimizing cumulative regret while tackling the potential NP-hardness of finding optimal sparse actions due to the inherent combinatorial structure of the problem. We propose an adaptively phased exploration and exploitation algorithmic framework, utilizing ordinary least squares for parameter learning and specialized subroutines for sparse action selection. When the action set is a Euclidean ball, optimal sparse actions can be efficiently computed, enabling us to establish a Õ(d √T) regret, where d is the dimension of the action vector and T is the time horizon length. For general convex and compact action sets where finding optimal sparse actions is intractable, we employ a greedy subroutine. For general strongly convex action sets, we derive a Õ(d √T) α-regret; for general compact sets lacking strong convexity, we establish a Õ(d T^(2/3)) α-regret, where α pertains to the approximation ratio of the greedy algorithm. Finally, we validate the performance of our algorithms using extensive experiments.
    Machine LearningOnline learningMachine LearningMulti-armed banditsMachine LearningReinforcement learningMachine LearningLearning sparse models
  545. #5949

    Integer Splittable Congestion Games with Capacitated Resources and Player-Specific Costs

    Gianpiero Monaco, Raffaele Mosca, Luca Moscardelli
    Motivated by practical allocation problems, we study integer splittable congestion games with capacitated resources and player-specific costs. In this setting, each player has an integer weight that has to be split in integer units across multiple resources, each with a capacity limiting the total assigned weight, i.e., its congestion. The latency of a resource is equal to a player-specific constant when its congestion is not greater than the capacity and becomes prohibitive, i.e., equal to ∞, once the capacity is exceeded.
    We analyze the computational complexity of finding an allocation that optimizes utilitarian social welfare under two cost models (total cost and per-unit cost). Furthermore, we investigate the computation of, speed of convergence to, and efficiency of Nash equilibria.
    Agent-based and Multi-agent SystemsAgent theories and modelsGame Theory and Economic ParadigmsNoncooperative games
  546. #5958

    Beyond Homogeneous Adversaries: Stackelberg Security Games with Mixed Quantal Response

    Hoang Giang Pham, Tien Mai, Thuy Anh Ta, Minh Hoàng Hà
    The quantal response (QR) model is widely used in Stackelberg security games (SSGs) to capture boundedly rational adversaries. Existing work on SSGs under QR, however, almost exclusively assumes a homogeneous attacker population, ignoring heterogeneity in attacker preferences and rationality. We study SSGs with mixed quantal response attackers, where the follower population consists of multiple discrete attacker types, each following a type-specific QR model. The defender allocates limited resources across targets, while an attacker drawn from this heterogeneous population observes the defender’s strategy and attacks a single target. This results in a highly non-convex equilibrium computation problem. We develop a polynomial-time approximation scheme (PTAS) for this setting when the number of attacker types is bounded, based on an exponential cone programming formulation combined with a carefully designed Branch-and-Bound procedure. Experiments demonstrate that our approach outperforms standard gradient-based methods and that explicitly modeling attacker heterogeneity yields significant gains over traditional SSG models with a single QR attacker.
    Game Theory and Economic ParadigmsNoncooperative games
  547. #5959

    H²Net: Homo- and Heterogeneous Networks for Unified Segmentation

    Jinyu Han, Changguang Wu, Fuming Sun, Mengyin Wang, Jinhui Tang
    Unified segmentation aims to consolidate multiple vision tasks into a single model, yet faces two core challenges: learning robust homogeneous features (e.g., shared low- and mid-level cues) to enable cross-domain knowledge transfer, while disentangling heterogeneous features (e.g., task-specific semantic objectives) to avoid negative transfer and preserve task independence.
    To address these challenges, we propose the Homo- and Heterogeneous Network (H2Net), a unified framework that jointly models shared homogeneous representations and task-specific heterogeneous features. Specifically, H2Net incorporates a Cross-Modal Structure Enhancement Module (CSEM), which integrates auxiliary depth priors via joint frequency–spatial cross-modal attention to strengthen task-agnostic structural representations. In addition, a Task Adapter Pool (TAP) is introduced to model task-specific heterogeneous features by assigning dedicated adapters to individual tasks, enabling task-aware feature modulation and semantic disentanglement within a shared backbone. Extensive experiments on benchmarks spanning eight tasks demonstrate the effectiveness of the proposed approach and its superior performance. Code and results will be available at https://h2net-ijcai26.github.io.
    Computer VisionBiomedical image analysisComputer VisionMultimodal learningComputer VisionSegmentation, grouping and shape analysis
  548. #5960

    Probabilistic Verification of Recurrent Neural Networks for Single and Multi-Agent Reinforcement Learning

    Luca Marzari, Enrico Marchesini
    History-dependent policies induced by recurrent neural networks (RNNs) rely on latent hidden state dynamics, making verification in partially observable reinforcement learning (RL) challenging. Existing RNN verification tools typically rely on restrictive modeling assumptions or coarse over-approximations of the hidden state space, which can lead to overly conservative or inconclusive results. We propose RNN Probabilistic Verification (RNN-ProVe), a probabilistic framework that estimates the likelihood of undesired behaviors in RNN-based policies. RNN-ProVe uses policy-driven sampling to approximate the set of hidden states that are feasible under a trained policy, and derives statistical error bounds to produce bounded-error, high-confidence estimates of behavioral violations. Experiments on partially observable single-agent and cooperative multi-agent tasks show that RNN-ProVe yields more quantitative, feasibility-aware probabilistic guarantees than existing tools, while scaling to recurrent and multi-agent settings.
    Agent-based and Multi-agent SystemsFormal verification, validation and synthesisAI Ethics, Trust, FairnessSafety and robustnessMachine LearningReinforcement learning
  549. #5962

    AC2-VLA: Action-Context-Aware Adaptive Computation in Vision-Language-Action Models for Efficient Robotic Manipulation

    Wenda Yu, Tianshi Wang, Fengling Li, Jingjing Li, Lei Zhu
    Vision-Language-Action (VLA) models have demonstrated strong performance in robotic manipulation, yet their closed-loop deployment is hindered by the high latency and compute cost of repeatedly running large vision-language backbones at every timestep. We observe that VLA inference exhibits structured redundancies across temporal, spatial, and depth dimensions, and that most existing efficiency methods ignore action context, despite its central role in embodied tasks. To address this gap, we propose Action-Context-aware Adaptive Computation for VLA models (AC2-VLA), a unified framework that conditions computation on current visual observations, language instructions, and previous action states. Based on this action-centric context, AC2-VLA adaptively performs cognition reuse across timesteps, token pruning, and selective execution of model components within a unified mechanism. To train the adaptive policy, we introduce an action-guided self-distillation scheme that preserves the behavior of the dense VLA policy while enabling structured sparsification that transfers across tasks and settings. Extensive experiments on robotic manipulation benchmarks show that AC2-VLA achieves up to a 1.79× speedup while reducing FLOPs to 29.4% of the dense baseline, with comparable task success.
    Computer VisionEfficiency and OptimizationRoboticsManipulationRoboticsRobotics and vision
  550. #5985

    Generalizing Bayesian Human-AI Collaboration: Theory and Application in Data-Scarce Environments

    Peng Liu, Hailong Sun, Chung-Piaw Teo, Mabel Chou
    Combining predictions from heterogeneous classifiers—such as in-house deep learning models, human experts, and large language models (LLMs)—is a key challenge, especially in data-scarce environments such as humanitarian operations. We propose a flexible Bayesian framework to effectively fuse these diverse inputs. By integrating classifier logits with auxiliary human feedback (e.g., confidence, image clarity) using an ordered probit process, our model generalizes prior work to accommodate the real-world properties of these classifiers. We validate our framework on a challenging product recognition task in food bank operations, an environment defined by data scarcity and an inexperienced volunteer workforce. Our combined model significantly outperforms standalone ResNet, human, and LLM-based approaches, demonstrating the practical benefits of fusing these heterogeneous signal sources.
    Humans and AIHuman-AI collaborationHumans and AIHuman-computer interactionAgent-based and Multi-agent SystemsHuman-agent interactionAIHumans and AI
  551. #5993

    Dual-Adversarial Dynamic Variational Asset Pricing with Adaptive Spatio-Temporal Feature Clustering for Portfolio Recommendation

    Yupeng Fang, Ruirui Liu, Xinyu Xia, Huichou Huang, Johannes Ruf, Qingyao Wu
    Asset pricing and portfolio recommendation are two closely related fundamental tasks in quantitative investment, for which machine learning methods have attracted significant attention in both academia and industry. In particular, nonlinear asset pricing models based on deep learning architectures that learn risk factors and risk exposures (betas) conditioned on high-dimensional asset characteristics have become widely used in the field.
    Despite their popularity, three challenges remain for portfolio recommendation: (i) their static risk pricing structure constrains predictive performance for expected returns; (ii) their representation learning is typically deterministic or fails to account for the inherent distributional uncertainty in the feature space arising from noisy returns and heterogeneous characteristics; and (iii) the sparse factor structure of asset returns and the diversity of characteristics make it difficult to identify incremental predictive information. To address these issues and bridge the gap to practical applications, we propose a novel multi-task dynamic factor model that jointly performs asset pricing and portfolio recommendation. Specifically, we introduce dual-adversarial trainers into the variational prior-posterior learning framework for Factor and Beta Networks, augmented with probabilistic equivariance regularization and dual adaptive spatio-temporal clustering. These components filter redundant information and enhance the model's ability to adapt to changing market conditions. Extensive experiments on a comprehensive open-source stock market dataset demonstrate that our model achieves strong and robust performance relative to baseline methods in the literature.
    Data MiningApplicationsMultidisciplinary Topics and ApplicationsFinanceUncertainty in AIApplications
  552. #6027

    Parameterized Approximation Schemes for Fair Clustering in Doubling Metrics

    Xiaoliang Wu, Ting Liang, Zhize Li, Qilong Feng
    Fair clustering has garnered considerable attention, with various fairness notions proposed to ensure equitable representation across demographic groups. In this paper, we focus on the k-center problem in bounded doubling metrics under two popular fairness requirements: group fairness and data summarization fairness, referred to as Group Fair k-Center (Gf-k-Cen) and Data Summarization Fair k-Center (Dsf-k-Cen), respectively. Both fairness notions extend the classical clustering formulation by associating each data point with a demographic label. Motivated by recent advances in parameterized approximation results for fair clustering, we investigate whether these problems admit Fixed-Parameter Tractable (FPT) approximation schemes in bounded doubling metrics. Previous algorithms typically neglect the local structural properties induced by the fairness constraints themselves, which limits their approximation quality. By further leveraging the geometric properties of doubling metrics together with local fairness information, we develop a candidate-based structural method that yields (1+ε)-approximation algorithms with FPT running times for both problems, parameterized by the number of selected centers. To the best of our knowledge, these results constitute the first parameterized approximation schemes for the Gf-k-Cen and Dsf-k-Cen problems in bounded doubling metrics.
    AI Ethics, Trust, FairnessFairness and diversityMachine LearningClustering
  553. #6055

    ILR-SMO: Iterative Latent Refinement for Robust Spatial Multi-Omics Integration

    Anqi Yu, Xudong Xu, Jianzhi Lu, Yuqi Sun, Bingguo Chen, Chenxi Ma, Weimin Tan, Bo Yan
    Spatial multi-omics technologies jointly profile diverse molecular modalities with spatial context, providing a comprehensive view of cellular heterogeneity and tissue organization. To integrate spatial multi-omics data and identify spatial domains, a wide range of unsupervised methods has been proposed. However, recent approaches rely on single-step fusion, directly aggregating heterogeneous modalities into a shared representation. This makes embeddings sensitive to the modality imbalance and measurement noise inherent to spatial multi-omics data, and makes optimization unstable under weak supervision. Here, we propose ILR-SMO (Iterative Latent Refinement for Spatial Multi-Omics), a unified self-supervised framework that reformulates multimodal integration as a stability-aware refinement process over a shared latent representation. Instead of single-step plain fusion, ILR-SMO progressively integrates modality-specific information through sequential updates, allowing information to be injected in a controlled manner. This incremental refinement enables later updates to correct or compensate for earlier deviations, resulting in more stable and robust representation learning under heterogeneous and noisy modalities. ILR-SMO further incorporates reliability-aware modality gating for adaptive modulation of modality contributions, and employs a joint objective to enforce spatial coherence while preventing representation collapse. Extensive experiments on five spatial multi-omics benchmarks demonstrate that ILR-SMO consistently outperforms seven state-of-the-art methods and exhibits strong robustness across diverse settings.
    Machine LearningMulti-modal learningMachine LearningClusteringMachine LearningMulti-view learningMultidisciplinary Topics and ApplicationsBioinformatics
  554. #6065

    Mapping the Efficiency Landscape of Small Language Models

    Fabian Reichwald, Lukas Schiesser, Christiane Plociennik, Leonhard Kunz, Simon Pukrop, Martin Ruskowski, Oliver Thomas
    Large language models (LLMs) dominate both everyday and specialized applications, but their high computational demand, energy consumption, and privacy risks are increasingly critiqued. Small language models (SLMs) mitigate these drawbacks and are gaining momentum in scenarios where full LLM capabilities are not required, such as agents, industrial systems, or edge devices. Nevertheless, a systematic comparison of model capabilities, energy usage, and scaling behavior has not been conducted yet. We evaluate 70+ SLMs from 2023–2025 on five task-specific benchmarks and compare them with two popular LLMs, revealing key trade-offs between energy, performance, and model selection. Our findings challenge common assumptions: First, smaller models are not automatically more efficient, and energy increases do not guarantee performance gains. Second, newer SLMs show clear improvements in performance–energy trade-offs, though the progress begins to plateau. Last, the efficiency landscape forms a clear Pareto frontier: initial energy increases yield substantial gains, but the last percentage points of performance need orders of magnitude more energy. These results highlight diminishing returns of scaling and emphasize the need for informed, task-aware model selection rather than size-driven choices.
    Multidisciplinary Topics and ApplicationsEnergy, environment and sustainabilityNatural Language ProcessingLanguage modelsNatural Language ProcessingResources and evaluation
  555. #6073

    EvoThink: Evolving Thinking in Large Reasoning Models via Self-Pruning and Aha-Moment Preference Optimization

    Xinbang Dai, Zheyu Xin, Huikang Hu, Lin Ren, Rihui Jin, Guohui Xiao, Kuicai Dong, Zhaocheng Du, Yuyang Zhang
    Large Reasoning Models (LRMs) often suffer from overthinking due to redundant verification steps. Existing approaches for mitigating overthinking, such as fast-slow thinking switching and reasoning trajectory compression, fail to make a fine-grained distinction between beneficial and redundant steps within the LRM's reasoning process, and may thus impair reasoning capability in their pursuit of efficiency. To simultaneously improve reasoning efficiency and capability, we propose EvoThink, a framework that reduces redundant verification and encourages the exploration of new reasoning paths. EvoThink comprises two key components: Self-Pruning Training (SPT), an unsupervised method that iteratively prunes redundant reasoning steps and self-trains on the concise trajectories; and Aha-Moment Preference Optimization (AMPO), which, inspired by genetic algorithms, identifies valuable failed reasoning attempts, synthesizes from-wrong-to-right aha-moment data, and optimizes the model to internalize this reasoning pattern. Extensive evaluations across mathematical reasoning and code generation benchmarks demonstrate that EvoThink not only substantially reduces inference-time token usage but also improves the reasoning capability of LRMs.
    Natural Language ProcessingApplicationsNatural Language ProcessingLanguage models
  556. #6092

    Cross-Sensor Domain Generalization for Non-Contact Sleep Staging

    Jie Deng, Zhi Lu, Fang Zhou, Yu Pu, Zhi Wu, Beilei Wang, Yang Hu, Yan Chen
    Sleep staging is pivotal for assessing sleep quality and clinical diagnostics. While contactless radio-frequency (RF) sensing offers an unobtrusive alternative to Polysomnography (PSG), existing methods often struggle with generalization across diverse devices and environments due to the scarcity of annotated RF data. To overcome this limitation, we propose XSensorSleep, a cross-sensor domain generalization framework. Uniquely, XSensorSleep is trained exclusively on large-scale respiratory datasets from chest and abdominal belts and is directly transferable to RF-derived respiratory signals. To bridge the domain gap, we partition the training data into multiple pseudo-domains based on data sources and sensor types, forcing the model to extract domain-invariant features from heterogeneous respiratory patterns. Furthermore, we introduce a hierarchical alignment strategy: Epoch-Level Feature Alignment (ELFA) suppresses sensor-specific morphological artifacts, while Spectral Temporal Alignment (STA) captures invariant global sleep architectures. Specifically, STA leverages the rotation-invariance of eigenvalue spectra to align whole-night transition dynamics, ensuring the model prioritizes universal physiological rhythms over device-dependent signal fluctuations. Extensive evaluations across various home and clinical RF datasets demonstrate that XSensorSleep achieves superior generalization, enabling practical, automated, and label-free non-contact sleep staging.
    Machine LearningApplicationsMachine LearningClassificationMachine LearningRepresentation learningMachine LearningRobustnessMachine LearningSupervised Learning
  557. #6100

    GBFlow: Grouping Belief-Guided Dual Normalizing Flows for Accurate Spatial Domain Delineation

    Fengyi Zhou, Daoyuan Wang, Wenlan Chen, Cheng Liang, Fei Guo
    Existing spatial domain identification methods primarily use graph neural networks to model spatial and transcriptional relationships. However, their performance is highly sensitive to noisy affinity graphs. Moreover, graph autoencoders tend to over-constrain latent representations, which limits their ability to capture global variability in spatial multi-omics data. To overcome these issues, we propose a dual-flow latent refinement framework that simultaneously integrates structure-aware and structure-free transformations. Specifically, a graph normalizing flow is employed to enforce relational consistency, while a parallel vanilla normalizing flow preserves global distributional flexibility. Features learned by the two flows are then adaptively fused to obtain a robust and unified latent representation. In addition, we introduce a grouping belief-based affinity refinement strategy to suppress unreliable connections and strengthen confident neighborhood relationships, which provides a more stable structural prior for representation learning. Extensive experiments on multiple spatial multi-omics datasets show that the proposed method consistently outperforms state-of-the-art approaches and achieves more accurate and robust spatial domain identification. The supplementary material is publicly available at https://github.com/LiangSDNULab/GBFlow.
    Machine LearningClusteringMachine LearningMulti-modal learningMachine LearningMulti-view learningMachine LearningUnsupervised learning
  558. #6101

    EVA-Gen: When Perception Learns from Value via Generative Models in Decentralized Multi-Agent Systems

    Yuduo Zheng, XueFeng Du, Yanqi Cheng, Li Yin, Fengqi Li
    Decentralized Multi-Agent Reinforcement Learning (MARL) faces a fundamental dilemma in real-world deployments: agents must operate under epistemic fragmentation, where local observations are severely occluded, while navigating heterogeneous value landscapes, where sparse, critical events carry disproportionately high stakes. Existing paradigms typically decouple state estimation from policy optimization, guiding perception modules merely to minimize uniform reconstruction error. This leads to a Perception Value Misalignment, where agents squander computational resources reconstructing task-irrelevant background noise while failing to resolve uncertainties in high-value regions. To bridge this gap, we propose EVA-Gen (Epistemic Value Alignment via Generative Models), which establishes a cybernetic loop between generative perception and value-based decision making. We formulate the Value Conditioned Reconstruction Paradigm, establishing that optimal perception under resource constraints is functionally weighted by the gradient of the value function. EVA-Gen couples Backward Flow to steer diffusion toward high-stakes manifolds, Collaborative Information Bottleneck to filter communication for value-relevant consensus, and Risk Sensitive Rectification to prevent sparse signal dilution, synergistically closing the perception-control loop. Empirically, we demonstrate that EVA-Gen achieves superior performance in three value-heterogeneous multi-agent environments.
    Agent-based and Multi-agent SystemsAgent theories and modelsAgent-based and Multi-agent SystemsCoordination and cooperationAgent-based and Multi-agent SystemsMulti-agent learning
  559. #6112

    Unifying Perspectives: Plausible Counterfactual Explanations on Global, Group-wise, and Local Levels

    Oleksii Furman, Patryk Wielopolski, Łukasz Lenkiewicz, Jerzy Stefanowski, Maciej Zięba
    The growing complexity of AI systems has intensified the need for transparency through Explainable AI (XAI). Counterfactual explanations (CFs) offer actionable "what-if" scenarios on three levels: Local CFs providing instance-specific insights, Global CFs addressing broader trends, and Group-wise CFs (GWCFs) striking a balance and revealing patterns within cohesive groups. Despite the availability of methods for each granularity level, the field lacks a unified method that integrates these complementary approaches. We address this limitation by proposing a gradient-based optimization method for differentiable models that generates Local, Global, and Group-wise Counterfactual Explanations in a unified manner. In particular, we enhance GWCF generation by combining instance grouping and counterfactual generation into a single efficient process, replacing traditional two-step methods. Moreover, to ensure trustworthiness, we integrate plausibility criteria into the GWCF domain, making explanations both valid and realistic. Our results demonstrate the method's effectiveness in balancing validity, proximity, and plausibility while optimizing group granularity, with practical utility validated through use cases.
    AI Ethics, Trust, FairnessExplainability and interpretabilityMachine LearningClassificationMachine LearningExplainable/Interpretable machine learningMachine LearningOther
  560. #6129

    Cross-Domain AI-Generated Image Quality Assessment via Content-Distortion Awareness

    Shun Zhu, Xichen Yang, Dechun Zhao, Tianshu Wang, Yan Zhang, Tianyin Li, Xiaobo Shen
    With the expanding use of artificial intelligence generated images (AGIs) in scenarios such as gaming, art, and film production, evaluating their quality is essential to ensure their practical utility. To guarantee effective quality measurement, both content and distortion must be considered. However, existing image quality assessment (IQA) methods fail to sufficiently combine the two aspects in their assessments. Meanwhile, AGIs annotated with credible content and distortion labels are scarce, so employing cross-domain methods to exploit typical public image datasets is worthwhile. Based on these observations, this paper proposes a cross-domain AI-generated IQA method via content-distortion awareness (CDAQA). First, a large language model is used to obtain content and distortion expressions of images. Second, a dual-stream contrastive language-image pre-training module (DCLIP) is designed to extract quality-aware features from both the images and the corresponding expressions related to content and distortion, respectively. Third, a graph knowledge transferring module (GKT) constructs graphs via inter-domain content relevance. GKT can update distortion information in the source domain according to the target domain representations. Finally, the cross-domain framework is designed to update the existing IQA model for AGIs. The contrastive loss of DCLIP, the feature update loss of GKT, and the quality prediction loss are coordinated to jointly update the different modules, enabling sufficient fusion of target domain knowledge. Experimental results demonstrate that CDAQA achieves higher accuracy and stability in cross-domain AGI tasks.
    Computer VisionApplications and SystemsComputer VisionMultimodal learningComputer VisionOther
  561. #6131

    Performance-Driven Demonstration Selection for In-Context Learning

    Wenqiang Wang, Mingbo Yang, Aiping Zhang, Yan Xiao, Peng Chen, Jianjie Huang, Xiaochun Cao
    In-context learning (ICL) enables large language models (LLMs) to adapt to new tasks with considerable performance gains, yet its effectiveness is highly sensitive to the choice of demonstrations.
    Most existing selection methods rely on heuristic or proxy signals (e.g., similarity, diversity, or uncertainty) and select demonstrations independently, which may misalign with downstream performance and overlook set-level composition effects.
    Therefore, we propose Performance-Driven Demonstration Selection (PDDS), which directly aligns demonstration selection with ICL performance. PDDS formulates selection as predicting the target LLM’s downstream task performance for a given query–in-context pair, replacing proxy heuristics with a performance-aware objective. Unlike prior approaches that rank demonstrations independently before composing the prompt, PDDS evaluates the entire in-context set as a whole, capturing inter-demonstration interactions and composition effects.
    PDDS trains an end-to-end scorer using supervision from the target LLM’s actual task outcomes. At inference time, it selects high-scoring in-context sets without additional target LLM calls or validation-time feedback.
    Across 6 NLP tasks, 10 datasets, and 8 LLMs (1B–70B, including GPT-4o), PDDS achieves state-of-the-art results and generalizes well across LLMs, datasets, and tasks.
    PDDS remains effective with as few as 30 training instances, scales to candidate pools of up to 100 demonstrations, and can be integrated as a plug-and-play component to enhance existing ICL methods.
    Natural Language ProcessingApplicationsNatural Language ProcessingLanguage generationNatural Language ProcessingLanguage modelsNatural Language ProcessingMachine translation and multilinguality
  562. #6143

    CDS4RAG: Cyclic Dual-Sequential Hyperparameter Optimization for RAG

    Pengzhou Chen, Tao Chen
    Retrieval-Augmented Generation (RAG) is sensitive to the many hyperparameters of the retriever and generator, yet optimizing them using given queries remains challenging due to complex hyperparameter interactions and expensive evaluation costs. Existing algorithms are ineffective and slow to converge, since they often treat RAG as a monolithic black box or optimize only a subset of the hyperparameters. In this paper, we propose CDS4RAG, a framework that optimizes the full set of RAG hyperparameters using given queries via a new formulation as a cyclic dual-sequential problem. CDS4RAG distinguishes the hyperparameters of the retriever from those of the generator, optimizing them in turn and in a cyclic manner. Such a paradigm allows us to design fine-grained within-cycle budget provision and expedite the optimization via cross-cycle seeding when optimizing the generator. Importantly, CDS4RAG is an algorithm-agnostic framework that can be paired with diverse general algorithms. Through experiments on four common benchmarks and two backbone LLMs, we show that CDS4RAG considerably boosts the vanilla algorithms in 21/24 cases while significantly outperforming state-of-the-art algorithms in all cases, with up to 1.54× improvement in generation quality and better speedup.
    Machine LearningAutomated machine learningMachine LearningHyperparameter optimizationMachine LearningOptimizationMultidisciplinary Topics and ApplicationsSoftware engineeringSearchSearch and machine learning
  563. #6149

    GRASP: Hard-Label Black-Box Malware Evasion with Higher Success, Fewer Queries, and Smaller Perturbations

    Yutong Liu, Jianting Ning, Qi Feng, Yanjun Zhang, Yujin Huang, Leo Yu Zhang
    Machine learning (ML)-based malware detectors are widely deployed but remain vulnerable to adversarial attacks. However, under hard-label black-box access, existing adversarial attacks on Windows Portable Executable (PE) malware are often query-inefficient and incur large file-size inflation. A common paradigm is to predefine a set of semantics-preserving atomic perturbations and search for evasive combinations under only binary feedback. Among these atomic perturbations, (i) those with highly combinatorial search spaces are difficult to explore effectively under hard-label feedback, leaving their potential untapped, and (ii) those relying on transplanting benign fragments are often laden with evasion-irrelevant bytes and exhibit highly variable adversarial utility. We propose Gradient-seeded Reinforcement Learning And Stealthy Pruning (GRASP), a three-stage framework that tackles these challenges. First, we decouple perturbations with highly combinatorial search spaces from the query-based search and instead apply a gradient-seeded warm-up that uses Gumbel-Softmax relaxation to enable gradient-based updates over the discrete space. This yields a strong warm start that improves evasion and reduces queries in later stages. Second, Reinforcement Learning (RL)-based refinement is accelerated by a perturbation library that filters, caches, and reuses compact high-utility patterns, reducing wasted queries on low-utility benign fragments. Third, a perturbation minimization stage removes redundant bytes while preserving evasion, reducing size inflation and feeding compact patterns back to the library. Experiments show that GRASP outperforms baselines, achieving higher attack success with fewer queries and smaller file-size inflation. We additionally demonstrate its practical effectiveness against commercial Antivirus engines.
    Machine LearningAdversarial machine learningMachine LearningApplicationsMachine LearningMulti-armed banditsMachine LearningRobustnessMachine LearningTrustworthy machine learning
  564. #6150

    Injecting Qualitative Spatial Reasoning into World Models via Logic Tensor Networks

    Ajay Ramachandra, Gesina Schwalbe, Zoe Falomir
    World models (WMs) learn predictive representations from high-dimensional sensory input. However, their internal knowledge is implicit and often inadequate for systematic reasoning about object structure. Qualitative Spatial Reasoning (QSR) models provide explicit human-readable object descriptions under transformations. This paper shows that these two forms of modeling can be combined by injecting a QSR model into a WM to improve the WM's reasoning abilities. Using a cube rotation setting inspired by the cube comparison test used in mental rotation tasks as a case study, we use a QSR model that captures the geometric relationships and transformation rules describing how features move across faces under discrete rotations. We translate this QSR model into differentiable constraints using Logic Tensor Networks (LTNs) to guide gradient-based learning. These constraints are injected into the WM loss function to regularize its latent dynamics to satisfy the spatial constraints of rotation. The results show that under an extremely data-scarce regime (10%), the baseline model's performance collapses to near zero, whereas our approach achieves more than 90% better performance relative to the baseline. Our method consistently maintains high scores across all data regimes (~80%), for both in-distribution and out-of-distribution observations. This paper proposes a novel framework to infuse QSR knowledge into WMs so that the learned latent dynamics are logical. Code is available at: https://github.com/ajayrao80/LtnDreamer/
    Knowledge Representation and ReasoningQualitative, geometric, spatial, and temporal reasoningMachine LearningRepresentation learningKnowledge Representation and ReasoningReasoning about actionsMachine LearningNeuro-symbolic methods/Abductive LearningMachine LearningGenerative models
  565. #6164

    On the Impact of Crossover in Many-Objective Optimization: A Runtime Analysis of NSGA-III

    Andre Opris
    In recent years, theoretical understanding has rapidly advanced regarding how popular multi-objective evolutionary algorithms (MOEAs) can optimize many-objective problems. However, the benefits of using crossover in many-objective optimization are not theoretically understood, except for specifically designed benchmark functions tuned to particular crossover operators, and theory still lags significantly behind practical use. In this paper, we build upon this line of research and present a theoretical runtime analysis of the widely used NSGA-III algorithm on the classical m-objective m-OneJumpZeroJump function (m-OJZJ for short). Our results demonstrate that NSGA-III with crossover optimizes m-OJZJ asymptotically faster than NSGA-III without crossover for any number m of objectives across large parameter regimes. We complement our analysis by providing a lower runtime bound on 4-OJZJ when crossover is turned off.
    SearchEvolutionary computationSearchHeuristic search
  566. #6172

    FedGLoRA: Grassmann-Manifold Federated Learning via Dual LoRA for Large EEG Models

    Qianyu Chen, Yihao Zhong, Runxuan Tang, Tianyi Zhang, Jing Liu, Ziyu Jia, Chenyu Liu
    Large EEG Models (LEMs) are drawing increasing attention in EEG, as large-scale pretraining yields transferable representations that improve generalization. As EEG research moves to real-world deployment, objectives and paradigms diversify, yielding increasingly heterogeneous and unevenly scaled datasets across institutions. This evolution induces distribution drift and scale imbalance, making static pretrained LEMs brittle and requiring adaptation to newly collected data. Under the constraints of the high cost of retraining and the privacy sensitivity of newly collected data, federated learning offers a practical route to update LEMs via decentralized parameter updates. Yet standard federated fine-tuning is unstable for pretrained LEMs, suffering backbone drift, misaligned updates under non-IID heterogeneity, and scale imbalance. We propose FedGLoRA: we freeze the backbone and communicate only LoRA adapters via a dual-branch design, with a shared global branch and a private on-device branch. FedGLoRA aggregates global updates on a Grassmann subspace with a Grassmann-proximal constraint, and uses bidirectional distillation to mitigate imbalance. Experiments on motor imagery and emotion recognition show consistent gains and improved stability over strong baselines.
    Humans and AIApplicationsHumans and AIBrain sciencesMultidisciplinary Topics and ApplicationsHealth and medicine
  567. #6175

    Subject-Agnostic Cross-View Referring Expression Comprehension via Reward-Driven Consistency Learning

    Liuwu Li, Ronger Ding, Zehang Lin, Qi Peng, Jiayuan Xie
    Cross-view Referring Expression Comprehension (REC) requires a model to accurately localize objects in a spatial representation of one perspective based on descriptions generated from a different perspective. Essentially, it demands maintaining referential consistency under perspective shifts. This problem is prevalent in engineering application scenarios (e.g., Geographic Information Systems (GIS) and urban planning), where descriptions are typically generated from a top-down or planning view, while targets must be identified in a spatial representation distinct from the source. However, existing methods often assume shared perspectives or rely on an explicit observer to interpret observer-centric spatial relations (e.g., "left/right" of the observer). Consequently, they struggle to handle cross-view grounding cases commonly arising in engineering application scenarios, where descriptions are defined by the source-view representation itself and do not involve any observing subject. To this end, we propose a novel Subject-Agnostic Cross-View REC task and construct a controlled simulation environment to systematically study referential consistency under perspective changes. Methodologically, we propose a Reward-Driven Consistency Learning framework. By imposing mutually constraining reward signals on the grounding and localization results between the source and target views, we guide the model to maintain referential consistency across different view representations. Experimental results show that our method significantly outperforms baselines in multiple cross-view settings, exhibiting superior generalization and robustness under perspective shifts.
    Computer VisionMultimodal learningComputer VisionVision, language and reasoningMachine LearningMulti-view learning
  568. #6180

    Toward Preference-aligned Large Language Models via Residual-based Model Steering

    Lucio La Cava, Andrea Tagarelli
    Preference alignment is a critical step in making Large Language Models (LLMs) useful and aligned with (human) preferences. Existing approaches such as Reinforcement Learning from Human Feedback or Direct Preference Optimization typically require curated data and expensive optimization over billions of parameters, and eventually lead to persistent task-specific models. In this work, we introduce Preference alignment of Large Language Models via Residual Steering (PaLRS), a training-free method that exploits preference signals encoded in the residual streams of LLMs. From as few as one hundred preference pairs, PaLRS extracts lightweight, plug-and-play steering vectors that can be applied at inference time to push models toward preferred behaviors. We evaluate PaLRS on various small-to-medium-scale open-source LLMs, showing that PaLRS-aligned models achieve consistent gains on mathematical reasoning and code generation benchmarks while preserving baseline general-purpose performance. Moreover, when compared to models aligned with DPO and SimPO, they perform better with great time savings. Our findings highlight that PaLRS offers an effective, much more efficient and flexible alternative to standard preference optimization pipelines, offering a training-free, plug-and-play mechanism for alignment with minimal data.
    Natural Language ProcessingInterpretability and analysis of models for NLPNatural Language ProcessingLanguage modelsNatural Language ProcessingOther
  569. #6187

    Constraint-Data-Value-Maximization: Utilizing Data Attribution for Effective Data Pruning in Low-Data Environments

    Danilo Brajovic, David A. Kreplin, Marco F. Huber
    Attributing model behavior to training data is an evolving research field.
    A common benchmark is data removal, which involves eliminating data instances with either low or high values, then assessing a model's performance trained on the modified dataset. Many existing studies leverage Shapley-based data values for this task. In this paper, we demonstrate that these data values are not optimally suited for pruning low-value data when only a limited amount of data remains. To address this limitation, we introduce the Constraint-Data-Value-Maximization (CDVM) approach, which effectively utilizes data attributions for pruning in low-data scenarios. By casting pruning as a constrained optimization that both maximizes total influence and penalizes excessive per-test contributions, CDVM delivers robust performance when only a small fraction of the data is retained. On the OpenDataVal benchmark, CDVM shows strong performance and competitive runtime.
    Data MiningAnomaly/outlier detectionData MiningBig data and scalabilityMachine LearningActive learningMachine LearningExplainable/Interpretable machine learningMachine LearningOther
  570. #6228

    Missingness-MDPs: Bridging the Theory of Missing Data and POMDPs

    Joshua Wendland, Markel Zubia, Roman Andriushchenko, Maris F. L. Galesloot, Milan Češka, Henrik von Kleist, Thiago D. Simão, Maximilian Weininger, Nils Jansen
    We introduce missingness-MDPs (miss-MDPs), a novel subclass of partially observable Markov decision processes (POMDPs) that incorporates the theory of missing data.
    A miss-MDP is a POMDP whose observation function is a missingness function, specifying the probability that individual state features are missing (i.e., unobserved) at a time step.
    The literature distinguishes three canonical missingness types: (1) missing completely at random (MCAR), (2) missing at random (MAR), and (3) missing not at random (MNAR).
    The problem is to compute near-optimal policies for a miss-MDP with an unknown missingness function, given a dataset of action-observation histories.
    Achieving such optimality guarantees for policies requires learning the missingness function from data, which is infeasible for general POMDPs.
    To overcome this challenge, we exploit the structural properties of different missingness types to derive probably approximately correct (PAC) algorithms for learning the missingness function.
    These algorithms yield an approximate but fully specified miss-MDP that we solve using off-the-shelf planning methods.
    We prove that, with high probability, the resulting policies are ε-optimal in the true miss-MDP.
    Empirical results confirm the theory and demonstrate superior performance of our approach over two model-free methods.
    Planning and SchedulingPOMDPsPlanning and SchedulingLearning in planning and schedulingMachine LearningPartially observable reinforcement learning and POMDPsKnowledge Representation and ReasoningCausality
  571. #6256

    “The Whole Is Greater than the Sum of Its Parts”: A Compatibility-Aware Multi-Teacher CoT Distillation Framework

    Jin Cui, Jiaqi Guo, Ruixuan Yang, Jiayi Lu, Jiepeng Zhou, Jiajun Xu, Jiangcheng Song, Boran Zhao, Pengju Ren
    Chain-of-Thought (CoT) reasoning empowers Large Language Models (LLMs) with remarkable capabilities but typically requires prohibitive parameter scales. CoT distillation has emerged as a promising paradigm to transfer reasoning prowess into compact Student Models (SLMs), but existing approaches often rely on a solitary teacher, capping the student’s potential since individual LLMs often exhibit distinct capability biases and may suffer from catastrophic forgetting. While leveraging diverse teachers seems appealing, effectively fusing their supervision remains challenging: teacher-student incompatibility risks amplifying hallucinations, and passive supervision fails to ensure genuine logic internalization. To address this, we introduce COMPACT, a framework that adaptively fuses supervision from different teachers by dynamically weighting teacher gradients based on the student’s real-time compatibility, evaluated by a multi-dimensional metric: (1) Graph-based Consensus to filter misleading rationales by identifying mainstream reasoning paths; (2) Mutual-Information-based Adaptability to detect "epiphany moments" for genuinely understanding the reasoning process rather than merely imitating; and (3) Loss-based Difficulty to assess student receptivity to the teacher's guidance and prevent negative transfer. Extensive experiments and latent space analysis demonstrate that COMPACT effectively integrates diverse reasoning capabilities without damaging the model's original knowledge structure, achieving state-of-the-art performance on various benchmarks while mitigating catastrophic forgetting.
    Natural Language ProcessingLanguage modelsNatural Language ProcessingInterpretability and analysis of models for NLPNatural Language ProcessingOther
  572. #6260

    ASP-Based Probabilistic Policy Fixing for Norm Compliant RL

    Sebastian Adam, Thomas Eiter
    Reinforcement learning (RL) is commonly used to learn reward-optimizing policies. However, RL policies are not always trained with ethical behavior in mind, which can lead an agent to violate social or legal norms in pursuit of its goal. Retraining agents with additional norms is not always feasible, especially in complex stochastic environments. To mitigate this issue, we present a probabilistic policy fixing framework that adapts norm-agnostic policies online. Using Answer Set Programming (ASP), we generate policy fixes that minimize deviations from the RL policy while optimizing for norm adherence against a set of sampled worlds. Based on the Rule of Three and Hoeffding's inequality, we provide guarantees that fixed policies are near optimal, given a specified level of confidence.
    Knowledge Representation and ReasoningLogic programmingKnowledge Representation and ReasoningReasoning about actions
  573. #6303

    Beyond the Mean: Gaussian Distributional Successor Features for Zero-Shot Non-Linear Reward Adaptation

    Yong Zhao, Wen Sun, Jianhua He, Peng Wang, Qubeijian Wang
    Zero-shot transfer for offline reinforcement learning involves generalizing to a wide range of tasks that are often risk-sensitive, without new interaction. We show here that although Successor Features (SFs) provide a principled framework for transfer through abstracting dynamics from rewards, their typical form is restricted to expected cumulative returns and is thus indifferent to variance in returns. We contend that such a mean-centric approach introduces bias by conflating risky and safe trajectories with identical expected outcomes, as it ignores outcome dispersion critical for risk-aware evaluation. Additionally, we note that distributional methods can be employed to model outcome distributions, but frequently do so at a high computational cost due to the need for discretization or elaborate sampling. To overcome this problem, we introduce a form of successor representation that incorporates an uncertainty envelope over states: Gaussian Distributional Successor Features (GDSF). Rather than inferring an implicit generative process, GDSF learns to approximate the distribution of cumulative outcomes with a Gaussian through recursive moment-matching. We present this framework for efficient representation of aleatoric uncertainty, without resorting to computationally expensive existing distributional methods. We also introduce Latent Outcome Optimization, an inference procedure which finds high-reward outcomes in the data support and drives a goal-conditioned policy. This dual-mode design allows the agent to transfer to non-linear objectives in a zero-shot way while preserving closed-form efficiency for linear tasks. Experiments on the D4RL benchmark show that GDSF yields better performance in risk-sensitive tasks while achieving at least competitive results in standard linear regimes.
    Machine LearningModel-based and model learning reinforcement learningMachine LearningMulti-task and transfer learningMachine LearningOffline reinforcement learningMachine LearningReinforcement learningPlanning and SchedulingPlanning under uncertainty
  574. #6311

    Towards Generalized Action Recognition on Low-Resolutions with Domain-Invariant Representation

    Hao Li, Jinhui Xu, Dianlong You
    This paper studies cross-domain/species action recognition from low-resolution videos with sharp appearance variations and domain-specific biases. Its central idea is to mine domain-invariant representations of cross-domain/species features under coarse spatial details to enhance recognition and generalization, overcoming the significant performance decline of existing methods. To this end, we propose a generalized Action recognition framework for Low-resolution conditions with Domain-invariant Representation learning, named ActLDR. First, it decomposes video understanding into spatial and temporal pathways to explicitly separate domain-dependent appearance cues from robust motion dynamics; second, it constructs a Spatial-Temporal Feature Exchange module to enable cross-branch refinement and suppress domain bias; third, it injects Gaussian feature interference to simulate feature corruption and enforces prediction-level consistency to encourage stable representations. Empirical results demonstrate that our proposal outperforms previous methods, significantly improving robustness across resolutions, domains, and species, and demonstrating outstanding generalization and transferability.
    Computer VisionAction and behavior recognitionComputer VisionRepresentation learningComputer VisionVideo analysis and understanding
  575. #6315

    When Incompleteness Does Not Matter: The Case of Incomplete Abstract Argumentation Framework (Under the Possible Perspective)

    Bettina Fazzinga, Sergio Flesca, Filippo Furfaro
    Incomplete Abstract Argumentation Frameworks (iAAFs) extend Abstract Argumentation Frameworks (AAFs) by allowing arguments and attacks to be specified as uncertain, enabling a compact representation of alternative argumentation scenarios.
    Despite extensive work on reasoning with iAAFs, their expressive power relative to standard AAFs is still unclear.
    We address this question by studying whether the reasoning based on possible extensions in iAAFs can be reduced to classical reasoning on AAFs.
    Our analysis relates iAAFs and AAFs via two comparison notions: equivalence, where an AAF yields as extensions exactly the possible extensions of an iAAF, and projection-equivalence, where such extensions are obtained by projecting away auxiliary arguments.
    We characterize the semantics under which these relationships hold, and, in these cases, provide constructive transformations from iAAFs to AAFs, thus also enabling classical argumentation tools to be applied to qualitative-uncertainty reasoning.
    Knowledge Representation and ReasoningArgumentationUncertainty in AIUncertainty representations
  576. #6319

    Connected EF1 Allocations Exist in Discrete Chore Cutting

    Ankang Sun, Bo Li
    In this paper, we prove the existence of an envy-free up to one item (EF1) division for a discrete chore. Our approach builds on the powerful framework of Simmons-Su [Am. Math. Mon. 1999], which leverages Sperner's lemma to guarantee the existence of a simplex corresponding to a sequence of similar fractional divisions, ensuring that each agent is satisfied with a different bundle. Bilò et al. [ITCS 2019, Games Econ. Behav. 2022] introduced a rounding technique that converts the fractional divisions into a connected integral EF1 division for goods when there are at most four agents, and this method was later extended by Igarashi [AAAI 2023] to accommodate any number of agents. However, the analogous problem for chores has remained unresolved, and existing rounding techniques fail due to the asymmetric definitions of EF1 for goods and chores. To overcome this asymmetry, we refine the existing rounding techniques and show that connected EF1 divisions exist for a discrete chore.
    Game Theory and Economic ParadigmsFair divisionGame Theory and Economic ParadigmsComputational social choice
  577. #6375

    Large Lemma Miners: Can LLMs Do Induction Proofs for Hardware?

    Romy Peled, Daniel Kroening, Michael Tautschnig, Yakir Vizel
    Large Language Models (LLMs) have shown potential for solving mathematical tasks.
    We show that LLMs can be utilized to generate proofs by induction for hardware verification and thereby replace some of the manual work done by Formal Verification engineers and deliver value to industry.
    We present a neurosymbolic approach that includes two prompting frameworks to generate candidate invariants, which are checked using a formal symbolic tool.
    Our results indicate that with sufficient reprompting, LLMs are able to generate inductive arguments for mid-size open-source RTL designs.
    For 90% of our problem set, at least one of the prompt setups succeeded in producing a provably correct inductive argument.
    Knowledge Representation and ReasoningAutomated reasoning and theorem provingNatural Language ProcessingApplicationsNatural Language ProcessingLanguage models
  578. #6377

    Tight Asymptotic Bounds for Fair Division with Externalities

    Frank Connor, Max Dupré la Tour, Vishnu V. Narayan, Šimon Schierreich
    We study the problem of allocating a set of indivisible items among agents whose preferences include externalities. Unlike the standard fair division model, agents may derive positive or negative utility not only from items allocated directly to them, but also from items allocated to other agents. Since exact envy-freeness cannot be guaranteed, prior work has focused on its relaxations. However, two central questions remained open: does there always exist an allocation that is envy-free up to one item (EF1), and if not, what is the optimal relaxation EF-k that can always be attained?

    We settle both questions by deriving tight asymptotic bounds on the number of items sufficient to eliminate envy. We show that for any instance with n agents, an allocation that is envy-free up to O(√n) items always exists and can be found in polynomial time. Additionally, via a reduction from fair division with externalities to discrepancy theory combined with recent discrepancy lower bounds, we prove a matching Ω(√n) lower bound showing that this result is tight even when the valuations are binary and satisfy the no-chores condition, which refutes a conjecture of [Deligkas et al., AAAI '24] and resolves the open question of [Aziz et al., AAAI '23], ruling out the existence of EF1 allocations when agents have externalities.
    Game Theory and Economic ParadigmsFair division
  579. #6383

    Semifactual Explanations for GNN-based Classification: Formal Foundations, Complexity and Computation

    Gianvincenzo Alfano, Sergio Greco, Domenico Mandaglio, Francesco Parisi, Reza Shahbazian, Irina Trubitsyna
    Graph Neural Networks (GNNs) have become increasingly central to several analysis tasks in various domains, ranging, e.g., from social networks to molecular analysis. Understanding the decision-making process of GNNs is a critical challenge in machine learning, particularly when post-hoc explanations, attempting to answer why specific inputs are classified in a certain way by a given model, are required for specific input predictions. Existing post-hoc explainability methods for GNNs often rely on counterfactual reasoning, grounded in the 'if this had not occurred' thinking, that typically identifies minimal subgraph perturbations that would change a prediction. Notably, less attention has been paid to the equally important semifactual reasoning, grounded in the 'even if this has not occurred' thinking, that results in identifying maximal subgraph perturbations that would not change a prediction. In this paper, we introduce a novel technique for post-hoc explainability queries in GNNs by focusing on semifactual reasoning. We first thoroughly investigate the computational complexity of several reasoning and optimization problems related to the computation of semifactuals, and then propose a novel learning architecture for addressing their computation. Finally, we experimentally evaluate the proposed approach in a variety of settings, showing its effectiveness in generating high-quality semifactual explanations.
    Machine LearningExplainable/Interpretable machine learning
  580. #6397

    ExLipBaB: Exact Lipschitz Constant Computation for Piecewise Linear Neural Networks

    Tom Splittgerber
    It has been shown that a neural network's Lipschitz constant can be leveraged to derive robustness guarantees, to improve generalizability via regularization or even to construct invertible networks. Therefore, a number of methods varying in the tightness of their bounds and their computational cost have been developed to approximate the Lipschitz constant for different classes of networks. However, comparatively little research exists on methods for exact computation, which has been shown to be NP-hard. Nonetheless, there are applications where one might readily accept the computational cost of an exact method. These applications could include the benchmarking of new methods or the computation of robustness guarantees for small models on sensitive data. Unfortunately, existing exact algorithms restrict themselves to only ReLU-activated networks, which are known to come with severe downsides in the context of Lipschitz-constrained networks. We therefore propose a generalization of the LipBaB algorithm to compute exact Lipschitz constants for arbitrary continuous piecewise linear neural networks and p-norms. With our method, networks may contain traditional activations like ReLU or LeakyReLU, activations like GroupSort or the related MaxMin and FullSort, which have been of increasing interest in the context of Lipschitz-constrained networks, or other piecewise linear functions like MaxPool.
    AI Ethics, Trust, FairnessSafety and robustnessMachine LearningAdversarial machine learningMachine LearningRobustness
  581. #6407

    SARA: Semantic-Anchored Referential Alignment for Ego–Exo Instance Correspondence

    Yueyang Ge, Ye Lin, Bojun Yang, Yilin Huang, Yi Guo
    Achieving visual coordination between egocentric (ego) and exocentric (exo) perspectives is a cornerstone of augmented reality and human-machine collaboration. However, extreme viewpoint shifts and occlusions often cause semantic drift in existing methods, making robust cross-view correspondence a formidable challenge. In this paper, we propose SARA (Semantic-Anchored Referential Alignment), a framework designed to bridge the ego-exo gap by leveraging viewpoint-invariant, high-level semantic information as region-level semantic anchors for robust feature mapping. Specifically, our approach incorporates (1) a Semantic Anchoring Module that extracts cross-view category priors and invariant features to provide stable region-level references across perspectives, and (2) a Multimodal Semantic Vector Fusion mechanism that achieves language-guided manifold fusion of semantic vectors to synthesize unified embeddings, effectively establishing representational correspondence across disparate perspectives. Extensive experiments across diverse, complex scenarios demonstrate SARA’s superior performance in establishing stable mappings, validating the effectiveness of semantic-invariant features and multimodal fusion in addressing the core hurdles of cross-view understanding.
    Computer VisionMultimodal learningComputer VisionVision, language and reasoningComputer VisionImage and video retrieval
  582. #6424

    Agreement, Diversity, and Polarization Indices for Approval Elections

    Piotr Faliszewski, Jitka Mertlová, Krzysztof Sornat, Stanisław Szufa, Tomasz Wąs
    An index is a function that measures the extent to which an election has a particular feature. We seek indices that capture agreement, diversity, and polarization among voters in approval elections, normalized with respect to saturation. The latter means that if two elections differ by the fraction of candidates approved by an average voter, but otherwise are of similar nature, then they should have similar index values. We propose several indices, analyze their properties, and use them to derive a new map of approval elections, and compare various real-life elections from Pabulib, Preflib and other sources.
    Game Theory and Economic ParadigmsComputational social choice
  583. #6426

    COREKG: Coreset-Guided Personalized Summarization of Knowledge Graphs

    Sohel Aman Khan, Raghava Mutharaju, Supratim Shit
    Knowledge Graphs (KGs) are extensively used across different domains and in several applications. Often, these KGs are very large in size. Such KGs become unwieldy for tasks such as question answering and visualization. Summarization of KGs offers a viable alternative in such cases. Furthermore, personalized KG summarization is crucial in the current data-driven world as it captures the specific requirements of users based on their query patterns. Since it only maintains relevant information, the personalized summaries of KG are small, resulting in significantly smaller storage requirements and query runtime. In this work, we adapt the coreset theory to create personalized KG summaries. For a given dataset and a user-specific query workload, we present an approach that samples a relevant subset of triples using sensitivity-based importance sampling. We ensure that the subset approximates the characteristics of the full dataset with bounded approximation error. We define sensitivity scores that measure the importance of a triple with respect to a user’s query workload, which are then used by our coreset construction algorithm. We explicitly focus on personalized knowledge graph summarization by constructing summaries independently for each user based on their query behaviour.
    Our evaluation on Freebase, WikiData, and DBpedia shows that COREKG delivers higher query-answering accuracy and structural coverage than state-of-the-art methods such as GLIMPSE, PPR, PEGASUS, and APEX², while requiring only a tiny fraction of the original graph.
    Data MiningInformation retrievalKnowledge Representation and ReasoningSemantic WebNatural Language ProcessingInformation extractionNatural Language ProcessingQuestion answeringNatural Language ProcessingSummarization
  584. #6453

    Budget-Aware LLM Quantization and Low-Rank Correction via Information-Guided Subspace Matrices

    Sinuo Fan, Yingjie Lao
    Compression techniques such as quantization and low-rank approximation enable large language models (LLMs) to run on current edge hardware with limited computing power, but the key challenge lies in balancing the allocation of precision and low rank within a fixed memory limit. We propose a training-free framework, BALANCER, which achieves global budget allocation for mixed-precision quantization and low-rank correction through information-guided subspace matrices. BALANCER collects a small set of calibration signals—activation statistics, reference gradient magnitudes, and a diagonal curvature proxy—and converts them into information-guided subspace matrices. Instead of relying on coarse-grained layer-wise importance allocation, BALANCER transforms these signals into gain curves for each weight in the matrix singular subspace, allowing a principled greedy allocator to distribute compression bits and ranks across the entire model. On the LLaMA-3.1-8B model, with an average of 3.6 bits per parameter, BALANCER only increases the perplexity from 6.63 to 6.78 (+0.15), while reducing memory from 16 GB to 3.9 GB, and enabling the model to be deployed on edge GPUs such as the Jetson Orin Nano (8 GB).
    Machine LearningApplicationsMachine LearningLearning sparse models
  585. #6478

    One for Exploration and Another for Exploitation: A Dual-Population MOEA Framework with Provable Benefits

    Chenglin Jiang, Shengjie Ren, Zimin Liang, Miqing Li, Chao Qian
    Evolutionary Algorithms (EAs) are currently the most popular tool for solving multi-objective optimization problems. Balancing exploration and exploitation is fundamental to the performance of Multi-Objective EAs (MOEAs). Achieving this requires maintaining a set of high-quality solutions for effective exploitation while simultaneously enabling exploration of diverse regions of the search space. In MOEAs, these two roles are typically fulfilled by a single evolutionary population. However, high-quality solutions that are promising for exploitation may lack the potential to guide the search toward different regions of the search space. Conversely, solutions with high exploratory potential that can lead the search toward more promising regions may themselves be of low quality. To address this issue, this paper proposes a dual-population MOEA with Exploration and Exploitation Decoupled (MOEA/EED): the exploration population uses a simple aging mechanism, while the exploitation population preserves the currently optimal solutions. We theoretically prove the benefits of MOEA/EED, i.e., it achieves the upper bounds, O(n^(k+1) · min{1, (e ln(3e) / k)^k}) and O((5e ln(3e))^(n/5)), on the expected running time for solving two commonly studied bi-objective problems OneJumpZeroJump and bi-objective RealRoyalRoad, which are significantly better (even exponentially smaller) than that of the popular SMS-EMOA, NSGA-II, and their non-elitist variants. Lastly, we empirically show its effectiveness on a wide range of practical optimization problems.
    SearchEvolutionary computation
  586. #6516

    Merging Beliefs and Counterfactuals

    Emiliano Lorini, Dmitry Rozplokhas
    We present a unified framework for modeling agents’ epistemic states and counterfactual conditionals. The novelty of our approach lies at the semantic level. We introduce a computationally grounded semantics in which an agent’s doxastic accessibility relation, needed to model the standard notion of deductively closed belief, is derived from the agent’s belief base, and in which the notion of comparative similarity between states, needed to interpret counterfactual conditionals, is computed from both the atomic formulas describing the agents’ belief bases and those describing the environment. We use this semantics to interpret a language of beliefs and counterfactuals. We demonstrate the expressiveness of our language and the flexibility of the semantics by formalizing a wide range of notions: counterfactual dependence, counterfactual notions of reason and knowledge, conditional belief. Furthermore, we show that our semantics illuminates the subtle interplay between belief change and counterfactuals. On the computational side, we provide a succinct formulation of model checking for our language and establish a PSPACE complexity result.
    Knowledge Representation and ReasoningBelief changeKnowledge Representation and ReasoningKnowledge representation languagesKnowledge Representation and ReasoningReasoning about knowledge and belief
  587. #6519

    Approximate Heuristic Search for Semi-Decentralized Systems

    Mahdi Al-Husseini, Kyle H. Wray, Isaac R. Ward, Mykel J. Kochenderfer
    Achieving optimal coordination in multiagent systems involves a trade-off between intractable centralized planning and suboptimal decentralized execution. We bridge this gap by introducing Approximate Recursive Small Step-Semi-Decentralized A* (RS-SDA*), a tree search algorithm that exploits time-varying centralization during periods of available communication conditioned on the environment or joint actions. By interleaving offline planning with online search and using relaxed heuristics, RS-SDA* achieves high solution quality with reduced computational overhead. SDec-POMDP benchmark experiments show that Approximate RS-SDA* finds near exact optimal solutions in less than 1% of the time required by exact algorithms. We show scalability in six labyrinth environments with both deterministic and stochastic state transitions and demonstrate real-world feasibility with a multi-drone search-and-rescue simulation in the DARPA Subterranean Challenge cave system.
    Agent-based and Multi-agent SystemsAgent communicationAgent-based and Multi-agent SystemsAgent-based simulation and emergenceAgent-based and Multi-agent SystemsApplicationsAgent-based and Multi-agent SystemsCoordination and cooperationAgent-based and Multi-agent SystemsMulti-agent planning
  588. #6523

    Maximum Satisfiability of Simple Temporal Problems

    Johannes K. Fichte, Johanna Groven, Peter Jonsson, Victor Lagerkvist, Jorke M. de Vlas
    The Simple Temporal Problem (STP) is a core framework for quantitative temporal constraints. As STP data can be inconsistent, we study MaxSTP: compute a maximum-cardinality consistent subset of constraints. This extension is NP-hard, and we analyze its parameterized complexity under measures that capture practically relevant instance features: the number of variables n (instance scale), the maximum coefficient magnitude k (numeric range), and structural parameters of the constraint graph such as treewidth tw (decomposability) and vertex cover size vc (density). We show that MaxSTP is W[1]-hard parameterized by n, implying that n and parameters that depend on n (including tw and vc) are insufficient for fixed-parameter tractability. For combined parameters, we give an O^*(k^n)-time algorithm, yielding single-exponential solvability for fixed k. While k+tw remains W[1]-hard, MaxSTP is in XP via an O^*((n * k)^tw) algorithm. Our results suggest that MaxSTP is often computationally harder than optimizing qualitative CSPs: we verify that many such problems (including RCC-8 and Allen's algebra) are FPT when parameterized by n or tw. However, we also demonstrate that FPT algorithms for MaxSTP are possible with other parameters, such as k + vc.
    Constraint Satisfaction and OptimizationConstraint optimization problemsConstraint Satisfaction and OptimizationConstraint satisfaction
  589. #6582

    Price of Fairness in Short-Term and Long-Term Algorithmic Selections

    Shahin Jabbari, Chen Wang
    Algorithmic decision-making in high-stakes settings can have profound impacts on individuals and populations. While much prior work studies fairness in static settings, recent results show that enforcing static fairness constraints may exacerbate long-run disparities. Motivated by this tension, we study a stylized sequential selection problem in which a decision-maker repeatedly selects individuals, affecting both immediate utility and the population distribution over time. We introduce notions of group fairness for both the short and long term and theoretically analyze the trade-off between fairness and utility via the Price of Fairness (PoF). We characterize optimal and fair policies in the short term and show that the PoF can be large even when group distributions are nearly identical. In contrast, we show that long-term disparities can vanish under simple investment policies that achieve a low PoF. We also empirically validate these theoretical observations using both synthetic and real datasets.
    AI Ethics, Trust, FairnessFairness and diversityAI Ethics, Trust, FairnessSocietal impact of AIConstraint Satisfaction and OptimizationConstraint optimization problemsPlanning and SchedulingTheoretical foundations of planning
  590. #6606

    The Complexity of Verifying Feedforward Neural Networks in Quantised Settings

    Eric Alsmann, Martin Lange, Marco Sälzer
    We investigate the computational complexity of neural network verification in quantised settings. We distinguish three classes of Feedforward Neural Networks (FNNs): rational FNNs with exact rational weights, quantised FNNs whose weights come from a finite-width arithmetic, and dynamically quantised FNNs in which rational networks are evaluated with respect to a given finite-width arithmetic. We consider two types of specifications used in the literature. Linear programming (LP) specifications are conjunctions of linear constraints, while bit-vector (BV) specifications allow reasoning at the bit level and can express non-linear constraints. Our results give a complexity landscape of these verification problems. For quantised FNNs with fixed arithmetic precision, we show that verification under both LP and BV specifications remains NP-complete, matching the complexity of the rational case. For dynamically quantised FNNs with BV specifications, we establish upper bounds, complementing a previously known PSPACE-hardness result.
    Machine LearningExplainable/Interpretable machine learningMachine LearningRobustnessMachine LearningTrustworthy machine learning
  591. #6642

    Social Welfare Under Heterogeneous Time Preferences

    Sarvin Bahmani, Soumyajit Paul, Sven Schewe, Shadi Tasdighi Kalat, Ashutosh Trivedi
    In several socioeconomic-critical decision-making settings, such as fair resource allocation, climate policy, or AI alignment, multiple principals interact within a common arena. While it is well established that these principals may have differing preferences, decision-making under heterogeneous time preferences remains relatively unexplored. In particular, principals may weigh future outcomes differently and may derive distinct utilities from the same decisions. Motivated by such scenarios, we introduce the notion of heterogeneous time preferences in MDPs, where multiple principals possess distinct reward functions and apply different discount factors to future rewards. To compute meaningful decisions in such settings, an AI agent must rely on a notion of optimality that accounts for the preferences of all principals.


    We adopt a utilitarian notion of social welfare, defined as the sum of utilities accrued to all principals, and study the synthesis of agent strategies that maximise this welfare. Under heterogeneous time preferences, we show that optimal strategies are no longer positional, even when all principals receive identical rewards. Nevertheless, optimal strategies remain structurally simple: they can be realised as pure finite-memory counting strategies, require only polynomial memory in the system size, and can be synthesised in polynomial time. On the other hand, we show that deciding threshold questions for optimal positional strategies is NP-hard, exposing a poor trade-off: insisting on positional simplicity neither makes synthesis tractable nor preserves social welfare.
    Agent-based and Multi-agent SystemsFormal verification, validation and synthesisPlanning and SchedulingMarkov decisions processesPlanning and SchedulingTheoretical foundations of planning
  592. #6674

    Rethinking Thinking Steps in Overthinking: Towards More Effective Reasoning

    Dezhi Zhao, Xin Liu, Xiaocheng Feng, Hui Wang, Bing Qin
    Understanding overthinking in large reasoning models (LRMs) is crucial for interpretability as well as reasoning efficiency and effectiveness. However, existing approaches primarily adopt coarse-grained reasoning strategies, such as truncating Chains-of-Thought or switching reasoning modes, which reduce verbosity but are insufficient to actively guide reasoning toward more effective trajectories.
    To address these issues, we propose a training-free, interpretable framework that selects thinking words via attention heads to guide LRMs toward more effective reasoning. Specifically, we categorize thinking steps into effective and redundant states, identify the attention head that best discriminates between them as the Thinking Partition Head to construct an Effective Thinking Representation Space, and compute the Information Gain Ratio (IGR) between candidate thinking words and this space to select the word that steers reasoning toward a more effective direction.
    Extensive experiments on mathematical and scientific reasoning benchmarks, including AIME24, AMC23, MATH-500, GSM8K, and GPQA-D, show that our method consistently outperforms the strong baseline DEER, achieving average improvements of 1.2–1.3% in accuracy and 2.5–4.5% in compression rate. Compared to vanilla baselines, our approach yields larger gains of 2.6–6.3% in accuracy while reducing token usage by 22–43%.
    Natural Language ProcessingInterpretability and analysis of models for NLP
  593. #6694

    A New Framework for Convex Clustering in Kernel Spaces: Finite Sample Bounds, Consistency and Performance Insights

    Shubhayan Pan, Kushal Bose, Debolina Paul, Saptarshi Chakraborty, Swagatam Das
    Convex clustering is a well-regarded clustering method that resembles the centroid-based approach of Lloyd’s k-means but does not require a predefined cluster count. It starts with each data point as its own centroid and iteratively merges them. Despite its advantages, this method can fail when dealing with data exhibiting linearly non-separable or non-convex structures. To mitigate these limitations, we propose a kernelized extension of the convex clustering method. This approach projects the data points into a Reproducing Kernel Hilbert Space (RKHS) using a feature map, enabling convex clustering in this transformed space. This kernelization not only allows for better handling of complex data distributions but also produces an embedding in a finite-dimensional vector space. We provide comprehensive theoretical underpinnings for our kernelized approach, proving algorithmic convergence and establishing finite sample bounds for our estimates. The effectiveness of our method is demonstrated through extensive experiments on both synthetic and real-world datasets, showing superior performance compared to state-of-the-art clustering techniques. This work marks a significant advancement in the field, offering an effective solution for clustering in non-linear and non-convex data scenarios.
    Machine LearningClusteringMachine LearningLearning theoryMachine LearningOptimizationMachine LearningTheory of deep learningMachine LearningUnsupervised learning
  594. #6712

    Can Quantum Federated Learning Withstand Circuit-Level Backdoors?

    Aakar Mathur, Ruknuddin Mohammed, Ashish Gupta
    Quantum Federated Learning (QFL) inherits the core vulnerability of federated optimization to malicious clients, while also introducing an attack surface from variational circuit training and measurement-driven gradients. This work proposes a novel CircUit-Level backdoor Threat (CULT) model that formalizes four stealthy attacks by exploiting quantum-aware mechanisms, including Grover, Pauli, Bit-flip, and Sign-flip. By enabling malicious clients on both in-training and post-training surfaces, these attacks can critically undermine the learning process. We establish a rigorous theoretical foundation to demonstrate attack stealthiness under standard smoothness assumptions. Experiments on the MNIST and CIFAR-10 datasets with non-IID splits and varying fractions of malicious clients show that even a single malicious client can induce severe accuracy degradation under FedAvg aggregation. While popular defenses, including Krum, Multi-Krum, FoolsGold, FLGuardian, and Mud-HoG, reduce degradation in many regimes, they fail to eliminate worst-case failure cases, where accuracy drops up to 50%. The experimental analysis further reveals that under the CULT model, malicious updates effectively mask their presence by staying close to benign norms, thereby helping attackers evade detection.
    Machine LearningClassificationMachine LearningFederated learning
  595. #6714

    Optimal LTLf Synthesis

    Yujian Cao, Sven Schewe, Qiyi Tang, Shufang Zhu
    Strategy synthesis typically follows an all-or-nothing paradigm, returning unrealisable whenever a specification cannot be guaranteed in an uncertain environment. In this paper, we introduce optimal LTLf synthesis, where the goal is to realise as many objectives as possible from a given specification consisting of multiple objectives, especially for the case that they are not all jointly realisable. We first consider max-guarantee synthesis, which commits to a maximal set of objectives that we can a priori guarantee to realise. We then introduce max-observation synthesis, which maximises a posteriori realised objectives that may be incomparable on different executions. Finally, we present incremental max-observation synthesis, which further improves strategies by exploiting opportunities for stronger guarantees when they arise during an execution. Experimental results show that different variations of optimal synthesis scale broadly equally well, solving a large fraction of the benchmark instances within the given timeout, demonstrating the practical feasibility of the approach.
    Agent-based and Multi-agent SystemsFormal verification, validation and synthesisPlanning and SchedulingPlanning algorithmsPlanning and SchedulingReal-time planningPlanning and SchedulingTheoretical foundations of planning
  596. #6722

    Diverge to Converge: Mutual Heterogeneous Learning for Robust Pruning

    Jinhui Yu, Zikai Zhang, Khaled A. Harras, Yidong Li
    Neural network pruning is crucial for efficient deployment on resource-constrained devices, yet achieving high sparsity often leads to significant robustness degradation against adversarial perturbations and corruptions. Recent works typically rely on single-model fine-tuning along a fixed optimization trajectory, which renders the network susceptible to local optima and noise while failing to restore the multiple robustness properties compromised during compression. In this paper, we propose Mutual Heterogeneous Learning (MHL), a framework enabling robust pruning via single-model inference. MHL instantiates heterogeneity through two complementary mechanisms: layer-wise Lipschitz regularization for intermediate feature smoothness and adaptive margin objective for difficulty-aware boundary separation. To guide these diverse experts to converge, we employ entropy-based mutual distillation with a strategic schedule that shifts the optimization trajectory from exploring diverse feature subspaces to consolidating a unified robust model. Extensive experiments on 4 clean and corruption benchmarks and adversarial attacks demonstrate that MHL significantly outperforms single-model baselines in both adversarial robustness (+5%) and corruption robustness (+2.6%) while maintaining competitive clean accuracy.
    Computer VisionAdversarial learning, adversarial attack and defense methodsMachine LearningDeep learning architecturesMachine LearningRobustness
  597. #6744

    Segment Any 3D-Part in a Scene from a Sentence

    Hongyu Wu, Pengwan Yang
    This paper aims to achieve the segmentation of any 3D part in a scene based on natural language descriptions, extending beyond traditional object-level 3D scene understanding and addressing both data and methodological challenges. Due to the expensive acquisition and annotation burden, existing datasets and methods are predominantly limited to object-level comprehension. To overcome the limitations of data and annotation availability, we introduce the 3D-PU dataset, the first large-scale 3D dataset with dense part annotations, created through an innovative and cost-effective method for constructing synthetic 3D scenes with fine-grained part-level annotations, paving the way for advanced 3D-part scene understanding. On the methodological side, we propose OpenPart3D, a 3D-input-only framework to effectively tackle the challenges of part-level segmentation. Extensive experiments demonstrate the superiority of our approach in open-vocabulary 3D scene understanding tasks at part level, with strong generalization across real-world 3D scene datasets.
    Computer Vision3D computer vision
  598. #6748

    Parameter-Efficient Dual-Loss Adaptation with Logit Divergence: A Unified Approach for Adversarial Example Detection and Robust Inference

    Zirui Fu, Marco Donato
    We present D3Adapter, a threat-aware framework that unifies adversarial example detection (AED) and robust inference. D3Adapter attaches a small library of lightweight adapters to a frozen ResNet backbone; the adapters are trained under two loss functions (cross-entropy and optimized reverse cross-entropy) and complementary objectives (clean training and adversarial training), yielding intentionally distinct logit behaviors. During inference, D3Adapter quantifies inter-adapter logit divergence to produce an agreement-based threat score without any external detector for AED. The same pass also performs robust inference by outputting the prediction from the selected adapter when an adversarial example is detected, thereby unifying detection and defense in a single forward pass. We evaluate D3Adapter under transfer-based and adaptive white-box attacks, and we study scalability across datasets with varying numbers of classes, showing that unified detection and robust inference can be achieved with predictable overhead proportional to the number of adapters.
    Computer VisionAdversarial learning, adversarial attack and defense methodsComputer VisionEfficiency and OptimizationComputer VisionTransfer, low-shot, semi- and un- supervised learningMachine LearningAdversarial machine learningMachine LearningMulti-task and transfer learning
  599. #6760

    Extending Weighted Heuristic Search to Bi-Objective Search Problems

    Hans Kühn Leiva, Jorge A. Baier, Carlos Hernández Ulloa, Oren Salzman, Ariel Felner, Sven Koenig
    In heuristic search, a well-known technique to speed up search while providing a suboptimality guarantee is to multiply the heuristic function by a weight w > 1. In this paper, we study the theoretical and practical implications of using such a technique in bi-objective heuristic search, a natural academic exercise that has remained unexplored. We introduce Weighted BOA*ϵ (WBOA*ϵ), a weighted version of the BOA* algorithm, which uses two real parameters: a weight w for the heuristic and an approximation factor ϵ. Higher values of w and ϵ allow for faster computation of approximate Pareto-optimal solution sets. We prove that WBOA*ϵ returns a representative solution set containing (w − 1, ϵ)-approximate solutions. We empirically compare it to A*pex, the state-of-the-art approximate bi-objective search algorithm. We find that WBOA*ϵ is competitive with A*pex: when using perfect heuristic functions, in road maps WBOA*ϵ is faster for higher approximation factors, while in grid maps WBOA*ϵ dominates A*pex. With imperfect heuristic functions, WBOA*ϵ performs better than A*pex for lower approximation factors, while the opposite is true for larger approximation factors.
    SearchCombinatorial search and optimisationSearchHeuristic search
  600. #6763

    A First Runtime Analysis of Parallel Tempering in Quadratic Optimization

    Tiago Paixão, Jorge Pérez Heredia, Dirk Sudholt, Andrew M. Sutton
    Parallel tempering, or the replica exchange method, is a Markov Chain Monte Carlo (MCMC) sampling technique for finding low-energy states in complex landscapes by executing multiple processes at different temperatures and allowing for states to migrate between parallel processes based on the Metropolis criterion. Despite a growing interest in the technique as a randomized search heuristic, it is not clear when and why parallel tempering is effective. We conduct a first runtime analysis of the parallel tempering algorithm on a simple quadratic unconstrained binary optimization problem that induces a rugged energy landscape with tunable parameters. We prove that a single-temperature Metropolis process fails to efficiently optimize the problem at any temperature. In sharp contrast, a two-state parallel tempering approach using a low and a high temperature typically solves the problem in O(n^2 log n) iterations. The low-temperature process ensures stability, as it is unlikely to accept worsenings, whereas the high-temperature process facilitates exploration by traversing energy barriers. Improvements are discovered through exploration in the high-temperature process and transferred to the low-temperature process, where they are permanently retained. This effective interplay between processes demonstrates the power of parallel tempering and lends credence to its design.
    SearchCombinatorial search and optimisationSearchEvolutionary computationSearchLocal search
  601. #6765

    Ensuring Logic in the Fog: Sound POMDP Synthesis with LTL Objectives

    Can Zhou, Yulong Gao, Pian Yu
    Synthesising autonomous agents that can navigate uncertain environments while adhering to complex temporal constraints remains a fundamental challenge. While Linear Temporal Logic (LTL) provides a rigorous language for specifying such tasks, the inherent undecidability of qualitatively verifying LTL satisfaction in partially observable Markov decision processes renders quantitative synthesis difficult, especially when designing reliable reward signals for approximate solvers. In this paper, we bridge this gap with a novel, sound reward-shaping mechanism that dynamically generates belief-dependent rewards grounded in certified LTL satisfaction. By integrating this mechanism into an enhanced Monte Carlo Planning framework, we empower agents to navigate the 'fog' of partial observability with a search process focused on maximising verifiable success. Our experiments demonstrate that this approach not only thrives in scenarios where existing solvers fail but also maintains effectiveness and scalability across diverse benchmark domains.
    Knowledge Representation and ReasoningReasoning about actionsPlanning and SchedulingPlanning under uncertaintyPlanning and SchedulingPlanning with Incomplete InformationPlanning and SchedulingPOMDPsUncertainty in AISequential decision making
  602. #6771

    Finding Simple Shortest-Paths via Centroids

    Carlos Linares López, Ian Herman
    Finding an arbitrary number of shortest paths in a graph is a fundamental problem in Graph Theory and Artificial Intelligence with numerous applications to real-world problems. While some algorithms produce paths which might contain loops, the most interesting and challenging problem variant consists of finding simple (or loopless) paths. This problem has been addressed with algorithms which always conduct an arbitrary number of Dijkstra searches. While the current state of the art leverages the notion of sidetrack edge, a recent development building on sidetrack edges, centroids, can be used to produce non-simple paths very efficiently. In this paper, centroids are used to compute an arbitrary number of simple paths with some important benefits: the expansion of a single centroid delivers an arbitrary number of paths; only a single Dijkstra search is required to complete the task; the same algorithm can be easily coupled with heuristics that improve search efficiency. Experimental results across various domains show substantial improvements over the current state of the art, both in runtime and memory usage.
    SearchHeuristic search
  603. #6774

    Speaking Numbers to LLMs: Multi-Wavelet Number Embeddings for Time Series Forecasting

    Defu Cao, Zijie Lei, Muyan Weng, Jiao Sun, Yan Liu
    Large language models (LLMs) are attractive for context-aware time series forecasting because they can integrate heterogeneous textual signals, yet their discrete, language-oriented tokenization and embedding interfaces are misaligned with continuous numerical values, often harming numerical ordering and forecasting reliability. We propose TempoWave, a plug-and-play temporal wavelet digit interface that maps each scalar observation into digit-wise embeddings constructed from multi-wavelet, multi-scale coefficients. By directly overriding standard token representations, TempoWave seamlessly exposes both fine-grained local fluctuations and macro global structures in a transformer-compatible form, ensuring that precise numerical formatting, distinct digit identity, and robustness to common normalization operations are maintained throughout the LLM pipeline. Experiments across five context-enriched forecasting benchmarks demonstrate that TempoWave consistently improves LLM-based forecasters over standard numeric tokenization and alternative embedding interfaces, achieving a new state-of-the-art. These results highlight the numeric interface as a key bottleneck and suggest that principled multi-resolution embeddings can better couple LLMs' contextual reasoning with precise forecasting. Our code is available at https://github.com/DC-research/TempoWAVE.
    Machine LearningTime series and data streamsNatural Language ProcessingEmbeddingsNatural Language ProcessingLanguage models
  604. #6779

    Super Condorcet Winners and Limit Coalitional Manipulability of IRV

    Élie de Panafieu, François Durand, Guillem Perarnau
    We study the limit CM rate of single-winner voting rules under Impartial Culture, defined as the probability that a preference profile is coalitionally manipulable in the limit of large electorates. For three candidates, Lepelley and Valognes [1999] derived a closed-form expression for Plurality with Runoff, or equivalently Instant-Runoff Voting (IRV), and showed that its limit CM rate is strictly below one. This is remarkable because Kim and Roush [1996] established a limit of one for several major rules, including Maximin and all positional scoring rules except Veto. In this paper, we generalize the result of Lepelley and Valognes to any number of candidates greater than or equal to four. We show that Plurality with Runoff has a limit CM rate equal to one for all such numbers of candidates, whereas IRV retains a limit CM rate strictly below one. To this end, we rely on the notion of Super Condorcet Winner, recently introduced by Durand [2025], which yields an upper bound on the CM rate of IRV. We prove that this bound is asymptotically tight and compute the probability that a Super Condorcet Winner exists, thereby obtaining the exact limit CM rate of IRV. (Extended version with appendices: https://shs.hal.science/hal-05566713. Companion code: https://github.com/francois-durand/limit_cm_rate_of_irv_ijcai_2026. Short video: https://youtu.be/4i2A7qUP-eo.)
    Game Theory and Economic ParadigmsComputational social choice
  605. #6785

    Optimally Curating an Event

    Mohammad Hajiaghayi, Sebastien Lahaie, Mohammad Mahdavi, Suho Shin
    We study the optimal design of a self-financing event, a problem that requires balancing the recruitment of costly, positive-value participants with revenue-generating agents who may impose negative values on the event.
    We introduce a novel two-sided mechanism design framework with plus agents equipped with private costs and positive impact, and minus agents with private values and negative impact upon inclusion in the event, to maximize the overall quality of the event under a budget-balanced (BB) constraint so that the designer does not run a deficit.

    We conduct a comprehensive study of the theory of optimal event curation under various utility functions.
    For additive utility, we fully characterize the optimal incentive-compatible Bayesian mechanism under both the ex-ante and ex-post BB constraints.
    For submodular utility, we propose an ex-ante BB mechanism that achieves a constant-factor approximation to the optimal Bayesian mechanism.
    Game Theory and Economic ParadigmsMechanism designNatural Language ProcessingApplications
  606. #6791

    FossilWriter: Learning Hypergraph World Models with Latent Narratives for Creative Story Generation

    Heng Zhang, Yihao Zhong, Lubin Gan, Zhihe Chen, Tianyi Zhang, Jing Liu, Jin Huang
    Creative story generation has achieved notable progress with large language models. Current methods construct narratives through hierarchical planning or incremental expansion. These approaches produce structurally complete stories but offer limited support for organic narrative development. Many fiction writers offer a complementary view of the creative process. They often describe their work as gradually uncovering a story embedded in the world they have built. This perspective motivates our approach. We propose that a world model can grow to contain latent narratives as it develops. When characters hold incomplete knowledge and situations remain unresolved, the world structure itself accumulates seeds for story development. Based on this insight we present FossilWriter, a framework that learns hypergraph world models for creative story generation. FossilWriter represents the story world as a shared hypergraph and introduces a narrative layer that marks elements with developmental potential. An excavation module detects and scores latent narratives on this structure. A scene controller selects locations and characters to develop promising narrative threads. The hypergraph grows during writing and serves as the single source of truth to ensure long-range consistency. Experiments on three benchmarks show that FossilWriter achieves an 81.4% win rate on plot coherence and 84.7% on world consistency. Human assessment shows a 32.5% reduction in long-range factual conflicts compared with strong baselines.
    Multidisciplinary Topics and ApplicationsOther
  607. #6792

    The Communication Complexity of Instant-Runoff Voting

    Élie de Panafieu, François Durand, Jérôme Lang
    The communication complexity of a voting rule is the worst-case number of bits that n voters must transmit to a central authority under the most efficient elicitation protocol in an election with m candidates. We study the communication complexity of Instant-Runoff Voting (IRV). Conitzer and Sandholm [2005] established an upper bound of O(n (log m)^2), but did not provide a matching lower bound beyond Omega(n log m). We resolve this open problem by raising the lower bound to Omega(n (log m)^2) using the fooling set technique, thereby showing that the communication complexity of IRV is Theta(n (log m)^2). We further show that this complexity drops to Theta(n log m) under the single-peakedness restriction, and that both the IRV-Average variant and Single Transferable Vote (STV), the multiwinner extension of IRV, have the same asymptotic communication complexity as IRV. (Extended version with appendices: https://shs.hal.science/hal-05566718. Short video: https://youtu.be/gTXV3R2DS6o.)
    Game Theory and Economic ParadigmsComputational social choice
  608. #6798

    Can Stationary Distributions of Scale-Invariant Neural Networks Be Described by the Thermodynamics of an Ideal Gas?

    Ildus Sadrtdinov, Ekaterina Lobacheva, Ivan Klimov, Mikhail Burtsev, Mikhail I. Katsnelson, Dmitry Vetrov
    Understanding the training dynamics of deep neural networks remains a major open problem, with physics-inspired approaches offering promising insights. Building on this perspective, we develop a thermodynamic framework to describe the stationary distributions of stochastic gradient descent (SGD) with weight decay for scale-invariant neural networks, a setting that both reflects practical architectures with normalization layers and permits theoretical analysis. We establish analogies between training hyperparameters (e.g., learning rate, weight decay) and thermodynamic variables such as temperature, pressure, and volume. Starting with a simplified isotropic noise model, we uncover a close correspondence between SGD dynamics and ideal gas behavior, validated through theory and simulation. Extending to training of neural networks, we show that key predictions of the framework, including the behavior of stationary entropy, align closely with experimental observations. This framework provides a principled foundation for interpreting training dynamics and may guide future work on hyperparameter tuning and the design of learning rate schedulers.
    Machine LearningSupervised LearningMachine LearningTheory of deep learningMultidisciplinary Topics and ApplicationsPhysical sciences
  609. #6813

    Information and Contract Design for Repeated Interactions Between Agents with Misaligned Incentives

    Nanda Kishore Sreenivas, Kate Larson
    We study the consequences of information asymmetries and misaligned incentives in settings with multiple independent agents. We model an interaction between a Sender, who holds vital private information but cannot act, and a Receiver, who must make decisions but is dependent on the Sender's information. We find that the Sender learns an optimal communication strategy that the Receiver reliably acts on. Importantly, this strategy is highly sensitive to the degree of conflict in the agents' rewards and the amount of environmental information the Receiver can already observe. We introduce a mechanism allowing the agents to form linear contracts, where a price is established for the information. We demonstrate that the Sender learns to use these payment structures to improve its rewards, though this comes at a cost of “fairness” between agents as the Sender is able to extract much of the Receiver’s surplus. This raises questions about fairness, contract design, and learning in the context of multi-agent systems.
    Agent-based and Multi-agent SystemsAgreement technologies: Negotiation and contract-based systemsAgent-based and Multi-agent SystemsCoordination and cooperationAgent-based and Multi-agent SystemsMulti-agent learning
  610. #6829

    NN-kNN for Regression: Accurate Prediction from Interpretable Retrieval

    Xiaomeng Ye, Yu Wang, David Leake, David Crandall, Great Abhieyighan, Mereck McGowan
    Neural Network k-Nearest Neighbor (NN-kNN) was proposed as an interpretable network model that learns feature weights and similarity to retrieve relevant cases for classification. This paper extends it to regression with the goal of generating accurate predictions based on neighboring cases with similar labels. Specifically, we introduce three modular components: an attention mechanism that weights the contribution of retrieved cases, a locality-aware regularizer that favors label-similar neighbors, and an optional case adaptation module that refines the retrieved estimate. Across synthetic and standard tabular regression benchmarks, NN-kNN achieves competitive predictive error against strong baselines (kNN-R, MLKR, and MLPs) while providing cases with similar labels as explanations (referred to as label-similar cases). Moreover, NN-kNN supports manual knowledge injection through tuning weights for human-comprehensible features. The result is a simple, general, and interpretable approach to continuous-valued prediction that unifies retrieval, attention, and optional case adaptation within a single neural framework.
    AI Ethics, Trust, FairnessExplainability and interpretabilityAI Ethics, Trust, FairnessTrustworthy AIKnowledge Representation and ReasoningCase-based reasoningMachine LearningRegressionMachine LearningTrustworthy machine learning
  611. #6836

    AlgoSimBench: Identifying Algorithmically Similar Problems for Competitive Programming

    Jierui Li, Raymond Mooney
    Recent reasoning-enhanced Large Language Models (LLMs) have achieved promising results in solving complex competitive programming problems. However, it remains unclear whether these reasoning abilities generalize to relevant tasks, like identifying algorithmically similar problems (ASPs). We introduce AlgoSimBench, a benchmark of 402 multiple-choice questions curated in an adversarial setting: each given reference problem is paired with one algorithmically similar problem and three distractors that are semantically close but algorithmically dissimilar. This design forces models to rely on algorithmic reasoning rather than superficial textual cues. Our evaluation shows that LLMs consistently struggle under this setting. To address this gap, we propose Attempted Solution Matching (ASM), which leverages LLM-generated solution attempts to assess similarity, yielding an average accuracy improvement of 9% across models. Beyond LLM evaluation, AlgoSimBench also probes code retrieval methods; when combined with BM25, ASM achieves an additional 11.8% gain over state-of-the-art embedding models. AlgoSimBench offers a challenging testbed that facilitates future studies on LLMs and retrieval methods.
    Natural Language ProcessingLanguage modelsNatural Language ProcessingResources and evaluation
  612. #6867

    Tool Call Dependency Graphs Enable Deep LLM Reasoning Evaluation and Better Explanations

    Nick Ferguson, Alan Bundy, Kwabena Nuamah
    Evaluation of tool-augmented Large Language Models (LLMs) has not advanced far beyond final answer accuracy, and neglects in-depth evaluation of reasoning ability despite it being a central claim of recent models. We aim to address this gap by developing a dependency graph-based evaluation to give insight into meta-level reasoning ability. We create a two-stage algorithm to construct dependency graphs from a series of LLM-generated tool calls: the first stage creates nodes for tool calls, and the second creates edges where tool call results are identified as being used as arguments in subsequent calls. The resulting graph facilitates the development of graph-theoretic metrics that allow deeper evaluation of the reasoning ability of LLMs over the task in question. We also use the dependency graph to assist with the generation of natural language explanations of the question-answering process. We create a baseline explanation generated from the list of tool calls, and a graph-augmented explanation which incorporates provenance information. A human evaluation study shows that our graph-augmented explanations are preferred across three key explainability criteria, although the extra detail contained impacts ease of understanding.
    AI Ethics, Trust, FairnessExplainability and interpretabilityAI Ethics, Trust, FairnessTrustworthy AINatural Language ProcessingQuestion answering
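
The two-stage construction described in the abstract above (nodes for tool calls, then edges where an earlier result is reused as a later argument) can be illustrated with a minimal sketch. This is not the authors' implementation: the `ToolCall` record, the exact-equality matching rule, and the `longest_chain` metric are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ToolCall:
    """Hypothetical record of one LLM-generated tool call."""
    call_id: str
    tool: str
    args: dict[str, Any]
    result: Any


def build_dependency_graph(calls: list[ToolCall]) -> dict[str, set[str]]:
    """Return adjacency sets mapping each call to the later calls that use its result."""
    # Stage 1: one node per tool call.
    graph: dict[str, set[str]] = {c.call_id: set() for c in calls}
    # Stage 2: add an edge earlier -> later when the earlier result
    # appears verbatim as an argument value of the later call
    # (exact equality is an assumed, simplistic matching rule).
    for i, later in enumerate(calls):
        for earlier in calls[:i]:
            if any(value == earlier.result for value in later.args.values()):
                graph[earlier.call_id].add(later.call_id)
    return graph


def longest_chain(graph: dict[str, set[str]], order: list[str]) -> int:
    """Number of calls on the longest dependency chain (an example graph metric).

    `order` must list the calls in execution order, so every edge points forward.
    """
    depth: dict[str, int] = {}
    for node in reversed(order):
        depth[node] = 1 + max((depth[s] for s in graph[node]), default=0)
    return max(depth.values(), default=0)


calls = [
    ToolCall("c1", "lookup_population", {"country": "France"}, 68),
    ToolCall("c2", "lookup_area", {"country": "France"}, 551),
    ToolCall("c3", "divide", {"a": 68, "b": 551}, 0.123),
]
graph = build_dependency_graph(calls)
print(graph)
print(longest_chain(graph, [c.call_id for c in calls]))
```

In this toy trace, the `divide` call depends on both lookups, so the graph has two edges into `c3`. A real system would need a more robust way to decide that a result was reused, since exact equality can produce false matches when unrelated calls return the same value.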
  613. #6897

    Centralized and Distributed Approaches for Restoring the Weak Controllability of Multi-Agent Interdependent STNUs

    Ajdin Sumic, Gauthier Picard, Roberto Posenato, Thierry Vidal, Carlo Combi, Frédéric Maris
    Temporal planning and scheduling often rely on constraint models to reason about activity durations. Simple Temporal Networks with Uncertainty (STNUs) address cases where some contingent durations are outside the control of the executing agent. The Multiple Interdependent STNU (MISTNU) model extends this setting to multi-agent systems by representing shared activities whose duration is decided by one agent and imposed on others. While controllability properties for MISTNUs have been defined, existing repair methods rely on a centralized SMT-based brute-force approach. This paper introduces a linear-constraint characterization of all negative cycles responsible for uncontrollability. Based on this formulation, we propose both a more efficient centralized linear-programming repair method and a distributed constraint reasoning approach that treats inconsistent cycles as inter-agent constraints. We experimentally compare several Distributed Constraint Optimization Problem (DCOP) solvers against the centralized baseline in terms of efficiency and repair quality.
    Agent-based and Multi-agent SystemsMulti-agent planningPlanning and SchedulingDistributed and multi-agent planningPlanning and SchedulingPlanning under uncertainty
  614. #6901

    ROAD: Adaptive Data Mixing for Offline-to-Online Reinforcement Learning via Bi-Level Optimization

    Letian Yang, Xu Liu, Yiqiang Lu, Jian Liu, Weiqiang Wang, Shuai Li
    Offline-to-online reinforcement learning harnesses the stability of offline pretraining and the flexibility of online fine-tuning. A key challenge lies in the non-stationary distribution shift between offline datasets and the evolving online policy. Common approaches often rely on static mixing ratios or heuristic-based replay strategies, which lack adaptability to different environments and varying training dynamics, resulting in a suboptimal tradeoff between stability and asymptotic performance. In this work, we propose Reinforcement Learning with Optimized Adaptive Data-mixing (ROAD), a dynamic plug-and-play framework that automates the data replay process. We identify a fundamental objective misalignment in existing approaches. To tackle this, we formulate the data selection problem as a bi-level optimization process, interpreting the data mixing strategy as a meta-decision governing the policy performance (outer level) during online fine-tuning, while the conventional Q-learning updates operate at the inner level. To make it tractable, we propose a practical algorithm using a multi-armed bandit mechanism. This is guided by a surrogate objective which simultaneously maintains offline priors and prevents value overestimation. Our empirical results demonstrate that this approach consistently outperforms existing data replay methods across various datasets, eliminating the need for manual, context-specific adjustments while achieving superior stability and asymptotic performance.
    Machine LearningMulti-armed banditsMachine LearningOffline reinforcement learningMachine LearningReinforcement learning
  615. #6907

    Identification of Probabilities of Causation: From Recursive to Closed-Form Bounds

    Xin Shu, Shuai Wang, Ang Li
    Probabilities of causation (PoCs) are fundamental quantities for counterfactual analysis and personalized decision making. However, existing analytical results are largely confined to binary settings. This paper extends PoCs to multi-valued treatments and outcomes by deriving closed-form bounds for a representative family of discrete PoCs within Structural Causal Models, using standard experimental and observational distributions. We introduce the notion of equivalence classes of PoCs, which reduces arbitrary discrete PoCs to this family, and establish a replaceability principle that transfers bounds across value permutations. For the resulting bounds, we prove soundness in all dimensions and empirically verify tightness in low-dimensional cases via Balke's linear programming method; we further conjecture that this tightness extends to all dimensions. Simulations indicate that our closed-form bounds consistently tighten recent recursive bounds while remaining simpler to compute. Finally, we illustrate the practical relevance of our results through toy examples.
    Knowledge Representation and ReasoningCausalityUncertainty in AICausality, structural causal models and causal inferenceUncertainty in AIGraphical models
  616. #6924

    Disentangled Graph-Enhanced Large Language Models for Fair Learning

    Zhipeng Yin, Zichong Wang, Zhong Chen, Jack Yang, Xin Ning, Wenbin Zhang
    Large Language Models (LLMs) achieve strong performance in many applications but remain limited in handling graph-structured data due to their reliance on textual context. Recent approaches integrate Graph Neural Networks (GNNs) to enhance structural modeling, yet they largely overlook fairness, leaving models vulnerable to bias amplification across graph and text modalities. To address this issue, we propose FairGEnt, a disentangled graph-enhanced large language model for fair graph learning. FairGEnt separates sensitive-related and sensitive-invariant factors in both graph and textual representations to mitigate bias while preserving task-relevant information, and further aligns the two modalities through a fairness-aware integration module. In addition, FairGEnt incorporates fair graph-enhanced instruction tuning to improve LLM understanding of complex graph structures. Experiments on multiple benchmark datasets demonstrate that FairGEnt consistently outperforms existing methods in both fairness and predictive performance.
    AI Ethics, Trust, FairnessFairness and diversityAI Ethics, Trust, FairnessTrustworthy AINatural Language ProcessingLanguage models
  617. #6932

    Group Vitality Indices: Axioms and Algorithms

    Natalia Kucharczuk, Oskar Skibski
    We consider the problem of assessing a group of nodes in a network. Our focus is on vitality indices—a natural class of centrality measures that evaluate the importance of a node by examining the impact of its removal on the network. We conduct a comprehensive analysis of group vitality indices. Specifically, we show that every vitality index admits a unique extension to groups, which can be defined using a group variant of the Shapley value recently proposed in the literature. We also provide an axiomatization of the entire class, along with two specific group vitality indices that satisfy additional normalization conditions. Furthermore, we study the computational properties of all vitality indices, as well as Group Attachment Centrality.
    Game Theory and Economic ParadigmsCooperative gamesMultidisciplinary Topics and ApplicationsWeb and social networks
  618. #6934

    The Alignment Bottleneck in Decomposition-Based Claim Verification

    Mahmud Elahi Akhter, Federico Ruggeri, Iman Munire Bilal, Robert Procter, Maria Liakata
    Structured claim decomposition is often proposed as a solution for verifying complex, multi-faceted claims, yet empirical results have been inconsistent. We argue that these inconsistencies stem from two overlooked bottlenecks: evidence alignment and sub-claim error profiles. To better understand these factors, we introduce a new dataset of real-world complex claims, featuring temporally bounded evidence and human-annotated sub-claim evidence spans. We evaluate decomposition under two evidence alignment setups: Sub-claim Aligned Evidence (SAE) and Repeated Claim-level Evidence (SRE). Our results reveal that decomposition brings significant performance improvement only when evidence is granular and strictly aligned. By contrast, standard setups that rely on repeated claim-level evidence (SRE) fail to improve and often degrade performance as shown across different datasets and domains (PHEMEPlus, MMM-Fact, COVID-Fact). Furthermore, we demonstrate that in the presence of noisy sub-claim labels, the nature of the error ends up determining downstream robustness. We find that conservative "abstention" significantly reduces error propagation compared to aggressive but incorrect predictions. These findings suggest that future claim decomposition frameworks must prioritize precise evidence synthesis and calibrate the label bias of sub-claim verification models.
    Multidisciplinary Topics and ApplicationsNews and mediaNatural Language ProcessingInterpretability and analysis of models for NLPNatural Language ProcessingResources and evaluationNatural Language ProcessingText classification
  619. #6937

    Do Not Discretize, Optimize: Almost Greedy Fictitious Play

    Evangelos Markakis, Christodoulos Santorinaios
    Our work revolves around Fictitious Play, one of the first iterative methods known to converge to a Nash equilibrium in zero-sum games. In recent years, there has been revived interest in it, due to applications in various machine learning problems, which has motivated a line of work on its convergence properties and on new variants of the initial algorithm. Our paper follows this direction and introduces one new variant, which we refer to as Almost Greedy Fictitious Play. The proposed algorithm greedily attempts to find the optimal stepsize at each iteration, but its search space is constrained and includes almost all of the line between the cumulative mixed strategy and the current best response. Our main result is that the method achieves an instance-dependent convergence rate of O(1/T) with respect to the duality gap. This matches the rate of Continuous Fictitious Play, and offers an alternative to discretization. We complement our theoretical findings with experiments that demonstrate the effectiveness of the method.
    Game Theory and Economic ParadigmsNoncooperative gamesMachine LearningGame Theory
  620. #6948

    Property Prediction of Stacked Bilayer Materials: A Multimodal Learning Approach

    An Vuong, Minh-Hao Van, Chen Zhao, Xintao Wu
    AI for materials science is a critical topic within AI for science, aiming to accelerate materials discovery and produce accurate property predictions. Bilayer 2D material stacking is essential for exploring new materials with novel functions and inherent phenomena, enabling the creation of new 2D bilayers for diverse real-world applications. Research on bilayer vdWs materials has made significant progress from experimental and computational perspectives. Various bilayer materials have been successfully synthesized experimentally and the increasing utilization of high-throughput computing technology has constructed several computational two-dimensional materials databases. However, the use of AI to model bilayer stacking and predict new properties remains underexplored, necessitating further research studies. In this work, we propose a novel multimodal learning approach to study the interfaces between dissimilar materials that jointly enable new or multiple functions, and to predict new properties arising from the vertical integration (stacking) of different functional material layers under given configurations. Comprehensive experiments demonstrate the effectiveness and efficiency of our approach compared to baseline methods. Our code is available at https://github.com/AnVuong123/bimat_ml.
    Knowledge Representation and ReasoningApplicationsMachine LearningMulti-modal learningMachine LearningRegressionMultidisciplinary Topics and ApplicationsPhysical sciences
  621. #6960

    Semantic Similarity Is Not Legal Correctness: Evaluating RAG Systems in Brazilian Civil Procedure

    Eryclis Silva, Madelyn Sanfilippo
    Evaluating retrieval-augmented generation (RAG) systems in legal domains is challenging due to the nuanced nature of legal reasoning and the scarcity of domain-specific benchmarks. We investigate semantic similarity and legal correctness in Brazilian legal question answering by developing a synthetic dataset of 3,012 evaluation instances derived from the Brazilian Civil Procedure Code, spanning seven query types and validated through human expert assessment. Our evaluation framework combines BERTScore with a domain-adapted LLM-as-Judge (GPT-4o-mini), validated against expert legal assessment. The analysis reveals a 50.8% disagreement between semantic similarity and legal correctness, with correlations varying substantially by query type (rho ranging from 0.464 to 0.732). Explicit article citation emerges as the strongest predictor of quality (Cohen’s d = 1.099), with cited responses achieving 145% higher legal correctness scores. Concrete examples demonstrate bidirectional divergence: responses may achieve high semantic similarity while citing incorrect legal articles, or provide legally correct answers with minimal lexical overlap. These findings demonstrate that semantic metrics alone are insufficient for evaluating legal RAG systems, with disagreement patterns systematically structured by query type.
    Natural Language ProcessingInformation retrieval and text miningNatural Language ProcessingApplicationsNatural Language ProcessingQuestion answeringAINatural Language Processing
  622. #7019

    Unified Sequence Modeling for Remote Sensing: A Parameter-Efficient Foundation Model via Prompt-Driven Granularity Alignment

    Yang Liu, Weixing Luo, Huaizhou Qi, Suisui Jia, Yongjing Guo
    Current remote sensing (RS) perception systems suffer from task heterogeneity, necessitating distinct architectures for classification, localization, and reasoning. While vision-language models (VLMs) offer a route toward unification, their computational cost can hinder deployment. In this work, we propose RS-Florence, a compact unified model that addresses these tasks through a Prompt-Driven Sequence-to-Sequence framework. Unlike traditional approaches that segregate semantic understanding and geometric localization, RS-Florence maps images and task-specific prompts into a unified sequence of natural language and discrete geometric tokens. This formulation helps bridge high-level semantics and low-level pixel perception. Experiments across 4 task families and 8 benchmarks show that our 0.23B model remains competitive with task-specific specialists. These results also suggest that multitask joint training improves performance on several benchmarks over single-task fine-tuning, indicating that a shared prompt-driven interface can serve both language and geometry tasks.
    Computer VisionMachine learning for visionComputer VisionRecognition (object detection, categorization)Machine LearningClassification
  623. #7045

    Tracking Topological Shifts: How Can Dynamic Graph Invariant Learning Enable Reliable Out-of-Time Spatio-Temporal Prediction?

    Xinyan Hao, Huaiyu Wan, Shengnan Guo, Shaojiang Wang, Youfang Lin
    Spatio-temporal graph networks form the foundation of modern traffic prediction, yet their deployment is fundamentally challenged by the pervasive reality of distribution shifts. While out-of-distribution (OOD) learning holds promise for robustness, existing methods rely on static graph structures, failing to capture the inherent topological dynamics of real-world traffic systems and thus limiting long-term deployment reliability. To bridge this gap, we propose DynaSTar, a Dynamic Spatio-Temporal Graph Invariant Learning model designed for reliable out-of-time (OOT) traffic prediction under evolving topologies. Our model employs a dynamic probabilistic graph structure, which is continuously refined through momentum-based updates and differentiable sparse sampling to model evolving inter-node dependencies. Besides, it utilizes node-level environment construction and modulated prediction to extract representations invariant to heterogeneous neighborhood fluctuations, enabling robust generalization. Comprehensive experiments on large-scale, long-term real-world datasets demonstrate that by effectively tracking evolving topological shifts, DynaSTar consistently outperforms state-of-the-art baselines across various OOT scenarios.
    Data MiningMining spatial and/or temporal data
  624. #7052

    Distance Between CP-theory Preferences

    Zihao Qi, Erik Rauer, Samik Basu
    Qualitative preferences specified in CP-theory define a partial order over outcomes based on induced dominance relations. We investigate the problem of measuring divergence between such induced orders. We introduce two formally equivalent distance metrics, each leveraging a distinct computational strategy based on different types of normalization. Preliminary empirical results support the soundness of the proposed methodologies and offer comparative insights into their computational properties. To the best of our knowledge, this is the first study to compute distance measures between qualitative preferences that jointly accounts for conditional dependencies and relative importance statements.
    Knowledge Representation and ReasoningPreference modelling and preference-based reasoningKnowledge Representation and ReasoningQualitative, geometric, spatial, and temporal reasoning
  625. #7054

    LLMs Uncertainty Quantification via Adaptive Conformal Semantic Entropy

    Hamed Karimi, Vaishali Meyappan, Reza Samavi
    LLMs' overconfidence, particularly when hallucinating, poses a significant challenge for the deployment of the models in safety-critical settings and makes a reliable estimation of uncertainty necessary. Existing approaches for uncertainty quantification typically prioritize lexical or probabilistic measures; however, these techniques often ignore the semantic variance of different responses with similar meaning. In this paper, we propose Adaptive Conformal Semantic Entropy (ACSE), a method for estimating prompt-level uncertainty by adaptively measuring semantic dispersion in LLM outputs. Our uncertainty scoring function is based on clustering semantic entropy of multiple diverse responses to the same prompt. The function adaptively adjusts the uncertainty score based on semantic features of each cluster. To ensure statistical reliability of our score, we use conformal calibration to apply a decision rule that accepts or abstains on each prompt, providing a finite-sample, distribution-free guarantee such that the error rate among the accepted responses remains bounded by a user-specified tolerance. Our extensive experimental evaluations using different LLMs and datasets demonstrate that our approach consistently outperforms state-of-the-art uncertainty quantification baselines in terms of discriminative performance, acceptance rate, conformal guarantees, and probabilistic calibration indicators. As a highlight, on the TriviaQA dataset, the AUROC of our approach is 0.88, compared to 0.65 for the token entropy approach.
    Uncertainty in AIUncertainty representationsNatural Language ProcessingLanguage modelsUncertainty in AIApplicationsMachine LearningTrustworthy machine learning
  626. #7080

    Computing Epistemic EF1 and Pareto-Optimal Allocations of Indivisible Chores

    Jugal Garg, Aniket Murhekar
    We study the allocation of m indivisible chores among n agents with additive disutilities under the fairness notion of envy-freeness up to one chore (EF1) and the efficiency notion of Pareto-optimality (PO). Although the existence of an allocation satisfying both EF1 and PO was recently established by Mahara (2025) using a highly non-constructive fixed-point argument, an effective algorithm for computing such an allocation remains elusive, prompting the study of meaningful relaxations of these desiderata.

    Caragiannis et al. (2023) introduced a natural relaxation through the concept of epistemic fairness: an allocation is said to be epistemic EF1 (EEF1), if for every agent i, it is possible to re-allocate the bundles of agents other than i such that i becomes EF1. In this work, we present a pseudo-polynomial time algorithm for computing an allocation of chores that is both EEF1 and PO. This gives an efficient polynomial-time algorithm for most practical settings where disutility values are integral and polynomially bounded in m and n. Our result employs the competitive equilibrium framework and relies on several technical insights that both utilize the distinct structure of epistemic EF1 and address the challenges it introduces.
    Game Theory and Economic ParadigmsFair divisionGame Theory and Economic ParadigmsComputational social choice
  627. #7083

    On the Privacy-Preserving Capabilities of PAVE Specifications in Learnware

    Hao-Yi Lei, Jin-Hui Wu, Zhi-Hao Tan, Zhi-Hua Zhou
    The learnware paradigm supports model reuse by pairing each submitted model with a specification, a lightweight representation used by the learnware dock system to identify, match, and reuse models without accessing raw data. While specifications are essential for learnware identification, they are also data-dependent public artifacts and it is not clear whether they reveal private information. Recently, the Parameter Vector (PAVE) specification has been proposed and shown to be effective for learnwares, yet its privacy properties remain largely unexplored. In this paper, we provide the first theoretical privacy analysis for PAVE. Specifically, we first formalize two specification-induced risks in the learnware paradigm: the disclosure risk of the released specification and the amplification risk that the specification may strengthen attacks against the released model. Second, we characterize when compact PAVE releases admit intrinsic differential privacy: under natural structural conditions of learnware docks, the compact PAVE specification satisfies an (ε, δ)-DP guarantee without explicit additive noise through a Gaussian-sketch view of stable parameter variations, and for regimes outside these conditions, we further provide DP-S-PAVE as a certified differentially private variant. Third, we show that the resulting DP guarantees control both disclosure risk and specification-side amplification risk, and we analyze the induced privacy–utility trade-off to guide effective learnware identification while preserving privacy.
    Machine LearningLearnware/model reuse/transfer learningMultidisciplinary Topics and ApplicationsSecurity and privacy
  628. #7101

    PDD-RRG: Posterior Diagnostic Decision for Study-level Radiology Report Generation

    Yang Yu, Yiming Ji, Bin Dai, Dong Zhang, Zhiyong Zhou, Shoushan Li, Yakang Dai
    Automatic radiology report generation (RRG) aims to simulate the workflow of radiologists, assisting them in clinical diagnosis. However, existing methods often fall short in utilizing all information relevant to the examination, as is typically done in clinical practice. Although some works attempt to incorporate multi-view images and historical data, these additional inputs may, on the contrary, sometimes lead to avoidable diagnostic errors. To address these challenges, we introduce, for the first time, a decision-making stage after report generation and propose a Posterior Diagnostic Decision framework (PDD-RRG) to integrate potentially conflicting diagnoses. Specifically, we create various subsets of input data and utilize an existing RRG model to generate reports from different perspectives. Then the Bayesian posterior probability and the learned thresholds for each clinical observation are calculated to obtain an aggregated diagnostic conclusion, which is subsequently used to refine the generated report. Experiments on MIMIC-CXR demonstrate that our proposed PDD-RRG can effectively enhance the clinical efficacy of existing RRG models without any retraining.
    Computer VisionBiomedical image analysisMultidisciplinary Topics and ApplicationsBioinformaticsMultidisciplinary Topics and ApplicationsHealth and medicine
  629. #7106

    FedCARE: Federated Unlearning with Conflict-Aware Projection and Relearning-Resistant Recovery

    Yue Li, Mingmin Chu, Xilei Yang, Da Xiao, Ziqi Xu, Wei Shao, Qipeng Song, Hui Li
    Federated learning (FL) enables collaborative model training without centralizing raw data, but privacy regulations such as the right to be forgotten require FL systems to remove the influence of previously used training data upon request. Retraining a federated model from scratch is prohibitively expensive, motivating federated unlearning (FU). However, existing FU methods suffer from high unlearning overhead, utility degradation caused by entangled knowledge, and unintended relearning during post-unlearning recovery. In this paper, we propose FedCARE, a unified and low-overhead FU framework that enables conflict-aware unlearning and relearning-resistant recovery. FedCARE leverages gradient ascent for efficient forgetting when target data are locally available and employs data-free model inversion to construct class-level proxies of shared knowledge. Based on these insights, FedCARE integrates a pseudo-sample generator, conflict-aware projected gradient ascent for utility-preserving unlearning, and a recovery strategy that suppresses rollback toward the pre-unlearning model. FedCARE supports client-, instance-, and class-level unlearning with modest overhead. Extensive experiments on multiple datasets and model architectures under both IID and non-IID settings show that FedCARE achieves effective forgetting, improved utility retention, and reduced relearning risk compared to state-of-the-art FU baselines. The source code can be found at https://github.com/thechosenchu/FedCARE.
    Multidisciplinary Topics and ApplicationsSecurity and privacy
  630. #7111

    SeesawNet: Towards Non-stationary Time Series Forecasting with Balanced Modeling of Common and Specific Dependencies

    Hao Li, Lu Zhang, Liu Chong, Yankai Chen, Pengyang Wang, Yingjie Zhou
    Instance normalization (IN) is widely used in non-stationary multivariate time series forecasting to reduce distribution shifts and highlight common patterns across samples. However, IN can over-smooth instance-specific structural information that is essential for modeling temporal and cross-channel heterogeneity. While prior methods further suppress distribution discrepancies or attempt to recover temporal specific dependencies, they often ignore a central tension: how to adaptively model common and instance-specific dependency based on each instance's non-stationary structures. To address this dilemma, we propose SeesawNet, a unified architecture that dynamically balances common and instance-specific dependency modeling in both temporal and channel dimensions. At its core is Adaptive Stationary–Nonstationary Attention (ASNA), which captures common dependencies from normalized sequences and specific dependencies from raw sequences, and adaptively fuses them according to instance-level non-stationarity. Built upon ASNA, SeesawNet alternates dedicated temporal and channel relationship modeling to jointly capture long-range and cross-variable dependencies. Extensive experiments on multiple real-world benchmarks demonstrate that SeesawNet consistently outperforms state-of-the-art methods.
    Data MiningMining spatial and/or temporal dataMachine LearningTime series and data streams
  631. #7117

    ProCURE: Addressing the Programming Concept Understanding Gap for Code Generation in LLMs via Concept-Aware Consistency Learning

    Xiaoning Ren, Qiang Hu, Wei Ma, Chongyang Liu, Yan Li, Yao Zhang, Lingxiao Jiang, Yongqiang Lyu, Yinxing Xue
    Although Large Language Models (LLMs) excel at code generation, recent research reveals that they exhibit an insufficient grasp of core programming concepts, such as data flow and control flow. This limitation undermines their robustness when encountering variations in these concepts in practice; however, effective solutions that explicitly target this gap remain limited.
    To address this challenge, we propose ProCURE, a concept-aware consistency learning framework designed to enhance LLMs’ understanding of programming concepts. Specifically, ProCURE first performs automated concept-oriented code augmentation to construct a concept-aligned dataset covering representative programming concepts. It then conducts concept-aware fine-tuning, encouraging the model to capture fine-grained concept variations and learn appropriate generation behaviors under such variations via a novel concept-sensitive consistency loss.
    To quantify programming concept understanding, we introduce the Concept Consistency Score (CCScore), defined as the proportion of correct generations preserved under concept variations. A higher CCScore indicates a more profound understanding of programming concepts.
    We evaluate ProCURE on four open-source LLMs across three widely used code generation benchmarks. Experimental results show that ProCURE improves CCScore by an average of 17.9 points, demonstrating its effectiveness in addressing the programming concept understanding gap.
    Machine LearningApplicationsNatural Language ProcessingApplications
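    The ProCURE abstract above defines CCScore only in words, as the proportion of correct generations preserved under concept variations. A minimal sketch of one plausible reading follows. The function name `cc_score`, the record layout, and the choice of originally correct generations as the denominator are assumptions made for illustration; the paper's exact formula may differ.

```python
def cc_score(records):
    """Toy CCScore under an assumed reading of the abstract.

    records: list of (original_correct, variant_correct) booleans, one per
    (task, concept-variation) pair. Hypothetical layout, not from the paper.
    Returns the percentage of originally correct generations that remain
    correct after the concept variation is applied.
    """
    # Only pairs whose original generation was correct can be "preserved".
    base = [variant_ok for original_ok, variant_ok in records if original_ok]
    if not base:
        return 0.0
    return 100.0 * sum(base) / len(base)


# Usage: three originally correct generations, two of which survive variation.
example = [(True, True), (True, False), (False, True), (True, True)]
score = cc_score(example)
```

    Under this reading, a gain of 17.9 points would mean that roughly 18 more of every 100 originally correct generations stay correct after a concept-level change.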
  632. #7118

    OneVoice: One Model, Triple Scenarios—Towards Unified Zero-shot Voice Conversion

    Zhichao Wang, Tao Li, Wenshuo Ge, Zihao Cui, Shilei Zhang, Junlan Feng
    Recent progress in voice conversion (VC) has reached a new milestone in speaker cloning and linguistic preservation. However, the field remains fragmented, relying on specialized models for linguistic-preserving, expressive, and singing scenarios. We propose OneVoice, a unified zero-shot framework capable of handling all three scenarios within a single model. OneVoice is built upon a continuous language model trained with VAE-free next-patch diffusion, ensuring high fidelity and efficient sequence modeling. Its core design for unification lies in a Mixture-of-Experts (MoE) designed to explicitly model shared conversion knowledge and scenario-specific expressivity. Expert selection is coordinated by a dual-path routing mechanism, including shared expert isolation and scenario-aware domain expert assignment with global-local cues. For precise conditioning, scenario-specific prosodic features are fused into each layer via a gated mechanism, allowing adaptive usage of prosody information. Furthermore, to enable the core idea and alleviate the data imbalance (abundant speech vs. scarce singing), we adopt a two-stage progressive training that includes foundational pre-training and scenario enhancement with LoRA-based domain experts. Experiments show that OneVoice matches or surpasses specialized models across all three scenarios, while verifying flexible control over scenarios and offering a fast decoding version that uses as few as 2 steps. Audio samples are available at https://kerwinchao.github.io/OneVoice/.
    Natural Language ProcessingSpeech
  633. #7120

    Theoretical Analysis of Multi-Objective Evolutionary Algorithms on Integer Spaces with Local Optima

    Yuetong Sun, Zeqiong Lv, Shengjie Ren, Zimin Liang, Miqing Li, Chao Qian
    Multi-objective evolutionary algorithms (MOEAs) are popular tools for multi-objective optimization (MOO), and have been successfully applied to many real-world MOO problems. However, the theoretical study has lagged behind their practical success and remains largely confined to synthetic pseudo-Boolean functions. To close this gap, this paper, drawing inspiration from a class of popular continuous problems with real-world relevance, introduces a multi-objective benchmark defined on an integer space, featuring an analyzable landscape and the presence of local optima. We conduct a running time analysis on the proposed benchmark and derive several theoretical results. Specifically, we prove that a widely-studied MOEA, GSEMO, using unit-step mutation can be trapped in local optimal regions and fail to identify the Pareto front. Fortunately, we find that this difficulty can be overcome either by incorporating an ageing mechanism or using heavy-tailed mutations that allow multi-valued changes along each dimension of an individual. In addition, we demonstrate the extendability of the proposed benchmark to more complex landscapes with numerous local optima, resembling well-established problems in the field (e.g., those from the ZDT and DTLZ suites). We hope this work is a step forward for the theoretical study of MOEAs on problems that are closely related to those commonly investigated in empirical research.
    SearchEvolutionary computation
  634. #7134

    VERA: Identifying and Leveraging Visual Evidence Retrieval Heads in Long-Context Understanding

    Rongcan Pei, Huan Li, Fang Guo, Qi Zhu
    While Vision-Language Models (VLMs) have shown promise in textual understanding, they face significant challenges when handling long context and complex reasoning tasks. In this paper, we dissect the internal mechanisms governing long-context processing in VLMs to understand their performance bottlenecks. Through the lens of attention analysis, we identify specific Visual Evidence Retrieval (VER) Heads: a sparse, dynamic set of attention heads critical for locating visual cues during reasoning, distinct from static OCR heads.
    We demonstrate that these heads are causal to model performance; masking them leads to significant degradation. Leveraging this discovery, we propose VERA (Visual Evidence Retrieval Augmentation), a training-free framework that detects model uncertainty (i.e., entropy) to trigger the explicit verbalization of visual evidence attended by VER heads. Comprehensive experiments demonstrate that VERA significantly improves long-context understanding of open-source VLMs: it yields an average relative improvement of 21.3% on Qwen3-VL-8B-Instruct and 20.1% on GLM-4.1V-Thinking across five benchmarks.
    AI Ethics, Trust, FairnessExplainability and interpretabilityComputer VisionImage and video retrievalComputer VisionInterpretability and transparency
  635. #7137

    BehaviorBench: A Psychologically Grounded Benchmark for Evaluating Personality in Large Language Models Through Realistic Behaviors

    Taowen Pu, Hexi Wang, Zeyang Liu, Dongsheng Guo, Chuan Zhao
    Current approaches to evaluating personality in large language models (LLMs) typically prompt them to self-report on psychological questionnaires such as the Big Five Inventory. However, these methods assess introspective labels rather than observable behavior, despite the fact that LLMs are deployed to act in realistic contexts, not to reflect on their own traits. To bridge this gap, we introduce BehaviorBench, a new benchmark for evaluating LLM personality through concrete behaviors in everyday scenarios. Grounded in established personality psychology, BehaviorBench links each Big Five trait to validated behavioral manifestations and embeds them in contextually plausible situations that naturally elicit trait-relevant actions. We evaluate a range of models and personality shaping strategies using BehaviorBench and find a substantial mismatch between self-reported personalities and actual behaviors. By grounding evaluation in observable behavior rather than introspection, our work reveals critical gaps in current LLM personality modeling and control mechanisms. Our code and data are available at https://github.com/butra1n/BehaviorBench
    Natural Language ProcessingPsycholinguisticsNatural Language ProcessingResources and evaluation
  636. #7147

    Cost of Structural Learning Under Censored Feedback: A Threshold-Bandit Approach

    Michael Ledford, William Regli
    In many multi-agent applications, tasks yield rewards only when executed by a coalition meeting an unknown size threshold; otherwise, feedback is fully censored. This censorship creates an identifiability problem: agents cannot distinguish stochastic failure from insufficient coordination. We formalize this setting as the Threshold-Activated Cooperative Multi-Armed Bandit (TAC-MAB) and analyze it under both centralized and decentralized coordination. We show that a centralized algorithm (C-TAC) achieves cumulative regret O(log T), decomposed into a structural-search term that captures the cost of resolving feasibility under censored feedback and a statistical-monitoring term for value estimation. We then introduce D-TAC, a decentralized event-triggered protocol in which agents synchronize only when their structural beliefs change. Empirically, D-TAC achieves a 23x reduction in communication relative to the centralized baseline while preserving feasibility alignment under conservative belief fusion. These results characterize the coordination cost of learning under censored feedback and show that near-centralized communication efficiency is achievable without continuous synchronization.
    Machine LearningMulti-armed banditsAgent-based and Multi-agent SystemsCoordination and cooperationAgent-based and Multi-agent SystemsMulti-agent learningMachine LearningOnline learningAgent-based and Multi-agent SystemsAgent communication
  637. #7150

    Differentiable Spectral Normalization for Large-Scale Ising Optimization

    Thinh Nguyen-Cong, Thang N. Dinh
    Spectral relaxation is widely used for large-scale combinatorial optimization due to its computational efficiency. Yet its effectiveness depends critically on the choice of graph normalization, a design decision typically made heuristically. Here, we show that normalization can be treated as a continuous optimization variable rather than a fixed preprocessing choice. Our method, Differentiable Spectral Normalization (DSN), parameterizes the spectral relaxation through a diagonal metric and maximizes the resulting lower bound via projected gradient ascent. Exact gradients are obtained through the Hellmann-Feynman theorem using only the principal eigenpair, maintaining linear complexity per iteration. On benchmark instances ranging from 10^3 to 8.4 x 10^6 nodes, DSN improves solution quality by 3-15% over static spectral methods. Its performance comes within 1-3% of state-of-the-art metaheuristics, such as simulated annealing, at up to 190x lower computational cost on large-scale instances. These results suggest that learning problem-specific relaxation geometry can substantially close the gap between spectral scalability and metaheuristic solution quality.
    Constraint Satisfaction and OptimizationConstraint optimization problemsConstraint Satisfaction and OptimizationMixed discrete and continuous optimizationMachine LearningMatrix/tensor methodsMachine LearningOptimizationSearchCombinatorial search and optimisation
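    The DSN abstract above relies on the Hellmann-Feynman theorem: for a symmetric matrix M(s) with a simple top eigenvalue λ and unit eigenvector v, ∂λ/∂s_i = vᵀ(∂M/∂s_i)v, so only the principal eigenpair is needed. The sketch below illustrates this on a toy diagonal scaling M = S W S with S = diag(s), where the formula reduces to 2·v_i·(W S v)_i. The scaling form, the dense pure-Python helpers, and all names are assumptions for illustration; this is not the paper's objective or implementation.

```python
import math


def matvec(m, v):
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def scaled(w, s):
    """Return M = S W S for S = diag(s) (assumed toy parameterization)."""
    n = len(w)
    return [[s[i] * w[i][j] * s[j] for j in range(n)] for i in range(n)]


def principal_eigenpair(m, iters=2000):
    """Power iteration; assumes a simple, dominant top eigenvalue."""
    n = len(m)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        u = matvec(m, v)
        norm = math.sqrt(sum(x * x for x in u))
        v = [x / norm for x in u]
    lam = sum(a * b for a, b in zip(v, matvec(m, v)))  # Rayleigh quotient
    return lam, v


def eigenvalue_gradient(w, s):
    """Hellmann-Feynman gradient of the top eigenvalue of S W S w.r.t. s."""
    lam, v = principal_eigenpair(scaled(w, s))
    wsv = matvec(w, [si * vi for si, vi in zip(s, v)])
    return lam, [2.0 * v[i] * wsv[i] for i in range(len(s))]


# Usage on a small symmetric matrix with positive entries.
W = [[2.0, 1.0, 0.5], [1.0, 3.0, 1.0], [0.5, 1.0, 1.0]]
s0 = [1.0, 0.8, 1.2]
lam0, grad0 = eigenvalue_gradient(W, s0)
```

    With a sparse W, each gradient evaluation costs a few sparse matrix-vector products once the eigenpair is known, which is in keeping with the per-iteration cost the abstract describes.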
  638. #7159

    HGOOD: Hypergraph-enhanced Graph Contrastive Learning for Graph Out-of-Distribution Detection

    Xuanting Fan, Chenyu Wang, Yueyue Gao, Wei Ju, Yifan Wang
    With the increasing application of graph learning advanced by deep learning, out-of-distribution (OOD) detection for graph-structured data has become an imperative challenge in the real world. Graph neural networks (GNNs) provide a promising solution for OOD detection. However, GNNs' core message-passing mechanism inherently relies on local neighborhood aggregation, and traditional graph structures only characterize pairwise node relations, failing to capture high-order associations and global patterns. To this end, we propose a Hypergraph-enhanced graph contrastive learning framework for Graph Out-Of-Distribution detection (termed HGOOD). Specifically, we construct two branches to mine graph semantics in a comprehensive manner. On the one hand, we employ a graph feature branch to encode neighborhood interactions via node-level and graph-level contrastive learning. On the other hand, we incorporate the hypergraph-global branch, which adaptively models graphs’ high-order global correlations. Building upon this, we introduce a cross-branch prototype contrast that aligns the captured graph patterns with their cross-branch clustering prototypes to enhance the semantic manifold of the in-distribution (ID) graph. Extensive experiments on benchmark datasets demonstrate that our HGOOD consistently outperforms prior methods. Our code is available at https://anonymous.4open.science/r/HGOOD-CFE2/.
    Data MiningAnomaly/outlier detectionData MiningMining graphsMachine LearningSelf-supervised Learning
  639. #7161

    Neurosymbolic Framework for Concept-Driven Logical Reasoning in Skeleton-Based Human Action Recognition

    Talha Ilyas, Deval Mehta, Zongyuan Ge
    Skeleton-based human activity recognition (HAR) has achieved strong empirical performance, yet most existing models remain black boxes and difficult to interpret. In this work, we introduce a neurosymbolic formulation of skeleton-based HAR that reframes action recognition as concept-driven first-order logical reasoning over motion primitives. Our framework bridges representation learning and symbolic inference by grounding first-order logic predicates in learnable spatial and temporal motion concepts. Specifically, we employ a standard spatio-temporal skeleton encoder to extract latent motion representations, which are then mapped to interpretable concept predicates via a spatio-temporal concept decoder that explicitly separates pose-centric and dynamics-centric abstractions. These concept predicates are composed through differentiable first-order logic layers, enabling the model to learn human-readable logical rules that govern action semantics. To impose semantic structure on the learned concepts, we align skeleton representations with LLM-derived descriptions of atomic motion primitives, establishing a shared conceptual space for perception and reasoning. Extensive experiments on NTU RGB+D 60/120 and NW-UCLA demonstrate that our approach achieves competitive recognition performance while providing explicit, interpretable explanations grounded in logical structure. Our results highlight neurosymbolic reasoning as an effective paradigm for interpretable spatio-temporal action understanding.
    Computer VisionAction and behavior recognitionComputer VisionInterpretability and transparencyComputer VisionTransparency, accountability, fairness and privacyComputer VisionVision, language and reasoning
  640. #7176

    Towards Comprehensive Post Safety Alignment of Large Language Models via Safety Patching

    Weixiang Zhao, Yulin Hu, Yang Deng, Jiahe Guo, Xingyu Sui, Yanyan Zhao, Bing Qin, Ting Liu
    Safety alignment of large language models (LLMs) has been gaining increasing attention. However, current safety-aligned LLMs suffer from fragile and imbalanced safety mechanisms: they can still be induced to generate unsafe responses, exhibit over-safety by rejecting safe user inputs, and fail to preserve general utility after safety alignment. To this end, we propose a novel post safety alignment (PSA) method to address these inherent and emerging safety challenges, including safety enhancement, over-safety mitigation, and utility preservation. Specifically, we introduce SafePatching, a novel framework for comprehensive PSA, where two distinct safety patches are developed on the harmful data to enhance safety and mitigate over-safety concerns, and then seamlessly integrated into the target LLM backbone without compromising its utility. Extensive experiments on four representative aligned LLMs, including LLaMA-2/3, Gemma and Mistral, show that SafePatching achieves a more comprehensive PSA than baseline methods, further optimizing the balance between being helpful and harmless in current aligned LLMs. Also, SafePatching demonstrates its superiority in continual PSA scenarios.
    AI Ethics, Trust, FairnessSafety and robustnessNatural Language ProcessingLanguage models
  641. #7181

    TabKD: Tabular Knowledge Distillation Through Interaction Diversity of Learned Feature Bins

    Shovon Niverd Pereira, Krishna Khadka, Yu Lei
    Data-free knowledge distillation enables model compression without original training data, critical for privacy-sensitive tabular domains. However, existing methods fail on tabular data because they ignore feature interactions, the fundamental way tabular models encode predictive knowledge. We identify interaction diversity, systematic coverage of feature combinations, as the essential requirement for effective tabular distillation. To operationalize this insight, we propose TabKD, which learns adaptive feature bins aligned with teacher decision boundaries, then generates synthetic queries that maximize pairwise interaction coverage. Across 4 benchmark datasets and 4 teacher architectures, TabKD achieves highest student-teacher agreement in 14 out of 16 configurations, outperforming 5 state-of-the-art baselines. We further show that interaction coverage strongly correlates with distillation quality, validating our core hypothesis. Our work establishes interaction-focused exploration as a principled framework for tabular model extraction.
    Machine LearningAdversarial machine learningMachine LearningKnowledge-aided learning
  642. #7198

    Deep Identification of Propagation Trees in Graph Diffusion

    Zeeshan Memon, Chen Ling, Ruochen Kong, Vishwanath Seshagiri, Andreas Züfle, Liang Zhao
    Understanding how information or influence propagates through a network, such as during an epidemic outbreak or the spread of misinformation, is a fundamental yet challenging problem. While prior works have focused on cascade prediction (forecasting future infected nodes), network inference (recovering latent global diffusion graphs), or source localization (identifying diffusion's origin), these approaches do not recover the actual "who-infected-whom" propagation tree for a specific diffusion instance. We introduce DIPT (Deep Identification of Propagation Trees), a probabilistic framework that infers propagation trees from final observed node diffusion states, without knowledge of the underlying diffusion mechanism. DIPT models local influence strengths between nodes and uses a discrete-continuous alternating optimization strategy to jointly learn the diffusion mechanism and infer the propagation structure. Empirical results across eight real-world datasets demonstrate that DIPT consistently outperforms existing approaches in reconstructing propagation trees.
    Machine LearningDeep learning architectures
  643. #7206

    Exploring the System 1 Thinking Capability of Large Reasoning Models

    Wenyuan Zhang, Shuaiyi Nie, Xinghua Zhang, Zefeng Zhang, Tingwen Liu
    This paper explores the system 1 thinking capability of Large Reasoning Models (LRMs), the intuitive ability to respond efficiently with minimal token usage. While existing LRMs rely on long-chain reasoning and excel at complex tasks, their system 1 thinking ability remains largely underexplored. This capability is essential as it reflects models' difficulty awareness and reasoning efficiency, both critical for real-world applications. We propose S1-Bench, a multi-domain, multilingual benchmark comprising model-simple system 1 questions. Our investigation of 28 LRMs reveals under-accuracy and inefficiency on system 1 problems. We find existing efficient reasoning methods either generalize poorly to simple questions or sacrifice performance for efficiency. Further exploration uncovers LRMs' early difficulty awareness accompanied by lower confidence, and shows that problem difficulty is implicitly encoded in hidden states. Code and the extended version of the paper with appendices are available at https://github.com/WYRipple/S1_Bench.
    Natural Language ProcessingOtherNatural Language ProcessingResources and evaluation
  644. #7214

    Learning with Semantic Priors: Stabilizing Point-Supervised Infrared Small Target Detection via Hierarchical Knowledge Distillation

    Yuanhang Yao, Ping Qian, Zhu Liu, Long Ma, Weimin Wang
    Single-frame Infrared Small Target Detection (ISTD) aims to localize weak targets under heavy background clutter, yet dense pixel-wise annotations are expensive. Point supervision with online label evolution reduces annotation cost; however, lightweight CNN detectors often lack sufficient semantics, leading to noisy pseudo-masks and unstable optimization. To address this, we propose a hierarchical VFM-driven knowledge distillation framework that uses a frozen Vision Foundation Model (VFM) during training. We formulate point-supervised learning as a bilevel optimization process: the inner loop adapts a VFM-embedded teacher on reweighted training samples, while the outer loop transfers validation-guided knowledge to a lightweight student to mitigate pseudo-label noise and training-set bias. We further introduce Semantic-Conditioned Affine Modulation (SCAM) to inject VFM semantics into CNN features at multiple layers. In addition, a dynamic collaborative learning strategy with cluster-level sample reweighting enhances robustness to imperfect pseudo-masks. Experiments on diverse challenging cases across multiple ISTD backbones demonstrate consistent improvements in detection accuracy and training stability. Our code is available at https://github.com/yuanhang-yao/semantic-prior.
    Computer VisionSegmentation, grouping and shape analysisComputer VisionTransfer, low-shot, semi- and un- supervised learning
  645. #7222

    BetaEdit: Null-Space Constrained Sequential Model Editing

    Bingqing Liu, Wei Liu, Yuhua Li
    Null-space-based methods have garnered considerable attention in model editing by constraining updates to the null space of the pre-existing knowledge representation, thereby preserving the model's original behavior. However, in practice these methods rely on an approximate null space—leading to knowledge leakage—and further suffer from severe performance degradation during sequential editing. Recent work shows that history-aware editing strategies can empirically mitigate this decline, yet the underlying reason remains unclear. In this paper, we first expose the knowledge leakage inherent in existing null-space approaches and then analyze why history-aware updates effectively preserve both editing performance and general capabilities during long-horizon editing. Building on these insights, we propose BetaEdit, a refined framework that effectively controls the knowledge leakage and integrates history-aware updates into the null-space paradigm. Extensive experiments on three large language models across two standard benchmarks show that BetaEdit consistently outperforms prior methods in the challenging regime of massive-scale sequential editing. Code is available at: https://github.com/lbq8942/BetaEdit.
    Machine LearningIncremental learningNatural Language ProcessingInterpretability and analysis of models for NLPNatural Language ProcessingLanguage models
  646. #7245

    Automated Safety Verification of Posterior Distributions of Probabilistic Programs

    Kazuki Watanabe, Hiroshi Unno
    Ensuring the safety of probabilistic systems is a central challenge in formal verification.
    We propose an automated refinement technique for verifying a safety property of posterior distributions in probabilistic programs.
    Our approach builds on the counterexample-guided abstraction refinement (CEGAR) framework and exploits the duality in convex optimisation and the adequacy of predicate-transformer semantics in probabilistic settings.
    We implement the technique and evaluate its effectiveness through preliminary experiments.
    AI Ethics, Trust, FairnessSafety and robustnessConstraint Satisfaction and OptimizationConstraint satisfactionConstraint Satisfaction and OptimizationSatisfiabilityConstraint Satisfaction and OptimizationSolvers and toolsKnowledge Representation and ReasoningAutomated reasoning and theorem proving
  647. #7256

    Adversarial Attack Framework Against Vision-Language Model Unlearning

    Yimin Liu, Peng Jiang, Yajie Wang
    Large Vision-Language Models (VLMs) unlearning tends to eliminate the influence of "to-be-forgotten" content in the training corpora, algorithmically by suppressing the likelihood of faithfully generating responses on forget-target inputs. The injection of adversarial inputs can manipulate the unlearned VLM's generation towards the attacker’s will, forcing the reproduction of the supposedly forgotten content and undermining the reliability of expected forgetting behavior. However, most attacks assume access to the unlearned VLM’s architecture or parameters, or to output logits via queries. In this paper, we propose SISA, a novel attack framework for crafting adversarial inputs to manipulate generation towards the forgotten target, which only requires access to a surrogate, pre-trained VLM. SISA advances prior attacks by exploiting the persistence of visual sink tokens after unlearning as a stable structural anchor for semantic alignment. SISA induces a sink regime on a candidate visual token to build the structural anchor that influences generation, and then semantically aligns model output to the target while conditioning on attention through the induced sink token, reinforcing the anchor for desired elicitation. With sink persistence and sink-conditioned semantic anchoring, SISA crafts transferable adversarial inputs. Evaluation on diverse unlearned VLM settings confirms the effectiveness of SISA, increasing the outputs’ semantic agreement with forgotten targets by up to 6.9X relative to clean inputs.
    Machine LearningMulti-modal learningMultidisciplinary Topics and ApplicationsSecurity and privacy
  648. #7295

    Continuous Multi-Attribute Recognition for Agent Behaviour Understanding

    Sheryl Mantik, Michael Dann, Huong Ha, Minyi Li, Julie Porteous
    Understanding underlying agent attributes such as goals, preferences, beliefs, and ability level is key to explaining decision-making and predicting behaviour in complex environments. While goal recognition (GR) has produced effective methods for inferring goals, recent work has begun to explore reinforcement learning (RL)-based approaches to attribute recognition (AR), which generalises GR by inferring a wider range of agent attributes beyond goals. Existing frameworks treat attributes as discrete variables and typically require separate models for each attribute. In reality, many agent attributes vary continuously rather than as discrete values, making continuous representations crucial for capturing subtle variations in behaviour. We introduce a multivariate RL-based AR framework that jointly infers multiple continuous-valued attributes from observed behaviour using a single attribute-conditioned policy and continuous probabilistic inference. Our experiments demonstrate that the proposed framework achieves stable and fine-grained inference, offering improved flexibility and generality compared to existing approaches.
    Agent-based and Multi-agent SystemsAgent theories and modelsKnowledge Representation and ReasoningReasoning about actionsMachine LearningReinforcement learningPlanning and SchedulingActivity and plan recognition
  649. #7309

    Learning Counterfactual Fairness from Authentic Generation

    Zichong Wang, Zhipeng Yin, Zhong Chen, Jack Yang, Jun Liu, Wenbin Zhang
    Fairness-aware graph learning has become increasingly important amid growing concerns about algorithmic bias in networked data. Among existing approaches, counterfactual fairness is particularly appealing as it seeks to eliminate unfairness at its causal origin by ensuring that predictions remain invariant in counterfactual worlds where sensitive attributes are altered. However, most existing methods assume that all observed variables are directly influenced by sensitive attributes, an overly strong and often unrealistic assumption in real-world graphs. To address this limitation, we propose Graph Counterfactual Fairness (GCFair), a novel framework that achieves counterfactual fairness by explicitly identifying and disentangling the subsets of node features and graph structures genuinely affected by sensitive attributes. This principled joint disentanglement enables the generation of authentic counterfactual instances that selectively modify only sensitive-related information while preserving all sensitive-irrelevant factors. Extensive experiments show that GCFair effectively mitigates bias and outperforms state-of-the-art fairness methods in both counterfactual fairness and predictive accuracy.
    AI Ethics, Trust, FairnessBiasAI Ethics, Trust, FairnessFairness and diversityAI Ethics, Trust, FairnessTrustworthy AI
  650. #7330

    Invariant Graph Representations for Continuous-Time Dynamic Graphs Under Distribution Shifts

    Lanting Fang, Yulian Yang, Yawei Zhang, Shanshan Feng, Kaiyu Feng, Hanning Yuan
    Continuous-Time Dynamic Graphs (CTDGs) enable fine-grained modeling of evolving relational systems. However, most existing CTDG representation learning methods are tailored to in-distribution settings and exhibit limited robustness under out-of-distribution (OOD) shifts. Although recent causal approaches learn invariant representations via interventions, they are primarily designed for static or discrete-time graphs and become computationally prohibitive for CTDGs due to the combinatorial explosion of structural and temporal variations.
    To address these challenges, we propose CIR, a framework grounded in a novel structural causal model termed the ICCM. To avoid exhaustive interventions, we leverage the Normalized Weighted Geometric Mean (NWGM) to efficiently approximate interventional predictions. We further instantiate ICCM within a practical deep learning architecture that jointly captures invariant structural and temporal patterns through dedicated subgraph extractors, and maintains an environment memory bank to model distributional shifts across evolving contexts. Extensive experiments demonstrate that CIR consistently outperforms existing methods under diverse OOD scenarios.
    Data Mining: Mining graphs · Knowledge Representation and Reasoning: Learning and reasoning · Machine Learning: Representation learning
  651. #7336

    QiMeng-EvoPartition: Rethinking the Impact of Partitioning for Automated Pipeline Design

    Qicheng Wang, Rui Zhang, Shuyao Cheng, Chongxiao Li, Pengwei Jin, Zidong Du, Xing Hu, Qi Guo, Yunji Chen
    As a key technique for improving throughput by increasing clock frequency and reducing Cycles per Instruction (CPI), pipeline design increasingly relies on automated methods with the growing scale of modern circuits. However, existing automated pipelining methods often decouple partitioning from CPI optimization, implicitly assuming that partitioning affects only clock frequency. We observe that different partitioning strategies significantly alter the pipeline stage differences between logic gates, thereby impacting the average number of stall cycles and ultimately affecting CPI. Thus, automated pipeline partitioning should jointly reduce CPI and improve clock frequency while ensuring functional correctness, resulting in a multi-objective optimization problem. To address this issue, we propose EvoPartition, an evolutionary partitioning framework for automated pipeline design. Specifically, EvoPartition first inserts virtual nodes to guarantee functional correctness. Then, through iterative evolution for reducing CPI and improving clock frequency, pipeline stage assignments of all nodes are determined. Experimental results on ISCAS85 and EPFL show that EvoPartition achieves a 7.34% CPI optimization by reducing stage differences by 16.23%, and further improves throughput by 7.22% compared to the state-of-the-art.
    Machine Learning: Applications
  652. #7357

    Emo-LiPO: Listwise Preference Optimization for Fine-Grained Emotion Intensity Control in LLM-based Text-to-Speech

    Yihang Lin, Li Zhou, Congwei Cao, Dongchu Xie, Xiaoxue Gao, Chen Zhang, Haizhou Li
    Large language model (LLM)-based text-to-speech (TTS) systems enable prompt-conditioned emotional control but struggle with fine-grained emotion intensity due to the semantic-acoustic gap between text and speech. To address this challenge, we formulate emotion intensity control in LLM-based TTS as a learning-to-rank problem and propose Emo-LiPO, a listwise preference optimization framework that aligns prompt-conditioned speech generation with relative emotion intensity expressed in text. Emo-LiPO explicitly models global intensity ordering within each emotion under fixed transcripts, enabling more faithful and continuous emotional expression. We further construct ESD-plus, a multi-speaker dataset with explicit emotion intensity variations, to support fine-grained emotion modeling and evaluation. Experiments on ESD-plus demonstrate that Emo-LiPO significantly improves emotion accuracy and intensity controllability over both supervised- and DPO-based LLM TTS baselines, with particularly pronounced gains at high intensity levels.
    Natural Language Processing: Speech
  653. #7375

    Disentangled Knowledge Forgetting in Machine Unlearning

    Yuhang Xia, Cheng Zhen, Yirui Wu, Lixin Yuan, Wenxiao Zhang, Jun Liu
    With the increasing demand for privacy protection, Machine Unlearning (MU) has emerged as a way to remove private data from an already trained model without retraining from scratch. Most current works suffer from either over-unlearning (low fidelity) or incomplete unlearning (low effectiveness). To identify the underlying issues, we conduct a causal analysis and obtain a resolvable route, i.e., disentangling shared knowledge into attribute-level semantics to remove it as the confounder. We further perform an MU loss analysis to reformulate the loss as a balanced form of constraints, thus guaranteeing high fidelity and effectiveness. Based on this theoretical analysis, we propose disentangled knowledge forgetting constrained by the reformulated MU loss, which disentangles knowledge with a variational auto-encoder and refines knowledge with counterfactual inference. Extensive experimental results demonstrate that our method achieves state-of-the-art performance.
    Computer Vision: Transparency, accountability, fairness and privacy
  654. #7376

    ManiSplat: Manipulation Trajectory Synthesis from Monocular Video via Decoupled 3D Gaussian Splatting

    Wenhao Hu, Haonan Zhou, Liu Liu, Yun Du, Xinjie Wang, Ziang Li, Zhizhong Su, Gaoang Wang
    Reconstructing dynamic and interactive 3D scenes from real-world observations remains a fundamental challenge in computer vision and robotics. While recent advances in 3D Gaussian Splatting have enabled high-fidelity static reconstruction, extending it to interactive environments with articulated robots and manipulable objects remains difficult due to complex contact interactions and abrupt pose changes. To address these challenges, we introduce ManiSplat, a unified framework that reconstructs controllable and decoupled Gaussian digital twins directly from monocular ego-view robotic videos. Our method introduces a Graph-Structured Disentangled Representation that separates the robot, objects, and background into independently optimizable Gaussian subfields organized within a scene graph. To ensure stability, we propose a Task-Oriented Spatio-Temporal Alignment module that leverages the inherent logic of manipulation tasks (alternating between Motion and Skill phases) to construct accurate pseudo-ground-truth trajectories. Finally, a joint photometric-geometric optimization ensures the reconstructed scenes are temporally coherent, physically consistent, and simulation-ready. Extensive experiments demonstrate that our approach reconstructs interaction-driven dynamic scenes with high fidelity and controllability, effectively supporting downstream robotic tasks and policy learning. The project page is available at https://whhu7.github.io/ManiSplat/.
    Computer Vision: 3D computer vision
  655. #7382

    Automated Random Embedding for Practical Bayesian Optimization with Unknown Effective Dimension

    Hong Qian, Xiang Shu, Xiang Xia, Xuhui Liu, Yangde Fu, Bei Liang, Huibin Wang, Liang Dou
    Bayesian optimization is widely employed for optimizing complex black-box functions but struggles with the curse of dimensionality. Random embedding, as a dimension reduction strategy, simplifies tasks that possess a low effective dimension by optimizing within a low-dimensional subspace. However, determining the effective dimension of a task in advance remains a significant challenge, which influences the selection of the subspace dimensionality and the optimization performance. Traditional methods use fixed subspace dimensions provided by experts or rely on resource-consuming trial and error to estimate subspace dimensions. To this end, this paper proposes an automated random embedding method for high-dimensional Bayesian optimization with unknown effective dimension, called Dynamic Shared Embedding Bayesian Optimization (DSEBO). DSEBO starts with a low dimension and switches to a higher-dimensional subspace if the solutions in the current subspace show preliminary convergence. DSEBO dynamically determines the dimension of the next subspace based on the quality of the solutions in different subspaces and shares the queried solutions with the new subspace for a better initialization. Theoretically, we derive a regret bound for DSEBO and demonstrate that DSEBO can better balance approximation and optimization errors. Extensive experiments on functions with dimensionality of varying magnitudes and real-world tasks with unknown effective dimensions reveal that, compared with state-of-the-art methods, alternating optimization across different subspaces results in significant improvements in high-dimensional optimization, both in terms of optimization regret and time.
    Machine Learning: Feature extraction, selection and dimensionality reduction · Machine Learning: Optimization
  656. #7387

    Breaking the Reasoning Horizon in Entity Alignment Foundation Models

    Yuanning Cui, Zequn Sun, Wei Hu, Kexuan Xin, Zhangjie Fu
    Entity alignment (EA) is critical for knowledge graph (KG) fusion. Existing EA models lack transferability and are incapable of aligning unseen KGs without retraining. While using graph foundation models (GFMs) offers a solution, we find that directly adapting GFMs to EA remains largely ineffective. This stems from a critical "reasoning horizon gap": unlike link prediction in GFMs, EA necessitates capturing long-range dependencies across sparse and heterogeneous KG structures. To address this challenge, we propose an EA foundation model driven by a parallel encoding strategy. We utilize seed EA pairs as local anchors to guide the information flow, initializing and encoding two parallel streams simultaneously. This facilitates anchor-conditioned message passing and significantly shortens the inference trajectory by leveraging local structural proximity instead of global search. Additionally, we incorporate a merged relation graph to model global dependencies and a learnable interaction module for precise matching. Extensive experiments verify the effectiveness of our framework, highlighting its strong generalizability to unseen KGs.
    Data Mining: Knowledge graphs and knowledge base completion
  657. #7398

    Cross-Relational Preference Learning for Better LLM Instruction Following

    Runsheng Li, Kai Sun, Bin Shi, Bo Dong
    Large Language Models (LLMs) still exhibit limited capability in following complex instructions. While existing approaches often rely on preference learning to enhance this ability, they typically overlook the relationships between the permissible response spaces of different instructions, which restricts a model's ability to align with subtle and diverse constraint variations. To address this, we propose Cross-Relational Preference Learning (CRPL), a novel framework for constructing preference data that explicitly models inter-instruction relationships through two key techniques: Cross-Relationship Perturbation and Cross-Region Pair Sampling. This enables the generation of more diverse preference data that captures a wide spectrum of constraint variations. Additionally, we introduce an atomic constraint-based verification mechanism to rigorously assess response satisfaction, ensuring high-quality preference pair construction. Extensive experiments across multiple preference learning methods (e.g., DPO, KTO), LLM backbones and four instruction-following benchmarks demonstrate that our approach achieves substantial improvements over prior baselines and exhibits strong generalization.
    Knowledge Representation and Reasoning: Preference modelling and preference-based reasoning · Natural Language Processing: Language generation · Natural Language Processing: Language models
  658. #7424

    Endo-GSG: Endoscopic Gaussian Splatting with Geometry-Awareness for Dynamic Tissue Reconstruction via Single-View Monocular Knowledge

    Chao He, Kuangji Chen, Bruce X.B. Yu, Bo Lu
    Dynamic 3D reconstruction of surgical scenes plays a critical role in robotic-assisted surgery. Gaussian Splatting (GS), while effective for novel view synthesis, struggles to recover an accurate surface from a monocular view due to the implicit multi-Gaussian representation of the surface. Specifically, (1) the vertical overlap of Gaussians leads to floating artifacts, and (2) the random orientation of Gaussians affects the smoothness of the reconstructed surface. Consequently, these fragmented and misaligned Gaussians hinder downstream applications, e.g., geometry-aware endoscopic navigation and physics-integrated tissue mechanics simulation. To this end, we propose Endo-GSG, a unified framework that couples dynamic Gaussian splatting with an SDF field for dynamic surface-aware tissue reconstruction. To further enhance geometric fidelity, we design geometry-informed regularization losses that constrain Gaussian density and spatial positioning. The entire pipeline is jointly supervised by RGB images and predicted depth maps, enabling high-quality reconstruction and rendering even with sparse or monocular input. Experiments on public datasets demonstrate that Endo-GSG outperforms state-of-the-art methods in both rendering quality and geometric surface accuracy.
    Computer Vision: 3D computer vision · Computer Vision: Biomedical image analysis · Computer Vision: Machine learning for vision
  659. #7438

    Causal Manifold Transport for Identifiable Causal Generation in Diffusion Models

    Junghyo Sohn, Wootaek Jeong, Sujeong Song, Jee Seok Yoon, Heung-Il Suk
    Identifying meaningful latent representations within diffusion models remains a challenging problem for causal approaches. We propose Causal Manifold Transport Diffusion Model (CMT-Diff), a framework that operationalizes causal actions as geometric transformations. By adopting the perspective of backtracking counterfactuals, we formulate the generative process as a composite diffeomorphism that couples the Probability Flow ODE with a Continuous Normalizing Flow. This mapping constructs an exogenous manifold where causal factors align with coordinate variations. Within this geometry, we derive Causal Manifold Transport (CMT) to realize interventions as linear vector translations along factor-aligned directions. We establish theoretical identifiability guarantees and demonstrate that our approach facilitates controllable generation by capturing the underlying causal manifold.
    Machine Learning: Causality · Machine Learning: Generative models · Machine Learning: Geometric learning
  660. #7452

    Trust, but Verify: Uncertainty-Driven Evidential Multimodal Representation Learning

    Yupeng Han, Kai Zhang, Xianquan Wang, Zhihong Pan, Ze Liu, Jiyuan He
    Effective multimodal learning in real-world scenarios depends on a nuanced treatment of uncertainty, which arises at three levels: (1) Intrinsic Uncertainty from modality-specific noise or ambiguity; (2) Relational Uncertainty due to cross-modal conflicts or redundancy; and (3) Aggregated Uncertainty when fusing potentially inconsistent signals. Most existing methods overlook this hierarchy, applying a single uncertainty model. We propose Adaptive Evidential Multimodal Representation Learning (AEMRL), a framework aligned with this multi-level view. To address intrinsic uncertainty, Disentangled Evidential Uncertainty Encoding (D-EUE) provides interpretable, class-aware reliability scores per modality. For relational uncertainty, Uncertainty-Conditioned Dynamic Factorization (UDF) uses a hypernetwork to dynamically extract complementary cues and suppress conflict. To resolve aggregated uncertainty, Adaptive Fusion & Conflict-Aware Calibration (AFCAC) adaptively weights evidence streams and calibrates final predictions based on detected conflicts. Extensive experiments on five diverse benchmark datasets show that AEMRL consistently enhances task accuracy, reduces calibration error, and improves robustness to noise, semantic conflict, and missing modalities.
    Computer Vision: Multimodal learning · Machine Learning: Multi-modal learning
  661. #7457

    Competitive Connected Multi-robot Exploration of Unknown Graphs

    Dolev Mutzari, Yonatan Aumann, Sarit Kraus
    Multi-robot graph exploration is a central problem in robotics, planning, and multi-agent systems. In this work, we consider the problem of exploring an unknown $n$-node graph by $k$ robots that must remain connected throughout the process. Such connectivity is frequently required for safety reasons, and naturally arises in real-world applications such as search-and-rescue and maintenance operations. We study the overhead imposed by not knowing the graph in advance, measured in terms of the competitive ratio of the number of exploration rounds necessary when the graph is unknown (versus the case that it is known).

    We introduce a novel exploration procedure, DFS-BGS, to tackle the problem, and analyze its performance both theoretically and experimentally. On the theoretical end, DFS-BGS provably achieves a competitive ratio of $\tilde{\mathcal{O}}(k^{1/3})$ for the case $n\leq k$. Empirically, we compare our online DFS-BGS to COCTA~\cite{sinay2017maintaining}, the SOTA algorithm for trees that are known in advance. Examining the performance of the algorithms on real-world hotel floor plans as well as random graphs over a wide range of parameters, DFS-BGS incurs only a small slowdown, even with hundreds of robots and thousands of nodes.
    Agent-based and Multi-agent Systems: Coordination and cooperation
  662. #7472

    Approximate EFX Under Limited Agent Heterogeneity

    Vishwa Prakash HV, Ruta Mehta, Prajakta Nimbhorkar
    We study EFX and approximate EFX allocations of indivisible goods among agents with at most k distinct valuations (i.e., k types of agents).

    For exact EFX, it is known that an EFX allocation always exists when there are at most three types of agents. For approximate EFX, the best known guarantee is a 0.618-EFX allocation, and this has been improved recently to a φ-EFX allocation when the number of agents is at most seven. We settle another natural case in this landscape by showing that a φ-EFX allocation exists for any number of agents whenever there are at most four distinct valuations.

    We then consider a relaxation, EFX with charity, where some goods may remain unallocated and no agent envies the set of unallocated goods. It is known that, for n agents and any ε ∈ (0, 1/2], there exists an EFXε allocation with at most Õ((n/ε)^(1/2)) goods allocated to charity. We prove that when there are at most k distinct valuation types, there exists an EFXε allocation with only Õ((k/ε)^(1/2)) goods allocated to charity; in particular, the required amount of charity is sublinear in the number of distinct valuations rather than in the number of agents.

    We show that any EFX guarantee for k types of agents can be extended, with an additional (1 − 2ε) approximation factor, to a more general setting where the valuations can be partitioned into k clusters such that the maximum heterogeneity of valuations within each cluster is at most ε.
    Agent-based and Multi-agent Systems: Resource allocation · Game Theory and Economic Paradigms: Fair division
  663. #7473

    Charging Station Placement for Anonymous Mobile Agents: A Parameterized Complexity Perspective

    Arun Kumar Das, Tesshu Hanaka, Nikolaos Melissinos, Hirotaka Ono
    We study the problem of optimally placing charging stations for a set of k anonymous mobile agents, each of which must reach a distinct terminal. The agents are identical and can travel only up to a given distance r on a single charge.
    The objective is to find a placement of charging stations so that every terminal can be assigned a unique agent capable of reaching it, and the number of charging stations is minimized. The problem is known to be NP-hard in general and solvable in polynomial time only on paths and cycles.

    In this work, we comprehensively investigate the classical and parameterized complexity of the problem. We first design a polynomial-time algorithm that, given a set of candidate charging stations, decides whether a feasible assignment of agents to terminals exists. This result also establishes the membership of the problem in NP, which was not previously known. We then analyze the parameterized complexity of the problem. Our results show that the problem remains hard even when we consider different combinations of natural parameters and structural parameters of the given network, such as the distance r plus the tree-depth, or the number of charging stations plus feedback vertex set number. On the positive side, we present three fixed-parameter tractable (FPT) algorithms: one parameterized by the number of agents k, a second by modular-width, and a third by vertex cover number. Finally, we give a polynomial-time k-approximation algorithm, as well as a polynomial-time algorithm for trees, by solving a generalized version of the problem that allows for some charging stations to be pre-placed.
    Agent-based and Multi-agent Systems: Multi-agent planning · Agent-based and Multi-agent Systems: Resource allocation · Planning and Scheduling: Routing · Planning and Scheduling: Theoretical foundations of planning · Search: Combinatorial search and optimisation
  664. #7506

    Exploring Internal Emphasis for Robust Noisy RAG

    Qifeng Lai, Zhiguo Gong, Usman Naseem, Wei Wang
    Retrieval‑augmented generation (RAG) improves knowledge‑intensive QA tasks by incorporating external evidence, yet retrieval remains imperfect and often returns irrelevant or misleading passages. We refer to this as Noisy RAG, where LLMs suffer from inconsistent relevance signals due to the fact that relevance identification is carried out by only a small subset of internal modules (e.g., retrieval heads) and can easily be confounded by other modules.
    Recent work attempts to enhance the denoising ability of the LLM through external emphasis, prompting the LLM to highlight helpful passages as the emphasis before answering. However, these textual identifications themselves depend on the same inconsistent signals and thus become less reliable.
    Our key insight is that, because relevance signals reside in only a few modules, emphasis should be applied directly within these modules rather than only at the text level. We therefore propose selective internal emphasis to amplify the relevance signals, implemented via a lightweight, plug‑and‑play signal amplifier that operates inside the LLM. The amplifier performs token‑ and channel‑level selective emphasis during a standard RAG fine‑tuning pipeline, with the base LLM frozen.
    Across four QA benchmarks and three LLM scales, our method consistently improves accuracy and robustness. Furthermore, generalization and interpretability analyses show that the amplifier captures retrieval‑related patterns rather than only dataset‑specific patterns, enabling more reliable passage-denoising.
    Natural Language Processing: Applications · Natural Language Processing: Information retrieval and text mining · Natural Language Processing: Interpretability and analysis of models for NLP · Natural Language Processing: Question answering
  665. #7512

    Beyond Sequences: A Dynamic Hierarchical Heterogeneous Spatio-Temporal Graph for Irregular Multivariate Time Series Forecasting

    Xiaowei Yan, Zhuo Li, Junjie Zhang, Bing Li, Jun Yan, Buzhou Tang
    Irregular Multivariate Time Series (IMTS) analysis is a challenging task, as asynchronous irregular sampling disrupts intra-variable temporal consistency and cross-variable alignment. Most existing methods model multivariate correlations statically, at either the variable level or the observation level, and therefore inevitably suffer from correlation loss or cross-variable misalignment. In this paper, we propose DyH2-STGraph, a Dynamic Hierarchical Heterogeneous Spatio-Temporal Graph, for IMTS forecasting. Within the graph, irregular observations are represented as nodes with spatio-temporal coordinates and observed values, connected dynamically with neighboring observation nodes by spatio-temporal neighbor selection, while variables are treated as hyper nodes, connected with their constituent observation nodes as well as with other variable nodes. The multivariate correlations are divided into three categories, each captured in its own way: (1) intra-variable coarse-to-fine correlations between variable nodes and their constituent observation nodes, captured by hierarchical message propagation; (2) inter-variable fine-grained correlations among neighboring observation nodes, captured by spatio-temporal attention; and (3) inter-variable coarse-grained correlations among variable nodes, captured by attention. Extensive experiments on four benchmark datasets demonstrate that DyH2-STGraph significantly outperforms state-of-the-art methods while maintaining competitive efficiency.
    Machine Learning: Sequence and graph learning · Machine Learning: Time series and data streams
  666. #7530

    Focus Like a Human: Efficient GUI Grounding via Coarse-to-Fine Visual Attention and Parallel Verification

    Zhenhua Yang, Xiachong Feng, Weihong Zhong, Lunjun Liu, Xiaocheng Feng, Bing Qin
    Building upon powerful Large Visual Language Models, recent GUI agents have revolutionized autonomous GUI interaction.
    Given the high information density and structural complexity of GUI layouts, a critical challenge lies in accurately identifying where to focus, i.e., precise GUI grounding.
    To ensure grounding accuracy, prior approaches predominantly rely on processing inputs at native high resolutions. However, this paradigm inevitably introduces excessive visual redundancy, resulting in increased computational cost and dispersed attention.
    In our preliminary analysis, we observe that while high-resolution inputs are indispensable for recognizing fine-grained visual details, they paradoxically impair the model’s ability to attend to the correct regions.
    In contrast, low-resolution inputs, which suppress high-frequency details while preserving global layout structure through downsampling, produce significantly more robust and coherent attention patterns that better guide the model toward relevant regions.
    Interestingly, this observation mirrors the human visual system, which first leverages low-resolution peripheral vision to identify salient areas before conducting a detailed examination via high-resolution foveal focus.
    Motivated by this insight, we propose a coarse-to-fine GUI grounding framework termed FastFocus. Our approach first exploits the stability of low-resolution attention to efficiently generate candidate grounding regions. These candidates are then evaluated by a Parallel Verification module, which selectively zooms into high-resolution views to resolve fine details and filter out false positives.
    Experiments on ScreenSpot-Pro demonstrate state-of-the-art performance, validating that a hierarchical integration of low-resolution guidance and high-resolution verification constitutes an effective and robust paradigm for GUI grounding.
    Agent-based and Multi-agent Systems: Human-agent interaction · Computer Vision: Vision, language and reasoning · Natural Language Processing: Dialogue and interactive systems
  667. #7533

    Transformer-based Convolutional Codes Decoder

    Guangya Hu, Tianyi Li, Chaoping Xing
    As a fundamental component of the physical layer, convolutional codes are instrumental in ensuring dependable transmission of information over inherently noisy communication channels. Recently, neural network-based decoding algorithms have shown remarkable progress in enhancing the performance of block error-correcting codes. However, current neural network decoders for convolutional codes often exhibit strong performance primarily on low-rate codewords, whereas maximum likelihood decoding algorithms incur substantial time overhead when decoding high-rate codewords, which impacts communication efficiency in practical communication systems. In this work, we propose a Transformer-based decoder for convolutional codes designed for soft-decision decoding over AWGN and fading channels, which serves as an efficient decoding scheme for convolutional codes with various code rates. To exploit the unique structure of convolutional codes, we propose a Bidirectional Context Aggregation mechanism that enables the model to fully capture the long-range dependencies and the memory effect of convolutional structures. Additionally, we extend the Transformer attention matrix with a codeword-specific feature integration mechanism to allow the model to capture implicit relationships within codewords. Extensive experiments demonstrate that the proposed model outperforms other neural network decoders in accuracy. Compared with maximum likelihood decoding algorithms, our model achieves comparable error-correction capability and significantly faster decoding speed.
    Machine Learning: Applications
  668. #7536

    From Values to Tokens: An LLM-Driven Framework for Context-Aware Time Series Forecasting via Symbolic Discretization

    Xiaoyu Tao, Shilong Zhang, Mingyue Cheng, Daoyu Wang, Tingyue Pan, Bokai Pan, Changqing Zhang, Shijin Wang
    Time series forecasting plays a vital role in supporting decision-making across a wide range of critical applications, including energy, healthcare, and finance. Despite recent advances, forecasting accuracy remains limited due to the challenge of integrating historical numerical sequences with contextual features, which often comprise unstructured textual data. To address this challenge, we propose TokenCast, a large language model (LLM) driven framework that leverages language-based symbolic representations as a unified intermediary for context-aware time series forecasting. Specifically, TokenCast employs a discrete tokenizer to transform continuous numerical sequences into temporal tokens, enabling structural alignment with language-based inputs. To effectively bridge the semantic gap between modalities, both temporal and contextual tokens are embedded into a shared representation space via a pre-trained LLM, further optimized with generative objectives. Building upon this unified semantic space, the aligned LLM is subsequently fine-tuned in a supervised manner to predict future temporal tokens, which are then decoded back into the original numerical space. Extensive experiments on real-world datasets demonstrate the effectiveness of our framework and highlight its potential as a generative framework for context-aware time series forecasting. The code is available at https://github.com/Xiaoyu-Tao/TokenCast.
    Data Mining: Mining spatial and/or temporal data
  669. #7545

    ScholarSum: Student-Teacher Abstractive Summarization via Knowledge Graph Reasoning and Reflective Refinement

    Bohou Zhang, Xiaoyu Tao, Mingyue Cheng, Huijie Liu, Qi Liu
    Abstractive summarization plays a crucial role in enabling efficient understanding of scientific literature, yet it inherently demands both linguistic fluency and factual faithfulness. Existing approaches often fail to reconcile these two requirements. Extractive methods rely on rigid sentence splicing that disrupts macro-level logical coherence, while large language model (LLM)-based generative approaches, despite mastering linguistic fluency, exhibit limited factual consistency.
    In this work, we propose ScholarSum, a hierarchical reflective graph-based framework that emulates a student–teacher writing process for fluent and faithful scientific summarization. ScholarSum first organizes the document into a hierarchical knowledge graph by segmenting it into semantically coherent units, whose multi-layered community structure captures global logic and macro-level themes. Guided by this global structure, the student generates an initial draft, which is subsequently refined through fine-grained evidence retrieval.
    To ensure factual consistency, a teacher-like reviewer then iteratively examines the draft, identifies unsupported content, and prompts targeted re-retrieval and rewriting until the summary meets rigorous quality standards.
    Extensive experiments demonstrate that ScholarSum significantly outperforms previous baselines in terms of both completeness and faithfulness.
    Our code is available at https://github.com/Xiaoyu-Tao/ScholarSum.
    Natural Language Processing: Information retrieval and text mining
  670. #7547

    Property Enhanced Instruction Tuning for Multi-Task Molecule Generation with Large Language Models

    Xuan Lin, Long Chen, Yile Wang, Yangyang Chen, Xiangxiang Zeng
    Large language models (LLMs) are widely applied in various natural language processing tasks such as question answering and machine translation. However, due to the lack of labeled data and the difficulty of manual annotation for biochemical properties, their performance on molecule generation tasks is still limited, especially for tasks involving multi-property constraints. In this work, we present PEIT (Property Enhanced Instruction Tuning), a two-step framework to improve LLMs for molecule-related tasks. In the first step, we use textual descriptions, SMILES, and biochemical properties as multimodal inputs to pre-train a model called PEIT-GEN, which aligns multimodal representations to synthesize instruction data. In the second step, we fine-tune existing open-source LLMs with the synthesized data; the resulting PEIT-LLM can handle molecule captioning, text-based molecule generation, molecular property prediction, and our newly proposed multi-constraint molecule generation tasks. Experimental results show that our pre-trained PEIT-GEN outperforms MolT5, BioT5, MolCA and Text+Chem-T5 in molecule captioning, demonstrating that the modalities of textual descriptions, structures, and biochemical properties are well aligned. Furthermore, PEIT-LLM shows promising improvements in multi-task molecule generation, demonstrating the effectiveness of the PEIT framework for various molecular tasks. The code and appendix are available at https://github.com/chenlong164/PEIT.
    Data MiningApplicationsMultidisciplinary Topics and ApplicationsBioinformaticsNatural Language ProcessingApplications
  671. #7548

    CodeDelegator: Mitigating Context Pollution via Role Separation in Code-as-Action Agents

    Tianxiang Fei, Cheng Chen, Yue Pan, Mao Zheng, Mingyang Song
    Recent advances in large language models (LLMs) allow agents to represent actions as executable code, offering greater expressivity than traditional tool-calling. However, real-world tasks often demand both strategic planning and detailed implementation. Using a single agent for both leads to context pollution from debugging traces and intermediate failures, impairing long-horizon performance. We propose CodeDelegator, a multi-agent framework that separates planning from implementation via role specialization. A persistent Delegator maintains strategic oversight by decomposing tasks, writing specifications, and monitoring progress without executing code. For each sub-task, a new Coder agent is instantiated with a clean context containing only its specification, shielding it from prior failures. To coordinate between agents, we introduce Ephemeral-Persistent State Separation (EPSS), which isolates each Coder's execution state while preserving global coherence, preventing debugging traces from polluting the Delegator's context. Experiments on various benchmarks demonstrate the effectiveness of CodeDelegator across diverse scenarios.
    Agent-based and Multi-agent SystemsApplicationsAgent-based and Multi-agent SystemsMulti-agent planning
  672. #7552

    Latents-Inv: Robust Semantic Watermark via Dual-Path Mutual Information Redundancy for Diffusion Models

    Cong Li, Lingyun Yu, Peiqi Jiang, Hongtao Xie
    Semantic watermarking methods, which embed identity into the initial latent noise, provide imperceptible identity traceability for diffusion models in copyright protection and source verification. However, existing methods are highly vulnerable to adversarial attacks, especially geometric transformations (e.g., rotation, cropping) and latent-space manipulations via proxy models, limiting the reliability of watermark verification in practical deployment. To address this issue, we propose a robust and fully reversible, flow-based watermarking framework with dual encoding paths, which preserves high visual fidelity of watermarked images while ensuring resilient identity recovery under adversarial attacks. Specifically, a dual-path network is proposed to encode watermark information into both the generated image and the owner’s secret key. This network leverages Mutual Information Redundancy to recover compromised information under single-path attack, ensuring robust verification. To enhance verification credibility without degrading generation quality, we introduce a joint training strategy that suppresses false positives on negative samples through contrastive learning under fidelity constraints. Furthermore, we employ a backward Euler iteration scheduler for rectified flow models, which facilitates accurate inversion mapping and thereby enables effective watermark verification. Extensive experiments show that our method achieves superior robustness against various adversarial attacks while maintaining high visual quality across diverse generative models.
    AI Ethics, Trust, FairnessEthical, legal and societal issuesAI Ethics, Trust, FairnessSafety and robustnessAI Ethics, Trust, FairnessTrustworthy AI
  673. #7556

    Incentivizing Truthful Machine Unlearning via Hierarchical Auditing

    Shaolong Guo, Yuntao Wang, Zhou Su, Tom H. Luan
    Machine unlearning has become a critical capability for AI services to comply with evolving privacy regulations. A key yet underexplored challenge is how to verify whether a profit-driven AI server has faithfully performed unlearning. Existing verification approaches either incur prohibitive costs or provide insufficient deterrence, failing to balance audit cost and enforcement effectiveness. To bridge this gap, we propose UAG, a game-theoretic unlearning auditing framework that incentivizes truthful unlearning via strategic deterrence rather than exhaustive verification. We design a hierarchical auditing mechanism that combines low-cost screening with selectively triggered high-precision verification, and model the server-auditor interaction as a three-stage dynamic Bayesian game. By characterizing the equilibrium strategy, we derive optimal audit and penalty policies that incentivize honest unlearning. Theoretical analysis and experiments show that UAG maintains reliable detection while achieving a favorable cost-deterrence trade-off. Notably, UAG attains a server honesty rate of approximately 95% while screening only about 50% of unlearning requests, showing its practicality for trustworthy black-box unlearning services.
    AI Ethics, Trust, FairnessAI and law, governance, regulationAI Ethics, Trust, FairnessTrustworthy AIGame Theory and Economic ParadigmsOther
  674. #7573

    DynaOD: Dynamic Origin-Destination Flow Generation with Discrete-to-Continuous Temporal Semantic Modeling

    Jie Zhao, Xianqi Dai, Jie Feng, Huandong Wang, Yong Li
    Dynamic origin-destination (OD) flow generation seeks to synthesize realistic mobility dynamics from temporal context alone, without relying on historical OD observations. A key challenge is to translate semantic temporal signals into temporally coherent OD patterns while preserving the inherent spatial heterogeneity of urban regions. We propose DynaOD, a semantic-driven framework that models temporal dynamics through two complementary perspectives: discrete directional trends that characterize qualitative shifts in urban activity patterns, and continuous temporal evolution that captures how such shifts unfold over time. By jointly encoding these temporal semantics, the framework constructs time-varying region representations that condition pretrained static OD generators in a lightweight and plug-and-play fashion. This modular design further supports scalable deployment and cross-city transferability. Extensive experiments on large-scale real-world datasets show that our method consistently outperforms representative baselines in both predictive accuracy and distributional fidelity. Code is publicly available at https://github.com/csjiezhao/DynaOD.
    Data MiningMining spatial and/or temporal data
  675. #7583

    Persistent Safety Set Guided Offline Safe Reinforcement Learning

    Ayan Choudhury, Janaka Brahmanage, Akshat Kumar, Praveen Paruchuri
    Offline safe reinforcement learning learns high-return policies that satisfy hard safety constraints using only a pre-collected dataset. This setting is challenging due to the inability to explore, and the risk of propagating value errors through unsafe state-space regions. To address this, first, we characterize the safe state region by developing a framework for learning control barrier functions (CBFs) using a novel generalized Bellman operator, yielding a persistent safety set, from which the agent can remain safe indefinitely. Second, we show that several existing safety set estimation methods (e.g., reachability-constrained RL) can be formulated within our CBF learning framework, highlighting its generality. We further propose a new CBF that ensures safety under environment dynamics uncertainty, unlike standard CBFs designed for deterministic settings. Third, we propose a new reward maximization algorithm that effectively exploits our learned persistent safety set for reward critic estimation. Empirical results on standard benchmarks show that our approach achieves state-of-the-art safety with fewer constraint violations while maintaining competitive returns.
    Machine LearningOffline reinforcement learningPlanning and SchedulingMarkov decisions processesUncertainty in AISequential decision making
  676. #7585

    Multi-View Ensemble for Time Series Anomaly Detection via Coupling Flows

    Wanghui Qiu, Chenxi Liu, Shiyan Hu, Zhengyu Li, Chenjuan Guo, Bin Yang
    Time series anomaly detection faces a critical challenge that different anomaly types require different detection mechanisms, yet single methods are inherently limited by their design biases. We propose FlowFuse, a multi-view ensemble framework with coupling flow-based score fusion for time series anomaly detection. FlowFuse combines four complementary detection modules spanning temporal and frequency domains, as well as clustering and reconstruction paradigms, to extract multi-perspective anomaly scores. Rather than simple averaging, an ensemble of coupling flows models the joint distribution of these multi-view scores, learning complex inter-view dependencies through invertible transformations with alternating updates between temporal and frequency scores. The ensemble naturally quantifies detection uncertainty through prediction disagreement, which can optionally guide selective supervision when labels are available. Extensive experiments across 18 diverse benchmarks show that FlowFuse achieves state-of-the-art performance, demonstrating effectiveness across multiple anomaly types.
    Data MiningMining spatial and/or temporal data
  677. #7599

    MindTracker: Unveiling Implicit Emotions in Long-Horizon Dialogues

    Zhiqiang Gao, Jing Han, Zhuochu Wang, Shihao Gao, Cheng Zhu, Kehan Wang, Huan Zhao, Zixing Zhang
    Affective computing has achieved notable success in recognizing explicit emotions from short, isolated dialogue segments. However, human emotions are often implicitly expressed and internally regulated, and they evolve dynamically over extended interactions. Existing models struggle to disentangle internal emotional states from external expressions, and fail to capture the emotional inconsistency that emerges across long-horizon dialogues. To address this limitation, we introduce Emotional Inconsistency Analysis (EIA), a novel task that aims to identify and reason about discrepancies between implicit and explicit emotions over long-term conversational contexts. To support this task, we construct the MaskDialog dataset, carefully curated from television drama and large language models (LLMs). We further propose two LLM-based baseline approaches, i.e., One-shot Self-consistent Inference and Cascaded Multi-step Inference, and conduct comprehensive analyses on dialogue construction strategies and inference behaviors. Extensive experiments across multiple mainstream LLMs reveal that EIA remains highly challenging, particularly in modeling implicit emotional trajectories and cross-turn inconsistency. Overall, EIA reframes emotion understanding from short-term recognition to longitudinal, implicit emotion tracking, with implications for dialogue systems and human–computer interaction.
    Natural Language ProcessingSentiment analysis, stylistic analysis, and argument miningHumans and AIHuman-computer interactionKnowledge Representation and ReasoningDiagnosis and abductive reasoning
  678. #7600

    LLM-Orchestrated Diagnose–Plan–Treat for Mixed-Degradation CT Reconstruction

    Yongqiang Huang, Yingyu Chen, Fengzhi Xu, Tao Wang, Wenjun Xia, Hongming Shan, Yi Zhang
    Clinical Computed Tomography (CT) reconstruction often faces mixed degradations, where quantum noise, streak artifacts, and geometric distortions co-occur with various compositions and severities. Recently, all-in-one frameworks have outperformed traditional single-task models through degradation-specialized modulation mechanisms. However, attempting to resolve these mixed degradations simultaneously forces the model into suboptimal trade-offs, failing to balance conflicting reconstruction objectives. Our empirical study shows that decomposing CT reconstruction into a sequence of ordered steps effectively mitigates this conflict, with the execution order being a critical performance factor. Leveraging this insight, we propose AgenticCT, an LLM-orchestrated multi-agent framework designed to autonomously plan the optimal reconstruction trajectory. Specifically, AgenticCT operates through a diagnose-plan-treat workflow: a Supervisor Agent explicitly estimates a structured degradation state; a Planner Agent then selects a topology-optimized execution order conditioned on this state; and a library of Processor Agents sequentially executes the trajectory to achieve high-fidelity reconstruction. Extensive experiments demonstrate that AgenticCT consistently improves reconstruction fidelity across single and mixed degradations, and generalizes better to out-of-distribution datasets compared with strong specialized and all-in-one baselines. Code is available at: https://github.com/yqhuang2912/AgenticCT.
    Agent-based and Multi-agent SystemsCoordination and cooperationComputer VisionBiomedical image analysis
  679. #7604

    DeTri: Debiasing General-Purpose LLMs for Zero-Shot Relation Triplet Extraction via Structural Expert

    Zehan Li, Fu Zhang, Jiawei Li, Wenqing Zhang, Jingwei Cheng
    Zero-Shot Relation Triplet Extraction (ZSRTE) aims to extract relation triplets for unseen relation types without any annotated training data. Recent advancements in Large Language Models (LLMs) have significantly enhanced ZSRTE performance, enabling the direct generation of relational triplets from unstructured text. However, LLMs often introduce biases such as entity shift, relation confusion, and over-prediction, which limit the reliability of the extracted triplets. In this paper, we introduce DeTri, a novel debiasing framework for ZSRTE that addresses these biases by leveraging a discriminative structural expert model. DeTri treats LLMs as inspiration generators and refines their outputs via a three-stage debiasing pipeline, consisting of inspiration-based re-prediction, confidence-based filtering, and entity shift correction through span perturbation. The framework improves LLM-generated triplets without fine-tuning the LLM, thus offering a transferable and scalable solution. Extensive experiments on two datasets show that DeTri outperforms state-of-the-art methods by reducing bias and improving extraction accuracy, achieving a 2.89% F1 improvement over fine-tuned LLM methods while keeping the LLM parameters frozen. Our approach shows strong generalization across different LLMs, offering a solution to mitigate biases.
    Natural Language ProcessingInformation extraction
  680. #7608

    CAMERA: Adapting to Semantic Camouflage in Unsupervised Text-Attributed Graph Fraud Detection

    Junjun Pan, Yixin Liu, Yu Zheng, Lianhua Chi, Alan Wee-Chung Liew, Shirui Pan
    Text-attributed graph fraud detection (TAGFD) plays a critical role in preventing fraudulent activities on online social and e-commerce platforms. However, to evade detection, fraudsters continuously evolve their camouflaging strategies by deliberately mimicking textual responses of benign users, thereby concealing their malicious purposes. This phenomenon, referred to as semantic camouflage, fundamentally undermines commonly relied-upon assumptions on how structural and attribute cues can be exploited to identify fraudsters, and makes it difficult to spot fraudsters with unsupervised TAGFD. To bridge this gap, we propose a Case-Adaptive Multi-cue Expert fRAmework (CAMERA) for unsupervised TAGFD. CAMERA employs an ego-decoupled mixture-of-experts architecture, where each expert specializes in modeling a distinct type of fraud-indicative cue. A context-informed gating model is introduced to jointly consider the ego node representation and its local neighborhood context for adaptive integration of cues learned by different experts. Furthermore, CAMERA leverages the inherent rarity of fraudsters to support unsupervised one-class learning with expert-level objectives that encourage modeling dominant benign patterns, thereby enabling reliable unsupervised detection of camouflaged fraudsters. Experiments on 4 challenging datasets show that CAMERA consistently outperforms competitors, showing its effectiveness against semantically camouflaged fraudsters. Code is available at https://github.com/CampanulaBells/CAMERA.
    Data MiningAnomaly/outlier detectionData MiningMining graphs
  681. #7611

    CoFL: Consensus Driven Human-AI Collaborative Federated Learning

    Zeyuan Cai, Yao Zhang, Zhiwen Yu, Jiaqi Liu, Yuchang Sun, Chenhao Ma, Yilin Zhao
    Federated learning (FL) has emerged as a distributed machine learning paradigm due to its privacy-preserving advantages. Most FL studies assume offline labeled datasets are available at clients. In practice, however, client data often arrive without labels in a streaming manner, making label acquisition a crucial problem. Existing solutions typically rely on human experts or artificial intelligence (AI) models for annotation. Nevertheless, due to the subjectivity among human experts and the inherent limitations of AI models, the labels they provide are often unreliable. In this work, we propose CoFL, a novel consensus-driven human-AI collaborative FL method, which utilizes the complementarity between humans and AI models to produce more reliable labels. Moreover, CoFL leverages cross-client consensus among human experts to further enhance the collaboration process. Experiments on three datasets demonstrate that CoFL consistently outperforms baseline methods under various settings. The code is available at https://github.com/huaiguang233/CoFL.
    Humans and AIHuman computation and crowdsourcingHumans and AIHuman-AI collaboration
  682. #7647

    GMM-TDQN: Two-Stage Multi-Objective Reinforcement Learning for Large-Scale Edge Server Deployment

    Zhou Zhou, Tingyu Zheng, Yifu Zeng
    The deployment of edge servers plays a crucial role in supporting large-scale edge computing systems, where multiple conflicting objectives—such as latency, energy consumption, load balancing, and service reliability—must be jointly optimized in complex, dynamic environments. Existing solutions often struggle to scale effectively or to balance these objectives in a unified learning framework. In this paper, we propose GMM-TDQN, a two-stage multi-objective reinforcement learning framework for large-scale edge server deployment. The first stage employs a Gaussian Mixture Model (GMM) to capture spatial and workload heterogeneity, enabling an efficient reduction of the deployment search space. Building upon this structured initialization, the second stage formulates the deployment problem as a sequential decision-making task and adopts a Transformer-enhanced Deep Q-Network (TDQN) to learn adaptive deployment policies that balance multiple objectives. Extensive experiments on real-world datasets demonstrate that GMM-TDQN consistently outperforms state-of-the-art methods, achieving reductions of 29.18% in average latency and 17.55% in energy consumption, while improving load balancing by 27.50% and system reliability by 32.55%. These results validate the effectiveness and scalability of the proposed framework for multi-objective edge server deployment.
    Data MiningApplicationsMultidisciplinary Topics and ApplicationsOther
  683. #7650

    Toward LoRA Copyright Protection with an Authorized Dual-Watermarking Framework

    Zhipeng Yin, Zichong Wang, Ruijun Chen, Xin Ning, Xingyu Zhang, Wenbin Zhang
    Text-to-Image (T2I) diffusion models have been widely adopted due to their strong generative capabilities, while Low-Rank Adaptation (LoRA) has emerged as an efficient mechanism for customizing these models for diverse creative and commercial applications. This trend has fostered LoRA-centric service platforms that enable the customization and commercial distribution of LoRA modules according to user requirements. However, the growing prevalence of LoRA and its critical role in customized AI services have raised urgent concerns about LoRA copyright protection. To address this gap, we propose LoRA2D, an authorized dual-watermarking framework specifically designed to protect LoRA modules in T2I diffusion models. LoRA2D integrates license-based authorization control with explicit watermarks as visible deterrents for unauthorized or trial usage, which can be removed upon valid authorization, while persistently embedding an implicit watermark for robust black-box ownership verification. Extensive experiments on multiple image-generation datasets demonstrate the effectiveness and practicality of LoRA2D for securing copyrights in LoRA-adapted T2I diffusion models.
    AI Ethics, Trust, FairnessAI and law, governance, regulationAI Ethics, Trust, FairnessEthical, legal and societal issuesAI Ethics, Trust, FairnessSocietal impact of AIAI Ethics, Trust, FairnessTrustworthy AI
  684. #7672

    DiG-Plan: Mitigating Early Commitment for Tool-Graph Planning via Diffusion Guidance

    Yansi Li, Zhuosheng Zhang
    Generating executable tool plans requires selecting appropriate subsets from tool libraries, a combinatorial search problem with an exponentially large solution space. However, we identify a critical misalignment in predominant approaches: standard autoregressive (AR) decoding suffers from early commitment, where initial token choices rigidly constrain the search trajectory. A controlled study shows that masked denoising raises Pass@10 solution coverage from 0.320 to 0.943 over AR sampling under matched compute. Motivated by this, we propose DiG-Plan, a framework that decouples combinatorial exploration from structural refinement. DiG-Plan employs a diffusion-based proposer to generate diverse tool sets via iterative refinement, followed by an AR refiner for dependency prediction. Experiments on TaskBench and API-Bank show that DiG-Plan outperforms AR baselines by a 10% relative margin, with the largest gains on complex compositional tasks.
    Natural Language ProcessingApplicationsNatural Language ProcessingLanguage modelsNatural Language ProcessingToolsPlanning and SchedulingPlanning algorithmsPlanning and SchedulingSearch in planning and scheduling
  685. #7674

    Learning to Price and Stock Under Contextual and Censored Demand

    Zean Han, Zezhen Ding, Jiheng Zhang
    Making optimal joint pricing and inventory control decisions is a critical challenge for modern retailers. In practice, retailers face changing market conditions where demands are influenced by various contextual factors, while simultaneously dealing with the difficulty of lost sales that obscure true demand information. However, existing approaches often fail to account for both contextual information and censored demand observations. We address this gap by presenting a framework where we model demand as a linear combination of basis functions with unknown coefficients, allowing for adaptive pricing and inventory decisions that respond to changing contexts. We propose an efficient algorithm that achieves a regret bound of O(K sqrt(T) log T) under concave revenue conditions and O(K^(2/3) T^(2/3) (log T)^(1/2)) in the general case, with matching lower bounds confirming optimality. Extensive numerical experiments across diverse scenarios demonstrate our algorithm’s effectiveness.
    Machine LearningLearning theoryMachine LearningModel-based and model learning reinforcement learningMachine LearningMulti-armed banditsMachine LearningOnline learning
  686. #7699

    An Emotion-Preserving Conditional Information Bottleneck for Domain-Generalizable Speech Emotion Recognition

    Zhichen Yuan, C. L. Philip Chen, Shuzhen Li, Tong Zhang
    Domain-generalizable speech emotion recognition (DG-SER) aims to ensure the robustness of SER models across unknown domains, which is essential for real-world human-machine interaction systems. Most DG-SER approaches employ alignment or adversarial strategies with domain labels to promote generalization. However, these strategies often confine generalization to predefined domains, limiting robustness under diverse real-world speech variations. To address these challenges, this paper proposes an emotion-preserving conditional information bottleneck framework (EP-CIB) for domain-free DG-SER. Specifically, EP-CIB introduces a nuisance proxy representation learning module to learn a nuisance proxy as the broad non-emotional variability without requiring any domain annotations, covering both defined and previously unseen domains. It then extracts coarse-grained emotion features via the consistency-aware emotion representation learning module. EP-CIB further introduces an emotion-preserving conditional information bottleneck that, conditioned on the emotion label, disentangles the nuisance proxy from the coarse-grained emotion representation, improving domain-free generalization under open-ended domain shifts. EP-CIB departs from implicit domain surrogates in prior domain-free methods by explicitly learning a proxy for nuisance domains and disentangling it from emotion features, enabling domain-free emotion representation learning. The state-of-the-art performance in both speaker-independent and cross-corpus settings, including an 18% improvement on EmoDB-to-CASIA transfer, demonstrates the effectiveness of EP-CIB for DG-SER.
    Humans and AIHuman-computer interactionMachine LearningRobustnessNatural Language ProcessingSentiment analysis, stylistic analysis, and argument miningNatural Language ProcessingSpeech
  687. #7746

    PersuHMM: Iterative Learning the Hierarchical Meta-Strategy Memory for Persuasive Dialogue

    Yanyue Zhang, Zihao Wang, Xin Zhang, Deyu Zhou
    Persuasive dialogue aims to alter people's attitudes or behaviors through conversation. While LLMs can generate emotional responses, they tend to use a uniform approach across varied scenarios, limiting their persuasive effectiveness. Diverse demands need varied persuasion strategies, which challenge both LLMs and human efforts. To address this, this paper incorporates the idea of meta-learning and proposes PersuHMM, an iterative learning approach for constructing the Hierarchical Meta-strategy Memory to generate Persuasive dialogue. Through the interaction between Meta-layer and Task-layer, the approach continuously accumulates automated meta-strategies from generated outcomes and data labels, forming a persuasive meta-strategy memory that balances efficiency and generalizability. To facilitate model retrieval and use, the meta-strategy memory is hierarchically organized via clustering for a clearer and more logical storage structure. Experimental results show that the hierarchical meta-strategy memory enhances both the persuasiveness and generalizability of the persuasion model, and even enables cross-model performance improvement.
    Natural Language ProcessingApplicationsNatural Language ProcessingDialogue and interactive systemsNatural Language ProcessingLanguage generation
  688. #7754

    Eliminating Envy in Multigraphs Through Subsidy

    Bo Li, Ankang Sun, Shiji Xing
    Envy-free (EF) allocation is a fundamental problem at the intersection of theoretical computer science and economics. When resources are indivisible, achieving an EF allocation is often impossible. A common remedy is to compensate agents with subsidies. Brustle et al. [EC 2020] proved that a total subsidy of n-1 is both necessary and sufficient to guarantee EF in the worst case, where n is the number of agents and each agent has additive valuations with marginal values of at most 1 per item. In this paper, we consider a constrained setting, namely graph orientation, that was recently introduced by Christodoulou et al. [EC 2023]. In this model, agents correspond to vertices in a multigraph, and resources correspond to edges, with each edge allocated to one of its two incident agents. Despite extensive study of the graph orientation model, it remains unclear how much subsidy is sufficient to guarantee EF. We show that a total subsidy of n/2 is always sufficient to guarantee EF in any multigraph, halving the subsidy required in the unconstrained setting, and provide a polynomial-time algorithm to compute such an allocation. We show that this bound is tight, even for simple graphs.
    Game Theory and Economic ParadigmsFair divisionAgent-based and Multi-agent SystemsResource allocationAIGame Theory and Economic Paradigms
  689. #7756

    Subword Tokenization for Low- and Medium-Resource Languages: A Systematic Evaluation

    Jón Daðason, Hrafn Loftsson
    Subword tokenization is a standard technique for pre-trained language models, mapping text into sequences of tokens from a fixed-size vocabulary. Despite its widespread use, the impact of tokenization algorithms and vocabulary sizes on downstream performance remains underexplored, particularly for low- and medium-resource languages, where suboptimal configurations are difficult to compensate for with additional data. Prior studies have mainly focused on high-resource languages, individual tokenization algorithms, or fixed vocabulary sizes, limiting their scope. In this paper, we present a systematic evaluation of subword tokenization strategies in six diverse low- and medium-resource languages: Icelandic, Estonian, Basque, Galician, Nepali, and Tajik. We pre-trained monolingual TEAMS-Small models using WordPiece, BPE, and Unigram tokenizers with vocabulary sizes of 16k, 32k, and 64k, and additionally compared byte-level and character-level tokenization for Icelandic. Evaluation on a benchmark of NLP tasks revealed that Unigram tokenization with a 64k vocabulary consistently outperformed other configurations, with vocabulary size having a greater impact on downstream performance than algorithm choice. These gains were task-dependent, with statistically significant improvements for part-of-speech tagging, named entity recognition, and question answering, but not for dependency parsing and summarization. Finally, byte-level tokenization provided no measurable advantage in a monolingual setting, suggesting that its benefits are primarily relevant for multilingual models.
    Natural Language ProcessingLanguage modelsNatural Language ProcessingPhonology, morphology, and word segmentationNatural Language ProcessingResources and evaluation
  690. #7785

    Mitigating Entity Type Confusion in Cross-Domain NER via Multidimensional Quantification and Reasoning Enhancement

    Jingyu Wang, Shijie Wu, Fusheng Jin
    Cross-domain Named Entity Recognition (CD-NER) aims to transfer the rich knowledge in the source domain to the target domain. Recent studies adopting decomposition or generation paradigms have achieved significant performance improvements, demonstrating high accuracy in entity span detection. However, during entity type classification, models suffer severely from entity type confusion, the tendency to misclassify entities of one type in the text as a similar but incorrect type. To address this issue, we first propose a Multidimensional Confusion Quantification Model (MCQM) that quantifies a model's confusion extent between entity types from three dimensions: source-target hierarchy analysis, semantic similarity analysis, and explicit data evaluation. Moreover, we propose the Progressive Bidirectional Reasoning Chain (PBRC). PBRC leverages the source-target hierarchy and confusion analysis from the MCQM to prompt the LLM to generate two-stage reasoning information. The two-stage reasoning information is utilized to augment the knowledge of the model, significantly mitigating entity type confusion and improving the model's generalization performance. Experimental results demonstrate that our method achieves new state-of-the-art results on all domains of the CrossNER dataset.
    Natural Language ProcessingInformation extraction
  691. #7814

    ThinFormer: Channel Sparse Transformer for Efficient HRW Object Detection

    Wenxi Li, Kunpeng Liu, Moran Liu, Shuyang Liu, Chenyang Lyu, Haozhe Lin, Yuchen Guo
    Object detection in high-resolution wide (HRW) shots presents unique challenges due to the extreme sparsity of objects and the variability in sparsity ratios across images. Conventional detectors, designed for close-up settings like MS COCO, struggle to generalize to these scenarios, leading to inefficient computation and degraded performance. While prior work has focused on spatial sparsity, the potential of sparsity in the channel dimension remains largely underexplored. In this paper, we propose ThinFormer, a dynamic and efficient framework tailored for object detection in HRW shots. ThinFormer leverages sparsity in both the spatial domain and the channel domain, the latter enabled by a novel local feed-forward network (local FFN) that exploits the duality between spatial and channel representations. To adapt to varying levels of scene complexity, we introduce a dynamic sparsity ratio estimator, trained in a self-supervised manner, which guides selective activation of channels based on semantic richness. Extensive experiments on the PANDA benchmark demonstrate that ThinFormer significantly reduces computational cost while maintaining competitive detection performance compared to state-of-the-art methods. Our results highlight the effectiveness of embracing multi-domain sparsity for scalable and robust object detection in complex, real-world HRW imagery. Code and supplementary material are available at https://github.com/LiuKunpeng03/Thinformer.
    Humans and AIApplicationsRoboticsPerceptionRoboticsRobotics and vision
  692. #7822

    Tuple Inconsistency Measures: Toward Explaining Query Answers

    Yurun Gu, Badran Raddaoui, Yue Ma, Aikaterini Tzompanaki, Nicole Bidoit
    In this paper, we introduce novel tuple inconsistency measures that quantify the extent to which individual tuples contribute to violations of denial constraints in inconsistent databases. Then, we formally show that one of the proposed measures fully satisfies a set of well-motivated axiomatic properties. As an application, we lift tuple inconsistency to the level of query answers, obtaining a principled framework for measuring and explaining the inconsistency inherent in query results. We further investigate the computational complexity of tuple and answer inconsistency measures under denial constraints, identifying tractable cases that make their application to real-world data feasible. Finally, our experimental study across varying dataset and inconsistency settings demonstrates that, in comparison to existing methods, the proposed measures yield more fine-grained and informative assessments of inconsistency in databases.
    Knowledge Representation and ReasoningApplicationsKnowledge Representation and ReasoningOtherMultidisciplinary Topics and ApplicationsDatabases
  693. #7828

    Mining Statistically Likely k-Reachable States in Probabilistic Programs

    Arnab Ray, Nitesh Trivedi, Ansuman Banerjee, Sourav Chakraborty, Arijit Ghosh, Subhajit Roy
    We propose the notion of statistically likely k-step reachable set in probabilistic programs, a statistically robust notion for high-probability k-step reachable program states. We design an inductive algorithm to capture this set as a symbolic representation in propositional logic for Boolean probabilistic programs. Our methodology iteratively learns a symbolic formula for the statistically likely k-step reachable set that involves (a) learning an initial symbolic candidate via decision tree learning, (b) collecting positive and negative counterexamples via forward and backward verification checks, and (c) refining the current candidate via a sequence of prune and split moves on the decision tree. We demonstrate that the statistically likely k-step reachable set can reveal interesting properties about programs by studying probabilistic programs from the literature.
    Uncertainty in AIProbabilistic programming
  694. #7839

    FedEvoQ: Evolutionary Diversity Probing for Data-Quality-Aware Federated Aggregation

    Leming Wu, Yaochu Jin, Han Yu, Riquan Zhang
    Federated learning (FL) enables collaborative training of deep models while keeping raw data on local devices. A critical factor affecting the global performance is server-side aggregation. The most common strategy weights client updates by the amount of local data, which is often unreliable as the server cannot inspect data. More importantly, it ignores substantial variations in data quality (e.g., label noise) across heterogeneous clients. To address this issue, we propose a data-Quality-aware aggregation framework by introducing an Evolutionary-computation-inspired design into Federated learning (FedEvoQ), with a lightweight dual-branch architecture. Each client trains a main-task branch as usual, while a frozen auxiliary branch (EvoQ) performs evolutionary-style controlled sampling with crossover and mutation at the input and representation level to probe data diversity and reliability. EvoQ outputs a compact quality score that jointly reflects learnability (loss-based signal), geometric consistency (stability under perturbations), representation diversity, and out-of-distribution (OOD) deviation, enabling the server to compute quality-only aggregation weights. Extensive experiments on CIFAR10, CIFAR100, and PathMNIST under noisy and Non-independent and identically distributed (Non-IID) settings demonstrate that FedEvoQ consistently outperforms 10 state-of-the-art (SOTA) federated aggregation methods, improving the average test accuracy of the global model by over 2% with modest overhead.
    Machine LearningEvolutionary learningMachine LearningFederated learningMachine LearningTrustworthy machine learningSearchEvolutionary computation
  695. #7841

    ARMOR: Adaptive Curriculum Meta-Learning for Noise-Robust RAG Reasoning

    Yan Wang, Yuxin Zhang, Shenyu Zhang, Yongrui Chen, Sheng Bi, Guilin Qi
    Retrieval-Augmented Generation (RAG) systems have demonstrated remarkable effectiveness in mitigating hallucinations by incorporating external knowledge. However, the retrieval process inevitably introduces noise, posing significant challenges to RAG robustness. Fundamentally, noise robustness is a composite competency encompassing noise identification, retrieval filtering, and robust reasoning. Unfortunately, existing methods often fail to capture this holistic nature, being typically restricted to single-dimensional enhancements or hampered by noise-induced optimization interference. To address these limitations, we decompose RAG robustness into three sub-tasks: knowledge boundary perception, retrieval verification, and robust reasoning. Accordingly, we propose ARMOR, an Adaptive Curriculum Meta-Learning framework for Noise-Robust RAG Reasoning. Inspired by the teaching process, ARMOR introduces an adaptive task scheduler as a "teacher" to curate training plans according to the absolute learning progress of the model (the "student"). Distinct from GRPO, ARMOR utilizes inter-group advantage estimation during learning to effectively mitigate mode collapse caused by limited output spaces. Experiments on three noise-injected complex reasoning datasets (2WikiMultiHopQA, MusiQue, and HotPotQA) indicate that ARMOR achieves synergistic improvements in retrieval noise robustness, significantly outperforming baselines in output quality.
    Natural Language ProcessingApplicationsNatural Language ProcessingInformation retrieval and text miningNatural Language ProcessingQuestion answering
  696. #7846

    GeneCaDiff: Hierarchical Tissue Image Synthesis from Gene Expression via Multi-Stage Cascaded Diffusion Model

    Weitian Huang, Shuaibo Gao, Bing Liu, Xiaoqi Sheng, Jiazhou Chen, Hongmin Cai
    The scarcity of paired gene expression and pathology image datasets poses a major bottleneck for training large-scale pathology foundation models. Although gene-to-image generative models offer a promising solution, existing methods typically employ coarse-grained conditional control strategies, resulting in entanglement between background textures and cell layouts. To address this challenge, we propose GeneCaDiff, a three-stage cascaded diffusion model that explicitly aligns the generative process with the hierarchical organization of biological tissues. Two conditional DDPMs independently synthesize tissue backgrounds and cellular foreground components, and a subsequent ControlNet-based fusion generator utilizes niche and cellular community maps as dual conditions to synthesize realistic tissue images. Quantitative and qualitative evaluations demonstrate strong realism and diversity, while controllability experiments validate hierarchical, decoupled control over niche textures and community layouts.
    Machine LearningGenerative modelsMachine LearningMulti-modal learningMultidisciplinary Topics and ApplicationsBioinformatics
  697. #7847

    Monitoring Data-aware Temporal Properties

    Alessandro Gianola, Marco Montali, Sarah Winkler
    Dynamic systems in AI are often complex and heterogeneous, so that an internal specification is not accessible and hence verification techniques like model checking are not applicable. Monitoring is in such cases an attractive alternative, as it deals with the observation of desirable properties along traces generated by an unknown dynamic system. In this work, we consider anticipatory monitoring of linear-time properties enriched with an arbitrary SMT theory over finite traces (data-LTLf). Anticipatory monitoring in this setting is a highly challenging problem and undecidable in general, as the monitoring state depends on both the trace prefix seen so far and all its possible finite continuations.
    Under reasonable assumptions on the background theory, we present and formally prove the correctness of a novel foundational framework for monitoring data-LTLf properties. The framework combines automata-theoretic methods to handle the temporal aspects of reasoning with automated reasoning techniques to address the first-order dimension. Moreover, we identify for the first time decidable fragments of this monitoring problem that are practically relevant as they combine linear arithmetic with uninterpreted functions, which covers e.g. data-aware business processes and dynamic systems operating over a database. Feasibility is witnessed by a prototype implementation and preliminary evaluation.
    Agent-based and Multi-agent SystemsFormal verification, validation and synthesisConstraint Satisfaction and OptimizationConstraint satisfactionConstraint Satisfaction and OptimizationSatisfiabilityKnowledge Representation and ReasoningAutomated reasoning and theorem provingKnowledge Representation and ReasoningReasoning about actions
  698. #7885

    Accelerated Distributed Riemannian Optimization Algorithms with Random Shuffling

    Wenhan Xian, Heng Huang
    Riemannian optimization has attracted increasing attention recently as it is related to a variety of machine learning problems, including principal component analysis, dictionary learning, and mixture modeling. To tackle modern large-scale machine learning tasks, distributed learning is usually considered an effective solution that trains a global model over multiple worker nodes collaboratively to enhance the computational power. Centralized learning and decentralized learning are two types of distributed learning. Centralized learning employs a central parameter server to coordinate the training process, while in decentralized learning each worker only communicates with its peer neighbors. In this paper, we propose accelerated distributed Riemannian stochastic gradient descent algorithms with random shuffling in the cases of both centralized and decentralized learning. We improve the stochastic first-order oracle complexity of the Riemannian SGD from to where is the size of training data. We conduct experiments on the leading eigenvector problem in both the centralized and decentralized settings to validate the performance of our methods.
    Machine LearningFederated learning
  699. #7924

    Re-Weighting Cross-Modal Pairs via Rank Consistency for Noise-Robust Retrieval

    Weiran Pan, Wei Wei
    Image-text alignment relies on large-scale, well-aligned image-text pairs, yet web-crawled datasets inevitably introduce noisy correspondence. Existing methods typically re-weight training pairs by estimating semantic matching degrees from the model's own similarity predictions, which are inherently unreliable under noise and often overestimate partially aligned pairs. To overcome this limitation, we introduce a novel semantic matching degree estimation method based on ranking consistency. Our basic idea is that for a well-aligned image-text pair (I, T), the image I and text T should occupy semantically consistent positions in the embedding space. Thus, using I and T as queries to retrieve other texts (images) should yield two highly consistent rankings. Therefore, we re-weight the training pairs by measuring the Normalized Discounted Cumulative Gain (NDCG) between their cross-modal and intra-modal retrieval results. This signal is derived from the global data manifold structure rather than point-wise predictions, making it more robust to noise. Experiments on multiple cross-modal retrieval benchmarks validate the effectiveness of our method. Code is available at https://github.com/Aliinton/RCR.
    Machine LearningMulti-modal learningMachine LearningWeakly supervised learning
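The rank-consistency signal described in this abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that similarity scores against a shared candidate set are already computed, that intra-modal similarities serve as graded relevance, and that the cross-modal scores induce the ranking being evaluated. The function names `dcg` and `ndcg_consistency` are hypothetical.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a list of gains given in rank order."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg_consistency(cross_scores, intra_scores, k=None):
    """NDCG of the ranking induced by cross_scores, graded by intra_scores.

    cross_scores[i]: cross-modal similarity of the query to candidate i
                     (assumed: image I scored against candidate texts).
    intra_scores[i]: intra-modal similarity to the same candidate
                     (assumed: text T scored against candidate texts).
    Negative similarities are clipped to zero so gains stay non-negative.
    """
    n = len(cross_scores)
    if n != len(intra_scores):
        raise ValueError("score lists must have the same length")
    k = n if k is None else min(k, n)
    # Rank candidates by the cross-modal scores, best first.
    order = sorted(range(n), key=lambda i: cross_scores[i], reverse=True)
    gains = [max(intra_scores[i], 0.0) for i in order][:k]
    # The ideal ranking orders candidates by the intra-modal scores.
    ideal = sorted((max(s, 0.0) for s in intra_scores), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


consistent = ndcg_consistency([0.9, 0.5, 0.1], [0.8, 0.4, 0.2])
inconsistent = ndcg_consistency([0.1, 0.5, 0.9], [0.8, 0.4, 0.2])
```

Under these assumptions, a pair whose two rankings agree scores 1, while disagreement lowers the score, which could then be used as a per-pair training weight.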
  700. #7930

    Weight-Aware Branch-and-Bound for Weighted Maximum Satisfiability

    Jialu Zhang, Chu-Min Li, Sami Cherif, Shuolin Li
    The Weighted Partial MaxSAT (WPMS) problem requires finding an assignment that satisfies all hard clauses while maximizing the total weight of satisfied soft clauses. From a computational perspective, WPMS is not merely an extension of the uniform-weight Partial MaxSAT (PMS); rather, it requires search strategies that actively exploit weight information to navigate the solution space. While state-of-the-art heuristic and SAT-based solvers have successfully integrated weight-aware mechanisms, Branch-and-Bound (BnB) solvers, notably WMaxCDCL, largely treat weight information passively during search and pruning. In this paper, we bridge this gap by introducing two novel weight-aware strategies into the BnB framework. Our methods integrate weight information directly into the preprocessing and lower-bound estimation. Experimental results demonstrate that these strategies significantly enhance the performance of WMaxCDCL, establishing a new state-of-the-art for exact WPMS solving.
    Constraint Satisfaction and OptimizationConstraint optimization problemsConstraint Satisfaction and OptimizationSatisfiabilityConstraint Satisfaction and OptimizationSolvers and toolsSearchCombinatorial search and optimisation
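To make the WPMS problem statement in this abstract concrete, the following is a brute-force reference sketch. It is not the branch-and-bound method of the paper and does not reflect WMaxCDCL. The clause encoding (signed integers in the DIMACS style) and the function names are assumptions chosen for illustration, and enumeration is only practical for a handful of variables.

```python
from itertools import product


def satisfied(clause, assignment):
    """A clause holds if any literal agrees with the assignment."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)


def solve_wpms_brute_force(n_vars, hard, soft):
    """Return (best_weight, best_assignment), or (None, None) if the hard
    clauses are unsatisfiable.

    hard: list of clauses; each clause is a list of non-zero ints.
    soft: list of (clause, weight) pairs.
    """
    best_weight, best_assignment = None, None
    for values in product([False, True], repeat=n_vars):
        assignment = dict(enumerate(values, start=1))
        # Every hard clause must be satisfied.
        if not all(satisfied(c, assignment) for c in hard):
            continue
        # Objective: total weight of satisfied soft clauses.
        weight = sum(w for c, w in soft if satisfied(c, assignment))
        if best_weight is None or weight > best_weight:
            best_weight, best_assignment = weight, assignment
    return best_weight, best_assignment


hard_clauses = [[1, 2]]                           # x1 or x2
soft_clauses = [([-1], 4), ([-2], 2), ([1], 1)]   # weighted preferences
weight, model = solve_wpms_brute_force(2, hard_clauses, soft_clauses)
```

In this toy instance the weights decide which variable is set to true, which is the kind of information a weight-aware solver would exploit during search.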
  701. #7964

    Cross-Modal Dynamic Hypergraph Computation via Functional-Structural Brain Network for Brain Disorder Diagnosis

    Jingxi Feng, Heming Xu, Rundong Xue, Junhao Cai, Xudong Chen, Dong Zhang, Shaoyi Du
    Cross-modal brain networks characterize the complex connections between different brain regions from both functional and structural perspectives, which is of significant importance for brain network analysis and the diagnosis of brain diseases. However, existing methods have failed to fully exploit the complementary information between functional and structural brain networks, neglecting the guiding role of topology in the transmission of functional modal information, as well as the potential associations of cross-modal high-order information. To address these challenges, this paper proposes a cross-modal dynamic hypergraph computing (CDHGC) framework for brain disease diagnosis and an in-depth analysis of coupled functional-structural brain networks. CDHGC comprises two key modules. The topology-guided dynamic hypergraph generation module combines topology-constrained KNN and topology-guided centroid sampling to uncover high-order correlations among brain regions across modalities based on topological structure, and dynamically optimizes the hypergraph structure during training to explore potential associations of cross-modal information. The cross-modal hypergraph convolution module efficiently aggregates information from functional-structural brain networks through a cross-modal attention mechanism during message-passing, resulting in high-order joint representations. Experiments conducted on the ADNI and ABIDE datasets demonstrate that the proposed method outperforms current state-of-the-art approaches. Interpretability analysis reveals that the proposed method can discover multi-modal biomarkers associated with brain diseases.
    Humans and AIBrain sciences
  702. #8002

    Neural Decision-Propagation for Answer Set Programming

    Thomas Eiter, Katsumi Inoue, Sota Moriyama
    Integration of Answer Set Programming (ASP) with neural networks has emerged as a promising tool in Neuro-symbolic AI. While existing approaches extend the capabilities of ASP to real world domains, their reasoning pipelines depend on classical solvers, which is a bottleneck for scalability. To tackle this problem, we propose a new method to compute stable models, called decision-propagation (DProp), which alternates falsity decisions and truth propagations. Successful DProp computations are shown to capture the stable model semantics. We then develop Neural DProp (NDProp), a differentiable extension of DProp with neural computation for decisions and fuzzy evaluation for propagations. We evaluate the capabilities of NDProp for learning decision heuristics as well as neuro-symbolic integration, and compare it with existing neuro-symbolic approaches. The results show that NDProp can learn to efficiently compute stable models, and it improves accuracy and scalability on neuro-symbolic benchmarks.
    Knowledge Representation and ReasoningLogic programmingKnowledge Representation and ReasoningNon-monotonic reasoningMachine LearningNeuro-symbolic methods/Abductive Learning
  703. #8015

    SPOTR: Spatio-temporal Pooling One-Token Reconstruction for Universal Physiological Signal Self-supervised Learning

    Yiyu Gui, Mingzhi Chen, Yuesheng Zhu, Guibo Luo, Yuchao Yang
    Physiological signals such as EEG, ECG, and PPG are widely used in clinical monitoring. Recent self-supervised learning (SSL) methods offer an attractive way to leverage unlabeled recordings, yet they still fall short in practice. In particular, current SSL methods struggle across heterogeneous datasets, often distorting clinically meaningful structures or learning shortcuts from temporal and cross-channel redundancy. Consequently, existing SSL methods often deliver limited performance under linear probing, a lightweight adaptation setting that better matches real-world medical scenarios. Moreover, most Transformer-based SSL models encode a flattened spatiotemporal token sequence, incurring high computation and memory cost, and are typically developed within a single modality. To address these limitations, we present SPOTR (Spatio-temporal Pooling One-Token Reconstruction), a compress-reconstruct pretraining framework that introduces a single-token global bottleneck for physiological signals. SPOTR compresses each waveform into a single-token representation and reconstructs the signal conditioned only on this representation. Meanwhile, SPOTR introduces an efficient spatio-temporal compaction module to reduce computation and memory cost. Pretrained on 20 datasets spanning EEG, iEEG, ECG, and PPG, SPOTR consistently outperforms the strongest baseline under linear probing, improving average AUC by 18.49%, 21.71%, 17.86%, and 4.64%, respectively. Compared with a representative general-purpose time-series foundation model, SPOTR achieves around 78% lower latency and 52% lower peak GPU memory on average. The code and supplementary material can be found at https://github.com/5GYYYYY/SPOTR.
    Multidisciplinary Topics and ApplicationsHealth and medicine
  704. #8016

    Modeling Dynamic Mixtures of Time-Delay Systems from Streaming Time Series

    Ren Fujiwara, Yasuko Matsubara, Yasushi Sakurai
    This research addresses the problem of adaptive modeling in time-series data streams with clear input-output relationships. This problem is challenging because rapid system changes (regime shifts) caused by environmental factors or input delay changes degrade model performance, and the trade-off among accuracy, robustness, and memory usage arises when using multiple small models for each time-series pattern. To address these issues, this paper presents an online framework/method that treats streaming time series as dynamic mixtures of time-delay systems. This framework maintains robustness of model tracking and reduces memory usage by summarizing past regimes using a fixed-length representation that captures both the system dynamics and input-output delays. Concretely, this approach constructs a summary system tensor using the system's Markov parameter series, capturing both dynamic behavior and delay characteristics. If necessary, a tensor decomposition algorithm extracts relevant past models from the tensor and helps select the system that best fits the current regime. This method enables rapid adaptation to environmental changes and is computationally efficient. Tests on real datasets show that DelayMix consistently outperforms other methods, achieving superior forecast accuracy and faster adaptation to delays, especially for highly non-stationary data.
    Data MiningMining data streamsData MiningMining spatial and/or temporal dataData MiningTheoretical foundations of data mining
  705. #8095

    Distribution-Aware Energy Minimization: Physical-Inspired Efficient Active Learning and Quantum Potentials

    Zhicheng Yao, Wenguo Yang, Yancheng Chen, Dun Ma, Shengminjie Chen, Xiaoming Sun
    Active learning aims to maximize model performance with minimal annotation costs by selecting the most informative samples from large unlabeled pools, but existing selection strategies often face a budget dilemma: uncertainty-based methods induce redundancy under low budgets, while representativeness-based methods struggle to mine challenging samples under high budgets. Although some heuristic parameter interpolation schemes attempt to bridge this gap, such strategies suffer from a misalignment between theoretical assumptions and real-world distributions, failing to achieve a good exploration-exploitation balance across all scenarios. In this paper, we propose a general active learning framework based on Distribution-Aware Energy Minimization, which reformulates sample selection as minimizing the energy function for the distributional discrepancy between the selected subset and the global uncertainty field. This physical-inspired perspective naturally derives a Hamiltonian comprising attractive terms and repulsive terms, mathematically achieving an intrinsic and dynamic balance between uncertainty and representativeness. Furthermore, we transform the optimization objective to the ground-state search of an Ising Model, enabling efficient solutions via a Coherent Ising Machine. Extensive numerical experiments show that our method outperforms prior state-of-the-art methods across multi-budget regimes on several benchmark datasets. Validation on real quantum hardware also demonstrates the potential of quantum computers for future large-scale selection tasks.
    Computer VisionEfficiency and OptimizationMachine LearningActive learningMachine LearningSupervised Learning
  706. #8102

    Beyond Stability: Improved Efficiency Guarantees for α-Stable Matchings

    Isabel Fernandez Abad, Sophie Klumper, Guido Schäfer
    Stable matching mechanisms are fundamental to market design but face an inherent tension between stability and social welfare optimality. We study a natural relaxation of stability, termed α-stability, which models agents as willing to deviate only when the potential improvement is sufficiently large. Under α-stability, no pair of agents can deviate and improve their valuations by more than a factor of 1/α, with α ∈ (0,1]. We provide a complete characterization of the stability–efficiency tradeoff under asymmetric valuations. This tradeoff depends on the degree of asymmetry μ ∈ (0,1], which bounds the ratio between agents’ valuations for any pair. Our results show that relaxing stability can substantially improve achievable efficiency guarantees. We further present a polynomial-time algorithm that computes an α-stable matching attaining the best possible efficiency guarantee. For α ≤ μ/(μ+1), our algorithm achieves 1-efficiency; for larger α, it computes an α-stable matching achieving at least (1/α) · μ/(μ+1) of the optimal social welfare. Remarkably, our algorithm inflates the values of an optimal matching and then applies the Gale–Shapley algorithm to the modified instance. Finally, we show that computing an optimal α-stable matching is NP-hard, even under slight relaxations of stability, i.e., for α close to 1.
    Game Theory and Economic ParadigmsAuctions and market-based systemsGame Theory and Economic ParadigmsComputational social choice
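The α-stability condition and the efficiency bound quoted in this abstract can be written down as a short sketch. It rests on one reading of the abstract, which is an assumption: a pair blocks a matching only if both agents would improve by more than a factor of 1/α, and an unmatched agent has value 0. The data layout and the function names are hypothetical, and this is not the paper's algorithm.

```python
def is_alpha_stable(matching, left_vals, right_vals, alpha):
    """Check alpha-stability of a one-to-one matching.

    matching: dict mapping left agents to right agents (may be partial).
    left_vals[i][j]: value of left agent i for right agent j.
    right_vals[j][i]: value of right agent j for left agent i.
    """
    partner_of_right = {j: i for i, j in matching.items()}
    for i, prefs in left_vals.items():
        current_i = prefs[matching[i]] if i in matching else 0.0
        for j, value_i in prefs.items():
            if matching.get(i) == j:
                continue
            k = partner_of_right.get(j)
            current_j = right_vals[j][k] if k is not None else 0.0
            # new > current / alpha is equivalent to alpha * new > current.
            if alpha * value_i > current_i and alpha * right_vals[j][i] > current_j:
                return False
    return True


def efficiency_guarantee(alpha, mu):
    """Guarantee stated in the abstract: 1 if alpha <= mu / (mu + 1),
    otherwise (1 / alpha) * mu / (mu + 1)."""
    threshold = mu / (mu + 1)
    return 1.0 if alpha <= threshold else threshold / alpha


left_vals = {"a": {"x": 10, "y": 6}, "b": {"x": 9, "y": 1}}
right_vals = {"x": {"a": 10, "b": 9}, "y": {"a": 6, "b": 1}}
welfare_optimal = {"a": "y", "b": "x"}  # total welfare 30 versus 22

stable_at_1 = is_alpha_stable(welfare_optimal, left_vals, right_vals, 1.0)
stable_at_08 = is_alpha_stable(welfare_optimal, left_vals, right_vals, 0.8)
```

In this toy instance the welfare-optimal matching is blocked by the pair (a, x) under classical stability (α = 1) but becomes α-stable at α = 0.8, which illustrates how relaxing stability can recover efficiency.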
  707. #8119

    The Complexity of Tournament Fixing: Subset FAS Number and Acyclic Neighborhoods

    Yuxi Liu, Junqiang Peng, Mingyu Xiao
    The Tournament Fixing Problem (TFP) asks whether a knockout tournament can be scheduled to guarantee that a given player v* wins. Although TFP is NP-hard in general, it is known to be fixed-parameter tractable (FPT) when parameterized by the feedback arc/vertex set number, or the in/out-degree of v*. However, it remained open whether TFP is FPT with respect to the subset FAS number of v* (the minimum number of arcs intersecting all cycles containing v*), a parameter that is never larger than the aforementioned ones. In this paper, we resolve this question negatively by proving that TFP stays NP-hard even when the subset FAS number of v* is constant ≥ 1 and either the subgraph induced by the in-neighbors D[N_{in}(v*)] or the out-neighbors D[N_{out}(v*)] is acyclic. Conversely, when both D[N_{in}(v*)] and D[N_{out}(v*)] are acyclic, we show that TFP becomes FPT parameterized by the subset FAS number of v*. Furthermore, we provide sufficient conditions under which v* can win even when this parameter is unbounded.
    Game Theory and Economic ParadigmsComputational social choice
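The Tournament Fixing Problem defined in this abstract can be illustrated with an exhaustive search over seedings. This sketch is only a definition-level reference, not the parameterized algorithm of the paper. It assumes the number of players is a power of two and that `beats[a][b]` is a deterministic outcome matrix; both function names are hypothetical.

```python
from itertools import permutations


def knockout_winner(seeding, beats):
    """Play a balanced knockout bracket; adjacent players meet each round."""
    players = list(seeding)
    while len(players) > 1:
        players = [a if beats[a][b] else b
                   for a, b in zip(players[::2], players[1::2])]
    return players[0]


def find_winning_seeding(players, beats, favorite):
    """Return a seeding under which `favorite` wins, or None if none exists.

    Exhaustive search over all orderings, so only usable for tiny inputs.
    """
    for seeding in permutations(players):
        if knockout_winner(seeding, beats) == favorite:
            return seeding
    return None


# beats[a][b] is True when player a defeats player b.
beats = [
    [False, True, False, False],   # 0 beats 1
    [False, False, True, True],    # 1 beats 2 and 3
    [True, False, False, True],    # 2 beats 0 and 3
    [True, False, False, False],   # 3 beats 0
]
seeding_for_1 = find_winning_seeding([0, 1, 2, 3], beats, 1)
seeding_for_0 = find_winning_seeding([0, 1, 2, 3], beats, 0)
```

Here player 1 can be made the winner by pairing player 0 against someone who beats them in the first round, while player 0 cannot win under any seeding because they lose to both possible finalists from the other half.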
  708. #8133

    BERM: Low-Overhead Prompt-Injection Detection via In-Situ Benign Representation Modeling

    Maihao Guo, Chaoyang Zhao, Jinqiao Wang
    Real-world deployment of large language models (LLMs) necessitates a robust and low-latency approach to detect prompt injections; existing low-overhead methods fail to simultaneously boost robustness and reduce latency. Current defenses for prompt injection either rely on brittle heuristics or invoke costly auxiliary models, imposing a significant runtime burden. We introduce BERM, a lightweight framework that performs in-situ detection by modeling a host LLM’s internal representations extracted during prefill, adding negligible overhead. Our approach trains a lightweight classifier atop the LLM by learning a compact manifold of benign representations via joint contrastive learning to maximize the separation from malicious representations. At inference, this pre-trained classifier enables in-situ detection without invoking auxiliary guard models. On a diverse landscape of prompt injection attacks, our framework establishes a new state-of-the-art, achieving an F1-score 5.2 percentage points (pp) higher than the best prior work. Critically, BERM achieves this while being over 12x faster, reducing inference overhead to near-zero.
    AI Ethics, Trust, FairnessSafety and robustnessMultidisciplinary Topics and ApplicationsSecurity and privacyMachine LearningRepresentation learningNatural Language ProcessingEmbeddingsNatural Language ProcessingLanguage models
  709. #8141

    Quantifying Semantic Inertia in Large Language Models Under Rapid Topic Switching

    Junxin Wang, Yuchao Wang, Hongkai Zhang
    In multi-turn interactions, large language models (LLMs) often exhibit a persistent influence from prior turns, even after an explicit topic switch. This behavior, which we term semantic inertia, can cause responses to deviate from the expected output distribution for an independent task, undermining task isolation and reliability. This paper introduces a rigorous experimental framework to systematically characterize the nature, form, and dynamics of semantic inertia. We propose an operational definition and a causal-contrastive method that isolates semantic carryover from confounding factors like context length. Through a series of experiments on five leading LLMs, we (i) confirm the existence of semantic inertia and identify its boundary conditions; (ii) model its decay over the course of generation, revealing a characteristic timescale and a heavy-tailed distribution; (iii) decompose its effects on three distinct channels: factual accuracy, structural integrity, and stylistic expression; and (iv) probe its controllability using prompt-based interventions. Our key findings show that inertia is not a simple length effect but an intrinsic dynamic, strongest in the initial part of a generation and decaying over a timescale of approximately 100-200 tokens. Its impact is most pronounced as a stylistic residue, while its effect on factual correctness is weaker and highly dependent on the task and domain switch. Crucially, we find that prompt-level "reset" instructions are unreliable and often counter-productive, while conflicting constraints consistently amplify, rather than resolve, output deviation. These results suggest that governing semantic inertia requires system-level state management mechanisms rather than relying on prompt engineering alone.
    AI Ethics, Trust, FairnessExplainability and interpretabilityAI Ethics, Trust, FairnessSafety and robustnessKnowledge Representation and ReasoningBelief changeKnowledge Representation and ReasoningDiagnosis and abductive reasoning
  710. #8155

    Semantic Geometric Calibration in Randomized Neural Feature Space

    Itay Abuhazera, Liron Cohen, Gil Einziger
    Modern neural networks can be accurate yet poorly calibrated. We present Random Geometric Calibration (RGC), a post hoc method that augments confidence estimation with a geometric signal derived from distances to training data in an aggregated semantic representation space. Prior geometric calibration methods in feature space often rely on architecture-specific layer choices, which can be unstable across models and datasets. To avoid this dependence, RGC samples a fixed number of intermediate blocks uniformly at random and reuses this set at both calibration and test time.
    We instantiate RGC with RGCL, which applies spatial pyramid pooling to each sampled activation, projects features to a fixed dimension, and computes a nearest-neighbor separation score that is mapped to probabilities via isotonic regression.
    We also present RGCC, which replaces pooling and projection with uniform coordinate sampling for improved computational efficiency.
    We evaluate RGCL and RGCC across multiple datasets, including CIFAR-10, CIFAR-100, and Tiny-ImageNet, and across a diverse set of architectures such as ResNet, DenseNet, and DINOv2. Our results show that RGCL achieves state-of-the-art or near state-of-the-art calibration performance, reducing expected calibration error by approximately 78% on average and by up to 97% for some models.
    Finally, we provide representation analysis that explains why RGCL and RGCC succeed without layer selection and characterize their computational overheads. We observe that layers exhibiting favorable geometric structure tend to emerge in later network blocks, and that randomized aggregation implicitly emphasizes these layers, leading to robust and reliable calibration performance.
    Machine LearningTrustworthy machine learningMachine LearningGeometric learningUncertainty in AIUncertainty representationsMachine LearningClassificationMachine LearningEvaluation
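The expected calibration error (ECE) that this abstract reports can be computed with the standard equal-width binning estimator, sketched below. This is a generic metric illustration under assumed inputs (top-label confidences and correctness flags); it does not reproduce RGC, RGCL, or RGCC, and the function name and bin count are illustrative choices.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE.

    confidences: predicted top-label probabilities in [0, 1].
    correct: booleans (or 0/1) marking whether each prediction was right.
    Returns the sample-weighted mean of |accuracy - mean confidence| per bin.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, bool(ok)))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(accuracy - mean_conf)
    return ece


overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
perfect = expected_calibration_error([1.0, 1.0], [True, True])
```

A post hoc calibrator aims to lower this quantity by remapping confidences; the relative reductions quoted in the abstract would be measured against the uncalibrated model's ECE.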
  711. #8160

    CT2BSE: 3D BSE Microstructural Image Cross-Device Generation from µCT for Cement Hydration via Voxel Swin Transformer

    Haozhong Gao, Liangliang Zhang, Yamin Han, Ruiqi Han, Lin Wang, Bo Yang
    Acquiring three-dimensional (3D) microstructural images of cement hydration reveals critical microscale features essential for understanding hydration mechanisms and advancing material development. Micro-computed tomography (µCT) is widely used to capture these images due to its non-destructive, repeatable 3D imaging capability, despite its high operational cost. However, µCT suffers from limited resolution, weak texture, and missing phase information. In contrast, backscattered electron (BSE) imaging provides high-resolution, phase-sensitive, and texture-rich images but is confined to 2D. Inspired by the idea of fusing BSE style into µCT images for computationally imaging 3D BSE data, this paper proposes a cross-device generation method, termed CT2BSE, to construct 3D BSE microstructural images from µCT for cement hydration. A Voxel Swin Transformer is proposed to extract 3D µCT features by improving Video Swin Transformer with a Content-Aware Positional Encoding–3D, making the encoder better suited for homogeneous images such as cement. Furthermore, a Holistic Style Injector–3D is proposed to fuse 3D µCT features with 2D BSE characteristics, thereby mitigating feature instability caused by the extreme intensity variations typically exhibited in cement images. Subsequently, the fused features are decoded into 3D volumes aligned with the BSE style while preserving the original microstructure. Experimental results demonstrate that CT2BSE can generate high-fidelity 3D BSE images from µCT.
    Computer Vision3D computer visionComputer VisionComputational photographyComputer VisionImage and video synthesis and generation
  712. #8163

    Reexamining the Exploration–Exploitation Dilemma from an Entropy-Driven Perspective

    Renye Yan, Yaozhong Gan, Jikang Cheng, Yi Sun, Zongwei Wang, Ling Liang, Yimao Cai
    Achieving an optimal balance between exploration and exploitation remains a fundamental challenge in reinforcement learning. This work revisits the exploration-exploitation dilemma through the lens of entropy, offering a novel perspective on this enduring problem. It establishes a theoretical connection between a policy's entropy and its exploratory behavior, using entropy as a rational measure to quantify the exploration-exploitation trade-off. Theoretical analyses demonstrate that a modified Bellman equation, augmented with a novelty-seeking term, ensures appropriate entropy adjustment and guarantees its globally monotonic decay. The derived policy optimization process inherently accommodates all three regimes of the exploration-exploitation spectrum, enabling a principled transition from exploration to exploitation. Building on these theoretical insights, this work introduces AdaZero, an adaptive deep architecture that dynamically balances exploration and exploitation. Extensive empirical evaluations highlight AdaZero's robust performance and validate the theory's feasibility.
    Machine LearningReinforcement learning
  713. #8203

    Benchmarking Real-Time Question Answering via Executable Code Workflows

    Wenjie Zhou, Yuan Gao, Xin Zhou, Hao Fu, Zhongjian Miao, Wei Chen, Bo Chen, Xiaobing Zhao
    Retrieving real-time information is a fundamental capability for search-integrated agents in real-world applications. However, existing benchmarks are predominantly static and therefore fail to capture the temporal dynamics of information and the continuously evolving nature of real-world knowledge.
    To address this limitation, we propose RT-QA, a dynamic evaluation framework that leverages executable code workflows to retrieve up-to-date answers at evaluation time.
    Specifically, we construct an agent-driven pipeline that autonomously generates code for web crawling and DOM-based answer extraction to produce real-time ground truth. To ensure robust evaluation over time, the pipeline further incorporates a self-repair mechanism to adapt to changes in web page structures.
    RT-QA spans 12 domains (e.g., Finance, Sports) with 320 Chinese questions categorized into three difficulty levels.
    Extensive evaluations of state-of-the-art models (e.g., GPT-5.2, GLM-4.7) reveal significant limitations in real-time adaptability: even the best models achieve only 46% accuracy.
    Our analysis highlights two primary failure modes:
    (1) Lazy Retrieval, where agents rely on search snippets instead of deeply scanning specific websites for information (20% of failures); and
    (2) Temporal Confusion, a cognitive error where agents retrieve a historical date (e.g., an event in 2024) and fail to re-anchor to the current time (2026) for subsequent reasoning.
    These findings suggest that future agents require not just better retrieval strategies, but robust temporal state management.
    Natural Language ProcessingResources and evaluation
  714. #AI4G6

    PhyTTA: Physics-Informed Test-Time Adaptation of Foundation Models for Regional Drought Prediction

    Wentao Gao, Jiuyong Li, Lin Liu, Thuc Le, Jixue Liu, Yun Chen, Yanchang Zhao
    Drought prediction is crucial for disaster mitigation, yet it remains challenging due to the complexity and variability of drought events. Although time series foundation models (TSFMs) have shown great potential in general time series forecasting problems, they struggle to adapt to regional hydrological information. They often underestimate the impact of regional precipitation or temperature anomalies on drought indices like SPEI. This problem arises because general pretraining captures averaged time series patterns, which do not account for the unique climatic and hydrological characteristics of specific regions. To bridge this gap, we introduce Phy-TTA, a physics-informed adaptation framework designed to restore the physical consistency of drought forecasts. Rather than updating model parameters, which might overfit random weather noise, Phy-TTA corrects prediction errors by explicitly modeling the causal link between physical forcing (e.g., rainfall deficits) and drought. Our theoretical analysis highlights the critical role of incorporating physics-driven information to enhance the accuracy and reliability of drought predictions. Experiments across multiple regions demonstrate that Phy-TTA consistently improves performance.
    Multidisciplinary Topics and ApplicationsMultidisciplinary Topics and ApplicationsHumans and AIHumans and AI
  715. #AI4G10

    When Can We Trust Fairness Audits? Identifying Reliability Boundaries of Third-party Audit Conclusions

    Yuanhao Liu, Qi Cao, Huawei Shen
    Fairness auditing aims to assess whether a model is fair, playing a critical role in identifying potential risks in deployed AI systems. In practice, due to limited access, third-party auditors often rely on self-collected datasets (e.g., via sock-puppets), which may differ from real-world deployment scenarios. Such a discrepancy can lead to inconsistencies between audit conclusions on the collected data and those in actual deployment, raising concerns about the reliability of third-party audits. This motivates a critical question: When can we trust the fairness audit conclusion derived from third-party datasets? Answering this question is challenging, as the actual deployment distribution is typically inaccessible or unobservable. To tackle this, we introduce the Consistency Radius, a metric that quantifies the maximum distribution shift under which an audit conclusion based on a third-party dataset remains consistent. We further propose a convex relaxation optimization-based method to estimate the radius relying solely on model responses over the audit dataset. Leveraging this framework, third-party auditors can provide their datasets to model providers and request the magnitude of distributional discrepancy relative to the deployment distribution, enabling reliable audit conclusions without requiring any direct data access.
    AI Ethics, Trust, FairnessAI Ethics, Trust, FairnessAI4GAI Ethics, Trust, Fairness
  716. #AI4G12

    STAMP: Multi-Pattern Attention-Aware Multiple Instance Learning for STAS Diagnosis in Multi-Center Histopathology Images

    Liangrui Pan, Xiaoyu Li, Chenchen Nie, Yaning Yang, Shaoliang Peng
    Spread through air spaces (STAS) constitutes a novel invasive pattern in lung adenocarcinoma (LUAD), associated with tumor recurrence and diminished survival rates. However, large-scale STAS diagnosis in LUAD remains a labor-intensive endeavor, compounded by the propensity for oversight and misdiagnosis due to its distinctive pathological characteristics and morphological features. Consequently, there is a pressing clinical imperative to leverage deep learning models for STAS diagnosis. This study first assembled histopathological images from STAS patients at the Second Xiangya Hospital and the Third Xiangya Hospital of Central South University, alongside the TCGA-LUAD cohort. Three senior pathologists conducted cross-verification annotations to construct the STAS-SXY, STAS-TXY, and STAS-TCGA datasets. We then propose a multi-pattern attention-aware multiple instance learning framework, named STAMP, to analyze and diagnose the presence of STAS across multi-center histopathology images. Specifically, the dual-branch architecture guides the model to learn STAS-associated pathological features from distinct semantic spaces. Transformer-based instance encoding and multi-pattern attention aggregation modules dynamically select regions closely associated with STAS pathology, suppressing irrelevant noise and enhancing the discriminative power of global representations. Moreover, a similarity regularization constraint prevents feature redundancy across branches, thereby improving overall diagnostic accuracy. Extensive experiments demonstrated that STAMP achieved competitive diagnostic results on STAS-SXY, STAS-TXY, and STAS-TCGA, with AUCs of 0.8058, 0.8017, and 0.7928, respectively, surpassing the clinical level. The 10 open baseline results establish a benchmark for STAS diagnostic research and facilitate the future generalizability and clinical integration of computational pathology technologies.
    Dataset features and code are accessible at https://github.com/panliangrui/IJCAI2026.
    Knowledge Representation and ReasoningKnowledge Representation and ReasoningMachine LearningMachine LearningAI4GComputer VisionAI4GData Mining
  717. #AI4G33

    Suicide Risk Assessment from AI-powered Video Surveillance: An Interpretable Framework for Prevention in Metro Stations

    Safwen Naimi, Wassim Bouachir, Guillaume-Alexandre Bilodeau, Brian Mishara
    Understanding and monitoring human behavior in metro stations play an important role in supporting suicide prevention efforts, where early identification of high-risk situations can enable timely intervention. This requires assessing suicide risk from a surveillance video by jointly reasoning about the behavior of each passenger, his/her spatial context, and temporal dynamics. However, this assessment using videos captured by surveillance cameras is challenging, as it demands accurate perception of human motion, understanding of platform geometry, and aggregation of heterogeneous behavioral cues over time. In this work, we formalize the task of Suicide Risk Assessment (SRA) in metro stations and introduce the first interpretable framework that addresses this challenge. Unlike approaches that focus on isolated subtasks or attempt to infer intent directly, our formulation assesses suicide risk from accumulated evidence by incorporating person tracking, activity recognition, semantic segmentation of the platform, and trajectory-driven risk heatmap modeling. By formalizing SRA as a distinct task and benchmarking a complete operational pipeline achieving 83.2% ROC-AUC on real surveillance data, this work highlights the complexity of suicide risk assessment and opens new directions for research on interpretable AI systems for social good.
    Computer VisionComputer VisionHumans and AIHumans and AIMultidisciplinary Topics and ApplicationsMultidisciplinary Topics and Applications
  718. #AI4G34

    Policy-Embedded Graph Expansion: Networked HIV Testing with Diffusion-Driven Network Samples

    Akseli Kangaslahti, Davin Choo, Lingkai Kong, Milind Tambe, Alastair Van Heerden, Cheryl Johnson
    HIV is a retrovirus that attacks the human immune system and can lead to death without proper treatment. In collaboration with the WHO and a large South African university, we study how to improve the efficiency of HIV testing with the goal of eventual deployment, directly supporting progress toward UN Sustainable Development Goal 3.3. While prior work has demonstrated the promise of intelligent algorithms for sequential, network-based HIV testing, existing approaches rely on assumptions that are impractical in our real-world implementations. Here, we study sequential testing on incrementally revealed disease networks and introduce Policy-Embedded Graph Expansion (PEGE), a novel framework that directly embeds a generative distribution over graph expansions into the decision-making policy rather than attempting explicit topological reconstruction. We further propose Dynamics-Driven Branching (DDB), a diffusion-based graph expansion model that supports decision making in PEGE and is designed for data-limited settings where forest structures arise naturally, as in our real-world referral process. Experiments on real HIV transmission networks show that the combined approach (PEGE + DDB) consistently outperforms existing baselines (e.g., 13% improvement in discounted reward and 9% more HIV detections with 25% of the population tested) and explore key tradeoffs that drive decision quality.
    Multidisciplinary Topics and ApplicationsMultidisciplinary Topics and Applications
  719. #AI4G35

    DART: Navigating Last-Mile Heterogeneity in Instant Delivery via Distribution-Adaptive Splines

    Hao Xiong, Yang Gao, Haiyong Luo, Fang Zhao, Dan Luo
    On-demand delivery platforms rely on Travel Time Estimation (TTE) to balance courier earnings and overdue risks. In collaboration with one of China's largest platforms, we address a critical "Fairness Gap" in TTE: current systems fail to capture complex delivery patterns in GNSS-denied environments, subjecting couriers handling high concurrent order volumes to disproportionate pressure due to overdue deliveries. Analyzing 1.27 million real-world trajectories, we attribute this bias to unique challenges in GNSS-denied scenarios: distributional heterogeneity, structural heterogeneity, and contextual uncertainty. To bridge this gap, we propose DART (Distribution-Adaptive Robust Timing). DART incorporates a Learnable Adaptive Spline (LAS) encoder with a gradient-driven knot migration mechanism to enhance non-linear expressiveness for outliers, significantly improving long-tail accuracy. Furthermore, a Spatio-Temporal Transition Graph (STTG) reconstructs the latent topology by integrating sequence semantics, such as Wi-Fi-sensed arrival merchant timestamps. At the same time, a Distribution Gating Mechanism characterizes delivery time distributions under distinct contexts. Through extensive experiments and large-scale online A/B testing, DART not only reduces MAE by 14.0% in complex environments but also decreases the Order Overdue Rate by 1.7% (saving $24,000 daily), demonstrating how AI effectively reconciles operational efficiency with labor fairness.
    Machine LearningMachine LearningData MiningData MiningHumans and AIHumans and AI
  720. #AI4G38

    Beyond Vision: A Multimodal Dataset and Framework for Pest Recognition via Plant Electrophysiological Signals

    Lu Wang, Jiaming Lin, Yuting Ye, Tao Wei, Chuchu Qin
    Precise pest identification is essential for sustainable agriculture. Current visual recognition systems are brittle in the wild, where performance degrades due to occlusion and variable illumination. In contrast, plant electrophysiological signals serve as a robust, all-weather physiological modality, capable of detecting cryptic feeding behaviors that escape optical sensors. However, this field remains constrained by the scarcity of data and the absence of specialized algorithms. To bridge this gap, we introduce the Herbivory-Induced Plant Bio-signal Multimodal (HIPB-MM) dataset, the first fine-grained dataset comprising 4,023 synchronized plant electrophysiological signal-video pairs recording the feeding processes of three typical pest species. To address the weak and non-stationary nature of these signals, we propose the Herbivory-Induced Physiological Sensing (HIPS) framework. It integrates a Morphological Semantic Decoupling strategy to recover robust slow-wave semantics, and a Generation-State Encoder to model latent physiological states. Complementing this, an auxiliary dual-stream visual branch calibrates signal representations using explicit behavioral and morphological cues. Experiments demonstrate that HIPS establishes a solid benchmark (69.81% accuracy), comprehensively outperforming state-of-the-art baselines. Crucially, this work validates plant electrophysiology as a low-cost, all-weather modality for sustainable crop protection, effectively reducing pesticide dependency and safeguarding ecosystem health.
    Data MiningData MiningMachine LearningMachine LearningMultidisciplinary Topics and ApplicationsMultidisciplinary Topics and Applications
  721. #AI4G39

    Towards an Early Warning System for Ocean Heat Extremes Through AI-Ocean Dynamics Synergy

    Zheng Jiang, Wei Wang, Gaowei Zhang, Yifei Bao, Zengzhou Hao, Lingyu Xu, Suixiang Shi, Lei Wang, Yi Wang
    Ocean heat extremes, including marine heatwaves and the El Niño–Southern Oscillation (ENSO), exert profound impacts on marine ecosystems and socio-economic stability. Establishing robust early warning systems is critical for proactive risk management; however, conventional predictive models often fail to generalize to the intensifying, non-stationary extremes driven by rapid global warming. This project introduces a novel AI-Ocean Dynamics synergy designed to provide an integrated early warning system. By synthesizing multi-source observations with physics-informed neural networks, it ensures predictions remain constrained by fundamental physical laws. The system forecasts event onset, intensity, duration, and spatial extent while simultaneously attributing the underlying mechanisms, such as ocean advection and air–sea heat exchange. To validate performance, we establish a specialized ocean heat extremes benchmark to assess predictive skill and attribution reliability. Furthermore, the system incorporates an incremental learning mechanism, enabling continuous adaptation to long-term climatic and environmental evolutions. This project advances the development of reliable, interpretable, and adaptive early warning systems, providing a vital tool for informed policy and maritime decision-making.
    Multidisciplinary Topics and ApplicationsMultidisciplinary Topics and Applications
  722. #AI4G43

    Improving Survey Participation in Low-Literacy Populations Through Value-Sensitive Conversational AI

    Raj Gaurav Maurya
    Collecting reliable social data from low-literacy populations remains a persistent challenge, particularly when surveys involve sensitive topics and marginalized communities. Traditional paper-based and web-based survey modalities often suffer from high attrition and incomplete responses due to literacy barriers, social pressure, and interactional discomfort. In this paper, we present findings from an initial field evaluation comparing multiple survey modalities—paper-based interviews, digital web-based surveys, conversational AI surveys, and conversational AI enhanced with layered value-sensitive design—conducted with low-literacy women across India. Using data from 315 participants, we show that conversational AI significantly improves survey completion rates relative to traditional modalities, with the highest completion and lowest drop-off observed when value-sensitive and culturally aligned conversational design elements are fully integrated. These results demonstrate the importance of human-centered and value-sensitive interaction design in enabling inclusive, ethical, and scalable data collection for AI-for-social-good applications.
    Humans and AIHumans and AIUncertainty in AIUncertainty in AIKnowledge Representation and ReasoningKnowledge Representation and Reasoning
  723. #AI4G44

    CHAM-net: A Contrastive Hierarchical Adaptive Meta-network for Robust Global Methane Flux Prediction

    Rongchao Dong, Yiming Sun, Shuo Chen, Youmi Oh, Licheng Liu, Yiqun Xie, Xiaowei Jia
    Methane is a potent greenhouse gas that significantly contributes to global warming. However, accurately estimating global methane emissions and consumption remains challenging due to the complex interactions among environmental drivers that may vary across spatial and temporal scales. Prior data-driven methods often overlook the inherent spatiotemporal heterogeneity of ecosystems, failing to explicitly capture site-specific characteristics and cross-year evolutionary dynamics. To address these issues, we propose the Contrastive Hierarchical Adaptive Meta-network (CHAM-net), a novel framework that explicitly learns from historical context to capture site-specific dynamics. CHAM-net employs a hierarchical encoder–decoder architecture, in which the encoder captures site-specific characteristics from historical data and then dynamically conditions the decoder to generate the final prediction. Experimental results demonstrate that CHAM-net consistently outperforms all baseline methods on both simulation and observational datasets for methane emission and consumption, achieving nRMSE values as low as 0.43 and 0.88 with corresponding R² scores up to 0.97 and 0.68 for emission prediction.
    Multidisciplinary Topics and ApplicationsMultidisciplinary Topics and ApplicationsMachine LearningMachine LearningData MiningData Mining
  724. #AI4G50

    Optimizing Sensor Placement with Greedy Algorithms: A Case Study in Wildlife Camera Trapping for Spatial Capture-Recapture Population Estimation

    Hannah Murray, Amrita Gupta, Arielle W. Parsons, Justin P. Suraci, Bistra Dilkina
    Estimating wildlife populations is central to conservation planning, yet designing sensor deployments that produce reliable data for such estimates remains challenging. Spatial capture-recapture (SCR) models, widely used to estimate animal population sizes, are highly sensitive to sensor layout, where poor placement can substantially increase uncertainty in population estimates. We present a novel framework that formulates camera trap placement as a scenario-based optimization problem under real-world resource constraints. Using collections of simulated animal capture histories spanning ecologically plausible parameter ranges, candidate sensor placements are evaluated via closed-form, SCR-derived design criteria linked to the precision of population estimates and optimized using both genetic algorithms and a greedy search strategy. We demonstrate our approach using data from a hypothetical American pine marten camera trapping study in British Columbia's South Chilcotin Mountains, achieving lower relative standard error and bias in population estimates than baselines. Our method was developed in close collaboration with conservation practitioners and is currently being used in real-world wildlife monitoring programs. This framework offers a general approach for designing wildlife surveys that support reliable population estimation across a range of realistic ecological scenarios.
    Multidisciplinary Topics and ApplicationsMultidisciplinary Topics and ApplicationsConstraint Satisfaction and OptimizationConstraint Satisfaction and Optimization
  725. #AI4G52

    PACE: A Personalized Adaptive Curriculum Engine for 9-1-1 Call-taker Training

    Zirong Chen, Hongchao Zhang, Meiyi Ma
    9-1-1 call-taking training requires mastery of over a thousand interdependent skills, covering diverse incident types and protocol-specific nuances. A nationwide labor shortage is already straining training capacity, but effective instruction still demands that trainers tailor objectives to each trainee's evolving competencies. This personalization burden is one that current practice cannot scale.
    Partnering with a local 9-1-1 call center, we propose PACE (Personalized Adaptive Curriculum Engine), a co-pilot system that augments trainer decision-making by (1) maintaining probabilistic beliefs over trainee skill states, (2) modeling individual learning and forgetting dynamics, and (3) recommending training scenarios that balance acquisition of new competencies with retention of existing ones.
    PACE propagates evidence over a structured skill graph to accelerate diagnostic coverage and applies contextual bandits to select scenarios that target gaps the trainee is prepared to address.
    Empirical results show that PACE achieves 19.50% faster time-to-competence and 10.95% higher terminal mastery compared to state-of-the-art frameworks. Co-pilot studies with practicing training officers further demonstrate a 95.45% alignment rate between PACE's and experts' pedagogical judgments on real-world cases. By estimation, PACE cuts turnaround time from 11.58 minutes to merely 34 seconds, a reduction of up to 95.08%.
    Multidisciplinary Topics and ApplicationsMultidisciplinary Topics and Applications
  726. #AI4G53

    An LLM-based Chain-of-Response Counter-Scam System

    Heedou Kim, Mogan Gim, Donghee Choi, Soonil Bae, Hoonick Lee, Mi-Young Kim, Jaewoo Kang
    The rapid evolution of online scams, driven by transnational networks and mass-produced social engineering scenarios, has exposed the speed limitations of conventional detection, necessitating tighter inter-agency coordination. While LLMs show promise in scam identification, their role in accelerating integrated response frameworks remains underexplored. We propose Counter-Scam, a unified LLM-based multi-agent framework that orchestrates end-to-end response from initial detection to crime investigation. The framework first proposes safe data guidelines, emphasizing non-public scam data and secure dataset construction via scam-specific NER. Developed with insights from 37 stakeholders to reduce delays and improve analytical efficiency, the system integrates CSRA (multi-agent mitigation), CSRT (nine role-aligned NLP tasks), and CSRD (a corpus of 185,300 scam cases and 38,587 knowledge entries). Experiments show that fine-tuned sLLMs surpass commercial models by over 10% on all CSRT tasks and achieve a 0.24 F1 improvement in scam-specific NER. This demonstrates the framework's capability for enabling rapid, collaborative mitigation of online scams.
    Agent-based and Multi-agent SystemsAgent-based and Multi-agent SystemsMultidisciplinary Topics and ApplicationsMultidisciplinary Topics and ApplicationsNatural Language ProcessingNatural Language ProcessingAI Ethics, Trust, FairnessAI Ethics, Trust, Fairness
  727. #AI4G62

    Domain-Informed Graph Neural Networks for Climate Factor Forecasting to Support Sustainable Crop Management

    Ziyue Sun, Zixin Jiang, Chenkai Xu, Xinggao Liu
    Forecasting climate factors is critical for anticipating agro-climatic risks and enabling sustainable crop management. However, accurate prediction remains challenging due to complex spatiotemporal variability, heterogeneous seasonal patterns, and intricate interdependencies among climate variables. Inspired by agronomic knowledge, we propose DoIGNN, a Domain-Informed Graph Neural Network that injects a domain-structured graph constraint built from Agro-Climatic Homogeneous Zones (ACHZs). Specifically, we partition stations into agro-climatic zones using long-term climatic statistics and location attributes, and construct a hierarchical ACHZ-guided adjacency. To better capture shared climate dynamics, we introduce a spatiotemporal decomposition module with temporal regularization that factorizes the climate tensor into low-rank global temporal bases and station loadings, yielding a compact station-level global component as auxiliary information for target forecasting. Finally, DoIGNN performs forecasting on both the ACHZ-guided and static-dynamic graphs to learn cross-region dependencies. Experiments on real-world climate datasets demonstrate that DoIGNN consistently improves forecasting accuracy over strong baselines while yielding more interpretable spatial dependency patterns that support climate-informed crop management decisions. In cooperation with the Ningbo Natural Resources and Planning Big Data Center, the proposed model has been trained and deployed for local data analysis.
    Data MiningData MiningMachine LearningMachine LearningMultidisciplinary Topics and ApplicationsMultidisciplinary Topics and Applications
  728. #AI4G69

    Scalable Mapping of Tree Traits to Study the Dynamics of Protected Forest Areas in India

    Dhruvi Goyal, Aaditeshwar Seth
    Monitoring forest functional diversity is essential to understand ecosystem resilience in the face of rapid environmental change. While existing remote sensing approaches primarily track structural attributes such as canopy density and tree height, functional traits like leaf phenology (evergreen vs. deciduous) and leaf type (broadleaf vs. needleleaf) reveal more direct information about adaptive strategies of tree species. This study presents a scalable machine learning framework for mapping these traits across India at 10 m resolution using Google AlphaEarth Foundations (AEF) embeddings, which capture the complete annual spectral reflectance and radar signatures of the land surface. A key contribution we make is to curate an ML-ready training dataset by combining tree traits information with tree species occurrence data, and to obtain a diverse sample from this data based on spectral time-series to ensure the dataset captures a wide range of phenological dynamics. We then build cross-validation folds to specifically test for spatial generalizability across different eco-regions in India and temporal generalizability across different years, for classifiers learned from the data. Multiple classifiers are evaluated: Random Forest models trained on AEF embedding features achieved the best performance for both classification tasks, outperforming models trained on conventional Sentinel-1 and Sentinel-2 time series while offering seamless deployment in Google Earth Engine. Compared to publicly available land-cover products that encode leaf phenology and leaf type, our model yields significantly higher accuracy while providing outputs at substantially finer spatial resolution. We then observe the outputs of our model over several protected forest areas in India to understand their dynamics over the last 8 years. 
    Our contribution is an analysis-ready open dataset to learn tree traits from remotely sensed spectral data, a trained model that is spatially and temporally generalizable, and a demonstration of the insights the model can provide to understand the dynamics of protected forest areas, all of which can be replicated in other areas.
    Humans and AIHumans and AIMultidisciplinary Topics and ApplicationsMultidisciplinary Topics and ApplicationsMachine LearningMachine Learning
  729. #AI4G78

    Human-in-the-Loop Meta Bayesian Optimization for Fusion Energy and Scientific Applications

    Ricardo Luna Gutierrez, Sahand Ghorbanpour, Rahman Ejaz, Varchas Gopalaswamy, Riccardo Betti, Vineet Gundecha, Aarne Lees, Soumyendu Sarkar
    Inertial Confinement Fusion (ICF) holds transformative promise for sustainable, near-limitless clean energy, yet remains constrained by prohibitively high costs and limited experimental opportunities. This paper presents Human-in-the-Loop Meta Bayesian Optimization (HL-MBO), a framework that integrates expert knowledge with few-shot, uncertainty-aware machine learning to accelerate discovery in data-scarce, high-stakes scientific domains. HL-MBO combines a meta-learned surrogate model with an expert-informed acquisition function to recommend candidate experiments. To foster trust and enable informed decisions, HL-MBO also provides interpretable explanations of its suggestions. We show HL-MBO outperforms current BO methods on ICF energy yield optimization, as well as benchmarks in molecular optimization and critical temperature maximization for superconducting materials. By embedding human expertise into the optimization loop, HL-MBO opens a practical and scalable path to advance socially impactful scientific research.
    Machine LearningMachine LearningMultidisciplinary Topics and ApplicationsMultidisciplinary Topics and Applications
  730. #AI4G88

    iFire AI: AI-powered Wildfire Simulation and 3D Immersive Visualisation

    Renhao Huang, Lara Clemente, Mario Flores Gonzalez, Yang Song, Greg Drummond, Gonzalo Herrera, Jason Sharples, Michael Ostwald, Maurice Pagnucco, Ali Asadipour, Dennis Del Favero
    Wildfires, especially extreme wildfires, cause irreversible damage to ecosystems, human lives and economies globally. To reduce such losses, understanding wildfires is crucial for effective preparedness. This research proposal introduces iFire AI, a collaborative project aimed at developing a world-leading 3D immersive visualisation system for experiencing and understanding extreme wildfire scenarios. iFire AI utilises our advanced 360-degree immersive system AVIE, which visualises interactive landscapes and wildfire events rendered by Unreal Engine. To provide realistic extreme fire scenarios at an hourly temporal resolution, we propose a deep learning model supported by a Sim2Real pipeline that integrates simulated and real-world data to address data insufficiency and enhance model development and evaluation. Finally, we explore 3D tree reconstruction using 3D Gaussian splatting, creating visually realistic, computationally efficient, and dynamically interactive tree models. By placing users inside hyper-realistic wildfire environments, iFire AI can enhance users' risk perception, situational awareness and collaborative decision-making, and thereby reduce risks due to extreme wildfires and promote sustainable development.
    Humans and AIHumans and AIMachine LearningMachine LearningMultidisciplinary Topics and ApplicationsMultidisciplinary Topics and Applications
  731. #AI4G90

    Coarse-to-Fine Domain Incremental Learning with Attentive Distillation for Mining Footprint Segmentation in Multispectral Imagery

    Alif Tri Handoyo, Vincent C.S. Lee, Rizka Widyarini Purwanto, Alex M. Lechner, Deanna Kemp, Muhamad Risqi U. Saputra
    Automatically mapping and segmenting global mining footprints using remote sensing and deep learning is critical for monitoring the socio-environmental risks and impacts of mining, yet its progress is hindered by the scarcity of fine-grained annotated data. Although large-scale datasets with coarse boundaries are widely available, leveraging them to improve fine-grained segmentation is challenging due to significant domain shift. To address this, we propose MineC2FNet, a coarse-to-fine domain incremental learning framework that exploits abundant coarse data to enhance fine-grained mining footprint segmentation. MineC2FNet adopts a teacher–student architecture with attentive distillation at both the feature and prediction levels, selectively transferring generalized knowledge from the coarse domain while enabling boundary refinement using limited fine-grained data (fine domain). We further introduce an expertly validated dataset of 219 images with precise boundary annotations across diverse geographies and commodities. Extensive experiments against state-of-the-art approaches, including domain adaptation and domain incremental learning methods, demonstrate that MineC2FNet achieves superior performance while effectively handling domain shift. The dataset and code are publicly available at https://github.com/risqiutama/MineC2FNet.
    AI4GComputer VisionAI4GMachine LearningAI4GMultidisciplinary Topics and Applications
  732. #AI4G93

    Bridging the SEA Gap: An Initial Benchmark for Neural Audio Codec-Synthesized Speech Deepfakes in South-East Asian Languages

    Orchid Chetia Phukan, Girish ., Mohd Mujtaba Akhtar, Arun Balaji Buduru
    Codecfakes (CFs) are a type of speech deepfake generated through Audio Language Models (ALMs), with Neural Audio Codecs (NACs) forming the core mechanism for speech encoding and generation. CFs exhibit distributional characteristics that differ from vocoder-based deepfakes, causing detectors trained on vocoder data to generalize poorly to CF detection. Although this has led to the development of CF detection benchmarks, existing resources are largely confined to English and, to a limited extent, Chinese, leaving South-East Asian (SEA) languages unexplored. To bridge this gap, we introduce SEA-CF, the first large-scale benchmark for CF detection spanning multiple SEA languages, diverse speaker profiles, and a wide range of NAC architectures. SEA-CF is constructed by synthesizing publicly available real speech corpora. Our experiments show that state-of-the-art (SOTA) CF detectors trained on English-centric datasets fail to generalize to SEA speech due to language-specific phonetic structures, tonal variations, and rich prosodic diversity. We further conduct a comprehensive zero-shot and fine-tuned evaluation of recent SOTA ALMs on SEA-CF. Fine-tuning the ALMs improves performance; however, their scale makes them impractical for real-world application, particularly in low-resource and latency-constrained settings. To address this limitation, we propose a novel small-ALM, GARUDA, tailored for CF detection, which delivers strong performance while remaining lightweight. Extensive evaluations demonstrate that the proposed small-ALM outperforms strong end-to-end and ALM-based baselines, establishing a new, practical direction for robust CF detection in SEA languages and beyond.
    AI Ethics, Trust, FairnessAI Ethics, Trust, FairnessNatural Language ProcessingNatural Language Processing
  733. #AI4G96

    Context-Aware Concept Distillation for Trustworthy Flood Prediction

    Eli Levinkopf, Efrat Morin, Claudia V. Goldman
    Effective flood risk management relies on accurate forecasting, yet the "black box" nature of state-of-the-art Deep Learning models creates a barrier to trust and accountability in high-stakes public safety decisions. While existing Explainable AI (XAI) methods offer local attributions, they fail to provide the verifiable, operationally meaningful causal narratives required by disaster response authorities. To address this societal challenge, we propose Context-Aware Concept Distillation (CACD), a framework developed in collaboration with domain experts to distill opaque LSTMs into interpretable, hydrology-aware surrogate models. We introduce an unsupervised pipeline to discover a "Hydrological Language" and a Residual Hypernetwork that dynamically modulates these concepts based on static basin characteristics. Evaluated on 5,203 basins globally, our model achieves high fidelity (Median NSE 0.70), significantly outperforming black-box baselines (e.g., Multi-Layer Perceptrons) on unseen future data. By demonstrating that human-interpretable concepts are sufficient to reconstruct flood dynamics, this work balances AI accuracy with the transparency required for responsible environmental decision-making.
    AI4GAI Ethics, Trust, FairnessAI4GHumans and AIAI4GMachine LearningAI4GMultidisciplinary Topics and Applications
  734. #AI4G101

    HighFM: Towards a Foundation Model for Learning Representations from High-Frequency Earth Observation Data

    Stella Girtsou, Konstantinos Alexis, Giorgos Giannopoulos, Charalampos Kontoes
    The increasing frequency and severity of climate-related disasters have intensified the need for real-time monitoring, early warning, and informed decision-making. Earth Observation (EO), powered by satellite data and Machine Learning (ML), offers powerful tools to meet these challenges. Foundation Models (FMs) have revolutionized EO ML by enabling general-purpose pretraining on large-scale remote sensing datasets. However, most existing models rely on high-resolution satellite imagery with low revisit rates, limiting their suitability for fast-evolving phenomena and time-critical emergency response. In this work, we present HighFM, a first-cut approach towards an FM for high-temporal-resolution, multispectral EO data. Leveraging over 2 TB of SEVIRI imagery from the Meteosat Second Generation (MSG) platform, we adapt the SatMAE masked autoencoding framework to learn robust spatiotemporal representations. To support real-time monitoring, we enhance the original architecture with fine-grained temporal encodings to capture short-term variability. The pretrained models are then fine-tuned on cloud masking and active fire detection tasks. We benchmark our SEVIRI-pretrained Vision Transformers against traditional baselines and recent geospatial FMs, demonstrating consistent gains across both balanced accuracy and IoU metrics. Our results highlight the potential of temporally dense geostationary data for real-time EO, offering a scalable path toward foundation models for disaster detection and tracking.
    Computer VisionComputer VisionMachine LearningMachine Learning
  735. #AI4G105

    Brazilian Indigenous Languages Revitalization at Scale: Reducing Cost and Development Time for Low-Resource Language Courses

    Gustavo Polleti, Fabricio Gerardi, Carolina Aragon, Fernando Bororo, Mariel Kujiboekureu, Wayali Salomã, Fabio Cozman
    Brazil is home to about 180 Indigenous languages, which span a wide range of sociolinguistic circumstances, from critically endangered to relatively stable. Across this continuum, Indigenous communities often lack pedagogical resources and are underserved by existing language-learning technologies, which are typically designed for high-resource languages and assume solid connectivity and large datasets. This research project proposes AI-assisted tools and workflows for the collection and annotation of textual and speech data that substantially reduce the time and cost required to produce engaging language-learning game apps. Our goal is to implement a language-learning game app that Indigenous students can use to practice their reading, writing and speaking skills at home. We propose novel speech processing models for low-resource Indigenous languages and offline support in low-connectivity environments. Our project adopts a co-creation model that actively fosters collaboration between Indigenous educators, linguists, and youth, is adapted to their context, and complies with ethical guidelines. We outline an implementation plan with Bororo and Enawene Nawe communities to test our methods and, potentially, produce an AI-driven platform for Indigenous language education that is applicable across diverse sociolinguistic contexts in Brazil and beyond.
    AI Ethics, Trust, FairnessAI Ethics, Trust, FairnessHumans and AIHumans and AIMultidisciplinary Topics and ApplicationsMultidisciplinary Topics and ApplicationsNatural Language ProcessingNatural Language Processing
  736. #AI4G108

    Deep Reinforcement Learning Enhanced Semi-supervised Graph Neural Network for Credit Card Fraud Detection

    Huilin He, Kun Zhu, Zewen Hu, Jie Wang, Dawei Cheng
    Credit card fraud threatens global payment ecosystems, causing billions in losses and undermining public trust. Efficient fraud detection remains challenging due to surging transaction volumes and evolving tactics. While Graph Neural Networks (GNNs) excel at modeling structural relationships, they struggle in real-world scenarios characterized by label scarcity and often overlook discriminative feature-level signals, leaving rich risk signals underutilized without costly manual engineering. To address this, we propose DRESS, a Deep Reinforcement Learning (DRL) Enhanced Semi-supervised GNN framework. It employs a DRL agent to automatically capture and enhance feature-level risks, fusing them with graph-based structural risks and propagating via a gated temporal attention network for final prediction. To mitigate inefficient exploration of the DRL module, we incorporate a feature self-attention layer to weigh feature contributions to fraud detection and employ self-supervised intrinsic rewards to help optimize the DRL module efficiently. Extensive experiments on real-world datasets demonstrate that DRESS outperforms state-of-the-art methods, especially in low-label scenarios with only 2%–10% labeled samples. By empowering resource-limited institutions to combat fraud and prevent financial loss, DRESS secures the digital trust essential for inclusive growth, contributing to AI for poverty alleviation and economic development.
    Data MiningData MiningKnowledge Representation and ReasoningKnowledge Representation and Reasoning
  737. #AI4G121

    Improving Scientific Formula Verbalization in Large Speech Language Models for Accessible Learning

    Xueyi Li, Tianqiao Liu, Zitao Liu, Teng Guo, Yongdong Wu
    Online learning systems provide accessible learning opportunities for blind or low-vision students. To support access to complex scientific materials, the speech models used in these systems need to deliver accurate scientific formula verbalization. While recent large speech language models (LSLMs) provide remarkable low-latency streaming capabilities, their potential for scientific formula verbalization remains underexplored. In this paper, we propose Formula-Speech, the first end-to-end LSLM designed for scientific formula verbalization. Specifically, we construct two high-quality scientific formula datasets with educational experts to align speech models with scientific formula verbalization patterns. We then adopt a lightweight and effective two-stage training framework, combining supervised fine-tuning for basic formula-to-speech alignment with reinforcement learning guided by a custom reward function to optimize for human-preferred verbalization. Experimental results show that our model significantly improves the verbalization performance of LSLMs and achieves state-of-the-art results across multiple scientific domains.
    AI4GMultidisciplinary Topics and ApplicationsAI4GHumans and AIAI4GData Mining
  738. #AI4G122

    AG-STELLA: Spatio-Temporal Learning for Water-related Agricultural Land Use Activity Mapping with AlphaEarth

    Nibir Chandra Mandal, Oishee Bintey Hoque, Kyle Luong, Samarth Swarup, Kirti Rajagopalan, Mandy Wilson, Abhijin Adiga, Madhav Marathe
    Accurate mapping of agricultural land use activity, particularly long-term transition from cropland to pasture and short-term transition between cropland and fallow land, is essential for sustainable water management, drought response, and food-system resilience, which directly supports United Nations Sustainable Development Goals (SDG-2 and SDG-8). However, reliable land use activity mapping is challenging due to spectral ambiguity, temporal irregularities, severe class imbalance, and limited generalization across agricultural regions. In this work, we propose AG-STELLA, a knowledge-guided spatiotemporal model that (i) captures temporal changes of agricultural lands using pretrained spatiotemporal transformers; (ii) integrates geospatial context using AlphaEarth embedding; (iii) introduces a temporal transition latent space with temporal consistency constraints; (iv) employs guidance through hydroclimatic consistency; and (v) uses a land use-aware gated decoder to improve robustness across regions. Through experimentation across three water-stressed U.S. states, we show consistent gains over baseline vision and foundation models, achieving up to 27% F1-score improvement for pasture (minority class) and 16% overall. We further show the robustness across heterogeneous regions through cross-state transfer learning, where AG-STELLA consistently outperforms foundation model baselines and achieves up to 82.3% F1 for fallow land with a 9.6% improvement over the best foundation model.
    Machine LearningMachine LearningComputer VisionComputer VisionMultidisciplinary Topics and ApplicationsMultidisciplinary Topics and Applications
  739. #AI4G124

    Rule-Bottleneck RL: Learning to Decide and Explain for Sequential Resource Allocation via LLM Agents in Public Health

    Guojun Xiong, Mauricio Tec, Haichuan Wang, Francesca Dominici, Joseph Ngonzi, Adeline Boatin, Milind Tambe
    Reducing preventable maternal mortality remains a global health priority. Under Sustainable Development Goal (SDG) target 3.1, the WHO emphasizes timely and equitable allocation of limited maternal health resources. Motivated by the Departments of Obstetrics and Gynecology at several important hospitals in Uganda and Ghana, we study the problem of sequential allocation of wearable vital sign monitoring devices among mothers. While deep reinforcement learning (RL) has shown promise for sequential resource allocation, its limited interpretability hinders adoption in such high-stakes settings. In contrast, large language model (LLM) agents provide human-readable reasoning but often struggle with effective long-term decision making. To bridge this gap, we introduce Rule-Bottleneck RL (RBRL), the first LLM agent framework for resource allocation problems that jointly optimizes a language-based decision policy and explainability. At each step within RBRL, an LLM first generates candidate rules: language statements capturing decision priorities tailored to the current state. RL then optimizes rule selection to maximize environmental rewards and explainability, with the LLM acting as a judge. Finally, an LLM chooses the action (optimal allocation) based on the rule. We provide conditions for RBRL performance guarantees as well as the finite-horizon evaluation gap of the learned RBRL policy. Experiments in maternal health show that RBRL outperforms baseline LLM agents and approaches the performance of deep RL, while producing clearer, policy-relevant explanations. Human evaluations further confirm improved trust and usability, demonstrating RBRL as a practical AI approach aligned with SDG target 3.1.
    Multidisciplinary Topics and ApplicationsMultidisciplinary Topics and ApplicationsAgent-based and Multi-agent SystemsAgent-based and Multi-agent SystemsPlanning and SchedulingPlanning and Scheduling
  740. #AI4G134

    PoemDirector: A Multi-Agent Context-Adaptive Instructional Mode Selection Framework for Chinese Classical Poetry Video Generation

    Tengteng Cheng, Xiaoli Zeng, Jialu Huang, Mingliang Hou, Zitao Liu, Xiangyu Zhao, Weiqi Luo
    Classical poetry is a significant component of aesthetics and cultural inheritance in China's K–12 language education, and web-based instructional videos have become the primary way students can learn about classical poetry. However, current approaches have failed to produce both high-quality explanations and large-scale automatic videos that teach classical poetry. Teachers are responsible for ensuring instructional design in semi-automated processes, but the time and expense of production are too great. Although end-to-end fully automated generation is efficient, the logic is unclear and the explanations are shallow because there is no systematic instructional design. The majority of current approaches lack an end-to-end generative framework with a hierarchical explanation structure for deep understanding, instructional intelligence with context-adaptive capabilities, and clear teaching objectives. In this paper, a multi-agent framework called PoemDirector is proposed that combines context-adaptive strategies, multi-layered explanations, and pedagogically grounded end-to-end generation into one framework. Based on the poem and the situation, the director agent in PoemDirector creates a structured creative blueprint and organizes for other agents to collaborate in order to build a link for creating instructional videos on classical poetry. In the meantime, we further establish a multi-dimensional evaluation framework for instructional effectiveness and poetic presentation, and conduct comparative studies based on both this framework and extra objective video quality metrics.
    The results demonstrated that PoemDirector significantly lowered labor costs and outperformed the baseline in a number of metrics, thereby resolving the conflict between high-quality instruction and mass production.
    Agent-based and Multi-agent SystemsAgent-based and Multi-agent SystemsHumans and AIHumans and AINatural Language ProcessingNatural Language Processing
  741. #AI4G136

    NeuroDALEC: A Differentiable and Interpretable Mass-Conserving Framework for Terrestrial Ecosystem Carbon Cycle Dynamics

    Meng Wan, Tiantian Liu, Xia Zhixin, Ningming Nie, Jue Wang, Rongqiang Cao, Honglin He, Xiaoli Ren, Peng Shi, Yangang Wang
    Accurate simulation of terrestrial ecological carbon cycles is crucial for global climate change and ecosystem management. Process-based carbon models have high interpretability, but suffer from insufficient accuracy and slow computation due to fixed parameters. In contrast, deep-learning carbon models achieve high accuracy, but disregard physical principles, which prevents ecologists from explaining ecosystem dynamics. We propose NeuroDALEC, an interpretable framework that embeds the DALEC carbon-cycle model within a neural network, enabling differentiable computation of ecological processes. Key parameters and ensemble learning strategies are designed, and mass-conserving carbon pool state transition equations are introduced to ensure physical consistency. Experiments show NeuroDALEC outperforms existing models in both accuracy and efficiency. Moreover, it provides sufficient interpretability by predicting all components of the carbon cycle. Deployed in a real-time carbon assimilation system, NeuroDALEC supports daily carbon forecasting and decision-making. This work contributes to the United Nations' Sustainable Development Goals 13 (Climate Action) and 15 (Life on Land). The source code is available at: https://osf.io/ubcv4/overview?view_only=ac8753c98677438180e82926ae898aba.
    Multidisciplinary Topics and ApplicationsMultidisciplinary Topics and ApplicationsMachine LearningMachine Learning
  742. #AI4G141

    UniST-Pred: A Robust Unified Framework for Spatio-Temporal Traffic Forecasting in Transportation Networks Under Disruptions

    Yue Wang, Djellel Difallah, Areg Karapetyan, Samer Madanat
    Spatio-temporal traffic forecasting is a core component of intelligent transportation systems, supporting various downstream tasks such as signal control and network-level traffic management. In real-world deployments, forecasting models must operate under structural and observational uncertainties, conditions that are rarely considered in model design. Recent approaches achieve strong short-term predictive performance by tightly coupling spatial and temporal modeling, often at the cost of increased complexity and limited modularity. In contrast, efficient time-series models capture long-range temporal dependencies without relying on explicit network structure. We propose UniST-Pred, a unified spatio-temporal forecasting framework that first decouples temporal modeling from spatial representation learning, then integrates both through adaptive representation-level fusion. To assess robustness of the proposed approach, we construct a dataset based on an agent-based, microscopic traffic simulator (MATSim) and evaluate UniST-Pred under severe network disconnection scenarios. Additionally, we benchmark UniST-Pred on standard traffic prediction datasets, demonstrating its competitive performance against existing well-established models despite a lightweight design. The results illustrate that UniST-Pred maintains strong predictive performance across both real-world and simulated datasets, while also yielding interpretable spatio-temporal representations under infrastructure disruptions. The source code and the generated dataset are available at https://anonymous.4open.science/r/UniST-Pred-EF27.
    Multidisciplinary Topics and ApplicationsMultidisciplinary Topics and ApplicationsMachine LearningMachine Learning
  743. #AI4G142

    Integrating Atmospheric Dispersion Modeling Priors into Cuboid Splatting for Spatiotemporal Reconstruction of Airborne Radioiodine After Nuclear Accidents

    Mareike Böckel, Stephan Doerfel, Kathrin Meisenberg, Oliver Meisenberg, Max Friedrich, Mattis Hartwig
    In nuclear power plant accidents, airborne radioiodine poses major health risks, making reliable reconstructions of its spatiotemporal distribution crucial for emergency management. Current state-of-the-art prognosis systems use atmospheric dispersion modeling but ignore posterior evidence from emergency care centers, comprising movement profiles and thyroid measurements of affected individuals. A first study showed that the AI method Cuboid Splatting can reconstruct iodine air concentrations from such data, but it ignores simulations from established prognosis systems.
    Our multidisciplinary team extends Cuboid Splatting by incorporating these simulations as priors and subsequently correcting them using movement and thyroid data. Several ways to translate and correct priors are developed. The best-performing approaches are combined into a novel Cuboid Splatting-with-prior mechanism, which we evaluate using constructed prior scenarios representing different error types and intensities.
    Using Cuboid Splatting-with-prior yields more accurate reconstructions than (i) the used dispersion simulations alone and (ii) plain Cuboid Splatting without prior. Across reconstructions, the mean scenario error is 19.6%, improving on (i) by 28.0pp and on (ii) by 89.9pp, the latter with particularly large gains at high spatial resolution. These results demonstrate that combining simulation-based priors with measurement-based posterior inference can substantially improve the reconstruction of iodine air activity concentrations in nuclear emergencies.
    Data MiningData MiningMultidisciplinary Topics and ApplicationsMultidisciplinary Topics and ApplicationsMachine LearningMachine Learning
  744. #AI4G144

    ICFD-31k: A Large-Scale Dataset and Benchmark for Real-Time Conversational Fraud Detection

    Rishi Ahuja, Dr. Kumar Prateek, Dr. Simranjit Singh
    The proliferation of sophisticated telephone scams poses a significant societal and economic threat, impacting diverse linguistic contexts in a country like India. Furthermore, the lack of large-scale, publicly available datasets remains a critical barrier impacting research on robust, real-time countermeasures. In view of this, the proposed work introduces ICFD-31k, the first Indian Conversational Fraud Dataset, representing a new benchmark containing over 31,000 realistic conversational transcripts. ICFD-31k comprises systematically generated content, covering 10 distinct fraud umbrellas spanning from financial impersonation to job scams. ICFD-31k transcripts feature rich annotations comprising a final verdict, chunk-level streaming labels, and detailed "slow-thinking" rationales. In addition, the human-in-the-loop evaluation validates ICFD-31k's quality, achieving a Cohen's Kappa of 0.534 that confirms annotation reliability. Furthermore, the proposed work introduces two fine-tuned models based on RoBERTa: M1 for non-streaming data and M2 for streaming data. The comprehensive experiments with strong baselines (M1, M2) further demonstrate ICFD-31k's utility.
    Multidisciplinary Topics and ApplicationsMultidisciplinary Topics and ApplicationsAI Ethics, Trust, FairnessAI Ethics, Trust, FairnessMachine LearningMachine LearningNatural Language ProcessingNatural Language Processing
  745. #AI4G148

    Clinically-Oriented Screening Model for Diabetic Retinopathy Severity Grading and Diabetic Macular Edema Detection

    Sanchika Menezes, Rohan Chawla, Nawazish Shaikh, Pradeep Venkatesh, Radhika Tandon, Srinivas Rana
    Diabetic retinopathy (DR) and diabetic macular edema (DME) are leading causes of preventable blindness worldwide. Automated screening tools are critical for timely detection at scale, particularly in low-resource settings where access to ophthalmologists is limited. We propose DRDME-Net, a deployment-driven joint learning framework that formulates DR grading as an ordinal regression task and DME detection via a continuous surrogate, rather than conventional classification. This design yields stable risk scores tightly aligned with operational clinical decision-making thresholds. Evaluation on facility and community cohorts demonstrates that DRDME-Net achieves strong performance across severity boundaries. Insights from an initial feasibility pilot further demonstrate its scalability in real-world workflows. These results highlight the potential of DRDME-Net to expand equitable access to timely detection, reduce preventable vision loss, and provide a practical template for integrating AI into population screening initiatives.
    Computer VisionComputer VisionMachine LearningMachine LearningMultidisciplinary Topics and ApplicationsMultidisciplinary Topics and Applications
  746. #AI4G162

    FlowID: Enhancing Forensic Identification with Latent Flow-Matching Models

    Jules Ripoll, David Bertoin, Charles Dossal, Alasdair Newson, Jose Pablo Baraybar
    Every day, many people die under violent circumstances, whether from crimes, war, migration, or climate disasters.
    Medico-legal and law enforcement institutions document many portraits of the deceased for evidence, but cannot immediately carry out identification on them.
    While traditional image editing tools can process these photos for public release, the workflow is lengthy and produces suboptimal results.
    In this work, we leverage advances in image generation models, which can now produce photorealistic human portraits, to introduce FlowID, an identity-preserving facial reconstruction method. Our approach combines single-image fine-tuning, which adapts the generative model to out-of-distribution injured faces, with attention-based masking that localizes edits to damaged regions while preserving identity-critical features.
    Together, these components enable the removal of artifacts from violent death while retaining sufficient identity information to support identification.
    To evaluate our method, we introduce InjuredFaces, a novel benchmark for identity-preserving facial reconstruction under severe facial damage.
    Beyond serving as an evaluation tool for this work, InjuredFaces provides a standardized resource for the community to study and compare methods addressing facial reconstruction in extreme conditions.
    Experimental results show that FlowID outperforms state-of-the-art open-source methods while maintaining low memory requirements, making it suitable for local deployment without compromising data privacy.
    AI Ethics, Trust, FairnessAI Ethics, Trust, FairnessMultidisciplinary Topics and ApplicationsMultidisciplinary Topics and ApplicationsMachine LearningMachine LearningHumans and AIHumans and AIComputer VisionComputer Vision
  747. #AI4G169

    Agents Trusting Agents? Restoring Lost Capabilities with Inclusive Healthcare

    Alba Aguilera, Georgina Curto, Nardine Osman, Ahmed Al-Awah
    Agent-based simulations have an untapped potential to inform social policies on urgent human development challenges in a non-invasive way, before these are implemented in real-world populations. This paper responds to the request from non-profit and governmental organizations to evaluate policies under discussion to improve equity in health care services for people experiencing homelessness (PEH) in the city of Barcelona. With this goal, we integrate the conceptual framework of the capability approach (CA), which is explicitly designed to promote and assess human well-being, to model and evaluate the behaviour of agents who represent PEH and social workers. We define a reinforcement learning environment where agents aim to restore their central human capabilities, under existing environmental and legal constraints. We use Bayesian inverse reinforcement learning (IRL) to calibrate profile-dependent behavioural parameters in PEH agents, modeling the degree of trust and engagement with social workers, which is reportedly a key element for the success of the policies in scope. Our results open a path to mitigate health inequity by building relationships of trust between social service workers and PEH.
    Multidisciplinary Topics and ApplicationsMultidisciplinary Topics and ApplicationsAgent-based and Multi-agent SystemsAgent-based and Multi-agent Systems
  748. #AI4G181

    Directional Hallucinations: Ideological Drift in News-Grounded LLM Question Answering

    Chendi Wang, Liam Cunningham, Tom Yishay, Jieying Chen
    Large language models (LLMs) are increasingly used to answer questions about political information, including in election-adjacent information settings where factual errors and ideological distortions are high-stakes. We present a reproducible measurement framework that treats hallucinations, unsupported statements in document-grounded QA, as diagnostic signals of ideological drift. Using 21,727 expert-labeled U.S. political news articles from QBias spanning left, center, and right sources, we (i) generate an article-specific question, (ii) elicit document-grounded answers from three open-source LLMs, (iii) detect sentence-level hallucinations via reference-based comparison, (iv) classify the ideological valence of hallucinated sentences with a fine-tuned stance classifier, and (v) probe output logits to relate token-level uncertainty to hallucination and drift. Hallucination rates vary substantially across models and concentrate in contentious topics, while source-ideology differences in hallucination frequency are modest. In contrast, hallucination content exhibits robust leftward drift: a majority of hallucinated sentences are classified as left-leaning, including among hallucinations generated from right-leaning sources. Logit-level analysis shows hallucinations arise in high-entropy generation contexts, and in some models uncertainty also predicts leftward drift, consistent with an "uncertainty → guessing" mechanism. In advisory consultation with an election administration stakeholder, we discuss implications for auditing AI-mediated political information and for designing safeguards in election-relevant deployments.
    Uncertainty in AI · Natural Language Processing · Multidisciplinary Topics and Applications · AI Ethics, Trust, Fairness
  749. #AI4G183

    FACET: Multi-Agent AI Supporting Teachers in Scaling Differentiated Learning for Diverse Students

    Jana Gonnermann-Müller, Jennifer Haase, Nicolas Leins, Moritz Igel, Konstantin Fackeldey, Sebastian Pokutta
    Classrooms are becoming increasingly heterogeneous, comprising learners with diverse performance and motivation levels, language proficiencies, and learning differences such as dyslexia and ADHD. While teachers recognize the need for differentiated instruction, growing workloads create substantial barriers, making differentiated instruction an ideal that is often unrealized in practice. Current AI educational tools, which promise differentiated materials, are predominantly student-facing and performance-centric, ignoring other aspects that shape learning outcomes.
    We introduce FACET, a teacher-facing multi-agent framework designed to address these gaps by supporting differentiation that accounts for motivation, performance, and learning differences. Developed with educational stakeholders from the outset, the framework coordinates four specialized agents (learner simulation, diagnostic assessment, material generation, and evaluation) within a teacher-in-the-loop design.
    School principals (N = 30) shaped system requirements through participatory workshops, while in-service K–12 teachers (N = 70) evaluated material quality. Mixed-methods evaluation demonstrates strong perceived value for inclusive differentiation. Practitioners emphasized both the urgent need arising from classroom heterogeneity and the importance of maintaining pedagogical autonomy as a prerequisite for adoption. We discuss implications for future school deployment and outline partnerships for longitudinal classroom implementation.
    Agent-based and Multi-agent Systems · Humans and AI · Multidisciplinary Topics and Applications
  750. #AI4G191

    Democratizing Ski Safety: Real-Time Turn Segmentation with Smartphone IMU and Causal LSTM Networks

    Michał Szymocha, Piotr Kacprzak, Jakub Robak, Wojciech Turek
    Anterior cruciate ligament (ACL) injury is one of the most common and serious injuries in sports, particularly among recreational skiers. Research shows that structured technique awareness and continuous feedback can significantly reduce the risk of such injuries, yet access to professional instructors is limited to wealthy athletes who can afford continuous private coaching, creating a harmful inequity in injury prevention. This gap can be mitigated by automating the real-time analysis of skiing technique and making it available to the wider recreational skiing community. The approach relies exclusively on inertial sensors embedded in standard smartphones, eliminating the need for specialized equipment and enabling broad social scalability. To support immediate feedback, the system operates causally, producing predictions based solely on past observations. The work is conducted in cooperation with professional ski instructors, ensuring that problem formulation, data annotation, and result evaluation reflect real-world coaching practices and injury prevention needs. The model is evaluated using Leave-One-Subject-Out validation on a public, in-the-wild dataset, where it generalizes robustly across skiers, achieving an average directional accuracy of 89.8% while maintaining extremely low inference latency suitable for on-device mobile deployment. This work outlines a practical pathway to democratizing injury prevention in recreational sports.
    Machine Learning · Data Mining · Humans and AI
  751. #AI4G198

    Column Generation for the Micro-Transit Zoning Problem

    Hins Hu, Rishav Sen, Jose Paolo Talusan, Abhishek Dubey, Aron Laszka, Samitha Samaranayake
    Along with the rapid development of new urban mobility options like ride-sharing over the past decade, on-demand micro-transit services stand out as a middle ground, bridging the gap between fixed-line mass transit and single-request ride-hailing, balancing ridership maximization and travel time minimization. Micro-transit adoption can have significant social impact. It improves urban sustainability, through lower energy consumption and reduced emissions, while enhancing equitable mobility access for disadvantaged communities, thanks to its lower vehicle miles per passenger, flexible schedules, and affordable pricing. However, effective operation of micro-transit services requires planning geo-fenced zones in advance, which involves solving a challenging combinatorial optimization problem. Existing approaches first enumerate candidate zones and then select a fixed number of optimal zones in a second step. In this paper, we generalize the Micro-Transit Zoning Problem (MZP) to allow a global budget rather than imposing a size limit for candidate zones. We also design a Column Generation (CG) framework to solve the problem exactly and several pricing heuristics to accelerate computation. Extensive numerical experiments across major U.S. cities demonstrate that our approach produces higher-quality solutions more efficiently and scales better in the generalized setting.
    Constraint Satisfaction and Optimization · Multidisciplinary Topics and Applications · Planning and Scheduling
  752. #AI4G199

    A Hybrid Modeling Framework for Crop Prediction Tasks via Dynamic Parameter Calibration and Multi-Task Learning

    William Solow, Paola Pesantez-Cabrera, Markus Keller, Lav Khot, Sandhya Saisubramanian, Alan Fern
    Accurate prediction of crop states (e.g., phenology stages and cold hardiness) is essential for timely farm management decisions such as irrigation, fertilization, and canopy management to optimize crop yield and quality. While traditional biophysical models can be used for season-long predictions, they lack the precision required for site-specific management. Deep learning methods are a compelling alternative, but can produce biologically unrealistic predictions and require large-scale data. We propose a hybrid modeling approach that uses a neural network to parameterize a differentiable biophysical model and leverages multi-task learning for efficient data sharing across crop cultivars in data-limited settings. By predicting the parameters of the biophysical model, our approach improves prediction accuracy while preserving biological realism. Empirical evaluation using real-world and synthetic datasets demonstrates that our method improves prediction accuracy by 60% for phenology and 40% for cold hardiness compared to deployed biophysical models. Project site: https://tinyurl.com/DMC-MTL-Site.
    Machine Learning · Multidisciplinary Topics and Applications
  753. #AI4G202

    A Gloss-Driven Indian Sign Language Production System Using Learned Pose Representations

    Suvajit Patra, Arkadip Maitra, Swami Punyeshwarananda, Soumitra Samanta
    A Sign Language Production (SLP) system translates spoken or written language into sign language, enabling accessible communication between the deaf/hard-of-hearing and the hearing population. Although it is one of the most widely used sign languages globally, Indian Sign Language (ISL) is a very low-resource language and lacks such SLP systems. This paper presents a scalable and modular SLP framework based on a Sign-Pose-VQ-VAE model, designed for low-resource settings. The model learns discrete pose representations (codes) by disentangling body, left-hand, and right-hand keypoints, enabling efficient pose modeling and co-articulated sign generation. The proposed system is evaluated using a Hindi movie subtitle corpus coupled with an off-the-shelf back-translation model and achieves a gloss BLEU-4 score of 47.20. The system-generated signs receive an average rating of 4.33/5 from certified ISL interpreters and a BERT precision of 0.7683 on glosses. In addition, the proposed system achieves state-of-the-art performance among keypoint-based methods on the PHOENIX14T benchmark, with a BLEU-4 score of 10.03.
    Natural Language Processing · Computer Vision · Humans and AI · Multidisciplinary Topics and Applications
  754. #AI4G206

    Climate Surrogates for Scalable Multi-Agent Reinforcement Learning: A Case Study with CICERO-SCM

    Oskar Bohn Lassen, Serio Agriesti, Filipe Rodrigues, Blaz Kurnik, Francisco Pereira
    Climate policy analysis requires models that capture multi-gas climate effects, but such models are too slow to embed in reinforcement learning loops at scale.
    In collaboration with a pan-European public-sector environmental agency, we develop a multi-agent reinforcement learning (MARL) framework that integrates a high-fidelity climate surrogate as the environment transition, enabling regional agents to learn policies under multi-gas dynamics.
    We train a recurrent surrogate on 20,000 multi-gas emission pathways to emulate CICERO-SCM.
    The surrogate achieves near-simulator accuracy (global-mean temperature RMSE 0.0004) with 1000x faster one-step inference and yields a 100x end-to-end MARL training speed-up.
    We show policy agreement with the simulator in tractable settings and propose a replay- and rank-consistency test (Kendall's τ) for assessing policy fidelity when simulator-in-the-loop training is infeasible.
    This enables large-scale multi-agent policy experiments while retaining high-fidelity multi-gas climate response.
    Agent-based and Multi-agent Systems · Humans and AI · Machine Learning
  755. #AI4G207

    Addressing Overcommitment in the Reasoning of Gendered Economic Memes Under Multimodal Ambiguity

    Kushal Kanwar, Dushyant Singh Chauhan, Kapil Rana, Gopendra Vikram Singh, Nils Lukas
    Multimodal meme understanding is increasingly used to analyze socially sensitive content, yet existing models often exhibit biased behavior when interpreting economic dependence and social roles under ambiguity. Many memes express economic relationships through sparse text or symbolic visual cues, providing insufficient evidence for gendered attribution. In such underspecified settings, models tend to rely on pretraining correlations, leading to hallucinated and stereotypical economic role assignments. In this work, we study gendered economic dependence in image-text memes through the lens of contextual sufficiency and identify epistemic overcommitment—inferring roles without adequate evidence—as a primary source of bias. We propose CGER-Net, a context-grounded multimodal framework that estimates whether the input provides sufficient evidence for gendered economic reasoning and applies evidence-gated inference to enable confident attribution when cues are explicit while favoring principled abstention otherwise. We evaluate CGER-Net on EconMeme-GE, a curated dataset of image-text memes annotated as Men, Women, Neutral, or Ambiguous. Across strong contemporary multimodal baselines, CGER-Net reduces Gender Overcommitment Rate by up to 44% on ambiguous instances while maintaining comparable accuracy on unambiguous cases. Human evaluation further shows that 79% of generated rationales are judged as epistemically aligned with the available evidence. These results highlight the importance of modeling when not to infer for reliable and responsible multimodal analysis.
    AI Ethics, Trust, Fairness · Humans and AI · Multidisciplinary Topics and Applications
  756. #AI4G218

    SMaRT: Online Reusable Resource Assignment and an Application to Mediation in the Kenyan Judiciary

    Shafkat Farabi, Didac Marti Pinto, Wei Lu, Manuel Ramos-Maqueda, Sanmay Das, Antoine Deeb, Anja Sautmann
    Motivated by the problem of assigning mediators to cases in the Kenyan judicial system, we study an online resource allocation problem where incoming tasks (cases) must be immediately assigned to available, capacity-constrained resources (mediators). The resources differ in their quality, which may need to be learned. In addition, resources can only be assigned to a subset of tasks that overlaps to varying degrees with the subset of tasks other resources can be assigned to. The objective is to maximize task completion while satisfying soft capacity constraints across all the resources. The scale of the real-world problem poses substantial challenges, since there are over 2000 mediators, and a multitude of combinations of geographic locations (87) and case types (12) that each mediator is qualified to work on. Together, these features—unknown quality of new resources (newly onboarded mediators), soft capacity constraints (due to the mandate to assign cases without delay), and high-dimensional state space—make existing scheduling and resource allocation algorithms either inapplicable or inefficient. We formalize the problem in a tractable manner, using a quadratic program formulation for assignment and a multi-agent bandit style framework for learning. We demonstrate the key properties and advantages of our new algorithm, SMaRT (Selecting Mediators that are Right for the Task), compared with baselines on some stylized instances of the mediator allocation problem. We then turn to considering its application on real-world data on cases and mediators from the Kenyan judiciary. SMaRT outperforms baselines and allows for controlling the tradeoff between the strictness of the capacity constraints and overall case resolution rates, both in situations where mediator quality is known beforehand and when the problem is bandit-like in that learning is part of the problem definition. 
    On the strength of these results, we plan to run a randomized controlled trial with SMaRT in the judiciary in the near future.
    AI Ethics, Trust, Fairness · Agent-based and Multi-agent Systems
  757. #AI4G222

    White-Hat Testing for the Ballot Box: A Framework for Election AI Auditing

    Chendi Wang, Jieying Chen
    Recent research shows that conversational AI can shift voter preferences, with effects persisting for weeks. Yet frontier models exhibit a documented "persuasion–reliability tradeoff," producing hallucinated or systematically distorted election information. Despite these risks, election officials lack standardized tools to systematically evaluate AI systems before deployment. We propose CivicAudit-Bench, a stakeholder-guided auditing framework to stress-test large language models for civic hallucinations, false confidence, jurisdiction-dependent failure, and asymmetric refusals/accuracy. This framework introduces a modular, counterfactual, and severity-aware auditing methodology that integrates roll-call–based alignment modeling, entity-swap probing, and jurisdiction-conditional correctness criteria. Informed by engagement with the U.S. Election Assistance Commission, the toolkit consists of three modules:
    (1) PoliBias-US, a multi-indicator alignment screen combining Congressional roll-call ideology scaling with party-cue counterfactual sensitivity, persona robustness, and narrative-framing alignment; (2) HalluBias-Election, an evidence-linked benchmark that measures hallucinations, severity-weighted critical errors, and asymmetries via Entity-Swap Counterfactual Probing and a jurisdiction-safe completion criterion; and (3) Disclosure-Test, pre-registered experiments assessing whether transparency and calibrated-uncertainty disclosures reduce overreliance and attenuate persuasion without blocking legitimate civic information. CivicAudit-Bench outputs versioned audit scorecards and a coordinated white-hat disclosure workflow, advancing UN SDG 16 by strengthening democratic information integrity.
    AI Ethics, Trust, Fairness · Multidisciplinary Topics and Applications · Humans and AI · Uncertainty in AI
  758. #AI4G225

    STELLAR: Spatio-Temporal Environmental Learning with Latent Alignment and Refinement for Long-Tailed Species Distribution Modeling

    Shufeng Kong, Tao Yu, Yuanyuan Wei, Caihua Liu, Junwen Bai, Yingheng Wang, Marc Grimson, Daniel Fink, Carla Gomes
    Joint Species Distribution Modeling (JSDM) is a key enabler for biodiversity monitoring and conservation planning. However, accurate JSDM faces two coupled challenges: environmental drivers and species distributions are inherently spatio-temporal, while species co-occurrence patterns exhibit complex non-linear community structure and severe long-tail imbalance driven by rare species. Existing approaches often address these factors in isolation, learning from static covariates or neglecting the historical trajectories of dynamic community structure. To overcome these limitations, we propose STELLAR (Spatio-Temporal Environmental Learning with Latent Alignment and Refinement), a novel framework that learns a shared latent space where dynamic habitat context and community structure are optimized jointly. Our approach integrates three complementary components: (1) a Graph-Temporal Encoder that employs graph attention and recurrent units to aggregate spatial neighborhood effects and capture the co-evolving historical dynamics of environmental context and community structure; (2) a Context-Anchored Latent Alignment mechanism that structures the latent space using a label-activated mixture prior and supervised contrastive learning, actively clustering species based on shared environmental preferences; and (3) an Imbalance-Aware Decoupled Decoding module that utilizes Asymmetric Loss to focus learning on hard, rare species samples, preventing mode collapse in the long tail. Experiments on the large-scale eBird dataset, curated with domain experts, demonstrate that our framework significantly outperforms state-of-the-art baselines, particularly in predicting rare species and revealing interpretable species interactions.
    Multidisciplinary Topics and Applications · Data Mining · Machine Learning
  759. #AI4G226

    Quantifying Error Disparities in Population Health Models

    Aaron Marker, Salvatore Giorgi, Adithya V Ganesan, Vasudha Varadarajan, Ojas Deshpande, Laura Brandt, Gabriel Odom, Andrew Schwartz
    Many high-stakes social applications of AI, such as public health surveillance and policy planning, operate at the community level rather than the individual level. However, most model fairness research evaluates disparities at the individual or data level (i.e., document or image) and relies on metrics defined over discrete demographic categories rather than population-level demographic proportions. In this work, we first introduce the Bilateral Concentration Index (BCI) to quantify nonmonotonic error disparities missed by the category-based metrics used at the individual or data level.
    Then we conduct a large-scale audit of sociodemographic error disparities in both lexical- and transformer-based models of county-level health outcomes, over a dataset covering billions of community-mapped messages. While all tasks had significant disparity, the size varied widely depending on the outcome and model, from a BCI of 2.1% for predicting life satisfaction to 17.0% for predicting fair or poor health. We further evaluate four approaches for incorporating sociodemographic information, as potential bias mitigation strategies, finding that while demographic inclusion consistently improved predictive accuracy, it frequently amplified error disparities. The largest disparities were associated with education and income (BCI = 2.7–16.4%), often reducing accuracy for low-income (and in some cases high-income) communities. These findings highlight a critical accuracy–fairness trade-off in community-level models for public health tasks, demonstrating how seemingly beneficial modeling choices can lead to increased disparities which could disadvantage communities if used for policy decisions.
    AI Ethics, Trust, Fairness · Natural Language Processing
  760. #AI4G230

    LLM-Enhanced Knowledge and Learning Path Understanding for Graph-based Educational Recommendation

    Qingqing Liang, Shuyan Zheng, Peiwei Xia, Chunyang Wang, Xuesong Lu, Aoying Zhou
    Educational recommendations empower personalized learning by suggesting suitable learning resources to learners, and graph-based recommenders are widely adopted for this purpose. Existing methods are mainly ID-based: they initialize learners and resources with trainable identifiers and optimize their representations solely from the interaction graph. As a result, the lack of semantic understanding of learning resources and learning paths hinders further improvements in recommendation accuracy. To alleviate the problem, we propose KLU4EduRec, which leverages large language models (LLMs) to understand resource knowledge and learning paths, thereby enhancing traditional graph-based educational recommender systems. Specifically, for learning path understanding, we segment a learning path by detecting learning pattern drift in the resource knowledge sequence, and prompt LLMs to infer learners' learning patterns within each segment. The segment-level patterns are then chronologically aggregated to represent the overall learning path. In addition, we prompt LLMs to summarize the core knowledge of learning resources from their content as complementary semantic signals. Finally, the resulting semantic representations are aligned and fused with structural representations learned by a graph-based recommender to enable more accurate recommendations. We conduct extensive experiments to show that KLU4EduRec greatly outperforms existing methods, including traditional ID-based methods and recent LLM-powered methods. A case study shows how the understanding of pattern drift in a learning path leads to more suitable recommendations. A reproducibility package is available at https://anonymous.4open.science/r/KLU4EduRec-C08C.
    Multidisciplinary Topics and Applications · Knowledge Representation and Reasoning · Data Mining
  761. #AI4G237

    Sustainable Intelligence for the Wild: Democratizing Ecological Monitoring via Knowledge-Adaptive Edge Expert Agents

    Jiaxing Li, Hao Fang, Chi Xu, Miao Zhang, Jiangchuan Liu, William I. Atlas, Katrina M. Connors, Mark A. Spoljaric
    Rapid biodiversity loss underscores the urgency of effective monitoring, yet manual surveys remain resource-intensive. While on-device AI offers a scalable alternative, its performance in the wild is often challenged by environmental variability. Current methods rely heavily on cloud resources, which requires continuous uploading of field data for model retraining. This approach is unsuitable for remote deployments because it consumes limited power and network connectivity. To address these constraints, this research proposes a shift from model adaptation to knowledge adaptation. We introduce an architecture that separates visual perception from reasoning, combining a visual encoder with a dynamic knowledge base. We use an explicit knowledge base instead of implicitly encoding expert knowledge into model parameters. This method also supports knowledge sustainability by preserving expert insights in a structured form. Through cross-disciplinary collaboration with biologists and Indigenous communities, this work advances ethical AI co-development, fostering responsible and culturally informed ecosystem management.
    Humans and AI · Knowledge Representation and Reasoning · Multidisciplinary Topics and Applications · Agent-based and Multi-agent Systems
  762. #AI4G253

    Temporal Motif-aware Graph Test-time Adaptation for OOD Blockchain Anomaly Detection

    Runang He, Tongya Zheng, Huiling Peng, Yuanyu Wan, Bingde Hu, Jiawei Chen, Canghong Jin, Mingli Song, Can Wang
    Ever-evolving transaction patterns have significantly hindered anomaly detection on emerging cryptocurrency blockchains due to the vast number of addresses and diverse anomalous behaviors. Recently, advanced Graph Anomaly Detection (GAD) approaches applied to blockchains have faced two critical challenges: adversarial pattern evolution by malicious actors and the Out of Distribution (OOD) problem caused by varied transaction semantics on blockchains. To address these challenges, we propose a novel framework termed TEmporal Motif-aware Graph Test-Time Adaptation (TEMG-TTA). First, we comprehensively capture the 3-node temporal motif distribution of each active address using an efficient computational mechanism, enabling downstream temporal motif-aware graph learning. Second, we design a simple yet effective test-time adaptation strategy to facilitate the sharing of common patterns between training and testing graphs. Extensive experiments on 5 real-world datasets demonstrate that our proposed TEMG-TTA outperforms state-of-the-art GAD approaches by an average of 37.65%. A further case study on interpretable motif patterns reveals that TEMG-TTA explicitly characterizes the complex transaction patterns of anomalous addresses, thereby verifying the effectiveness of our technical designs. Our code will be made publicly available at https://anonymous.4open.science/r/TEMG-TTA/.
    Data Mining · Machine Learning
  763. #AI4G267

    ArogyaSutra: A Multi-Agent Framework for Multimodal Medical Reasoning in Indic Languages

    Tanmoy Halder, Akash Ghosh, Subhadip Baidya, Arijit Roy, Sriparna Saha
    Multimodal Large Language Models (MLLMs) have shown promising reasoning capabilities in general domains, yet their performance remains limited in specialized settings such as healthcare, particularly for multilingual and low-resource scenarios. This gap is critical in regions like rural India, where patients often express complex medical queries in native Indic languages and rely on multimodal inputs such as medical images. Existing MLLMs, predominantly trained on English-centric data, struggle to support such use cases, limiting equitable access to AI-driven healthcare assistance. To address this challenge, we construct a large-scale multilingual multimodal medical question–answer dataset named ArogyaBodha from eight heterogeneous sources, covering 31 body systems, six imaging modalities, and 21 clinical domains across English and seven major Indian languages. We further propose ArogyaSutra, an actor–critic–based multi-agent framework that combines tool grounding with dual-memory mechanisms to support step-wise, reasoning-aware decision making while explicitly retaining past mistakes to prevent their repeated occurrence. The Actor predicts the answer to the multimodal query from visual and memory states, whereas the Critic evaluates actor outcomes and delivers corrective feedback, enabling iterative refinement of the reasoning process. Experiments show that our dataset and framework improve the multilingual medical reasoning accuracy of an MLLM across all Indic languages, with ablation studies validating the effectiveness of each component. Our work supports UN SDGs 3, 4, and 10 by enabling reliable multilingual medical decision support, reducing healthcare inequities, and strengthening inclusive clinical education for underserved communities.
    Natural Language Processing · Agent-based and Multi-agent Systems
  764. #AI4G271

    PhysTrans: A Physics-Aware Transferable Framework for Global Cold-Start Photovoltaic Forecasting

    Meng Wan, Kaipeng Gao, Jue Wang, Siyan Fang, Xue Miao, Pufen Zhang, Sijie Chang, Peng Shi, Yangang Wang, Zhenbing Zhao
    With the rapid expansion of photovoltaic (PV) power generation worldwide, PV systems have become key to global energy construction. Accurate PV forecasting is essential for safe grid operation and renewable energy integration. However, most existing models rely heavily on site-specific historical data and perform poorly when deployed in cold-start scenarios of newly built power plants. We propose PhysTrans, a physics-aware transferable framework for cold-start PV forecasting. First, we design a physics-constrained residual network that utilizes a clear-sky module for better physical consistency. Furthermore, we propose a dynamic cloud cropping method to obtain the cloud information of shaded PV stations by fitting the angle of the sun offsets. To fuse the asymmetric data, a query-based asymmetric fusion mechanism is introduced to achieve high-precision alignment of multi-modal data. We conduct experiments on global datasets, and the results show that PhysTrans outperforms state-of-the-art models with a 13.2% decrease in MAE in the single-site task, and also outperforms existing migration models with an average decrease in MAE of 12.7% in the cross-site task. Our work advances reliable and transferable PV forecasting for early-stage grid integration and contributes to SDG 7 (Affordable and Clean Energy) and SDG 13 (Climate Action), in line with the Leave No One Behind principle.
    Humans and AI · Multidisciplinary Topics and Applications · Machine Learning
  765. #AI4G272

    PROB-EMOE: A Probabilistic Ensemble Mixture-of-Experts Framework for Metro Network Expansion Forecasting

    Fangyi Ding, Zhan Zhao, Zhi Li, Xudong Guo, Ning Zhang, Yamin Wang, Yihong Tang
    Forecasting Origin-Destination (OD) demand for new metro lines is critical for sustainable infrastructure planning but faces spatiotemporal out-of-distribution challenges. Existing models often struggle to capture heterogeneous interaction patterns in changing topologies and overlook inherent uncertainty and over-dispersion issues. To bridge these gaps, we propose PROB-EMOE, a planning-oriented probabilistic framework tailored for network expansion. To ensure robust generalization, we design a Mixture-of-Experts (MoE) predictor that integrates diverse expert views to capture heterogeneous demand patterns across changing topologies. To quantify extrapolation uncertainty, our framework functions as a unified probabilistic system by synergizing Deep Ensembles with a probabilistic output, effectively quantifying both data and model uncertainty. Through a systematic investigation of likelihood families, we empirically demonstrate that the Negative Binomial distribution offers the optimal fit in this context. Extensive experiments on a multi-year Shenzhen metro dataset demonstrate that our approach achieves state-of-the-art predictive performance and provides the sharpest calibrated uncertainty intervals. The framework has been deployed in a metropolitan smart-data platform to support risk-aware investment decisions.
    Data Mining · Multidisciplinary Topics and Applications · Uncertainty in AI
  766. #AI4H7

    When Vision-Language Models Meet Fetal Cardiac Ultrasound: Dual-Level Contrastive Learning for Out-of-Distribution Detection

    Ziyi Liu, Weihu Song, Yupeng Ma, Tianrui Liu, Yifan Gu, Haogang Zhu
    Recent advances in vision-language models (VLMs) have shown remarkable performance in medical image classification tasks. However, applying VLMs to fetal cardiac ultrasound (FCU) remains challenging due to compound distribution shifts, including covariate shifts caused by cross-center heterogeneity and semantic shifts arising from clinically non-standard views. To address this issue, we propose Dual-Level Contrastive Learning (DLCL), the first prompt-based VLM framework for out-of-distribution (OOD) detection in FCU, to the best of our knowledge. DLCL explicitly shapes the vision-language representation space through complementary local-level and global-level contrastive objectives. Specifically, local contrastive learning aligns instance-level features to mitigate covariate shifts, whereas global contrastive learning regularizes global prototypes to address semantic shifts. We conduct extensive experiments on a private multi-center FCU dataset and the public ISIC-OOD dataset to validate the proposed approach. On the challenging FCU task, DLCL achieves an AUROC of 89.61% and a harmonic mean of 80.35%, significantly outperforming state-of-the-art methods.
    Multimodal data · Medical imaging
  767. #AI4H15

    MTP-DDA: Enhanced Drug-Disease Associations Prediction via Multi-task Learning with Multi-view Graph Convolutional Networks and Contrastive Learning

    Ming-Li Cui, Cui-Na Jiao, Dao-Hui Ge, Chun-Hou Zheng, Hui Yang, Ying-Lian Gao, Yan-Li Wang, Jin-Xing Liu
    Predicting drug-disease associations (DDAs) plays a crucial role in drug development and disease treatment. However, existing research predominantly focuses on the single task of DDA prediction and often overlooks the intricate relationships among related tasks, which could further improve DDA prediction performance. To address this limitation, we propose MTP-DDA, a multi-task prediction framework capable of simultaneously predicting drug-disease, drug-protein, and disease-protein associations. The framework constructs three distinct graphs to reflect different relationships between biological entities. Based on these graphs, two sub-views and one main view are constructed. For each sub-view, a corruption strategy generates a corrupted view, a Graph Convolutional Network (GCN) extracts features from both the original view and its corrupted version, and contrastive learning is applied to enhance the feature representations. For the main view, GCN and Node2Vec extract low-order and high-order node features, respectively, and an attention mechanism fuses them. Finally, the node features from the three views are integrated, and a dot product is applied to the node features of each association pair to derive association scores, thereby enabling multi-task association prediction. Under 10-fold cross-validation, the proposed framework outperforms current methods on public datasets, demonstrating its effectiveness and robustness.
    AI4H: Drug discovery · AI4H: Medical knowledge representation · AI4H: Self-supervised learning · AI4H: Health data mining · AI4H: Precision medicine
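    The final scoring step described in the MTP-DDA abstract (a dot product over the node features of an association pair) can be sketched as follows. The sigmoid squashing, the function names, and the toy embeddings are assumptions added for illustration; the abstract only states that a dot product is used.

```python
import math


def association_score(u, v):
    """Dot product of two node feature vectors, squashed to (0, 1).

    The sigmoid is an assumed choice for turning the raw dot product
    into a probability-like score.
    """
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 / (1.0 + math.exp(-dot))


def rank_candidates(query, candidates):
    """Sort candidate names by descending association score with `query`.

    candidates: dict mapping a name to its feature vector.
    """
    return sorted(
        candidates,
        key=lambda name: association_score(query, candidates[name]),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical 2-d embeddings for one drug and two diseases.
    drug = [1.0, 2.0]
    diseases = {"disease_a": [1.0, 2.0], "disease_b": [-1.0, -2.0]}
    print(rank_candidates(drug, diseases))
```

    Because the same scoring function applies to any pair of node types, one set of fused node features can serve drug-disease, drug-protein, and disease-protein prediction alike.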
  768. #AI4H26

    D²G-TO: Task-aware and OOD-guided Discrete Graph Diffusion for Robust CNS Drug Discovery

    Xue Zhai, Chu-An Yang, Minghao Liu, Xu Dong, Han Wang, Weiwei Han, Ting Gao, LiHong Hu
    Central nervous system (CNS) drug discovery is constrained by an immense and sparse chemical search space, and molecules that simultaneously achieve brain penetration, target efficacy, and synthesizability are extremely scarce. However, existing generative models rarely couple rigorous multi-objective control with robustness to distribution shift, limiting their reliability in realistic CNS design. We introduce D²G-TO, a task-aware and out-of-distribution (OOD)-guided discrete graph diffusion framework that unifies multi-pharmacological properties with structural distribution guidance. A novel Structural Similarity Guidance mechanism steers generation toward in-distribution regions while repelling OOD modes, maintaining structural distributional consistency in realistic scenarios. Across the BBBP, BACE, and QM9 benchmarks, D²G-TO achieves strong results on validity, diversity, and other metrics. In an Alzheimer's disease case study, we subject the generated molecules to cross-property pharmacological prediction and systematic ADMET profiling, followed by structure-based molecular docking against BACE-1 to assess binding-mode plausibility. D²G-TO identifies candidates that jointly satisfy blood–brain barrier permeability, β-site amyloid precursor protein cleaving enzyme 1 inhibition, and synthetic accessibility. Thus, D²G-TO has the potential to serve as an efficient in silico engine for early-stage CNS drug design. The code is available at https://github.com/zhaix922/DDG_TO.
    Health data mining · Drug discovery · Explainable AI
  769. #AI4H51

    GroupMIL: Semantic Group Based Multiple Instance Learning for Whole Slide Image Analysing

    Zhao Yao, Zhenmi Xie, Mengxin Tian, Guoqing Wu, Yaonan Wang, Min Liu
    Whole Slide Image (WSI) analysis faces challenges due to gigapixel resolutions and slide-level weak supervision. Multiple Instance Learning (MIL) serves as a pivotal method for this task. However, existing MIL frameworks often fail to exploit the inherent redundancy of tissue patterns or the semantic coherence among similar patches within a WSI. We propose GroupMIL, a novel framework that introduces a differentiable grouping mechanism into the MIL framework. This approach enables the automatic emergence of semantic segments using only slide-level labels. We specifically introduce a multi-stage grouping block and a hierarchical aggregator, which progressively fuse features within and across groups to construct a robust slide-level representation. Extensive experiments on multiple public datasets across cancer subtyping and survival prediction tasks demonstrate that GroupMIL consistently surpasses state-of-the-art performance.
    Medical diagnosis · Health data mining · Medical imaging
  770. #AI4H56

    sc2Flow: Mitigating Mean Prediction Bias in Single-Cell Perturbation with Dual-Stage Flow Matching

    Hanwen Lyu, Jiawei Luo
    Predicting single-cell gene responses to chemical perturbations is vital for personalized therapy, yet existing deep learning models face significant hurdles. Standard regression-based approaches suffer from "mean prediction bias," failing to capture cellular heterogeneity, while current Flow Matching (FM) methods struggle with the extreme sparsity of single-cell data and the dimensionality mismatches inherent in transformer-based generation. To address these challenges, we introduce sc2Flow, a dual-stage framework that unifies discrete and continuous flow matching. sc2Flow first predicts the binary mask of expressed genes and subsequently models their quantitative levels using a scalable Transformer. This decoupling effectively resolves dimensionality conflicts and eliminates parameter redundancy. Extensive benchmarks on Sci-Plex3 and other datasets demonstrate that sc2Flow significantly outperforms state-of-the-art baselines on distribution matching, successfully mitigating mean bias to preserve critical biological heterogeneity.
    Genomic data analysis · Drug discovery
  771. #AI4H58

    PhysioGMC: Generalizable Multi-modal Coordination for Physiological Signals

    Mudi Zhang, Anirudh Nakra, Min Wu
    Physiological signals are widely used for health assessment in clinical and daily-life settings. Established physiological signals collected for inpatient and clinical use are often impractical for patients' daily home use due to their complexity and resource demands. In contrast, wearable signals enable continuous monitoring in everyday life, but many have limited reliability and are not widely understood or accepted in clinical practice. To leverage the complementary strengths of clinical and wearable physiological signals, we propose PhysioGMC, a Generalizable Multi-modal Coordination framework for Physiological signals that explicitly accounts for their strong inter-subject variability. PhysioGMC incorporates both clinical and wearable modalities into the training process to improve cross-subject performance when only a single wearable modality is available at deployment. The framework introduces a cross-modal contrastive learning module comprising two contrastive losses to jointly learn label-relevant, subject-agnostic representations across modalities. The self-supervised contrastive loss aligns latent features across modalities, while the supervised contrastive loss encourages learning label-discriminative features that are invariant to subject identity. Experiments on cardiovascular health monitoring and sleep staging tasks demonstrate that PhysioGMC consistently outperforms existing methods, achieving superior cross-subject performance at test time using only wearable modalities, such as photoplethysmography (PPG).
    Multimodal data · Medical knowledge representation · Remote monitoring
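    The cross-modal alignment idea in the PhysioGMC abstract can be illustrated with a standard InfoNCE-style loss, in which the clinical and wearable embeddings of the same recording form a positive pair and all other pairings in the batch act as negatives. InfoNCE, the one-directional form (clinical to wearable), the temperature value, and all names here are assumptions; the abstract does not state which contrastive objective the authors use.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_modal_info_nce(clinical, wearable, temperature=0.1):
    """Average InfoNCE loss over a batch of paired embeddings.

    clinical[i] and wearable[i] come from the same recording (positive pair);
    wearable[j] for j != i serve as negatives for clinical[i].
    """
    a = [l2_normalize(v) for v in clinical]
    b = [l2_normalize(v) for v in wearable]
    n = len(a)
    loss = 0.0
    for i in range(n):
        # Cosine similarity of clinical[i] with every wearable embedding.
        logits = [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denominator - logits[i]
    return loss / n


if __name__ == "__main__":
    # Hypothetical 2-d embeddings for a batch of two recordings.
    clinical = [[1.0, 0.0], [0.0, 1.0]]
    wearable = [[1.0, 0.1], [0.1, 1.0]]
    print(cross_modal_info_nce(clinical, wearable))
```

    The loss is small when each clinical embedding is closest to its own wearable counterpart and grows when the pairing is scrambled, which is what pulls the two modalities into a shared latent space.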
  772. #AI4H63

    LLM as Clinical Graph Structure Refiner: Enhancing Representation Learning in EEG Seizure Diagnosis

    Lincan Li, Zheng Chen, Yushun Dong
    Electroencephalogram (EEG) signals are vital for automated seizure detection, but their inherent noise makes robust representation learning challenging. Existing graph construction methods, whether correlation-based or learning-based, often generate redundant or irrelevant edges due to the noisy nature of EEG data. This significantly impairs the quality of graph representations and limits downstream task performance. Motivated by the remarkable reasoning and contextual understanding capabilities of large language models (LLMs), we explore the idea of using LLMs as graph edge refiners. Specifically, we propose a two-stage framework: we first verify that LLM-based edge refinement can effectively identify and remove redundant connections, leading to significant improvements in seizure detection accuracy and more meaningful graph structures. Building on this insight, we further develop a robust solution where the initial graph is constructed using a transformer-based edge predictor and multilayer perceptron, assigning probability scores to potential edges and applying a threshold to determine their existence. The LLM then acts as an edge set refiner, making informed decisions based on both textual and statistical features of node pairs to validate the remaining connections. Extensive experiments on the TUSZ dataset demonstrate that our LLM-refined graph learning framework not only enhances task performance but also yields cleaner and more interpretable graph representations.
    Clinical decision support system · Health data mining · LLM in medicine
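    The two-step edge selection described in the abstract above (threshold the predicted edge probabilities, then let a refiner validate the survivors) can be sketched in a few lines. The refiner is passed in as a plain callable standing in for the LLM call; the function names, the 0.5 threshold, and the example electrode pairs and rule are hypothetical.

```python
def candidate_edges(edge_probs, threshold=0.5):
    """Keep node pairs whose predicted edge probability reaches the threshold.

    edge_probs: dict mapping a (node_a, node_b) pair to a probability score.
    """
    return {pair for pair, prob in edge_probs.items() if prob >= threshold}


def refine_edges(edges, keep_edge):
    """Validate each remaining edge with a refiner.

    keep_edge: callable taking a pair and returning True to keep it.
    In the paper this role is played by an LLM; here it is any function.
    """
    return {pair for pair in edges if keep_edge(pair)}


if __name__ == "__main__":
    # Hypothetical probability scores for three electrode pairs.
    probs = {("Fp1", "F3"): 0.9, ("Fp1", "O2"): 0.6, ("C3", "C4"): 0.2}
    candidates = candidate_edges(probs)

    # Stand-in refiner: a fixed allow-list instead of an LLM judgment.
    plausible = {("Fp1", "F3"), ("C3", "C4")}
    print(refine_edges(candidates, lambda pair: pair in plausible))
```

    Keeping the refiner behind a callable makes the first stage testable without any model access and lets the validation logic be swapped independently.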
  773. #AI4H68

    CardioBench: Do Echocardiography Foundation Models Generalize Beyond the Lab?

    Darya Taratynova, Ahmed Aly, Numan Saeed, Mohammad Yaqub
    Foundation models are reshaping medical imaging, yet their application in echocardiography remains limited, hindered by a heavy reliance on private datasets that prevent reproducible comparison. Echocardiography poses unique challenges, including noisy acquisitions, high frame redundancy, and limited diverse public datasets. To address this, we introduce CardioBench, a comprehensive benchmark for echocardiography foundation models. Specifically, CardioBench unifies eight publicly available datasets into a standardized suite spanning four regression and five classification tasks, covering functional, structural, diagnostic, and view recognition endpoints. Leveraging this framework, we evaluate several leading foundation models, including cardiac-specific, biomedical, and general-purpose encoders, under consistent zero-shot, probing, and alignment protocols. Our analysis reveals that while general-purpose encoders transfer well and often close the gap with probing, they struggle significantly with fine-grained distinctions like view classification and subtle pathology recognition. Results indicate that models capturing temporal cardiac dynamics perform best on functional tasks, while retrieval-based approaches generalize more consistently across datasets. By releasing preprocessing, splits, and public evaluation pipelines, CardioBench establishes a reproducible reference point to guide the architectural design of future echocardiography and possibly other medical imaging foundation models.
    Medical diagnosis · Medical imaging
  774. #AI4H71

    TvaraNet: A Lightweight Mamba Neural Network for Real-Time Medical Image Segmentation

    Sridhatta Jayaram Aithal, Vandana Bharti
    Medical image segmentation models based on vision mamba architectures have recently shown strong performance with improved efficiency over convolutional and transformer-based models. However, existing lightweight and ultralight variants often suffer from boundary degradation and inconsistent shape prediction, thereby limiting their clinical reliability. We propose TvaraNet, an extremely lightweight segmentation network designed to preserve boundary fidelity under strict efficiency constraints. TvaraNet contains only 0.037M parameters and requires only 0.060 GFLOPs, enabling deployment in resource-constrained environments.
    The architecture introduces two core lightweight modules. Parallel Skip Mamba Averaging (PASMA) enhances long-range dependency modeling by injecting globally aggregated channel context into Mamba blocks. Dilated Asymmetric Spatial Mixer Attention (DASMA) improves boundary-aware feature refinement in skip connections through efficient multi-scale spatial modulation. In addition, training-time regularization strategies, including an Adaptive Gated Gradient Reversal Layer (AGGRL), an adaptive Singular Value Decomposition (SVD) loss, and a Boundary-weighted Cross-Entropy loss (BWCE), are employed to suppress redundant features and emphasize object boundaries without adding inference overhead. Experiments on ISIC-17, ISIC-18, and a spinal segmentation dataset demonstrate that TvaraNet consistently outperforms existing lightweight models and achieves competitive or superior boundary-aware performance compared to heavier architectures. Notably, it improves mean IoU by +1.50, +1.42, and +3.35 over Ultralight VMUNet on the respective datasets, establishing TvaraNet as a practical solution for boundary-consistent medical image segmentation on edge and low-resource platforms. Code is available at: https://github.com/MindsLab-GitHub/TvaraNet
    Medical imaging · Medical diagnosis
  775. #AI4H75

    BioDisco: Multi-Agent Hypothesis Generation with Dual-Mode Evidence, Iterative Feedback and Temporal Evaluation

    Yujing Ke, Kevin George, Kathan Pandya, Gerrit Großmann, David B. Blumenthal, Maximilian Sprang, David A. Selby, Sebastian Vollmer
    Identifying novel hypotheses is essential to scientific research, yet this process risks being overwhelmed by the sheer volume and complexity of available information. Existing automated methods often struggle to generate novel and evidence-grounded hypotheses, lack robust iterative refinement and rarely undergo rigorous temporal evaluation for future discovery potential. To address this, we propose BIODISCO, a multi-agent framework that draws upon language model-based reasoning and a dual-mode evidence system (biomedical knowledge graphs and automated literature retrieval) for grounded novelty, integrates an internal scoring and feedback loop for iterative refinement, and validates performance through pioneering temporal and human evaluations and a Bradley-Terry paired comparison model for statistical assessment. Evaluations suggest improved novelty and significance relative to ablated configurations and a generalist biomedical agent. Designed for flexibility and modularity, BIODISCO allows seamless integration of custom language models or knowledge graphs, and can be run with just a few lines of code.
    Biomedical NLP · LLM in medicine · Medical knowledge representation · AI4H: Drug discovery
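    The Bradley-Terry paired comparison model mentioned in the BioDisco abstract assigns each item a positive strength so that the probability that item i is preferred over item j is s_i / (s_i + s_j). The sketch below fits those strengths with the classic minorization-maximization update. It illustrates the general model only: the win-count matrix layout, the iteration count, and the example numbers are assumptions, not the authors' evaluation code.

```python
def bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths from pairwise win counts.

    wins[i][j] is the number of times item i was preferred over item j
    (the diagonal must be zero). Assumes every item took part in at least
    one comparison. Returns strengths normalized to sum to 1.
    """
    n = len(wins)
    strength = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i])
            # Comparisons involving i, each weighted by the current strengths.
            denominator = sum(
                (wins[i][j] + wins[j][i]) / (strength[i] + strength[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denominator)
        total = sum(updated)
        strength = [s / total for s in updated]
    return strength


if __name__ == "__main__":
    # Hypothetical counts: system 0 preferred 3 times, system 1 once.
    print(bradley_terry([[0, 3], [1, 0]]))
```

    With only two items the fitted strengths reduce to the observed win fractions; with more items the model pools indirect evidence across all pairings into one ranking.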
  776. #AI4H88

    DNA-PPG: A Foundation Model for Photoplethysmography via Dual Neighborhood Alignment

    Yizhang Yang, Jinshi Cui, Wenlong Wu, Chunlong Tu, Yang Zhang, Junshi Lu, Li Wang, Guosong Gao
    Existing physiological foundation models face two limitations: rigid hard-negative sampling indiscriminately repels morphologically similar samples, distorting the natural manifold; and coarse discretization strategies sever the intrinsic continuity of physiological states, inducing precision loss. To address these challenges, we propose DNA-PPG, a novel pre-training framework anchored in Dual Neighborhood Alignment. DNA-PPG integrates the Morphology-Aware Self-Supervised Branch using Time-Frequency Soft Weighting to capture universal signal dynamics shared among diverse subjects, with the Physiological Semantic Alignment Branch that projects physiological indicators into continuous semantic space to explicitly embed precise physiological priors into the representation space. We scale the pre-training to 10.7 million PPG segments from over 8,400 subjects to ensure robust generalization. Extensive evaluations on six downstream benchmarks demonstrate that DNA-PPG significantly outperforms state-of-the-art baselines, achieving an 18% reduction in regression error and an 11.5% improvement in classification performance. These results validate DNA-PPG as a robust, universal feature extractor for diverse photoplethysmography applications.
    Self-supervised learning · Health data mining · Remote monitoring
  777. #AI4H89

    GMENet: Generative Mixture of Experts Network for Multi-Center Glioma Diagnosis with Incomplete Imaging Sequences

    Pengfei Song, Fangjin Liu, Wenwen Zeng, Yonghuang Wu, Chengqian Zhao, Feiyu Yin, Xuan Xie, Jinhua Yu
    Contemporary glioma diagnosis integrates molecular features with histopathology to guide clinical decision-making. However, in clinical settings, divergent imaging protocols result in incomplete MRI sequences, leading to two primary challenges: forcing existing frameworks to discard a large portion of clinical data during training and consequently limiting their clinical applicability. To address these limitations, we propose GMENet, a Generative Mixture of Experts Network for multi-center glioma diagnosis with incomplete imaging sequences. Firstly, we design a Cross-attention-based Gated Generation Module that synthesizes missing sequence features from available sequences via cross-attention and dynamic gating mechanisms, incorporating a cycle-consistency loss to preserve semantic integrity. Secondly, we introduce a Dynamically Weighted Experts Fusion Module that performs mixture-of-experts interaction and confidence-aware fusion over original and synthesized dual-sequence features for multi-task prediction. We evaluate GMENet on a multi-center cohort of 1,241 subjects from four in-house datasets and two public repositories. Experiments show that GMENet expands clinically usable training data by 97%, relative to complete-sequence-only data. Furthermore, it consistently outperforms state-of-the-art methods trained on complete data, demonstrating improved robustness under cross-center distribution shifts. Code is available at: https://github.com/spf-sd/GMENet.
    Medical diagnosis · Medical imaging · Precision medicine
  778. #AI4H96

    Knowing When Not to Predict: Self Supervised Learning and Abstention for Safer DR Screening

    Muskaan Chopra, Lorenz Sparrenberg, Jan H. Terheyden, Rafet Sifa
    Self-supervised learning (SSL) is now a standard way to pretrain medical image models, but performance is still mostly judged by downstream accuracy. For safety-critical screening tasks such as diabetic retinopathy grading, this is not enough: a model must also know when its predictions are unreliable and defer uncertain cases for clinical review. In this work, we examine how the length of SSL pretraining influences confidence calibration and confidence-based abstention.
    We evaluate multiple SSL checkpoints under a fixed fine-tuning protocol and assess calibrated confidence, coverage, selective accuracy, and selective macro-F1. Across datasets and data regimes, SSL pretraining improves selective prediction compared to training from scratch. Unlike prior SSL studies that primarily evaluate downstream accuracy or AUROC, we analyze how SSL pretraining duration influences calibration and selective prediction behavior under confidence-based abstention. However, once accuracy saturates, selective performance can still change markedly across checkpoints, and longer pretraining does not consistently improve reliability. These results underscore the importance of abstention-aware evaluation and suggest that pretraining length should be treated as an important reliability-related design choice rather than only a computational detail. Code is available at https://github.com/muskaan712/ijcai-knowing-when-not-to-predict.
    Medical diagnosis · Self-supervised learning · Medical knowledge representation · Medical imaging · Public health
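    Two of the quantities named in the abstract above, coverage and selective accuracy under confidence-based abstention, can be computed as in the following sketch. The threshold rule (abstain when confidence falls below a fixed cut-off), the function name, and the toy inputs are assumptions for illustration.

```python
def selective_metrics(confidences, predictions, labels, threshold):
    """Coverage and selective accuracy under confidence-based abstention.

    A case is answered only if its confidence is at least `threshold`;
    all other cases are deferred (for example, to clinical review).
    Returns (coverage, accuracy_on_answered); accuracy is None when the
    model abstains on every case.
    """
    answered = [
        (pred, label)
        for conf, pred, label in zip(confidences, predictions, labels)
        if conf >= threshold
    ]
    coverage = len(answered) / len(labels)
    if not answered:
        return coverage, None
    accuracy = sum(pred == label for pred, label in answered) / len(answered)
    return coverage, accuracy


if __name__ == "__main__":
    # Hypothetical confidences, predicted grades, and true grades.
    conf = [0.9, 0.8, 0.55, 0.4]
    pred = [1, 0, 1, 0]
    true = [1, 0, 0, 1]
    for t in (0.0, 0.6):
        print(t, selective_metrics(conf, pred, true, t))
```

    Sweeping the threshold traces out the trade-off between how many cases the model answers and how accurate it is on those cases, which is the behavior the abstract compares across pretraining checkpoints.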
  779. #AI4H102

    SBSDM: A Style-aware Bidirectional Stream Diffusion Model for CT-to-PET Synthesis

    Jiahao Zheng, Yu Tang, Caiwen Jiang, Zhanjie Zhang, Yongcan Luo, Dapeng Wu
    CT-to-PET synthesis aims to synthesize PET images from the widely available and lower-cost CT scans to address the high cost and additional radiation exposure associated with PET scanning. However, CT-to-PET synthesis faces two key challenges due to the sequential correlation of volumetric imaging: preserving smooth transition between adjacent slices and ensuring style consistency across long-range slices. To overcome these limitations, we propose the Style-aware Bidirectional Stream Diffusion Model, which ensures both inter-slice continuity and global style consistency with low computational cost. Specifically, we first utilize a Vector Quantized Variational Autoencoder to encode bidirectional adjacent CT slices into latent codes. A Neighbor Attention module is then introduced to capture transition patterns among these latent codes, ensuring structural continuity. To enhance style consistency, we further design a Prototype Prompting mechanism to construct a feature pool from long-range slices. Global style prototypes are extracted from the feature pool and dynamically integrated into the generation process, guiding the model’s attention toward consistent stylistic features across slices. Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art methods.
    Multimodal data · Medical imaging
  780. #AI4H106

    Prior Knowledge-enhanced Spatio-temporal Epidemic Forecasting

    Sijie Ruan, Jinyu Li, Jia Wei, Zenghao Xu, Jie Bao, Junshi Xu, Junyang Qiu, Shuliang Wang, Xiaoxiao Wang, Hanning Yuan
    Spatio-temporal epidemic forecasting is critical for public health management, yet existing methods often struggle with insensitivity to weak epidemic signals, over-simplified spatial relations, and unstable parameter estimation. To address these challenges, we propose the Spatio-Temporal priOr-aware Epidemic Predictor (STOEP), a novel hybrid framework that integrates implicit spatio-temporal priors and explicit expert priors. STOEP consists of three key components: (1) Case-aware Adjacency Learning (CAL), which dynamically adjusts mobility-based regional dependencies using historical infection patterns; (2) Space-informed Parameter Estimating (SPE), which employs learnable spatial priors to amplify weak epidemic signals; and (3) Filter-based Mechanistic Forecasting (FMF), which uses an expert-guided adaptive thresholding strategy to regularize epidemic parameters. Extensive experiments on real-world COVID-19 and influenza datasets demonstrate that STOEP outperforms the best baseline by 11.1% in RMSE. The system has been deployed at a provincial CDC in China to facilitate downstream applications.
    Timeseries prediction · Public health
  781. #AI4H107

    DIAM: Adaptive Drug Repositioning with Decoupled Biological Mechanism and Instance-Aware Modulation

    Kerui Xu, Keyuan Xu, Shuheng Yin, Hang Qiu
    Drug repositioning has emerged as an attractive drug development strategy, with deep learning-based computational methods showing great potential in predicting Drug-Disease Associations (DDAs). However, dominant computational paradigms typically rely on Random Negative Sampling (RNS) and static embedding fusion, leading to two fundamental limitations. First, RNS treats unobserved pairs uniformly, resulting in coarse decision boundaries that fail to distinguish true associations from ambiguous candidates. Second, static fusion applies a monolithic combination of heterogeneous features, failing to adapt to the sample-specific dominance of different biological mechanisms. To address these issues, we propose DIAM, which establishes a mechanism-adaptive paradigm by explicitly decoupling structural and molecular signals. Specifically, DIAM introduces a Dual-Stream Biological Mechanism Decoupling module to explicitly construct global structural propagation and local molecular interaction views. Leveraging these views, we design a biological plausibility score to guide hard negative sampling, enforcing finer-grained decision boundaries. Furthermore, an Adaptive Residual Gating (ARG) module is devised to perform instance-aware modulation, dynamically weighting the contribution of the global and local views for each specific pair. Extensive experiments on three benchmark datasets demonstrate that DIAM outperforms seven state-of-the-art methods. A case study on Alzheimer's disease further validates the model's effectiveness in identifying potential candidate drugs for practical application.
    Health data mining · Medical knowledge representation · Drug discovery
  782. #AI4H110

    CDMIQA: A Cross-Domain Perceptual Method and Benchmark Dataset for Medical Image Quality Assessment

    Leilei Huang, Yue Sun, Mingxiang Wu, Wei Ke, Siyi Xun, Muzhen He, Tao Tan
    Medical image quality assessment (IQA) serves as a critical safeguard for precise clinical diagnosis and treatment. However, existing methods still face challenges arising from data scarcity and heterogeneity across imaging domains, which confine solutions to domain-specific designs and limit their cross-domain generalization ability. In response, we construct a multi-domain and multi-organ dataset comprising 9,105 2D and 3D medical images across three imaging domains and 18 organs, annotated by radiologists. Building upon this, we propose a cross-domain universal medical IQA method termed CDMIQA, which integrates efficient feature extractors with Hierarchical Perceptual Encoding Modules to capture and refine multi-level perceptual features while mitigating interference from noise and artifacts. Notably, the Adaptive Semantic Perception Module is designed to extract semantic features; it enhances adaptability to domain-specific images by dynamically encoding quality variations across different imaging domains and organs. Furthermore, a Top-down Feature Fusion Module is utilized to progressively aggregate features under the guidance of semantic information, reinforcing the model's feature representation capability. Experimental results on the proposed dataset demonstrate that our method outperforms standard competitors and exhibits superior generalization ability across diverse domains. Our code and dataset are available at https://github.com/Leilei-Huang-work/CDMIQA.
    Medical imaging
  783. #AI4H115

    HealthMamba: An Uncertainty-aware Spatiotemporal Graph State Space Model for Effective and Reliable Healthcare Facility Visit Prediction

    Dahai Yu, Lin Jiang, Rongchao Xu, Guang Wang
    Healthcare facility visit prediction is essential for optimizing healthcare resource allocation and informing public health policy. Despite advanced machine learning methods being employed for better prediction performance, existing works usually formulate this task as a time-series forecasting problem without considering the intrinsic spatial dependencies of different types of healthcare facilities, and they also fail to provide reliable predictions under abnormal situations such as public emergencies. To advance existing research, we propose HealthMamba, an uncertainty-aware spatiotemporal framework for accurate and reliable healthcare facility visit prediction. HealthMamba comprises three key components: (i) a Unified Spatiotemporal Context Encoder that fuses heterogeneous static and dynamic information, (ii) a novel Graph State Space Model called GraphMamba for hierarchical spatiotemporal modeling, and (iii) a comprehensive uncertainty quantification module integrating three uncertainty quantification mechanisms for reliable prediction. We evaluate HealthMamba on four large-scale real-world datasets from California, New York, Texas, and Florida. Results show HealthMamba achieves around 6.0% improvement in prediction accuracy and 3.5% improvement in uncertainty quantification over state-of-the-art baselines.
    Health data mining · Public health
  784. #AI4H116

    QA-MoE: Quality-Aware and Stable Multimodal Mixture-of-Experts for Robust Clinical Prediction in Noisy and Missing-Modal Settings

    Linpeng Sun, Victor S. Sheng
    Clinical prediction increasingly relies on multi-modal inputs, where reliability and efficiency are crucial for real-world deployment. However, mainstream fusion and MoE gating typically treat all available modalities as uniformly beneficial and allow noisy or weakly informative modalities to perturb routing, leading to instability, routing collapse, and miscalibrated confidence under missingness and shift. We propose QA-MoE, a Quality-Aware and stable multimodal Mixture-of-Experts that decouples reliability estimation from routing to enable robust, sparse, and interpretable fusion. QA-MoE adopts a modular architecture where each modality is initially encoded into a shared embedding space. To deal with structurally missing data, we employ a completion pathway that maintains a consistent interface. Unlike standard approaches, QA-MoE separates reliability estimation from the routing process. We propose an Evidential Quality Scorer to measure epistemic uncertainty, which then guides a Stability-Enhanced Subset Selector to filter out noisy modalities on the fly. Additionally, we include a Ternary Expert Aggregation mechanism acting as a specialized branch to stabilize predictions when data missingness is severe. Evaluations on clinical benchmarks (ADNI for Alzheimer’s staging and MIMIC-IV for Length-of-Stay) demonstrate that QA-MoE outperforms strong multimodal baselines, improving reliability while cutting down unnecessary computation. This indicates that QA-MoE offers a robust solution for multimodal decision support, especially in clinical settings prone to noise and missing data.
    Medical diagnosis · Multimodal data · Public health
  785. #AI4H132

    Uncertainty-Guided Adaptive Conservative Offline Reinforcement Learning for Safer Mechanical Ventilation

    Huidong Liu, Hang Yu, Qiyang Zhang, Jiarui Dou, Xianlei Long, Jiantao Shi, Fuqiang Gu
    Mechanical ventilation (MV) is essential in intensive care units (ICUs), yet conventional protocols lack personalization and risk harmful over- or under-ventilation. Offline reinforcement learning (ORL) enables policy optimization from retrospective clinical data without unsafe online interaction, but existing methods are highly sensitive to distributional shift and out-of-distribution (OOD) actions, limiting their reliability in complex clinical settings. To address these challenges, we propose UBER-CQL (Uncertainty-Balanced Exploration and Robust Conservative Q-Learning), a robust ORL algorithm for safe decision-making under dataset shift. UBER-CQL integrates heteroscedastic Bayesian neural networks with conservative Q-learning to model posterior Q-value uncertainty, which is used to adaptively penalize unreliable high-risk actions while maintaining performance within the data support. We further design numerically stable objectives for conservative Bayesian value estimation. Experiments on in-distribution and OOD subsets of MIMIC-III and eICU demonstrate that UBER-CQL outperforms state-of-the-art ORL and clinician baselines, producing safer and more effective MV strategies.
    Clinical decision support system · Medical diagnosis · Treatment recommendation · Precision medicine
  786. #AI4H137

    BFHD: Bidirectional Feature Harmonization Decomposition for Heterogeneous Clinical Assessments

    Yuanhao Zhuo, Zixi Qin, Ling Qin, Wanqing Li
    Clinical assessments are often collected using heterogeneous assessment systems across centers and time, leading to records that mix different but related sets of measurements. This motivates harmonization beyond total-score linking. We formulate clinical harmonization at the measurement level as a bidirectional recoverability problem: given paired observations from two assessment systems, the goal is to identify which measurements can be reliably translated in both directions within an application-defined tolerance, while separating non-translatable components. We propose Bidirectional Feature Harmonization Decomposition (BFHD), a feasibility-driven framework that enforces bidirectionally coupled translation and uses feature-wise output gating to produce an explicit decomposition in the original measurement space. Experiments on synthetic data and real clinical assessment pairs show that BFHD achieves broader feasible harmonization coverage and improved subset stability compared to baselines.
    Clinical decision support system · Multimodal data · Public health
  787. #AI4H143

    Two-Fold Patch Perturbation for Efficient Self-Supervised Learning in 3D Medical Imaging

    Tirthajit Baruah, Kabir Jamadar, Punit Rathore
    Self-supervised pre-training has become a key paradigm for reducing annotation costs in 3D medical imaging, yet many recent approaches rely on complex objectives or incur substantial computational overhead. We propose a simple and efficient self-supervised pre-training framework for 3D medical images based on a two-fold patch-wise perturbation strategy. The method applies Bernoulli patch masking and discrete rotations, and trains a shared encoder with a three-head objective for reconstruction, perturbation localization, and rotation prediction. This design encourages spatially aware and transferable representations while remaining computationally lightweight. Experiments across diverse segmentation and classification benchmarks, including modality-shift scenarios, demonstrate consistent improvements over general self-supervised baselines and competitive or superior performance compared to recent medical self-supervised methods, while requiring substantially less memory, computation, and training time than the state-of-the-art pre-training pipelines.
    Self-supervised learning · Medical imaging · Medical diagnosis · Medical knowledge representation
  788. #AI4H162

    One-Shot Federated Class-Incremental Learning for Medical Imaging via Variational Feature Transfer

    Pedro H. Barros, Omid Orang, Giulia Zanon de Castro, Heitor S. Ramos, Frederico Gadelha Guimarães
    Federated learning (FL) enables privacy-preserving collaboration for medical image analysis across decentralized institutions but faces major challenges from non-IID data distributions, high communication overhead in multi-round protocols, and catastrophic forgetting when models must adapt to sequentially arriving tasks. These issues are particularly critical in healthcare, where repeated client participation is often infeasible. We address this setting by proposing a novel class-incremental continual learning (CL) model for a one-shot FL paradigm, in which each task introduces new classes, clients observe heterogeneous and evolving class distributions, and communication with the server occurs only once. Clients estimate class-conditional feature distributions via Variational Inference (VI) from private data and transmit compact statistics to the server, which synthesizes features to train a global classifier in a single communication round. The server aggregates these distributional summaries, synthesizes feature embeddings, and learns a global classifier without revisiting real past data or contacting clients again. Our major novelty is continual adaptation at the distribution level via synthetic replay from stored class mixtures, complemented by lightweight distillation. This approach substantially mitigates catastrophic forgetting while consistently enhancing recognition of newly introduced classes. Extensive experiments on multiple medical imaging benchmarks demonstrate that our method outperforms both state-of-the-art one-shot FL and class-incremental CL approaches. Compared with existing CL models, it achieves 90–97% average accuracy with near-zero forgetting (≤ 0.10%) across different tasks and evaluation settings.
    Federated learning · Continuous learning · Medical imaging
  789. #AI4H165

    STAR-Net: Physics Inspired Spectral Topology Aware Reconstruction Network for Single-View Fluorescence Molecular Tomography

    Xiangzheng Li, Jian Zhang, Mengxiang Chu, Xiaoli Luo, Hongbo Guo, Xiaowei He
    Fluorescence molecular tomography (FMT) serves as a pivotal modality for preclinical tumor screening. While single-view FMT offers distinct advantages in data acquisition efficiency and cost-effectiveness, the scarcity of projection views severely exacerbates photon scattering-induced depth ambiguity, rendering 3D volumetric recovery a highly ill-posed inverse problem. To address these challenges, we propose a physics-inspired spectral topology aware reconstruction network (STAR-Net). Specifically, STAR-Net establishes a synergistic framework: initially, a frequency domain decoupling strategy is introduced to simulate the physical characteristics of diffuse light fields; building on this, a differentiable inverse spectral gating (DISG) mechanism is utilized to explicitly impose low-pass spectral regularization for precise depth recovery; and further, a dual-domain synergistic module is integrated to dynamically fuse spatial and frequency features, achieving high-fidelity detail preservation. Extensive experiments on the Digimouse benchmark demonstrate that the proposed STAR-Net achieves the highest Dice coefficient under single-view conditions, validating that explicit spectral topology modeling is a powerful paradigm for mitigating depth ambiguity.
    Medical imaging · Explainable AI
  790. #AI4H167

    Patient-Visit-Spanned Hypergraph Learning for EHR-based Diagnosis Prediction

    Ye Yuan, Haiyan Wang, Lun Hu, Xin Luo
    Hypergraphs effectively model complex interactions in structured Electronic Health Records (EHR). Consequently, Hypergraph Neural Networks (HGNNs) are commonly applied to EHR-based diagnosis prediction. However, existing HGNNs struggle to capture patient-visit long-range dependencies when processing EHR-derived hypergraphs. To tackle this issue, we propose a Patient-Visit-Spanned HyperGraph Learning (PVHGL) framework specifically designed for diagnosis prediction. Concretely, PVHGL initially constructs a unified patient-visit hypergraph that integrates visit records from all patients, enabling the capture of shared healthcare patterns across the patient population. Subsequently, it incorporates a Transformer architecture enhanced by two structure-encoding matrices to facilitate one-step message propagation, which preserves both local and global hypergraph structural properties effectively. Additionally, the framework integrates a medical code co-occurrence matrix to explicitly guide the learning process by highlighting critical medical code interactions. Comprehensive experiments on three real-world datasets demonstrate that the proposed PVHGL significantly outperforms state-of-the-art baselines in diagnosis prediction.
    Clinical decision support system · Medical diagnosis · EHR analysis
  791. #AI4H172

    Dual-Channel Semantic-Enhanced Combinatorial Medication Recommendation via Knowledge Distillation

    Jiawei Wen, Jiabao Guo, Zhaohai Bai, Kaishun Wu, Ma Lin, Haodi Zhang
    As a vital task in healthcare, combinatorial medication recommendation aims to generate drug combinations tailored to patient health status. Precisely capturing the rich semantic information within clinical narratives is crucial for achieving this goal. However, existing approaches primarily rely on isolated identifiers (e.g., patient IDs, drug codes), failing to leverage the inherent semantic associations between patient conditions and medication descriptions. To fill this gap, we propose the Dual-Channel Semantics-Enhanced Network (DCSENet), a novel dual-channel framework that explicitly incorporates knowledge from context-rich clinical narratives. DCSENet fine-tunes domain-adapted pre-trained language models (LMs) to capture semantic correlations between patient status and medication narratives. A transformer-based dual-channel decoder decodes the semantic information at the disease level and the patient level, respectively. The disease-level channel focuses on the natural text semantic associations between diseases and drugs, while the patient-level channel provides personalized features. To mitigate the prohibitive computational overhead of the LMs in clinical deployment, we introduce an attention-map-based knowledge distillation mechanism that efficiently transfers semantic knowledge from the LMs into an identifier-based (ID-based) target model. Extensive experiments on the MIMIC-III and MIMIC-IV datasets demonstrate that DCSENet outperforms existing state-of-the-art methods in recommendation accuracy while maintaining a low computational cost.
    Clinical decision support system · Treatment recommendation
  792. #AI4H174

    GeoSFLoRA: Geometry-Conditioned Spectral Flow Low-Rank Adaptation for 2D-to-3D Transfer in Medical Image Segmentation

    Qin Hao, Bonian Chen, Shengwei Tian, Long Yu
    Accurate volumetric medical image segmentation is critical for clinical diagnosis, yet adapting two-dimensional vision foundation models (VFMs) to three-dimensional medical imaging remains challenging. While parameter-efficient fine-tuning (PEFT) methods such as LoRA and adapter-based schemes provide efficient alternatives to full fine-tuning, their geometry-agnostic parameter-space adaptations are insufficient to reconcile discrepancies induced by anisotropic voxel spacing and inter-slice discontinuities. We propose GeoSFLoRA, a geometry-constrained spectral-flow low-rank adaptation framework for efficient 2D-to-3D transfer learning. GeoSFLoRA adapts frozen 2D pretrained vision backbones by operating within a fixed low-rank spectral subspace induced by pretrained linear projections. For each adapted layer, a truncated singular value decomposition is computed once and kept frozen, preserving the original pretrained bases. A lightweight Geometry-Conditioned Encoder (GCE) extracts local volumetric descriptors, which are mapped to token-conditional residuals in the singular-value space, enabling bounded and geometry-aware spectral modulation. GeoSFLoRA consistently improves Dice and HD95 on BraTS20, MSD-Prostate, and MSD-Lung, approaching full fine-tuning performance and demonstrating an effective paradigm for 2D-to-3D medical image segmentation. The code is publicly available at https://github.com/chenbn266/GeoSFLoRA.
    Medical imaging
  793. #AI4H176

    MedFiTRG: Jointly Learning Dynamic Temporal and Cross-Patient Graphs for Clinical Outcome Prediction

    Shivani Gupta, Hari Om Kumar, Joydeep Chandra
    Integrating heterogeneous clinical modalities, including structured electronic health records (EHRs), clinical text, and medical imaging, is crucial for reliable clinical prediction, yet real-world data are often sparse and imbalanced. Furthermore, prior approaches treat temporal dynamics and inter-patient relationships in isolation, overlooking the dynamic interaction of patient trajectories across populations. We introduce a modality-enhanced dynamic temporal relational graph (MedFiTRG), a unified framework that jointly models sparsity, temporal dynamics, and cross-patient relational dependencies. MedFiTRG leverages modulated graph neural networks (MGNN) to learn modality-aware embeddings, enabling meaningful representation of sparse modalities through adaptive feature modulation. These embeddings are integrated into a temporal relational graph (TRG), where directed intra-patient edges capture longitudinal progression and dynamic inter-patient edges model population-level similarities for synchronized temporal-relational reasoning. Extensive experiments on large-scale real-world datasets across four clinical tasks demonstrate that MedFiTRG achieves superior or comparable performance against state-of-the-art baselines, improving Macro-F1 from 0.155 to 0.310 for length of stay (LOS) classification and achieving an AUROC of 0.939 for mortality prediction (↑6.45%). The code is available at https://anonymous.4open.science/r/MedFiTRG-2714
    Clinical decision support system · Medical diagnosis · Health data mining · Multimodal data · Explainable AI
  794. #AI4H182

    VGDM: Visual Localization-Guided 3D Dental Segmentation via Extrinsic–Intrinsic Bridging

    Hanbin Fang, Shidong Yang, Fan Duan, Shuo Wang, YanHeng Zhou, Li Chen
    3D dental segmentation is a key task in digital dentistry. In real intraoral scan (IOS) data, occlusion, scanning noise, and reconstruction artifacts often break down the geometric separation structure between teeth, resulting in adjacent teeth being incorrectly merged or a single tooth being over-segmented. Since existing point cloud or mesh-based methods usually rely on local neighborhood consistency, when there are spurious geometric connections, features will diffuse across instances along geometric shortcuts, resulting in instance-level error propagation. To address this issue, we propose VGDM (Visual-Guided Diffusion Modulation), which serves as a bridge between extrinsic visual localization cues and intrinsic surface features under detection settings, enabling the extrinsic cues to regulate the propagation of intrinsic surface features. Instead of global propagation on the entire intraoral scan mesh, VGDM uses the single-view 2D detection results to roughly localize the tooth region and construct local 3D surface patches based on it. Within the patch, we soft-constrain feature propagation by visual cues to suppress the cross-instance propagation generated along spurious geometric connections, and introduce a dual-stream diffusion structure to improve the overall robustness. Experimental results on the largest public intraoral dataset (Teeth3DS) show that VGDM can significantly improve the segmentation rate of tooth instances and effectively reduce the merging and over-segmentation of adjacent teeth.
    Multimodal data · Medical imaging
  795. #AI4H210

    LLM-Guided Monte Carlo Tree Search over Knowledge Graphs: Composing Mechanistic Explanations for Drug–Disease Pairs

    Rishabh Jakhar, Michel Dumontier, Remzi Celebi
    Extracting multi-step explanations from knowledge graphs poses a combinatorial challenge requiring both heuristic guidance (as candidates proliferate with depth) and credit assignment (as path quality emerges over extended sequences). Frontier LLMs, strong on knowledge/reasoning benchmarks, offer a compelling source of such heuristics, yet their knowledge comes sans guarantees and compositional performance degrades as chains lengthen. We thus present TESSERA, a 3-part neuro-symbolic framework that uses LLMs in a circumscribed role: for local discriminative judgement rather than autonomous multi-step generation; the knowledge graph then defines the hypothesis space enforcing hard structural constraints, and MCTS coordinates the long-horizon search with principled credit assignment via backpropagation. LLMs perform dual roles as a prior policy biasing exploration and a comparative state evaluator supplying reward signals. Evaluation on drug mechanism elucidation across two complementary knowledge graphs demonstrates fidelity to curated biology while surfacing coherent alternative mechanisms, with ablations confirming discriminative contribution from both LLM components. Beyond its current application, our framework offers a general paradigm for compositional reasoning over structured knowledge.
    LLM in medicine · Medical knowledge representation · Explainable AI
  796. #AI4H239

    Optimizing Resource-Constrained Non-Pharmaceutical Interventions for Multi-Cluster Outbreak Control Using Hierarchical Reinforcement Learning

    Xueqiao Peng, Andrew Perrault
    Non-pharmaceutical interventions (NPIs), such as diagnostic testing and quarantine, are crucial for controlling infectious disease outbreaks but are often constrained by limited resources, particularly in the early outbreak stages. In real-world public health settings, resources must be allocated across multiple outbreak clusters that emerge asynchronously, vary in size and risk, and compete for a shared resource budget. We define a cluster as a group of close contacts generated by a single infected index case. Thus, decisions must be made under uncertainty and heterogeneous demands while respecting operational constraints. We formulate this problem as a constrained restless multi-armed bandit and propose a hierarchical reinforcement learning framework. A global controller learns a continuous action cost multiplier that adjusts global resource demand, while a generalized local policy estimates the marginal value of allocating resources to individuals within each cluster. We evaluate the proposed framework in a realistic agent-based simulator of SARS-CoV-2 with dynamically arriving clusters. Across a wide range of system scales and testing budgets, our method consistently outperforms RMAB-inspired and heuristic baselines, improving outbreak control effectiveness by 20–30%. Experiments on up to 40 concurrently active clusters further demonstrate that the hierarchical framework is highly scalable and enables faster decision-making than the RMAB-inspired method.
    Public health
  797. #AI4H266

    Med-StepBench: A Hierarchical Reasoning Framework for Evaluating Hallucinations in Medical Vision-Language Models

    Minh Khoi Nguyen, Dai Lam Le, Amir Reza Jafari, Tuan Dung Nguyen, Hong Son Mai, Huy Thong Mai, Quang Huy Nguyen, Thanh Trung Nguyen, Reza Farahbakhsh, Noel Crespi, Phi Le Nguyen
    Large vision-language models (VLMs) demonstrate strong performance in medical image understanding, but frequently generate clinically plausible yet incorrect statements, raising significant safety concerns. Existing medical hallucination benchmarks primarily focus on 2D imaging with one-shot diagnostic questions, offering limited insight into whether predictions are grounded in correct localization and abnormality identification, allowing critical reasoning errors to remain hidden behind seemingly correct diagnoses. We introduce Med-StepBench, the first large-scale benchmark for step-wise hallucination detection in 3D oncological PET/CT, comprising over 12,000 images and more than 1,000,000 image–statement pairs across volumetric and multi-view 2D data, which decomposes clinical reasoning into four expert-designed diagnostic stages. Using clinician-verified annotations, we perform the first step-level evaluation of general-purpose and medical VLMs, revealing systematic failure modes obscured by aggregate accuracy metrics. Furthermore, we show that current VLMs are highly susceptible to adversarial yet clinically plausible intermediate explanations, which significantly amplify hallucinations despite contradictory visual evidence. Together, our findings highlight fundamental limitations in grounding multi-step clinical reasoning and establish Med-StepBench as a rigorous benchmark for developing safer and more reliable medical VLMs.
    Automated reasoning in clinical domains · Multimodal data · Medical imaging
  798. #AI4H270

    TangentFuse: Low-Latency MEG Speech Activity Detection via Riemannian Covariance - CNN Fusion

    Aryan Mangla, Priyanshu Vij, Raj Mehta
    Speech activity recognition in MEG-based non-invasive BCI systems provides a reliable speech gate that can trigger downstream decoders only when speech-related neural activity is present. Such a gate can help users interact with assistive devices in continuous settings. While MEG provides excellent temporal resolution, many MEG speech activity classifiers do not fully exploit available spatial information. We describe a hybrid MEG speech/non-speech classifier that combines a geometry-aware covariance branch with temporal neural streams. We compute shrinkage covariance matrices (SPD) and map them to a Riemannian tangent space around a reference mean; a logistic regression classifier operates on these features. A limited sensor array defines the region-of-interest component, while the final system adds a residual temporal fusion layer over aligned probability streams. The final decision rule, including fusion weights, thresholds, and sequence-level post-processing, is selected on validation only and then applied unchanged to frozen test. On a large within-subject MEG corpus, the final system achieved validation macro-F1 of 0.8972 and frozen-test macro-F1 of 0.8914. This provides a compute-efficient research prototype for MEG-based speech gating.
    Timeseries prediction · Medical imaging · Health data mining · Multimodal data
  799. #AI4H271

    Structure-Aware Contrastive Learning for Biomedical Embeddings: Bridging the Gap Between HPO and Clinical Literature

    Jose Luis Mellina Andreu, Alejandro Cisterna García, Juan Botía
    Large Language Models (LLMs) are used extensively in biomedical text processing but often fail to capture the complex, functional relationships encoded in expert knowledge graphs like the Human Phenotype Ontology (HPO). This "semantic gap" limits their utility in precision medicine tasks such as rare disease diagnosis, where distinguishing overlapping clinical presentations requires understanding underlying pathophysiological connections rather than just surface-level textual similarity. In this work, we propose a Neuro-Symbolic Alignment Framework that bridges this separation by integrating literature-mined specialized phenotypical descriptions with the ontological structure used as reference. Specifically, we augment phenotype representations with automatically selected text fragments from a massive corpus of descriptions mined from scientific literature (PubMed), overcoming the typical data scarcity of standard ontology definitions. We define a new embedding adaptation procedure whose fine-tuning approach is guided by a novel "Disease-Overlap" similarity measure, which prioritizes clinical co-occurrence of phenotypes over taxonomic distance, and optimizes the embedding space using AnglE Loss to mitigate gradient saturation. Extensive evaluations show that our approach significantly outperforms state-of-the-art baselines, including SapBERT, on both intrinsic semantic correlation and practical downstream tasks, including synthetic patient disease ranking and solving real cases stored in Phenopacket, where our model achieves 4× the top-1 accuracy of the previous best model.
    Clinical decision support system · Biomedical NLP · LLM in medicine · Medical knowledge representation · Computational phenotyping
  800. #AI4H272

    ST-BiT: Spatio-Temporal Bipartite Transformer Network for Interaction-Preserving EEG-Based Dementia Subtyping

    Siddhant Ujjain, Vaibhav Kagathara, Pooja Singh, Tapan Gandhi, Sandeep Kumar
    EEG-based dementia classification often degrades under clinically realistic subject-wise evaluation due to non-stationarity and large inter-subject variability. A key modeling limitation is relation compression: many EEG-GNN pipelines encode functional connectivity as scalar edge weights and blur interaction structure during message passing, while models may also exploit subject-specific cues. We propose ST-BiT, a Spatio-Temporal Bipartite Transformer that represents electrodes as node tokens and functional interactions as explicit, learnable edge tokens. Edge tokens preserve pairwise coupling patterns and are updated only from their endpoint electrodes via incidence-masked cross-attention, while electrode tokens aggregate information only from incident edges. Window-wise sparse graphs are constructed from time-sample correlations of band-limited signals to sparsify and initialize edge tokens. ST-BiT combines this interaction-preserving backbone with temporal self-attention, lightweight band attention, and domain-adversarial alignment to reduce subject bias. On OpenNeuro ds006036 (eyes-open with photic stimulation), using leakage-free subject-wise stratified 5-fold cross-validation with nested model selection, ST-BiT achieves 93.0% accuracy for CN vs. (AD+FTD) and 76.1% for CN/AD/FTD, outperforming classical ML and GNN baselines under identical folds. To assess robustness across recording states and align with prior work on this cohort, we also evaluate on the ds004504 (eyes-closed) dataset.
    Medical diagnosis · Health data mining · Timeseries prediction · Multimodal data · Explainable AI
  801. #AI4H283

    When Pulling Fails: Understanding and Alleviating SDF Collapse in Sparse Freehand Ultrasound Reconstruction

    Jiuan Chen, Song Lai, Zhao Mingyang, Gaofeng Meng
    Despite being a cost-effective modality for volumetric imaging, freehand three-dimensional (3D) ultrasound produces inherently sparse data due to the significant elevational gaps left by tracked 2D sweeps. This sparsity poses a unique challenge for Implicit Neural Representations (INRs). While successful in other domains, INRs applied here tend to fail as the learned signed distance fields (SDF) collapse toward zero inside the object, leading to the loss of concavities and the incorrect closure of anatomical gaps. Our analysis identifies the root cause as a statistical bias in gradient-based sampling objectives, showing that symmetric volumetric sampling mathematically drives the expected SDF value to zero. To rectify this, we present a geometry-aware framework that explicitly anchors the non-negative half-space. Our method utilizes a boundary-directed exterior sampling strategy to ensure non-negative constraints in empty areas, complemented by an ellipsoid-based adversarial mechanism to regularize the global field distribution. Experiments on multiple anatomical datasets demonstrate that our approach mitigates field collapse and improves geometric fidelity and topological consistency on most metrics. Code is available at https://github.com/jiuanchen/Pulling-Fails-SDF.
    Medical imaging · AI4H: Self-supervised learning · AI4H: Multimodal data
  802. #AI4H284

    SynCABEL: Synthetic Contextualized Augmentation for Biomedical Entity Linking

    Adam Remaki, Christel Gérardin, Eulàlia Farré-Maduell, Martin Krallinger, Xavier Tannier
    We present SynCABEL (Synthetic Contextualized Augmentation for Biomedical Entity Linking), a framework that addresses a central bottleneck in supervised biomedical entity linking (BEL): the scarcity of expert-annotated training data. SynCABEL leverages large language models to generate context-rich synthetic training examples for all candidate concepts in a target knowledge base, providing broad supervision without manual annotation. We demonstrate that SynCABEL, when combined with decoder-only models and guided inference, establishes new state-of-the-art results across three widely used multilingual benchmarks: MedMentions for English, QUAERO for French, and SPACCC for Spanish. Evaluating data efficiency, we show that SynCABEL reaches the performance of full human supervision using up to 60% less annotated data, substantially reducing reliance on labor-intensive and costly expert labeling. Finally, acknowledging that standard evaluation based on exact code matching often underestimates clinically valid predictions due to ontology redundancy, we introduce an LLM-as-a-judge protocol. This analysis reveals that SynCABEL significantly improves the rate of clinically valid predictions. Our synthetic datasets, models, and code are released to support reproducibility and future research.
    • HuggingFace Datasets & Models
    • GitHub Repository
    Health data mining · EHR analysis · Biomedical NLP · LLM in medicine · Medical knowledge representation
  803. #AI4H287

    Hierarchical Conditional Energy Modeling for Medical Vision–Language Pretraining

    Chengsheng Mao, Yuan Luo
    Contrastive vision–language pretraining models such as CLIP align images and text in a shared embedding space but do not explicitly model or evaluate the hierarchical semantics common in medical image interpretation. We propose HCE-CLIP (Hierarchical Conditional Energy CLIP), a vision–language pretraining framework that formulates medical image–text alignment as a hierarchical label-conditional energy modeling problem. HCE-CLIP encodes an image series using transformer-based aggregation and aligns it with free-text reports and structured label state prompts across multiple semantic levels. At each level, conditional energy functions favor clinically consistent label states while suppressing contradictory alternatives, enabling uncertainty-aware inference. To assess semantic coherence, we introduce a hierarchical contradiction-based metric that quantifies logical inconsistencies between fine-grained disease predictions and higher-level clinical summaries. Experiments on MIMIC-CXR and other public benchmarks show that HCE-CLIP outperforms existing medical vision–language pretraining methods in seen-label, zero-shot and linear-probe settings, while producing substantially fewer hierarchical contradictions.
    Multimodal data · Biomedical NLP · LLM in medicine · Medical imaging
  804. #AI4H293

    FediLoRA: Practical Federated Fine-Tuning of Foundation Models Under Missing-Modality Constraints

    Lishan Yang, Wei Emma Zhang, Nam Kha Nguyen, Po Hu, Yanjun Shu, Weitong Chen, Sim Mong Yuan
    Federated Learning with LoRA fine-tuning offers an efficient and privacy-aware solution for institutions to collaboratively leverage their large datasets to train VLLMs. However, participating institutions often possess heterogeneous computational resources, resulting in imbalanced LoRA ranks, which pose a major challenge for effective collaboration. In addition, real-world applications in domains such as healthcare and transportation frequently suffer from missing modalities due to user mistakes or device failures, which significantly degrade global model performance in federated settings. To the best of our knowledge, no prior work has addressed these two challenges simultaneously in federated VLLMs. To tackle these issues, we propose FediLoRA, a lightweight federated LoRA aggregation framework that effectively mitigates the impact of missing modalities in heterogeneous environments. FediLoRA is explicitly motivated by the observation that simple averaging and structured editing can jointly benefit both global and personalized models. Our approach achieves strong performance across multiple general-domain and medical-domain benchmark datasets. Additional experiments on healthcare data further demonstrate that FediLoRA is well-suited for practical, real-world deployment scenarios. Our code is released at https://github.com/gotobcn8/FediLoRA.
    Federated learningFederated learningLLM in medicineLLM in medicineAI4HMultimodal data
  805. #AI4H298

    SAM-GPT: Hilbert Curve Enhanced Mamba for Brain Lesion Segmentation and VLM-based Analysis

    Jinfu Wang, Qiyuan Wang, Yunfei Liang, Kaipeng Wang, Jinhua Zhao
    Recent breakthroughs in Vision-Language Models (VLMs) have shown their capabilities in medical analysis tasks, but they remain limited in the brain lesion image domain, especially when pathological regions occupy a small portion of the image. The problem arises because VLMs are prone to placing excessive attention on background regions that are visually similar to target regions. Based on these findings, we propose SAM-GPT, a novel framework that leverages segmentation-derived spatial priors to support VLM-based lesion classification. The framework first employs an enhanced segmentation model to localize pathological regions for the diagnostic task, and then converts lesion attributes (e.g., volume size, pixel range, lesion location) into linguistic guidance for a vision–language model. To enhance small lesion recognition, we incorporate a new Hilbert scanning method into Mamba that improves both local spatial continuity and global spatial modeling, which is critical for identifying subtle pathological regions. Experiments on benchmark datasets show that our model achieves an average Dice coefficient of 72.80% on the brain lesion segmentation task and an accuracy of 80.56% on the brain disease classification task, which indicates the effectiveness of the proposed framework.
    AI4HMultimodal dataAI4HLLM in medicineAI4HMedical imagingAI4HMedical diagnosis
  806. #AI4H302

    Shapley Regression for Rare Disease Diagnosis Support: A Case Study on APDS

    Safa Alsaidi, Tomás Brogueira, Nizar Mahlaoui, Marc Vincent, Guilherme Pelegrina, Nicolas Garcelon, Adrien Coulet, Miguel Couceiro
    Activated PI3Kδ Syndrome (APDS) is a rare genetic immune disorder caused by variants in PIK3CD or PIK3R1, with highly heterogeneous symptoms that often delay diagnosis. Early recognition is hampered by overlapping clinical presentations and limited clinician awareness, motivating systematic, data-driven approaches to detect APDS-associated phenotypic patterns in routine electronic health records. Traditional linear scoring systems cannot capture complex symptom interactions, while deep learning models, though expressive, often lack interpretability. To bridge this gap, we propose Shapley regression, a novel game-theoretic model replacing the linear predictor with a k-additive cooperative game, explicitly modeling co-occurrence of symptoms while maintaining the transparency and convexity of logistic regression. We carry out an empirical study of our lightweight method on eight public biomedical datasets, showing that a 2-additive model with ℓ2 regularization achieves an optimal trade-off between predictive power and noise robustness. We apply it to a real-world cohort of 222 patients, on which Shapley regression accurately distinguished APDS cases from matched controls, confirming and validating phenotypes known to be associated with APDS, and facilitating the exploration of pairwise interactions between symptoms, validated by clinical experts.
    Clinical decision support systemClinical decision support systemExplainable AIExplainable AIAI4HEHR analysisComputational phenotypingComputational phenotyping
  807. #AI4H306

    Singly-Connected Multiple Minimal Networks for Efficient Temporal Reasoning About Multiple Clinical Guideline Instantiations

    Andrea Terenziani
    Temporal constraints are an intrinsic component of most clinical guidelines. Several approaches to computerized clinical guidelines (CIGs) offer temporal reasoning facilities to support the execution of a CIG for a specific patient, mostly based on bounds on differences and on the Simple Temporal Problem framework. However, in scheduling activities (e.g., within a hospital), it is necessary to consider multiple executions of CIGs for different patients, which can also be (partly) related to each other. In this work we extend current temporal reasoning techniques to apply to such a context. We propose a new temporal constraint model, prove its properties, and exploit them to provide efficient management of patients' temporal constraints, and to support efficient query answering. We also present an experimental evaluation, demonstrating the step forward with respect to current approaches. Notably, our approach is general, and can apply to all temporal reasoning problems having the same topology as the "multiple CIG execution" problem.
    Automated reasoning in clinical domainsAutomated reasoning in clinical domainsMedical knowledge representationMedical knowledge representation
  808. #AI4H319

    OrthKD: Extracting Generalized Clinical Knowledge from Heterogeneous Teachers for Lightweight Deployment

    Yi Xu, Cheng Chen, Mufan Cao
    Deploying diabetic retinopathy (DR) screening models in primary care requires edge-efficient systems that remain accurate, safe, and reliable under domain shift. Multi-teacher knowledge distillation (KD) is a natural compression strategy, but existing approaches largely assume that all teachers provide equally trustworthy supervision. In our setting, this assumption fails: a strong CNN teacher (EfficientNet-B3, 0.876 QWK) and a weaker Transformer teacher (Swin-Base, 0.830 QWK) are complementary, yet the Transformer's logits can still mislead the student. We therefore propose OrthKD, a selective-trust distillation framework that transfers full supervision from the strong CNN, uses feature-only distillation from the weak ViT, and enforces orthogonality between teacher-specific student projections to encourage complementary rather than redundant evidence. This design preserves local lesion precision, injects global structural context, and improves robustness to distribution shift. On 132,049 retinal images, a 5.4M-parameter MobileNetV3 student reaches 0.885 QWK on EyePACS and improves zero-shot Messidor-2 performance from 0.507 to 0.728 QWK, while also achieving strong referral AUC and calibration. These results show that selectively distilling heterogeneous teachers can enable practical DR screening on resource-constrained devices.
    Clinical decision support systemClinical decision support systemMedical diagnosisMedical diagnosisMedical imagingMedical imagingTelehealthTelehealthPublic healthPublic health
  809. #AI4H329

    THGAgents: Traceable Biomedical Hypothesis Generation via Dynamic Causal Reasoning

    Mingjian Yang, Kun Dong, Kevin Lim, Juan Liu
    While Large Language Models (LLMs) offer promise in scientific discovery, leveraging LLMs to drive biomedical research requires the scientific discovery process to be performed in combination with cutting-edge biomedical research and rigorous mechanistic causal chains. As such, both current Retrieval-augmented generation methods lacking causal reasoning capabilities, and the static traditional knowledge graphs failing to reflect evolving scientific knowledge, present obstacles to utilizing LLMs as scientific discovery tools. In response to these ongoing challenges, we present THGAgents. THGAgents utilize collaborative and dynamically updating agents to build a Traceable Causal Knowledge Graph, which serves as the foundation for the evidence-based knowledge structure. Crucially, we employ an LLM-driven heuristic search algorithm to traverse the complex network, balancing both novelty and rigor to deduce strict, evidence-based mechanistic causal chains. Additionally, THGAgents utilize a generator-critic loop to support hypothesis refinement. In experimental benchmarks across both cancer systems and neuroscience, THGAgents achieved up to a 0.80 hit rate in predicting validated scientific discoveries, providing an almost 9.5% increase in hypothesis quality scores versus current state-of-the-art systems, and decreasing the mechanistic hallucination rate to 1.12%. Our code is available at https://github.com/yangCode-res/THGAgents/.
    LLM in medicineLLM in medicineMedical knowledge representationMedical knowledge representation
  810. #AI4H347

    Multiscale-adaptive and Size-adaptive PSO-based Feature Selection for Gene Expression Analysis

    Weihao Deng, Lingyun Zhao, Fei Han
    In gene expression analysis, high-dimension, low-sample-size data limits the applicability of deep learning, motivating increasing interest in variable-length evolutionary algorithm–based feature selection methods (VLEAs). However, existing VLEAs suffer from unreliable single-metric discrimination under dynamic search-space variation, mismatches between population size and search space dimensionality, and particle performance degradation after search space changes. To this end, a Multiscale-adaptive and Size-adaptive Particle Swarm Optimization (MASA-PSO) is proposed for gene expression analysis. MASA-PSO adopts a multiscale-adaptive weighting framework to explore feature subsets that distinguish between-class sample distributions during search space changes, and the framework is theoretically proven to enable the collaborative evaluation of multiple metrics. Meanwhile, it introduces an adaptive population division that explicitly models the functional relationship between the population size and the search space to resolve the mismatch. Furthermore, a particle degradation phenomenon impairing the performance of VLEAs is observed and alleviated through a hybrid elite strategy. Experiments on ten gene expression datasets verify that MASA-PSO outperforms state-of-the-art methods in classification accuracy while capturing smaller feature subsets.
    Health data miningHealth data miningGenomic data analysisGenomic data analysis
  811. #AI4H367

    Structured Modality-Aware Token Interaction for Multimodal Medical Imaging

    Selene Tomassini, Hafiza Ayesha Hoor Chaudhry, Alessandro Galdelli, Paolo Giorgini
    Multimodal medical imaging benefits from the global context modeling of transformers, yet most existing models fuse modalities implicitly by channel concatenation, leaving cross-modal interaction unstructured or relying on costly multistream cross-attention. We propose Modality-Aware Token Interaction (MATI), an architecturally lightweight and backbone-agnostic module that structures multimodal interaction within a single token stream by partitioning embedding channels into modality-aligned subspaces. MATI performs modality-preserving intra-subspace self-attention and gated global mixing for controlled inter-subspace exchange. We instantiate MATI in UNETR and introduce two architectures: ModaUNETR^S refines selected skip-token representations, and ModaUNETR^E injects MATI into the transformer encoder to progressively shape the token hierarchy. Experiments on the BraTS 2020 benchmark employ five-fold cross-validation and report segmentation and efficiency metrics on the official training and validation sets. Both models consistently improve over UNETR across all tumor subregions, with larger gains for tumor core and whole tumor, and ModaUNETR^E further improves upon ModaUNETR^S. An ablation study confirms monotonic gains from structured interaction. Compared with heavier transformers, the proposed models achieve a favorable accuracy-efficiency trade-off without modality-specific encoders or quadratic cross-attention, supporting structured multimodal interaction as a first-class architectural principle. Implementation of MATI and its instantiations is available at https://github.com/S3l11/MATI.
    Clinical decision support systemClinical decision support systemMultimodal dataMultimodal dataMedical knowledge representationMedical knowledge representationMedical imagingMedical imaging
  812. #AI4H377

    X-FEMR: A Token-level Explainable Approach for Electronic Health Records Foundation Models using Transformer-based Models

    Jie Huang, Pengfei Yin, Zihan Xu, Daniel Capurro, Mike Conway, Ting Dang
    Foundation Models for Electronic Health Records (FEMRs) are pretrained on large-scale structured patient data, enabling them to convert longitudinal patient trajectories into generalizable representations for diverse clinical prediction tasks. Despite their effectiveness, FEMRs remain black-box models, raising concerns about bias, interpretability, and clinical trust. To address this, we propose the first token-level explainability approach for FEMRs. We train a Transformer-based surrogate model on input-output pairs from the FEMR across two prediction tasks, approximating its behavior while preserving temporal dynamics. We identify the most influential tokens, providing insights into how FEMRs leverage different aspects of patient history for predictions. To evaluate clinical relevance, we introduce a novel clinical alignment metric that quantifies the correspondence between the surrogate model’s key tokens and clinically validated features. Our results demonstrate that the surrogate closely approximates FEMR predictions and that token-level explanations align well with clinical knowledge, offering a practical framework for interpretable and trustworthy clinical AI.
    Clinical decision support systemClinical decision support systemHealth data miningHealth data miningEHR analysisEHR analysisMultimodal dataMultimodal dataExplainable AIExplainable AI
  813. #AI4T17

    Safe and Efficient Control: A Subgraph-Augmented Hierarchical Reinforcement Learning Framework for Dynamically Reconfigurable Battery Systems

    Kai Xie, Jingwei Hu, Ri Huang, Xiaodong Li, Yanglin Zhou, Song Ci, Jun Cheng, Zhihong Zhang
    Dynamically Reconfigurable Battery (DRB) systems employ power electronic switches to create dynamic topologies. They enable effective management of cell difference through real-time adjustment of cell connections. However, existing DRB control methods struggle to learn effective strategies due to sparse rewards, which arise from blind exploration in large topological action spaces and complex operational constraints. This leads to insufficient policy learning, making safety and balancing performance difficult to ensure in practical applications. To this end, we propose a Subgraph-Augmented Hierarchical Reinforcement Learning (SAHRL) framework. By combining hierarchical policies with topological structural knowledge, SAHRL effectively accelerates policy exploration and mitigates reward sparsity. Specifically, the high-level policy determines the strategic direction, while the subgraph-augmented low-level policy refines actions to meet operational constraints. The topological structural knowledge, extracted in the form of subgraphs and incorporated as an inductive bias, helps the agent focus on meaningful action patterns and reduce invalid exploration in the large action space. Extensive simulations and real-world experiments show that SAHRL achieves safe and efficient balancing. Notably, it increases the energy release by 10.56% compared to conventional methods in real-world applications.
    Domain-specific AI4TechOther AI4Tech applicationsEmerging AI4Tech Emerging AI4Tech areas
  814. #AI4T22

    DEPLOY-RL: Active Boundary Discovery and Conservative Certification for Deployable Reinforcement Learning in Safety-Critical Continuous Processes

    Yeojin Jang, Minu Baek, Gihun Gil, Minsung Jung, Beomdo Park, Woohyeon Kwon, Hyeonseok Jang, Junseong Park, Neha Sengar, Andres Saurez, Sangkeum Lee
    Reinforcement learning (RL) policies often outperform classical controllers in simulation, yet rarely reach production in safety-critical processes. The barrier is that there is no principled way to answer “Is this policy safe to deploy?” with statistical guarantees. We introduce DEPLOY-RL, a post-training certification framework built on one key insight: deployment certification requires discovering where failures occur (boundary discovery), not measuring how much everywhere (uniform reconstruction). Our contributions: (1) a contract-coupled acquisition function that concentrates sampling on certification-critical boundaries, achieving ≈ 2× sample efficiency with a semi-empirical ambiguity reduction bound (domain-calibrated convergence guarantee); (2) conformal risk control providing finite-sample false-go guarantees (≤ α) under explicit deployment contracts; (3) a three-way decision framework (Deploy/No-Deploy/Abstain) with fail-safe PID/MPC fallback. In simulations on papermaking (industrial digital twin) and Tennessee Eastman (public benchmark), DEPLOY-RL achieves 4.4% false-go rate (vs. 8.6% for the best baseline) while retaining 88.6% policy coverage—the only method achieving <5% false-go with >85% coverage among 14 baselines under our evaluation protocol.
    Advanced AI4TechAI4Tech foundationsAdvanced AI4TechData-driven AI4TechDomain-specific AI4TechAI4ManufacturingDomain-specific AI4TechAI4Safety
  815. #AI4T43

    Translating Latent Representations for Money Laundering Detection

    Ramon Rico, Ioana Hulpus, Stan Leisink, Boyang Zhao, Yannis Velegrakis
    Anti-money laundering (AML) systems are important for safe economic trade and for the fight against financial crime. Recently, a number of AML algorithms based on graph neural networks (GNNs) and graph transformers (GTs) have been proposed. Compared to traditional machine learning solutions, these methods have been shown to achieve significantly better detection results. Yet, the state-of-the-art AML algorithms have a key limitation: they fail to jointly address money laundering classification and money laundering sub-network discovery, despite their strong theoretical connection. To bridge this gap, we propose a translation-based AML system (TAML) that is capable of jointly solving both problems within the same latent space. Our extensive experimental evaluation on multiple datasets demonstrates the superiority of TAML over the state-of-the-art in both tasks.
    Domain-specific AI4TechAI4Finance
  816. #AI4T44

    DUALFloodGNN: Physics-informed Graph Neural Network for Operational Flood Modeling

    Carlo Malapad Acosta, Herath Mudiyanselage Viraj Vidura Herath, Jia Yu Lim, Abhishek Saha, Sanka Rasnayaka, Lucy Marshall
    Flood models inform strategic disaster management by simulating the spatiotemporal hydrodynamics of flooding. While physics-based numerical flood models are accurate, their substantial computational cost limits their use in operational settings where rapid predictions are essential. Models designed with graph neural networks (GNNs) provide both speed and accuracy while having the ability to process unstructured spatial domains. Given their flexible input and architecture, GNNs can be leveraged alongside physics-informed techniques with ease, significantly improving interpretability and generalizability. We introduce a novel flood GNN architecture, DUALFloodGNN, which embeds physical constraints at both global and local scales through explicit loss terms. The model jointly predicts water volume at nodes and flow along edges through a shared message-passing framework. To improve performance for autoregressive inference, model training is conducted with a multi-step loss enhanced with dynamic curriculum learning. Compared with standard GNN architectures and state-of-the-art GNN flood models, DUALFloodGNN achieves substantial improvements in predicting multiple hydrologic variables (e.g., water volume, flow, and depth) while maintaining high computational efficiency. The model is open sourced at https://github.com/acostacos/dual_flood_gnn. The dataset is open sourced at https://doi.org/10.25910/9xav-0s86.
    Advanced AI4TechData-driven AI4TechAdvanced AI4TechDeep AI4TechAI4Tech infrastructure/systemsAI social systemsDomain-specific AI4TechAI4Home and AI4CityDomain-specific AI4TechAI4Safety
  817. #AI4T45

    Bridging the Data Scarcity in Venous Thromboembolism Detection: A Deep Learning Framework for Large-scale Irregular Clinical Time Series

    Can Xu, Runze Yang, Xinni Xiang, Yongtao Wu, Yaqin Huang, Haike Lei, Jie Yang
    Venous thromboembolism (VTE) is a common and life-threatening complication in cancer patients after treatment. Early risk assessment and detection of VTE primarily rely on clinical indicators, such as blood test results. However, existing studies are limited to static or snapshot-based models, failing to capture the evolving dynamics of disease progression, as deep time-series modeling is hindered by the lack of longitudinal clinical data. To address this gap, we introduce CliTsVTE, a large-scale clinical time-series dataset curated for VTE modeling, comprising 501,063 samples from 26,022 patients over seven years across nine cancer types. The dataset contains continuous time gaps between consecutive time points. Unlike many benchmarks, CliTsVTE reflects real-world clinical settings and presents unique challenges in continuous irregular time-series modeling with long-term irregularity and varying data granularity, which makes missingness significantly consequential. To tackle this, we propose a deep learning framework integrating multiple sequential backbones with an adversarially regularized autoencoder (ARAE) that learns latent representations to eliminate missingness. Experiments on CliTsVTE show that our best model achieves 88.7% accuracy and an AUC of 0.952, significantly outperforming traditional time-point models and regular time-series benchmarks. These results establish a strong benchmark for deep modeling of continuous irregularity in clinical time-series data and highlight the potential of AI-driven large-scale clinical datasets in solving real-world medical research challenges.
    Advanced AI4TechData-driven AI4TechAdvanced AI4TechDeep AI4TechDomain-specific AI4TechAI4Care and AI4HealthDomain-specific AI4TechAI4Biotech
  818. #AI4T55

    Spherical Physics-Informed Neural Operator with Multi-Scale Coupling for Meteorological Downscaling

    Yiqiang Ye, Yichi Wang, Jiawei Wen, Jiahui Jiang, Zhaoyu Zhong, Jiangjian Yu, Chunxia Xiao, Haodi Zhang
    Meteorological downscaling is crucial for high-resolution regional climate forecasting and disaster early warning. While neural operators have emerged as a promising paradigm for modeling complex spatiotemporal mappings, existing frameworks often struggle with spherical manifold geometric distortions, inherent atmospheric multi-scale coupling mismatches, and lack of explicit atmospheric laws. We propose the Spherical Physics-informed Neural Operator, which utilizes a Spherical Laplacian Decomposition to partition atmospheric fields into hierarchical frequency components, maintaining exact point-wise correspondence across scales. To evaluate these representations at arbitrary locations, we introduce a localized spherical integral operator that approximates continuous kernel transforms via geometry-aware attention. Dynamical consistency is further enforced by embedding differentiable constraints into the learning process. Extensive experiments demonstrate that our framework attains superior accuracy and zero-shot generalization across various meteorological variables and unseen queries, representing a robust and interpretable solution for global-to-regional meteorological downscaling.
    Domain-specific AI4TechOther AI4Tech applications
  819. #AI4T68

    Leveraging Implicit Contexts via LLM–Graph Fusion for Temporal Knowledge Graph Reasoning

    Zeshu Tian, Junjie Tao, Hongli Zhang
    Temporal knowledge graph (TKG) reasoning is critical for modeling and forecasting the evolution of real-world events. Existing TKG construction pipelines transform raw text into structured temporal quadruples as graph facts. However, in this process, they often fail to preserve reasoning-relevant contextual semantics from the original corpus that cannot be explicitly represented as graph facts, leaving such implicit contextual information unused by current temporal reasoning models. To address this limitation, we propose a Textual-Temporal Graph Fusion Network (TTGFN), a context-aware framework that leverages implicit contexts from text via LLM-based semantic encoding and fuses them with structure-constrained temporal graph representations for reasoning. To the best of our knowledge, this is the first work to systematically leverage LLMs to reuse previously overlooked implicit contextual information and incorporate it into temporal knowledge graph reasoning, substantially improving model performance. Extensive experiments conducted on three TKG reasoning benchmark datasets demonstrate that TTGFN outperforms the state-of-the-art approaches, with Hits@1 gains of 20.85% on ICEWS14, 31.14% on ICEWS05-15, and 24.22% on ICEWS18.
    Advanced AI4TechData-driven AI4TechAdvanced AI4TechGenerative and LLMs-driven AI4Tech
  820. #AI4T79

    Beyond Isolated Investor: Predicting Startup Success via Roleplay-Based Collective Agents

    Zhongyang Liu, Haoyu Pei, Xiangyi Xiao, Xiaocong Du, Yihui Li, Suting Hong, Kunpeng Zhang, Haipeng Zhang
    Due to the high value and high failure rates of startups, predicting their success is a critical challenge. Existing approaches typically model startup success from a single decision-maker's perspective, overlooking the collective dynamics that dominate real-world venture capital (VC) decision-making. We propose SimVC-CAS, a collective agent system that simulates VC decisions as a multi-agent interaction process. By designing role-playing agents and a GNN-based supervised interaction module, we reformulate startup financing prediction as a group decision-making task, capturing both enterprise fundamentals and investor network dynamics. Each agent represents an investor with distinct traits and preferences, enabling heterogeneous evaluations and realistic information exchange over a graph-structured co-investment network.
    Using both proprietary and public VC data with strict anti-leakage controls, we show that SimVC-CAS significantly improves predictive performance, achieving approximately 25% relative improvement in average precision@10, while exhibiting consistency with real investor decisions. The interaction mechanism is particularly effective for network-central startups, confirming the importance of the network in VC decision-making. Analysis of agents' reasoning for decision changes further reveals how the network environment influences decision quality, demonstrating the system's interpretability. Our approach may generalize to broader group decision-making scenarios. Our code is available at https://github.com/ZhangDataLab/SimVC-CAS.
    Domain-specific AI4TechAI4Finance
  821. #AI4T95

    Conspiracy Spoofing Detection via Structure-Augmented Generative Graph Model

    Sheng Xiang, Ziwen Xu, Yidong Jiang, Dawei Cheng, Hui Zhao
    Detecting spoofing in financial trading is a critical data mining task. Traditional machine learning models often focus on individual node features, failing to capture the contextual relationships among interconnected nodes. Graph-based methodologies have enhanced this by effectively integrating relational data. Recent advancements in fraud detection demonstrate substantial performance gains by incorporating structure information into detection models. However, spoofing transactions often exhibit a distribution shift compared to historical transactions, rendering historical data less effective. Instead, certain trading patterns, such as motif structures, consistently manifest in transaction graphs regardless of distribution shift, providing a robust alternative for analysis. Motif structures, particularly node motifs, are essential for capturing higher-order interactions and structural patterns within transaction graphs. This paper introduces the Structure-Augmented Generative Graph Model (SAG2M) to address the challenge of detecting conspiracy spoofing through a substructure frequency-augmented detection method. Specifically, our approach extracts the frequency of subgraph patterns among neighboring nodes, leveraging an enumeration algorithm to efficiently identify node orbit data. The extracted motif frequencies are then encoded into a structure-augmented generative framework, enabling detailed structural representations of each transaction (node). Subsequently, a temporal and heterogeneous graph generation and aggregation scheme is applied to collect neighborhood node information, uncovering conspiracy spoofing patterns effectively. Our experiments on datasets such as Amazon, Yelpchi, and T-Finance demonstrate that SAG2M outperforms existing models in detection accuracy. A case study focusing on conspiracy spoofing detection further highlights the model’s superior effectiveness in identifying such complex fraudulent behaviors.
    Advanced AI4TechData-driven AI4TechDomain-specific AI4TechAI4Finance
  822. #AI4T98

    HieraMix: A Hierarchical MLP-Mixer for Large-Scale Traffic Forecasting

    Yongyao Wang, Xie Yu, Jingyuan Wang, Jiahao Ji, Li Chao
    Traffic forecasting is important for modern urban management. Recently, there has been growing attention on large-scale forecasting, as it better reflects the complexity of real-world traffic networks. However, existing models often exhibit quadratic computational complexity, making them impractical for large-scale real-world scenarios. In this paper, we propose a novel framework, Spatio-Temporal Hierarchical Mixer (HieraMix), which leverages an all-MLP architecture for efficient and effective large-scale traffic forecasting. HieraMix employs a hierarchical spatiotemporal mixing block to extract multi-resolution features through bottom-up aggregation and top-down propagation. Furthermore, an adaptive region mixer generates transformation matrices based on regional semantics, enabling our model to dynamically capture evolving spatiotemporal patterns for different regions. Extensive experiments conducted on four large-scale real-world datasets demonstrate that the proposed method not only achieves state-of-the-art performance but also exhibits competitive computational efficiency.
    Domain-specific AI4TechAI4Transport
  823. #AI4T104

    LLM-guided Cutting-plane Management for Mixed-integer Linear Programming

    Zetao Zheng, Zhe Wang, Jie Shao
    Cutting planes are central to mixed-integer linear programming (MILP) solving, yet their effectiveness hinges on expert tuning of separator configurations and hand-crafted cut-selection heuristics, creating a high barrier for non-specialists. Learning-based methods can reduce manual effort, but typically require large training datasets and often generalize poorly beyond the instance classes they are trained on. We propose an LLM-guided cutting-plane management framework that integrates large language models into the MILP solving pipeline. First, using chain-of-thought (CoT) prompting, the LLM infers an instance-specific separator configuration, deciding which separators to activate and how to set key parameters from the problem type and structural features. Then, it translates the evolving branch-and-bound state into natural language to perform stage-aware cut selection, choosing a high-quality subset of cuts that tightens the relaxation and improves overall solver performance. Leveraging the LLM's reasoning capabilities and rich background knowledge, our work removes dependence on domain-specific training data and substantially reduces reliance on expert-crafted configurations. Experiments show consistent improvements over SCIP's default settings, hand-crafted heuristics, and recent learning-based cut selection baselines.
    Advanced AI4TechGenerative and LLMs-driven AI4TechAdvanced AI4TechOther advances
  824. #AI4T108

    A Durable Machine Unlearning Framework to Nullify Recall of Sensitive Data on Incremental Training

    Qingqing Cao, Liang Hu, Dora D. Liu, Jiaxing Miao, Cao Jian, Zhongyuan Lai, Wei Cao
    The advancement of data privacy regulations has spurred the development of Machine Unlearning (MU), which is designed to remove the influence of sensitive data from a trained model and results in an unlearned model (ULM). Despite rapid progress in MU techniques, their vulnerabilities remain underexplored, which poses risks due to potential leakage of unlearned information. In realistic scenarios, ULMs always need to be incrementally trained with newly collected data samples, which can cause them to recall sensitive information if the new dataset contains samples similar or even identical to the unlearned ones. To address this issue, we devise a Durable Unlearning Enhancement (DUE) framework to avoid restoring unwanted sensitive information from incremental training data samples. The DUE framework has three key components that identify sensitive samples and suppress their gradients to update ULMs. Extensive experiments on state-of-the-art MU methods across multiple real-world datasets show that the proposed DUE framework can effectively nullify the recall of sensitive information after MU, and even improve the performance of ULMs. Consequently, our work establishes a new fundamental research direction in safe training against MU vulnerabilities.
    Advanced AI4TechAI4Tech foundationsAdvanced AI4TechData-driven AI4TechAdvanced AI4TechDeep AI4Tech
  825. #AI4T112

    MAgSeg: Segmentation of Agricultural Landscapes in High-Resolution Satellite Imagery using Multimodal Large Language Models

    Piyush Tiwary, Utkarsh Ahuja, Depanshu Sani, Aishwarya Jayagopal, Sagar Gubbi, Subhashini Venugopalan, Alok Talekar, Vaibhav Rajan
    Agricultural landscape segmentation in the Global South is challenging as it is characterized by fragmented plots, high intra-class variance, and a scarcity of labeled training data. Recent advances in segmentation have been made by Multimodal Large Language Models (MLLMs). However, current approaches encounter critical context length bottlenecks and a domain alignment gap in understanding satellite features. We address these limitations through MAgSeg, a novel, decoder-free MLLM segmentation approach. MAgSeg is an architecturally efficient approach that enables standard MLLMs to perform segmentation of complex smallholder agricultural landscapes from high-resolution satellite imagery, without requiring auxiliary vision decoders. We introduce a novel instruction tuning data format designed to enable scalable fine-tuning and post-training on high resolution satellite imagery, which enables MAgSeg to learn from the global context of the image while generating text tokens for only a patch within the image. Extensive evaluations on datasets spanning three countries in the Global South demonstrate that MAgSeg significantly outperforms state-of-the-art MLLM baselines, offering a scalable solution to map smallholder agricultural environments.
    Domain-specific AI4TechAI4Agriculture
  826. #AI4T120

    Belief-Contraction-Driven Active Inverse Source Localization and Characterization

    Yiwei Shi, Mengyue Yang, Qi Zhang, Cunjia Liu, Weinan Zhang, Weiru Liu
    Active inverse source localization and characterization (ISLC) in dynamic fields requires sequential decision making under partial observability, where a mobile sensor must infer latent source parameters from sparse, noisy readings. We introduce a belief-contraction-driven approach that unifies inference, stopping, and control. An attention-augmented particle filter stabilizes Bayesian belief updates through ESS-based resampling, feature-aware sparse attention smoothing, and Metropolis–Hastings rejuvenation that preserves the filtering posterior. Belief contraction (posterior dispersion) defines both a termination rule and a goal-aligned intrinsic reward, enabling reinforcement learning without distance-to-source shaping. Across seven field modalities, spatial out-of-distribution tests, and nonstationary source shifts, our agent (ATT-PFRL) achieves higher completion, faster convergence, and more accurate localization than planning and RL+Bayes baselines under similar computation. Fixed-trajectory studies also show improved ESS and lower RMSE, isolating the benefit of the inference layer.
    Advanced AI4TechAI4Tech foundationsAdvanced AI4TechData-driven AI4TechAdvanced AI4TechDeep AI4Tech
  827. #AI4T132

    PENTESTLLMAGENT: A Task Dependency Graph Planning-Based Multi-Agent Framework for Automated Penetration Testing

    Shuo Sheng, Jixin Zhang, Jia Yang, Ke Cheng, Zheng Qin
    Fully autonomous IP-to-Root penetration testing remains challenging for LLM agents. We conduct an exploratory study on 10 LLMs and introduce AutoPentest-Bench, an end-to-end benchmark with 13 VulnHub targets and 93 sub-tasks. From 130 interaction logs, we identify three challenges: Rigid Strategy, Contextual Forgetting, and Command Generation Hallucination. To address them, we propose PentestLLMAgent, which integrates a Task Dependency Graph (TDG) for dynamic planning and backtracking; a Hierarchical Multi-Agent Architecture (HMA) with function-calling-based tool invocation, output filtering, and semantic compression; and Executable Knowledge-Guided Command Generation (EKG-CG) for retrieving and executing pre-validated, environment-compatible commands. Evaluations demonstrate strong effectiveness: on end-to-end AutoPentest-Bench, PentestLLMAgent achieves a 77% success rate; on AutoPT’s web exploitation benchmark, it attains a 95% overall pass rate; and on the privilege escalation benchmark, it achieves a 100% success rate. In realistic end-to-end runs, it averages 10.9 minutes, 36 interaction rounds, and 69.6K tokens per target. The code, benchmark, and executable knowledge base are publicly available at https://github.com/sanbai123/PentestLLMAgent_code-and-videos.
    Advanced AI4TechGenerative and LLMs-driven AI4TechDomain-specific AI4TechAI4Security
  828. #AI4T138

    Physics-Guided Geometric Diffusion for Macro Placement Generation

    Jongho Yoon, Jinsung Jeon, Seokhyeong Kang
    Macro placement is a pivotal stage in VLSI physical design, fundamentally determining the overall chip performance. Recent data-driven placement methods have demonstrated significant potential, yet they often struggle to handle sequential dependencies and to balance topological connectivity with physical constraints. To bridge this gap, we propose MacroDiff+, a physics-guided geometric diffusion framework. Specifically, we design a dual-domain denoising architecture that couples topological connectivity encoded by heterogeneous GNNs with global geometric context modeled by a Transformer. Furthermore, we introduce Physics-Guided Sampling, an inference strategy that actively steers the generation using explicit gradients to ensure both statistical plausibility and physical validity. On the ISPD2005 MMS benchmarks, MacroDiff+ outperforms state-of-the-art baselines with a 6.1–6.2% reduction in wirelength. Notably, it exhibits superior stability and scalability on large-scale designs where prior methods fail to converge. The source code is provided at https://github.com/jhy00n/MacroDiff-plus.
    Advanced AI4TechGenerative and LLMs-driven AI4TechAI4Tech infrastructure/systemsAI chips, AI sensors, AI computersDomain-specific AI4TechAI4ManufacturingEmerging AI4Tech Emerging AI4Tech areas
  829. #AI4T139

    MORL-CA: Dynamic Multi-Objective Reinforcement Learning for Chlor-Alkali Process Optimization Under Time-Varying Conditions

    Derun Gan, Renhao Yin, Guangzhi Qu, Feng Zhang
    Chlor-alkali production is a large-scale industrial process whose operating conditions and equipment states evolve over time. Its process optimization requires ongoing trade-offs among conflicting objectives such as product yield, energy consumption, and equipment life. Existing optimization approaches are typically static and must be re-optimized after environmental changes, limiting their real-world applicability. In this work, we model the problem as a dynamic multi-objective sequential decision-making problem that continuously tracks a time-varying Pareto set under changing conditions. We propose MORL-CA, a multi-objective reinforcement learning framework that integrates offline pretraining on historical data with constrained online policy refinement. MORL-CA introduces a state-aware adaptive objective weighting mechanism within a multi-critic actor-critic architecture, enabling localized Pareto-improving policy updates while satisfying operational and safety constraints. Extensive experiments in an environment constructed from real chlor-alkali data demonstrate that MORL-CA achieves superior Pareto solution quality and smoother adaptation to dynamics compared with state-of-the-art multi-objective optimizers and MORL baselines.
    Domain-specific AI4TechAI4ManufacturingDomain-specific AI4TechOther AI4Tech applications
  830. #AI4T149

    BrainCGT: A Brain Graph Transformer for Modeling Causal Connectivity in Neurological Disorder Diagnosis

    Ahsan Shehzad, Dongyu Zhang, Shagufta Abid, Shuo Yu, Xin Zheng, Hongfei Lin, Feng Xia
    Brain connectivity analysis is a fundamental tool for identifying biomarkers and understanding neurological disorders. Most existing approaches employ graph transformers over undirected functional connectivity networks, which are typically estimated using correlation statistics. Although effective for capturing statistical associations, these models do not represent directed interactions between brain regions that arise from causal relationships. As a result, direction-specific disease mechanisms are not explicitly modeled, and interpretability is often limited. To address this gap, we present BrainCGT, a brain graph transformer designed to model causal connectivity inferred from fMRI time-series data. In this framework, brain networks are modeled as directed graphs with a modular organization, where nodes correspond to individual brain regions and directed edges reflect causal flow of information between them. Direction-aware node representations together with direction-biased attention mechanisms allow the model to capture asymmetric interactions across regions. Experimental results on three large-scale fMRI datasets demonstrate that BrainCGT achieves consistently better performance than existing graph-based methods for neurological disorder classification. In addition, examination of the learned attention structures shows correspondence with established neurobiological pathways, suggesting improved interpretability. These results highlight the importance of incorporating causal directionality into brain graph transformer architectures for robust and interpretable neuroimaging analysis.
    Advanced AI4TechData-driven AI4TechAdvanced AI4TechNeuro AI4TechDomain-specific AI4TechAI4Care and AI4HealthDomain-specific AI4TechOther AI4Tech applicationsEmerging AI4Tech Emerging AI4Tech areas
  831. #AI4T151

    ACCFormer: Predicting Analog Circuit Performance Metrics via Topology-Aware Transformers

    Bowen Liao, Yutong Feng, Jianhua Lin, Zhaohui Wu, Yuxuan Liang, Bin Li
    Reusing and migrating analog circuit intellectual property (IP) across process nodes poses a significant challenge in modern chip design. Efficient and generalizable circuit performance prediction methods for analog circuits are crucial to achieving this goal. Current data-driven approaches typically rely on manually designed features, which perform poorly on unseen circuit architectures and struggle to model the inherent structural relationships within analog designs. To address these challenges, we propose ACCFormer, a novel topology-aware Transformer framework for predicting performance metrics of analog circuits. Our model combines device parameters with connectivity data to learn topology-aware representations, followed by a performance-oriented cross-attention mechanism where trainable metric queries adaptively focus on the most critical devices for each target parameter. Validated across different process nodes, our model achieves state-of-the-art prediction accuracy and demonstrates strong cross-process adaptability, highlighting its potential to accelerate IP reuse and reduce design cycles.
    Advanced AI4TechData-driven AI4TechAI4Tech infrastructure/systemsAI chips, AI sensors, AI computersDomain-specific AI4TechOther AI4Tech applications
  832. #AI4T168

    HT-Transformer: Event Sequences Classification by Accumulating Prefix Information with History Tokens

    Ivan Karpukhin, Maksim Polesskii, Andrey Savchenko
    Deep learning has achieved strong results in modeling sequential data, including event sequences, temporal point processes, and irregular time series. Recently, transformers have largely replaced recurrent networks in these tasks. However, transformers often underperform recurrent networks in classification tasks that aim to predict future targets, such as churn, user reactions, or treatment response. The reason behind this performance gap remains largely underexplored. In this paper, we identify a key limitation of transformers: the lack of a single vector representation that compactly summarizes the evolving state of a sequence. We further show that commonly used contrastive embeddings are poorly suited to capturing the local context needed for accurate forward-looking prediction. To address these challenges, we introduce history tokens, a novel concept that enables the accumulation of historical information during next-token prediction pretraining. Our approach significantly improves transformer-based models, achieving impressive results in finance, e-commerce, and healthcare tasks. The code is publicly available: https://github.com/ivan-chai/pretpp.
    Advanced AI4TechDeep AI4TechDomain-specific AI4TechAI4Care and AI4HealthDomain-specific AI4TechAI4Customer and AI4MarketDomain-specific AI4TechAI4Finance
  833. #AI4T199

    DiffLOB: Diffusion Models for Counterfactual Generation in Limit Order Books

    Zhuohan Wang, Carmine Ventre
    Modern generative models for limit order books (LOBs) can reproduce realistic market dynamics, but they remain fundamentally passive: they either model what typically happens without accounting for hypothetical future market conditions, or they require interaction with another agent to explore alternative outcomes. This limits their usefulness for stress testing, scenario analysis, and decision-making. We propose DiffLOB, a regime-conditioned Diffusion model for controllable and counterfactual generation of LOB trajectories. DiffLOB explicitly conditions the generative process on future market regimes, including trend, volatility, liquidity, and order-flow imbalance, which enables the model to answer counterfactual queries of the form: “If the future market regime were X instead of Y, how would the limit order book evolve?” We introduce the first systematic evaluation framework for counterfactual LOB generation consisting of three criteria: (1) Realism, measuring how well generated trajectories can reproduce marginal distributions, temporal dependence structure and regime variables; (2) Counterfactual validity, testing whether interventions on future regimes induce consistent changes in the generated LOB dynamics; (3) Counterfactual usefulness, assessing whether synthetic counterfactual trajectories improve downstream prediction of future market regimes.
    Domain-specific AI4TechAI4Finance
  834. #AI4T220

    GraphPerf-RT: Graph-Driven Performance Modeling with Calibrated Uncertainty for OpenMP Scheduling on Heterogeneous Embedded SoCs

    Mohammad Pivezhandi, Mahdi Banisharif, Saeed Bakhshan, Abusayeed Saifullah, Ali Jannesari
    Autonomous AI agents on embedded platforms require real-time, risk-aware scheduling under resource and thermal constraints. Classical heuristics struggle with workload irregularity, tabular regressors discard structural information, and model-free reinforcement learning (RL) risks overheating. We introduce GraphPerf-RT, an AI technology achieving deep learning accuracy at heuristic speeds (2-7ms). GraphPerf-RT is, to our knowledge, the first graph-grounded infrastructure unifying task DAG topology, CFG-derived code semantics, and runtime context (per-core DVFS, thermal state, utilization) in a heterogeneous graph with typed edges encoding precedence, placement, and contention. The architecture supports multi-task evidential heads with Normal-Inverse-Gamma uncertainty; we validate on makespan prediction for risk-aware scheduling. Experiments on three ARM platforms (Jetson TX2, Orin NX, RUBIK Pi) achieve R^2 = 0.81 on log-transformed makespan with Spearman rho = 0.95 and conservative uncertainty calibration (PICP = 99.9% at 95% confidence). Integration with four RL methods demonstrates that multi-agent model-based RL with GraphPerf-RT as the world model achieves 66% makespan reduction and 82% energy reduction versus model-free baselines, with zero thermal violations.
    Advanced AI4TechAI4Tech foundationsAdvanced AI4TechData-driven AI4TechAdvanced AI4TechDeep AI4Tech
  835. #AI4T231

    Towards Scalable Metaverse Systems with Social-Aware VR Displays

    Shaolong Guo, Yuntao Wang, Qinnan Hu, Zhou Su, Tom H. Luan
    The Metaverse is envisioned to support immersive, large-scale social interactions via virtual reality (VR) displays. However, scalability remains a major bottleneck: as the number of concurrent users grows, tracking and updating display content incurs quadratic overhead, often limiting a shared virtual space to only a few dozen participants. Our key observation is that user attention in social VR is highly selective, with users primarily focusing on socially relevant peers rather than all visible users. Motivated by this observation, we propose SAGE, a social-aware graph-based VR display framework that enables personalized displays based on inferred social relevance. SAGE introduces a dual-graph learning architecture to jointly model long-term social structures and short-term spatiotemporal co-presence patterns, generating complementary interest scores for display prioritization. Based on these scores, we formulate scalable VR display support as a multi-dimensional resource allocation problem and design a lightweight coordination mechanism with provable guarantees, including incentive compatibility and individual rationality. Experiments on Metaverse datasets show that SAGE improves interaction-relevance prediction by 11.64% and increases social welfare by up to 2.4× compared to state-of-the-art schemes. It scales to support up to 1,000 concurrent users and remains robust against strategic manipulation.
    Advanced AI4TechMetaverse AI4TechDomain-specific AI4TechOther AI4Tech applications
  836. #AI4T240

    Structured Discrete Graph Generation Model for Fragmented Image Recovery

    Yalong Zhu, Liguo Zhang, Zhibo Wang, Ruyue Liu, Yue Cao, Yunfei Long, Jingang Wang
    Fragmented image recovery is of significant importance in computer vision, such as cultural relic and artwork restoration, archival document recovery, and digital forensics. The goal is to recover the original image topology from an unordered set of fragments and spatially align and stitch them together. The adjacency relationships among fragments are discrete, sparse, and highly structured, making it difficult for traditional methods to effectively handle global topological consistency. To address this challenge, we propose a fragment adjacency recovery method based on a conditional graph diffusion model. First, we perform discrete denoising pretraining with structural masking to learn structure-aware node representations from perturbed adjacency matrices, using graph neural networks for message passing. Building on this, we design a masked discrete diffusion process tailored for fragment reconstruction, which progressively restores the connectivity between fragments. Furthermore, to enhance the controllability of the generation process, we introduce a topology-guided mechanism that steers the generation of adjacency structures via a topological scoring function, ensuring that the reconstructed fragment graph satisfies global topological constraints. Experimental results demonstrate that our method achieves state-of-the-art performance on hand-torn calligraphy, painting replica datasets and document datasets, outperforming existing approaches in both accuracy and robustness.
    Domain-specific AI4TechAI4Arts and AI4Law
  837. #AI4T243

    Physics-in-the-Loop: A Hybrid Agentic Architecture for Validated CAD Engineering Design

    Elias Berger, Muhammad Usama, Jan Mehlstäubl, Bernhard Saske, Kristin Paetzold-Byhain
    Large Language Models (LLMs) can generate Computer-Aided Design (CAD), yet lack the physical comprehension required for reliable engineering design. Instead of attempting to implicitly learn physical laws from data, we propose a Hybrid Agentic-Physical Architecture that embeds validated knowledge-based engineering tools directly into the decision-making loop of autonomous AI agents. In this framework, engineering design is formulated as a closed-loop, sequential decision-making process guided by explicit physical verification. Based on a load case, dedicated agents iteratively plan, generate, evaluate, and revise engineering designs using knowledge-based tools as a feedback signal. We introduce a benchmark dataset and metrics for assessing functional validity in generative CAD. Our system generates more complex and physically verified designs, with a 4.2x increase in structural complexity and a 3.5% improvement in compile rate compared to similar agentic methods. The codebase, prompts and dataset will be made publicly available to support reproducibility and future research.
    Domain-specific AI4TechAI4ManufacturingAdvanced AI4TechGenerative and LLMs-driven AI4TechAI4TDomain-specific AI4Tech
  838. #AIR20

    GRALP: A Generative Representation Framework for Action Refinement and Latent Planning in Offline Robotic Control

    Talha Zaidi, Arslan Munir, Sardar Ali Abbas
    Offline robotic control requires long-horizon reasoning from fixed datasets while avoiding unsafe extrapolation beyond demonstrated behavior. We propose GRALP, a principled framework that resolves this tension by jointly enforcing support preservation and controllability at the level of temporal abstraction. GRALP adopts a deliberate architectural separation: diffusion is used exclusively as a deterministic action decoder for executing fixed latent skills, while planning and value estimation operate entirely in latent space under conservative constraints. This design enables stable value learning, controllable skill composition, and efficient planning without trajectory-level diffusion sampling at inference. Across unified D4RL benchmarks, GRALP achieves the highest average performance on Navigation, Sequential (Kitchen), and Adroit domains while remaining competitive on locomotion tasks. On contact-rich RoboSuite manipulation with human demonstrations (Lift and Pick-and-Place), GRALP achieves consistently high success rates (over 94%). These results indicate that reliable long-horizon offline control emerges when expressivity is confined to execution and decision-making operates over support-aligned latent abstractions.
    AIRGenerative AI, robotic foundation models, and reinforcement learningAIRRobot control, planning, and execution with guaranteesRobot control, planning, and execution with guaranteesSafe and robust control under uncertainty
  839. #AIR22

    FILD-Nav: Vision-and-Language Navigation with Instruction Landmark Features in Continuous Environments

    Chuangye Hu, Lulu Liu, Huaiwei Si, Yawen Zhao, Nan Ding
    Vision-and-language navigation (VLN) requires agents to follow natural language instructions to navigate autonomously in continuous environments. However, existing approaches often lack high-level semantic guidance in waypoint prediction and explicit language–landmark alignment in cross-modal planning. To address these limitations, we propose FILD-Nav, a vision-and-language navigation framework that integrates instruction landmark features. FILD-Nav extracts task-relevant landmarks from instructions and incorporates landmark semantics into both waypoint prediction and topological planning. Specifically, landmark-guided waypoint prediction improves waypoint relevance, while landmark-enhanced cross-modal planning enables more effective long-horizon navigation. Extensive experiments on the VLN-CE benchmark demonstrate that FILD-Nav consistently outperforms prior methods, achieving improvements of 2% in Success Rate (SR), 3% in Success weighted by Path Length (SPL), and 7% in Oracle Success Rate (OSR), particularly in unseen environments.
    Robot control, planning, and execution with guaranteesArchitectures connecting high-level intent and constraints to low-level trajectoriesLearning to understand, generalize, and explain actionsRobust generalization and transfer across tasks, objects, environments, embodiments, and long-horizon scenariosAIRRobot control, planning, and execution with guaranteesFoundations of human–robot interaction and assistanceLearning and inference methods for aligning robot behavior with human instructions, demonstrations, and feedback
  840. #AIR36

    Perturbation-Resilient Autonomous Navigation with Distributionally Robust Reinforcement Learning

    Zhaofan Zhang, Minghao Yang, Sihong Xie, Hui Xiong
    The robustness of autonomous vehicles such as drones and Unmanned Surface Vehicles (USVs) is crucial when facing unknown and complex marine environments, especially when heteroscedastic observational noise poses significant challenges to sensor-based navigation tasks. Recently, Distributional Reinforcement Learning (DistRL) has shown promising results in some challenging autonomous navigation tasks without prior environmental information. However, these methods overlook situations where noise patterns vary across different environmental conditions, hindering safe navigation and disrupting the learning of value functions. To address the problem, we propose DRIQN to integrate Distributionally Robust Optimization (DRO) with implicit quantile networks to optimize worst-case performance under natural environmental conditions. Leveraging explicit subgroup modeling in the replay buffer, DRIQN incorporates heterogeneous noise sources and targets robustness-critical scenarios. Experimental results based on the risk-sensitive environment demonstrate that DRIQN significantly outperforms state-of-the-art methods, achieving +13.51% success rate, -12.28% collision rate and +35.46% for time saving, +27.99% for energy saving, compared with the runner-up.
    AIRRobot control, planning, and execution with guaranteesAIRGenerative AI, robotic foundation models, and reinforcement learningRobot control, planning, and execution with guaranteesSafe and robust control under uncertaintySafety, trustworthiness, generalizability, and evaluationTheoretical and algorithmic guarantees on safety, robustness, and out-of-distribution generalization
  841. #AIR41

    Disturbance-Aware Hybrid Learning for Robust and Adaptive UAV Flight in Extreme Winds

    Huidong Liu, Jiarui Dou, Jiangshan Ai, Enwen Hu, Xianlei Long, Mingyan Li, Chao Chen, Fuqiang Gu
    Safe and precise maneuvering of quadrotor unmanned aerial vehicles (UAVs) in high-speed wind environments remains a critical challenge. Wind disturbances are nonlinear, time-varying, and difficult to model, causing traditional controllers to struggle with perception and compensation, especially under unseen wind distributions. To address these limitations, we introduce WA-TD3, a data-driven control framework that enables real-time wind disturbance perception and adaptive compensation without dedicated wind sensors. WA-TD3 employs a deep residual network to extract wind characteristics from temporal patterns in state deviations, forming a dynamics residual-driven perception mechanism that implicitly models and compensates for unknown winds. This residual is integrated into a perception-augmented reinforcement learning architecture, providing the policy with enhanced state information for proactive disturbance-aware control. Extensive experiments on complex trajectories under varying wind intensities demonstrate that WA-TD3 consistently outperforms state-of-the-art methods, achieving over 62% improvement in tracking accuracy under strong winds.
    AIRRobot control, planning, and execution with guaranteesRobot control, planning, and execution with guaranteesSafe and robust control under uncertaintySafety, trustworthiness, generalizability, and evaluationTheoretical and algorithmic guarantees on safety, robustness, and out-of-distribution generalization
  842. #AIR44

    RepSAM: Bridging Foundation Models to Robotic Vision via Representation-Guided Adaptation

    Wenhui Chu
    Robotic perception in unstructured environments remains challenging despite the zero-shot capabilities of foundation models such as SAM. This work attributes performance degradation to non-uniform representation shifts across transformer layers: shallow layers exhibit substantial domain gaps (CKA < 0.5), whereas deep layers transfer effectively (CKA > 0.7). Based on this observation, we propose RepSAM, a representation-guided parameter-efficient fine-tuning (PEFT) framework for adapting foundation models to robotic vision. RepSAM employs a theoretically grounded CKA-guided rank allocation strategy combined with a multi-modal fusion module for robust handling of challenging robotic scenarios, including transparent objects and cluttered scenes. Experimental evaluation across six benchmarks and robotic manipulation tasks demonstrates that RepSAM achieves 97.9% of full fine-tuning performance (89.0% vs. 90.9% mIoU) while reducing trainable parameters by 158× (from 632M to 4.0M). RepSAM outperforms DoRA by 7.9% mIoU with just 4 hours of training on a single A100 GPU (a 96× reduction from full fine-tuning, which takes 384 GPU-hours). These improvements are statistically significant (p<0.01) and translate to a 12.0% absolute improvement in robotic manipulation success rates over the LoRA (RGB) baseline.
    Generative AI, robotic foundation models, and reinforcement learningGrounding large models in real robot interaction with cross-task, cross-environment, cross-platform, and open-domain generalization and transferLearning to understand, generalize, and explain actionsRobust generalization and transfer across tasks, objects, environments, embodiments, and long-horizon scenariosSafety, trustworthiness, generalizability, and evaluationVerification and validation methods for AI-powered robotic systemsSafety, trustworthiness, generalizability, and evaluationTheoretical and algorithmic guarantees on safety, robustness, and out-of-distribution generalization
  843. #AIR46

    VLAs Are Confined yet Capable of Generalizing to Novel Tasks

    Quanyi Li
    Vision-language-action models (VLAs) often achieve high performance on demonstrated tasks but struggle significantly when required to extrapolate, recombining skills used in different tasks in novel ways. For instance, VLAs might successfully put the cream cheese in the bowl and put the bowl on top of the cabinet, yet still fail to put the cream cheese on top of the cabinet. This motivates us to investigate whether VLAs merely overfit to demonstrated tasks or still hold the potential to extrapolate. Our study uses the text latent as the ingredient; it is a task-specific vector derived from the models’ hidden states. It thus encodes semantics necessary for completing a task and can be used to reconstruct the associated task behavior by writing it to the model’s residual stream. Furthermore, we find that skills used in distinct tasks can be combined to produce novel behaviors by blending their respective text latents. Applying this to π0, we increase its success rate from 9% to 83% on the proposed libero-ood benchmark, which features 20 tasks extrapolated from standard LIBERO tasks. This reveals that the skill representations encoded in text latents are individual yet composable, while π0 fails to autonomously combine these representations for extrapolation. This also validates the design of libero-ood; it comprises tasks that the model fails, yet should be able to complete. We then tested other VLAs on libero-ood, and none of them achieved a success rate higher than 21%. Further analysis reveals that VLAs share a common pattern of spatial overfitting, associating object names with where the object is spatially located in the demonstrated scene rather than achieving true object and goal understanding.
    Safety, trustworthiness, generalizability, and evaluation · Verification and validation methods for AI-powered robotic systems · AIR · Safety, trustworthiness, generalizability, and evaluation · Intentional, causal, and intuitive physics reasoning · Representations and inference for intentions, goals, and affordances · Learning to understand, generalize, and explain actions · Robust generalization and transfer across tasks, objects, environments, embodiments, and long-horizon scenarios
  844. #AIR54

    TeNet: Text-to-Network for Compact Policy Synthesis

    Ariyan Bighashdel, Kevin Sebastian Luck
    Robots that follow natural-language instructions often either plan at a high level using hand-designed interfaces or rely on large end-to-end models that are difficult to deploy for real-time control. We propose TeNet (Text-to-Network), a framework for instantiating compact, task-specific robot policies directly from natural language descriptions. TeNet conditions a hypernetwork on text embeddings produced by a pretrained large language model (LLM) to generate a fully executable policy, which then operates solely on low-dimensional state inputs at high control frequencies. By using language only once, at policy instantiation time, TeNet inherits the general knowledge and paraphrasing robustness of pretrained LLMs while remaining lightweight and efficient at execution time. To improve generalization, we optionally ground language in behavior during training by aligning text embeddings with demonstrated actions, while requiring no demonstrations at inference time. Experiments on MuJoCo and Meta-World benchmarks show that TeNet produces policies that are orders of magnitude smaller than sequence-based baselines, while achieving strong performance in both multi-task and meta-learning settings and supporting high-frequency control. These results show that text-conditioned hypernetworks offer a practical way to build compact, language-driven controllers for resource-constrained robot control tasks with real-time requirements.
    AIR · Generative AI, robotic foundation models, and reinforcement learning · Generative AI, robotic foundation models, and reinforcement learning · Model construction, representations, task and policy synthesis from language, demonstrations, or instructional video · Generative AI, robotic foundation models, and reinforcement learning · Grounding large models in real robot interaction with cross-task, cross-environment, cross-platform, and open-domain generalization and transfer · Learning to understand, generalize, and explain actions · Robust generalization and transfer across tasks, objects, environments, embodiments, and long-horizon scenarios
  845. #AIR61

    PECHC: Robust Tactile Grasping Stabilization in Vision-Denied Peripersonal Space

    Changlin Chen, Sisheng Chen, Hang Zhang, Xianglai Zhou, Zhen Tian, Weitao Liu, Feng-Qi Cui, Erbao Dong, Wenjing Chen
    In the final “Last-Centimeter” phase of manipulation, visual occlusion and calibration errors often make vision unreliable, causing robots to suffer high failure rates due to local pose uncertainty and deviations between simulated and real-world dynamics. To address these challenges, this paper proposes the PECHC algorithm, short for Physics-Evolving Cascade Constraint and Human-Correction. To rigorously isolate the contribution of tactile feedback in multi-finger coordination, we adopt a decoupled control strategy that focuses on grasp stabilization within the hand’s workspace and functions as a fail-safe reflex. The core of the proposed approach is Hybrid Correction Imitation Learning (HCIL), which establishes a failure-triggered human–machine correction mechanism to efficiently reduce the model gap through sparse expert corrections. To improve sample efficiency and ensure reliable baseline performance, two supporting modules are introduced. Cascaded Constraint Scheduling (CCS) addresses the geometric gap by enforcing physically plausible behavioral constraints, including geometric approach, force closure, and dynamic stability. Temporal Heterogeneous Distillation (THED) addresses the physical gap by enabling implicit system identification from tactile history. Experiments demonstrate that PECHC achieves a 97.3% real-robot success rate on 150 objects from the Visual Dexterity Dataset under fully autonomous testing. In this setup, one object is used for one-time HCIL calibration, while the remaining 149 objects are evaluated without further intervention. Compared with a standard sim-to-real reinforcement learning baseline, Vanilla PPO with Domain Randomization, PECHC achieves a significant performance improvement of 42.8% and demonstrates human-like force modulation capabilities when handling fragile objects.
    AIR · Generative AI, robotic foundation models, and reinforcement learning · Learning to understand, generalize, and explain actions · Learning from language, corrections, preferences, and sparse feedback · AIR · Robot control, planning, and execution with guarantees · Robot control, planning, and execution with guarantees · Safe and robust control under uncertainty
  846. #AIR66

    Operationalising Normative Rules in Autonomous Robotic Systems Through Context-Oriented Programming

    Roberto Casadei, Martina De Sanctis, Gianluca Filippone, Sara Pettinari, Gian Luca Scoccia, Nicolas Troquard
    The increasing sensitivity to human aspects in autonomous systems engineering calls for principled approaches to embed ethical and normative concerns into their behaviour. Indeed, recent research has focused on expressing and validating sets of social, legal, ethical, empathetic, and cultural (SLEEC) concerns as rules. However, existing work is limited to rule specification or verification, leaving the problem of semantics-preserving operationalisation of ethical rules in autonomous systems largely unaddressed. For this purpose, we provide an operational solution for ethics-aware autonomous systems, applied to the realm of multi-service robots.
    Specifically, we devise a principled approach, named CO-SLEEC (Context-Oriented SLEEC), connecting the normative setting of SLEEC rules to context-oriented programming (COP). CO-SLEEC enables runtime adaptation while preserving the semantics of SLEEC rules during robot task execution. It features two reusable Python libraries for (i) parsing SLEEC rules into contextual elements for operationalising them, and (ii) connecting the operational model to the Robot Operating System (ROS), respectively.
    We evaluate our implementation for correctness, efficiency, and maintainability over multi-service assistive robot scenarios.
    Foundations of human–robot interaction and assistance · AI foundations for robot assistance and collaboration with humans, emphasizing explicit representations of tasks, roles, goals, norms, and shared context · Robot control, planning, and execution with guarantees · Architectures connecting high-level intent and constraints to low-level trajectories · AIR · Safety, trustworthiness, generalizability, and evaluation
  847. #AIR71

    Closing the Motion Execution Gap: From Semantic Motion Task Constraints to Kinematic Control

    Simon Stelter, Vanessa Hassouna, Malte Huerkamp, Michael Beetz
    This paper addresses the Motion Execution Gap, the disconnect between high-level symbolic task descriptions using semantic constraints and executable robot motions.
    Motion Statecharts are introduced as an executable symbolic representation for complex motions.
    They allow the arbitrary arrangement of motion constraints, monitors or nested statecharts in parallel and sequence.
    World-centric motion specification and generalization across embodiments are enabled through the use of a unified differentiable kinematic world model of both robots and environments.
    Motion execution is realized through an LMPC-based implementation of the task-function approach, in which smooth transitions during task switches are ensured using jerk bounds.
    Cross-platform transferability was demonstrated by deploying the method on eight robot platforms, operating in diverse environments.
    The proposed framework is called Giskard and is available open source (https://github.com/cram2/cognitive_robot_abstract_machine).
    Robot control, planning, and execution with guarantees · Architectures connecting high-level intent and constraints to low-level trajectories · Robot control, planning, and execution with guarantees · Online monitoring, explanation, and adaptation · Structured, semantic, and explicit world models, digital twins, and action representations · Explicit action representations such as task schemas, scripts, and parameterized skills · Structured, semantic, and explicit world models, digital twins, and action representations · Structured world models and semantic digital twins
  848. #AIR73

    Real-Time Multi-Robot Motion Planning with Safe-Interval Search and Learning-Guided Repair

    Rajat Kumar, Kristin Predeck, Ken Meszaros, Trevor Dardik
    Motion planning among multiple robots in a shared space is a fundamental yet computationally challenging problem in robotics, with applications ranging from warehouse automation to autonomous fleets. In this work, we introduce a fast, scalable motion planner that achieves real-time, collision-free trajectory planning via a two-stage algorithm combining deterministic search-based planning with machine learning-driven conflict resolution. We present a prioritized Safe Interval Path Planning algorithm (SIPP-PP) with a novel limited goal reservation strategy to prevent goal-blocking conflicts while allowing shared goal regions. On top of SIPP-PP, we add a second layer, an ML-guided Large Neighborhood Search (LNS) procedure, which improves success rates in highly congested environments via intelligent selection of conflict resolution actions. The result is a planning system that generates collision-free paths for multiple robots in complex environments within tens of milliseconds. For example, compared to recent advanced learning-based methods such as diffusion planners, our planner is two to three orders of magnitude faster. Our work demonstrates a multi-robot planner capable of real-time operation in dense scenarios, satisfying the stringent requirements of industrial applications such as drive units in fulfillment centers.
    AIR · Robot control, planning, and execution with guarantees · Robot control, planning, and execution with guarantees · Integrated task and motion planning with feedback control · Robot control, planning, and execution with guarantees · Safe and robust control under uncertainty
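    The safe-interval idea behind Safe Interval Path Planning can be sketched as follows. This is a minimal illustration of the general notion only, not the SIPP-PP planner described in the abstract: it assumes that higher-priority robots have already reserved time windows on a single cell, and the names `safe_intervals` and `earliest_arrival` are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # half-open time window [start, end)


def _clamp(value: float, low: float, high: float) -> float:
    return min(max(value, low), high)


def safe_intervals(reserved: List[Interval], horizon: float) -> List[Interval]:
    """Return the gaps in [0, horizon) not covered by any reserved window."""
    safe: List[Interval] = []
    cursor = 0.0  # end of the time range processed so far
    for start, end in sorted(reserved):
        start = _clamp(start, 0.0, horizon)
        end = _clamp(end, 0.0, horizon)
        if start > cursor:
            safe.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < horizon:
        safe.append((cursor, horizon))
    return safe


def earliest_arrival(safe: List[Interval], ready: float, dwell: float) -> Optional[float]:
    """Earliest time >= ready at which a robot can occupy the cell for `dwell`."""
    for start, end in safe:
        t = max(start, ready)
        if t + dwell <= end:
            return t
    return None  # no safe interval is long enough


# Hypothetical reservations made by higher-priority robots on one cell.
cell_reserved = [(2.0, 4.0), (3.0, 6.0), (8.0, 9.0)]
cell_safe = safe_intervals(cell_reserved, horizon=10.0)
arrival = earliest_arrival(cell_safe, ready=1.5, dwell=1.0)
```

    Searching over (cell, safe interval) pairs rather than over every individual time step is what keeps this family of planners compact in the time dimension.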
  849. #DM33

    Wavelength.AI: Extending the Collaborative Game Wavelength as a Testbed for Studying Shared Understanding in Human–Agent Collaboration

    Katelyn Morrison, Gabriel Gonzalez, Zahra Ashktorab, Matt Riemer, Andrew Anderson, Djallel Bouneffouf, Justin Weisz
    AI's increasing role as a personal agent assisting knowledge workers in everyday tasks underscores the need to investigate how to help human–agent teams build a shared understanding. We extend the collaborative "mind-reading" game Wavelength to include an AI teammate, presenting the first demonstration of an LLM capable of playing this game. Based on our agent–agent play experiments, we developed Wavelength.AI, which implements two strategies to support shared understanding: an initial team grounding conversation and post-game reflective explanations. We interpret higher team scores as evidence for better shared understanding in a preliminary user study with 24 human–AI teams. Our findings reveal that Wavelength.AI can help researchers evaluate and design different strategies to shape human-agent teams' shared understanding. Human players can see if they are on the same wavelength with AI today at https://play-wavelength-ai.com.
  850. #DM67

    SoilNet App: AI-Assisted Expert-level Annotations of Soil Horizons

    Vipin Singh, Joey Pruessing, Teodor Chiaburu, Einar Eberhardt, Sina Hesse, Stefan Broda, Frank Haußer, Felix Biessmann
    Precise descriptions of soil horizons are required by policy makers, in agriculture, and in many civil engineering applications. To date, correct soil horizon annotations require human experts, as they follow complex hierarchical taxonomies. We present the SoilNet App, a web-based demonstrator that guides experts through relevant tasks for expert-level soil horizon annotations from soil profile images. To demonstrate the reliability of the SoilNet App, we present results of a user study with soil horizon annotation experts, which highlights the difficulty of image-only annotation and suggests that collaborating with our model not only increases expert performance but also improves inter-annotator consistency. Our app is publicly accessible (https://soilnet.demo.calgo-lab.de).
    AI · Multidisciplinary Topics and Applications · AI · Humans and AI · AI · Computer Vision
  851. #DM68

    A Resilient Solution for Sewer Overflow Monitoring Across Cloud and Edge

    Vipin Singh, Tianheng Ling, Peter Ghaly, Felix Grimmeisen, Gregor Schiele, Felix Biessmann
    Aging combined sewer systems in many historical cities are increasingly stressed by extreme rainfall events, which can trigger combined sewer overflows (CSO) with significant environmental and public health impacts. Forecasting the filling dynamics of overflow basins is critical for anticipating capacity exceedance and enabling timely preventive actions for CSO. We present a web-based demonstrator that integrates Deep Learning forecasting methods in both cloud and edge settings into an interactive monitoring dashboard for overflow monitoring, resilient to network outages.
    AI · Multidisciplinary Topics and Applications · AI · Planning and Scheduling · AI · Humans and AI · AI · AI Ethics, Trust, Fairness
  852. #DM76

    Neuro-Symbolic Logical Reasoning with Textual Entailment

    Zacchary Sadeddine, Fabian M. Suchanek
    Large Language Models can use logical deduction to answer natural language questions, but they remain black boxes with potentially erroneous chains-of-thought. In this paper, we adapt VANESSA, a neuro-symbolic method for chain-of-thought verification, to reasoning-based question answering. VANESSA combines a logical reasoner with a neural textual entailment model to handle phrasing variations. Building on VANESSA, we develop a transparent, logic-based approach to answer natural language questions even under phrasing variations. Our experiments across various datasets show our method is competitive with the state of the art, while also delivering proof trees for its answers. A demo interface allows users to interact with the system.
    AI · Natural Language Processing · AI · Knowledge Representation and Reasoning
  853. #DM80

    RUVA: Personalized Transparent On-Device Graph Reasoning

    Gabriele Conte, Alessio Mattiace, Gianni Carmosino, Potito Aghilar, Giovanni Servedio, Francesco Musicco, Vito Walter Anelli, Tommaso Di Noia, Francesco Donini
    The Personal AI landscape is currently dominated by "Black Box" Retrieval-Augmented Generation. While standard vector databases offer statistical matching, they suffer from a fundamental lack of accountability: when an AI hallucinates or retrieves sensitive data, the user cannot inspect the cause nor correct the error. Worse, "deleting" a concept from a vector space is mathematically imprecise, leaving behind probabilistic "ghosts" that violate true privacy. We propose Ruva, the first "Glass Box" architecture designed for Human-in-the-Loop Memory Curation.
    Ruva grounds Personal AI in a Personal Knowledge Graph, enabling users to inspect what the AI knows and to perform precise redaction of specific facts. By shifting the paradigm from Vector Matching to Graph Reasoning, Ruva ensures the "Right to be Forgotten." Users are the editors of their own lives; Ruva hands them the pen. The project and the demo video are available at http://sisinf00.poliba.it/ruva/.
  854. #DM81

    MC-RAG System: A Structure-Driven RAG System for Multi-Constraint Queries

    Xiao Zhang, Yang Wan, Yi Li, Miao Xie, Chunli Lv
    Retrieval-Augmented Generation (RAG) systems are widely adopted in question answering, yet they often fail to satisfy complex multi-constraint queries, leading to constraint violations, factual inconsistencies, or hallucinations. We present the Structure-Driven RAG System for Multi-Constraint Queries (MC-RAG), which reformulates retrieval as a subgraph matching problem over a knowledge graph. By integrating semantic and structural embeddings with path-level indexing, MC-RAG performs interpretable, structure-aware, and constraint-consistent retrieval and generation. During the demonstration, participants can input medical or encyclopedic multi-constraint queries and visualize how the system parses constraints, performs structural matching, and generates answers, thereby experiencing an end-to-end, interactive, and explainable RAG pipeline.
    A demo video is available at https://youtu.be/J8kahzmAnu0.
    AI · Natural Language Processing · AI · Knowledge Representation and Reasoning
  855. #DM84

    vSpeedUI: Turning Past GUI Experience into Fast Executable Plans

    Xiaohan Zheng, Yihong Chen, Haiquan Qiu, Quanming Yao
    LLM-based mobile GUI agents usually invoke large models for nearly every micro-action, making real-device automation slow even when similar workflows have been completed before. We present vSpeedUI, a public demo system that turns past GUI experience into fast executable plans. It organizes historical trajectories into an Executable Experience Graph (EXG), where UI states are connected by Semantic Step Summaries with explicit preconditions. At task initialization, vSpeedUI performs Global Look-ahead Planning to retrieve, validate, and rank candidate transitions into a pre-verified plan. During execution, the agent uses lightweight graph traversal with state localization, target adaptation, and fallback when needed. On HarmonyOS, vSpeedUI reduces LLM latency and total task time while maintaining strong success rates, showing a practical route toward data-efficient GUI automation. Code is available at: https://github.com/LARS-research/vSpeedUI.
    AI · Agent-based and Multi-agent Systems · AI · Planning and Scheduling · AI · Knowledge Representation and Reasoning · AI · Machine Learning
  856. #DM87

    eNNcode: Optimization-Based Analysis of Neural Networks

    Muhammad Atallah, Lukas Dankwart, Daniel Neider, Mustafa Yalciner
    The increasing use of neural networks (NNs) in high-stakes decision-making requires rigorous analysis to ensure safety, fairness, and explainability.
    Formal verification tools for neural networks typically focus on determining the satisfiability of properties such as safety or fairness.
    However, many fairness and explainability tasks go beyond satisfiability and instead rely on arbitrary optimization objectives.
    To address such problems, neural networks are often encoded as mixed-integer linear optimization (MILO) problems with linear objectives.
    In practice, these encodings are usually implemented in an ad-hoc manner, limiting comparability across works, reducing transparency, and increasing implementation effort.
    We address this gap by introducing eNNcode, a user-friendly PyPI library that converts any piecewise-linear neural network in Open Neural Network Exchange (ONNX) format into a MILO instance.
    The library supports arbitrary constraints on input and output nodes, as well as user-defined optimization objectives.
    Our experiments show that eNNcode achieves performance comparable to existing libraries, despite its simplicity and ease of use.
    Overall, eNNcode facilitates reproducible and standardized optimization-based analysis of neural networks.
    AI · Constraint Satisfaction and Optimization
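    To make the idea of encoding a piecewise-linear network as a mixed-integer linear problem concrete, the sketch below checks the textbook big-M encoding of a single ReLU unit. It is an illustration under stated assumptions only: the abstract does not say which encoding eNNcode uses, the function name `relu_bigm_feasible_y` is hypothetical, and the pre-activation bounds `lower < 0 < upper` are assumed to be known in advance.

```python
def relu_bigm_feasible_y(x, lower, upper):
    """Feasible outputs y of the big-M ReLU constraints, for each binary z.

    Assumes lower <= x <= upper and lower < 0 < upper. The constraints are:
        y >= 0,  y >= x,  y <= x - lower * (1 - z),  y <= upper * z
    Returns a list of (z, y_min, y_max) for every z that admits a solution.
    """
    feasible = []
    for z in (0, 1):
        y_min = max(0.0, x)
        y_max = min(x - lower * (1 - z), upper * z)
        if y_min <= y_max:
            feasible.append((z, y_min, y_max))
    return feasible


# Hypothetical bounds on the pre-activation value x.
LOWER, UPPER = -2.0, 3.0
inactive = relu_bigm_feasible_y(-1.5, LOWER, UPPER)  # only z = 0 is feasible
active = relu_bigm_feasible_y(2.0, LOWER, UPPER)     # only z = 1 is feasible
```

    For every admissible input the constraints leave exactly one value for y, namely max(0, x), so a solver that picks the binary z implicitly picks the active or inactive phase of the unit.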
  857. #DM99

    AI-Powered Interactive Multimodal Digital Book & Online Shop For Blind and Visually Impaired Users

    Mazen Salous, Daniel Westphal, Wilko Heuen, Ayoub Ben Dhiab, Charles Hudin, Sabrina Paneels, Susanne Boll, Larbi Abdenebaoui
    We present an AI-powered interactive multimodal system that enriches digital image accessibility for blind and visually impaired (BVI) users. Our demonstration showcases two application domains: (1) an educational digital book, and (2) an online shopping interface. In both use cases, users can virtually feel material textures (leather, wood, etc.) and engage in voice-driven inquiry about images. The system integrates state-of-the-art AI components – including voice-to-voice conversational agents, vision models for object segmentation, and a custom 16-actuator vibrotactile display – to provide multimodal feedback (haptic vibrations, spoken descriptions, and audio cues).
    The result is an inclusive technology with significant societal benefit, empowering BVI users to learn and shop more independently through natural multimodal interactions.
    AI · Humans and AI · AI · Computer Vision · AI · Machine Learning
  858. #DM100

    Anti-Slavery Intelligence (ASI): An AI-Powered Tool for Modern Slavery Compliance Analysis and Remediation

    Mahmoud Gad, Abdessalam Elhabbash, Steven Young
    We present Anti-Slavery Intelligence (ASI), a deployed AI system that analyses corporate modern slavery statements to identify compliance gaps and generate prioritised remediation advice. ASI orchestrates a multi-model pipeline: Gemini-2.5-Flash for vision-enhanced PDF parsing and structured compliance scoring against 48 expert-defined criteria, and Gemini-2.5-Pro for synthesising company-specific, timeline-based recommendations. A benchmarking engine compares each statement against industry and FTSE index averages across 11 sectors. Evaluation on 95 expert-annotated statements yields an F1-score of 0.86 and recall of 0.94, reducing time to generate an initial compliance assessment from several hours to approximately five minutes per document. ASI is publicly available at https://www.antislaveryintelligence.co.uk/ with free access for academics and NGOs. A demonstration video is at https://youtu.be/DNhdEGrRwzg.
    AI · Multidisciplinary Topics and Applications · AI · Natural Language Processing · AI · Humans and AI
  859. #DM105

    Visualizing Deep Agents in Long-Horizon Tasks: Towards Explainable and Trustworthy Agentic AI

    Amirkia Rafiei Oskooei, Mehmet S. Aktas
    The transition from prompt-based Large Language Models (LLMs) to autonomous Deep Agents has enabled the automation of long-horizon tasks. However, as these agents adopt hierarchical architectures with nested tool usage, they suffer from significant opacity. Existing linear tracing tools fail to capture the multi-dimensional complexity of parallel sub-agent execution, hindering both debugging and user trust. We propose a general-purpose observability framework that decomposes agent execution into four distinct visualization dimensions: Temporal, Cognitive, Hierarchical, and Spatial. We validate this framework through RepoLearn, an open-source workbench for automated codebase comprehension. Our user study demonstrates that this multi-dimensional approach reduces the Time-to-Insight (TTI) for complex behavioral analysis by 56% and significantly lowers cognitive load (NASA-TLX) compared to state-of-the-art linear traces. The source code is available at https://github.com/amirkiarafiei/repo-learn and the demo at https://www.youtube.com/watch?v=s3U6E9o94gk.
    AI · Agent-based and Multi-agent Systems · AI · Humans and AI · AI · AI Ethics, Trust, Fairness
  860. #DM110

    DeepMed Search: An Open-Source Agentic Platform for Medical Deep Research with Introspective Verification

    Maolin Liu, Fanyu Xu, xu ruoqing, JiaHang Zhang, Hao Wang, Rui Wang
    Navigating the deluge of heterogeneous medical data, from academic literature (PubMed) to clinical guidelines (Web) and private knowledge bases, remains a critical bottleneck for evidence-based medicine. While commercial black-box tools lack transparency, standard open-source RAG implementations frequently suffer from "reasoning drift" when handling complex, long-tail queries. We present DeepMed Search, a fully open-source, agentic platform designed for transparent medical deep research. Built on a high-performance Next.js architecture, DeepMed Search features a source-adaptive router that autonomously dispatches sub-queries to PubMed, web search, or local graph-based knowledge bases based on information density. Crucially, the platform integrates an introspective verification module, powered by a causal-consistent multi-agent debate framework, to validate retrieved evidence against diagnostic logic before synthesis. To demonstrate its robustness, we showcase DeepMed Search's ability to autonomously decompose high-difficulty rare disease queries, filter out confounding noise, and generate structured, citation-backed research reports in minutes. By open-sourcing this software, we provide the community with a robust infrastructure to democratize access to trustworthy, glass-box medical reasoning at a commercial-grade performance level. The platform is publicly available at https://www.deepmedsearch.cloud, and the demonstration video is available at https://youtu.be/4U4aok8yLpk.
    AI · Agent-based and Multi-agent Systems · AI · AI Ethics, Trust, Fairness
  861. #DM113

    Intent Hub: A Self-Healing Semantic Agent Routing System for Resolving Overlap in Agentic Systems

    Chenrui Liang, Peng Xu, Xinyuan Liu
    Semantic overlap poses a fundamental challenge to accurate agent routing in large-scale agentic systems. We present Intent Hub, a self-healing semantic agent routing system that combines offline semantic diagnosis with an online Dual Filtering Mechanism. By leveraging LLM-generated augmentative positive and adversarial negative utterances, Intent Hub constructs explicit decision boundaries and enables interpretable, millisecond-level routing under high semantic overlap. Intent Hub further supports interactive semantic debugging, allowing developers to visually diagnose conflicts, repair routing rules, and immediately observe changes in online agent routing.
  862. #DM114

    A Scalable Cross-Domain Event Extraction System via a Unified Generative Training Framework

    Siting Liang, Omar Adjali, Bhatti Omair, Daniel Sonntag
    Event extraction is fundamental to information extraction. Prior approaches often separate event detection and argument extraction or depend on dataset-specific designs, limiting scalability and cross-domain generalization. We propose a unified generative, sequence-to-sequence framework that performs all event extraction subtasks jointly and supports both end-to-end and pipeline configurations. We fine-tune pre-trained language models on multiple event datasets across diverse domains, enabling a single model to retain domain-specific semantics while generalizing over large, evolving label spaces. Cross-domain experiments show strong, robust performance across datasets, demonstrating a scalable solution for real-world event extraction. We demonstrate these capabilities through a web-based application tailored for researchers and practitioners. The platform supports inspection of different configurations and facilitates cross-domain comparisons.
  863. #DM118

    RoboVineSim: A Simulation Tool for Human-Robot Collaboration in Vineyard Harvesting

    Dimitrios Troullinos, Maria Nuria Conejero, Filippo Bistaffa, Jose Maria Bengochea-Guevara, Ángela Ribeiro, Juan Rodriguez-Aguilar
    In agricultural tasks, manual grape harvesting remains a labor-intensive activity facing challenges of efficiency, labor shortages, and sustainability. To this end, tailor-made robotic systems have been designed with the capabilities to transfer heavy boxes, navigate vineyard terrains, communicate, accurately locate, and safely interact with humans. The introduction of collaborative robotic fleets alongside human workers in large-scale vineyard harvesting effectively presents a Multi-Robot Task Allocation (MRTA) problem, where the real-world domain possesses characteristics that, when combined, pose a challenging research endeavor and align with open issues in MRTA research. Here, we present RoboVineSim, a simulation tool that can capture any vineyard area using geographical data and model the behavior of humans and robots in the environment. In addition, we have established the necessary mechanisms to facilitate the development of novel MRTA methods in this domain.
  864. #DM121

    ADP-MA: An Interactive System for Autonomous Data Processing using Meta-Agents

    Udayan Khurana
    We demonstrate ADP-MA (Autonomous Data Processing using Meta-Agents), a system that autonomously solves a complex and diverse set of data processing tasks. Three domain-agnostic meta-agents coordinate task-specific ground agents through a multi-stage pipeline: data understanding, planning, critique, expansion, execution, and finalization. Errors are caught early via progressive sampling on small data subsets before running on full data. The system supports three execution strategies, twelve domain knowledge packs, and confidence-based early stopping. An interactive web interface lets users watch pipelines being built in real time, replay completed runs at any stage, and compare results across cases. On four benchmarks, ADP-MA reaches 90.6% on DSEval, 44.8% on KramaBench, 50.0% on DA-Code, and 70.0% on AgentBench, outperforming published single-agent baselines.
    AI · Agent-based and Multi-agent Systems · AI · Humans and AI
  865. #DM122

    Making Weak Supervision Interactive: Exploring Transfer from Sound Libraries to Passive Acoustic Monitoring Data

    Novruz Mammadli, Rida Saghir, Kanwar Ammar Ali, Prathmesh Doddanawar, Thiago S. Gouvêa, Daniel Sonntag
    Passive Acoustic Monitoring (PAM), an increasingly popular method for wildlife monitoring, generates large volumes of data whose analysis depends on instance-level annotations that are costly to obtain. Archival sound collections provide weak labels that lack temporal localisation. In prior work, we demonstrated that Multiple Instance Learning (MIL) can extract approximate event locations from weakly labelled PAM data, suggesting it may be applied to sound collection data.
    This demo operationalizes that approach within an interactive workflow that connects weakly annotated sound collections to downstream PAM deployment. The system supports configurable MIL-based localisation, lightweight interactive refinement, and transfer to an independent PAM dataset.
    We carried out a preliminary evaluation with an actual sound library from a museum collection and a benchmark PAM dataset. Results confirm that weakly annotated sound collections can serve as a viable training signal for downstream PAM detection and illustrate differences between alternative MIL instantiations under real transfer conditions. (Video available at https://cst.dfki.de/projects-weak-supervision-demo)
    AI · Machine Learning · AI · Multidisciplinary Topics and Applications · AI · Humans and AI
  866. #DM124

    AwakeForest: An Interactive Geospatial Platform for Large-Scale Forest Imagery

    Suraj Prasai, Kangning Cui, Rongkun Zhu, Sarra Alqahtani, Ying Zhang, Victor Paúl Pauca, Miles R. Silman, Fan Yang
    Forest imagery analysis often involves multiple tightly coupled vision tasks, which must be performed under substantial variation in geographic regions, sensors, and acquisition conditions. However, practitioners often lack a unified tool that is geospatial-native, cloud-optimized, and ML-integrated for end-to-end workflows spanning annotation, prediction, visualization, and downstream analysis at scale. We present AwakeForest, an interactive end-to-end platform designed for large-scale forest imagery that integrates model-assisted inference, automatic annotation, and human-in-the-loop refinement within a single workflow. Our platform supports plug-and-play integration of pretrained models and enables scalable interaction with forest imagery ranging from standard aerial scenes to large orthomosaics that can span several gigabytes to hundreds of gigabytes. AwakeForest produces analysis-ready outputs that can be directly used for downstream analysis and to support iterative model and annotation updates on new scenes. We demonstrate the system on the PALMS dataset and illustrate how AwakeForest supports an end-to-end workflow for practical forest management and analysis.
    AI: Computer Vision · AI: Multidisciplinary Topics and Applications · AI: Natural Language Processing · AI: Knowledge Representation and Reasoning
  867. #DM125

    Low-Latency Real-Time Audio Game Commentary System via LLM-based Parallel Text Generation

    Ryota Kawamatsu, Anum Afzal, Yuki Saito, Shinnosuke Takamichi, Graham Neubig, Katsuhito Sudoh, Hiroya Takamura, Tatsuya Ishigaki
    We present a low-latency real-time audio game commentary system that generates spoken commentary directly from live gameplay video. In this end-to-end setting, a key bottleneck is accumulated waiting time: conventional pipelines capture frames, generate text, and synthesize speech sequentially for each utterance, and do not request the next generation until speech playback has completed. This strict sequentiality causes long and unnatural silence between utterances. To address this latency bottleneck, our system runs text generation in parallel with speech playback and buffers multiple candidate utterances ahead of time, enabling immediate synthesis at playback boundaries. Experiments on fast-paced game videos show that our parallel design reduces the mean inter-utterance silence from 9.6 seconds to 0.3 seconds compared to sequential baselines. It also improves similarity to professional speaking–silence timing patterns by over 40%, and a user study with 120 experienced game players confirms significantly improved perceived speaking rhythm. Our demo video is available at: https://youtu.be/pmrRUlvav8M.
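The effect of overlapping generation with playback can be illustrated with a toy timeline model. This is a minimal sketch, not the authors' implementation: the `Utterance` fields, the single back-to-back generation worker, and all durations are assumptions chosen for illustration, and frame capture time is ignored.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    gen: float    # seconds needed to generate the text (assumed value)
    synth: float  # seconds needed to synthesize speech (assumed value)
    play: float   # seconds of audio playback (assumed value)


def sequential_silences(utts):
    """Silence before each utterance after the first when generation and
    synthesis only start once the previous playback has finished."""
    return [u.gen + u.synth for u in utts[1:]]


def parallel_silences(utts):
    """Silence when text generation runs ahead during playback.

    Assumption: one worker generates texts back to back from t=0 and buffers
    them; synthesis starts at the playback boundary once the text is ready.
    """
    silences = []
    text_ready = 0.0     # time at which the current utterance's text is done
    playback_end = None  # time at which the previous utterance stops playing
    for u in utts:
        text_ready += u.gen
        if playback_end is None:
            start = text_ready + u.synth
        else:
            synth_start = max(playback_end, text_ready)
            start = synth_start + u.synth
            silences.append(start - playback_end)
        playback_end = start + u.play
    return silences


utts = [Utterance(gen=4.0, synth=0.5, play=6.0) for _ in range(3)]
print(sequential_silences(utts))  # [4.5, 4.5]
print(parallel_silences(utts))    # [0.5, 0.5]
```

In this toy setting the gap shrinks from generation plus synthesis time to synthesis time alone, as long as generation finishes before the previous playback ends. The numbers are illustrative and do not reproduce the reported measurements.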
  868. #DM130

    DataCube: A Video Retrieval Platform via Natural Language Semantic Profiling

    Yiming Ju, Hanyu Zhao, Quanyue Ma, Donglin Hao, Chenwei Wu, Ming Li, Songjing Wang, Tengfei Pan
    Large-scale video repositories are increasingly available for modern video understanding and generation tasks. However, transforming raw videos into high-quality, task-specific datasets remains costly and inefficient. We present DataCube, an intelligent platform for automatic video processing, multi-dimensional profiling, and query-driven retrieval. DataCube constructs structured semantic representations of video clips and supports hybrid retrieval with neural re-ranking and deep semantic matching. Through an interactive web interface, users can efficiently construct customized video subsets from massive repositories for training, analysis, and evaluation, and build searchable systems over their own private video collections. The system is publicly accessible at https://datacube.baai.ac.cn/. Demo Video: https://youtu.be/L7bKfPBm2tU
    AI: Computer Vision · AI: Search · AI: Data Mining · AI: Natural Language Processing
  869. #DM134

    Visualizing and Interacting with Model Representation Space for Human-Centric Active Learning

    Rida Saghir, Thiago S. Gouvêa, Daniel Sonntag
    Active learning reduces annotation effort by selecting informative samples, yet most approaches remain model-driven, offering users little control over training or support for understanding model behaviour. Human-centric active learning brings users further into the loop by introducing additional points of interaction, particularly in the sample selection process. However, such systems are typically demonstrated using fixed feature projections or visualizations of shallow classifier outputs. We present a representation-centric active learning tool in which interaction takes place directly within the model’s representation space. By operating in the same space the model uses for decision making, the interface supports the co-evolution of representations and user understanding. We additionally report initial qualitative (think-aloud) and quantitative findings from a pilot study, illustrating that such representation-centric frameworks can achieve comparable performance to standard baselines while fostering improved human–model collaboration. (Video and code are available at https://cst.dfki.de/demo-interacting-model-space)
    AI: Humans and AI · AI: Machine Learning
  870. #DM137

    PEACE 2.0: Grounded Explanations and Counter-Speech for Combating Hate Expressions

    Greta Damo, Stéphane Petiot, Elena Cabrio, Serena Villata
    The increasing volume of hate speech on online platforms poses significant societal challenges. While the Natural Language Processing community has developed effective methods to automatically detect the presence of hate speech, generating responses to it, called counter-speech, is still an open challenge. We present PEACE 2.0, a novel tool that, besides analysing and explaining why a message is considered hateful or not, also generates a response to it. More specifically, PEACE 2.0 has three main new functionalities, built on a Retrieval-Augmented Generation (RAG) pipeline: (i) grounding hate speech explanations in evidence and facts, (ii) automatically generating evidence-grounded counter-speech, and (iii) exploring the characteristics of counter-speech replies.
    By integrating these capabilities, PEACE 2.0 enables in-depth analysis and response generation for both explicit and implicit hateful messages.
    AI: Natural Language Processing · AI: Multidisciplinary Topics and Applications · AI: AI Ethics, Trust, Fairness
  871. #DM138

    DeepLog: A Software Framework for Modular Neurosymbolic AI

    Robin Manhaeve, Stefano Colamonaco, Vincent Derkinderen, Rik Adriaensen, Lucas Van Praet, Luc De Raedt, Giuseppe Marra
    DeepLog is an operational neurosymbolic framework that unifies logic and deep learning within standard PyTorch workflows. While existing neurosymbolic systems focus on a particular paradigm and semantics, DeepLog serves as a universal backend that can emulate many systems in the neurosymbolic alphabet soup. By treating diverse neurosymbolic languages as high-level specifications, the DeepLog software automatically compiles them into optimized arithmetic circuits. This design lowers the barrier for machine learning practitioners by treating logic as composable modules, while providing neurosymbolic developers with a shared, high-performance basis for prototyping new integration strategies.
    The video is available here: https://youtu.be/CJAQJeaTWB0
  872. #DM143

    A Privacy-Preserving Intelligent Assistant for Clinical Psychology Practice

    Aaron Pico, Joaquin Taverner, Emilio Vivancos, Ana Garcia-Fornes, Vicent Botti
    This paper describes a fully local, privacy-preserving intelligent system designed to assist in clinical psychology practice. The system automatically transcribes therapy sessions and performs speaker attribution. Beyond transcription, the tool enhances clinical reasoning by detecting cognitive distortions and emotional patterns using specialized deep learning classifiers and Large Language Models (LLMs). By guiding an LLM locally through a multi-step analysis process, the assistant synthesizes the enriched data and generates a series of analysis reports and clinical documentation of the session. As a result, the assistant reduces the administrative burden on professionals while preserving privacy through an edge computing approach in which the data never leaves the therapist's device. Finally, the assistant uses human-in-the-loop validation so that the professional always remains in control, ensuring clinical accuracy and trust.
    AI: Humans and AI · AI: Natural Language Processing · AI: Multidisciplinary Topics and Applications · AI: AI Ethics, Trust, Fairness
  873. #DM149

    Looking at Your Photo, What Comes to Mind? Personalized Memory Internalization for Dementia Reminiscence

    Shunjie Wen, Kyung-Hwan Lee, GiMoon Lee, Seongsoo Heo, Jaeyeon Lee, Sangyeob Shin, Gyuwon Moon, Jiwoong Kim, Dong-Wan Choi
    We present PhoMi, an interactive recall assistant that supports dementia reminiscence by engaging users with personal photographs captured via a live camera interface. Given a photograph, PhoMi delivers spoken questions and receives spoken responses, creating an accessible reminiscence setting with reduced reliance on human therapists. Over repeated sessions, user responses are incorporated into lightweight adapters of a vision–language model, enabling progressively personalized question generation without reprocessing prior interaction logs. PhoMi serves as a prototype toward scalable, lifelong AI companions for dementia reminiscence.
  874. #DM159

    TraceBrain: An Open-Source Framework for Agentic Trace Management

    Quy Minh Le, Oscar Cao, Hoang Quoc Viet Pham, Hoang Thanh Lam, Hoang D. Nguyen
    As Large Language Model (LLM) agents scale toward real-world deployment, they generate large volumes of fragmented, non-standardized execution traces. Many existing observability platforms treat these traces primarily as passive logging artifacts, lacking the unified infrastructure to operationalize them for active governance and agent adaptation across heterogeneous single-agent and multi-agent workflows. To address this gap, we introduce TraceBrain, an open-source infrastructure for autonomous agent trace management. TraceBrain adopts a framework-agnostic architecture built on a delta-based OpenTelemetry (OTLP) schema, which mitigates context explosion and supports on-demand reconstruction of long-horizon execution trajectories. For runtime governance, TraceBrain implements uncertainty-driven supervision, where an internal Trace Evaluator prioritizes ambiguous trajectories for human review, thereby reducing manual annotation workload. Moving beyond passive observation, the platform incorporates a hybrid semantic-lexical retrieval engine that combines dense vector similarity and exact keyword matching for operational memory retrieval. Furthermore, an automated curriculum mechanism continuously synthesizes failure patterns into structured training artifacts. Empirical evaluations demonstrate a ~100x reduction in storage overhead together with high precision in uncertainty-guided trace supervision. Ultimately, TraceBrain transforms the execution history into a reusable operational memory substrate, bridging runtime observability with retrieval-driven agent adaptation. The system is publicly available at https://github.com/ToolBrain/TraceBrain.
    AI: Agent-based and Multi-agent Systems · AI: Knowledge Representation and Reasoning · AI: Search · AI: Uncertainty in AI
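The idea of hybrid semantic-lexical retrieval, combining dense vector similarity with exact keyword matching, can be sketched as a weighted score. This is a minimal sketch under stated assumptions: the function names, the linear mixing weight `alpha`, the whitespace tokenization, and the toy two-dimensional vectors are all hypothetical and are not taken from TraceBrain.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def keyword_score(query, text):
    """Fraction of query terms that appear verbatim in the text."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def hybrid_search(query, query_vec, docs, alpha=0.5, k=3):
    """Rank docs by alpha * dense similarity + (1 - alpha) * keyword overlap."""
    scored = []
    for doc in docs:
        dense = cosine(query_vec, doc["vec"])
        lexical = keyword_score(query, doc["text"])
        scored.append((alpha * dense + (1 - alpha) * lexical, doc["id"]))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy trace summaries with made-up embeddings (illustrative only).
docs = [
    {"id": "trace-a", "text": "tool call timeout error", "vec": [1.0, 0.0]},
    {"id": "trace-b", "text": "successful run", "vec": [0.0, 1.0]},
]
print(hybrid_search("timeout error", [1.0, 0.0], docs, k=1))  # ['trace-a']
```

A linear mix is only one way to fuse the two signals; rank-based fusion is a common alternative when the score scales differ.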
  875. #DM165

    ORBIT: Optimal Recommendation Framework for Boarding with Interpretable Timelines

    Joonseong Kang, Jaehun Bang, Seung Ha Hwang, Jiyoung Ko, Subeen Park, Jeffrey Gennari
    Determining when to leave for the airport is a complex problem shaped by flight delay risk, traffic, weather, airport congestion, and passenger-specific constraints. Existing services rely on isolated delay estimates or simple travel time calculations, failing to capture real-time context. We propose ORBIT, a decision-support system that generates personalized leave-by recommendations by integrating predictive models with real-time operational signals. The system combines user input normalization, real-time data acquisition, Transformer-based delay prediction, and LLM-based reasoning to jointly account for statistical forecasts and dynamic factors like congestion, previous-leg propagation, weather, traffic, and airport processing times. Instead of a standalone delay estimate, ORBIT produces an actionable and interpretable departure plan. We implement ORBIT as an interactive system and demonstrate its applicability. A video demo is available at https://youtu.be/fZX42jceIM4.
  876. #DM170

    Interactive Open-Set Semantic Mapping with a 3D Scene Graph Backend

    Felix Igelbrink, Lennart Niecksch, Martin Günther, Marian Renz, Oscar Lima, Martin Atzmueller
    While Open-Set Semantic Mapping and 3D Semantic Scene Graphs (3DSSGs) have become established paradigms in robotic perception in recent years, most existing works are limited to small environments or sacrifice geometric detail and instance granularity for scalability. Deploying these systems at scale for large multi-room environments remains a major challenge due to the computational overhead of high-dimensional feature integration and the maintenance of the 3DSSG structure. In this paper, we demonstrate a modular mapping architecture that establishes 3DSSGs as its foundational backend. Unlike approaches that generate scene graphs as a post-processing step, our system maintains the graph as the primary, incrementally updated knowledge representation. Our architecture is optimized for GPU-accelerated operations, enabling the dense representation of extensive environments containing thousands of unique object instances and supporting open-vocabulary queries via CLIP features without requiring any additional post-processing steps. In this live demonstration, we showcase our pipeline processing large-scale data from the Habitat Matterport 3D (HM3D) dataset as well as live data collected from a handheld device. Attendees will interact with the generated maps by performing real-time, open-set queries (e.g., “find the vintage wooden chair”) across complex, multi-story environments, highlighting the system's capability to represent dynamic, human-aligned environmental understanding suitable for downstream robotic tasks.
  877. #DM171

    SparseDR: Differentiable Rendering of Sparse Signed Distance Fields

    Alexey Budak, Albert Garifullin, Vladimir Frolov
    We present SparseDR, a novel differentiable rendering algorithm designed for sparse representations based on Signed Distance Fields (SDF). We leverage the Sparse Brick Set representation and propose an adaptation of redistancing for sparse SDFs. This enables SparseDR to surpass existing works in accuracy of surface reconstruction by increasing the effective resolution of the SDF representation. SparseDR is efficiently implemented in C++ and Vulkan, achieving several times shorter reconstruction time than other methods.
    AI: Computer Vision · AI: Machine Learning
  878. #DM173

    GRAIL: An Agentic AI Architecture for Interactive Grant Proposal Writing

    Zhisheng Tang, Mayank Kejriwal
    Securing research funding remains fragmented and time-consuming: researchers must navigate separate databases across dozens of agencies while simultaneously drafting competitive proposals. We present GRAIL, a web-based platform that unifies grant discovery and proposal writing through conversational AI. Users describe their research interests in natural language to explore opportunities from a unified index of 11.8K U.S. federal and nonprofit grant opportunities; within the document editor, integrated AI assistance supports real-time proposal revision and refinement. The system runs in any modern browser without installation. Conference attendees are invited to interact with the live system at the demo booth, explore the grant discovery and writing assistance workflows, and provide feedback on the user experience.
    AI: Agent-based and Multi-agent Systems · AI: Data Mining · AI: Natural Language Processing · AI: Search
  879. #DM183

    Optimizing Spectrogram Resolution and Training Strategies for Real-Time Killer Whale Call Type Classification

    Vladislav Naumov, Iaroslav Sheipak, Yuriy Ivanov, Ilya Makarov
    Automated identification of killer whale call types from continuous acoustic recordings is essential for scalable population monitoring, yet existing general-purpose frameworks such as ANIMAL-SPOT suffer from suboptimal spectrogram resolution and lack strategies tailored to the spectral-temporal characteristics of killer whale vocalizations.
    We identify two key limitations of the ANIMAL-SPOT framework: (1) the architecture of the CNN backbone cannot be changed, and (2) the absence of regularization and class-balancing techniques limits generalization across call types with varying abundance.
    To address these issues, we propose a framework that combines optimized STFT parameters (FFT size 1024, hop length 172) with label smoothing and targeted oversampling, evaluated across three CNN backbones and five segment lengths.
    On a dataset of 12 killer whale vocalization classes from Avacha Gulf, Russia, our best configuration (ResNet-18 with 1200 ms segments) achieves 97.1% segment-level accuracy, compared to 96.2% for the ANIMAL-SPOT baseline, a relative error reduction of 23.7%.
    We further present an interactive demonstration system for uploading recordings and obtaining time-resolved call-type predictions, enabling rapid analysis of passive acoustic monitoring data.

    Demo video link: https://drive.google.com/file/d/1ITA_52WdnAcyg7zRjIIshutjl6Lp_um1/view?usp=drive_link
    Presentation slides: https://drive.google.com/file/d/1sd-JH1R5YPa9iRtsbRO-OZxY8-fyglsO/view?usp=drive_link
    AI: Humans and AI · AI: Machine Learning · AI: Multidisciplinary Topics and Applications
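The relative error reduction quoted in the abstract follows directly from the two reported accuracies. The short check below is a sketch of that arithmetic; the helper name `relative_error_reduction` is hypothetical and not part of the authors' code.

```python
def relative_error_reduction(baseline_acc, new_acc):
    """Percentage by which the error rate shrinks; accuracies are in percent."""
    baseline_err = 100.0 - baseline_acc  # 3.8% for the ANIMAL-SPOT baseline
    new_err = 100.0 - new_acc            # 2.9% for the best configuration
    return 100.0 * (baseline_err - new_err) / baseline_err


print(round(relative_error_reduction(96.2, 97.1), 1))  # 23.7
```

The error rate drops from 3.8% to 2.9%, and 0.9 / 3.8 is roughly 0.237, which matches the stated 23.7%.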
  880. #DM191

    RareDASH: A Dynamic Multi-Agent System for Holistic Rare Disease Care

    Jialun Zhong, Jiayang Yu, Yanzeng Li, Meng Qin, Lei Zou, Yuqian Wang, Ying Zhang, Hanna Li, Liying Yan, Jie Qiao
    Rare diseases are characterized by low prevalence and intricate pathogenesis, leading to highly heterogeneous clinical trajectories. Caring for rare disease patients presents formidable challenges because it requires highly specialized expertise and experience. Existing methods are typically tailored to isolated rare disease scenarios (e.g., diagnostic tasks, medication recommendations) and lack a comprehensive perspective on the entire care process. Inspired by recent studies of agent skills, we propose RareDASH, a multi-agent system (MAS) featuring dynamic workflow orchestration designed to provide a comprehensive solution for the full life-cycle of rare disease care. Our framework is inherently patient-centric, enhancing rare disease discovery capabilities through proactive inquiry and information elicitation directly from patients. Furthermore, we implement diverse agent memory to optimize both the accuracy and efficiency of the multi-agent collaboration. Finally, an online auditing module is integrated into the system to monitor and mitigate hallucinations, ensuring the reliability of clinical outputs. The work sheds light on the feasibility of leveraging MAS in holistic rare disease care.
    AI: Agent-based and Multi-agent Systems · AI: Planning and Scheduling · AI: Humans and AI
  881. #DM194

    LLM-to-Map: Transparent Conversational Tool Orchestration for Real-Time Multi-Domain Simulation

    Marouane Benbrahim, Kavya Gautam, Zonghan Zhang, Zhiqian Chen
    Coupled power-traffic simulations are valuable for studying EV charging stress and outage propagation, but existing tools typically require scripts and opaque configurations. We present LLM-to-Map, a demo system that exposes a real-time multi-domain simulator through structured LLM tool orchestration. Natural-language requests are mapped to typed tool calls that coordinate SUMO traffic simulation, PyPSA power-flow analysis, V2G actions, and map operations. The agent operates over a fixed tool schema and emits auditable execution logs in the UI so users can inspect every action and parameter. Tool execution is deterministic: each request resolves to explicit API calls, and unsafe or destructive actions can require confirmation. The tool layer provides JSON schemas and parameter validation, preventing arbitrary code execution and enabling reproducible runs. The agent receives live system state (loads, EV statistics, V2G status, time, temperature) to support context-aware multi-step commands; when an LLM is unavailable, a deterministic parser preserves the same tool interface. The backend synchronizes cross-domain events (EV charging demand and substation failures) and streams state updates to a 3D map interface. We describe the architecture, coupling and synchronization, and a demo workflow that showcases multi-step scenario control and reporting for non-experts without writing scripts.
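A fixed tool schema with parameter validation and confirmation for destructive actions, as described above, might look roughly like the following. This is a minimal sketch: the tool names, parameter names, and the `validate_call` helper are hypothetical and are not the system's actual interface.

```python
# Hypothetical tool registry: names and parameters are illustrative only.
TOOLS = {
    "set_ev_charging_demand": {
        "params": {"station_id": str, "kw": float},
        "destructive": False,
    },
    "fail_substation": {
        "params": {"substation_id": str},
        "destructive": True,
    },
}


def validate_call(call, confirmed=False):
    """Check a tool call against the fixed schema before it is executed."""
    name = call.get("tool")
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    spec = TOOLS[name]
    args = call.get("args", {})
    if set(args) != set(spec["params"]):
        raise ValueError(f"{name} expects parameters {sorted(spec['params'])}")
    for key, expected in spec["params"].items():
        if not isinstance(args[key], expected):
            raise TypeError(f"{name}.{key} must be {expected.__name__}")
    if spec["destructive"] and not confirmed:
        raise PermissionError(f"{name} requires explicit confirmation")
    # The validated call doubles as an auditable log entry.
    return {"tool": name, "args": dict(args)}


ok = validate_call(
    {"tool": "set_ev_charging_demand", "args": {"station_id": "s1", "kw": 50.0}}
)
print(ok["tool"])  # set_ev_charging_demand
```

Because the language model can only select from registered tools and typed parameters, a request never turns into arbitrary code, and the same validation path can serve a deterministic parser when no LLM is available.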
  882. #DM195

    UrbanMix: LLM-Guided Simulation of Mixed Autonomy Traffic with Heterogeneous Behavioral Profiles

    Roman Sultimov, Daniil Efimov, Ivan Novikov, Aleksandr Volkov, Yury Maximov
    Cities deploying autonomous vehicles (AVs) face urgent policy questions: would AV adoption ease congestion or worsen it? What adoption rate would minimize congestion? How would cautious AVs (Waymo-style) and aggressive AVs (Tesla "Mad Max"-style) interact with human drivers and delivery robots on shared roads? We present UrbanMix, an interactive simulation platform that embeds cognitively diverse agents (human drivers, cautious AVs, aggressive AVs, and delivery robots) with distinct behavioral profiles inside the Simulation of Urban MObility (SUMO) framework on real urban road networks. Our LLM planner operates as an urban policy coordinator, setting traffic rules through a bounded action interface, while a regulation shield enforces infrastructure constraints.

    Our experiments reveal three key phenomena: (i) road throughput may drop substantially as AV adoption increases; (ii) an aggressive cascade, where runtime behavior switching, modeling Tesla's user-selectable "Mad Max" mode, triggers up to a 60-fold increase in emergency braking events on a real Austin Downtown network; and (iii) a delivery bottleneck, showing a 24–36% throughput reduction from slow robots.
    Results validated on synthetic and real data from Austin, TX, demonstrate that real network topology amplifies cascade effects by more than a factor of four.

    Demonstration video: https://youtu.be/ksgdX5iguFw. Live demo: https://evacuation-viz.vercel.app/urban.
  883. #DM196

    ElderMTL: Multi-Task Affect Monitoring for Elderly Care

    Maria Razzhivina, Shahane Tigranyan, Aram Avetisyan, Ilya Makarov, Andrey Savchenko
    We present ElderMTL, a multi-task affect monitoring system designed for elderly care settings. The system simultaneously estimates Facial Action Units (FAUs), Valence-Arousal (VA) signals, and categorical emotions (FER) from video, capturing multiple layers of affective information. To improve sensitivity to subtle affective cues common in older adults, our approach incorporates age-conditioned physiological modeling, including baseline muscle adjustments and a dynamic AU co-activation graph. This enables the system to adapt to age-related changes in facial expression patterns, providing more reliable and interpretable emotion assessments. In a live demonstration, we showcase ElderMTL processing video streams, visualizing AU activations, affective state predictions, and interpretable insights that highlight age-specific affective dynamics. This work demonstrates that physiologically grounded, multi-task affective monitoring can provide meaningful, real-world support for elderly care.
    AI: Computer Vision · AI: Humans and AI · AI: Machine Learning
  884. #DM201

    Secure Coding Unleashed: Boosting Productivity With On-Premise LLM-Powered IDE Plugins

    Vasilii Krikunov, Nikolay Kotlyarov, Eugenii Nikolaev, Vasily Konovalov
    The integration of code assistance powered by Large Language Models (LLMs) into Integrated Development Environments (IDEs) has rapidly expanded, significantly influencing developer productivity. However, existing cloud-based solutions offered by third-party providers introduce critical privacy concerns due to storage and reuse of proprietary codebases for further model training. Addressing these concerns, we propose an enterprise-oriented approach to developing a customizable code-generation plugin for widely used IDEs, utilizing internally hosted LLMs. Through the proposed custom solution, this research explicitly quantifies the impact on developer productivity across various coding tasks.

    https://drive.google.com/file/d/10Y2_cOgUtZPViz5-oiXvG97fYxOUxKVH/view
  885. #DM205

    Interactive System for Reducing Error Propagation in Multi-Stage Ancient Egyptian Text Analysis

    Maksim Golyadkin, Innokentiy Humonen, Ilya Makarov
    We present a web-based system that brings an image-to-text pipeline for Ancient Egyptian hieroglyphs into a single interactive workspace. Instead of only producing a final transcription, our system exposes editable intermediate results so users can validate and correct the pipeline step by step. User edits are stored as image-aligned annotations, which supports both real-time text analysis and dataset creation. Quantitative results indicate improved efficiency and output quality relative to a manual baseline. A demonstration video is available at https://drive.google.com/file/d/1Wjy5vwbnX8kOqhb1ZHWWqi_qfXTUJHVr/view?usp=sharing.
  886. #DM209

    Double Bounded Neural Ray Queries

    Alexander Nikolaev, Nikolay Mozokhin, Roman Rodionov, Vladimir Frolov
    We introduce a novel neural ray tracing method designed for compact scene representations and real-time rendering. To compress the scene with minimal fidelity loss, we address the issue of limiting the search space for intersections. For each scene, we first construct two lightweight proxy shells that tightly bound the original surface from inside and outside. While executing ray queries, we intersect the rays with the shells and extract the regions that potentially contain the intersection with the original surface. The extracted regions are passed to a small neural network to retrieve the exact intersection location. We implement our method as part of a GPU-accelerated hybrid path tracing pipeline. We demonstrate it running real-time rendering on a variety of scenes, achieving up to 300x memory reduction and surpassing existing compressed ray tracing techniques in the memory-quality trade-off.
    AI: Machine Learning
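The bounding step described above, intersecting a ray with an outer and an inner shell to narrow the search region, can be sketched with two concentric spheres standing in for the proxy shells. This is an illustrative assumption: the actual shells are presumably meshes fitted to the scene, the function names are hypothetical, and the neural refinement stage is omitted.

```python
import math


def ray_sphere(origin, direction, center, radius):
    """Return (t_near, t_far) for a unit-length direction, or None on a miss."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * x for d, x in zip(direction, oc))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    return (-b - root, -b + root)


def candidate_interval(origin, direction, center, r_inner, r_outer):
    """First ray interval that may contain the true surface hit.

    The surface is assumed to lie between the inner and outer spheres.
    """
    outer = ray_sphere(origin, direction, center, r_outer)
    if outer is None or outer[1] < 0.0:
        return None  # the ray never enters the outer shell
    t_start = max(outer[0], 0.0)
    inner = ray_sphere(origin, direction, center, r_inner)
    if inner is None or inner[0] < t_start:
        # Inner shell missed (or ray starts past its near side):
        # conservatively search the whole remaining outer span.
        return (t_start, outer[1])
    return (t_start, inner[0])


interval = candidate_interval(
    (0.0, 0.0, -5.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0), r_inner=0.9, r_outer=1.1
)
print(interval)  # approximately (3.9, 4.1)
```

Only the short interval between the two shells would need to be handed to a learned model, which is what keeps the per-ray search space small.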
  887. #DM210

    CrossRefine: A Microservice for Cross-Domain Spatial Super-Resolution

    Daniil Sukhorukov, Andrei Zakharov, Ilya Makarov
    High-resolution spatial fields are critical for local decision-making, yet many operational and scientific workflows produce coarse outputs due to computational limits. We present CrossRefine, a deployable microservice for cross-domain spatial super-resolution that enhances multi-channel spatial tiles without modifying upstream models. It is built around a unified, topography-conditioned adversarial UNet trained across geographically diverse regions to ensure robustness to heterogeneous terrains and domain shifts. Unlike region-specific enhancement models, the system generalizes across domains within a single architecture, balancing numerical fidelity and structural realism through a hybrid regression–adversarial objective. The service provides REST API endpoints for batch and streaming inference, supports mixed-precision, and offers per-tile diagnostics and confidence maps to promote safe deployment.
    In our demo, we show interactive refinement of coarse spatial inputs, side-by-side comparison with interpolation and non-adversarial baselines, and real-time profiling of latency and throughput on commodity hardware. CrossRefine illustrates how spatial super-resolution can be delivered as a practical AI microservice, enabling scalable refinement of existing computational workflows without requiring higher-resolution upstream simulations.

    Demonstration video: https://shorturl.at/lz2un
    AI: Computer Vision · AI: Multidisciplinary Topics and Applications
  888. #DM211

    LELA: An End-to-end LLM-based Entity Linking Framework with Zero-shot Domain Adaptation

    Samy Haffoudhi, Nikola Dobricic, Fabian M. Suchanek, Nils Holzenberger
    Entity linking is a key component of many downstream NLP systems, yet existing approaches are often tied to specific target knowledge bases and domains, limiting their real-world applicability. In this paper, we extend LELA, a modular and domain-agnostic LLM-based entity disambiguation method, into a practical Python library that integrates zero-shot Named Entity Recognition (NER), thereby providing a complete end-to-end pipeline for entity linking in real-world usage. We provide experimental results validating LELA's performance and robustness across diverse entity linking settings. In our demo, users can try the system on their own input texts. All code is publicly available at https://github.com/NDobricic/LELA, and a video is at https://www.youtube.com/watch?v=WdupiRjLbR4.
  889. #DM216

    Sparse ProtoPatient: Interactive Multi-Prototype Explanations for Clinical Diagnosis Prediction

    Conor Fallon, Bogdan Kostić, Betty Van Aken, Jens-Michalis Papaioannou, Alexei Figueroa, Keno Bressem, Alexander Löser
    We present the Sparse ProtoPatient Demo, a publicly available interactive system for interpretable ICD-10 diagnosis prediction from clinical admission notes.
    The system is designed for clinicians in training, researchers, and educators exploring prototype-based diagnostic reasoning.

    The demo links predictions to learned prototypical patient representations and token-level evidence, allowing users to input custom text or select preset cases, inspect predicted ICD-10 codes, visualize label-wise saliency, retrieve supporting prototype notes, and compare alternative prototype cohorts.
    The demo provides a reproducible platform for interactive inspection of prototype-based clinical reasoning, enabling complementary opinion exploration, model auditing, and teaching of interpretable diagnosis prediction.
    The deployed model is trained on the publicly released CodiEsp corpus (1000 clinical notes, 955 ICD-10 labels) using a sparse multi-prototype architecture with five prototypes per label.
    We use the official machine-translated English CodiEsp-MT release to support English-language interaction.
    The model achieves a macro-AUROC of 0.92 on a held-out test set and supports real-time interaction (300 ms per query).
    The system is fully containerized for public research and educational use.
  890. #DM218

    DeepL Voice: Real-Time Speech-to-Speech Translation

    Johannes Ernesti, Peter Kaiser, Jonas Heinze, Elnaz Shafaei-Bajestan, Kristina Geißler, Weiyue Wang, Johannes Beck, Sascha Brinker, Thorben Finke
    DeepL Voice is a real-time speech-to-speech translation system for global business communication, following a pragmatic incremental approach: developing a production-grade cascaded speech-to-speech translation (S2ST) system, while exploring end-to-end solutions in parallel.
    The production system (launched November 2024) achieves competitive transcription quality through proprietary real-time ASR models and eliminates translation "flickering" via stable text streaming while maintaining low latency.
    Supporting 18 input languages and 30+ target languages, it offers DeepL Voice for Meetings (Microsoft Teams/Zoom integration) and DeepL Voice for Conversations (mobile apps).
    Key features include customizable formality and glossary support for business-appropriate communication, with voice cloning TTS under development.
    Demo Video: https://youtu.be/DMMcti2f4rc
  891. #DM220

    SteelAgent: An LLM-Orchestrated System for Physics-Informed Steel Property Prediction and Generalization Auditing

    Aleksandr Volkov, Roman Sultimov, Mikhail Kuzin, Yury Maximov
    Machine learning models for steel property prediction routinely report high quality metrics with R² > 0.85, yet these results rely on random splits that allow similar grades in both train and test sets. We present SteelAgent, an interactive system that exposes a critical generalization gap: the same models drop from R² > 0.85 to R² = 0.11 on unseen steel families, a more than sevenfold degradation in quality. Similarly, conformal prediction coverage degrades from 91% to 38% under distribution shift induced by holding out substantial data sources. SteelAgent combines physics-informed features grounded in classical metallurgy and interpretable models with conformal uncertainty quantification, and an LLM orchestrator that coordinates six domain-specific tools. The system supports property prediction with specification compliance checking, competitive steel comparison, and cost-aware inverse alloy design over 3,741 heat treatment records spanning 1,234 grades. All predictions are traceable through explicit tool calls, ensuring that all physical quantities are computed, not generated. We made the code and data open-source and freely accessible to the community.

    Demo video: https://www.youtube.com/watch?v=BwVBJ-SwuQo. Live demo: https://steelagent.vercel.app
  892. #DM223

    Lucyde: A Demonstrator for Explainable Artificial Intelligence and Interactive Machine Learning

    Eda Ismail-Tsaous, Ute Schmid
    As AI‑based decision‑support systems become increasingly widespread, methods aimed at improving the performance of human-AI teams are gaining attention. In recent years, explainable artificial intelligence (XAI) has received growing interest, as it provides methods to make the behavior of machine learning models more transparent and can help to identify errors and flaws, which is particularly important in safety critical domains such as medicine and law. However, explanations themselves can be misleading, inconsistent, or incorrect, which makes it essential to raise awareness of the possibilities and limitations of these methods.
    We introduce Lucyde, a web‑based demonstrator designed to help users explore, compare, and better understand XAI methods across different datasets, models, and configurations. Lucyde provides a curated collection of explanation techniques, enables side‑by‑side comparison of methods, and offers easy‑to‑understand supplementary information for different user groups. It also illustrates interactive machine learning workflows by allowing users to correct model outputs or explanations. Lucyde thereby fosters informed and reflective engagement with AI systems and their explanations.

    The video can be found here: https://cloud.smartcitybamberg.de/s/pRHXdr6qxxng3Me
  893. #DM230

    From Surface Learning to Deep Understanding: A Grounded AI Tutoring System for Moodle

    Anna Ostrowska, Michał Kukla, Gabriela Majstrak, Jan Opala, Sebastian Pergała, Jan Skwarek, Anna Wróblewska
    This demo paper describes the development of the AI Teaching & Learning Assistant, a modular Moodle plugin that leverages Retrieval-Augmented Generation (RAG) to deliver high-quality, hallucination-free education. The system employs a dual-centric design, providing students with interactive, Socratic-based tutoring and educators with a "Human-in-the-loop" workspace for supervised content generation. By grounding Large Language Model (LLM) responses in teacher-provided materials, the assistant addresses the risks of misinformation while encouraging deep conceptual mastery. Evaluation via the Ragas (LLM-as-a-Judge) framework and a preliminary user study confirms its effectiveness, achieving faithfulness scores up to 0.97 and a 4.00/5.00 recommendation rate.
    Demo Video: https://tinyurl.com/4zz4bcjn
  894. #DM231

    [COMP25] The Automated Negotiating Agents Competition (ANAC) 2026 Challenges and Results

    Yasser Mohammad, Reyhan Aydoğan, Catholijn Jonker, Katsuhide Fujita, Tim Baarslag, Tamara Florijn
    This paper presents the primary research challenges and key findings from the 15th International Automated Negotiating Agents Competition (ANAC 2025), one of the official competitions of IJCAI 2025. We focus on two critical domains: multi-deal negotiations and the development of agents capable of concurrent negotiation within complex supply chain management environments. Furthermore, this work analyzes the results of the competition and outlines strategic directions for future iterations.
  895. #DM234

    LLARS: Enabling Domain Expert & Developer Collaboration for LLM Prompting, Generation and Evaluation

    Philipp Steigerwald, Mara Stieler, Jennifer Burghardt, Eric Rudolph, Jens Albrecht
    We demonstrate LLARS, an open-source platform that bridges the gap between domain experts and developers for building LLM-based systems.
    LLARS integrates three tightly connected modules into an end-to-end pipeline:
    Collaborative Prompt Engineering for real-time co-authoring with version control and instant LLM testing,
    Batch Generation for configurable output production across user-selected prompts × models × data with cost control, and
    Hybrid Evaluation where human and LLM evaluators jointly assess outputs through diverse assessment methods, with live agreement metrics and provenance analysis to identify the best model-prompt combination for a given use case.
    New prompts and models are automatically available for batch generation and completed batches can be turned into evaluation scenarios with a single click.
    Interviews with six domain experts and three developers in online counselling confirmed that LLARS feels intuitive, saves considerable time by keeping everything in one place and makes interdisciplinary collaboration seamless.
    Source code: github.com/th-nuernberg/llars
    AIHumans and AIAINatural Language ProcessingAIMultidisciplinary Topics and Applications
  896. #DM237

    PERELMAN: Pipeline for scientific literature meta-analysis

    Daniil Sherki, Daniil Merkulov, Aleksandra Savina, Dzhantemir Kikov, Dmitry Parpulov, Alexander Ivanov, Artem Abakumov, Ekaterina Muravleva
    We present PERELMAN (PipEline foR sciEntific Literature Meta-ANalysis), an agentic framework designed to extract specific information from a large corpus of scientific articles to support large-scale literature reviews and meta-analyses. Our central goal is to reliably transform heterogeneous article content into a unified, machine-readable representation. PERELMAN first elicits domain knowledge (including target variables, inclusion criteria, units, and normalization rules) through a structured dialogue with a subject-matter expert. This domain knowledge is then reused across multiple stages of the pipeline and guides coordinated agents in extracting evidence from narrative text, tables, and figures, enabling consistent aggregation across studies. In order to assess reproducibility and validate our implementation, we evaluate the system on the task of reproducing the meta-analysis of layered Li-ion cathode properties NMC811 reported in [Savina and Abakumov, 2023]. We describe our solution, which has the potential to reduce the time required to prepare meta-analyses from months to minutes.
  897. #DM239

    An Automated Maintenance Plant for Highways

    N'zebo Richard Anvo, Alwyn Mathew, Lavindra de Silva, Damian Palin, Jie Xu, Samuel Schaefer, Abir Al-Tabbaa, Fumiya Iida, Ioannis Brilakis
    The Digital Roads project at Cambridge University is leveraging digitalisation, automation, and low-carbon materials to build an Automated Maintenance Plant (AMP) for UK road networks, aimed at minimising repair times to reduce congestion, improving safety, and contributing to the UK’s net-zero goals through faster, more accurate, and efficient road maintenance.
  898. #DM249

    GEV: Statically Correct and Programmable Knowledge Graph Updates

    Eduard Kamburjan, Shqiponja Ahmetaj, Chinmayi Prabhu Prasad Baramashetru, Paolo Pareti
    Knowledge Graphs (KGs) evolve over time and it is critical to ensure that their integrity constraints are maintained after each update.
    We introduce GEV, the first tool to statically ensure that a KG update in Java preserves satisfaction of SHACL constraints. This allows verification of updates at design time, and eliminates the need for costly continuous revalidation.
    GEV is a command-line system that loads and verifies updates, applies them to a loaded KG, and keeps track of the validation status. Internally, it relies on SHACL graph updates, a theoretical framework with a method for static verification.
    AIKnowledge Representation and Reasoning
  899. #HC13

    Beyond “Made with AI”: Visualizing Provenance Density to Mitigate the Transparency Penalty

    Qing Zhang, Yifei Huang, Juyoung Lee, Thad Starner, Jun Rekimoto
    As generative AI makes polished prose cheap to produce, users can no longer rely on fluency as a proxy for truth. We call this failure mode the Fluency Trap: users trust fluent hallucinations while also discounting accurate content once it is disclosed as AI-generated. Binary "Made with AI" labels respond with authorship disclosure, but they do not show what supports a claim. We propose Provenance Density, an evidence-visualization interface that shows the density of verified claims in a text. In a user study with 81 participants, an idealized Provenance Density interface produced a large discernment gap between truth and fabrication (+4.15 points, d=1.82), whereas participants given no signal showed no detectable discrimination. A technical audit with 200 samples shows that retrieval density alone is insufficient; unexpectedly, the Consistency Veto carries most of the discriminative signal on dynamic queries. As AI-generated content becomes indistinguishable from human writing, effective transparency must move from authorship disclosure toward evidence visualization.
    Human-Centred AIHumans and AIHuman-Centred AIAI Ethics, Trust, FairnesHuman-Centred AINatural Language Processing
  900. #HC36

    Self-Refine Learning in LLM Multi-Agent Systems for Legal Norm Cognition and Compliance

    Rongxin Cheng, Jianhui Yang, Bohan Xiong, Ning Zheng, Yiran Hu, Qingjing Chen, Yan Liu, Huanghai Liu, Yun Liu, Weixing Shen
    As large language models (LLMs) increasingly serve as autonomous agents in social simulations, ensuring their ability to understand and comply with legal norms is essential. Yet, current LLM agents frequently exhibit reward hacking (RH) behaviors by optimizing metrics at the expense of norm adherence, undermining simulation fidelity and limiting deployment. We introduce a TBC-TBA self-refine learning multi-agent framework that enables dynamic normative adaptation through iterative multi-agent feedback. This framework integrates Think-Before-Chat (social feedback processing) and Think-Before-Act (norm-guided decision making) phases, allowing agents to progressively refine their normative understanding via structured interaction cycles. Across five mainstream LLMs and 100 legal scenarios, we found that while LLMs partially recognize legal norms, they systematically exhibit RH behaviors with illegal action rates (IAR) of 14.29–37.11%. Comparison with human cognition reveals alignment in moral reasoning but sharp divergence in risk perception and probability distortion. To address these deficits, we introduce four methods to improve LLMs' normative compliance. Dynamic Norm Learning Mechanism (DNLM) serves as the core, using a psychologically grounded identify–infer–implement process that reduces IAR by 15.78% on average and delivers the most significant improvement. We also introduce Deep MaxPain (DMP) for consequence-based deterrence, Norm Analysis Chain-of-Thought (NA-CoT) for structured reasoning, and Few-shot Norm Learning (FNL) for case-based acquisition, all of which enhance compliance. Our findings show that LLM agents can better follow legal norms when equipped with structured self-refine learning and psychologically informed mechanisms. This work improves social alignment in multi-agent systems and opens avenues for future research on scalable, norm-compliant autonomous agents. The code and data are publicly available on GitHub.
    Human-Centred AIAgent-based and Multi-agent SystemsHuman-Centred AIMultidisciplinary Topics and ApplicationsHuman-Centred AIAI Ethics, Trust, FairnesHuman-Centred AINatural Language Processing
  901. #HC39

    Uncovering Discriminatory Behavior in LLM-Based CV Screening Systems with Composite Statistical Metamorphic Relations

    Tetyana Turiy, Simon Speth, Alexander Pretschner
    Artificial intelligence (AI) applications are prone to reproducing existing patterns of marginalization, bias, and discrimination. The adoption of automated decision-making tools, particularly those based on large language models (LLMs), necessitates the detection and analysis of their discriminatory behavior to ensure compliance with anti-discrimination regulations. This work advances metamorphic testing for bias detection, aiming to (1) uncover discriminatory behavior in AI tools and (2) standardize the metamorphic test creation approach, rooting it in the legal and academic research on discrimination. To fill these gaps, we introduce a metamorphic test creation workflow and evaluate it on the LLM-based CV screening use case. Our method yields a metamorphic relation (MR) catalogue of 8 atomic and 6 composite MRs, legally justified by Article 21 (1) of the Charter of Fundamental Rights of the European Union and sociologically motivated by intersectionality theory. The resulting metamorphic test suite uncovers discriminatory failures for all 6 evaluated LLMs, totaling 1,272 defects across 7,800 test cases. Both the new atomic MRs and the composite MRs detect additional discriminatory defects for all LLMs. Composite MRs consistently uncovered more defects than their atomic counterparts, with failure rates increasing by an average of 9% for Llama-3.1 8B and 25% for Llama-3.2 3B.
    Human-Centred AIAI Ethics, Trust, FairnesHuman-Centred AIMachine LearningHuman-Centred AIMultidisciplinary Topics and ApplicationsHuman-Centred AIHumans and AI
  902. #HC50

    Psychological Benefits and Costs of Diversifying Algorithmic Recourse

    Tomu Tominaga, Naomi Yamashita, Takeshi Kurashima
    Algorithmic recourse provides counterfactual action plans that help people overturn unfavorable AI decisions. While diverse recourse sets may improve transparency and motivation, they may also impose cognitive load and negative emotions by increasing counterfactual reasoning demands. To examine this trade-off, we conducted a between-subjects controlled experiment (N=750) that manipulated recourse-set diversity while controlling the number of options, and evaluated its effects on psychological benefits and costs. Results show that diversification enhances psychological benefits (e.g., willingness to act) for small sets without incurring additional psychological costs, whereas for large sets, it makes cognitive load more salient. These findings suggest that naively diversifying recourse can burden decision subjects, underscoring the need for new diversification methods that incorporate human cognition and psychology to mitigate such costs.
    Human-Centred AIHumans and AIHuman-Centred AIAI Ethics, Trust, Fairnes
  903. #HC77

    Test-Time User Alignment via Bayesian Population Guidance in Subjective Tasks

    Hoe Sung Ryu, June Christoph Kang, Christian Wallraven
    In subjective tasks, different individuals can have different correct answers for the same input—the ground truth is not fixed but rather determined by each user's personal perspective. Standard foundation models suppress this individual variation by producing population-averaged predictions; conversely, few-shot in-context learning (ICL) struggles to capture user-specific preferences from limited demonstrations. To address this limitation, we propose Population-Guided Bayesian Calibration (PGBC), a test-time personalization method that applies Bayesian inference to foundation model outputs for user-specific alignment. Unlike ICL methods that rely on the model to implicitly infer preferences from examples, PGBC explicitly quantifies each individual's deviation from the population distribution and incorporates this into prediction with minimal feedback (K < 10). Experiments on three subjective benchmark datasets across four foundation models demonstrate that PGBC significantly outperforms zero-shot baselines with a large effect size (Cohen's d = 1.21), and the improvement over ICL approaches is even more pronounced (d = 1.40). Furthermore, user-level distribution analysis reveals that PGBC aligns predictions with individual preferences when zero-shot baselines exhibit substantial bias, while ICL methods increase distributional mismatch. Our results show that PGBC enables practical deployment in cold-start scenarios where minimal user feedback must yield maximal personalization.
    Human-Centred AIUncertainty in AIHuman-Centred AIMachine Learning
  904. #HC78

    Responsibility in Multi-Agent Sequential Decision-Making: Comparing Human Judgments to Formal Models of Causal Attribution

    Nripsuta Saxena, Stelios Triantafyllou, Goran Radanović
    With the growing adoption of artificial intelligence in high-stakes decision-making domains, identifying the causes of outcomes, particularly failures, and determining who is responsible has become a critical concern. In this work, we investigate how well formal definitions of responsibility attribution, grounded in the framework of actual causality, align with human judgments of responsibility. To this end, we conduct a large-scale survey to elicit human judgments of responsibility in multi-agent sequential decision-making scenarios, using a modified version of the card game Goofspiel. We evaluate different responsibility attribution methods, assessing their alignment with human judgments about responsibility, and identifying factors that significantly shape responsibility judgments. While no single responsibility attribution method consistently aligns with human responses, our findings highlight key factors that influence human responsibility judgments, including agent-specific biases and the amount of information available to agents during decision-making.
  905. #HC79

    Studying the Effects of Robot Distraction on School Shooter Behavior Using Virtual Reality

    Christopher A. McClurg, Alan R. Wagner
    We examine a particular adversarial human-robot interaction in which mobile robots influence the behavior of people role-playing as a school shooter. Using a high-fidelity virtual reality environment, we conducted a controlled study with 150 participants. Two autonomous robots predicted participant movement and positioned themselves to interfere and distract. The robots' approach strategy was manipulated, either moving directly into the participant's path (aggressive) or maintaining distance (passive), along with the level of distraction, ranging from no additional cues (low), to siren and lights (medium), to siren, lights, and smoke to impair visibility (high). An aggressive, high-distraction robot configuration reduced the number of victims by 46.6% relative to a no-robot control. These results show that robot-based distraction can meaningfully alter human behavior and outcomes in adversarial settings, while also raising important ethical questions about the use of such systems in school environments.
    Human-Centred AIHumans and AIHuman-Centred AIRoboticsHuman-Centred AIMachine Learning
  906. #HC84

    DiverValue-Bench: A Benchmark and Fine-Tuning Framework for Aligning Large Language Models with Diverse Human Values

    Yao Liang, Dongcheng Zhao, Feifei Zhao, Guobin Shen, Yuwei Wang, Dongqi Liang, Yi Zeng
    The alignment of large language models (LLMs) with human values is critical for their safe and effective deployment across diverse user populations. However, existing benchmarks often neglect cultural and demographic diversity, leading to limited understanding of how value alignment generalizes globally. In this work, we introduce DiverValue-Bench, a benchmark that systematically evaluates LLMs’ alignment with multi-dimensional human value preferences across 74 countries/regions. DiverValue-Bench contains 23,763 high-quality instances annotated with fine-grained value labels, personalized questions, and rich demographic metadata, providing broad demographic and geographic coverage for population-aware value-alignment evaluation. Using DiverValue-Bench, we conduct an in-depth analysis of several representative LLMs, revealing substantial disparities in alignment performance across geographic and demographic lines. We further demonstrate that lightweight fine-tuning methods, such as Low-Rank Adaptation (LoRA) and Direct Preference Optimization (DPO), can significantly enhance value alignment in both in-domain and out-of-domain settings. Our findings underscore the necessity for population-aware alignment evaluation and provide actionable insights for building culturally adaptive and value-sensitive LLMs. DiverValue-Bench serves as a practical foundation for future research on global alignment, personalized value modeling, and equitable AI development.
  907. #HC136

    Human-Centric Behavior-Aware Adaptive Off-Policy Selection

    Ge Gao, Aishwarya Mandyam, Joy He-Yueya, Min Chi, Emma Brunskill
    In many human-centric environments, such as education and healthcare, the unobservability of underlying human states has been recognized as a key obstacle to understanding individual needs, thus hindering our ability to provide personalized decision-making policies. Several reinforcement learning (RL)-related approaches have been used to facilitate sequential decision-making in these settings, including off-policy selection (OPS), which aids in safely evaluating and selecting optimal policies offline. However, existing OPS algorithms are unsuitable when both the state is unobserved and the setting requires a personalized policy. To address this challenge, we propose a behavior-aware adaptive policy selection framework (HBO) that first captures potentially unique characteristics of the state from human behaviors, and then estimates when and how to intervene with less uncertainty in a timely manner, with bounded error. HBO is evaluated over two real-world human-centric applications, intelligent tutoring and sepsis treatments, where it significantly enhanced participants' long-term course outcomes and survival rates. Broadly, our work enables improved policy personalization in high-stakes domains where extensive evaluation is not possible.
  908. #HC146

    Probing Cultural Awareness in LLMs: A Case Study of Cross-Culture Aesthetic Stylistics

    Jessie Wang, YU Fenggang, Jian Wang, Chak Tou Leong, Xiaoyu Shen, Chunpu Xu, Jiawen Duan, Wenjie Li, Johan F. Hoorn
    Large Language Models (LLMs) are increasingly deployed in diverse cultural contexts, yet their ability to master aesthetic stylistics, i.e., the strategic use of language to evoke cultural resonance, remains underexplored. We curate C4Styli, a benchmark of highly stylized translated movie titles and advertising slogans from Hong Kong and the Chinese Mainland, to evaluate LLMs via the lens of behavioral recognition and productive competence. Extensive evaluations show that LLMs differ from humans in stylistic recognition, and this recognition ability varies across text domains. In addition, stylistic recognition and generation performance in LLMs are not consistently aligned. To further examine whether LLMs genuinely capture stylistic information in stylistic recognition, we conduct structural ablation with logistic regression probes. We find that, in the Hong Kong setting, stylistic recognition in LLMs relies primarily on surface-level linguistic information rather than stylistic structure. This suggests limited sensitivity to Hong Kong-specific stylistic structure. Our code and data are available at https://github.com/wangjs9/C4STYLI.
    Human-Centred AIHumans and AI
  909. #SV1

    A Survey on 3D Skeleton Based Person Re-Identification: Taxonomy, Advances, Challenges, and Interdisciplinary Prospects

    Haocong Rao, Chunyan Miao
    Person re-identification via 3D skeletons is an important emerging research area that attracts increasing attention within the pattern recognition community. With distinctive advantages across various application scenarios, numerous 3D skeleton based person re-identification (SRID) methods with diverse skeleton modeling and learning paradigms have been proposed in recent years. In this paper, we provide a comprehensive review and analysis of recent SRID advances. First of all, we define the SRID task and provide an overview of its origin and major advancements. Secondly, we formulate a systematic taxonomy that organizes existing methods into three categories centered on hand-crafted, sequence-based, and graph-based modeling. Then, we elaborate on the representative models along these three types with an illustration of foundational mechanisms. Meanwhile, we provide an overview of mainstream supervised, self-supervised, and unsupervised SRID learning paradigms and corresponding common methods. A thorough evaluation of state-of-the-art SRID methods is further conducted over various types of benchmarks and protocols to compare their effectiveness, efficiency, and key properties. Finally, we present the key challenges and prospects to advance future research, and highlight interdisciplinary applications of SRID with a case study. A curated collection of valuable resources is available at https://github.com/Kali-Hac/3D-SRID-Survey.
    Computer VisionRecognition (object detection, categorization)Computer VisionOtherMultidisciplinary Topics and ApplicationsSecurity and privacyMultidisciplinary Topics and ApplicationsOther
  910. #SV5

    A Review on Test-Time Scaling for Agentic Large Language Models

    Jiayu An, Zheng Chen, Yongcheng Jing, Dacheng Tao, Bo Li
    A Review on Test-Time Reasoning for Agentic Large Language Models
    Agent-based and Multi-agent SystemsEngineering methods, platforms, languages and tools
  911. #SV17

    Graph Rewiring in GNNs to Mitigate Over-Squashing and Over-Smoothing: A Survey

    Hugo Attali, Nathalie Pernelle, Davide Buscaldi, Fragkiskos Malliaros
    Graph Neural Networks are powerful models for learning from graph-structured data, yet their effectiveness is often limited by two critical challenges: over-squashing, where information from distant nodes is excessively compressed, and over-smoothing, where repeated propagation makes node representations indistinguishable. Both phenomena stem from the interaction between message passing and the input topology, ultimately degrading information flow and limiting the performance of GNNs. In this survey, we examine graph rewiring techniques, a class of methods designed to modify the graph topology to enhance information propagation in GNNs. We provide a comprehensive review of state-of-the-art rewiring approaches, delving into their theoretical underpinnings, practical implementations, and performance trade-offs.
    Data MiningMining graphsData MiningNetworks
  912. #SV20

    When Vision Meets Graphs: A Survey on Graph Reasoning and Learning

    Xinjian Zhao, Wei Pang, Zhixuan Yu, Xiangru Jian, Xiaozhuang Song, Yaoyao Xu, Zhongkai Xue, Dingshuo Chen, Shu Wu, Philip Torr, Tianshu Yu
    Graphs are a fundamental data structure underlying many problems in the natural and social sciences. Over the past decade, Graph Neural Networks (GNNs) have dominated graph machine learning, supported by solid theoretical foundations. Yet scientists often understand graph structure through vision: chemists read molecular diagrams and social scientists inspect network visualizations. Despite decades of work on graph visualization, most graph learning pipelines still treat graphs purely as symbolic structures, rarely leveraging the visual form of graphs. We argue that this gap deserves renewed attention in the era of powerful vision and vision language models.
    This survey provides a first systematic overview of the emerging area we term vision meets graphs, which treats visual depictions of graphs as first-class inputs for reasoning and learning. We organize existing work into three threads. Vision for Graph Reasoning studies how models can use visual depictions of graphs to understand structure and carry out multi-step reasoning. Vision for Graph Learning explores how visual features can complement or augment graph encoders beyond known limitations of message passing. Scientific Graphs examines domains where standardized depiction conventions support both reasoning and learning. Our goal is to clarify what current methods can and cannot do, and to outline a path toward foundation models that perceive and reason about graphs as scientists do.
    Data MiningData visualizationComputer VisionVision, language and reasoningComputer VisionMultimodal learning
  913. #SV22

    A Survey of Artificial Intelligence in Endoscopic Surgery Workflow: From Perception to Surgical Support

    Juyan Ba, Hao Chen, Xiaohan Xing, Yi Wang
    Endoscopic surgery demands continuous real-time visual decision-making under severe constraints, including a limited field of view, motion blur, and dynamically deforming anatomy. These factors impose substantial cognitive load on surgeons and motivate the integration of artificial intelligence (AI) throughout the endoscopic surgical workflow. This survey reviews recent progress in AI for endoscopic surgery and organizes the literature into four stages that span perception to action: (1) image enhancement and analysis methods that improve visual perception; (2) multimodal video understanding approaches that model and reason about surgical instruments and anatomical structures over space and time; (3) 3D reconstruction techniques that enable robust tracking and interpretation of deformable anatomy; and (4) emerging paradigms of embodied surgical intelligence, where action-conditioned world models link perception to intraoperative assistance.
    Across these stages, we summarize current capabilities and limitations and identify key open challenges for clinical deployment. In addition, we provide an overview of 18 publicly available datasets, highlighting their scope and annotations. We hope this survey will stimulate further research toward reliable and clinically deployable AI systems for endoscopic surgery.
    SVComputer VisionSVMultidisciplinary Topics and Applications
  914. #SV28

    Towards Automated Kernel Generation in the Era of LLMs: A Survey

    Yang Yu, Peiyu Zang, Chi Hsu Tsai, Haiming Wu, Yixin Shen, Jialing Zhang, Haoyu Wang, Zhiyou Xiao, Jingze Shi, Yuyu Luo, Wentao Zhang, Chunlei Men, Guang Liu, Yonghua Lin
    The performance of modern AI systems is fundamentally constrained by the quality of their underlying kernels, which translate high-level algorithmic semantics into low-level hardware operations. Achieving near-optimal kernels requires expert-level understanding of hardware architectures and programming models, making kernel engineering a critical but notoriously time-consuming and non-scalable process. Recent advances in large language models (LLMs) and LLM-based agents have opened new possibilities for automating kernel generation and optimization. LLMs are well-suited to compress expert-level kernel knowledge that is difficult to formalize, while agentic systems further enable scalable optimization by casting kernel development as an iterative, feedback-driven loop. Rapid progress has been made in this area. However, the field remains fragmented, lacking a systematic perspective for LLM-driven kernel generation. This survey addresses this gap by providing a structured overview of existing approaches, spanning LLM-based approaches and agentic optimization workflows, and systematically compiling the datasets and benchmarks that underpin learning and evaluation in this domain. Moreover, key open challenges and future research directions are further outlined, aiming to establish a comprehensive reference for the next generation of automated kernel optimization. To keep track of this field, we maintain an open-source GitHub repository at https://github.com/flagos-ai/awesome-LLM-driven-kernel-generation.
    Natural Language ProcessingApplicationsMultidisciplinary Topics and ApplicationsAI hardwareMultidisciplinary Topics and ApplicationsSoftware engineering
  915. #SV33

    Towards Vision-Spatiotemporal Fusion in Traffic Forecasting: A Survey on Cross-Modal Alignment

    Anna Wang, Chao Zhang, Mingwei Lin, Junbo Zhang, Zeshui Xu, Wentao Li, Pengfei Zhang, Oscar Castillo
    Traffic forecasting is evolving, with world models emerging as a powerful framework applicable to tasks such as core state, trajectory, event, and demand forecasting. These tasks involve both visual and spatiotemporal data, yet most existing methods treat them separately, hindering a unified understanding of traffic scenes in both semantic meanings and spatiotemporal dynamics. The fusion of the two modalities is critical for building models that comprehend complex traffic scenarios. However, the fusion issue faces two fundamental misalignments: semantic, where pixels conflict with traffic concepts, and geometric, which requires spatial intelligence to map 2D inputs into 3D. This survey reframes vision-spatiotemporal fusion via the unique lens of cross-modal alignment, addressing semantic and geometric failures that limit forecasting reliability. First, we categorize existing methods into three paradigms: feature-level, semantic-level, and task-level. This reveals their progression from low-level feature manipulation to high-level architectural integration. Second, we synthesize representative techniques per paradigm, highlighting geometric challenges such as cross-view association and spatial mapping. Third, we examine current datasets and benchmarks, highlighting their deficiencies in evaluating alignment. Finally, we outline future directions, including spatiotemporal intelligence for robust perception and holistic traffic world models. The unified framework establishes a reference for robust and explainable forecasting systems.
    Data MiningMining spatial and/or temporal data
  916. #SV36

    From Multimodal Perception to Strategic Reasoning: A Survey on AI-Generated Game Commentary

    Qirui Zheng, Xingbo Wang, Keyuan Cheng, Yunlong Lu, Muhammad Asif Ali, Lingfeng Li, Yongyi Wang, Wenxin Li
    The advent of artificial intelligence has propelled AI-Generated Game Commentary (AI-GGC) into a rapidly expanding research area, offering advantages such as scalable availability and personalized narration. However, existing studies remain fragmented, and a systematic survey that unifies prior efforts is still lacking. To bridge this gap, our survey introduces a unified framework that systematically organizes the AI-GGC landscape. We present a novel taxonomy focused on three core commentator capabilities: Live Observation, Strategic Analysis, and Historical Recall, and further categorize commentary into three corresponding types: Descriptive Commentary, Analytical Commentary, and Background Commentary. Building on this structure, we provide an in-depth review of methods, datasets, and evaluation metrics, analyzing their strengths and limitations. Finally, we highlight key challenges and point out promising directions for future research in AI-GGC.
    Natural Language ProcessingLanguage generationNatural Language ProcessingResources and evaluationMachine LearningMulti-modal learningMultidisciplinary Topics and ApplicationsEntertainment
  917. #SV40

    From Human Videos to Robot Manipulation: A Survey on Action-Relevant Representation Transfer for Scalable Vision-Language-Action Learning

    Zhiyuan Feng, Qixiu Li, Huizhi Liang, Rushuai Yang, Yichao Shen, Zhiying Du, Zhaowei Zhang, Yu Deng, Li Zhao, Hao Zhao, Zongqing Lu, Oier Mees, Marc Pollefeys, Jiaolong Yang, Baining Guo
    Recent progress in generalizable embodied control has been driven by large-scale pretraining of Vision–Language–Action (VLA) models. However, most existing approaches rely on large collections of robot demonstrations, which are costly to obtain and tightly coupled to specific embodiments. Human videos, by contrast, are abundant and capture rich interactions, providing diverse semantic and physical cues for real-world manipulation. Yet, embodiment differences and the frequent absence of task-aligned annotations make their direct use in VLA models challenging. This survey provides a unified view of how human videos are transformed into effective knowledge for VLA models. We categorize existing approaches into four classes based on the action-related information they derive: (i) latent action representations that encode inter-frame changes; (ii) predictive world models that forecast future frames; (iii) explicit 2D supervision that extracts image-plane cues; and (iv) explicit 3D reconstruction that recovers geometry or motion. Beyond this taxonomy, we highlight three key open challenges in this area: structuring unstructured videos into training-ready episodes, grounding video-derived supervision into robot-executable actions under embodiment and viewpoint heterogeneity, and designing evaluation protocols that better predict real-world deployment performance and transfer efficiency, thereby informing future research directions.
    RoboticsLearning in roboticsRoboticsRobotics and visionRoboticsManipulation
  918. #SV41

    Concept Bottleneck Models for Explainable Decision Making: A Survey of Progress, Taxonomy, and Future Directions

    Chunjiang Wang, Fan Li, Wenbo Hu, Rui Yan, Kun Zhang, Shaohua Kevin Zhou
    Deep neural networks deliver strong performance but remain opaque, limiting their use in high-stakes domains that require transparency and human oversight. Concept Bottleneck Models (CBMs) address this gap by introducing a human-interpretable concept layer that mediates inputs and decisions, enabling semantic explanations and test-time intervention. This survey provides a unified review of CBMs organized along four dimensions: concept acquisition, concept-based decision making, concept intervention, and concept evaluation. We summarize the evolution of concept construction from manual annotation to lexicon-based mining, LLM/VLM-guided generation, and visually grounded discovery via prototypes and diffusion models; review emerging CBM architectures beyond strict bottlenecks; and consolidate evaluation and intervention protocols emphasizing faithfulness, sparsity, and intervenability, with particular relevance to high-stakes domains such as healthcare. We synthesize fragmented literature and outline key challenges and future directions for concept-based interpretable decision making.
    AI Ethics, Trust, FairnessExplainability and interpretability
  919. #SV74

    Constraining Generative Models: A Survey from the Constraint Programming Perspective

    Alexandre Bonlarron, François Pachet, Pierre Roy, Jean-Charles Régin
    Generative models produce long, high-probability sequences, yet they often fail to satisfy explicit constraints set by users. Over the past two decades, Constraint Programming (CP) has provided a complementary paradigm: combining generative models with a constraint solver to guarantee feasibility. This survey reviews the main concepts behind these CP-driven hybrid approaches, from enforcing ubiquitous structural rules (e.g., length and patterns) to preventing plagiarism. It synthesizes how learned models can be treated as constraints, compiled structures, or probabilistic factors. We highlight what has remained stable across applications, then discuss how these principles transfer to the Large Language Model era and outline open challenges for controllable and trustworthy generative systems.
    Constraint Satisfaction and OptimizationConstraint programming
  920. #SV77

    A Survey of Personalized Federated Foundation Models for Privacy-Preserving Recommendation

    Zhiwei Li, Guodong Long, Chunxu Zhang, Honglei Zhang, Chengqi Zhang, Jing Jiang
    Integrating Foundation Models (FMs) into recommendation systems is an emerging and promising research direction. However, centralized paradigms face growing pressure from privacy concerns and strict regulatory requirements. Federated learning offers a viable solution that enables collaborative model refinement while keeping raw user data on local devices or in organizational silos. Yet, applying FMs in this setting creates a fundamental tension, where the system must balance the leverage of global knowledge with the necessity of capturing user personality. This survey provides a comprehensive overview of Personalized Federated Foundation Models for privacy-preserving recommendation and reviews recent progress in this emerging field. We first analyze personalization techniques that function effectively under federated settings. Furthermore, we discuss the adaptation of foundation models to such federated architectures to balance generalization with user-specific needs for achieving privacy-preserving recommendation. In contrast to existing reviews, our work specifically emphasizes the architectural intersection of federation, personalization, and foundation models.
    Data MiningRecommender systemsMachine LearningFederated learningMachine LearningFoundation modelsData MiningPrivacy-preserving data mining
  921. #SV80

    Tackling Multimodal Learning Challenges with Mixture-of-Expert: A Survey

    Liangwei Zheng, Wei Emma Zhang, Olaf Maennel, Lin Yue, Weitong Chen
    Mixture-of-Experts (MoE) presents a naturally compatible and scalable framework for multimodal learning, demonstrating strong adaptability across diverse modalities and tasks. Despite its growing success, a comprehensive and systematic evaluation of multimodal MoE remains lacking. Existing surveys tend to address either multimodal learning or MoE independently, overlooking the unique interplay between them. This survey fills that gap by addressing a central question: How does MoE effectively resolve multimodal challenges? We approach this from three key perspectives: (1) MoE as an Efficient Multimodal Framework: enabling scalable multimodal modeling by decoupling computational cost from parameter growth and mitigating modality redundancy through selective expert activation; (2) MoE as a Multimodal Representation Learner: integrating complementary multi-opinion expert knowledge to enrich alignment and interaction representations; and (3) MoE as a Multimodal Adapter: providing a modular and flexible mechanism to model imperfect modality data such as modality imbalance and missing modality. Through an extensive literature review, we identify critical research gaps, including interpretable routing, expert communication, modality integration, and lifelong multimodal learning. We position this survey as a foundation for future research toward interpretable, adaptive, and sustainable multimodal Mixture-of-Experts systems.
    Machine LearningMulti-modal learningData MiningInformation retrievalData MiningMining heterogeneous dataAI Ethics, Trust, FairnessTrustworthy AI
  922. #SV82

    AI-Enhanced Vein Biometrics: A Comprehensive Survey

    Yifan Wang, Jie Gui, Changsheng Chen, Alex Kot
    Vein biometrics has emerged as a promising biometric modality for personal identity authentication, benefiting from its intrinsic properties such as high discriminative capability, resistance to forgery, and contactless acquisition. Recent advances in artificial intelligence, particularly deep learning, have significantly accelerated its development. This paper presents a comprehensive and systematic survey of AI-enhanced vein biometrics. We review fundamental principles, publicly available datasets, and evaluation protocols, and systematically analyze existing methods across the entire vein biometric pipeline, including acquisition, preprocessing, feature extraction, recognition and verification, security and privacy protection, and multimodal fusion. Furthermore, we summarize representative application scenarios, identify key challenges, and highlight promising directions for future research. To facilitate reproducible research and long-term development of the field, we release an open, evolving research resource Awesome-Vein-Biometrics that systematically summarizes and tracks recent advances in vein biometrics.
    Computer VisionBiometrics, face, gesture and pose recognition
  923. #SV87

    A Comprehensive Survey of Deep Learning for Multivariate Time Series Forecasting: A Channel Strategy Perspective

    Xiangfei Qiu, Hanyin Cheng, Xingjian Wu, Junkai Lu, Jilin Hu, Chenjuan Guo, Christian S. Jensen, Bin Yang
    Multivariate Time Series Forecasting (MTSF) plays a crucial role across diverse fields, ranging from economics and energy to traffic. In recent years, deep learning has demonstrated outstanding performance in MTSF tasks. In MTSF, modeling the correlations among different channels is critical, as leveraging information from other related channels can significantly improve the prediction accuracy of a specific channel. This study systematically reviews the channel modeling strategies for time series and proposes a taxonomy organized into three hierarchical levels: the strategy perspective, the mechanism perspective, and the characteristic perspective. On this basis, we provide a structured analysis of these methods and conduct an in-depth examination of the advantages and limitations of different channel strategies. Finally, we summarize and discuss some future research directions to provide useful research guidance. Moreover, we maintain an up-to-date GitHub repository which includes all the papers discussed in the survey.
    Data MiningMining spatial and/or temporal data
  924. #SV92

    Large Language Models for Blockchain Security and Analytics: A Survey

    Cuneyt Akcora, Collette Eguakun Okundia, Arijit Khan
    Large Language Models are transforming blockchain security and analytics, yet systematic evaluation of their capabilities remains limited. This survey delivers a comprehensive, AI‑centric assessment of LLM‑based methods across more than sixty recent studies spanning nine application domains, including smart contract auditing, transaction fraud detection, cryptocurrency portfolio management, and DeFi security analysis. We introduce a unified taxonomy that standardizes task formulations, datasets, tools, algorithms, and evaluation practices, enabling consistent comparison across approaches. For each domain, we review deployed LLM architectures; learning and inference paradigms such as pre‑training, prompt engineering, fine‑tuning, retrieval‑augmented generation, and agentic strategies; and input representations tailored to blockchain data. We further analyze the strengths, limitations, and emerging patterns observed in current systems. Finally, the survey provides practical guidance for selecting LLM techniques and outlines promising research directions, e.g., explainable smart contract verification, automated DeFi protocol analysis, adversarial robustness evaluation, and scalable on‑chain anomaly detection.
    Data MiningApplications
  925. #SV96

    Dynamic Heterogeneous Graph Representation Learning: A Survey

    Huan Liu, Pengfei Jiao, Jie Yin, Hongjiang Chen, Zhidong Zhao
    Graph representation learning (GRL) serves as a canonical paradigm for modeling complex networks. However, real-world AI systems inherently manifest as evolving heterogeneous entities with complex interactions, posing significant challenges to static or homogeneous modeling. To address these complexities, representation learning for Dynamic Heterogeneous Graphs (DHGs) has emerged as a vital approach for learning low-dimensional representations that simultaneously preserve structural semantics and temporal dynamics. This survey presents the first systematic review of DHG representation learning methods. We first introduce a unified formal definition that encompasses both discrete-time and continuous-time DHGs from the perspective of temporal granularity. Building upon this formulation, we propose a novel algorithm-centric taxonomy that categorizes existing literature, including early embedding-based approaches, graph neural network (GNN)-based models, and relatively recent Transformer-based DHG methods, while explicitly highlighting their intrinsic modeling biases with respect to dynamic granularity. Furthermore, we summarize representative applications of DHG representation learning, along with commonly used datasets and benchmarks. Finally, we discuss promising research directions that guide future advances in this rapidly evolving field.
    Data MiningMining graphsMachine LearningRepresentation learningMachine LearningSelf-supervised LearningMachine LearningSequence and graph learning
  926. #SV108

    A Survey of Joint Online-Offline Fine-tuning for Large Language Models

    Taihang Zhen, Guang Yang, Chenzhang Li, Nuo Yan, Shilong Zhou, Guangyu Liu, Xiaotong Tang, Jing Huo, Boyan Wang, Junlan Feng, Yuyao Zhang
    Post-training for Large Language Models (LLMs) can be mainly categorized into offline Supervised Fine-Tuning (SFT) for knowledge acquisition and online Reinforcement Fine-Tuning (RFT) for adaptive refinement. Current state-of-the-art approaches typically employ a sequential cold-start pipeline (SFT-Then-RFT). However, we argue that this disjoint transition imposes an “alignment tax”, leading to catastrophic forgetting and reward hacking during the unregularized exploration phase. In this work, we advocate for Joint Online-Offline Fine-Tuning as a superior paradigm that breaks the convention of restricting offline data to SFT and online data to RFT. By integrating full offline response generation with online rollouts, particularly within the realm of Reinforcement Learning with Verifiable Rewards (RLVR), this approach mitigates the limitations of isolated training phases. We provide the first comprehensive survey focusing specifically on the synchronization of data provenance. We introduce a novel taxonomy for related works, analyze their theoretical advantages in balancing stability with plasticity, and outline a roadmap for next-generation post-training frameworks.
    Natural Language ProcessingLanguage modelsMachine LearningSupervised LearningMachine LearningReinforcement learning
  927. #SV112

    Multimodal Emotion Recognition with Large Language Models

    Hongrui Zhang, Daiqing Wu, Yangyang Li, Kuien Liu, Yuhui Wang, Yu Zhou, Sicheng Zhao
    Multimodal Emotion Recognition (MER) focuses on identifying and interpreting emotions from modality-compound inputs. Closely mirroring human cognitive processes in real-world environments, MER has drawn substantial attention from both academia and industry. Recently, a paradigm shift has been unveiled in MER, from leveraging small-scale, task-specific models to Large Language Models (LLMs). We refer to the latter as the MER-with-LLMs paradigm, which offers unprecedented generality, spurring numerous empirical attempts, even alongside speculation about their potential to achieve general emotional intelligence. However, with these new opportunities come new challenges, including the scarcity of emotionally annotated data, the affective gap both within and across modalities, and the opacity of affective interpretation. To systematically review existing research and guide future exploration, this paper categorizes prior works according to their focus on addressing these challenges into three directions: Affective Data Augmentation, Multimodal Affective Representation, and Multimodal Affective Reasoning. By thoroughly tracing the development, emerging trends, and remaining issues within each direction, this paper aims to provide a clear academic map of the MER-with-LLMs paradigm and foster its structured advancement.
    Computer VisionInterpretability and transparencyComputer VisionMultimodal learningComputer VisionVideo analysis and understandingNatural Language ProcessingLanguage generationNatural Language ProcessingLanguage models
  928. #SV134

    From Time Series Analysis to Question Answering: A Survey in the LLM Era

    Wei Li, Zhe Xie, Yuxuan Liang, Xinli Hao, Yunyao Cheng, Dan Pei, Xiaofeng Meng
    Recently, Large Language Models (LLMs) have introduced a novel paradigm in Time Series Analysis (TSA), leveraging strong language capabilities to support tasks such as forecasting and anomaly detection. However, these analysis tasks cannot adequately cover temporal language tasks, such as interpretation and captioning. A fundamental gap remains between TSA and LLMs: LLMs are pre-trained to optimize natural language relevance for question answering rather than objectives specialized for TSA. To bridge this gap, TSA is evolving toward Time Series Question Answering (TSQA), shifting from expert-driven and task-specific analysis to user-driven and task-unified question answering. TSQA depends on flexible exploration rather than predefined TSA pipelines. In this survey, we first propose a taxonomy that reflects the evolution from TSA to TSQA, driven by a shift from external to internal alignment. We then organize existing literature into three alignment paradigms: Injective Alignment, Bridging Alignment, and Internal Alignment, and provide practical guidance for flexible, economical, and generalizable selection of alignment paradigms. We finally analyze datasets across domains and characteristics, identify challenges, and highlight future research directions.
    Data MiningMining spatial and/or temporal dataNatural Language ProcessingQuestion answering
  929. #SV137

    Modeling Liquid Democracy: A Survey of the (Computational) Social Choice Literature

    Davide Grossi, Andreas Nitsche, Georgios Papasotiropoulos
    Liquid democracy encompasses a family of decision-making processes, where votes can be cast directly or passed along proxy chains. We provide a community-maintainable and systematic survey of (computational) social choice papers on liquid democracy, organized through a searchable taxonomy of core modeling features that have appeared in the literature. Drawing on the insights from our survey, we also outline a number of research directions, which we consider of special importance for both the theory and practice of liquid democracy.
    Game Theory and Economic ParadigmsComputational social choice
  930. #SV140

    A Survey on the Verification of Reinforcement Learning Policies

    Luca Marzari, Ezio Bartocci, Enrico Marchesini
    Reinforcement learning (RL) is increasingly applied in complex, safety-critical domains, yet the lack of rigorous behavioral guarantees for neural network-based policies remains a major barrier to deployment. Recent advances in policy expressiveness and scale have intensified this challenge, leading to a rapidly growing but conceptually fragmented body of work on RL policy verification. This survey provides a unifying perspective on RL verification methods. We introduce a taxonomy that clarifies relationships among existing approaches along three axes: verification paradigm (formal versus probabilistic), temporal scope (step-wise versus multi-step), and guarantee strength. Beyond taxonomy, we unify underlying theoretical foundations, make implicit assumptions and limitations explicit, and identify emerging directions.
    Agent-based and Multi-agent SystemsFormal verification, validation and synthesisMachine LearningReinforcement learning
  931. #SV141

    Approximation Algorithms for the Shapley Value: Taxonomy and Properties

    Patrick Kolpaczki, Eyke Hüllermeier
    Attributing importance to the individual components of a larger unit has become a popular method for understanding models and data in AI and machine learning. Starting with feature explanation, this method is now also used in data valuation or federated learning, just to name a few. Despite their differences, all of these applications use the same mathematical attribution mechanism: the Shapley value, which is rooted in cooperative game theory. While the Shapley value is appealing and has strong axiomatic foundations, it is computationally intractable due to the combinatorial explosion of player subsets. Therefore, there is a need for approximation algorithms, which have been studied intensively in recent years. This survey provides an overview of general-purpose approximation methods applicable to any domain. We categorize these methods into algorithmic classes, compare their properties, and highlight connections between approaches in a comprehensive taxonomy.
    Game Theory and Economic ParadigmsCooperative gamesMachine LearningGame Theory
  932. #SV144

    Spatial Pattern Matching: A Survey

    Nicole Schneider, Kent O'Sullivan, Hanan Samet
    Recent developments in Artificial Intelligence (AI) have led to flexible ways for users to search through vast information. However, users may have questions that are grounded in the real world which require spatial inference, for which language models are not well suited. Conversely, traditional spatial search methods, like spatial pattern matching, can answer spatial reasoning questions correctly but are noise-intolerant, slow, and brittle. Given the current state, there are opportunities to integrate AI and spatial pattern matching to enable robust and flexible spatial search. To bridge this gap, we survey existing spatial pattern matching methods, including the few that apply AI to the problem, discussing their efficiency and limitations, and describing opportunities to further enable spatial search via AI.
    Knowledge Representation and ReasoningQualitative, geometric, spatial, and temporal reasoningData MiningMining spatial and/or temporal dataData MiningKnowledge graphs and knowledge base completion
  933. #SV146

    A Survey on Value Alignment in Agentic AI Systems

    Wei Zeng, Hengshu Zhu, Chuan Qin, Han Wu, Yihang Cheng, Yinuo Shen, Zhe Wang, Yuyang Wang, Sirui Zhang, Xiaowei Jin, Zhenxing Wang, Feimin Zhong, Hui Xiong
    With the evolution of artificial intelligence (AI) paradigms towards agentic AI, the widespread integration of large language models (LLMs) enhances system capabilities while also introducing situational risks and challenges of value misalignment, making value alignment in agentic AI systems a critical issue. This paper constructs a multi-level value framework encompassing L0 (universal values), L1 (cultural and industry values), and L2 (context-specific values). Guided by this framework, we conduct an in-depth analysis along the technical stack: at the LLM level, we examine value injection mechanisms through pretraining and post-training; at the single-agent level, we focus on representing and injecting values into agents, profiles and memory, and planning and action; at the multi-agent level, we summarize collaborative alignment methods such as communication strategy optimization and multi-objective reinforcement learning. Following a systematic review of existing datasets and methods for multi-level alignment evaluation, we outline future research directions, including inter-agent value coordination mechanisms, high-quality scenario data sharing, game-theoretic design for value alignment in agent interaction and communication protocol alignment, aiming to establish a more systematic and dynamic evaluation framework and to promote robust and trustworthy value consensus in agentic AI systems within social collaboration.
    Agent-based and Multi-agent SystemsOtherAI Ethics, Trust, FairnessValues
  934. #SV153

    Adaptive Reward Design in Reinforcement Learning: A Taxonomy and Survey

    Raphaela Erbel, Carlo D'Eramo, Philipp Brune
    Adaptive Reward Design (ARD) is becoming a fundamental component for Reinforcement Learning (RL) agents, as they are deployed in increasingly complex settings where a single static reward across all phases of learning is rarely sufficient. Yet, ARD is rarely studied as a coherent topic: Relevant ideas are dispersed across reward shaping, curriculum learning, intrinsic motivation, non-stationary objectives, and preference- or feedback-based learning, which obscures conceptual connections and complicates method selection. This survey provides a unified view of ARD in RL by introducing a taxonomy, organized by the primary driver of the reward variation. The taxonomy distinguishes external-feedback-driven reward updates from reward adaptations driven by endogenous within-run signals and those conditioned on exogenous context signals. Using explicit assignment rules, we place work published between 2010 and 2025 within this taxonomy. By synthesizing typical RL settings and domains at the driver level, we simplify the method selection in ARD. Further, we describe the evolution and current trends of ARD and conclude by outlining promising future research directions.
    Machine LearningReinforcement learning
  935. #SV155

    Beyond Scaling: A Survey on Data-Efficient Agentic Learning

    Yaqing Wang, Zhenlin Luo, Peiyao Zhao, Yunfeng Cai, Quanming Yao
    LLM-based agents are increasingly deployed across web and GUI automation, embodied decision making, and scientific workflows, yet their progress is often constrained by limited data and interaction. High-quality supervision is costly, and real-environment interactions are expensive, risky, and quickly invalidated by environment drift. This survey studies how to obtain and improve LLM-based agents with fewer samples, fewer labels, and fewer or cheaper interactions. We view agentic learning as a closed-loop decision process where experience arises from both external supervision and online interactions, and data efficiency requires maximizing information yield per unit cost. We then introduce a unified agentic learning framework and organize the literature along three complementary dimensions: experience augmentation, agent structural design, and learning paradigms. This perspective connects design choices to where learning signals come from, how they are utilized, and how adaptation is performed under bounded budgets. We summarize representative benchmarks and synthesize key open challenges, aiming to clarify the emerging landscape and support future progress in data-efficient agentic learning.
    Agent-based and Multi-agent SystemsApplicationsMachine LearningFew-shot learningMachine LearningLearnware/model reuse/transfer learning
  936. #SV158

    LLM-based Intelligent Tutoring Systems: A Survey

    Li Kong, Jianwen Sun, Junsheng Zhou, Vincent Ng
    Large Language Models (LLMs) are reshaping the design and capabilities of intelligent tutoring systems (ITS) by providing powerful generative, reasoning, and interaction abilities, which surpass traditional rule-based approaches. This survey presents a structured overview of LLM-based ITS and analyzes how these models transform classical system components and architectures. We first review the foundational concepts of traditional ITS and introduce the functional roles of the main components, followed by LLM-based techniques and related datasets for realizing each of these components. Furthermore, we examine the key application domains and conclude the survey by outlining future research directions.
    Natural Language ProcessingApplications
  937. #SV160

    An XAI View on Explainable ASP: Methods, Systems, and Perspectives

    Thomas Eiter, Tobias Geibinger, Zeynep G. Saribatur
    Answer Set Programming (ASP) is a popular declarative reasoning and problem solving approach in symbolic AI. Its rule-based formalism makes it inherently attractive for explainable and interpretive reasoning, which is gaining increasing importance with the surge of Explainable AI (XAI). A number of explanation approaches and tools for ASP have been developed, which often tackle specific explanatory settings and may not cover all scenarios that ASP users might encounter. In this survey, we provide, guided by an XAI perspective, an overview of types of ASP explanations in connection with user questions for explanation, and describe their coverage by current theory and tools in ASP. Furthermore, we pinpoint gaps in existing ASP explanation approaches and identify research directions for future work.
    AI Ethics, Trust, FairnessExplainability and interpretabilityKnowledge Representation and ReasoningLogic programmingKnowledge Representation and ReasoningNon-monotonic reasoning
  938. #SV161

    Accelerating Masked Diffusion Large Language Models: A Survey of Efficient Inference Techniques

    Daehoon Gwak, Minhyung Lee, Junwoo Park, Jaegul Choo
    Diffusion large language models (dLLMs) offer a theoretical advantage in parallel generation over standard autoregressive models. However, parallel generation alone does not guarantee practical speedups. Realizing this efficiency requires specialized inference mechanisms, such as diffusion-aware caching and reuse. Consequently, as inference efficiency becomes a prerequisite for practical deployment, recent research has actively explored acceleration techniques across algorithms, architectures, and systems. However, rigorous comparisons remain difficult, as end-to-end latency stems from intricate trade-offs between algorithmic, architectural, and system-level factors that are often conflated in existing benchmarks. In this survey, we introduce a unified latency decomposition framework for dLLMs to disentangle these factors and analyze their impact on inference speed in real deployments. Guided by this framework, we categorize acceleration techniques along three axes covering algorithmic innovations, architectural and system optimizations, and inference-time scaling. Finally, we provide guidelines for reproducible benchmarking and highlight open challenges for realizing the full potential of parallel generation.
    Natural Language ProcessingLanguage modelsNatural Language ProcessingResources and evaluation
  939. #SV181

    Test-Time Adaptation for Graph Learning: A Systematic Survey

    Jiayi Chen, Xin Zheng, Bo Li, Zeyu Wang, Yanqing Guo, Feng Xia
    Graph distribution shifts between training and test graphs pose severe challenges to the generalization of graph neural networks (GNNs). In real-world deployment, application environments are continuously evolving, while retraining or redesigning GNNs is often costly and impractical. In light of this, test-time adaptation on graphs, which aims to dynamically adapt well-trained GNNs or adjust test graphs to improve inference performance, has attracted growing attention as a practical solution. In this survey, we provide a comprehensive review of test-time adaptation on graphs, an emerging yet underexplored research direction. We identify two fundamental challenges: (1) Data-level: complex graph distribution shifts; and (2) Model-level: limited test-time learning information. Building on these challenges, we organize existing methods into a systematic taxonomy of (a) model-centric, (b) data-centric, and (c) hybrid methods, followed by a summary of representative applications, benchmarks, and open opportunities. We aim to bridge the gap between laboratory GNN development and real-world deployment via test-time adaptation.
    Data MiningMining graphsData MiningNetworks
  940. #SV185

    Learning PDE Solvers with Physics and Data: A Unifying View of Physics-Informed Neural Networks and Neural Operators

    Yilong Dai, Shengyu Chen, Ziyi Wang, Xiaowei Jia, Yiqun Xie, Vipin Kumar, Runlong Yu
    Partial differential equations (PDEs) are central to scientific modeling. Modern workflows increasingly rely on learning-based components to support model reuse, inference, and integration across large computational processes. Despite the emergence of various physics-aware data-driven approaches, the field still lacks a unified perspective to uncover their relationships, limitations, and appropriate roles in scientific workflows. To this end, we propose a unifying perspective that places two dominant paradigms, Physics-Informed Neural Networks (PINNs) and Neural Operators (NOs), within a shared design space. We organize existing methods along three fundamental dimensions: what is learned, how physical structures are integrated into the learning process, and how the computational load is amortized across problem instances. In this way, many practical challenges can be best understood as consequences of these structural properties of learning PDEs. By analyzing recent advances through this unifying view, our survey aims to facilitate the development of reliable learning-based PDE solvers and to catalyze a synthesis of physics and data.
    Machine LearningKnowledge-aided learning
  941. #SV195

    Harnessing Multiple Large Language Models: A Survey on LLM Ensemble

    Zhijun Chen, Xiaodong Lu, Jingzheng Li, Pengpeng Chen, Zhuoran Li, Kai Sun, Yuankai Luo, Qianren Mao, Ming Li, Likang Xiao, Dingqi Yang, Yikun Ban, Hailong Sun
    LLM Ensemble, the comprehensive use of multiple large language models (LLMs), each aimed at handling user queries during downstream inference so as to benefit from their individual strengths, has gained substantial attention recently. The widespread availability of LLMs, coupled with their varying strengths and out-of-the-box usability, has profoundly advanced the field of LLM Ensemble. This paper presents the first systematic review of recent developments in LLM Ensemble. First, we introduce our taxonomy of LLM Ensemble and discuss several related research problems. Then, we provide a more in-depth classification of the methods under the broad categories of "ensemble-before-inference", "ensemble-during-inference", and "ensemble-after-inference", and review relevant methods. Finally, we introduce related benchmarks and applications, summarize existing studies, and suggest several future research directions.
    Natural Language ProcessingApplicationsMachine LearningEnsemble methods
  942. #SV208

    Machine Learning Methods for Studying Latent Neural Activity Dynamics

    Shufeng Kong, Fumei Deng, Xinyi Dong, Caihua Liu, Weiwei Chen, Yingheng Wang, Daniel Cao, Azahara Oliva, Antonio Fernandez-Ruiz, Carla Gomes
    Recent developments in brain recording are driving a demand for machine learning tools capable of decoding the latent structure of large populations of neurons. In this paper, we provide a comprehensive survey that outlines the trajectory of Latent Variable Models (LVMs) from early state-space models to more recent deep generative models. We organize the literature into three closely related domains: (1) Single-Region Latent Dynamics, which ranges from models such as linear dynamical systems to more complex dynamics represented by Recurrent Neural Networks (RNNs) and Neural Ordinary Differential Equations (ODEs); (2) Multi-Region Communication, which employs probabilistic as well as subspace methods to study how information is transferred across different brain areas considering synaptic propagation delays and network connectivity; and (3) Behavior-Aligned Modeling, which seeks to disentangle neural activity related to task performance from other internal states via supervised or contrastive learning. Finally, we conclude and discuss benchmarks, evaluation criteria, and open challenges, such as the ability to identify causal links or directionality of communication, to facilitate future research for bridging interpretable brain dynamics with reliable neural decoding.
    Multidisciplinary Topics and ApplicationsLife sciencesMultidisciplinary Topics and ApplicationsComputational sustainabilityMultidisciplinary Topics and ApplicationsBioinformatics
  943. #SV220

    A Survey on Actionable Interpretability in Large Language Models

    Jie Cai, Mafizur Rahman, James Enouen, Lijun Qian, Yan Liu
    Large Language Models (LLMs) have become central to modern AI, with interpretability serving as a critical means of investigating the opaque and highly nonlinear mechanisms encoded within billions of parameters and ensuring trustworthy deployment. However, descriptive interpretability approaches for LLMs remain largely post-hoc, illuminating model behavior without providing the actionable leverage needed to influence or adapt model behavior, thereby limiting their practical utility. Recent work has therefore reframed interpretability as an actionable paradigm, shifting the focus from explanation alone toward methods that connect internal mechanisms to model refinement. This survey reviews LLM interpretability through the lens of actionability, presenting a taxonomy of attributional, concept-based, and mechanistic approaches, along with emerging methods tailored to vision–language models (VLMs). We further examine how interpretability supports downstream objectives such as hallucination mitigation, model editing, fairness, and safety. By positioning interpretability as a pathway to better-guided LLM design and practice, this survey outlines key challenges and future directions toward trustworthy and controllable foundation models.
    AI Ethics, Trust, FairnessExplainability and interpretabilityAI Ethics, Trust, FairnessFairness and diversityAI Ethics, Trust, FairnessTrustworthy AI
  944. #SV226

    Deep Learning and Foundation Models for Weather Prediction: A Survey

    Jimeng Shi, Azam Shirali, Bowen Jin, Sizhe Zhou, Wei Hu, Rahuul Rangaraj, Zhaonan Wang, Yanzhao Wu, Leonardo Bobadilla, Upmanu Lall, Shaowen Wang, Jiawei Han, Giri Narasimhan
    Numerical weather prediction (NWP) models remain the cornerstone of atmospheric sciences. Yet, deep learning (DL) is challenging this paradigm by its ability to capture intricate spatio-temporal patterns and deliver ultra-fast predictions. Analogous to the foundation models (e.g., ChatGPT) in natural language processing, foundation models in the weather/climate domain have also been developed. This paper reviews DL and foundation models for weather prediction by highlighting their strengths and limitations. In particular, we carefully examine them from the perspective of their training paradigms: deterministic predictive learning, probabilistic generative learning, and pre-training & fine-tuning. For each paradigm, we summarize the underlying model architectures, training methods, and respective features. To facilitate further study, we provide a curated repository featuring categorized papers, open-source code, and benchmark datasets. Finally, we discuss and suggest potential research directions across new tasks and models in weather data storage and management, and operational deployment, further inspiring innovations in this rapidly evolving field. GitHub: https://github.com/JimengShi/DL-Foundation-Models-Weather.
    Machine LearningApplicationsMultidisciplinary Topics and ApplicationsEnergy, environment and sustainabilityMultidisciplinary Topics and ApplicationsLife sciencesMultidisciplinary Topics and ApplicationsOther
  945. #SV238

    A Resource-Aware Taxonomy of AI Bias Mitigation Techniques

    Daniela Loreti, Roberta Calegari, Michela Milano
    The literature on AI fairness has grown rapidly, proposing a large number of bias mitigation techniques that are commonly organized into pre-, in-, and post-processing methods. This pipeline-centric view offers an operational, lifecycle-based perspective on where mitigation can be applied. In deployment settings, however, practitioners also face an additional question: whether a mitigation family is applicable given the resources and access rights available in a concrete system. In this survey, we use resources broadly to denote data access/control, training capability, and deployment-time interface/decision control. Accordingly, we introduce a resource-aware taxonomy that complements existing taxonomies by classifying AI bias mitigation methods according to the conditions that make them practically implementable. We use this taxonomy to structure and reinterpret existing literature on the topic, highlighting which mitigation families remain feasible under resource constraints.
    AI Ethics, Trust, FairnessEthical, legal and societal issuesAI Ethics, Trust, FairnessTrustworthy AIAI Ethics, Trust, FairnessBiasAI Ethics, Trust, FairnessFairness and diversity
  946. #SV241

    Graph4LLM: A Systematic Survey of Graph-Enhanced Large Language Models

    Xinyan Zhu, Cheng Yang, Qiuyu Wang, Zeyuan Guo, Yiding Wang, Zedi Liu, Chunchen Wang, Chuan Shi
    Large Language Models (LLMs) excel in natural language processing (NLP) tasks. However, they suffer from inherent limitations due to their sequence-based nature, such as structural information loss and factual unreliability. Graphs, with the ability to explicitly model entities and relations, offer an effective way to address these shortcomings. To systematically synthesize the emerging research on graph-enhanced LLMs, this survey, Graph4LLM, examines how these methods integrate graphs into various stages of the LLM pipeline, including the input, model, and output phases. For each phase, we provide a detailed review of the key methods and techniques. We also introduce a wide range of application scenarios where Graph4LLM methods demonstrate significant potential. Finally, we outline the challenges and future research directions for developing more efficient and interpretable solutions.
    Natural Language ProcessingLanguage modelsData MiningMining graphs
  947. #SV245

    A Survey on Quantitative Possibility Theory in Artificial Intelligence. A Convenient Uncertainty and Preference Model

    Henri Prade, Sébastien Destercke, Didier Dubois
    Quantitative (or numerical) possibility theory offers a simple yet very expressive setting for handling higher-order uncertainty and, in particular, imprecise probabilities. The paper surveys the basic ideas and notions underlying numerical possibility theory, its relation to the other uncertainty settings, and its use in AI-related issues. Numerical possibility theory is of interest for coping with imperfect statistical information, especially non-Bayesian statistics relying on likelihood functions and confidence intervals. Quantitative possibility theory can be used in inference, machine learning, tracking and information fusion, and finally preference modeling.
    Uncertainty in AINonprobabilistic modelsUncertainty in AIGraphical modelsUncertainty in AIUncertainty representations
  948. #SV252

    A Survey on Active Feature Acquisition Strategies

    Linus Aronsson, Arman Rahbar, Morteza Haghir Chehreghani
    Active feature acquisition (AFA) studies how a predictive system can sequentially choose which feature values to obtain for each instance to balance predictive accuracy against feature acquisition cost (financial, time, invasiveness, or privacy). This survey provides the first unified treatment of modern AFA through an explicit MDP and POMDP formulation, showing that most existing methods can be understood as different approximations of the same underlying sequential decision problem. The survey proposes an up-to-date taxonomy organizing AFA into three families: (i) embedded cost-aware predictors (notably cost-sensitive decision trees and ensembles), (ii) model-based methods that plan using learned probabilistic components, and (iii) model-free or hybrid methods that learn policies from simulated acquisition episodes. We hope this POMDP-centric view both clarifies existing work and motivates new AFA methods that more directly build on the mature literature on POMDP planning and approximation. The survey concludes by outlining open challenges for achieving robust cost–accuracy trade-offs in practice, including reliable evaluation under realistic missingness and logging, computational constraints, and deployment requirements such as robustness and interpretability.
    Machine LearningActive learningMachine LearningCost-sensitive learningMachine LearningFeature extraction, selection and dimensionality reductionMachine LearningPartially observable reinforcement learning and POMDPsUncertainty in AISequential decision making
  949. #SV253

    Sparsity in Federated Learning: A Survey

    Alessio Mora, Adriano Guastella, Lorenzo Sani, Paolo Bellavista, Nicholas Lane
    Conventional Federated Learning (FL) pipelines focus on the collaborative training of a global dense model across client devices. Sparsity has been increasingly adopted in FL, during or after local optimization, for a range of objectives, including reducing communication and computation costs, supporting unlearning, enhancing privacy guarantees, and improving local personalization. In this survey, we introduce a novel taxonomy of sparse FL methods that systematically organizes the existing literature according to their core objectives and methodological choices. Using this taxonomy, we analyze and categorize prior work, highlighting the underlying intuitions, technical mechanisms, benefits, and limitations of each class of approaches. Finally, we identify open challenges, expose research gaps, and extract guidance to help practitioners understand and adopt sparsity mechanisms in FL.
    Machine LearningFederated learningMachine LearningLearning sparse models
  950. #SV263

    A Comprehensive Survey of Interaction Techniques in 3D Scene Generation

    Yuqi Li, Siwei Meng, Chuanguang Yang, Weilun Feng, Junming Liu, Zhulin An, Yikai Wang, Yingli Tian
    The rapid evolution of 3D scene generation has revolutionized content creation across domains such as gaming, film production, and architectural visualization. Within this landscape, interaction techniques serve as the pivotal bridge connecting user intent with generative models, enabling precise control, real-time feedback, and personalized customization of complex 3D scenes. Existing literature reviews predominantly focus on general generative paradigms or are limited to specific subdomains such as single-object modeling, often overlooking the systematic classification of interaction mechanisms. To bridge this gap, this paper presents a comprehensive survey of interaction techniques in 3D scene generation. We propose a unified taxonomy that categorizes existing methods into three primary paradigms: Interactive Generation, Interactive Editing, and Embodied Interaction. For each category, we analyze representative methods in terms of controllability, interaction granularity, and physical consistency, and discuss their advantages and limitations. We further summarize commonly used datasets and evaluation protocols for interactive 3D scene generation. Finally, we discuss future directions toward more physically grounded, multi-modal, and human-centered interactive 3D scene generation systems.
    Computer Vision3D computer visionMachine LearningGenerative modelsAgent-based and Multi-agent SystemsHuman-agent interaction
  951. #SV270

    Implicit Identity Technologies for LLMs: Fingerprinting and Watermarking Across Datasets, Models, and Generated Content

    Bing Liu, Shunping Wang, Yufan Zhu, Xinyi Yu, Jing Huang, Linkang Du, Hongbin Pei, Wei Luo
    Large language models (LLMs) are substantial investments and increasingly deployed in high-stakes domains, making it critical to protect LLM-related assets and to trace their provenance. Identity technologies such as fingerprinting and watermarking address these needs by enabling ownership verification and attribution, and have rapidly emerged as an active research focus. However, as the field remains at an early stage, existing techniques lack a systematic organization, leading to two key challenges, terminological confusion and isolated research lines, that have hindered the development of this research field. To this end, we present a comprehensive review of LLM identity techniques, focusing on fingerprinting and watermarking across the LLM lifecycle, including datasets, models, and generated content. We make three primary contributions. First, we introduce implicit identity as a unifying abstraction and distinguish fingerprinting from watermarking. Second, we propose a lifecycle-based taxonomy that organises techniques by asset type and verification role, aligning each with asset protection or provenance. Third, we establish an evaluation framework around three objectives (identifiability, robustness, and deployability) and summarise representative metrics under realistic access and transformation regimes, providing a common basis for comparison. Together, these contributions unify and structure the landscape of LLM identity techniques, clarify terminology, and highlight directions toward more secure deployment.
    AI Ethics, Trust, FairnessSafety and robustnessAI Ethics, Trust, FairnessAI and law, governance, regulationAI Ethics, Trust, FairnessEthical, legal and societal issues
  952. #SV272

    LLM-Based Agents on the Edge: A Survey of Privacy, Scalability, Heterogeneity, and Autonomy

    Nikita Agrawal, Ruben Mayer
    Large language model (LLM)–based agents are increasingly being deployed beyond centralized cloud environments and toward the edge of the network, where they operate closer to data sources. This transition facilitates lower latency and enhances contextual awareness, privacy, and responsiveness, but it also introduces challenges that differ from traditional cloud-based agent deployments. This survey provides a systematic overview of LLM-based edge agents with a particular focus on four critical dimensions: privacy, scalability, heterogeneity, and autonomy. To facilitate structured analysis, we introduce a novel taxonomy along four axes: deployment, functional role, interaction, and adaptation. Based on our taxonomy, we analyze the challenges LLM-based agents face on the edge and discuss design solutions that can help mitigate possible issues. We further analyze the degree to which existing LLM-based edge agent frameworks achieve privacy, scalability, heterogeneity, and autonomy.
    Agent-based and Multi-agent SystemsAgent communicationAgent-based and Multi-agent SystemsCoordination and cooperationAI Ethics, Trust, FairnessTrustworthy AIKnowledge Representation and ReasoningLearning and reasoning
  953. #SV273

    Knowledge-Guided 3D CT Generation: A Conditioning-Centric Taxonomy

    Francesca Pia Panaccione, Eugenio Lomurno, Matteo Matteucci
    Controllable generation guided by external knowledge is a key requirement in modern generative deep learning applications, enabling the synthesis of samples with explicit constraints on semantic content, structural properties, and variability. In 3D Computed Tomography (CT), such control is essential for clinical applications, including data augmentation, privacy-preserving data sharing, and the simulation of specific anatomical or pathological scenarios. While research on conditional 3D CT generation has expanded rapidly, the diversity of existing approaches makes systematic comparison difficult and obscures fundamental design choices. In this survey, we propose a conditioning-centric taxonomy that organizes the literature along three orthogonal dimensions: the type of external knowledge (K), the knowledge integration paradigm (I), and the generative architecture (A). This factorization defines an explicit design space (K x I x A) that provides a unified perspective on prior work. Using this framework, we systematize existing methods, identify dominant trends and recurring design patterns, and highlight underexplored regions of the design space that point toward promising directions for future research.
    Computer Vision3D computer visionComputer VisionBiomedical image analysisComputer VisionMultimodal learningMachine LearningDeep learning architecturesMachine LearningGenerative models