IJCAI-ECAI 2026 Accepted Papers · Survey Track
Presentation format
Every accepted paper is presented in two formats: an oral talk — which must be delivered in person in Bremen by one of the authors — and a poster during a dedicated poster session.
#SV1
A Survey on 3D Skeleton Based Person Re-Identification: Taxonomy, Advances, Challenges, and Interdisciplinary Prospects
Person re-identification via 3D skeletons is an important emerging research area that attracts increasing attention within the pattern recognition community. With distinctive advantages across various application scenarios, numerous 3D skeleton based person re-identification (SRID) methods with diverse skeleton modeling and learning paradigms have been proposed in recent years. In this paper, we provide a comprehensive review and analysis of recent SRID advances. First of all, we define the SRID task and provide an overview of its origin and major advancements. Secondly, we formulate a systematic taxonomy that organizes existing methods into three categories centered on hand-crafted, sequence-based, and graph-based modeling. Then, we elaborate on the representative models along these three types with an illustration of foundational mechanisms. Meanwhile, we provide an overview of mainstream supervised, self-supervised, and unsupervised SRID learning paradigms and corresponding common methods. A thorough evaluation of state-of-the-art SRID methods is further conducted over various types of benchmarks and protocols to compare their effectiveness, efficiency, and key properties. Finally, we present the key challenges and prospects to advance future research, and highlight interdisciplinary applications of SRID with a case study. A curated collection of valuable resources is available at https://github.com/Kali-Hac/3D-SRID-Survey.
Computer Vision: Recognition (object detection, categorization) · Computer Vision: Other · Multidisciplinary Topics and Applications: Security and privacy · Multidisciplinary Topics and Applications: Other
#SV5
A Review on Test-Time Scaling for Agentic Large Language Models
A Review on Test-Time Reasoning for Agentic Large Language Models
Agent-based and Multi-agent Systems: Engineering methods, platforms, languages and tools
#SV17
Graph Rewiring in GNNs to Mitigate Over-Squashing and Over-Smoothing: A Survey
Graph Neural Networks are powerful models for learning from graph-structured data, yet their effectiveness is often limited by two critical challenges: over-squashing, where information from distant nodes is excessively compressed, and over-smoothing, where repeated propagation makes node representations indistinguishable. Both phenomena stem from the interaction between message passing and the input topology, ultimately degrading information flow and limiting the performance of GNNs. In this survey, we examine graph rewiring techniques, a class of methods designed to modify the graph topology to enhance information propagation in GNNs. We provide a comprehensive review of state-of-the-art rewiring approaches, delving into their theoretical underpinnings, practical implementations, and performance trade-offs.
Data Mining: Mining graphs · Data Mining: Networks
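The over-smoothing effect described in this abstract can be illustrated with a minimal sketch. It is not taken from the survey: the path graph, the scalar node features, and the helper names `propagate` and `spread` are all assumptions made for illustration, and the propagation is plain mean aggregation with self-loops and no learned weights.

```python
def propagate(features, neighbors):
    """One round of mean aggregation over each node and its neighbors (no learned weights)."""
    return [
        sum(features[j] for j in [i] + neighbors[i]) / (1 + len(neighbors[i]))
        for i in range(len(features))
    ]


def spread(features):
    """Gap between the largest and smallest node feature; 0 means indistinguishable nodes."""
    return max(features) - min(features)


# Hypothetical toy input: a path graph 0-1-2-3-4-5 with one scalar feature per node.
n = 6
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
features = [float(i) for i in range(n)]

spreads = [spread(features)]
for _ in range(50):
    features = propagate(features, neighbors)
    spreads.append(spread(features))

# The spread shrinks round after round as all nodes drift toward a common value.
print(spreads[0], spreads[10], spreads[50])
```

Because each round replaces a node's value with an average over its neighborhood, the spread can never grow, and on a connected graph it decays toward zero. Rewiring methods change `neighbors` rather than the aggregation rule; that step is not shown here.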
#SV20
When Vision Meets Graphs: A Survey on Graph Reasoning and Learning
Graphs are a fundamental data structure underlying many problems in the natural and social sciences. Over the past decade, Graph Neural Networks (GNNs) have dominated graph machine learning, supported by solid theoretical foundations. Yet scientists often understand graph structure through vision: chemists read molecular diagrams and social scientists inspect network visualizations. Despite decades of work on graph visualization, most graph learning pipelines still treat graphs purely as symbolic structures, rarely leveraging the visual form of graphs. We argue that this gap deserves renewed attention in the era of powerful vision and vision language models.
This survey provides a first systematic overview of the emerging area we term vision meets graphs, which treats visual depictions of graphs as first-class inputs for reasoning and learning. We organize existing work into three threads. Vision for Graph Reasoning studies how models can use visual depictions of graphs to understand structure and carry out multi-step reasoning. Vision for Graph Learning explores how visual features can complement or augment graph encoders beyond known limitations of message passing. Scientific Graphs examines domains where standardized depiction conventions support both reasoning and learning. Our goal is to clarify what current methods can and cannot do, and to outline a path toward foundation models that perceive and reason about graphs as scientists do.
Data Mining: Data visualization · Computer Vision: Vision, language and reasoning · Computer Vision: Multimodal learning
#SV22
A Survey of Artificial Intelligence in Endoscopic Surgery Workflow: From Perception to Surgical Support
Endoscopic surgery demands continuous real-time visual decision-making under severe constraints, including a limited field of view, motion blur, and dynamically deforming anatomy. These factors impose substantial cognitive load on surgeons and motivate the integration of artificial intelligence (AI) throughout the endoscopic surgical workflow. This survey reviews recent progress in AI for endoscopic surgery and organizes the literature into four stages that span perception to action: (1) image enhancement and analysis methods that improve visual perception; (2) multimodal video understanding approaches that model and reason about surgical instruments and anatomical structures over space and time; (3) 3D reconstruction techniques that enable robust tracking and interpretation of deformable anatomy; and (4) emerging paradigms of embodied surgical intelligence, where action-conditioned world models link perception to intraoperative assistance.
Across these stages, we summarize current capabilities and limitations and identify key open challenges for clinical deployment. In addition, we provide an overview of 18 publicly available datasets, highlighting their scope and annotations. We hope this survey will stimulate further research toward reliable and clinically deployable AI systems for endoscopic surgery.
SV: Computer Vision · SV: Multidisciplinary Topics and Applications
#SV28
Towards Automated Kernel Generation in the Era of LLMs: A Survey
The performance of modern AI systems is fundamentally constrained by the quality of their underlying kernels, which translate high-level algorithmic semantics into low-level hardware operations. Achieving near-optimal kernels requires expert-level understanding of hardware architectures and programming models, making kernel engineering a critical but notoriously time-consuming and non-scalable process. Recent advances in large language models (LLMs) and LLM-based agents have opened new possibilities for automating kernel generation and optimization. LLMs are well-suited to compress expert-level kernel knowledge that is difficult to formalize, while agentic systems further enable scalable optimization by casting kernel development as an iterative, feedback-driven loop. Rapid progress has been made in this area. However, the field remains fragmented, lacking a systematic perspective for LLM-driven kernel generation. This survey addresses this gap by providing a structured overview of existing approaches, spanning LLM-based approaches and agentic optimization workflows, and systematically compiling the datasets and benchmarks that underpin learning and evaluation in this domain. Moreover, key open challenges and future research directions are further outlined, aiming to establish a comprehensive reference for the next generation of automated kernel optimization. To keep track of this field, we maintain an open-source GitHub repository at https://github.com/flagos-ai/awesome-LLM-driven-kernel-generation.
Natural Language Processing: Applications · Multidisciplinary Topics and Applications: AI hardware · Multidisciplinary Topics and Applications: Software engineering
#SV33
Towards Vision-Spatiotemporal Fusion in Traffic Forecasting: A Survey on Cross-Modal Alignment
Traffic forecasting is evolving, with world models emerging as a powerful framework applicable to tasks such as core state, trajectory, event, and demand forecasting. These tasks involve both visual and spatiotemporal data, yet most existing methods treat them separately, hindering a unified understanding of traffic scenes in both semantic meanings and spatiotemporal dynamics. The fusion of the two modalities is critical for building models that comprehend complex traffic scenarios. However, the fusion issue faces two fundamental misalignments: semantic, where pixels conflict with traffic concepts, and geometric, which requires spatial intelligence to map 2D inputs into 3D. This survey reframes vision-spatiotemporal fusion via the unique lens of cross-modal alignment, addressing semantic and geometric failures that limit forecasting reliability. First, we categorize existing methods into three paradigms: feature-level, semantic-level, and task-level. This reveals their progression from low-level feature manipulation to high-level architectural integration. Second, we synthesize representative techniques per paradigm, highlighting geometric challenges such as cross-view association and spatial mapping. Third, we examine current datasets and benchmarks, highlighting their deficiencies in evaluating alignment. Finally, we outline future directions, including spatiotemporal intelligence for robust perception and holistic traffic world models. The unified framework establishes a reference for robust and explainable forecasting systems.
Data Mining: Mining spatial and/or temporal data
#SV36
From Multimodal Perception to Strategic Reasoning: A Survey on AI-Generated Game Commentary
The advent of artificial intelligence has propelled AI-Generated Game Commentary (AI-GGC) into a rapidly expanding research area, offering advantages such as scalable availability and personalized narration. However, existing studies remain fragmented, and a systematic survey that unifies prior efforts is still lacking. To bridge this gap, our survey introduces a unified framework that systematically organizes the AI-GGC landscape. We present a novel taxonomy focused on three core commentator capabilities: Live Observation, Strategic Analysis, and Historical Recall, and further categorize commentary into three corresponding types: Descriptive Commentary, Analytical Commentary, and Background Commentary. Building on this structure, we provide an in-depth review of methods, datasets, and evaluation metrics, analyzing their strengths and limitations. Finally, we highlight key challenges and point out promising directions for future research in AI-GGC.
Natural Language Processing: Language generation · Natural Language Processing: Resources and evaluation · Machine Learning: Multi-modal learning · Multidisciplinary Topics and Applications: Entertainment
#SV40
From Human Videos to Robot Manipulation: A Survey on Action-Relevant Representation Transfer for Scalable Vision-Language-Action Learning
Recent progress in generalizable embodied control has been driven by large-scale pretraining of Vision–Language–Action (VLA) models. However, most existing approaches rely on large collections of robot demonstrations, which are costly to obtain and tightly coupled to specific embodiments. Human videos, by contrast, are abundant and capture rich interactions, providing diverse semantic and physical cues for real-world manipulation. Yet, embodiment differences and the frequent absence of task-aligned annotations make their direct use in VLA models challenging. This survey provides a unified view of how human videos are transformed into effective knowledge for VLA models. We categorize existing approaches into four classes based on the action-related information they derive: (i) latent action representations that encode inter-frame changes; (ii) predictive world models that forecast future frames; (iii) explicit 2D supervision that extracts image-plane cues; and (iv) explicit 3D reconstruction that recovers geometry or motion. Beyond this taxonomy, we highlight three key open challenges in this area: structuring unstructured videos into training-ready episodes, grounding video-derived supervision into robot-executable actions under embodiment and viewpoint heterogeneity, and designing evaluation protocols that better predict real-world deployment performance and transfer efficiency, thereby informing future research directions.
Robotics: Learning in robotics · Robotics: Robotics and vision · Robotics: Manipulation
#SV41
Concept Bottleneck Models for Explainable Decision Making: A Survey of Progress, Taxonomy, and Future Directions
Deep neural networks deliver strong performance but remain opaque, limiting their use in high-stakes domains that require transparency and human oversight. Concept Bottleneck Models (CBMs) address this gap by introducing a human-interpretable concept layer that mediates inputs and decisions, enabling semantic explanations and test-time intervention. This survey provides a unified review of CBMs organized along four dimensions: concept acquisition, concept-based decision making, concept intervention, and concept evaluation. We summarize the evolution of concept construction from manual annotation to lexicon-based mining, LLM/VLM-guided generation, and visually grounded discovery via prototypes and diffusion models; review emerging CBM architectures beyond strict bottlenecks; and consolidate evaluation and intervention protocols emphasizing faithfulness, sparsity, and intervenability, with particular relevance to high-stakes domains such as healthcare. We synthesize fragmented literature and outline key challenges and future directions for concept-based interpretable decision making.
AI Ethics, Trust, Fairness: Explainability and interpretability
#SV74
Constraining Generative Models: A Survey from the Constraint Programming Perspective
Generative models produce long and high probability sequences, yet they often fail to satisfy explicit constraints set by users. Over the past two decades, Constraint Programming (CP) has provided a complementary paradigm: combining generative models with a constraint solver to guarantee feasibility. This survey reviews the main concepts behind these CP-driven hybrid approaches, from enforcing ubiquitous structural rules (e.g., length and patterns) to preventing plagiarism. It synthesizes how learned models can be treated as constraints, compiled structures, or probabilistic factors. We highlight what has remained stable across applications, then discuss how these principles transfer to the Large Language Model era and outline open challenges for controllable and trustworthy generative systems.
Constraint Satisfaction and Optimization: Constraint programming
#SV77
A Survey of Personalized Federated Foundation Models for Privacy-Preserving Recommendation
Integrating Foundation Models (FMs) into recommendation systems is an emerging and promising research direction. However, centralized paradigms face growing pressure from privacy concerns and strict regulatory requirements. Federated learning offers a viable solution that enables collaborative model refinement while keeping raw user data on local devices or organizational silos. Yet, applying FMs in this setting creates a fundamental tension, where the system must balance the leverage of global knowledge with the necessity of capturing user personality. This survey provides a comprehensive overview of Personalized Federated Foundation Models for privacy-preserving recommendation, and reviews recent progress in this emerging field. We first analyze personalization techniques that function effectively under federated settings. Furthermore, we discuss the adaptation of foundation models to such federated architectures to balance generalization with user-specific needs for achieving privacy-preserving recommendation. In contrast to existing reviews, our work specifically emphasizes the architectural intersection of federation, personalization, and foundation models.
Data Mining: Recommender systems · Machine Learning: Federated learning · Machine Learning: Foundation models · Data Mining: Privacy-preserving data mining
#SV80
Tackling Multimodal Learning Challenges with Mixture-of-Expert: A Survey
Mixture-of-Experts (MoE) presents a naturally compatible and scalable framework for multimodal learning, demonstrating strong adaptability across diverse modalities and tasks. Despite its growing success, a comprehensive and systematic evaluation of multimodal MoE remains lacking. Existing surveys tend to address either multimodal learning or MoE independently, overlooking the unique interplay between them. This survey fills that gap by addressing a central question: How does MoE effectively resolve multimodal challenges? We approach this from three key perspectives: (1) MoE as an Efficient Multimodal Framework: enabling scalable multimodal modeling by decoupling computational cost from parameter growth and mitigating modality redundancy through selective expert activation; (2) MoE as a Multimodal Representation Learner: integrating complementary multi-opinion expert knowledge to enrich alignment and interaction representations; and (3) MoE as a Multimodal Adapter: providing a modular and flexible mechanism to model imperfect modality data such as modality imbalance and missing modality. Through an extensive literature review, we identify critical research gaps, including interpretable routing, expert communication, modality integration, and lifelong multimodal learning. We position this survey as a foundation for future research toward interpretable, adaptive, and sustainable multimodal Mixture-of-Experts systems.
Machine Learning: Multi-modal learning · Data Mining: Information retrieval · Data Mining: Mining heterogenous data · AI Ethics, Trust, Fairness: Trustworthy AI
#SV82
AI-Enhanced Vein Biometrics: A Comprehensive Survey
Vein biometrics has emerged as a promising biometric modality for personal identity authentication, benefiting from its intrinsic properties such as high discriminative capability, resistance to forgery, and contactless acquisition. Recent advances in artificial intelligence, particularly deep learning, have significantly accelerated its development. This paper presents a comprehensive and systematic survey of AI-enhanced vein biometrics. We review fundamental principles, publicly available datasets, and evaluation protocols, and systematically analyze existing methods across the entire vein biometric pipeline, including acquisition, preprocessing, feature extraction, recognition and verification, security and privacy protection, and multimodal fusion. Furthermore, we summarize representative application scenarios, identify key challenges, and highlight promising directions for future research. To facilitate reproducible research and long-term development of the field, we release an open, evolving research resource Awesome-Vein-Biometrics that systematically summarizes and tracks recent advances in vein biometrics.
Computer Vision: Biometrics, face, gesture and pose recognition
#SV87
A Comprehensive Survey of Deep Learning for Multivariate Time Series Forecasting: A Channel Strategy Perspective
Multivariate Time Series Forecasting (MTSF) plays a crucial role across diverse fields, ranging from economics and energy to traffic. In recent years, deep learning has demonstrated outstanding performance in MTSF tasks. In MTSF, modeling the correlations among different channels is critical, as leveraging information from other related channels can significantly improve the prediction accuracy of a specific channel. This study systematically reviews the channel modeling strategies for time series and proposes a taxonomy organized into three hierarchical levels: the strategy perspective, the mechanism perspective, and the characteristic perspective. On this basis, we provide a structured analysis of these methods and conduct an in-depth examination of the advantages and limitations of different channel strategies. Finally, we summarize and discuss some future research directions to provide useful research guidance. Moreover, we maintain an up-to-date GitHub repository which includes all the papers discussed in the survey.
Data Mining: Mining spatial and/or temporal data
#SV92
Large Language Models for Blockchain Security and Analytics: A Survey
Large Language Models are transforming blockchain security and analytics, yet systematic evaluation of their capabilities remains limited. This survey delivers a comprehensive, AI-centric assessment of LLM-based methods across more than sixty recent studies spanning nine application domains, including smart contract auditing, transaction fraud detection, cryptocurrency portfolio management, and DeFi security analysis. We introduce a unified taxonomy that standardizes task formulations, datasets, tools, algorithms, and evaluation practices, enabling consistent comparison across approaches. For each domain, we review deployed LLM architectures; learning and inference paradigms such as pre-training, prompt engineering, fine-tuning, retrieval-augmented generation, and agentic strategies; and input representations tailored to blockchain data. We further analyze the strengths, limitations, and emerging patterns observed in current systems. Finally, the survey provides practical guidance for selecting LLM techniques and outlines promising research directions, e.g., explainable smart contract verification, automated DeFi protocol analysis, adversarial robustness evaluation, and scalable on-chain anomaly detection.
Data Mining: Applications
#SV96
Dynamic Heterogeneous Graph Representation Learning: A Survey
Graph representation learning (GRL) serves as a canonical paradigm for modeling complex networks. However, real-world AI systems inherently manifest as evolving heterogeneous entities with complex interactions, posing significant challenges to static or homogeneous modeling. To address these complexities, representation learning for Dynamic Heterogeneous Graphs (DHGs) has emerged as a vital approach for learning low-dimensional representations that simultaneously preserve structural semantics and temporal dynamics. This survey presents the first systematic review of DHG representation learning methods. We first introduce a unified formal definition that encompasses both discrete-time and continuous-time DHGs from the perspective of temporal granularity. Building upon this formulation, we propose a novel algorithm-centric taxonomy that categorizes existing literature, including early embedding-based approaches, graph neural network (GNN)-based models, and relatively recent Transformer-based DHG methods, while explicitly highlighting their intrinsic modeling biases with respect to dynamic granularity. Furthermore, we summarize representative applications of DHG representation learning, along with commonly used datasets and benchmarks. Finally, we discuss promising research directions that guide future advances in this rapidly evolving field.
Data Mining: Mining graphs · Machine Learning: Representation learning · Machine Learning: Self-supervised Learning · Machine Learning: Sequence and graph learning
#SV108
A Survey of Joint Online-Offline Fine-tuning for Large Language Models
Post-training for Large Language Models (LLMs) can be mainly categorized into offline Supervised Fine-Tuning (SFT) for knowledge acquisition and online Reinforcement Fine-Tuning (RFT) for adaptive refinement. Current state-of-the-art approaches typically employ a sequential cold-start pipeline (SFT-Then-RFT). However, we argue that this disjoint transition imposes an “alignment tax”, leading to catastrophic forgetting and reward hacking during the unregularized exploration phase. In this work, we advocate for Joint Online-Offline Fine-Tuning as a superior paradigm that breaks the convention of restricting offline data to SFT and online data to RFT. By integrating full offline response generation with online rollouts, particularly within the realm of Reinforcement Learning with Verifiable Rewards (RLVR), this approach mitigates the limitations of isolated training phases. We provide the first comprehensive survey focusing specifically on the synchronization of data provenance. We introduce a novel taxonomy for related works, analyze their theoretical advantages in balancing stability with plasticity, and outline a roadmap for next-generation post-training frameworks.
Natural Language Processing: Language models · Machine Learning: Supervised Learning · Machine Learning: Reinforcement learning
#SV112
Multimodal Emotion Recognition with Large Language Models
Multimodal Emotion Recognition (MER) focuses on identifying and interpreting emotions from modality-compound inputs. Closely mirroring human cognitive processes in real-world environments, MER has drawn substantial attention from both academia and industry. Recently, a paradigm shift has been unveiled in MER, from leveraging small-scale, task-specific models to Large Language Models (LLMs). We refer to the latter as the MER-with-LLMs paradigm, which offers unprecedented generality, spurring numerous empirical attempts, even alongside speculation about their potential to achieve general emotional intelligence. However, with these new opportunities come new challenges, including the scarcity of emotionally annotated data, the affective gap both within and across modalities, and the opacity of affective interpretation. To systematically review existing research and guide future exploration, this paper categorizes prior works according to their focus on addressing these challenges into three directions: Affective Data Augmentation, Multimodal Affective Representation, and Multimodal Affective Reasoning. By thoroughly tracing the development, emerging trends, and remaining issues within each direction, this paper aims to provide a clear academic map of the MER-with-LLMs paradigm and foster its structured advancement.
Computer Vision: Interpretability and transparency · Computer Vision: Multimodal learning · Computer Vision: Video analysis and understanding · Natural Language Processing: Language generation · Natural Language Processing: Language models
#SV134
From Time Series Analysis to Question Answering: A Survey in the LLM Era
Recently, Large Language Models (LLMs) have introduced a novel paradigm in Time Series Analysis (TSA), leveraging strong language capabilities to support tasks such as forecasting and anomaly detection. However, these analysis tasks cannot adequately cover temporal language tasks, such as interpretation and captioning. A fundamental gap remains between TSA and LLMs: LLMs are pre-trained to optimize natural language relevance for question answering rather than objectives specialized for TSA. To bridge this gap, TSA is evolving toward Time Series Question Answering (TSQA), shifting from expert-driven and task-specific analysis to user-driven and task-unified question answering. TSQA depends on flexible exploration rather than predefined TSA pipelines. In this survey, we first propose a taxonomy that reflects the evolution from TSA to TSQA, driven by a shift from external to internal alignment. We then organize existing literature into three alignment paradigms: Injective Alignment, Bridging Alignment, and Internal Alignment, and provide practical guidance for flexible, economical, and generalizable selection of alignment paradigms. We finally analyze datasets across domains and characteristics, identify challenges, and highlight future research directions.
Data Mining: Mining spatial and/or temporal data · Natural Language Processing: Question answering
#SV137
Modeling Liquid Democracy: A Survey of the (Computational) Social Choice Literature
Liquid democracy encompasses a family of decision-making processes, where votes can be cast directly or passed along proxy chains. We provide a community-maintainable and systematic survey of (computational) social choice papers on liquid democracy, organized through a searchable taxonomy of core modeling features that have appeared in the literature. Drawing on the insights from our survey, we also outline a number of research directions, which we consider of special importance for both the theory and practice of liquid democracy.
Game Theory and Economic Paradigms: Computational social choice
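The idea of votes being cast directly or passed along proxy chains can be sketched as follows. This is one simple model among the many the survey catalogues, not a model taken from it: the function name `resolve_votes`, the voter names, and the rule that votes stuck in a delegation cycle or delegated to a non-voter are counted as lost are all assumptions made for illustration.

```python
def resolve_votes(direct_votes, delegations):
    """Follow each voter's proxy chain to a direct vote and tally the results.

    direct_votes: voter -> chosen option (a direct vote overrides any delegation).
    delegations:  voter -> the proxy they hand their vote to.
    Returns (tally, lost), where lost counts votes whose chain never reaches a direct vote.
    """
    tally = {}
    lost = 0
    for voter in set(direct_votes) | set(delegations):
        seen = set()
        current = voter
        # Walk the chain until we hit a direct voter, a dead end, or a cycle.
        while current not in direct_votes and current in delegations and current not in seen:
            seen.add(current)
            current = delegations[current]
        if current in direct_votes:
            option = direct_votes[current]
            tally[option] = tally.get(option, 0) + 1
        else:
            lost += 1
    return tally, lost


# Hypothetical electorate: two direct voters, a two-step proxy chain, and a delegation cycle.
direct_votes = {"ana": "yes", "bo": "no"}
delegations = {"cy": "ana", "di": "cy", "ed": "fy", "fy": "ed"}
tally, lost = resolve_votes(direct_votes, delegations)
print(tally, lost)
```

Here "di" delegates to "cy", who delegates to "ana", so ana's choice carries three votes, while "ed" and "fy" delegate to each other and their votes are lost under the assumed rule. Other treatments of cycles and abstaining proxies are equally possible modeling choices.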
#SV140
A Survey on the Verification of Reinforcement Learning Policies
Reinforcement learning (RL) is increasingly applied in complex, safety-critical domains, yet the lack of rigorous behavioral guarantees for neural network-based policies remains a major barrier to deployment. Recent advances in policy expressiveness and scale have intensified this challenge, leading to a rapidly growing but conceptually fragmented body of work on RL policy verification. This survey provides a unifying perspective on RL verification methods. We introduce a taxonomy that clarifies relationships among existing approaches along three axes: verification paradigm (formal versus probabilistic), temporal scope (step-wise versus multi-step), and guarantee strength. Beyond taxonomy, we unify underlying theoretical foundations, make implicit assumptions and limitations explicit, and identify emerging directions.
Agent-based and Multi-agent Systems: Formal verification, validation and synthesis · Machine Learning: Reinforcement learning
#SV141
Approximation Algorithms for the Shapley Value: Taxonomy and Properties
Attributing importance to the individual components of a larger unit has become a popular method for understanding models and data in AI and machine learning. Starting with feature explanation, this method is now also used in data valuation or federated learning, just to name a few. Despite their differences, all of these applications use the same mathematical attribution mechanism: the Shapley value, which is rooted in cooperative game theory. While the Shapley value is appealing and has strong axiomatic foundations, it is computationally intractable due to the combinatorial explosion of player subsets. Therefore, there is a need for approximation algorithms, which have been studied intensively in recent years. This survey provides an overview of general-purpose approximation methods applicable to any domain. We categorize these methods into algorithmic classes, compare their properties, and highlight connections between approaches in a comprehensive taxonomy.
Game Theory and Economic Paradigms: Cooperative games · Machine Learning: Game Theory
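To make the intractability and the need for approximation concrete, the sketch below compares exact Shapley values, computed by enumerating every player ordering, with a permutation-sampling estimate. Permutation sampling is one well-known general-purpose approach and is used here only as an example; the abstract does not say which methods the survey covers. The function names, the three-player "glove game", and the sample count are assumptions chosen for illustration.

```python
import itertools
import random


def exact_shapley(players, value):
    """Average marginal contribution over all n! orderings (feasible only for tiny games)."""
    phi = {p: 0.0 for p in players}
    orderings = list(itertools.permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            phi[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    return {p: total / len(orderings) for p, total in phi.items()}


def sampled_shapley(players, value, num_samples, seed=0):
    """Monte Carlo estimate: average marginal contributions over random orderings."""
    rng = random.Random(seed)
    phi = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(num_samples):
        rng.shuffle(order)
        coalition = frozenset()
        for p in order:
            phi[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    return {p: total / num_samples for p, total in phi.items()}


def glove_game(coalition):
    """Hypothetical toy game: a coalition is worth the number of left/right glove pairs it holds."""
    left = len(coalition & {"L1", "L2"})
    right = len(coalition & {"R"})
    return float(min(left, right))


players = ["L1", "L2", "R"]
exact = exact_shapley(players, glove_game)
approx = sampled_shapley(players, glove_game, num_samples=2000)
print(exact)
print(approx)
```

In this toy game the scarce right-glove owner receives 2/3 and each left-glove owner 1/6. The exact routine needs n! orderings, which is what makes it impractical beyond a handful of players, whereas the sampled routine costs a fixed number of orderings at the price of estimation noise.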
#SV144
Spatial Pattern Matching: A Survey
Recent developments in Artificial Intelligence (AI) have led to flexible ways for users to search through vast information. However, users may have questions that are grounded in the real world which require spatial inference, for which language models are not well suited. Conversely, traditional spatial search methods, like spatial pattern matching, can answer spatial reasoning questions correctly but are noise-intolerant, slow, and brittle. Given the current state, there are opportunities to integrate AI and spatial pattern matching to enable robust and flexible spatial search. To bridge this gap, we survey existing spatial pattern matching methods, including the few that apply AI to the problem, discussing their efficiency and limitations, and describing opportunities to further enable spatial search via AI.
Knowledge Representation and Reasoning: Qualitative, geometric, spatial, and temporal reasoning · Data Mining: Mining spatial and/or temporal data · Data Mining: Knowledge graphs and knowledge base completion
#SV146
A Survey on Value Alignment in Agentic AI Systems
With the evolution of artificial intelligence (AI) paradigms towards agentic AI, the widespread integration of large language models (LLMs) enhances system capabilities while also introducing situational risks and challenges of value misalignment, making value alignment in agentic AI systems a critical issue. This paper constructs a multi-level value framework encompassing L0 (universal values), L1 (cultural and industry values), and L2 (context-specific values). Guided by this framework, we conduct an in-depth analysis along the technical stack: at the LLM level, we examine value injection mechanisms through pretraining and post-training; at the single-agent level, we focus on representing and injecting values into agents, profiles and memory, and planning and action; at the multi-agent level, we summarize collaborative alignment methods such as communication strategy optimization and multi-objective reinforcement learning. Following a systematic review of existing datasets and methods for multi-level alignment evaluation, we outline future research directions, including inter-agent value coordination mechanisms, high-quality scenario data sharing, game-theoretic design for value alignment in agent interaction, and communication protocol alignment, aiming to establish a more systematic and dynamic evaluation framework and to promote robust and trustworthy value consensus in agentic AI systems within social collaboration.
Agent-based and Multi-agent Systems: Other · AI Ethics, Trust, Fairness: Values
#SV153
Adaptive Reward Design in Reinforcement Learning: A Taxonomy and Survey
Adaptive Reward Design (ARD) is becoming a fundamental component for Reinforcement Learning (RL) agents, as they are deployed in increasingly complex settings where a single static reward across all phases of learning is rarely sufficient. Yet, ARD is rarely studied as a coherent topic: Relevant ideas are dispersed across reward shaping, curriculum learning, intrinsic motivation, non-stationary objectives, and preference- or feedback-based learning, which obscures conceptual connections and complicates method selection. This survey provides a unified view of ARD in RL by introducing a taxonomy, organized by the primary driver of the reward variation. The taxonomy distinguishes external-feedback-driven reward updates from reward adaptations driven by endogenous within-run signals and those conditioned on exogenous context signals. Using explicit assignment rules, we place work published between 2010 and 2025 within this taxonomy. By synthesizing typical RL settings and domains at the driver level, we simplify the method selection in ARD. Further, we describe the evolution and current trends of ARD and conclude by outlining promising future research directions.Machine LearningReinforcement learning -
#SV155
Beyond Scaling: A Survey on Data-Efficient Agentic Learning
LLM-based agents are increasingly deployed across web and GUI automation, embodied decision making, and scientific workflows, yet their progress is often constrained by limited data and interaction. High-quality supervision is costly, and real-environment interactions are expensive, risky, and quickly invalidated by environment drift. This survey studies how to obtain and improve LLM-based agents with fewer samples, fewer labels, and fewer or cheaper interactions. We view agentic learning as a closed-loop decision process where experience arises from both external supervision and online interactions, and data efficiency requires maximizing information yield per unit cost. We then introduce a unified agentic learning framework and organize the literature along three complementary dimensions: experience augmentation, agent structural design, and learning paradigms. This perspective connects design choices to where learning signals come from, how they are utilized, and how adaptation is performed under bounded budgets. We summarize representative benchmarks and synthesize key open challenges, aiming to clarify the emerging landscape and support future progress in data-efficient agentic learning.Agent-based and Multi-agent SystemsApplicationsMachine LearningFew-shot learningMachine LearningLearnware/model reuse/transfer learning -
#SV158
LLM-based Intelligent Tutoring Systems: A Survey
Large Language Models (LLMs) are reshaping the design and capabilities of intelligent tutoring systems (ITS) by providing powerful generative, reasoning, and interaction abilities, which surpass traditional rule-based approaches. This survey presents a structured overview of LLM-based ITS and analyzes how these models transform classical system components and architectures. We first review the foundational concepts of traditional ITS and introduce the functional roles of the main components, followed by LLM-based techniques and related datasets for realizing each of these components. Furthermore, we examine the key application domains and conclude the survey by outlining future research directions.Natural Language ProcessingApplications -
#SV160
An XAI View on Explainable ASP: Methods, Systems, and Perspectives
Answer Set Programming (ASP) is a popular declarative reasoning and problem solving approach in symbolic AI. Its rule-based formalism makes it inherently attractive for explainable and interpretable reasoning, which is gaining increasing importance with the surge of Explainable AI (XAI). A number of explanation approaches and tools for ASP have been developed, which often tackle specific explanatory settings and may not cover all scenarios that ASP users might encounter. In this survey, we provide, guided by an XAI perspective, an overview of types of ASP explanations in connection with user questions for explanation, and describe their coverage by current theory and tools in ASP. Furthermore, we pinpoint gaps in existing ASP explanation approaches and identify research directions for future work.AI Ethics, Trust, FairnessExplainability and interpretabilityKnowledge Representation and ReasoningLogic programmingKnowledge Representation and ReasoningNon-monotonic reasoning -
#SV161
Accelerating Masked Diffusion Large Language Models: A Survey of Efficient Inference Techniques
Diffusion large language models (dLLMs) offer a theoretical advantage in parallel generation over standard autoregressive models. However, parallel generation alone does not guarantee practical speedups. Realizing this efficiency requires specialized inference mechanisms, such as diffusion-aware caching and reuse. Consequently, as inference efficiency becomes a prerequisite for practical deployment, recent research has actively explored acceleration techniques across algorithms, architectures, and systems. However, rigorous comparisons remain difficult, as end-to-end latency stems from intricate trade-offs between algorithmic, architectural, and system-level factors that are often conflated in existing benchmarks. In this survey, we introduce a unified latency decomposition framework for dLLMs to disentangle these factors and analyze their impact on inference speed in real deployments. Guided by this framework, we categorize acceleration techniques along three axes covering algorithmic innovations, architectural and system optimizations, and inference-time scaling. Finally, we provide guidelines for reproducible benchmarking and highlight open challenges for realizing the full potential of parallel generation.Natural Language ProcessingLanguage modelsNatural Language ProcessingResources and evaluation -
#SV181
Test-Time Adaptation for Graph Learning: A Systematic Survey
Graph distribution shifts between training and test graphs pose severe challenges to the generalization of graph neural networks (GNNs). In real-world deployment, application environments are continuously evolving, while retraining or redesigning GNNs is often costly and impractical. In light of this, test-time adaptation on graphs, which aims to dynamically adapt well-trained GNNs or adjust test graphs to improve inference performance, has attracted growing attention as a practical solution. In this survey, we provide a comprehensive review of test-time adaptation on graphs, an emerging yet underexplored research direction. We identify two fundamental challenges: (1) Data-level: complex graph distribution shifts; and (2) Model-level: limited test-time learning information. Upon this, we present a systematic taxonomy of existing methods into (a) model-centric, (b) data-centric, and (c) hybrid methods, followed by a summary of representative applications, benchmarks, and open opportunities. We aim to bridge the gap between laboratory GNN development and real-world deployment via test-time adaptation.Data MiningMining graphsData MiningNetworks -
#SV185
Learning PDE Solvers with Physics and Data: A Unifying View of Physics-Informed Neural Networks and Neural Operators
Partial differential equations (PDEs) are central to scientific modeling. Modern workflows increasingly rely on learning-based components to support model reuse, inference, and integration across large computational processes. Despite the emergence of various physics-aware data-driven approaches, the field still lacks a unified perspective to uncover their relationships, limitations, and appropriate roles in scientific workflows. To this end, we propose a unifying perspective that places two dominant paradigms, Physics-Informed Neural Networks (PINNs) and Neural Operators (NOs), within a shared design space. We organize existing methods along three fundamental dimensions: what is learned, how physical structures are integrated into the learning process, and how the computational load is amortized across problem instances. In this way, many practical challenges can be best understood as consequences of these structural properties of learning PDEs. By analyzing recent advances through this unifying view, our survey aims to facilitate the development of reliable learning-based PDE solvers and to catalyze a synthesis of physics and data.Machine LearningKnowledge-aided learning -
#SV195
Harnessing Multiple Large Language Models: A Survey on LLM Ensemble
LLM Ensemble, which involves the comprehensive use of multiple large language models (LLMs), each aimed at handling user queries during downstream inference, to benefit from their individual strengths, has gained substantial attention recently. The widespread availability of LLMs, coupled with their varying strengths and out-of-the-box usability, has profoundly advanced the field of LLM Ensemble. This paper presents the first systematic review of recent developments in LLM Ensemble. First, we introduce our taxonomy of LLM Ensemble and discuss several related research problems. Then, we provide a more in-depth classification of the methods under the broad categories of "ensemble-before-inference", "ensemble-during-inference", and "ensemble-after-inference", and review relevant methods. Finally, we introduce related benchmarks and applications, summarize existing studies, and suggest several future research directions.Natural Language ProcessingApplicationsMachine LearningEnsemble methods -
#SV208
Machine Learning Methods for Studying Latent Neural Activity Dynamics
Recent developments in brain recording are driving a demand for machine learning tools capable of decoding the latent structure of large populations of neurons. In this paper, we provide a comprehensive survey that outlines the trajectory of Latent Variable Models (LVMs) from early state-space models to more recent deep generative models. We organize the literature into three closely related domains: (1) Single-Region Latent Dynamics, which ranges from models such as linear dynamical systems to more complex dynamics represented by Recurrent Neural Networks (RNNs) and Neural Ordinary Differential Equations (ODEs); (2) Multi-Region Communication, which employs probabilistic as well as subspace methods to study how information is transferred across different brain areas considering synaptic propagation delays and network connectivity; and (3) Behavior-Aligned Modeling, which seeks to disentangle neural activity related to task performance from other internal states via supervised or contrastive learning. Finally, we conclude and discuss benchmarks, evaluation criteria, and open challenges, such as the ability to identify causal links or directionality of communication, to facilitate future research for bridging interpretable brain dynamics with reliable neural decoding.Multidisciplinary Topics and ApplicationsLife sciencesMultidisciplinary Topics and ApplicationsComputational sustainabilityMultidisciplinary Topics and ApplicationsBioinformatics -
#SV220
A Survey on Actionable Interpretability in Large Language Models
Large Language Models (LLMs) have become central to modern AI, with interpretability serving as a critical means of investigating the opaque and highly nonlinear mechanisms encoded within billions of parameters and ensuring trustworthy deployment. However, descriptive interpretability approaches for LLMs remain largely post-hoc, illuminating model behavior without providing the actionable leverage needed to influence or adapt model behavior, thereby limiting their practical utility. Recent work has therefore reframed interpretability as an actionable paradigm, shifting the focus from explanation alone toward methods that connect internal mechanisms to model refinement. This survey reviews LLM interpretability through the lens of actionability, presenting a taxonomy of attributional, concept-based, and mechanistic approaches, along with emerging methods tailored to vision–language models (VLMs). We further examine how interpretability supports downstream objectives such as hallucination mitigation, model editing, fairness, and safety. By positioning interpretability as a pathway to better-guided LLM design and practice, this survey outlines key challenges and future directions toward trustworthy and controllable foundation models.AI Ethics, Trust, FairnessExplainability and interpretabilityAI Ethics, Trust, FairnessFairness and diversityAI Ethics, Trust, FairnessTrustworthy AI -
#SV226
Deep Learning and Foundation Models for Weather Prediction: A Survey
Numerical weather prediction (NWP) models remain the cornerstone of atmospheric sciences. Yet, deep learning (DL) is challenging this paradigm by its ability to capture intricate spatio-temporal patterns and deliver ultra-fast predictions. Analogous to the foundation models (e.g., ChatGPT) in natural language processing, foundation models in the weather/climate domain have also been developed. This paper reviews DL and foundation models for weather prediction by highlighting their strengths and limitations. In particular, we carefully examine them from the perspective of their training paradigms: deterministic predictive learning, probabilistic generative learning, and pre-training & fine-tuning. For each paradigm, we summarize the underlying model architectures, training methods, and respective features. To facilitate further study, we provide a curated repository featuring categorized papers, open-source code, and benchmark datasets. Finally, we discuss and suggest potential research directions across new tasks and models in weather data storage and management, and operational deployment, further inspiring innovations in this rapidly evolving field. GitHub: https://github.com/JimengShi/DL-Foundation-Models-Weather.Machine LearningApplicationsMultidisciplinary Topics and ApplicationsEnergy, environment and sustainabilityMultidisciplinary Topics and ApplicationsLife sciencesMultidisciplinary Topics and ApplicationsOther -
#SV238
A Resource-Aware Taxonomy of AI Bias Mitigation Techniques
The literature on AI fairness has grown rapidly, proposing a large number of bias mitigation techniques that are commonly organized into pre-, in-, and post-processing methods. This pipeline-centric view offers an operational, lifecycle-based perspective on where mitigation can be applied. In deployment settings, however, practitioners also face an additional question: whether a mitigation family is applicable given the resources and access rights available in a concrete system.
In this survey, we use resources broadly to denote data access/control, training capability, and deployment-time interface/decision control.
Accordingly, we introduce a resource-aware taxonomy that complements existing taxonomies by classifying AI bias mitigation methods according to the conditions that make them practically implementable. We use this taxonomy to structure and reinterpret existing literature on the topic, highlighting which mitigation families remain feasible under resource constraints.AI Ethics, Trust, FairnessEthical, legal and societal issuesAI Ethics, Trust, FairnessTrustworthy AIAI Ethics, Trust, FairnessBiasAI Ethics, Trust, FairnessFairness and diversity -
#SV241
Graph4LLM: A Systematic Survey of Graph-Enhanced Large Language Models
Large Language Models (LLMs) excel in natural language processing (NLP) tasks. However, they suffer from inherent limitations due to their sequence-based nature, such as structural information loss and factual unreliability. Graphs, with the ability to explicitly model entities and relations, offer an effective way to address these shortcomings. To systematically synthesize the emerging research on graph-enhanced LLMs, this survey, Graph4LLM, examines how these methods integrate graphs into various stages of the LLM pipeline, including the input, model, and output phases. For each phase, we provide a detailed review of the key methods and techniques. We also introduce a wide range of application scenarios where Graph4LLM methods demonstrate significant potential. Finally, we outline the challenges and future research directions for developing more efficient and interpretable solutions.Natural Language ProcessingLanguage modelsData MiningMining graphs -
#SV245
A Survey on Quantitative Possibility Theory in Artificial Intelligence. A Convenient Uncertainty and Preference Model
Quantitative (or numerical) possibility theory offers a simple yet very expressive setting for handling higher-order uncertainty and, in particular, imprecise probabilities. The paper surveys the basic ideas and notions underlying numerical possibility theory, its relation to the other uncertainty settings, and its use in AI-related issues. Numerical possibility theory appears well suited to coping with imperfect statistical information, especially non-Bayesian statistics relying on likelihood functions and confidence intervals. Quantitative possibility theory can be used in inference, machine learning, tracking and information fusion, and finally preference modeling.Uncertainty in AINonprobabilistic modelsUncertainty in AIGraphical modelsUncertainty in AIUncertainty representations -
#SV252
A Survey on Active Feature Acquisition Strategies
Active feature acquisition (AFA) studies how a predictive system can sequentially choose which feature values to obtain for each instance to balance predictive accuracy against feature acquisition cost (financial, time, invasiveness, or privacy). This survey provides the first unified treatment of modern AFA through an explicit MDP and POMDP formulation, showing that most existing methods can be understood as different approximations of the same underlying sequential decision problem. The survey proposes an up-to-date taxonomy organizing AFA into three families: (i) embedded cost-aware predictors (notably cost-sensitive decision trees and ensembles), (ii) model-based methods that plan using learned probabilistic components, and (iii) model-free or hybrid methods that learn policies from simulated acquisition episodes. We hope this POMDP-centric view both clarifies existing work and motivates new AFA methods that more directly build on the mature literature on POMDP planning and approximation. It concludes by outlining open challenges for achieving robust cost–accuracy trade-offs in practice, including reliable evaluation under realistic missingness and logging, computational constraints, and deployment requirements such as robustness and interpretability.Machine LearningActive learningMachine LearningCost-sensitive learningMachine LearningFeature extraction, selection and dimensionality reductionMachine LearningPartially observable reinforcement learning and POMDPsUncertainty in AISequential decision making -
#SV253
Sparsity in Federated Learning: A Survey
Conventional Federated Learning (FL) pipelines focus on the collaborative training of a global dense model across client devices. Sparsity has been increasingly adopted in FL, during or after local optimization, for a range of objectives, including reducing communication and computation costs, supporting unlearning, enhancing privacy guarantees, and improving local personalization. In this survey, we introduce a novel taxonomy of sparse FL methods that systematically organizes the existing literature according to their core objectives and methodological choices. Using this taxonomy, we analyze and categorize prior work, highlighting the underlying intuitions, technical mechanisms, benefits, and limitations of each class of approaches. Finally, we identify open challenges, expose research gaps, and extract guidance to help practitioners understand and adopt sparsity mechanisms in FL.Machine LearningFederated learningMachine LearningLearning sparse models -
#SV263
A Comprehensive Survey of Interaction Techniques in 3D Scene Generation
The rapid evolution of 3D scene generation has revolutionized content creation across domains such as gaming, film production, and architectural visualization. Within this landscape, interaction techniques serve as the pivotal bridge connecting user intent with generative models, enabling precise control, real-time feedback, and personalized customization of complex 3D scenes. Existing literature reviews predominantly focus on general generative paradigms or are limited to specific subdomains such as single-object modeling, often overlooking the systematic classification of interaction mechanisms. To bridge this gap, this paper presents a comprehensive survey of interaction techniques in 3D scene generation. We propose a unified taxonomy that categorizes existing methods into three primary paradigms: Interactive Generation, Interactive Editing, and Embodied Interaction. For each category, we analyze representative methods in terms of controllability, interaction granularity, and physical consistency, and discuss their advantages and limitations. We further summarize commonly used datasets and evaluation protocols for interactive 3D scene generation. Finally, we discuss future directions toward more physically grounded, multi-modal, and human-centered interactive 3D scene generation systems.Computer Vision3D computer visionMachine LearningGenerative modelsAgent-based and Multi-agent SystemsHuman-agent interaction -
#SV270
Implicit Identity Technologies for LLMs: Fingerprinting and Watermarking Across Datasets, Models, and Generated Content
Large language models (LLMs) are substantial investments and increasingly deployed in high-stakes domains, making it critical to protect LLM-related assets and to trace their provenance. Identity technologies such as fingerprinting and watermarking address these needs by enabling ownership verification and attribution, and have rapidly emerged as an active research focus. However, as the field remains at an early stage, existing techniques lack a systematic organization, leading to two key challenges, terminological confusion and isolated research lines, that have hindered the development of this research field. To this end, we present a comprehensive review of LLM identity techniques, focusing on fingerprinting and watermarking across the LLM lifecycle, including datasets, models, and generated content. We make three primary contributions. First, we introduce implicit identity as a unifying abstraction and distinguish fingerprinting from watermarking. Second, we propose a lifecycle-based taxonomy that organises techniques by asset type and verification role, aligning each with asset protection or provenance. Third, we establish an evaluation framework around three objectives (identifiability, robustness, and deployability) and summarise representative metrics under realistic access and transformation regimes, providing a common basis for comparison. Together, these contributions unify and structure the landscape of LLM identity techniques, clarify terminology, and highlight directions toward more secure deployment.AI Ethics, Trust, FairnessSafety and robustnessAI Ethics, Trust, FairnessAI and law, governance, regulationAI Ethics, Trust, FairnessEthical, legal and societal issues -
#SV272
LLM-Based Agents on the Edge: A Survey of Privacy, Scalability, Heterogeneity, and Autonomy
Large language model (LLM)–based agents are increasingly being deployed beyond centralized cloud environments and toward the edge of the network, where they operate closer to data sources. This transition facilitates lower latency and enhances contextual awareness, privacy, and responsiveness, but it also introduces challenges that differ from traditional cloud-based agent deployments. This survey provides a systematic overview of LLM-based edge agents with a particular focus on four critical dimensions: privacy, scalability, heterogeneity, and autonomy. To facilitate structured analysis, we introduce a novel taxonomy along four axes: deployment, functional role, interaction, and adaptation. Based on our taxonomy, we analyze the challenges LLM-based agents face on the edge and discuss design solutions that can help mitigate possible issues. We further analyze the degree to which existing LLM-based edge agent frameworks achieve privacy, scalability, heterogeneity, and autonomy.Agent-based and Multi-agent SystemsAgent communicationAgent-based and Multi-agent SystemsCoordination and cooperationAI Ethics, Trust, FairnessTrustworthy AIKnowledge Representation and ReasoningLearning and reasoning -
#SV273
Knowledge-Guided 3D CT Generation: A Conditioning-Centric Taxonomy
Controllable generation guided by external knowledge is a key requirement in modern generative deep learning applications, enabling the synthesis of samples with explicit constraints on semantic content, structural properties, and variability. In 3D Computed Tomography (CT), such control is essential for clinical applications, including data augmentation, privacy-preserving data sharing, and the simulation of specific anatomical or pathological scenarios. While research on conditional 3D CT generation has expanded rapidly, the diversity of existing approaches makes systematic comparison difficult and obscures fundamental design choices.
In this survey, we propose a conditioning-centric taxonomy that organizes the literature along three orthogonal dimensions: the type of external knowledge (K), the knowledge integration paradigm (I), and the generative architecture (A). This factorization defines an explicit design space (K x I x A) that provides a unified perspective on prior work. Using this framework, we systematize existing methods, identify dominant trends and recurring design patterns, and highlight underexplored regions of the design space that point toward promising directions for future research.Computer Vision3D computer visionComputer VisionBiomedical image analysisComputer VisionMultimodal learningMachine LearningDeep learning architecturesMachine LearningGenerative models
