IJCAI-ECAI 2026 Accepted Papers · Special Track on AI and Health
Presentation format
Every accepted paper is presented in two formats: an oral talk — which must be delivered in person in Bremen by one of the authors — and a poster during a dedicated poster session.
#AI4H7
When Vision-Language Models Meet Fetal Cardiac Ultrasound: Dual-Level Contrastive Learning for Out-of-Distribution Detection
Recent advances in vision-language models (VLMs) have shown remarkable performance in medical image classification tasks. However, applying VLMs to fetal cardiac ultrasound (FCU) remains challenging due to compound distribution shifts, including covariate shifts caused by cross-center heterogeneity and semantic shifts arising from clinically non-standard views. To address this issue, we propose Dual-Level Contrastive Learning (DLCL), which, to the best of our knowledge, is the first prompt-based VLM framework for out-of-distribution (OOD) detection in FCU. DLCL explicitly shapes the vision-language representation space through complementary local-level and global-level contrastive objectives. Specifically, local contrastive learning aligns instance-level features to mitigate covariate shifts, whereas global contrastive learning regularizes global prototypes to address semantic shifts. We conduct extensive experiments on a private multi-center FCU dataset and the public ISIC-OOD dataset to validate the proposed approach. On the challenging FCU task, DLCL achieves an AUROC of 89.61% and a harmonic mean of 80.35%, significantly outperforming state-of-the-art methods.
Keywords: Multimodal data; Medical imaging
#AI4H15
MTP-DDA: Enhanced Drug-Disease Associations Prediction via Multi-task Learning with Multi-view Graph Convolutional Networks and Contrastive Learning
Predicting drug-disease associations (DDAs) plays a crucial role in drug development and disease treatment. However, existing research predominantly focuses on the single task of DDA prediction, often overlooking the intricate relationships among different tasks that could further improve DDA prediction performance. To address this limitation, we propose MTP-DDA, a multi-task prediction framework capable of simultaneously predicting drug-disease, drug-protein, and disease-protein associations. The framework constructs three distinct graphs to reflect different relationships between biological entities. Based on these graphs, two sub-views and one main view are then constructed. For each sub-view, a corruption strategy is adopted to generate a corrupted view, a Graph Convolutional Network (GCN) is employed to extract features from both the original view and its corrupted version, and contrastive learning is applied to enhance the feature representations. For the main view, GCN and Node2Vec are used to extract low-order and high-order node features, respectively, and an attention mechanism is used for feature fusion. Finally, the node features from the three views are integrated, and a dot product is applied to the node features of each association pair to derive its association score, enabling multi-task association prediction. Under 10-fold cross-validation, the proposed framework outperforms current methods on public datasets, demonstrating its effectiveness and robustness.
Keywords: Drug discovery; Medical knowledge representation; Self-supervised learning; Health data mining; Precision medicine
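The final scoring step of MTP-DDA (integrate the three view embeddings, then take a dot product per association pair) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: "integration" is assumed to be concatenation, the sigmoid that maps the dot product into (0, 1) is an added assumption, and the function names and toy embeddings are hypothetical.

```python
import math


def fuse_views(view_embeddings: list[list[float]]) -> list[float]:
    """Integrate per-view node embeddings by concatenation (assumed fusion rule)."""
    fused: list[float] = []
    for emb in view_embeddings:
        fused.extend(emb)
    return fused


def association_score(u: list[float], v: list[float]) -> float:
    """Dot product of two fused node embeddings, squashed by a sigmoid (assumption)."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    logit = sum(a * b for a, b in zip(u, v))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical 2-dimensional embeddings from the two sub-views and the main view.
drug = fuse_views([[0.2, 0.1], [0.0, 0.3], [0.5, 0.5]])
disease = fuse_views([[0.4, 0.0], [0.1, 0.2], [0.6, 0.3]])
print(round(association_score(drug, disease), 4))
```

Because the same scoring rule applies to any pair of node embeddings, drug-protein and disease-protein pairs could be scored the same way, which is what makes the shared representation usable for all three tasks.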
#AI4H26
D²G-TO: Task-aware and OOD-guided Discrete Graph Diffusion for Robust CNS Drug Discovery
Central nervous system (CNS) drug discovery is constrained by an immense and sparse chemical search space. Meanwhile, molecules that simultaneously achieve brain penetration, target efficacy, and synthesizability are extremely scarce. However, existing generative models rarely couple rigorous multi-objective control with robustness to distribution shift, limiting their reliability in realistic CNS design. We introduce D²G-TO, a task-aware and out-of-distribution (OOD)-guided discrete graph diffusion framework that unifies multi-pharmacological properties with structural distribution guidance. A novel Structural Similarity Guidance mechanism steers generation toward in-distribution regions while repelling OOD modes, maintaining structural distributional consistency in realistic scenarios. Across the BBBP, BACE, and QM9 benchmarks, D²G-TO achieves strong results on validity, diversity, and other metrics. In an Alzheimer's disease case study, we subject the generated molecules to cross-property pharmacological prediction and systematic ADMET profiling, followed by structure-based molecular docking against BACE-1 to assess binding-mode plausibility. D²G-TO identifies candidates that jointly satisfy blood–brain barrier permeability, β-site amyloid precursor protein cleaving enzyme 1 inhibition, and synthetic accessibility. Thus, D²G-TO has the potential to serve as an efficient in silico engine for early-stage CNS drug design. The code is available at https://github.com/zhaix922/DDG_TO.
Keywords: Health data mining; Drug discovery; Explainable AI
#AI4H51
GroupMIL: Semantic Group Based Multiple Instance Learning for Whole Slide Image Analysing
Whole Slide Image (WSI) analysis faces challenges due to gigapixel resolutions and slide-level weak supervision. Multiple Instance Learning (MIL) serves as a pivotal method for this task. However, existing MIL frameworks often fail to exploit the inherent redundancy of tissue patterns or the semantic coherence among similar patches within a WSI. We propose GroupMIL, a novel framework that introduces a differentiable grouping mechanism into the MIL framework. This approach enables the automatic emergence of semantic segments using only slide-level labels. Specifically, we introduce a multi-stage grouping block and a hierarchical aggregator, which progressively fuse features within and across groups to construct a robust slide-level representation. Extensive experiments on multiple public datasets across cancer subtyping and survival prediction tasks demonstrate that GroupMIL consistently surpasses state-of-the-art methods.
Keywords: Medical diagnosis; Health data mining; Medical imaging
#AI4H56
sc2Flow: Mitigating Mean Prediction Bias in Single-Cell Perturbation with Dual-Stage Flow Matching
Predicting single-cell gene responses to chemical perturbations is vital for personalized therapy, yet existing deep learning models face significant hurdles. Standard regression-based approaches suffer from "mean prediction bias," failing to capture cellular heterogeneity, while current Flow Matching (FM) methods struggle with the extreme sparsity of single-cell data and the dimensionality mismatches inherent in transformer-based generation. To address these challenges, we introduce sc2Flow, a dual-stage framework that unifies discrete and continuous flow matching. sc2Flow first predicts the binary mask of expressed genes and subsequently models their quantitative levels using a scalable Transformer. This decoupling effectively resolves dimensionality conflicts and eliminates parameter redundancy. Extensive benchmarks on Sci-Plex3 and other datasets demonstrate that sc2Flow significantly outperforms state-of-the-art baselines on distribution matching, successfully mitigating mean bias to preserve critical biological heterogeneity.
Keywords: Genomic data analysis; Drug discovery
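The continuous stage of sc2Flow can be illustrated with the standard conditional flow-matching objective on a straight-line path, x_t = (1 - t) x_0 + t x_1 with target velocity x_1 - x_0, restricted to the genes that the discrete stage marks as expressed. This is a hedged sketch: the linear path, the squared-error loss, and treating expression > 0 as the binary mask are assumptions for illustration, not details confirmed by the abstract; in the real model the predicted velocity would come from the Transformer.

```python
def expression_mask(x1: list[float]) -> list[int]:
    """Binary mask of expressed genes (assumed rule: expression > 0)."""
    return [1 if x > 0 else 0 for x in x1]


def interpolate(x0: list[float], x1: list[float], t: float) -> list[float]:
    """Point on the straight-line path from source sample x0 to data x1 at time t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def masked_flow_matching_loss(pred_velocity, x0, x1, mask) -> float:
    """Mean squared error to the target velocity x1 - x0, over masked (expressed) genes only."""
    errors = [
        (p - (b - a)) ** 2
        for p, a, b, m in zip(pred_velocity, x0, x1, mask)
        if m
    ]
    return sum(errors) / len(errors) if errors else 0.0


# Toy example: three genes, the second one not expressed in the target cell.
x0 = [0.0, 0.0, 0.0]
x1 = [2.0, 0.0, 1.0]
mask = expression_mask(x1)
loss = masked_flow_matching_loss([2.0, 5.0, 0.0], x0, x1, mask)
```

Restricting the continuous loss to the masked genes is what lets the discrete stage absorb the zeros, so the velocity model is never trained to reproduce sparsity it cannot represent well.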
#AI4H58
PhysioGMC: Generalizable Multi-modal Coordination for Physiological Signals
Physiological signals are widely used for health assessment in clinical and daily-life settings. Established physiological signals collected for inpatient and clinical use are often impractical for patients' daily home use due to their complexity and resource demands. In contrast, wearable signals enable continuous monitoring in everyday life, but many have limited reliability and are not widely understood or accepted in clinical practice. To leverage the complementary strengths of clinical and wearable physiological signals, we propose PhysioGMC, a Generalizable Multi-modal Coordination framework for Physiological signals that explicitly accounts for their strong inter-subject variability. PhysioGMC incorporates both clinical and wearable modalities into the training process to improve cross-subject performance when only a single wearable modality is available at deployment. The framework introduces a cross-modal contrastive learning module comprising two contrastive losses to jointly learn label-relevant, subject-agnostic representations across modalities. The self-supervised contrastive loss aligns latent features across modalities, while the supervised contrastive loss encourages learning label-discriminative features that are invariant to subject identity. Experiments on cardiovascular health monitoring and sleep staging tasks demonstrate that PhysioGMC consistently outperforms existing methods, achieving superior cross-subject performance at test time using only wearable modalities, such as photoplethysmography (PPG).
Keywords: Multimodal data; Medical knowledge representation; Remote monitoring
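PhysioGMC's self-supervised cross-modal alignment loss can be sketched with a generic InfoNCE objective, in which a subject's wearable and clinical embeddings from the same window form the positive pair and the other pairs in the batch serve as negatives. This is an assumption-laden illustration rather than PhysioGMC's exact loss: the pairing scheme, cosine similarity, and the temperature of 0.1 are assumptions, and the supervised, subject-invariant contrastive term is omitted.

```python
import math


def _normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_modal_infonce(z_wearable, z_clinical, temperature: float = 0.1) -> float:
    """InfoNCE in which sample i's wearable embedding should match sample i's clinical embedding.

    The clinical embeddings of the other samples in the batch act as negatives.
    """
    za = [_normalize(z) for z in z_wearable]
    zb = [_normalize(z) for z in z_clinical]
    total = 0.0
    for i, anchor in enumerate(za):
        logits = [sum(a * b for a, b in zip(anchor, other)) / temperature for other in zb]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(za)


# Toy batch of two windows: aligned pairs versus deliberately swapped pairs.
aligned = cross_modal_infonce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = cross_modal_infonce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is near zero when each wearable embedding points at its own clinical counterpart and grows when it points at another window's, which is the alignment pressure the abstract describes.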
#AI4H63
LLM as Clinical Graph Structure Refiner: Enhancing Representation Learning in EEG Seizure Diagnosis
Electroencephalogram (EEG) signals are vital for automated seizure detection, but their inherent noise makes robust representation learning challenging. Existing graph construction methods, whether correlation-based or learning-based, often generate redundant or irrelevant edges due to the noisy nature of EEG data. This significantly impairs the quality of graph representations and limits downstream task performance. Motivated by the remarkable reasoning and contextual understanding capabilities of large language models (LLMs), we explore the idea of using LLMs as graph edge refiners. Specifically, we propose a two-stage framework: we first verify that LLM-based edge refinement can effectively identify and remove redundant connections, leading to significant improvements in seizure detection accuracy and more meaningful graph structures. Building on this insight, we further develop a robust solution in which the initial graph is constructed using a transformer-based edge predictor and a multilayer perceptron, which assign probability scores to potential edges and apply a threshold to determine their existence. The LLM then acts as an edge set refiner, making informed decisions based on both textual and statistical features of node pairs to validate the remaining connections. Extensive experiments on the TUSZ dataset demonstrate that our LLM-refined graph learning framework not only enhances task performance but also yields cleaner and more interpretable graph representations.
Keywords: Clinical decision support system; Health data mining; LLM in medicine
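The two-step graph construction in this work (threshold the edge predictor's probabilities, then let a refiner validate the surviving edges) can be sketched as follows. The `keep_edge` callback is a hypothetical stand-in for the LLM decision, which in the paper uses textual and statistical features of each node pair; the threshold of 0.5 and the channel indices are illustrative assumptions.

```python
Edge = tuple[int, int]


def threshold_edges(edge_probs: dict[Edge, float], threshold: float = 0.5) -> set[Edge]:
    """Keep candidate channel pairs whose predicted edge probability meets the threshold."""
    return {edge for edge, p in edge_probs.items() if p >= threshold}


def refine_edges(candidates: set[Edge], keep_edge) -> set[Edge]:
    """Second pass: a refiner callback (standing in for the LLM) validates each remaining edge."""
    return {edge for edge in candidates if keep_edge(edge)}


# Hypothetical probabilities for three EEG channel pairs.
probs = {(0, 1): 0.9, (0, 2): 0.4, (1, 2): 0.5}
initial = threshold_edges(probs)
refined = refine_edges(initial, keep_edge=lambda e: e != (1, 2))
```

Separating the cheap, learned threshold from the expensive refiner keeps the number of LLM calls proportional to the surviving edges rather than to all channel pairs.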
#AI4H68
CardioBench: Do Echocardiography Foundation Models Generalize Beyond the Lab?
Foundation models are reshaping medical imaging, yet their application in echocardiography remains limited, hindered by a heavy reliance on private datasets that prevents reproducible comparison. Echocardiography poses unique challenges, including noisy acquisitions, high frame redundancy, and a scarcity of diverse public datasets. To address this, we introduce CardioBench, a comprehensive benchmark for echocardiography foundation models. Specifically, CardioBench unifies eight publicly available datasets into a standardized suite spanning four regression and five classification tasks, covering functional, structural, diagnostic, and view recognition endpoints. Leveraging this framework, we evaluate several leading foundation models, including cardiac-specific, biomedical, and general-purpose encoders, under consistent zero-shot, probing, and alignment protocols. Our analysis reveals that while general-purpose encoders transfer well and often close the gap with probing, they struggle significantly with fine-grained distinctions such as view classification and subtle pathology recognition. Results indicate that models capturing temporal cardiac dynamics perform best on functional tasks, while retrieval-based approaches generalize more consistently across datasets. By releasing preprocessing, splits, and public evaluation pipelines, CardioBench establishes a reproducible reference point to guide the architectural design of future echocardiography and possibly other medical imaging foundation models.
Keywords: Medical diagnosis; Medical imaging
#AI4H71
TvaraNet: A Lightweight Mamba Neural Network for Real-Time Medical Image Segmentation
Medical image segmentation models based on Vision Mamba architectures have recently shown strong performance with improved efficiency over convolutional and transformer-based models. However, existing lightweight and ultralight variants often suffer from boundary degradation and inconsistent shape prediction, thereby limiting their clinical reliability. We propose TvaraNet, an extremely lightweight segmentation network designed to preserve boundary fidelity under strict efficiency constraints. TvaraNet contains only 0.037M parameters and requires only 0.060 GFLOPs, enabling deployment in resource-constrained environments.
The architecture introduces two core lightweight modules. Parallel Skip Mamba Averaging (PASMA) enhances long-range dependency modeling by injecting globally aggregated channel context into Mamba blocks. Dilated Asymmetric Spatial Mixer Attention (DASMA) improves boundary-aware feature refinement in skip connections through efficient multi-scale spatial modulation. In addition, training-time regularization strategies, including an Adaptive Gated Gradient Reversal Layer (AGGRL), an adaptive Singular Value Decomposition (SVD) loss, and a Boundary-weighted Cross-Entropy loss (BWCE), are employed to suppress redundant features and emphasize object boundaries without adding inference overhead. Experiments on ISIC-17, ISIC-18, and a spinal segmentation dataset demonstrate that TvaraNet consistently outperforms existing lightweight models and achieves competitive or superior boundary-aware performance compared to heavier architectures. Notably, it improves mean IoU by +1.50, +1.42, and +3.35 over Ultralight VMUNet on the respective datasets, establishing TvaraNet as a practical solution for boundary-consistent medical image segmentation on edge and low-resource platforms. Code is available at: https://github.com/MindsLab-GitHub/TvaraNet
Keywords: Medical imaging; Medical diagnosis
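A boundary-weighted cross-entropy for binary segmentation can be sketched as below. The abstract does not specify how BWCE weights pixels, so this version is an assumption: pixels whose 4-neighbourhood contains a different label are treated as boundary pixels and up-weighted by a hypothetical factor `boundary_weight`, and the loss is normalized by the total weight.

```python
import math


def boundary_mask(labels: list[list[int]]) -> list[list[bool]]:
    """Mark pixels that have at least one 4-neighbour with a different label."""
    h, w = len(labels), len(labels[0])
    mask = [[False] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w and labels[ni][nj] != labels[i][j]:
                    mask[i][j] = True
                    break
    return mask


def boundary_weighted_bce(probs, labels, boundary_weight: float = 5.0, eps: float = 1e-7) -> float:
    """Weighted binary cross-entropy; boundary pixels count boundary_weight times as much."""
    bmask = boundary_mask(labels)
    total, norm = 0.0, 0.0
    for i, row in enumerate(probs):
        for j, p in enumerate(row):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            y = labels[i][j]
            ce = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            weight = boundary_weight if bmask[i][j] else 1.0
            total += weight * ce
            norm += weight
    return total / norm
```

With this weighting, the same probability error costs more on the object outline than in the interior, which matches the stated goal of emphasizing object boundaries during training; since it is a loss term only, it adds nothing at inference time.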
#AI4H75
BioDisco: Multi-Agent Hypothesis Generation with Dual-Mode Evidence, Iterative Feedback and Temporal Evaluation
Identifying novel hypotheses is essential to scientific research, yet this process risks being overwhelmed by the sheer volume and complexity of available information. Existing automated methods often struggle to generate novel and evidence-grounded hypotheses, lack robust iterative refinement, and rarely undergo rigorous temporal evaluation of their future discovery potential. To address this, we propose BioDisco, a multi-agent framework that draws upon language model-based reasoning and a dual-mode evidence system (biomedical knowledge graphs and automated literature retrieval) for grounded novelty, integrates an internal scoring and feedback loop for iterative refinement, and validates performance through pioneering temporal and human evaluations, using a Bradley-Terry paired comparison model for statistical assessment. Evaluations suggest improved novelty and significance relative to ablated configurations and a generalist biomedical agent. Designed for flexibility and modularity, BioDisco allows seamless integration of custom language models or knowledge graphs, and can be run with just a few lines of code.
Keywords: Biomedical NLP; LLM in medicine; Medical knowledge representation; Drug discovery
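For readers unfamiliar with the Bradley-Terry model used in BioDisco's evaluation: it assigns each item i a positive strength p_i so that P(i preferred over j) = p_i / (p_i + p_j). The sketch below fits the strengths from a matrix of pairwise win counts with the standard minorization-maximization (MM) update; this fitting choice is an assumption, since the abstract does not state how the model is estimated. The maximum-likelihood estimate exists only when, for every split of the items into two groups, some item in each group beats some item in the other.

```python
def fit_bradley_terry(wins: list[list[float]], iterations: int = 200) -> list[float]:
    """Fit Bradley-Terry strengths with the MM update.

    wins[i][j] is the number of comparisons in which item i was preferred over item j.
    Assumes every item wins at least once (otherwise its strength collapses to 0).
    Strengths are rescaled to have mean 1 after every iteration.
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i and wins[i][j] + wins[j][i] > 0
            )
            updated.append(total_wins / denom if denom > 0 else p[i])
        scale = n / sum(updated)
        p = [x * scale for x in updated]
    return p


def preference_probability(p: list[float], i: int, j: int) -> float:
    """Model probability that item i is preferred over item j."""
    return p[i] / (p[i] + p[j])


# Two hypotheses: the first was preferred in 3 of 4 paired judgements.
strengths = fit_bradley_terry([[0, 3], [1, 0]])
```

For two items the fitted preference probability simply recovers the empirical win rate; with more items, the model pools indirect comparisons, which is what makes it useful for ranking many generated hypotheses from sparse pairwise judgements.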
#AI4H88
DNA-PPG: A Foundation Model for Photoplethysmography via Dual Neighborhood Alignment
Existing physiological foundation models face two limitations: rigid hard-negative sampling indiscriminately repels morphologically similar samples, distorting the natural manifold; and coarse discretization strategies sever the intrinsic continuity of physiological states, inducing precision loss. To address these challenges, we propose DNA-PPG, a novel pre-training framework anchored in Dual Neighborhood Alignment. DNA-PPG integrates a Morphology-Aware Self-Supervised Branch, which uses Time-Frequency Soft Weighting to capture universal signal dynamics shared among diverse subjects, with a Physiological Semantic Alignment Branch, which projects physiological indicators into a continuous semantic space to explicitly embed precise physiological priors into the representation space. We scale the pre-training to 10.7 million PPG segments from over 8,400 subjects to ensure robust generalization. Extensive evaluations on six downstream benchmarks demonstrate that DNA-PPG significantly outperforms state-of-the-art baselines, achieving an 18% reduction in regression error and an 11.5% improvement in classification performance. These results validate DNA-PPG as a robust, universal feature extractor for diverse photoplethysmography applications.
Keywords: Self-supervised learning; Health data mining; Remote monitoring
#AI4H89
GMENet: Generative Mixture of Experts Network for Multi-Center Glioma Diagnosis with Incomplete Imaging Sequences
Contemporary glioma diagnosis integrates molecular features with histopathology to guide clinical decision-making. However, in clinical settings, divergent imaging protocols result in incomplete MRI sequences, which leads to two primary challenges: existing frameworks are forced to discard a large portion of clinical data during training, and their clinical applicability is consequently limited. To address these limitations, we propose GMENet, a Generative Mixture of Experts Network for multi-center glioma diagnosis with incomplete imaging sequences. First, we design a Cross-attention-based Gated Generation Module that synthesizes missing sequence features from available sequences via cross-attention and dynamic gating mechanisms, incorporating a cycle-consistency loss to preserve semantic integrity. Second, we introduce a Dynamically Weighted Experts Fusion Module that performs mixture-of-experts interaction and confidence-aware fusion over original and synthesized dual-sequence features for multi-task prediction. We evaluate GMENet on a multi-center cohort of 1,241 subjects from four in-house datasets and two public repositories. Experiments show that GMENet expands clinically usable training data by 97% relative to complete-sequence-only data. Furthermore, it consistently outperforms state-of-the-art methods trained on complete data, demonstrating improved robustness under cross-center distribution shifts. Code is available at: https://github.com/spf-sd/GMENet.
Keywords: Medical diagnosis; Medical imaging; Precision medicine
#AI4H96
Knowing When Not to Predict: Self Supervised Learning and Abstention for Safer DR Screening
Self-supervised learning (SSL) is now a standard way to pretrain medical image models, but performance is still mostly judged by downstream accuracy. For safety-critical screening tasks such as diabetic retinopathy grading, this is not enough: a model must also know when its predictions are unreliable and defer uncertain cases for clinical review. Unlike prior SSL studies that primarily evaluate downstream accuracy or AUROC, we examine how the length of SSL pretraining influences confidence calibration and selective prediction under confidence-based abstention.
We evaluate multiple SSL checkpoints under a fixed fine-tuning protocol and assess calibrated confidence, coverage, selective accuracy, and selective macro-F1. Across datasets and data regimes, SSL pretraining improves selective prediction compared to training from scratch. However, once accuracy saturates, selective performance can still change markedly across checkpoints, and longer pretraining does not consistently improve reliability. These results underscore the importance of abstention-aware evaluation and suggest that pretraining length should be treated as an important reliability-related design choice rather than only a computational detail. Code is available at https://github.com/29muskaan712/ijcai-knowing-when-not-to-predict.
Keywords: Medical diagnosis; Self-supervised learning; Medical knowledge representation; Medical imaging; Public health
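Confidence-based abstention and the selective metrics named in this abstract can be computed as in the sketch below: the model abstains whenever its confidence falls below a threshold, coverage is the fraction of cases it still answers, and accuracy and macro-F1 are measured only on those cases. Thresholding a single confidence score is an assumption for illustration; the paper's calibration procedure and threshold choice are not reproduced here.

```python
import math


def macro_f1(preds: list[int], labels: list[int]) -> float:
    """Unweighted mean of per-class F1 over every class seen in preds or labels."""
    classes = sorted(set(preds) | set(labels))
    scores = []
    for c in classes:
        tp = sum(p == c and y == c for p, y in zip(preds, labels))
        fp = sum(p == c and y != c for p, y in zip(preds, labels))
        fn = sum(p != c and y == c for p, y in zip(preds, labels))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def selective_metrics(confidences, preds, labels, threshold):
    """Abstain when confidence < threshold; return coverage, selective accuracy, selective macro-F1."""
    kept = [(p, y) for c, p, y in zip(confidences, preds, labels) if c >= threshold]
    coverage = len(kept) / len(labels)
    if not kept:
        return coverage, math.nan, math.nan  # the model abstained on every case
    kept_preds = [p for p, _ in kept]
    kept_labels = [y for _, y in kept]
    accuracy = sum(p == y for p, y in kept) / len(kept)
    return coverage, accuracy, macro_f1(kept_preds, kept_labels)


# Hypothetical DR grades for four images, with the model's confidence in each prediction.
conf = [0.9, 0.8, 0.4, 0.3]
preds = [1, 0, 2, 1]
labels = [1, 0, 1, 1]
```

Sweeping the threshold traces a coverage versus selective-accuracy curve; two checkpoints with the same full-coverage accuracy can still differ along that curve, which is the effect the study reports.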
#AI4H102
SBSDM: A Style-aware Bidirectional Stream Diffusion Model for CT-to-PET Synthesis
CT-to-PET synthesis aims to synthesize PET images from the widely available and lower-cost CT scans to address the high cost and additional radiation exposure associated with PET scanning.
However, CT-to-PET synthesis faces two key challenges due to the sequential correlation of volumetric imaging: preserving smooth transitions between adjacent slices and ensuring style consistency across long-range slices.
To overcome these limitations, we propose the Style-aware Bidirectional Stream Diffusion Model, which ensures both inter-slice continuity and global style consistency with low computational cost.
Specifically, we first utilize a Vector Quantized Variational Autoencoder to encode bidirectional adjacent CT slices into latent codes. A Neighbor Attention module is then introduced to capture transition patterns among these latent codes, ensuring structural continuity. To enhance style consistency, we further design a Prototype Prompting mechanism to construct a feature pool from long-range slices. Global style prototypes are extracted from the feature pool and dynamically integrated into the generation process, guiding the model’s attention toward consistent stylistic features across slices. Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art methods.
Keywords: Multimodal data; Medical imaging
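SBSDM's first step, encoding a slice and its neighbours on both sides into discrete latent codes, can be illustrated with the standard VQ-VAE nearest-codebook lookup. This sketch covers only windowing and quantization: the encoder outputs, the window half-width `k`, and the toy codebook are hypothetical, and the Neighbor Attention and Prototype Prompting components are not modelled.

```python
def neighbor_window(i: int, k: int, num_slices: int) -> list[int]:
    """Indices of slice i and up to k neighbours on each side, clipped to the volume."""
    return [j for j in range(i - k, i + k + 1) if 0 <= j < num_slices]


def quantize(z: list[float], codebook: list[list[float]]) -> int:
    """Index of the codebook vector nearest to z in squared Euclidean distance."""
    def sq_dist(a: list[float], b: list[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda idx: sq_dist(z, codebook[idx]))


# Hypothetical 2-D encoder outputs for a 5-slice volume and a 3-entry codebook.
latents = [[0.1, 0.0], [0.9, 1.2], [4.8, 5.1], [1.1, 0.9], [0.0, 0.2]]
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
codes = [quantize(latents[j], codebook) for j in neighbor_window(2, 1, len(latents))]
```

Working on short code sequences from a local window, rather than on full-resolution slices, is one plausible source of the low computational cost the abstract claims.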
#AI4H106
Prior Knowledge-enhanced Spatio-temporal Epidemic Forecasting
Spatio-temporal epidemic forecasting is critical for public health management, yet existing methods often struggle with insensitivity to weak epidemic signals, over-simplified spatial relations, and unstable parameter estimation. To address these challenges, we propose the Spatio-Temporal priOr-aware Epidemic Predictor (STOEP), a novel hybrid framework that integrates implicit spatio-temporal priors and explicit expert priors. STOEP consists of three key components: (1) Case-aware Adjacency Learning (CAL), which dynamically adjusts mobility-based regional dependencies using historical infection patterns; (2) Space-informed Parameter Estimating (SPE), which employs learnable spatial priors to amplify weak epidemic signals; and (3) Filter-based Mechanistic Forecasting (FMF), which uses an expert-guided adaptive thresholding strategy to regularize epidemic parameters. Extensive experiments on real-world COVID-19 and influenza datasets demonstrate that STOEP outperforms the best baseline by 11.1% in RMSE. The system has been deployed at a provincial CDC in China to facilitate downstream applications.
Keywords: Timeseries prediction; Public health
#AI4H107
DIAM: Adaptive Drug Repositioning with Decoupled Biological Mechanism and Instance-Aware Modulation
Drug repositioning has emerged as an attractive drug development strategy, with deep learning-based computational methods showing great potential in predicting Drug-Disease Associations (DDAs). However, dominant computational paradigms typically rely on Random Negative Sampling (RNS) and static embedding fusion, leading to two fundamental limitations. First, RNS treats unobserved pairs uniformly, resulting in coarse decision boundaries that fail to distinguish true associations from ambiguous candidates. Second, static fusion applies a monolithic combination of heterogeneous features, failing to adapt to the sample-specific dominance of different biological mechanisms. To address these issues, we propose DIAM, which establishes a mechanism-adaptive paradigm by explicitly decoupling structural and molecular signals. Specifically, DIAM introduces a Dual-Stream Biological Mechanism Decoupling module to explicitly construct global structural propagation and local molecular interaction views. Leveraging these views, we design a biological plausibility score to guide hard negative sampling, enforcing finer-grained decision boundaries. Furthermore, an Adaptive Residual Gating (ARG) module is devised to perform instance-aware modulation, dynamically weighting the contributions of the global and local views for each specific pair. Extensive experiments on three benchmark datasets demonstrate that DIAM outperforms seven state-of-the-art methods. A case study on Alzheimer's disease further validates the model's effectiveness in identifying potential candidate drugs for practical application.
Keywords: Health data mining; Medical knowledge representation; Drug discovery
#AI4H110
CDMIQA: A Cross-Domain Perceptual Method and Benchmark Dataset for Medical Image Quality Assessment
Medical image quality assessment (IQA) serves as a critical safeguard for precise clinical diagnosis and treatment. However, existing methods still face challenges arising from data scarcity and heterogeneity across imaging domains, which confine solutions to domain-specific designs and limit their cross-domain generalization ability. In response, we construct a multi-domain and multi-organ dataset comprising 9,105 2D and 3D medical images across three imaging domains and 18 organs, annotated by radiologists. Building upon this, we propose a cross-domain universal medical IQA method termed CDMIQA, which integrates efficient feature extractors with Hierarchical Perceptual Encoding Modules to capture and refine multi-level perceptual features while mitigating interference from noise and artifacts. Notably, the Adaptive Semantic Perception Module is designed to extract semantic features; it enhances adaptability to domain-specific images by dynamically encoding quality variations across different imaging domains and organs. Furthermore, a Top-down Feature Fusion Module progressively aggregates features under the guidance of semantic information, reinforcing the model's feature representation capability. Experimental results on the proposed dataset demonstrate that our method outperforms competing methods and exhibits superior generalization ability across diverse domains. Our code and dataset are available at https://github.com/Leilei-Huang-work/CDMIQA.
Keywords: Medical imaging
#AI4H115
HealthMamba: An Uncertainty-aware Spatiotemporal Graph State Space Model for Effective and Reliable Healthcare Facility Visit Prediction
Healthcare facility visit prediction is essential for optimizing healthcare resource allocation and informing public health policy. Although advanced machine learning methods have been employed to improve prediction performance, existing works usually formulate this task as a time-series forecasting problem without considering the intrinsic spatial dependencies among different types of healthcare facilities, and they fail to provide reliable predictions under abnormal situations such as public emergencies. To advance existing research, we propose HealthMamba, an uncertainty-aware spatiotemporal framework for accurate and reliable healthcare facility visit prediction. HealthMamba comprises three key components: (i) a Unified Spatiotemporal Context Encoder that fuses heterogeneous static and dynamic information, (ii) a novel Graph State Space Model called GraphMamba for hierarchical spatiotemporal modeling, and (iii) a comprehensive uncertainty quantification module integrating three uncertainty quantification mechanisms for reliable prediction. We evaluate HealthMamba on four large-scale real-world datasets from California, New York, Texas, and Florida. Results show that HealthMamba achieves around 6.0% improvement in prediction accuracy and 3.5% improvement in uncertainty quantification over state-of-the-art baselines.
Keywords: Health data mining; Public health
#AI4H116
QA-MoE: Quality-Aware and Stable Multimodal Mixture-of-Experts for Robust Clinical Prediction in Noisy and Missing-Modal Settings
Clinical prediction increasingly relies on multimodal inputs, where reliability and efficiency are crucial for real-world deployment. However, mainstream fusion and MoE gating typically treat all available modalities as uniformly beneficial and allow noisy or weakly informative modalities to perturb routing, leading to instability, routing collapse, and miscalibrated confidence under missingness and shift. We propose QA-MoE, a Quality-Aware and stable multimodal Mixture-of-Experts that decouples reliability estimation from routing to enable robust, sparse, and interpretable fusion. QA-MoE adopts a modular architecture in which each modality is first encoded into a shared embedding space. To handle structurally missing data, it employs a completion pathway that maintains a consistent interface. An Evidential Quality Scorer measures epistemic uncertainty, which then guides a Stability-Enhanced Subset Selector to filter out noisy modalities on the fly. Additionally, a Ternary Expert Aggregation mechanism acts as a specialized branch to stabilize predictions when data missingness is severe. Evaluations on clinical benchmarks (ADNI for Alzheimer’s staging and MIMIC-IV for length-of-stay prediction) demonstrate that QA-MoE outperforms strong multimodal baselines, improving reliability while reducing unnecessary computation. These results indicate that QA-MoE offers a robust solution for multimodal decision support, especially in clinical settings prone to noise and missing data.
Keywords: Medical diagnosis; Multimodal data; Public health
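One common way to turn per-class evidence into an epistemic-uncertainty score is the Dirichlet formulation from evidential deep learning: with non-negative evidence e_k for each of K classes, alpha_k = e_k + 1 and uncertainty u = K / sum_k alpha_k, which equals 1 with no evidence and shrinks as evidence grows. The sketch below assumes this formulation and a fixed threshold for dropping unreliable modalities; the abstract does not confirm either, and the fallback of keeping the single most reliable modality is a hypothetical choice.

```python
def evidential_uncertainty(evidence: list[float]) -> float:
    """Dirichlet uncertainty u = K / sum(alpha), with alpha_k = evidence_k + 1."""
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    alpha = [e + 1.0 for e in evidence]
    return len(alpha) / sum(alpha)


def select_modalities(
    evidence_by_modality: dict[str, list[float]], max_uncertainty: float = 0.5
) -> list[str]:
    """Keep modalities whose uncertainty is at most max_uncertainty (hypothetical rule)."""
    uncertainty = {m: evidential_uncertainty(e) for m, e in evidence_by_modality.items()}
    kept = [m for m, u in uncertainty.items() if u <= max_uncertainty]
    if not kept:
        # Hypothetical fallback: never drop every modality; keep the least uncertain one.
        kept = [min(uncertainty, key=uncertainty.get)]
    return sorted(kept)


# Hypothetical two-class evidence produced by each modality's scorer for one patient.
selected = select_modalities({"mri": [8.0, 0.0], "ehr": [0.5, 0.5], "pet": [3.0, 1.0]})
```

Because the score depends only on each modality's own evidence, it can be computed before routing, which is one way to realize the decoupling of reliability estimation from expert selection that the abstract describes.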
#AI4H132
Uncertainty-Guided Adaptive Conservative Offline Reinforcement Learning for Safer Mechanical Ventilation
Mechanical ventilation (MV) is essential in intensive care units (ICUs), yet conventional protocols lack personalization and risk harmful over- or under-ventilation. Offline reinforcement learning (ORL) enables policy optimization from retrospective clinical data without unsafe online interaction, but existing methods are highly sensitive to distributional shift and out-of-distribution (OOD) actions, limiting their reliability in complex clinical settings. To address these challenges, we propose UBER-CQL (Uncertainty-Balanced Exploration and Robust Conservative Q-Learning), a robust ORL algorithm for safe decision-making under dataset shift. UBER-CQL integrates heteroscedastic Bayesian neural networks with conservative Q-learning to model posterior Q-value uncertainty, which is used to adaptively penalize unreliable high-risk actions while maintaining performance within the data support. We further design numerically stable objectives for conservative Bayesian value estimation. Experiments on in-distribution and OOD subsets of MIMIC-III and eICU demonstrate that UBER-CQL outperforms state-of-the-art ORL and clinician baselines, producing safer and more effective MV strategies.
Keywords: Clinical decision support system; Medical diagnosis; Treatment recommendation; Precision medicine
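For a discretized action space (for example, binned ventilator settings, which is an assumption here), the conservative term that CQL adds is alpha * (logsumexp_a Q(s, a) - Q(s, a_data)): it pushes down Q-values across all actions while pushing up the logged clinician action. The sketch below computes that term for one state and scales alpha by a posterior standard-deviation estimate of Q. The scaling rule `base_alpha * (1 + beta * mean_std)` is a hypothetical stand-in for UBER-CQL's adaptive penalty, which the abstract does not specify.

```python
import math


def logsumexp(values: list[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_penalty(q_values: list[float], data_action: int) -> float:
    """CQL(H) regularizer for one state: logsumexp over actions minus Q of the logged action."""
    return logsumexp(q_values) - q_values[data_action]


def adaptive_cql_penalty(q_values, data_action, q_std, base_alpha=1.0, beta=1.0) -> float:
    """Scale the penalty by posterior Q uncertainty (hypothetical rule, not UBER-CQL's exact form)."""
    mean_std = sum(q_std) / len(q_std)
    alpha = base_alpha * (1.0 + beta * mean_std)
    return alpha * cql_penalty(q_values, data_action)


# Hypothetical Q-values for three ventilator settings; the clinician chose setting 0.
q = [1.0, 3.0, -2.0]
certain = adaptive_cql_penalty(q, 0, q_std=[0.0, 0.0, 0.0])
uncertain = adaptive_cql_penalty(q, 0, q_std=[1.0, 1.0, 1.0])
```

The penalty is always non-negative because logsumexp is at least the maximum Q-value; making its weight grow with uncertainty means the policy is held closer to the data exactly where the value estimates are least trustworthy.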
#AI4H137
BFHD: Bidirectional Feature Harmonization Decomposition for Heterogeneous Clinical Assessments
Clinical assessments are often collected using heterogeneous assessment systems across centers and over time, yielding records that mix different but related sets of measurements. This motivates harmonization beyond total-score linking. We formulate clinical harmonization at the measurement level as a bidirectional recoverability problem: given paired observations from two assessment systems, the goal is to identify which measurements can be reliably translated in both directions within an application-defined tolerance, while separating non-translatable components. We propose Bidirectional Feature Harmonization Decomposition (BFHD), a feasibility-driven framework that enforces bidirectionally coupled translation and uses feature-wise output gating to produce an explicit decomposition in the original measurement space. Experiments on synthetic data and real clinical assessment pairs show that BFHD achieves broader feasible harmonization coverage and improved subset stability compared with baselines.
Clinical decision support system · Multimodal data · Public health
#AI4H143
Two-Fold Patch Perturbation for Efficient Self-Supervised Learning in 3D Medical Imaging
Self-supervised pre-training has become a key paradigm for reducing annotation costs in 3D medical imaging, yet many recent approaches rely on complex objectives or incur substantial computational overhead. We propose a simple and efficient self-supervised pre-training framework for 3D medical images based on a two-fold patch-wise perturbation strategy. The method applies Bernoulli patch masking and discrete rotations, and trains a shared encoder with a three-head objective for reconstruction, perturbation localization, and rotation prediction. This design encourages spatially aware and transferable representations while remaining computationally lightweight. Experiments across diverse segmentation and classification benchmarks, including modality-shift scenarios, demonstrate consistent improvements over general self-supervised baselines and competitive or superior performance compared with recent medical self-supervised methods, while requiring substantially less memory, computation, and training time than state-of-the-art pre-training pipelines.
Self-supervised learning · Medical imaging · Medical diagnosis · Medical knowledge representation
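The two-fold perturbation and its three training targets can be pictured with a small NumPy sketch. The patch size, the probabilities, zero-fill masking, rotating only unmasked patches, and in-plane quarter turns are all assumptions for illustration, not the authors' settings.

```python
import numpy as np

def two_fold_patch_perturbation(volume, patch=8, p_mask=0.3, p_rot=0.3, rng=None):
    """Perturb a 3D volume (D, H, W) on a grid of cubic patches.

    Each patch is masked with probability p_mask (Bernoulli); otherwise it is
    rotated by a random quarter turn in the H-W plane with probability p_rot.
    Returns (perturbed volume, per-patch localization target, per-patch
    rotation label in {0, 1, 2, 3}); the original volume is the
    reconstruction target.
    """
    rng = np.random.default_rng() if rng is None else rng
    d, h, w = volume.shape
    if d % patch or h % patch or w % patch:
        raise ValueError("volume dimensions must be divisible by patch")
    grid = (d // patch, h // patch, w // patch)
    masked = rng.random(grid) < p_mask
    rotated = (rng.random(grid) < p_rot) & ~masked
    rot_label = np.where(rotated, rng.integers(1, 4, size=grid), 0)
    out = volume.astype(float, copy=True)
    for i, j, k in np.ndindex(*grid):
        sl = (slice(i * patch, (i + 1) * patch),
              slice(j * patch, (j + 1) * patch),
              slice(k * patch, (k + 1) * patch))
        if masked[i, j, k]:
            out[sl] = 0.0
        elif rotated[i, j, k]:
            out[sl] = np.rot90(volume[sl], k=int(rot_label[i, j, k]), axes=(1, 2))
    localization = (masked | rotated).astype(np.int64)
    return out, localization, rot_label
```

Cubic patches keep their shape under `np.rot90`, so each perturbed patch can be written back in place and the three heads receive aligned targets.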
#AI4H162
One-Shot Federated Class-Incremental Learning for Medical Imaging via Variational Feature Transfer
Federated learning (FL) enables privacy-preserving collaboration for medical image analysis across decentralized institutions but faces major challenges from non-IID data distributions, high communication overhead in multi-round protocols, and catastrophic forgetting when models must adapt to sequentially arriving tasks. These issues are particularly critical in healthcare, where repeated client participation is often infeasible. We address this setting with a novel class-incremental continual learning (CL) model for a one-shot FL paradigm, in which each task introduces new classes, clients observe heterogeneous and evolving class distributions, and communication with the server occurs only once. Clients estimate class-conditional feature distributions from private data via Variational Inference (VI) and transmit compact statistics to the server in a single communication round. The server aggregates these distributional summaries, synthesizes feature embeddings, and learns a global classifier without revisiting real past data or contacting clients again. Our main novelty is continual adaptation at the distribution level via synthetic replay from stored class mixtures, complemented by lightweight distillation. This approach substantially mitigates catastrophic forgetting while consistently improving recognition of newly introduced classes. Extensive experiments on multiple medical imaging benchmarks demonstrate that our method outperforms both state-of-the-art one-shot FL and class-incremental CL approaches. Compared with existing CL models, it achieves 90–97% average accuracy with near-zero forgetting (≤ 0.10%) across different tasks and evaluation settings.
Federated learning · Continuous learning · Medical imaging
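The one-round server step can be illustrated with Gaussian class summaries. A minimal sketch, assuming each client sends per-class counts, means, and covariances (plain moment estimates here rather than a variational posterior) and the server samples synthetic features from the pooled Gaussians. The mixture-based replay and distillation are not shown, and all names are hypothetical.

```python
import numpy as np

def client_class_stats(features, labels):
    """Per-class (count, mean, covariance) computed locally on one client.
    Plain moment estimates stand in for the paper's variational posterior."""
    stats = {}
    for c in np.unique(labels):
        x = features[labels == c]
        stats[int(c)] = (len(x), x.mean(axis=0), np.cov(x, rowvar=False, bias=True))
    return stats

def merge_class_stats(client_stats):
    """Server side: pool per-class Gaussian summaries from all clients in one round."""
    merged = {}
    for c in sorted(set().union(*(s.keys() for s in client_stats))):
        parts = [s[c] for s in client_stats if c in s]
        n = sum(cnt for cnt, _, _ in parts)
        mean = sum(cnt * mu for cnt, mu, _ in parts) / n
        # Pooled second moment minus outer product of the pooled mean.
        second = sum(cnt * (cov + np.outer(mu, mu)) for cnt, mu, cov in parts) / n
        merged[c] = (n, mean, second - np.outer(mean, mean))
    return merged

def synthesize_features(merged, per_class=100, rng=None):
    """Sample synthetic feature embeddings per class to train a global classifier."""
    rng = np.random.default_rng() if rng is None else rng
    xs, ys = [], []
    for c, (_, mean, cov) in merged.items():
        xs.append(rng.multivariate_normal(mean, cov, size=per_class))
        ys.append(np.full(per_class, c))
    return np.vstack(xs), np.concatenate(ys)
```

Pooling uses the identity Cov = E[x x^T] - mu mu^T, so the merged statistics match those computed on the union of client data without sharing any raw features.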
#AI4H165
STAR-Net: Physics Inspired Spectral Topology Aware Reconstruction Network for Single-View Fluorescence Molecular Tomography
Fluorescence molecular tomography (FMT) is a pivotal modality for preclinical tumor screening. While single-view FMT offers distinct advantages in data acquisition efficiency and cost-effectiveness, the scarcity of projection views severely exacerbates depth ambiguity induced by photon scattering, rendering 3D volumetric recovery a highly ill-posed inverse problem. To address these challenges, we propose a physics-inspired spectral topology aware reconstruction network (STAR-Net). STAR-Net establishes a synergistic framework: first, a frequency-domain decoupling strategy simulates the physical characteristics of diffuse light fields; building on this, a differentiable inverse spectral gating (DISG) mechanism explicitly imposes low-pass spectral regularization for precise depth recovery; finally, a dual-domain synergistic module dynamically fuses spatial and frequency features to achieve high-fidelity detail preservation. Extensive experiments on the Digimouse benchmark demonstrate that STAR-Net achieves the highest Dice coefficient under single-view conditions, validating that explicit spectral topology modeling is a powerful paradigm for mitigating depth ambiguity.
Medical imaging · Explainable AI
#AI4H167
Patient-Visit-Spanned Hypergraph Learning for EHR-based Diagnosis Prediction
Hypergraphs effectively model complex interactions in structured Electronic Health Records (EHR). Consequently, Hypergraph Neural Networks (HGNNs) are commonly applied to EHR-based diagnosis prediction. However, existing HGNNs struggle to capture long-range patient-visit dependencies when processing EHR-derived hypergraphs. To tackle this issue, we propose a Patient-Visit-Spanned HyperGraph Learning (PVHGL) framework specifically designed for diagnosis prediction. Concretely, PVHGL first constructs a unified patient-visit hypergraph that integrates visit records from all patients, enabling it to capture healthcare patterns shared across the patient population. It then incorporates a Transformer architecture enhanced by two structure-encoding matrices to enable one-step message propagation, which effectively preserves both local and global hypergraph structure. Additionally, the framework integrates a medical code co-occurrence matrix that explicitly guides learning by highlighting critical medical code interactions. Comprehensive experiments on three real-world datasets demonstrate that PVHGL significantly outperforms state-of-the-art baselines in diagnosis prediction.
Clinical decision support system · Medical diagnosis · EHR analysis
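A pooled hypergraph and a code co-occurrence matrix can be built from visit records as follows. This sketch assumes each visit is a hyperedge over the medical codes it contains and that co-occurrence counts shared visits. PVHGL's exact construction, its structure-encoding matrices, and its Transformer are not shown, and the function names are hypothetical.

```python
import numpy as np

def build_incidence(visits, num_codes):
    """Incidence matrix H (codes x visits); H[c, v] = 1 if code c occurs in visit v.
    `visits` pools visit records from all patients; each visit is one hyperedge."""
    h = np.zeros((num_codes, len(visits)), dtype=np.int64)
    for v, codes in enumerate(visits):
        h[sorted(set(codes)), v] = 1
    return h

def code_cooccurrence(h):
    """C[a, b] = number of visits containing both code a and code b (a != b)."""
    c = h @ h.T
    np.fill_diagonal(c, 0)
    return c
```

For example, the visits `[[0, 1], [1, 2], [0, 1, 2]]` give a co-occurrence count of 2 for codes 0 and 1, because they share the first and third visits.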
#AI4H172
Dual-Channel Semantic-Enhanced Combinatorial Medication Recommendation via Knowledge Distillation
As a vital task in healthcare, combinatorial medication recommendation aims to generate drug combinations tailored to a patient's health status. Precisely capturing the rich semantic information within clinical narratives is crucial for achieving this goal. However, existing approaches rely primarily on isolated identifiers (e.g., patient IDs and drug codes) and fail to leverage the inherent semantic associations between patient conditions and medication descriptions. To fill this gap, we propose the Dual-Channel Semantics-Enhanced Network (DCSENet), a novel dual-channel framework that explicitly incorporates knowledge from context-rich clinical narratives. DCSENet fine-tunes domain-adapted pre-trained language models (LMs) to capture semantic correlations between patient status and medication narratives. A transformer-based dual-channel decoder decodes semantic information at the disease level and the patient level, respectively: the disease-level channel focuses on natural-language semantic associations between diseases and drugs, while the patient-level channel provides personalized features. To mitigate the prohibitive computational overhead of the LMs in clinical deployment, we introduce an attention-map-based knowledge distillation mechanism that efficiently transfers semantic knowledge from the LMs into an identifier-based (ID-based) target model. Extensive experiments on the MIMIC-III and MIMIC-IV datasets demonstrate that DCSENet outperforms existing state-of-the-art methods in recommendation accuracy while maintaining a low computational cost.
Clinical decision support system · Treatment recommendation
#AI4H174
GeoSFLoRA: Geometry-Conditioned Spectral Flow Low-Rank Adaptation for 2D-to-3D Transfer in Medical Image Segmentation
Accurate volumetric medical image segmentation is critical for clinical diagnosis, yet adapting two-dimensional vision foundation models (VFMs) to three-dimensional medical imaging remains challenging. While parameter-efficient fine-tuning (PEFT) methods such as LoRA and adapter-based schemes provide efficient alternatives to full fine-tuning, their geometry-agnostic parameter-space adaptations are insufficient to reconcile discrepancies induced by anisotropic voxel spacing and inter-slice discontinuities. We propose GeoSFLoRA, a geometry-constrained spectral-flow low-rank adaptation framework for efficient 2D-to-3D transfer learning. GeoSFLoRA adapts frozen 2D pretrained vision backbones by operating within a fixed low-rank spectral subspace induced by pretrained linear projections. For each adapted layer, a truncated singular value decomposition is computed once and kept frozen, preserving the original pretrained bases. A lightweight Geometry-Conditioned Encoder (GCE) extracts local volumetric descriptors, which are mapped to token-conditional residuals in the singular-value space, enabling bounded and geometry-aware spectral modulation. GeoSFLoRA consistently improves Dice and HD95 on BraTS20, MSD-Prostate, and MSD-Lung, approaching full fine-tuning performance and demonstrating an effective paradigm for 2D-to-3D medical image segmentation. The code is publicly available at https://github.com/chenbn266/GeoSFLoRA.
Medical imaging
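The frozen-SVD, bounded singular-value modulation can be sketched in NumPy for a single linear layer. The tanh bound, the scale `alpha`, the linear map `proj` from a geometry descriptor to rank-r residuals, and the class name are assumptions. The Geometry-Conditioned Encoder itself is replaced by a given descriptor array `g`.

```python
import numpy as np

class SpectralModulatedLinear:
    """Frozen linear layer adapted only through its top-r singular values.

    W ~ U_r diag(s_r) V_r^T is computed once and frozen. A (hypothetical)
    trainable projection maps a per-token geometry descriptor g to a bounded
    residual on s_r:  y = W x + U_r [alpha * s_r * tanh(P g) * (V_r^T x)].
    """
    def __init__(self, weight, rank, descriptor_dim, alpha=0.1):
        self.weight = weight
        u, s, vt = np.linalg.svd(weight, full_matrices=False)
        self.u_r, self.s_r, self.vt_r = u[:, :rank], s[:rank], vt[:rank]
        self.alpha = alpha
        self.proj = np.zeros((rank, descriptor_dim))  # zero init: starts at W

    def __call__(self, x, g):
        """x: (tokens, in_dim); g: (tokens, descriptor_dim) -> (tokens, out_dim)."""
        base = x @ self.weight.T
        delta = np.tanh(g @ self.proj.T)              # bounded in (-1, 1)
        coeffs = (x @ self.vt_r.T) * (self.alpha * self.s_r * delta)
        return base + coeffs @ self.u_r.T
```

Zero-initialising `proj` makes the adapted layer start exactly at the pretrained weights. Because |tanh| < 1 and U_r has orthonormal columns, each token's residual is bounded by `alpha` times the largest retained singular value times ||x||.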
#AI4H176
MedFiTRG: Jointly Learning Dynamic Temporal and Cross-Patient Graphs for Clinical Outcome Prediction
Integrating heterogeneous clinical modalities, such as structured electronic health records (EHRs), clinical text, and medical imaging, is crucial for reliable clinical prediction, yet real-world data are often sparse and imbalanced. Furthermore, prior approaches treat temporal dynamics and inter-patient relationships in isolation, overlooking how patient trajectories interact dynamically across populations. We introduce the modality-enhanced dynamic temporal relational graph (MedFiTRG), a unified framework that jointly models sparsity, temporal dynamics, and cross-patient relational dependencies. MedFiTRG leverages modulated graph neural networks (MGNN) to learn modality-aware embeddings, enabling meaningful representation of sparse modalities through adaptive feature modulation. These embeddings are integrated into a temporal relational graph (TRG), where directed intra-patient edges capture longitudinal progression and dynamic inter-patient edges model population-level similarities for synchronized temporal-relational reasoning.
Extensive experiments on large-scale real-world datasets across four clinical tasks demonstrate that MedFiTRG achieves superior or comparable performance against state-of-the-art baselines, improving Macro-F1 from 0.155 to 0.310 for length of stay (LOS) classification and achieving an AUROC of 0.939 for mortality prediction (↑6.45%).
The code is available at https://anonymous.4open.science/r/MedFiTRG-2714
Clinical decision support system · Medical diagnosis · Health data mining · Multimodal data · Explainable AI
#AI4H182
VGDM: Visual Localization-Guided 3D Dental Segmentation via Extrinsic–Intrinsic Bridging
3D dental segmentation is a key task in digital dentistry. In real intraoral scan (IOS) data, occlusion, scanning noise, and reconstruction artifacts often break down the geometric separation between teeth, causing adjacent teeth to be incorrectly merged or a single tooth to be over-segmented. Because existing point-cloud and mesh-based methods usually rely on local neighborhood consistency, spurious geometric connections let features diffuse across instances along geometric shortcuts, propagating errors at the instance level. To address this issue, we propose VGDM (Visual-Guided Diffusion Modulation), which bridges extrinsic visual localization cues and intrinsic surface features under detection settings, enabling the extrinsic cues to regulate the propagation of intrinsic surface features. Instead of propagating globally over the entire intraoral scan mesh, VGDM uses single-view 2D detection results to roughly localize each tooth region and constructs local 3D surface patches from it. Within each patch, visual cues soft-constrain feature propagation to suppress cross-instance propagation along spurious geometric connections, and a dual-stream diffusion structure improves overall robustness. Experimental results on the largest public intraoral dataset (Teeth3DS) show that VGDM significantly improves the segmentation rate of tooth instances and effectively reduces the merging and over-segmentation of adjacent teeth.
Multimodal data · Medical imaging
#AI4H210
LLM-Guided Monte Carlo Tree Search over Knowledge Graphs: Composing Mechanistic Explanations for Drug–Disease Pairs
Extracting multi-step explanations from knowledge graphs poses a combinatorial challenge that requires both heuristic guidance (as candidates proliferate with depth) and credit assignment (as path quality emerges only over extended sequences). Frontier LLMs, strong on knowledge and reasoning benchmarks, offer a compelling source of such heuristics, yet their knowledge comes without guarantees and their compositional performance degrades as chains lengthen. We thus present TESSERA, a three-part neuro-symbolic framework that confines LLMs to a circumscribed role, namely local discriminative judgement rather than autonomous multi-step generation; the knowledge graph defines the hypothesis space and enforces hard structural constraints, and MCTS coordinates the long-horizon search with principled credit assignment via backpropagation. The LLMs play two roles: a prior policy that biases exploration and a comparative state evaluator that supplies reward signals. Evaluation on drug mechanism elucidation across two complementary knowledge graphs demonstrates fidelity to curated biology while surfacing coherent alternative mechanisms, with ablations confirming that both LLM components make a discriminative contribution. Beyond its current application, our framework offers a general paradigm for compositional reasoning over structured knowledge.
LLM in medicine · Medical knowledge representation · Explainable AI
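The abstract does not give TESSERA's selection rule. One common way to combine a prior policy with MCTS is the PUCT score used in AlphaZero-style search. The sketch below assumes that rule, with the LLM prior supplied as `prior` and LLM evaluator scores passed as `reward`; the class and function names and `c_puct` are hypothetical.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    prior: float                     # P(s, a), e.g. from an LLM prior policy
    visits: int = 0
    value_sum: float = 0.0           # accumulated evaluator rewards
    children: dict = field(default_factory=dict)  # KG edge label -> Node

    @property
    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0

def puct_select(node, c_puct=1.5):
    """Return the (label, child) pair maximizing Q + c * P * sqrt(N) / (1 + n)."""
    sqrt_n = math.sqrt(max(node.visits, 1))
    return max(node.children.items(),
               key=lambda kv: kv[1].q + c_puct * kv[1].prior * sqrt_n / (1 + kv[1].visits))

def backpropagate(path, reward):
    """Credit assignment: propagate a leaf reward to every node on the path."""
    for n in path:
        n.visits += 1
        n.value_sum += reward
```

Unvisited children are ranked by the prior alone, while well-rewarded children accumulate a higher Q. This is how a prior can bias exploration without overriding search statistics.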
#AI4H239
Optimizing Resource-Constrained Non-Pharmaceutical Interventions for Multi-Cluster Outbreak Control Using Hierarchical Reinforcement Learning
Non-pharmaceutical interventions (NPIs), such as diagnostic testing and quarantine, are crucial for controlling infectious disease outbreaks but are often constrained by limited resources, particularly in the early stages of an outbreak. In real-world public health settings, resources must be allocated across multiple outbreak clusters, where a cluster is a group of close contacts generated by a single infected index case. These clusters emerge asynchronously, vary in size and risk, and compete for a shared resource budget, so decisions must be made under uncertainty and heterogeneous demand while respecting operational constraints. We formulate this problem as a constrained restless multi-armed bandit (RMAB) and propose a hierarchical reinforcement learning framework. A global controller learns a continuous action-cost multiplier that adjusts global resource demand, while a generalized local policy estimates the marginal value of allocating resources to individuals within each cluster. We evaluate the framework in a realistic agent-based simulator of SARS-CoV-2 with dynamically arriving clusters. Across a wide range of system scales and testing budgets, our method consistently outperforms RMAB-inspired and heuristic baselines, improving outbreak control effectiveness by 20–30%. Experiments with up to 40 concurrently active clusters further demonstrate that the hierarchical framework is highly scalable and enables faster decision-making than the RMAB-inspired method.
Public health
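One way to read the global cost multiplier is as a Lagrange-style price on resources, as in Lagrangian relaxations of constrained RMABs. This hypothetical sketch funds individuals greedily by value minus price times cost under the shared budget. The marginal values would come from the learned local policy, and the learned controller that sets `lam` is not shown.

```python
import numpy as np

def allocate_with_multiplier(marginal_values, costs, lam, budget):
    """Allocate a shared budget across individuals pooled from all clusters.

    marginal_values: per-individual estimated benefit of testing/quarantine.
    costs: per-individual resource cost.
    lam: global cost multiplier chosen by the high-level controller.
    Individuals are ranked by value - lam * cost and funded greedily while the
    adjusted value is positive and the remaining budget allows it.
    """
    adjusted = marginal_values - lam * costs
    order = np.argsort(-adjusted, kind="stable")
    chosen, spent = [], 0.0
    for i in order:
        if adjusted[i] <= 0:
            break                    # everyone left is not worth the price
        if spent + costs[i] <= budget:
            chosen.append(int(i))
            spent += costs[i]
    return chosen, spent
```

Raising `lam` shrinks demand (fewer individuals clear the price) and lowering it expands demand, which is the lever a global controller can use to track the shared budget.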
#AI4H266
Med-StepBench: A Hierarchical Reasoning Framework for Evaluating Hallucinations in Medical Vision-Language Models
Large vision-language models (VLMs) demonstrate strong performance in medical image understanding but frequently generate clinically plausible yet incorrect statements, raising significant safety concerns. Existing medical hallucination benchmarks focus primarily on 2D imaging with one-shot diagnostic questions; they offer limited insight into whether predictions are grounded in correct localization and abnormality identification, so critical reasoning errors can remain hidden behind seemingly correct diagnoses. We introduce Med-StepBench, the first large-scale benchmark for step-wise hallucination detection in 3D oncological PET/CT. It comprises over 12,000 images and more than 1,000,000 image–statement pairs across volumetric and multi-view 2D data, and decomposes clinical reasoning into four expert-designed diagnostic stages. Using clinician-verified annotations, we perform the first step-level evaluation of general-purpose and medical VLMs, revealing systematic failure modes obscured by aggregate accuracy metrics. Furthermore, we show that current VLMs are highly susceptible to adversarial yet clinically plausible intermediate explanations, which significantly amplify hallucinations despite contradictory visual evidence. Together, our findings highlight fundamental limitations in grounding multi-step clinical reasoning and establish Med-StepBench as a rigorous benchmark for developing safer and more reliable medical VLMs.
Automated reasoning in clinical domains · Multimodal data · Medical imaging
#AI4H270
TangentFuse: Low-Latency MEG Speech Activity Detection via Riemannian Covariance - CNN Fusion
Speech activity recognition in MEG-based non-invasive BCI systems provides a reliable speech gate that can trigger downstream decoders only when speech-related neural activity is present. Such a gate can help users interact with assistive devices in continuous settings. While MEG provides excellent temporal resolution, many MEG speech activity classifiers do not fully exploit the available spatial information. We describe a hybrid MEG speech/non-speech classifier that combines a geometry-aware covariance branch with temporal neural streams. We compute shrinkage-regularized symmetric positive-definite (SPD) covariance matrices and map them to a Riemannian tangent space around a reference mean, where a logistic regression classifier operates on the resulting features. A limited sensor array defines the region-of-interest component, and the final system adds a residual temporal fusion layer over aligned probability streams. The final decision rule, including fusion weights, thresholds, and sequence-level post-processing, is selected on the validation set only and then applied unchanged to the frozen test set. On a large within-subject MEG corpus, the final system achieved a validation macro-F1 of 0.8972 and a frozen-test macro-F1 of 0.8914. This provides a compute-efficient research prototype for MEG-based speech gating.
Timeseries prediction · Medical imaging · Health data mining · Multimodal data
#AI4H271
Structure-Aware Contrastive Learning for Biomedical Embeddings: Bridging the Gap Between HPO and Clinical Literature
Large Language Models (LLMs) are extensively used for biomedical text processing but often fail to capture the complex, functional relationships encoded in expert knowledge graphs such as the Human Phenotype Ontology (HPO). This "semantic gap" limits their utility in precision medicine tasks such as rare disease diagnosis, where distinguishing overlapping clinical presentations requires understanding underlying pathophysiological connections rather than just surface-level textual similarity. In this work, we propose a Neuro-Symbolic Alignment Framework that bridges this gap by integrating specialized phenotype descriptions mined from the literature with the ontological structure used as a reference. Specifically, we augment phenotype representations with automatically selected text fragments from a massive corpus of descriptions mined from scientific literature (PubMed), overcoming the data scarcity typical of standard ontology definitions. We define a new embedding adaptation procedure whose fine-tuning is guided by a novel "Disease-Overlap" similarity measure, which prioritizes clinical co-occurrence of phenotypes over taxonomic distance, and which optimizes the embedding space with AnglE Loss to mitigate gradient saturation. Extensive evaluations show that our approach significantly outperforms state-of-the-art baselines, including SapBERT, on both intrinsic semantic correlation and practical downstream tasks, including synthetic patient disease ranking and solving real cases stored in Phenopacket, where our model achieves four times the top-1 accuracy of the previous best model.
Clinical decision support system · Biomedical NLP · LLM in medicine · Medical knowledge representation · Computational phenotyping
#AI4H272
ST-BiT: Spatio-Temporal Bipartite Transformer Network for Interaction-Preserving EEG-Based Dementia Subtyping
EEG-based dementia classification often degrades under clinically realistic subject-wise evaluation due to non-stationarity and large inter-subject variability. A key modeling limitation is relation compression: many EEG-GNN pipelines encode functional connectivity as scalar edge weights and blur interaction structure during message passing, while models may also exploit subject-specific cues. We propose ST-BiT, a Spatio-Temporal Bipartite Transformer that represents electrodes as node tokens and functional interactions as explicit, learnable edge tokens. Edge tokens preserve pairwise coupling patterns and are updated only from their endpoint electrodes via incidence-masked cross-attention, while electrode tokens aggregate information only from incident edges. Window-wise sparse graphs are constructed from time-sample correlations of band-limited signals to sparsify and initialize edge tokens. ST-BiT combines this interaction-preserving backbone with temporal self-attention, lightweight band attention, and domain-adversarial alignment to reduce subject bias. On OpenNeuro ds006036 (eyes-open with photic stimulation), using leakage-free subject-wise stratified 5-fold cross-validation with nested model selection, ST-BiT achieves 93.0% accuracy for CN vs. (AD+FTD) and 76.1% for CN/AD/FTD, outperforming classical ML and GNN baselines under identical folds. To assess robustness across recording states and align with prior work on this cohort, we also evaluate on the ds004504 (eyes-closed) dataset.
Medical diagnosis · Health data mining · Timeseries prediction · Multimodal data · Explainable AI
#AI4H283
When Pulling Fails: Understanding and Alleviating SDF Collapse in Sparse Freehand Ultrasound Reconstruction
Despite being a cost-effective modality for volumetric imaging, freehand three-dimensional (3D) ultrasound produces inherently sparse data because tracked 2D sweeps leave significant elevational gaps. This sparsity poses a unique challenge for Implicit Neural Representations (INRs). Although successful in other domains, INRs applied here tend to fail because the learned signed distance field (SDF) collapses toward zero inside the object, erasing concavities and incorrectly closing anatomical gaps. Our analysis identifies the root cause as a statistical bias in gradient-based sampling objectives, showing that symmetric volumetric sampling mathematically drives the expected SDF value to zero. To rectify this, we present a geometry-aware framework that explicitly anchors the non-negative half-space. Our method uses a boundary-directed exterior sampling strategy to enforce non-negative constraints in empty regions, complemented by an ellipsoid-based adversarial mechanism that regularizes the global field distribution. Experiments on multiple anatomical datasets demonstrate that our approach mitigates field collapse and improves geometric fidelity and topological consistency on most metrics. Code is available at https://github.com/jiuanchen/Pulling-Fails-SDF.
Medical imaging · Self-supervised learning · Multimodal data
#AI4H284
SynCABEL: Synthetic Contextualized Augmentation for Biomedical Entity Linking
We present SynCABEL (Synthetic Contextualized Augmentation for Biomedical Entity Linking), a framework that addresses a central bottleneck in supervised biomedical entity linking (BEL): the scarcity of expert-annotated training data. SynCABEL leverages large language models to generate context-rich synthetic training examples for all candidate concepts in a target knowledge base, providing broad supervision without manual annotation. We demonstrate that SynCABEL, when combined with decoder-only models and guided inference, establishes new state-of-the-art results across three widely used multilingual benchmarks: MedMentions for English, QUAERO for French, and SPACCC for Spanish. In terms of data efficiency, SynCABEL reaches the performance of full human supervision with up to 60% less annotated data, substantially reducing reliance on labor-intensive and costly expert labeling. Finally, because standard evaluation based on exact code matching often underestimates clinically valid predictions due to ontology redundancy, we introduce an LLM-as-a-judge protocol. This analysis reveals that SynCABEL significantly improves the rate of clinically valid predictions. Our synthetic datasets, models, and code are released to support reproducibility and future research.
• HuggingFace Datasets & Models
• GitHub Repository
Health data mining · EHR analysis · Biomedical NLP · LLM in medicine · Medical knowledge representation
#AI4H287
Hierarchical Conditional Energy Modeling for Medical Vision–Language Pretraining
Contrastive vision–language pretraining models such as CLIP align images and text in a shared embedding space but do not explicitly model or evaluate the hierarchical semantics common in medical image interpretation. We propose HCE-CLIP (Hierarchical Conditional Energy CLIP), a vision–language pretraining framework that formulates medical image–text alignment as a hierarchical label-conditional energy modeling problem. HCE-CLIP encodes an image series using transformer-based aggregation and aligns it with free-text reports and structured label-state prompts across multiple semantic levels. At each level, conditional energy functions favor clinically consistent label states while suppressing contradictory alternatives, enabling uncertainty-aware inference. To assess semantic coherence, we introduce a hierarchical contradiction-based metric that quantifies logical inconsistencies between fine-grained disease predictions and higher-level clinical summaries. Experiments on MIMIC-CXR and other public benchmarks show that HCE-CLIP outperforms existing medical vision–language pretraining methods in seen-label, zero-shot, and linear-probe settings while producing substantially fewer hierarchical contradictions.
Multimodal data · Biomedical NLP · LLM in medicine · Medical imaging
#AI4H293
FediLoRA: Practical Federated Fine-Tuning of Foundation Models Under Missing-Modality Constraints
Federated learning with LoRA fine-tuning offers an efficient and privacy-aware way for institutions to collaboratively leverage their large datasets to train VLLMs. However, participating institutions often possess heterogeneous computational resources, resulting in imbalanced LoRA ranks that pose a major challenge for effective collaboration. In addition, real-world applications in domains such as healthcare and transportation frequently suffer from missing modalities due to user mistakes or device failures, which significantly degrades global model performance in federated settings. To the best of our knowledge, no prior work has addressed these two challenges simultaneously in federated VLLMs. To tackle these issues, we propose FediLoRA, a lightweight federated LoRA aggregation framework that effectively mitigates the impact of missing modalities in heterogeneous environments. FediLoRA is explicitly motivated by the observation that simple averaging and structured editing can jointly benefit both global and personalized models. Our approach achieves strong performance across multiple general-domain and medical-domain benchmark datasets, and additional experiments on healthcare data further demonstrate that FediLoRA is well suited for practical, real-world deployment scenarios. Our code is released at https://github.com/gotobcn8/FediLoRA.
Federated learning · LLM in medicine · Multimodal data
#AI4H298
SAM-GPT: Hilbert Curve Enhanced Mamba for Brain Lesion Segmentation and VLM-based Analysis
Recent breakthroughs in Vision-Language Models (VLMs) have demonstrated their capabilities in medical analysis tasks, but they remain limited in the brain lesion image domain, especially when pathological regions occupy a small portion of the image. This limitation arises because VLMs tend to place excessive attention on background regions that are visually similar to target regions. Based on this finding, we propose SAM-GPT, a novel framework that leverages segmentation-derived spatial priors to support VLM-based lesion classification. The framework first employs an enhanced segmentation model to localize pathological regions for the diagnostic task, and then converts lesion attributes (e.g., volume size, pixel range, and lesion location) into linguistic guidance for a vision–language model. To enhance small-lesion recognition, we incorporate a new Hilbert scanning method into Mamba that improves both local spatial continuity and global spatial modeling, which is critical for identifying subtle pathological regions. Experiments on benchmark datasets show that our model achieves an average Dice coefficient of 72.80% on the brain lesion segmentation task and an accuracy of 80.56% on the brain disease classification task, indicating the effectiveness of the proposed framework.
Multimodal data · LLM in medicine · Medical imaging · Medical diagnosis
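Hilbert scanning serializes a 2D feature map so that consecutive tokens are spatial neighbours. A minimal sketch, assuming a square power-of-two grid and the classic iterative index-to-coordinate construction of the Hilbert curve. How SAM-GPT integrates the scan into Mamba is not shown, and the function names are hypothetical.

```python
import numpy as np

def hilbert_d2xy(n, d):
    """Position (x, y) of step d along a Hilbert curve filling an n x n grid (n = 2**k)."""
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate/flip the current sub-square
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_order(n):
    """Row-major flat indices (row = y, column = x) in Hilbert visiting order."""
    return np.array([y * n + x for x, y in (hilbert_d2xy(n, d) for d in range(n * n))])

def hilbert_scan(feature_map):
    """Serialize a (C, H, W) feature map into an (H*W, C) token sequence."""
    c, h, w = feature_map.shape
    if h != w or (h & (h - 1)) != 0:
        raise ValueError("sketch assumes a square, power-of-two grid")
    tokens = feature_map.reshape(c, h * w).T
    return tokens[hilbert_order(h)]
```

Unlike a raster scan, which jumps from the end of one row to the start of the next, every step of this order moves to an adjacent cell, so small lesions stay contiguous in the 1D sequence.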
#AI4H302
Shapley Regression for Rare Disease Diagnosis Support: A Case Study on APDS
Activated PI3Kδ Syndrome (APDS) is a rare genetic immune disorder caused by variants in PIK3CD or PIK3R1, with highly heterogeneous symptoms that often delay diagnosis. Early recognition is hampered by overlapping clinical presentations and limited clinician awareness, motivating systematic, data-driven approaches to detecting APDS-associated phenotypic patterns in routine electronic health records. Traditional linear scoring systems cannot capture complex symptom interactions, while deep learning models, though expressive, often lack interpretability. To bridge this gap, we propose Shapley regression, a novel game-theoretic model that replaces the linear predictor with a k-additive cooperative game, explicitly modeling symptom co-occurrence while retaining the transparency and convexity of logistic regression. An empirical study of this lightweight method on eight public biomedical datasets shows that a 2-additive model with ℓ2 regularization offers the best trade-off between predictive power and noise robustness. Applied to a real-world cohort of 222 patients, Shapley regression accurately distinguishes APDS cases from matched controls, confirms phenotypes known to be associated with APDS, and facilitates the exploration of pairwise symptom interactions, with findings validated by clinical experts.
Keywords: Clinical decision support system · Explainable AI · EHR analysis · Computational phenotyping
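One possible reading of "a k-additive game as the linear predictor" for k = 2 is a logistic model over singleton and pairwise symptom terms, whose weights play the role of Möbius coefficients m(S). For a k-additive game, the Shapley value of player i is the sum of m(S)/|S| over all sets S containing i, so each present symptom receives its singleton weight plus half of every active pairwise weight. The sketch below implements that reading under stated assumptions (binary symptom indicators, plain gradient descent, hypothetical function names); it is not the authors' implementation.

```python
import numpy as np
from itertools import combinations

def pairwise_design(X):
    """Binary symptom matrix (n, p) -> [singleton terms | pairwise products] (2-additive terms)."""
    pairs = list(combinations(range(X.shape[1]), 2))
    inter = np.column_stack([X[:, i] * X[:, j] for i, j in pairs])
    return np.hstack([X, inter]), pairs

def fit_2additive_logistic(X, y, l2=1e-2, lr=0.1, epochs=2000):
    """L2-regularised logistic regression on fixed 2-additive features (the loss stays convex)."""
    Z, pairs = pairwise_design(X)
    w, b, n = np.zeros(Z.shape[1]), 0.0, len(y)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        w -= lr * (Z.T @ (p - y) / n + l2 * w)
        b -= lr * np.mean(p - y)
    p_feat = X.shape[1]
    return b, w[:p_feat], dict(zip(pairs, w[p_feat:]))

def shapley_values(x, m1, m2):
    """Shapley value of each present symptom: own weight plus half of each active pair weight."""
    phi = np.zeros(len(x))
    for i in np.flatnonzero(x):
        phi[i] = m1[i]
    for (i, j), w in m2.items():
        if x[i] and x[j]:
            phi[i] += w / 2
            phi[j] += w / 2
    return phi

# Toy usage: the label depends only on the co-occurrence of symptoms 0 and 1.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 4)).astype(float)
y = X[:, 0] * X[:, 1]
b, m1, m2 = fit_2additive_logistic(X, y)
print(shapley_values(np.array([1.0, 1.0, 0.0, 0.0]), m1, m2))
```

By construction, the Shapley values of a patient sum to that patient's logit minus the intercept (the efficiency property), so the explanation exactly accounts for the model's score.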
#AI4H306
Singly-Connected Multiple Minimal Networks for Efficient Temporal Reasoning About Multiple Clinical Guideline Instantiations
Temporal constraints are an intrinsic component of most clinical guidelines. Several approaches to computerized clinical guidelines (CIGs) offer temporal reasoning facilities to support the execution of a CIG for a specific patient, mostly based on bounds on differences and on the Simple Temporal Problem framework. However, when scheduling activities (e.g., within a hospital), it is necessary to consider multiple executions of CIGs for different patients, which may also be (partly) related to each other. In this work, we extend current temporal reasoning techniques to this context. We propose a new temporal constraint model, prove its properties, and exploit them to manage patients' temporal constraints efficiently and to support efficient query answering. We also present an experimental evaluation demonstrating the improvement over current approaches. Notably, our approach is general and applies to any temporal reasoning problem with the same topology as the "multiple CIG execution" problem.
Keywords: Automated reasoning in clinical domains · Medical knowledge representation
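For context, the single-patient baseline the abstract builds on, the Simple Temporal Problem, represents each constraint lo ≤ t_j − t_i ≤ hi as two weighted edges in a distance graph and obtains the minimal network by all-pairs shortest paths; a negative cycle means no consistent schedule exists. The sketch below shows only this classical step (Floyd–Warshall, cubic in the number of time points), not the paper's singly-connected multiple minimal networks. The function name and the clinical example values are hypothetical.

```python
import math

def minimal_network(n, constraints):
    """constraints: list of (i, j, lo, hi) meaning lo <= t_j - t_i <= hi.

    Returns D with D[i][j] = tightest upper bound on t_j - t_i
    (so t_j - t_i lies in [-D[j][i], D[i][j]]), or None if inconsistent.
    """
    D = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i, j, lo, hi in constraints:
        D[i][j] = min(D[i][j], hi)    # edge i -> j: t_j - t_i <= hi
        D[j][i] = min(D[j][i], -lo)   # edge j -> i: t_i - t_j <= -lo
    for k in range(n):                # Floyd-Warshall all-pairs shortest paths
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    if any(D[i][i] < 0 for i in range(n)):
        return None                   # negative cycle: no schedule satisfies all constraints
    return D

# Hypothetical example (minutes): 0 = therapy start, 1 = drug A, 2 = blood test.
D = minimal_network(3, [(0, 1, 60, 120),   # drug A 60-120 min after start
                        (1, 2, 30, 60),    # blood test 30-60 min after drug A
                        (0, 2, 0, 120)])   # blood test within 120 min of start
print(-D[2][0], D[0][2])  # 90 120
```

Propagation tightens the blood-test window to [90, 120] minutes and the drug-A window to [60, 90]; answering such queries efficiently across many related patients is the setting the paper targets.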
#AI4H319
OrthKD: Extracting Generalized Clinical Knowledge from Heterogeneous Teachers for Lightweight Deployment
Deploying diabetic retinopathy (DR) screening models in primary care requires edge-efficient systems that remain accurate, safe, and reliable under domain shift. Multi-teacher knowledge distillation (KD) is a natural compression strategy, but existing approaches largely assume that all teachers provide equally trustworthy supervision. In our setting, this assumption fails: a strong CNN teacher (EfficientNet-B3, 0.876 QWK) and a weaker Transformer teacher (Swin-Base, 0.830 QWK) are complementary, yet the Transformer's logits can still mislead the student. We therefore propose OrthKD, a selective-trust distillation framework that transfers full supervision from the strong CNN, uses feature-only distillation from the weak ViT, and enforces orthogonality between teacher-specific student projections to encourage complementary rather than redundant evidence. This design preserves local lesion precision, injects global structural context, and improves robustness to distribution shift. On 132,049 retinal images, a 5.4M-parameter MobileNetV3 student reaches 0.885 QWK on EyePACS and improves zero-shot Messidor-2 performance from 0.507 to 0.728 QWK, while also achieving strong referral AUC and calibration. These results show that selectively distilling heterogeneous teachers can enable practical DR screening on resource-constrained devices.
Keywords: Clinical decision support system · Medical diagnosis · Medical imaging · Telehealth · Public health
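The abstract names three ingredients: full (logit plus feature) supervision from the strong CNN, feature-only supervision from the weak transformer, and an orthogonality constraint between the two teacher-specific student projections. The NumPy sketch below assembles a loss from those ingredients under explicit assumptions: Hinton-style temperature KD, mean-squared feature matching, a squared Frobenius penalty on P_cnn P_vitᵀ, cross-entropy on the labels, and hypothetical weights `alpha`, `beta`, `gamma`. It is an illustration, not OrthKD's published objective.

```python
import numpy as np

def log_softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)       # numerical stability
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def kd_kl(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2."""
    log_ps = log_softmax(student_logits, T)
    log_pt = log_softmax(teacher_logits, T)
    return (T ** 2) * np.mean(np.sum(np.exp(log_pt) * (log_pt - log_ps), axis=1))

def selective_trust_loss(s_logits, s_feat, cnn_logits, cnn_feat, vit_feat,
                         P_cnn, P_vit, labels, alpha=0.5, beta=1.0, gamma=0.1, T=4.0):
    """Hypothetical selective-trust objective; P_cnn: (d_cnn, d_s), P_vit: (d_vit, d_s)."""
    n = len(labels)
    ce = -np.mean(log_softmax(s_logits)[np.arange(n), labels])
    # Strong CNN teacher: logit distillation plus feature matching.
    l_cnn = kd_kl(s_logits, cnn_logits, T) + np.mean((s_feat @ P_cnn.T - cnn_feat) ** 2)
    # Weak transformer teacher: feature matching only; its logits are never used.
    l_vit = np.mean((s_feat @ P_vit.T - vit_feat) ** 2)
    # Orthogonality: penalise overlap between the two projections' row spaces.
    l_orth = np.sum((P_cnn @ P_vit.T) ** 2)
    return ce + alpha * l_cnn + beta * l_vit + gamma * l_orth
```

Dropping the weak teacher's logits entirely is the "selective trust" part in this reading; the orthogonality term is zero exactly when the two projections read disjoint directions of the student feature space.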
#AI4H329
THGAgents: Traceable Biomedical Hypothesis Generation via Dynamic Causal Reasoning
While Large Language Models (LLMs) show promise for scientific discovery, using LLMs to drive biomedical research requires grounding the discovery process in cutting-edge biomedical research and rigorous mechanistic causal chains. Current retrieval-augmented generation methods lack causal reasoning capabilities, and static traditional knowledge graphs fail to reflect evolving scientific knowledge; both are obstacles to using LLMs as scientific discovery tools. To address these challenges, we present THGAgents, which uses collaborative, dynamically updating agents to build a Traceable Causal Knowledge Graph that serves as the foundation of an evidence-based knowledge structure. Crucially, we employ an LLM-driven heuristic search algorithm to traverse this complex network, balancing novelty and rigor to deduce strict, evidence-based mechanistic causal chains. THGAgents also uses a generator-critic loop to support hypothesis refinement. In experimental benchmarks covering both cancer systems and neuroscience, THGAgents achieved up to a 0.80 hit rate in predicting validated scientific discoveries, improved hypothesis quality scores by nearly 9.5% over current state-of-the-art systems, and reduced the mechanistic hallucination rate to 1.12%. Our code is available at https://github.com/yangCode-res/THGAgents/.
Keywords: LLM in medicine · Medical knowledge representation
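The abstract describes an LLM-driven heuristic search that trades off novelty against rigor but does not give its scoring rule. The sketch below replaces the LLM's judgement with hypothetical numeric `evidence` and `novelty` scores on each edge and runs a uniform-cost (Dijkstra-style) search for the best causal chain. The cost function, the `min_evidence` threshold, and the toy graph are illustrative assumptions, not THGAgents' algorithm.

```python
import heapq
import math

def best_causal_chain(graph, source, target, lam=0.3, min_evidence=0.2):
    """Uniform-cost search for a chain source -> ... -> target in a causal graph.

    graph: {node: [(neighbor, evidence, novelty), ...]}, evidence in (0, 1], novelty in [0, 1].
    Edge cost = (1 - lam) * -log(evidence) + lam * (1 - novelty) is never negative,
    so lower total cost means better-supported and/or more novel links.
    Edges with evidence below min_evidence are never traversed (hard rigor constraint).
    """
    frontier = [(0.0, source, [source])]
    best = {source: 0.0}
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == target:
            return path, cost
        if cost > best.get(node, math.inf):
            continue  # stale queue entry
        for nxt, evidence, novelty in graph.get(node, []):
            if evidence < min_evidence:
                continue
            new_cost = cost + (1 - lam) * -math.log(evidence) + lam * (1 - novelty)
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, nxt, path + [nxt]))
    return None, math.inf

# Hypothetical toy graph: the direct GeneX -> PathwayZ link is novel but weakly supported.
graph = {
    "GeneX": [("ProteinY", 0.9, 0.1), ("PathwayZ", 0.15, 0.9)],
    "ProteinY": [("PathwayZ", 0.8, 0.6)],
    "PathwayZ": [("Phenotype", 0.7, 0.5)],
}
path, cost = best_causal_chain(graph, "GeneX", "Phenotype")
print(path)  # ['GeneX', 'ProteinY', 'PathwayZ', 'Phenotype']
```

Raising `lam` rewards novel links more; `min_evidence` enforces that every step in the returned chain is backed by some evidence, which is one simple way to keep a search traceable.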
#AI4H347
Multiscale-adaptive and Size-adaptive PSO-based Feature Selection for Gene Expression Analysis
In gene expression analysis, high-dimensional, low-sample-size data limit the applicability of deep learning, motivating growing interest in feature selection methods based on variable-length evolutionary algorithms (VLEAs). However, existing VLEAs suffer from unreliable single-metric discrimination as the search space changes dynamically, mismatches between population size and search-space dimensionality, and degraded particle performance after the search space changes. To this end, we propose Multiscale-adaptive and Size-adaptive Particle Swarm Optimization (MASA-PSO) for gene expression analysis. MASA-PSO adopts a multiscale-adaptive weighting framework to explore feature subsets that distinguish between-class sample distributions as the search space changes, and we theoretically prove that this framework enables the collaborative evaluation of multiple metrics. In addition, MASA-PSO introduces an adaptive population division that explicitly models the functional relationship between population size and the search space to resolve the mismatch. Furthermore, we observe a particle degradation phenomenon that impairs the performance of VLEAs and alleviate it through a hybrid elite strategy. Experiments on ten gene expression datasets verify that MASA-PSO outperforms state-of-the-art methods in classification accuracy while selecting smaller feature subsets.
Keywords: Health data mining · Genomic data analysis
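MASA-PSO's multiscale weighting, adaptive population division, and hybrid elite strategy are not described in enough detail to reproduce here. For orientation only, the sketch below shows the plain fixed-length binary PSO that such methods extend: each particle is a boolean feature mask, velocities are mapped to bit probabilities with a sigmoid transfer function, and fitness is the training accuracy of a nearest-centroid classifier minus a small subset-size penalty. The classifier, penalty, and hyperparameters are assumptions chosen for brevity; a real evaluation would score held-out accuracy.

```python
import numpy as np

def nearest_centroid_acc(X, y, mask):
    """Training accuracy of a nearest-centroid classifier on the selected columns."""
    if not mask.any():
        return 0.0
    Xs = X[:, mask]
    classes = np.unique(y)
    centroids = np.stack([Xs[y == c].mean(axis=0) for c in classes])
    dists = ((Xs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(np.mean(classes[dists.argmin(axis=1)] == y))

def binary_pso_select(X, y, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5,
                      size_penalty=0.01, seed=0):
    """Fixed-length binary PSO baseline (not MASA-PSO): returns the best mask and its fitness."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]

    def fitness(mask):
        # Accuracy reward minus a small penalty on the fraction of features kept.
        return nearest_centroid_acc(X, y, mask) - size_penalty * mask.sum() / d

    pos = rng.random((n_particles, d)) < 0.5            # True = feature selected
    vel = rng.normal(0.0, 1.0, (n_particles, d))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, d))
        x = pos.astype(float)
        vel = w * vel + c1 * r1 * (pbest.astype(float) - x) + c2 * r2 * (gbest.astype(float) - x)
        vel = np.clip(vel, -6.0, 6.0)                   # keep bit probabilities away from 0 and 1
        pos = rng.random((n_particles, d)) < 1.0 / (1.0 + np.exp(-vel))
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        gbest = pbest[pbest_fit.argmax()].copy()
    return gbest, float(pbest_fit.max())
```

The fixed dimension d here is exactly what variable-length methods relax: they shrink or grow the search space during optimisation, which is where the population-size mismatch and particle degradation described above arise.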
#AI4H367
Structured Modality-Aware Token Interaction for Multimodal Medical Imaging
Multimodal medical imaging benefits from the global context modeling of transformers, yet most existing models either fuse modalities implicitly by channel concatenation, leaving cross-modal interaction unstructured, or rely on costly multistream cross-attention. We propose Modality-Aware Token Interaction (MATI), an architecturally lightweight and backbone-agnostic module that structures multimodal interaction within a single token stream by partitioning embedding channels into modality-aligned subspaces. MATI performs modality-preserving intra-subspace self-attention and gated global mixing for controlled inter-subspace exchange. We instantiate MATI in UNETR and introduce two architectures: ModaUNETR^S refines selected skip-token representations, and ModaUNETR^E injects MATI into the transformer encoder to progressively shape the token hierarchy. Experiments on the BraTS 2020 benchmark use five-fold cross-validation and report segmentation and efficiency metrics on the official training and validation sets. Both models consistently improve over UNETR across all tumor subregions, with larger gains for tumor core and whole tumor, and ModaUNETR^E further improves upon ModaUNETR^S. An ablation study confirms monotonic gains from structured interaction. Compared with heavier transformers, the proposed models achieve a favorable accuracy-efficiency trade-off without modality-specific encoders or quadratic cross-attention, supporting structured multimodal interaction as a first-class architectural principle. An implementation of MATI and its instantiations is available at https://github.com/S3l11/MATI.
Keywords: Clinical decision support system · Multimodal data · Medical knowledge representation · Medical imaging
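The abstract describes MATI only at the level of its two operations. The NumPy sketch below mirrors that description under simplifying assumptions: parameter-free single-head attention (no learned Q/K/V projections), equal-width modality subspaces, and a channel-wise sigmoid gate on a dense mixing matrix. The function and parameter names are hypothetical; the linked repository holds the actual module.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mati_block(tokens, n_modalities, W_mix, gate_logits):
    """tokens: (N, C) single token stream; C is split into n_modalities equal subspaces.

    1) Intra-subspace self-attention: each modality's channels attend only within that subspace.
    2) Gated global mixing: a channel-wise sigmoid gate scales the cross-subspace term out @ W_mix.
    """
    N, C = tokens.shape
    assert C % n_modalities == 0, "channels must split evenly across modalities"
    d = C // n_modalities
    out = np.empty_like(tokens)
    for m in range(n_modalities):
        sub = tokens[:, m * d:(m + 1) * d]                 # (N, d) modality-aligned subspace
        attn = softmax(sub @ sub.T / np.sqrt(d), axis=-1)  # (N, N) attention over tokens
        out[:, m * d:(m + 1) * d] = sub + attn @ sub       # residual, modality-preserving
    gate = 1.0 / (1.0 + np.exp(-gate_logits))              # (C,) values in (0, 1)
    return out + gate * (out @ W_mix)                      # controlled inter-subspace exchange

# Toy usage: 6 tokens, 4 modalities (e.g. four MRI sequences) of 2 channels each.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))
W = rng.normal(size=(8, 8))
Y = mati_block(X, 4, W, gate_logits=np.zeros(8))
print(Y.shape)  # (6, 8)
```

In this reading, a closed gate (very negative `gate_logits`) keeps each modality subspace isolated, and opening it lets information flow between subspaces, all within one token stream and without separate cross-attention streams.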
#AI4H377
X-FEMR: A Token-level Explainable Approach for Electronic Health Records Foundation Models using Transformer-based Models
Foundation Models for Electronic Health Records (FEMRs) are pretrained on large-scale structured patient data, enabling them to convert longitudinal patient trajectories into generalizable representations for diverse clinical prediction tasks. Despite their effectiveness, FEMRs remain black-box models, raising concerns about bias, interpretability, and clinical trust. To address this, we propose the first token-level explainability approach for FEMRs. We train a Transformer-based surrogate model on input-output pairs from the FEMR across two prediction tasks, approximating its behavior while preserving temporal dynamics. We then identify the most influential tokens, providing insight into how FEMRs leverage different aspects of patient history for prediction. To evaluate clinical relevance, we introduce a novel clinical alignment metric that quantifies the correspondence between the surrogate model's key tokens and clinically validated features. Our results demonstrate that the surrogate closely approximates FEMR predictions and that token-level explanations align well with clinical knowledge, offering a practical framework for interpretable and trustworthy clinical AI.
Keywords: Clinical decision support system · Health data mining · EHR analysis · Multimodal data · Explainable AI
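The abstract does not define how influential tokens are identified or how the clinical alignment metric is computed. One simple stand-in is sketched below: occlusion-based importance (mask one token and measure the drop in the surrogate's predicted risk) and precision-at-k of the top tokens against a set of clinically validated codes. The toy surrogate, token names, weights, and metric are hypothetical illustrations, not X-FEMR's definitions.

```python
import numpy as np

def occlusion_importance(predict, tokens, mask_token="[MASK]"):
    """Score each token by how much the prediction drops when that token is masked."""
    base = predict(tokens)
    return np.array([base - predict(tokens[:i] + [mask_token] + tokens[i + 1:])
                     for i in range(len(tokens))])

def clinical_alignment_at_k(tokens, scores, validated, k=3):
    """Fraction of the k most influential tokens that belong to a clinically validated set."""
    top = np.argsort(-scores)[:k]
    return sum(tokens[i] in validated for i in top) / k

# Hypothetical surrogate: a logistic score over per-code weights (stands in for a Transformer).
weights = {"E11.9": 1.2, "HbA1c_high": 0.9, "visit": 0.05, "metformin": 0.6}

def toy_surrogate(seq):
    z = sum(weights.get(t, 0.0) for t in seq) - 1.0
    return 1.0 / (1.0 + np.exp(-z))

tokens = ["visit", "E11.9", "HbA1c_high", "visit", "metformin"]
s = occlusion_importance(toy_surrogate, tokens)
print(clinical_alignment_at_k(tokens, s, {"E11.9", "HbA1c_high", "metformin"}))  # 1.0
```

Occlusion is model-agnostic, so the same scoring loop would apply to any surrogate exposing a sequence-to-probability function; it costs one forward pass per token.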
