Longitudinal magnetic resonance imaging (MRI) is central to diagnosing and monitoring multiple sclerosis (MS), a chronic disorder of the central nervous system. Tracking how brain lesions evolve over time is essential for predicting MS progression, yet manual assessment is time-consuming and subject to intra- and inter-observer variability. Although deep learning models such as convolutional neural networks (CNNs) and vision transformers (ViTs) have been applied to lesion detection, they often struggle to fully capture spatial, structural, and temporal relationships. Vision graph neural networks (ViGs) present a novel approach with the potential to improve performance on these tasks by capturing relational and structural information. We introduce DEFUSE-MS, a Deformation Field-Guided Spatiotemporal ViG-Based Framework for detecting new T2-weighted MS lesions. The framework features a Heterogeneous Spatiotemporal Graph Module (HSTGM), which functions as both an encoder and a decoder. Evaluated on the MSSEG-II dataset, DEFUSE-MS achieves a lesion detection F1 score of 0.65, a sensitivity (SensL) of 0.74, a positive predictive value (PPVL) of 0.65, and a mean segmentation Dice score of 0.55, outperforming prior state-of-the-art methods. These results highlight the efficacy of DEFUSE-MS for detecting new MS lesions. The code is available at https://github.com/BioMedIA-MBZUAI/DEFUSE-MS