Hesam Abdolmotalleby is the main researcher and lead author of the paper "Detecting Airway Invasion in Variable-Length Videofluoroscopic Swallowing Studies: A Vision Transformer Approach for Oropharyngeal Dysphagia". Manual interpretation of videofluoroscopic swallowing studies (VFSS) is a burdensome step in diagnosing dysphagia, so this research developed a Vision Transformer (ViT) model to automate it. The model uses a temporal sliding window and 3D patch tokenization to capture spatio-temporal dependencies within variable-length VFSS sequences. Evaluated on 1154 VFSS sequences, it achieved 84.37% accuracy, 90.81% sensitivity, and 79.49% specificity, outperforming conventional Convolutional Neural Network (CNN) baselines, including VGG-16 and ResNet-50. These results indicate that the ViT is well suited to automated VFSS classification and provide a foundation for the clinical deployment of AI-driven tools that streamline dysphagia screening and support timely detection of abnormalities.
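To make the temporal sliding window and 3D patch tokenization concrete, here is a minimal NumPy sketch of the general technique. It is not the paper's implementation: the function names, the window length of 16 frames, the stride of 8, the 4x16x16 patch size, the zero-padding of the final window, and the synthetic input clip are all assumptions chosen for illustration.

```python
import numpy as np


def sliding_windows(video, window=16, stride=8):
    """Cut a (T, H, W) clip into overlapping windows of `window` frames.

    The tail is zero-padded so every frame falls inside some window
    (the padding scheme is an assumption, not taken from the paper).
    """
    t = video.shape[0]
    # Number of windows needed to cover all t frames.
    n = 1 if t <= window else 1 + int(np.ceil((t - window) / stride))
    pad = (n - 1) * stride + window - t
    if pad > 0:
        zeros = np.zeros((pad,) + video.shape[1:], dtype=video.dtype)
        video = np.concatenate([video, zeros], axis=0)
    return np.stack([video[i * stride:i * stride + window] for i in range(n)])


def tokenize_3d(clip, patch=(4, 16, 16)):
    """Flatten a (T, H, W) window into non-overlapping 3D patch tokens."""
    t, h, w = clip.shape
    pt, ph, pw = patch
    if t % pt or h % ph or w % pw:
        raise ValueError("window shape must be divisible by the patch shape")
    # Split each axis into (number of patches, patch size).
    blocks = clip.reshape(t // pt, pt, h // ph, ph, w // pw, pw)
    # Bring the three patch-index axes to the front, then flatten each patch.
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5)
    return blocks.reshape(-1, pt * ph * pw)


# A synthetic 37-frame, 64x64 grayscale clip stands in for a VFSS sequence.
video = np.random.default_rng(0).random((37, 64, 64), dtype=np.float32)
windows = sliding_windows(video)
tokens = np.stack([tokenize_3d(w) for w in windows])
print(windows.shape, tokens.shape)
```

With these assumed settings, the 37-frame clip becomes 4 windows of 16 frames, and each window becomes 64 tokens of 1024 values. In a ViT, each token would then be linearly projected and given a positional embedding before entering the transformer encoder. Because every window has the same shape, sequences of any length can be processed by the same model.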
This study introduces a Vision Transformer (ViT) model that automates the classification of videofluoroscopic swallowing studies (VFSS) for dysphagia and detects abnormalities more accurately than traditional CNNs.
Friday, December 12, 2025