Dysphagia, defined as difficulty moving liquid or solid boluses from the mouth to the stomach, affects a significant portion of the general population. Its prevalence is notably higher among older adults, stroke survivors, individuals with neurodegenerative diseases such as Alzheimer's and Parkinson's, and patients with head and neck cancers.

The consequences of dysphagia are extensive and severe. Social isolation, malnutrition, and dehydration are common, as are reduced muscle strength and immobility. Most critically, dysphagia can lead to aspiration pneumonia, which substantially increases morbidity and mortality. The condition also places a heavy burden on healthcare systems and caregivers and often requires major lifestyle changes for patients and their families.

Early detection of dysphagia is crucial for selecting appropriate treatment in time. Effective interventions include swallowing exercises, compensatory swallowing strategies, bolus consistency modification, and education for patients and caregivers. These measures speed the recovery of overall health and reduce the effort and cost of rehabilitation, so identifying dysphagia early both improves patient outcomes and eases the burden on healthcare systems.

The videofluoroscopic swallowing study (VFSS) is the preferred instrumental method for assessing dysphagia because it provides real-time visualization of bolus movement and of the dynamics of the anatomical structures involved in swallowing. It can also detect silent aspiration, which occurs without a cough response and may therefore be missed by bedside screening. During a VFSS, a speech-language pathologist (SLP) mixes food or liquids with a contrast agent (barium sulfate) and instructs the patient to swallow while a radiologist records the process as X-ray video. The SLP then reviews the recordings to evaluate airway protection and assess the swallowing process.

Figure: VFSS swallow (animation)

Analyzing VFSS recordings frame by frame is labor-intensive and subjective, and the same exam often has to be reviewed several times, which invites fatigue and human error. Studies have shown inconsistent judgments among experts, particularly for the biomechanical aspects of the pharyngeal phase. This phase is also brief, so assessing it depends on precise frame selection and close monitoring. These limitations motivate an automated approach to detecting and classifying airway invasion.

Recently, our group introduced a method to detect penetration (bolus entering the laryngeal vestibule, above the vocal folds) or aspiration (bolus passing below the vocal folds) using a long-term recurrent convolutional network (LRCN). The method relies on a division of labor: the convolutional layers capture the spatial dependencies within each frame, while the long short-term memory (LSTM) blocks model the temporal dependencies across frames. Swallowing mechanics are intricate, involving more than 30 nerves and muscles, some of which activate involuntarily. The model must therefore relate all of these structures to one another and track how those relationships change over time. This is why we opted for an LSTM-based architecture, which is well suited to learning the sequential nature of swallowing. The model achieved a classification accuracy of 85% and an area under the curve (AUC) of 0.89, a promising result.
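To make the spatial/temporal split concrete, here is a minimal, untrained sketch of an LRCN-style forward pass in plain Python. It is not our published model: a single 3x3 convolution with ReLU and global average pooling stands in for the convolutional layers, and the frame size, number of kernels, hidden size, random weights, and all function names (`conv2d_relu`, `frame_features`, `LSTMCell`, `lrcn_predict`) are hypothetical choices for illustration. A real implementation would use a deep-learning framework and trained weights.

```python
import math
import random


def conv2d_relu(frame, kernel):
    """Valid 2-D cross-correlation (what CNN layers compute) followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(frame) - kh + 1):
        row = []
        for j in range(len(frame[0]) - kw + 1):
            s = sum(frame[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(max(0.0, s))
        out.append(row)
    return out


def frame_features(frame, kernels):
    """Spatial stage: one feature per kernel via global average pooling."""
    feats = []
    for k in kernels:
        fmap = conv2d_relu(frame, k)
        feats.append(sum(map(sum, fmap)) / (len(fmap) * len(fmap[0])))
    return feats


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Temporal stage: a standard LSTM cell with input, forget, and output gates."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        n_cols = n_in + n_hidden + 1  # current input, previous hidden state, bias
        # One weight matrix each for the i, f, o gates and the candidate state c
        self.W = {gate: [[rng.uniform(-0.5, 0.5) for _ in range(n_cols)]
                         for _ in range(n_hidden)] for gate in "ifoc"}

    def step(self, x, h, c):
        z = x + h + [1.0]
        pre = {gate: [sum(w * v for w, v in zip(row, z)) for row in self.W[gate]]
               for gate in "ifoc"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["c"]]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c


def lrcn_predict(frames, kernels, cell, w_out, b_out):
    """Feed per-frame CNN features through the LSTM; classify the final state."""
    h = [0.0] * cell.n_hidden
    c = [0.0] * cell.n_hidden
    for frame in frames:
        h, c = cell.step(frame_features(frame, kernels), h, c)
    return sigmoid(sum(w * v for w, v in zip(w_out, h)) + b_out)


# Usage with random (hypothetical) weights and a synthetic 10-frame, 16x16 clip
rng = random.Random(0)
kernels = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)] for _ in range(4)]
cell = LSTMCell(n_in=len(kernels), n_hidden=8, rng=rng)
w_out = [rng.uniform(-1, 1) for _ in range(8)]
clip = [[[rng.random() for _ in range(16)] for _ in range(16)] for _ in range(10)]
p = lrcn_predict(clip, kernels, cell, w_out, b_out=0.0)
print(f"P(airway invasion) = {p:.3f}")  # untrained weights, so no clinical meaning
```

The design choice shows up in `lrcn_predict`: the convolutional stage sees one frame at a time, and only the LSTM state carries information between frames, so the prediction depends on the order of the frames as well as their content.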

LRCN Architecture

LRCN Area-Under-Curve
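The 85% accuracy and 0.89 AUC reported above summarize the model's predictions on evaluation clips. As a sketch of how such metrics are typically computed, assuming binary labels (1 = penetration or aspiration), one score in [0, 1] per clip, a 0.5 decision threshold, and the ROC curve, both fit in a few lines of plain Python. The function names and the toy data are hypothetical and are not study data.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of clips whose thresholded score matches the label."""
    correct = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels, scores):
    """ROC AUC: probability that a random positive outscores a random negative.

    Ties count as one half (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values, not study data): 1 = airway invasion
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(accuracy(labels, scores))  # 4 of 6 clips correct -> 0.666...
print(roc_auc(labels, scores))   # 8 of 9 pairs ranked correctly -> 0.888...
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds. The pairwise loop is O(P*N) and written for readability; a library routine such as scikit-learn's `sklearn.metrics.roc_auc_score` computes the same quantity more efficiently.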

In future work, we aim to extend this approach and further improve its classification performance. For more information, please contact Hesam Abdolmotalleby.