A persistent challenge in the treatment of head and neck squamous cell carcinoma is understanding the spread of both clinical and subclinical regional metastases. The presence of nodal metastasis is a significant prognostic indicator and directly drives treatment options. While the pattern and distribution of nodal metastasis have been well established, few studies report the probability of metastasis by location. Here, we use machine learning in the form of Bayesian Networks to create a robust and predictive model for regional spread of squamous cell carcinoma of the tongue. One of the significant challenges in modeling tumor metastasis is the paucity of data with sufficient variables to fully describe spread. Because many studies focus on the distribution of spread rather than its probability, they report datasets that are insufficient for training robust models. Our approach offers a means to combine small but ‘complete’ datasets with heterogeneous ‘partial’ datasets from disparate studies to train a single model.
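As an illustration of how complete and partial records can inform one set of parameters, the following is a minimal sketch using expectation-maximization (EM) on a toy two-node network. The abstract does not state which estimation procedure was used, so EM is an assumption here, as are the binary variables `t` (a simplified stage indicator) and `n` (nodal metastasis present), the function name `fit_em`, and all data values.

```python
# Toy two-node Bayesian network: T (stage, 0 or 1) -> N (nodal metastasis, 0 or 1).
# Hypothetical sketch only; EM is assumed, not taken from the study.

def fit_em(complete, partial, n_iter=200):
    """Estimate P(T=1) and P(N=1 | T=t).

    complete: list of (t, n) pairs with both variables observed.
    partial:  list of n values for records whose stage t is unobserved.
    """
    pi = 0.5            # initial guess for P(T = 1)
    theta = [0.3, 0.6]  # initial guess for P(N = 1 | T = t)
    for _ in range(n_iter):
        # E-step: accumulate expected counts.
        cnt_t = [0.0, 0.0]   # expected number of records with T = t
        cnt_tn = [0.0, 0.0]  # expected number with T = t and N = 1
        for t, n in complete:
            cnt_t[t] += 1.0
            cnt_tn[t] += n
        for n in partial:
            # Joint probability of (T = t, N = n) under current parameters.
            like = [
                (1 - pi) * (theta[0] if n else 1 - theta[0]),
                pi * (theta[1] if n else 1 - theta[1]),
            ]
            z = like[0] + like[1]
            for t in (0, 1):
                w = like[t] / z  # posterior P(T = t | N = n)
                cnt_t[t] += w
                cnt_tn[t] += w * n
        # M-step: re-estimate parameters from the expected counts.
        pi = cnt_t[1] / (cnt_t[0] + cnt_t[1])
        theta = [cnt_tn[t] / cnt_t[t] for t in (0, 1)]
    return pi, theta


# Made-up records for illustration only.
complete_records = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0)]
partial_records = [1, 1, 0, 1]
pi_hat, theta_hat = fit_em(complete_records, partial_records)
```

With an empty `partial` list the procedure reduces to counting frequencies in the complete records; the partial records shift the estimates only through their expected contribution to each stage.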
The medical records of 54 previously untreated patients with non-recurrent squamous cell carcinoma of the lateral tongue were combined with reports from previously published articles for a combined study size of 303 patients. Data from the 54 new patients were deemed ‘complete’, as each patient had information on the location of the primary site, the stage of the primary site (T1-4) and the site of metastasis. Data reported in previously published articles were deemed ‘partial’, as one of the aforementioned categories was either missing or reported only in aggregate. From these datasets, 223 subjects were used to train the model, that is, to iteratively define its parameters. These subjects included the new patients from the University of Washington as well as ‘partial’ data obtained from three of the previously published datasets. External validity was assessed by testing the trained model against the fourth (and largest) previously published dataset, which had a sample size of 80. Our Bayesian model demonstrated good internal and external validity, with areas under the ROC curve of 0.89 and 0.83, respectively. Thus, we demonstrate an approach whereby a Bayesian Network can be trained using disparate small datasets in order to create a robust model for regional metastasis of tongue squamous cell carcinoma. This approach also forms the theoretical foundation for the development of models for other primary sites, including the possibility of combining several models from different primary sites to create an overarching model for the metastasis of head and neck squamous cell carcinoma.
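For readers unfamiliar with the validity metric reported above, the area under the ROC curve can be computed as the fraction of (positive, negative) case pairs in which the model assigns the higher predicted probability to the positive case, with ties counted as one half. The sketch below is a generic illustration of that definition; the function name `roc_auc` and the example labels and scores are hypothetical and are not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 outcomes (1 = metastasis present).
    scores: iterable of predicted probabilities, same length as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive case ranked above negative case
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up outcomes and predicted probabilities for illustration only.
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.5, 0.1]
example_auc = roc_auc(example_labels, example_scores)
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of positive and negative cases.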