: Head and neck squamous cell carcinoma (HNSCC) is among the most prevalent neoplasms worldwide. As in other malignancies, the TNM System is the main tool for predicting 5-year overall survival; it is based on anatomical, histological, and clinical information, as well as data on human papillomavirus (HPV) infection. However, the system is criticized for grouping patients into heterogeneous categories and for being unable to predict individualized probabilities of death.
: To create a 5-year mortality prediction model, applicable at first presentation, that performs better than the TNM System, and to evaluate the incremental contribution of HPV status to this model.
: A retrospective study of data prospectively collected between July 2000 and August 2011 by the GENCAPO study group. Data were mined using Classification and Regression Trees, Random Forests, and Boosted Regression Trees. Logistic Regression (LR) and Artificial Neural Network (ANN) models were created for all tumor sites combined and for each site separately. The models were internally validated and assessed for discrimination, by the area under the ROC curve (AUC), and for calibration, by a goodness-of-fit test. Paraffin-embedded tumor samples from 692 patients were tested for HPV DNA by polymerase chain reaction (PCR). The contribution of HPV status to model performance was analyzed using the coefficient of determination and the AUC.
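The two performance measures named above can be illustrated with a minimal sketch. It is not the study's code: the function names are hypothetical, and the Hosmer-Lemeshow statistic is assumed here as one common goodness-of-fit test, since the specific test used is not stated.

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney formulation.

    This is the probability that a randomly chosen patient who died (label 1)
    has a higher predicted risk than a randomly chosen survivor (label 0);
    tied scores count as one half.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow(y_true: Sequence[int], y_prob: Sequence[float],
                    groups: int = 10) -> float:
    """Hosmer-Lemeshow chi-square statistic (assumed goodness-of-fit test).

    Patients are sorted by predicted risk and split into `groups` bins of
    near-equal size; observed and expected deaths are compared per bin.
    Smaller values indicate better calibration.
    """
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected deaths in this bin
        observed = sum(y for _, y in chunk)   # observed deaths in this bin
        mean_p = expected / size
        if 0.0 < mean_p < 1.0:                # skip degenerate bins
            stat += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return stat


# Toy data, not study data: 1 = died within 5 years, 0 = survived.
toy_outcome = [0, 0, 1, 1]
toy_risk = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_outcome, toy_risk)
```

Under this sketch, the incremental value of a predictor such as HPV status could be gauged by computing `auc` for the predicted risks of a model fitted with and without that variable and comparing the two values.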
: A total of 1811 patients were analyzed; HPV was positive in 8.7% of all tumors and in 11.8% of oropharyngeal tumors. Data mining selected 28 predictive variables. The LR model for all sites retained tumor volume, sex, age, and lymph node enlargement as significant variables, with an AUC of 0.77, while the ANN model obtained an AUC of 0.76. The LR models for oral cavity and oropharynx tumors retained the same variables as the all-sites model, the LR model for larynx excluded sex, and the LR model for hypopharynx retained only age and lymph node enlargement. The AUCs were 0.78 (LR) and 0.74 (ANN) for the oral cavity; 0.72 (LR) and 0.65 (ANN) for the oropharynx; 0.45 (LR) and 0.58 (ANN) for the hypopharynx; and 0.71 (LR) and 0.76 (ANN) for the larynx. All models, except those for the hypopharynx, presented good calibration. Survival curves according to the quartiles of predicted risk of death (Q1 ≤ 25%, 25% < Q2 ≤ 50%, 50% < Q3 ≤ 75%, Q4 > 75%) were compared with survival curves according to TNM stage. The quartile curves showed better visual separation than the TNM stage curves (Tables 1-4). Adding HPV status did not improve the prediction of any of the models.
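The risk grouping used for the survival curves can be sketched as follows. This is an illustration only: the cut-offs at 25%, 50%, and 75% predicted risk and the function names are assumptions, and the toy values are not study data.

```python
from collections import defaultdict
from typing import Dict, Sequence


def risk_band(p: float) -> str:
    """Assign a predicted probability of death to one of four risk groups.

    Assumed cut-offs: Q1 <= 0.25 < Q2 <= 0.50 < Q3 <= 0.75 < Q4.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("predicted risk must lie in [0, 1]")
    if p <= 0.25:
        return "Q1"
    if p <= 0.50:
        return "Q2"
    if p <= 0.75:
        return "Q3"
    return "Q4"


def observed_mortality_by_band(probs: Sequence[float],
                               died: Sequence[int]) -> Dict[str, float]:
    """Observed 5-year death rate within each predicted-risk group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [deaths, patients]
    for p, d in zip(probs, died):
        band = risk_band(p)
        counts[band][0] += int(d)
        counts[band][1] += 1
    return {band: deaths / n for band, (deaths, n) in sorted(counts.items())}


# Toy data: predicted risks and outcomes (1 = died within 5 years).
toy_rates = observed_mortality_by_band([0.1, 0.2, 0.9, 0.8], [0, 1, 1, 1])
```

If the groups separate well, the observed death rate should rise steadily from Q1 to Q4, which is the pattern the survival curves are meant to show.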
: Using easily retrievable information, it was possible to develop well-performing models for predicting 5-year survival that were superior to the TNM System. Adding HPV status to the models did not improve prediction in our study population.