Using Laboratory Data for Prediction of 30-Day Hospital Readmission
Abstract: Prior work on readmission risk prediction has under-utilized laboratory data, which may provide valuable information about a patient’s condition. We aim to assess the contribution of laboratory data to predicting readmission risk. Preliminary work has focused on pediatric seizure patients: seizure accounts for the highest volume of pediatric readmissions, yet no readmission risk factors have been identified for it.
We used ICD-9 codes to identify seizure-specific visits to Children’s Hospital of Pittsburgh of UPMC during 2007-2012. A patient was considered readmitted if they returned to the hospital within 30 days of discharge. We extracted features summarizing each patient’s laboratory data. Using a training dataset (2007-2011), we ranked these features by information gain ratio and added them to a baseline model in rank order, keeping only those that improved prediction accuracy under 10-fold cross-validation. A test dataset (2012) was then used to compare the area under the ROC curve (AUROC) of the baseline model with that of the model including the added laboratory features. Adding the laboratory features significantly improved the model’s predictive ability, which suggests that laboratory data may be useful for identifying patients at risk of readmission. Ongoing work examines the contribution of laboratory data to readmission risk in adult heart failure patients.
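The feature-ranking and forward-selection steps described above can be sketched in Python as follows. This is a minimal, stdlib-only illustration under stated assumptions and not the study’s code. The discretized laboratory features (`abnormal_sodium`, `wbc_bin`, `any_lab_drawn`) and the labels are hypothetical. Gain ratio follows the common C4.5-style definition: information gain divided by the feature’s split information. A toy lookup-table classifier stands in for the baseline model, which the abstract does not name.

```python
from collections import Counter
from math import log2
from typing import Callable, Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of `labels` from a discrete `feature`, divided by the
    feature's split information (its own entropy)."""
    n = len(labels)
    groups: dict[Hashable, list[Hashable]] = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Conditional entropy H(labels | feature), weighted by group size.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(feature)
    return gain / split_info if split_info > 0 else 0.0


def cv_accuracy(rows, labels, feats, k=10):
    """k-fold cross-validated accuracy of a toy lookup-table classifier (a
    hypothetical stand-in for the study's model) that predicts the majority
    training label for each combination of feature values."""
    n = len(rows)
    k = min(k, n)
    correct = 0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]  # round-robin folds
        test = [i for i in range(n) if i % k == fold]
        table: dict[tuple, list] = {}
        for i in train:
            table.setdefault(tuple(rows[i][f] for f in feats), []).append(labels[i])
        fallback = Counter(labels[i] for i in train).most_common(1)[0][0]
        for i in test:
            seen = table.get(tuple(rows[i][f] for f in feats))
            pred = Counter(seen).most_common(1)[0][0] if seen else fallback
            correct += pred == labels[i]
    return correct / n


def forward_select(baseline, ranked, evaluate: Callable[[list], float]):
    """Add ranked features one at a time; keep a feature only if it improves the score."""
    selected = list(baseline)
    best = evaluate(selected)
    for feat in ranked:
        score = evaluate(selected + [feat])
        if score > best:
            selected, best = selected + [feat], score
    return selected


# Hypothetical toy data: 8 patients, 1 = readmitted within 30 days of discharge.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "abnormal_sodium": ["yes"] * 4 + ["no"] * 4,
    "wbc_bin": ["high", "low"] * 4,
    "any_lab_drawn": ["yes"] * 8,
}
rows = [{name: col[i] for name, col in columns.items()} for i in range(len(labels))]

# Rank by gain ratio (highest first), then select greedily in rank order.
ranked = sorted(columns, key=lambda name: gain_ratio(columns[name], labels), reverse=True)
selected = forward_select([], ranked, lambda feats: cv_accuracy(rows, labels, feats))
print(ranked)    # ['abnormal_sodium', 'wbc_bin', 'any_lab_drawn']
print(selected)  # ['abnormal_sodium']
```

In the study, ranking and selection used only the 2007-2011 training data. The 2012 data were held back for the AUROC comparison, which this sketch does not cover.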
Using Deep Learning to Find Low-Dimensional Representations of TCGA Gene Expression Data
Jonathan Young, MD, Masters Fellow
Abstract: Understanding the cellular signal transduction pathways that drive cells to become cancerous is fundamental to developing personalized cancer therapies that reduce cancer morbidity and mortality. The purpose of this study was to develop an unsupervised deep learning model that finds meaningful, low-dimensional representations of lung cancer gene expression data from The Cancer Genome Atlas (TCGA). Clustering these low-dimensional representations provided more insight into patient survival than clustering the high-dimensional input data did.
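The abstract does not specify the model’s architecture. As an illustration only, the sketch below assumes a one-hidden-layer autoencoder with a tanh encoder and a linear decoder, trained by plain full-batch gradient descent in NumPy, and then applies k-means to the learned codes. The toy "expression" matrix, its two gene-block structure, `code_dim`, the learning rate, and the number of clusters are all hypothetical. A real analysis of TCGA data would involve far more genes and most likely a larger model built with a deep learning framework.

```python
import numpy as np


def train_autoencoder(X, code_dim=2, lr=0.05, epochs=2000, seed=0):
    """Train a one-hidden-layer autoencoder (tanh encoder, linear decoder) by
    full-batch gradient descent on mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1 = rng.normal(scale=0.1, size=(d, code_dim))
    b1 = np.zeros(code_dim)
    W2 = rng.normal(scale=0.1, size=(code_dim, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)        # codes, shape (n, code_dim)
        err = H @ W2 + b2 - X           # reconstruction error, shape (n, d)
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared error through decoder and encoder.
        g_out = 2 * err / err.size
        g_hidden = (g_out @ W2.T) * (1 - H ** 2)  # tanh'(a) = 1 - tanh(a)^2
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_hidden)
        b1 -= lr * g_hidden.sum(axis=0)

    def encode(Z):
        return np.tanh(Z @ W1 + b1)

    return encode, losses


def kmeans(Z, k=2, iters=100):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [Z[0]]
    for _ in range(1, k):
        d2 = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(iters):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        new = np.array([Z[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return assign


# Hypothetical toy "expression" matrix: 40 samples x 6 genes, with two sample
# groups driven by different gene blocks.
rng = np.random.default_rng(1)
group = np.repeat([0, 1], 20)
block_means = np.array([[2, 2, 2, 0, 0, 0], [0, 0, 0, 2, 2, 2]], dtype=float)
X = block_means[group] + rng.normal(scale=0.1, size=(40, 6))

encode, losses = train_autoencoder(X)
codes = encode(X)                 # low-dimensional representation, shape (40, 2)
code_clusters = kmeans(codes)     # clusters in the learned space
raw_clusters = kmeans(X)          # baseline: clusters on the raw 6-gene input
print(losses[0], losses[-1])
```

The next step would be to compare `code_clusters` and `raw_clusters` against patient survival, for example with Kaplan-Meier curves and a log-rank test. The toy example includes no survival data.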