From Double Edges to One: Advancing Health Equity in Clinical Data Science (Newman-Griffis) & Leveraging Causal Modeling to Predict COPD Progression (Gregg)

Seminar Date: 
Seminar Time: 
11am - 12pm
Seminar Location: 
Presenter: 
Denis Newman-Griffis, PhD & Robert Gregg, PhD
Presenter's Institution: 
University of Pittsburgh


(Newman-Griffis) As data science technologies proliferate in healthcare and see increasing deployment in real-world settings, they present both a challenge and an opportunity for improving health equity for all populations. Data science technologies, including artificial intelligence, enable the processing of complex, highly multivariate data about all aspects of health and health experience, allowing us to take advantage of far more precise information than humans alone can process. At the same time, without careful design, data science technologies can perpetuate and exacerbate existing inequities in healthcare access and quality. I will describe recent work and ongoing directions on advancing two aspects of equity in natural language processing and clinical data science: developing informatics methods for analyzing information on the lived experience of function and disability, and analyzing racial and gender inequities in clinical documentation practice. I will show how careful design and continuous critical analysis can help to develop data science pipelines grounded in principles of health equity, and highlight opportunities to collaborate with the DBMI community going forward.


(Gregg) Chronic obstructive pulmonary disease (COPD) is a complex, heterogeneous lung condition that claims the lives of 3 million people each year, making it the fourth leading cause of death worldwide. The disease is characterized by coughing exacerbations and dyspnea attributable to inflamed bronchial tubes (bronchitis) and damaged alveoli (emphysema). Previous efforts to predict progression (as defined by GOLD Stage) have seen limited success because COPD patients can exhibit radically different disease trajectories. In this study, we employ probabilistic graphical modeling to determine which clinical, genetic, and radiological variables measured in the COPDGene® study exhibit a causal relationship to initial COPD progression. In turn, these variables are used as predictors in supervised machine learning models to differentiate between patients with slow and rapid disease progression. Of the 115 variables, 8 were identified as being causally linked to early-stage COPD progression, with spirometry measurements and airway wall thickness being the most prominent. Fitting a logistic regression model resulted in an AUROC of 0.76, demonstrating some meaningful predictability from the selected variables. More sophisticated methods, such as random forest models, were explored but did not significantly increase AUROC. Overall, this study takes an unbiased approach to determining which measurements both differentiate and predict the early stages of COPD progression.