A multivariate probabilistic method for comparing two clinical datasets
Sverchkov Y, Visweswaran S, Clermont G, Hauskrecht M, Cooper GF. A multivariate probabilistic method for comparing two clinical datasets. In: Proceedings of the ACM International Health Informatics Symposium (2012) 795-800.
We present a novel method for obtaining a concise and mathematically grounded description of multivariate differences between a pair of clinical datasets. Often data collected under similar circumstances reflect fundamentally different patterns. For example, information about patients undergoing similar treatments in different intensive care units (ICUs), or within the same ICU during different periods, may show systematically different outcomes. In such circumstances, the multivariate probability distributions induced by the datasets would differ in selected ways. To capture the probabilistic relationships, we learn a Bayesian network (BN) from the union of the two datasets. We include an indicator variable that represents the dataset from which a given patient record is obtained. We then extract the relevant conditional distributions from the network by finding the conditional probabilities that differ most when conditioning on the indicator variable. Our work is a form of explanation that bears some similarity to previous work on BN explanation; however, while previous work has mostly focused on justifying inference, our work is aimed at explaining multivariate differences between distributions.
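The extraction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it skips BN structure learning and assumes the parent set of the target variable is already known, it estimates conditional probabilities by smoothed counting, and it uses total variation distance as the measure of difference, which is an assumption made here for illustration. The variable names (`D`, `vent`, `death`) and the toy records are synthetic and hypothetical.

```python
from collections import Counter, defaultdict


def conditional_tables(records, target, parents, indicator="D"):
    """Estimate P(target | parents, indicator) by counting with add-one smoothing."""
    values = sorted({r[target] for r in records})
    counts = defaultdict(Counter)
    for r in records:
        context = tuple(r[p] for p in parents)
        counts[(context, r[indicator])][r[target]] += 1
    tables = {}
    for key, c in counts.items():
        total = sum(c.values()) + len(values)
        tables[key] = {v: (c[v] + 1) / total for v in values}
    return tables


def rank_differences(records, target, parents, indicator="D"):
    """Rank parent contexts by how much P(target | context) differs between datasets.

    Total variation distance is an illustrative choice, not taken from the paper.
    """
    tables = conditional_tables(records, target, parents, indicator)
    contexts = {context for context, _ in tables}
    diffs = []
    for context in contexts:
        if (context, 0) in tables and (context, 1) in tables:
            p, q = tables[(context, 0)], tables[(context, 1)]
            tv = 0.5 * sum(abs(p[v] - q[v]) for v in p)
            diffs.append((tv, context))
    return sorted(diffs, key=lambda item: item[0], reverse=True)


def make_records(dataset, vent, n_total, n_deaths):
    """Build synthetic records; the indicator D marks the source dataset."""
    return [
        {"D": dataset, "vent": vent, "death": "yes" if i < n_deaths else "no"}
        for i in range(n_total)
    ]


# Synthetic union of two datasets (D=0 and D=1), for illustration only.
records = (
    make_records(0, "yes", 10, 2)
    + make_records(0, "no", 10, 1)
    + make_records(1, "yes", 10, 7)
    + make_records(1, "no", 10, 1)
)

ranked = rank_differences(records, target="death", parents=["vent"])
for tv, context in ranked:
    print(context, round(tv, 3))
```

In this toy example the context with the largest distance is the one where the outcome distribution changes between the two datasets, which is the kind of conditional difference the method aims to surface. In the setting the abstract describes, the parent sets would come from the BN learned on the union of the datasets rather than being fixed by hand.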