Kummerfeld E, Cooper GF. A new method for estimating causal model learning accuracy. In: Workshop on Data Mining for Medical Informatics (DMMI) – Causal Inference for Health Data Analytics (2017).
After learning a causal model by running a causal discovery algorithm on a real-world data set, investigators would
like to estimate the accuracy of the learned causal model. It is difficult to obtain an accurate estimate, however,
since knowledge of the causal truth, which serves as the gold standard, is often incomplete and sometimes incorrect.
One method that is sometimes used to estimate accuracy is resimulation. Resimulation is a process in which data
are generated (simulated) from the learned model. An accuracy estimate is then obtained by running the discovery
algorithm on this resimulated data set, and comparing its output model against the model that was used to generate
the resimulated data. This paper introduces a new method called hybrid resimulation (Hsim) that estimates causal
learning accuracy by using a learning data set that contains both real and resimulated data. In a simulation study, we
show the difficulty of graph accuracy estimation, and we compare the performance of Hsim to that of standard full
resimulation. The results indicate that Hsim provides a better estimate of algorithm accuracy when the underlying
causal mechanisms are nonlinear. We also compare these methods in a case study using the Breast Cancer Wisconsin
(Diagnostic) Data Set.
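
The standard full resimulation loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_discovery` is a hypothetical stand-in for a real causal discovery algorithm (it assumes a known variable order and at most one parent per variable), the learned model is assumed to be linear-Gaussian, and the "real" data in the demo are themselves synthetic. Hsim, which mixes real and resimulated data, is not shown because the abstract does not specify its details.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(rows, i, j):
    mi, mj = mean([r[i] for r in rows]), mean([r[j] for r in rows])
    return sum((r[i] - mi) * (r[j] - mj) for r in rows) / len(rows)

def corr(rows, i, j):
    return cov(rows, i, j) / (cov(rows, i, i) * cov(rows, j, j)) ** 0.5

def toy_discovery(rows, threshold=0.3):
    """Hypothetical stand-in for a discovery algorithm: each variable j gets
    its most strongly correlated predecessor i < j as its only parent,
    provided |corr| exceeds the threshold."""
    edges = set()
    for j in range(1, len(rows[0])):
        best = max(range(j), key=lambda i: abs(corr(rows, i, j)))
        if abs(corr(rows, best, j)) > threshold:
            edges.add((best, j))
    return edges

def fit_model(rows, edges):
    """Fit linear-Gaussian parameters (an assumption) for the learned graph.
    Returns one (parent, slope, intercept, noise_sd) tuple per variable."""
    parent_of = {child: parent for parent, child in edges}
    model = []
    for j in range(len(rows[0])):
        p = parent_of.get(j)
        mj = mean([r[j] for r in rows])
        if p is None:
            model.append((None, 0.0, mj, cov(rows, j, j) ** 0.5))
        else:
            slope = cov(rows, p, j) / cov(rows, p, p)
            resid_var = cov(rows, j, j) - slope * cov(rows, p, j)
            intercept = mj - slope * mean([r[p] for r in rows])
            model.append((p, slope, intercept, max(resid_var, 0.0) ** 0.5))
    return model

def simulate(model, n_rows, rng):
    """Generate resimulated data from the fitted model (parents precede children)."""
    rows = []
    for _ in range(n_rows):
        x = []
        for parent, slope, intercept, sd in model:
            base = intercept + (slope * x[parent] if parent is not None else 0.0)
            x.append(base + rng.gauss(0.0, sd))
        rows.append(x)
    return rows

def precision_recall(found, truth):
    tp = len(found & truth)
    precision = tp / len(found) if found else 1.0
    recall = tp / len(truth) if truth else 1.0
    return precision, recall

def resimulation_estimate(real_rows, rng):
    learned = toy_discovery(real_rows)              # 1. learn from the real data
    model = fit_model(real_rows, learned)           # 2. parameterize the learned graph
    resim_rows = simulate(model, len(real_rows), rng)  # 3. resimulate
    relearned = toy_discovery(resim_rows)           # 4. rerun discovery
    return learned, precision_recall(relearned, learned)  # 5. compare to generator

# Demo on synthetic "real" data from the chain X0 -> X1 -> X2.
rng = random.Random(0)
real_rows = []
for _ in range(500):
    x0 = rng.gauss(0, 1)
    x1 = x0 + rng.gauss(0, 1)
    x2 = x1 + rng.gauss(0, 1)
    real_rows.append([x0, x1, x2])

learned, (precision, recall) = resimulation_estimate(real_rows, rng)
print(learned, precision, recall)
```

Note that the estimate measures how well the algorithm recovers the learned graph from data that match the fitted model's assumptions exactly. When the real mechanisms differ from those assumptions (for example, when they are nonlinear), the estimate may not reflect the algorithm's accuracy on the real data, which is the gap that motivates Hsim.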