A simulation study of three related causal data mining algorithms
Mani S, Cooper GF. A simulation study of three related causal data mining algorithms. In: Proceedings of the International Workshop on Artificial Intelligence and Statistics (2001) 73-80.
Causality plays a significant role in all scientific domains. This study focused on evaluating and refining efficient algorithms to learn causal relationships from observational data. Evaluating learned causal output is difficult because real-world domains lack a gold standard. Therefore, we used simulated data from a known causal network in a medical domain, the Alarm network. For causal discovery we used three variants of the Local Causal Discovery (LCD) algorithm, referred to as LCDa, LCDb, and LCDc. These algorithms use the framework of causal Bayesian networks to represent causal relationships among model variables. LCDa, LCDb, and LCDc take as input a dataset and a partial node ordering, and output purported causal relationships of the form "variable Y causally influences variable Z." Using the simulated Alarm dataset as input, LCDa had a false positive rate of 0.09, LCDb 0.08, and LCDc 0.04. All three algorithms had a true positive rate of about 0.27. Most of the false positives occurred when a causal relationship was confounded. LCDc output as causal only those causally confounded pairs that had very weak confounding. We identify and discuss the causally confounded relationships that often seem to induce false positive results.
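The true and false positive rates reported above depend on comparing the output pairs against the known generating network. The following is a minimal sketch of such a comparison, not the authors' evaluation code. The function name `causal_pair_rates`, the toy variables, and the choice of denominators (all ordered pairs of distinct variables, split into causal and non-causal according to the gold standard) are assumptions made for illustration; the abstract does not state how the study defined its candidate pairs.

```python
from itertools import permutations


def causal_pair_rates(variables, true_pairs, predicted_pairs):
    """Return (true_positive_rate, false_positive_rate) for ordered pairs.

    A pair (y, z) is read as "y causally influences z".
    Assumption: every ordered pair of distinct variables is a candidate,
    and any candidate not in true_pairs counts as a negative.
    """
    true_pairs = set(true_pairs)
    predicted_pairs = set(predicted_pairs)
    candidates = set(permutations(variables, 2))
    negatives = candidates - true_pairs

    true_positives = len(predicted_pairs & true_pairs)
    false_positives = len(predicted_pairs & negatives)

    tpr = true_positives / len(true_pairs) if true_pairs else 0.0
    fpr = false_positives / len(negatives) if negatives else 0.0
    return tpr, fpr


# Hypothetical toy network (not the Alarm network): A -> B -> C, so A also
# influences C indirectly.
variables = ["A", "B", "C", "D"]
true_pairs = {("A", "B"), ("B", "C"), ("A", "C")}
predicted_pairs = {("A", "B"), ("C", "D")}

tpr, fpr = causal_pair_rates(variables, true_pairs, predicted_pairs)
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```

In this toy case one of the three true pairs is recovered and one of the nine non-causal pairs is wrongly reported. A different definition of the candidate set, for example one restricted by the partial node ordering, would change both denominators.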