Bayesian rule learning for biomedical data mining
Motivation: Disease state prediction from biomarker profiling studies is an important problem because more accurate classification models could lead to the discovery of better, more discriminative markers. Data mining methods are routinely applied to biomedical datasets generated by high-throughput 'omic' technologies from clinical samples of tissues or bodily fluids. Past work has shown that rule models can be applied successfully to this problem, because they produce understandable models that make it easier for biomedical scientists to review discriminative biomarkers. Many rule-based methods produce rules that make predictions under uncertainty, but they typically do not quantify the uncertainty in the validity of the rule itself. This article describes an approach that uses a Bayesian score to evaluate rule models.
Results: We have combined the expressiveness of rules with the mathematical rigor of Bayesian networks (BNs) to develop and evaluate a Bayesian rule learning (BRL) system. This system uses a novel variant of the K2 algorithm, which builds BNs from the training data, to provide probabilistic scores for IF-antecedent-THEN-consequent rules found by heuristic best-first search. We then apply rule-based inference to evaluate the learned models over two runs of 10-fold cross-validation. BRL is evaluated on 24 published 'omic' datasets, and on average it performs on par with or better than other readily available rule learning methods. Moreover, BRL produces models that contain on average 70% fewer variables than those of the other methods, which means that the biomarker panels for disease prediction contain fewer markers for bench scientists to verify and validate.