An evaluation of discretization methods for learning rules from biomedical datasets
Lustgarten, JL, Visweswaran, S, Grover, H, Gopalakrishnan, V. An evaluation of discretization methods for learning rules from biomedical datasets. In: Proceedings of the International Conference on Bioinformatics and Computational Biology (BIOCOMP-08) (July 2008).
Rule learning has the major advantage of producing models that human experts can understand, which matters when performing knowledge discovery in the biomedical domain. Many rule learning algorithms require discrete data in order to learn IF-THEN rule sets, which makes the selection of a discretization technique an important step in rule learning. We compare the performance of one standard technique, Fayyad and Irani’s Minimum Description Length Principle Criterion (MDLPC), which is the de facto discretization method in many machine learning packages, with that of a new Efficient Bayesian Discretization (EBD) method. We show that EBD leads to significant gains in performance, especially as the complexity of the rule learner increases.
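
For readers unfamiliar with the baseline, the following is a minimal sketch of how Fayyad and Irani's entropy-based discretization with the MDL stopping criterion is commonly described: recursively pick the class-entropy-minimizing cut point for one continuous feature and keep it only if the information gain exceeds an MDL-derived threshold. It is an illustration under stated assumptions, not the code evaluated in the paper, and it does not cover EBD. The function names (`entropy`, `mdlp_cut_points`, `_split`) and the choice of midpoints as cut values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def _split(xs, ys, cuts):
    """Recursively add accepted cut points for the sorted slice (xs, ys)."""
    n = len(ys)
    if n < 2:
        return
    ent_s = entropy(ys)
    best = None  # (gain, index, entropy_left, entropy_right)
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # a cut is only possible between distinct values
        e1, e2 = entropy(ys[:i]), entropy(ys[i:])
        gain = ent_s - (i / n) * e1 - ((n - i) / n) * e2
        if best is None or gain > best[0]:
            best = (gain, i, e1, e2)
    if best is None:
        return
    gain, i, e1, e2 = best
    # MDL stopping criterion: accept the cut only if
    # gain > (log2(n - 1) + delta) / n
    k, k1, k2 = len(set(ys)), len(set(ys[:i])), len(set(ys[i:]))
    delta = math.log2(3 ** k - 2) - (k * ent_s - k1 * e1 - k2 * e2)
    if gain > (math.log2(n - 1) + delta) / n:
        # Assumption: place the cut at the midpoint of the two neighbours.
        cuts.append((xs[i - 1] + xs[i]) / 2)
        _split(xs[:i], ys[:i], cuts)
        _split(xs[i:], ys[i:], cuts)


def mdlp_cut_points(values, labels):
    """Return the sorted cut points chosen for one continuous feature."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    xs = [v for v, _ in pairs]
    ys = [y for _, y in pairs]
    cuts = []
    _split(xs, ys, cuts)
    return sorted(cuts)


# Toy data: one cleanly separated feature and one with alternating labels.
separated = mdlp_cut_points(list(range(1, 21)), ["a"] * 10 + ["b"] * 10)
noisy = mdlp_cut_points([1, 2, 3, 4, 5, 6], ["a", "b", "a", "b", "a", "b"])
```

On the toy data, the cleanly separated feature receives a single cut between the two classes, while the alternating feature receives none because no candidate cut passes the threshold. A rule learner would then treat each resulting interval as one discrete value of the feature.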