Improving classification performance with discretization on biomedical datasets

Lustgarten, JL, Gopalakrishnan, V, Grover, H, Visweswaran, S. Improving classification performance with discretization on biomedical datasets. In: Proceedings of the Fall Symposium of the American Medical Informatics Association (Nov 2008) 445-9. PMID: 18999186 PMCID: PMC2656082

Discretization acts as a variable selection method in addition to transforming the continuous values of a variable into discrete ones. Machine learning algorithms such as Support Vector Machines and Random Forests have been used for classification in high-dimensional genomic and proteomic data because they are robust to the dimensionality of the data. We show that discretization can significantly improve the classification performance of these algorithms, as well as that of algorithms such as Naïve Bayes that are sensitive to the dimensionality of the data.
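To illustrate how discretization can double as variable selection, here is a minimal Python sketch. It is not the method evaluated in the paper, which this abstract does not name. It assumes a simplified supervised scheme: each variable gets at most one cut point, chosen by information gain, and a variable with no cut that reaches a threshold collapses to a single bin and is dropped. The names `best_cut`, `discretize_and_select`, and `min_gain`, and the toy data, are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels, min_gain=0.25):
    """Return the single cut point with the highest information gain,
    or None if no cut reaches min_gain (an assumed, illustrative threshold)."""
    pairs = sorted(zip(values, labels))
    ys = [y for _, y in pairs]
    base = entropy(ys)
    best_gain, best_point = 0.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between two identical values
        left, right = ys[:i], ys[i:]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(ys)
        gain = base - weighted
        if gain > best_gain:
            best_gain = gain
            best_point = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_point if best_gain >= min_gain else None


def discretize_and_select(columns, labels, min_gain=0.25):
    """Discretize each variable into two bins; drop variables with no useful cut."""
    kept = {}
    for name, values in columns.items():
        cut = best_cut(values, labels, min_gain)
        if cut is not None:  # a variable left with one bin carries no information
            kept[name] = (cut, [int(v > cut) for v in values])
    return kept


# Toy data (hypothetical): one variable separates the classes, one does not.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "informative": [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],
    "noise": [1.0, 3.0, 2.0, 4.0, 1.5, 3.5, 2.5, 4.5],
}
selected = discretize_and_select(columns, labels)
print(sorted(selected))  # names of the variables that survive discretization
```

In this sketch the surviving variables, now binary, would be passed to the classifier, so the dimensionality drops as a side effect of discretization. Practical supervised discretizers typically split recursively and use a principled stopping rule rather than a fixed gain threshold.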

Publication Year: 
2008
Publication Credits: 
Jonathan L. Lustgarten, MS. Vanathi Gopalakrishnan, PhD. Himanshu Grover, MS.