A novel approach to modeling multifactorial diseases using Ensemble Bayesian Rule classifiers

Balasubramanian JB, Boes RD, Gopalakrishnan V. A novel approach to modeling multifactorial diseases using Ensemble Bayesian Rule classifiers. Journal of Biomedical Informatics. 2020 Jul;107:103455. doi: 10.1016/j.jbi.2020.103455. PMID: 32497685

Modeling the factors that influence disease phenotypes from biomarker profiling study datasets is a critical task in biomedicine. Such datasets are typically generated from high-throughput 'omic' technologies, which help examine disease mechanisms at an unprecedented resolution. These datasets are challenging because they are high-dimensional. The disease mechanisms they study are also complex because many diseases are multifactorial, resulting from the collective activity of several factors, each with a small effect. Bayesian rule learning (BRL) is a rule model inferred from learning Bayesian networks from data, and has been shown to be effective in modeling high-dimensional datasets. However, BRL is not efficient at modeling multifactorial diseases since it suffers from data fragmentation during learning. In this paper, we overcome this limitation by implementing and evaluating three types of ensemble model combination strategies with BRL: uniform combination (UC; same as Bagging), Bayesian model averaging (BMA), and Bayesian model combination (BMC), collectively called Ensemble Bayesian Rule Learning (EBRL). We also introduce a novel method to visualize EBRL models, called the Bayesian Rule Ensemble Visualizing tool (BREVity), which helps extract and interpret the most important rule patterns guiding the predictions made by the ensemble model. Our results using twenty-five public, high-dimensional, gene expression datasets of multifactorial diseases suggest that both EBRL models using UC and BMC achieve better predictive performance than BMA and other classic machine learning methods. Furthermore, BMC is found to be more reliable than UC when the ensemble includes sub-optimal models resulting from the stochasticity of the model search process. Together, EBRL and BREVity provide researchers a promising and novel tool for modeling multifactorial diseases from high-dimensional datasets that leverages strengths of ensemble methods for predictive performance, while also providing interpretable explanations for its predictions.
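
As a rough illustration of how two of the combination strategies named above differ, the Python sketch below contrasts uniform combination with Bayesian model averaging on made-up numbers. It is not the authors' EBRL implementation: the function names, the member probabilities, and the log scores are all hypothetical, and the BMA weights assume a uniform prior over models so that each weight is proportional to the model's exponentiated log marginal likelihood. BMC, which places a posterior over combinations of models rather than over single models, is not shown.

```python
import math


def uniform_combination(member_probs):
    """UC: average the members' class-probability vectors with equal weights."""
    n_models = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[k] for p in member_probs) / n_models for k in range(n_classes)]


def bma_weights(log_scores):
    """BMA weights under an assumed uniform model prior.

    Each weight is proportional to exp(log marginal likelihood); the maximum
    is subtracted first so the exponentials do not underflow.
    """
    top = max(log_scores)
    unnormalized = [math.exp(s - top) for s in log_scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def weighted_combination(member_probs, weights):
    """Combine class-probability vectors using the given model weights."""
    n_classes = len(member_probs[0])
    return [
        sum(w * p[k] for w, p in zip(weights, member_probs))
        for k in range(n_classes)
    ]


# Hypothetical ensemble of three rule models predicting [P(case), P(control)].
member_probs = [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]]
# Hypothetical log marginal likelihoods of the three models.
log_scores = [-120.0, -125.0, -131.0]

uc = uniform_combination(member_probs)
w = bma_weights(log_scores)
bma = weighted_combination(member_probs, w)

print("UC prediction: ", uc)
print("BMA weights:   ", w)
print("BMA prediction:", bma)
```

With these illustrative scores, BMA places nearly all of its weight on the single highest-scoring model, whereas UC weights every member equally.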
 

Publication Year: 
2020
Publication Credits: 
Balasubramanian JB, Boes RD, Gopalakrishnan V.
Publication Download: 
doug.pdf (2.48 MB)