An Automated Data Science Assistant
Machine learning is commonly described as a “field of study that gives computers the ability to learn without being explicitly programmed” (Simon, 2013). Despite this description, practitioners know that designing effective machine learning pipelines is often tedious: it typically requires considerable experience with machine learning algorithms, expert knowledge of the problem domain, and brute-force search. Thus, contrary to what machine learning enthusiasts would have us believe, machine learning still requires considerable explicit programming and expertise. In response to this challenge, we have developed an automated machine learning (AutoML) method called the tree-based pipeline optimization tool (TPOT). We present TPOT and discuss it in the context of developing automated data science assistants for the analysis of complex biomedical data.