University of Pittsburgh School of Computing and Information
Big Data Colloquium
Monday, September 25
10:30 AM – 12:00 PM
Information Sciences Building, 3rd Floor Quite Study, 135 N. Bellefield Avenue
“THE SURPRISING POWER OF THOMPSON SAMPLING: ADVANCES IN BANDIT DRIVEN LEARNING AND OPTIMIZATION OF COMPLEX STOCHASTIC SYSTEMS”
Thompson Sampling based policies have emerged as a top performing approach to stochastic optimization in unknown environments. Some of the first theoretical results for the Multi-Armed Bandit Problem can be found in (Granmo, 2010), proving that Thompson Sampling is instantaneously self-correcting, converging to the optimal action with probability arbitrarily close to unity. These pioneering results were followed by an explosion of research that have broadened the scope of Thompson Sampling to a number of complex stochastic optimization problems. In this talk, I will provide a brief history of the powerful Thompson Sampling principle. I will further explore how this principle allows the solution of several fascinating long-standing problems, through integrating Bayesian modelling and reasoning with Thompson Sampling. By simultaneously learning the problem properties and solving the problem, both stochastic and unknown environments can be optimized on-line, outperforming current state-of-the-art solutions. The talk will thus cover novel solutions to several problems, including constrained object partitioning, order-picking, the N-Door Problem, non-linear knapsack problems (resource allocation), as well as the decentralized Goore Game. Finally, I will conclude with demonstrating how deceptive environments that provide misleading feedback can be addressed, allowing on-line solution of e.g. the stochastic root finding problem.
Bio: Professor Ole-Christoffer Granmo is Director of the Centre for Artificial Intelligence Research (CAIR). His passion for artificial intelligence was lit at the age of ten when he got his first computer and discovered programming. Fascinated by the idea of superintelligence, he obtained his master's degree in informatics in 1999 and the PhD degree in 2004, both from the University of Oslo, Norway. Granmo develops theory and algorithms for systems that explore, experiment and learn in complex real-world environments. Drawing inspiration from the paradigms of deep reinforcement learning and probabilistic causal reasoning, he seeks to surpass human capability in flexible pattern recognition, model building and reasoning. Within his field of research, Granmo has written more than 115 refereed journal and conference publications.