PhD defence: On 9 April 2020, Julian Ulf Zimmert will defend his PhD thesis. The defence will take place virtually on Zoom, and everyone can join via this link: https://ucph-ku.zoom.us/j/3593116057
Date & Time:
Adversarially robust stochastic multi-armed bandits
Multi-armed bandits are a fundamental problem in sequential decision making with applications in optimal experimental design, online advertisement, recommender systems and much more. In multi-armed bandits, an agent has to repeatedly choose an action from a finite set of options.
After each action, the agent receives and observes a reward. For example, the action might be to prescribe a certain drug and the reward is the outcome of the patient’s treatment. The goal of the agent is to maximise its cumulative reward. It is commonly assumed that the rewards are drawn from i.i.d. distributions that only depend on the agent’s action. This assumption allows algorithms with provably fast learning to be derived. However, minor violations of this assumption can prevent these algorithms from learning anything at all.
Another line of research aims at maximal robustness. These algorithms have performance guarantees even in the absence of any stochastic model. The drawback is that learning is significantly slower.
Naturally, the question arises whether a combination of both properties is achievable. Are there robust algorithms that also adapt automatically to easier, i.e. stochastic, environments? This question has remained open for seven years. Resolving it is the main contribution of this thesis.
We derive an algorithm that is maximally robust and also achieves optimal rates in the stochastic setting. Furthermore, it is optimal in several intermediate regimes and we extend these results to problems beyond multi-armed bandits.
Chairperson: Professor Christian Igel, Department of Computer Science, University of Copenhagen, Denmark
Professor Alexandra Carpentier, Otto-von-Guericke-Universität Magdeburg, Germany
Professor Nicolò Cesa-Bianchi, Università degli Studi di Milano, Italy
Associate professor Yevgeny Seldin, Department of Computer Science, University of Copenhagen
Professor Christina Lioma, Department of Computer Science, University of Copenhagen
For an electronic copy of the thesis, please contact firstname.lastname@example.org