Fundamental Goals for Online Learning

Dr. Martin Zinkevich
Thursday, April 6, 2006
1:30PM - CSB 232

Abstract


When algorithms are evaluated in artificial intelligence, it is often similar to the evaluation of the students in a class: they are given example problems with solutions (a training set), and then receive a score based upon how well they solve novel problems on a test (a test set). It is assumed that the test problems will be similar to the training problems, and therefore it is possible to do well on the test. We refer to this as offline learning, because one can learn what to do before one is tested.

In online learning, there is no specific training and test period: learners are evaluated on how they perform on every problem they face. Moreover, there is not explicit assumption that new problems will be similar to old ones. Such an assumption is invalidated when the examples the learner observes are influenced by another intelligent agent which is influenced in turn by the learner. In some domains, there is no good guarantee one can achieve; in others, performance similar to the offline setting is achievable. The goal of online learning is to design objectives that smoothly interpolate between these two extremes. In the past, online formulations have been developed for a variety of problems, such as regressions, classification, and repeated games. In this talk, I show a unified regret framework in which one simple algorithm can be used that obtains a guarantee in all these settings. Then, I show a second framework which addresses the drawbacks of traditional regret guarantees in the repeated prisoner's dilemma; this framework also sets a novel fundamental goal for reinforcement learning.

Short Bio


Martin Zinkevich was a NSF Graduate Research Fellow at Carnegie Mellon University where he earned his Ph.D. in Computer Science under Avrim Blum. At CMU, his research focused on the foundations of multi-agent learning. Afterwards, he worked with Amy Greenwald at Brown University and Michael Littman at Rutgers University on understanding multi-agent Q-learning. At present, he is an AICML Postdoctoral Fellow at the University of Alberta, where, among other projects, he is considering how to apply multi-agent techniques to the analysis and play of poker.