CATS-Oct-28-2016


Title

Bandits and agents: How to incentivize exploration?

Speaker

Alex Slivkins, Microsoft Research NYC.

Abstract

Individual decision-makers consume information revealed by previous decision-makers, and produce information that may help in future decisions. This phenomenon is common in a wide range of scenarios in the Internet economy, as well as elsewhere, such as medical decisions. Each decision-maker would individually prefer to "exploit": select the action with the highest expected reward given her current information. At the same time, each decision-maker would prefer previous decision-makers to "explore", producing information about the rewards of various actions. A social planner, by means of carefully designed information disclosure, can incentivize the agents to balance exploration and exploitation so as to maximize social welfare.

We formulate this problem as a multi-armed bandit problem under incentive-compatibility constraints induced by agents' Bayesian priors. We design a Bayesian incentive-compatible bandit algorithm for the social planner with asymptotically optimal regret. Moreover, we provide a black-box reduction from an arbitrary multi-armed bandit algorithm to an incentive-compatible one, with only a constant multiplicative increase in regret. This reduction works for a very general bandit setting that incorporates contexts and arbitrary partial feedback. Another extension allows for multiple interdependent decision-makers in each round of the bandit problem.
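As a rough illustration of the kind of exploration/exploitation mix a planner might induce, here is a minimal, hypothetical sketch in Python. It is not the algorithm from the talk: it simply has the planner recommend the empirically best arm on most rounds while occasionally recommending an under-sampled arm, which mirrors the intuition that agents who only see their own recommendation cannot tell whether they are being used to explore. The arm means, exploration probability, and number of agents below are made-up parameters.

<pre>
# Illustrative sketch only (not the authors' algorithm): a planner mixes a small
# amount of exploration into otherwise-exploiting recommendations for a
# sequence of agents facing a Bernoulli multi-armed bandit.

import random

def incentivized_exploration(true_means, num_agents=1000, explore_prob=0.05, seed=0):
    rng = random.Random(seed)
    k = len(true_means)
    pulls = [0] * k          # times each arm was recommended
    reward_sum = [0.0] * k   # total observed reward per arm
    total_reward = 0.0

    for _ in range(num_agents):
        unexplored = [a for a in range(k) if pulls[a] == 0]
        if unexplored:
            # sample every arm at least once
            arm = rng.choice(unexplored)
        elif rng.random() < explore_prob:
            # occasionally recommend the least-sampled arm (exploration)
            arm = min(range(k), key=lambda a: pulls[a])
        else:
            # otherwise recommend the empirically best arm (exploitation)
            arm = max(range(k), key=lambda a: reward_sum[a] / pulls[a])

        # the recommended agent acts and a Bernoulli reward is observed
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        pulls[arm] += 1
        reward_sum[arm] += reward
        total_reward += reward

    return total_reward, pulls

if __name__ == "__main__":
    reward, pulls = incentivized_exploration([0.3, 0.5, 0.7])
    print("total reward:", reward, "pulls per arm:", pulls)
</pre>

In the actual setting of the talk, whether agents follow such recommendations depends on their Bayesian priors; the black-box reduction described above is what turns an arbitrary bandit algorithm into one whose recommendations remain incentive-compatible.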

Joint work with Yishay Mansour (Tel Aviv University and MSR Israel), Vasilis Syrgkanis (MSR-NE) and Steven Wu (Penn).