Hugh Brendan McMahan

Thesis Title: Robust Planning in Domains with Stochastic Outcomes, Adversaries, and Partial Observability
Degree Type: Ph.D. in Computer Science
Advisor(s): Geoffrey Gordon, Avrim Blum
Graduated: December 2006

Abstract:

Real-world planning problems often feature multiple sources of uncertainty, including randomness in outcomes, the presence of adversarial agents, and lack of complete knowledge of the world state. This thesis describes algorithms for four related formal models that can address multiple types of uncertainty: Markov decision processes, MDPs with adversarial costs, extensive-form games, and a new class of games that includes both extensive-form games and MDPs as special cases.

Markov decision processes can represent problems where actions have stochastic outcomes. We describe several new algorithms for MDPs, and then show how MDPs can be generalized to model the presence of an adversary that has some control over costs. Extensive-form games can model games with random events and partial observability. In the zero-sum perfect-recall case, a minimax solution can be found in time polynomial in the size of the game tree. However, the game tree must "remember" all past actions and random outcomes, and so the size of the game tree grows exponentially in the length of the game. This thesis introduces a new generalization of extensive-form games that relaxes this need to remember all past actions exactly, producing exponentially smaller representations for interesting problems. Further, this formulation unifies extensive-form games with MDP planning.
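
As a small illustration of the first model named above, the sketch below runs textbook value iteration on a toy MDP whose actions have stochastic outcomes and whose objective is to minimize expected total cost. It is not one of the thesis's new algorithms. The three-state problem, the action names, and the helpers `value_iteration` and `greedy_policy` are all hypothetical, chosen only to make the idea concrete.

```python
from typing import Dict, List, Tuple

# transitions[state][action] = list of (probability, next_state, cost)
Transitions = Dict[int, Dict[str, List[Tuple[float, int, float]]]]

# Hypothetical toy MDP (not taken from the thesis).
# State 2 is an absorbing, zero-cost goal; states 0 and 1 are decision states.
GOAL = 2
TOY_MDP: Transitions = {
    0: {
        "safe": [(1.0, 1, 2.0)],                   # deterministic but expensive
        "risky": [(0.5, 2, 1.0), (0.5, 0, 1.0)],   # cheap, succeeds half the time
    },
    1: {
        "go": [(1.0, 2, 1.0)],
    },
}


def q_value(outcomes: List[Tuple[float, int, float]],
            values: Dict[int, float]) -> float:
    """Expected cost of one action: immediate cost plus value of the successor."""
    return sum(p * (cost + values[nxt]) for p, nxt, cost in outcomes)


def value_iteration(mdp: Transitions, goal: int,
                    tol: float = 1e-9, max_iters: int = 10_000) -> Dict[int, float]:
    """Repeat Bellman backups until the largest change falls below tol."""
    values = {s: 0.0 for s in mdp}
    values[goal] = 0.0  # the goal costs nothing once reached
    for _ in range(max_iters):
        delta = 0.0
        for state, actions in mdp.items():
            best = min(q_value(outcomes, values) for outcomes in actions.values())
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break
    return values


def greedy_policy(mdp: Transitions, values: Dict[int, float]) -> Dict[int, str]:
    """Pick, in each state, the action with the lowest expected cost."""
    return {
        state: min(actions, key=lambda a: q_value(actions[a], values))
        for state, actions in mdp.items()
    }


values = value_iteration(TOY_MDP, GOAL)
policy = greedy_policy(TOY_MDP, values)
```

In this toy problem the cheap but unreliable action is preferred in state 0, because retrying it costs less in expectation than the deterministic detour through state 1.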

We present a new class of fast anytime algorithms for the off-line computation of minimax equilibria in both traditional and generalized extensive-form games. Experimental results demonstrate their effectiveness on an adversarial MDP problem and on a large abstracted poker game. We also present a new algorithm for playing repeated extensive-form games that can be used when only the total payoff of the game is observed on each round.

Thesis Committee:
Avrim Blum (Co-Chair)
Geoffrey Gordon (Co-Chair)
Jeff Schneider
Andrew Ng (Stanford University)

Jeannette Wing, Head, Computer Science Department
Randy Bryant, Dean, School of Computer Science

Keywords:
Planning, Game Theory, Markov Decision Processes, Extensive-form Games, Convex Games, Algorithms

CMU-CS-06-166.pdf (2.03 MB) (207 pages)