# Towards General Function Approximation in Zero-Sum Markov Games

```bibtex
@article{Huang2021TowardsGF,
  title   = {Towards General Function Approximation in Zero-Sum Markov Games},
  author  = {Baihe Huang and Jason D. Lee and Zhaoran Wang and Zhuoran Yang},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2107.14702}
}
```

This paper considers two-player zero-sum finite-horizon Markov games with simultaneous moves. The study focuses on the challenging settings where the value function or the model is parameterized by general function classes. Provably efficient algorithms for both decoupled and coordinated settings are developed. In the decoupled setting where the agent controls a single player and plays against an arbitrary opponent, we propose a new model-free algorithm. The sample complexity is governed by the…
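The abstract is truncated, but the core per-state subroutine that value-based methods for zero-sum Markov games build on — computing an (approximate) Nash equilibrium of a zero-sum matrix game — can be sketched with classical fictitious play. This is an illustrative sketch only, not the paper's algorithm; the function name and parameters below are our own:

```python
def fictitious_play(A, iters=20000):
    """Approximate the value and equilibrium mixes of the zero-sum matrix
    game with payoff matrix A (row player maximizes) via fictitious play."""
    m, n = len(A), len(A[0])
    row_counts = [0] * m  # empirical play counts of the row player
    col_counts = [0] * n  # empirical play counts of the column player
    i, j = 0, 0           # arbitrary initial actions
    for _ in range(iters):
        row_counts[i] += 1
        col_counts[j] += 1
        # Row player best-responds to the column player's empirical mix.
        i = max(range(m),
                key=lambda r: sum(A[r][c] * col_counts[c] for c in range(n)))
        # Column player best-responds to the row player's empirical mix.
        j = min(range(n),
                key=lambda c: sum(A[r][c] * row_counts[r] for r in range(m)))
    x = [c / iters for c in row_counts]   # row player's empirical strategy
    y = [c / iters for c in col_counts]   # column player's empirical strategy
    value = sum(x[r] * A[r][c] * y[c] for r in range(m) for c in range(n))
    return value, x, y

# Matching pennies: game value 0, equilibrium (1/2, 1/2) for both players.
v, x, y = fictitious_play([[1.0, -1.0], [-1.0, 1.0]])
```

In zero-sum games the empirical strategies of fictitious play converge to a Nash equilibrium (Robinson, 1951), which is why this kind of per-step equilibrium computation appears as a building block in the value-iteration-style algorithms surveyed below.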

#### References

Showing 1–10 of 49 references.

Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games

- Mathematics, Computer Science
- ICML
- 2015

This paper provides a novel and unified error propagation analysis in $L_p$-norm of three well-known algorithms adapted to stochastic games and shows that they can achieve a stationary policy that is $\frac{2\gamma\epsilon + \epsilon'}{(1-\gamma)^2}$-optimal.

Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity

- Computer Science, Mathematics
- AISTATS
- 2020

The sample complexity of solving discounted two-player turn-based zero-sum stochastic games is settled up to polylogarithmic factors by showing how to generalize near-optimal Q-learning-based algorithms for MDPs, in particular Sidford et al. (2018), to two-player strategy computation.

Learning Nash Equilibrium for General-Sum Markov Games from Batch Data

- Mathematics, Computer Science
- AISTATS
- 2017

A new definition of $\epsilon$-Nash equilibrium in Markov games (MGs) that captures strategy quality in multiplayer games is proposed, along with a neural network architecture named NashNetwork that successfully learns a Nash equilibrium in a generic multiplayer general-sum turn-based MG.

Feature-Based Q-Learning for Two-Player Stochastic Games

- Computer Science, Mathematics
- ArXiv
- 2019

This work proposes a two-player Q-learning algorithm for approximating the Nash equilibrium strategy via sampling and proves that the algorithm is guaranteed to find an $\epsilon$-optimal strategy using no more than $\tilde{\mathcal{O}}(K/(\epsilon^{2}(1-\gamma)^{4}))$ samples with high probability.

Online Reinforcement Learning in Stochastic Games

- Computer Science, Mathematics
- NIPS
- 2017

The UCSG algorithm is proposed, which achieves sublinear regret relative to the game value when competing with an arbitrary opponent, improving on previous results in the same setting.

Value Function Approximation in Zero-Sum Markov Games

- Computer Science, Mathematics
- UAI
- 2002

The viability of value function approximation for Markov games is demonstrated by using the least-squares policy iteration (LSPI) algorithm to learn good policies for a soccer domain and a flow control problem.

Provably Efficient Policy Gradient Methods for Two-Player Zero-Sum Markov Games

- Computer Science, Mathematics
- ArXiv
- 2021

This is the first quantitative analysis of policy gradient methods with function approximation for two-player zero-sum Markov games; the algorithms' performance is thoroughly characterized in terms of the number of samples, number of iterations, concentrability coefficients, and approximation error.

Near-Optimal Reinforcement Learning with Self-Play

- Computer Science, Mathematics
- NeurIPS
- 2020

An optimistic variant of the Nash Q-learning algorithm with sample complexity $\tilde{\mathcal{O}}(SAB)$ is presented, along with a new algorithm, *Nash V-learning*, which matches the information-theoretic lower bound in all problem-dependent parameters except for a polynomial factor of the length of each episode.

Actor-Critic Fictitious Play in Simultaneous Move Multistage Games

- Computer Science
- AISTATS
- 2018

A stochastic approximation of the fictitious play process is built using an architecture inspired by actor-critic algorithms, and convergence of the method towards a Nash equilibrium is proved in both zero-sum two-player multistage games and cooperative multistage games.

Actor-Critic Policy Optimization in Partially Observable Multiagent Environments

- Computer Science, Mathematics
- NeurIPS
- 2018

This paper examines the role of policy gradient and actor-critic algorithms in partially-observable multiagent environments and relates them to a foundation of regret minimization and multiagent learning techniques for the one-shot and tabular cases, leading to previously unknown convergence guarantees.