Average Payoff Games


Download Average Payoff Games


Preview text

Introduction Average payoff games (discrete-time) Average payoff games (continuous-time)
Conclusion Appendix

Average Payoff Games
Marcin Jurdzin´ski1 Ashutosh Trivedi1
1Department of Computer Science University of Warwick
22nd British Colloquium for Theoretical Computer Science BCTCS 2006

Ashutosh Trivedi

Average Payoff Games

Introduction Average payoff games (discrete-time) Average payoff games (continuous-time)
Conclusion Appendix
1 Introduction Average payoff games (discrete-time and continuous-time)
2 Average payoff games (discrete-time) Average payoff games (discrete-time) Optimality equations Strategy improvement algorithm
3 Average payoff games (continuous-time) Average payoff games (continuous-time) Average time games
4 Conclusion

Ashutosh Trivedi

Average Payoff Games

Introduction Average payoff games (discrete-time) Average payoff games (continuous-time)
Conclusion Appendix
Average payoff games

Average payoff games (discrete-time and continuous-time)

Discrete-time average payoff games
2-player perfect information, zero-sum games on weighted finite automata. run = v1 −a→1 v2 −a→2 v3 . . . players optimize
reward per transition (mean-payoff games).

Continuous-time average payoff games

2-player perfect information, zero-sum games

on priced timed automata

run = v1 −a→1 v2 −a→2 v3 . . .

t1

t2

players optimize :

time per transition (average time game)

v2

2

a

b

v1

b

a

v3

1

3

a

b

b

a

4

v4 v2

Max Min

{x} 2

y<1

x>1 a

b {y} a

b

v1 1

b {y} 3 v3

a {x}

x>1 b

a y<1 4

v4

x

y

Ashutosh Trivedi

Average Payoff Games

Introduction Average payoff games (discrete-time) Average payoff games (continuous-time)
Conclusion Appendix
Average payoff games

Average payoff games (discrete-time and continuous-time)

Discrete-time average payoff games
2-player perfect information, zero-sum games on weighted finite automata. run = v1 −a→1 v2 −a→2 v3 . . . players optimize
reward per transition (mean-payoff games).

Continuous-time average payoff games

2-player perfect information, zero-sum games

on priced timed automata

run = v1 −a→1 v2 −a→2 v3 . . .

t1

t2

players optimize :

payoff per transition (average price game)

v2

2

a

b

v1

b

a

v3

1

3

a

b

b

a

4

v4 v2

Max Min

{x} 2

y<1

x>1 a

b {y} a

b

v1 1

b {y} 3 v3

a {x}

x>1 b

a y<1 4

v4

x

y

Ashutosh Trivedi

Average Payoff Games

Introduction Average payoff games (discrete-time) Average payoff games (continuous-time)
Conclusion Appendix
Average payoff games

Average payoff games (discrete-time and continuous-time)

Discrete-time average payoff games
2-player perfect information, zero-sum games on weighted finite automata. run = v1 −a→1 v2 −a→2 v3 . . . players optimize
reward per transition (mean-payoff games).

Continuous-time average payoff games

2-player perfect information, zero-sum games

on priced timed automata

run = v1 −a→1 v2 −a→2 v3 . . .

t1

t2

players optimize :

payoff per unit time (average price game)

v2

2

a

b

v1

b

a

v3

1

3

a

b

b

a

4

v4 v2

Max Min

{x} 2

y<1

x>1 a

b {y} a

b

v1 1

b {y} 3 v3

a {x}

x>1 b

a y<1 4

v4

x

y

Ashutosh Trivedi

Average Payoff Games

Introduction Average payoff games (discrete-time) Average payoff games (continuous-time)
Conclusion Appendix

Average payoff games (discrete-time) Optimality equations Strategy improvement algorithm

Average payoff games (discrete-time)

Game arena A finite directed graph G = {V, E} A partition of vertices V = VMax ∪ VMin A reward function r : V → Z.
Optimization goals π = v1 −a→1 v2 −a→2 v3 . . . A(π) = limn→∞(1/n) · ∑ni=1 r(vi)

v2

2

a

b

v1

b

a

v3

1

3

a

b

b

a

4

v4

Max Min

Strategies History dependent ΣMax = {V∗ × VMax → V} Positional ΣMax = {VMax → V}

Ashutosh Trivedi

Average Payoff Games

Introduction Average payoff games (discrete-time) Average payoff games (continuous-time)
Conclusion Appendix
Average payoff games

Average payoff games (discrete-time) Optimality equations Strategy improvement algorithm

Value of the game A(v, χ, µ) = limiting average of the game. upper-value A∗(v) = infµ∈ΣMin supχ∈ΣMax A(v, µ, χ), lower value A∗(v) = supχ∈ΣMax infµ∈ΣMin A(v, µ, χ). (minimax) value of the game A∗(v) = A∗(v) = A(v, χ∗, µ∗).
Algorithmic problem Find the optimal strategy for Max and Min. Pseudo-polynomial time algorithm [Zwick and Paterson’95].

Ashutosh Trivedi

Average Payoff Games

Introduction Average payoff games (discrete-time) Average payoff games (continuous-time)
Conclusion Appendix
Optimality equations

Average payoff games (discrete-time) Optimality equations Strategy improvement algorithm

Local witness of global optimality gain(v) = value of the game. bias(v) = measure of initial fluctuations.
Optimality equations OE For every max vertex v ∈ VMax,
g(v) = max{g(u) : (v, u) ∈ E}. b(v) = max{r(v) − g(v) + b(u) : (u, v) ∈ E and g(v) = g(u)}.
Positional optimal strategy Solution to OE gives optimal value of the game. Solution to OE provides positional optimal strategy.

Ashutosh Trivedi

Average Payoff Games

Introduction Average payoff games (discrete-time) Average payoff games (continuous-time)
Conclusion Appendix
Optimality equations

Average payoff games (discrete-time) Optimality equations Strategy improvement algorithm

Local witness of global optimality gain(v) = value of the game. bias(v) = measure of initial fluctuations.
Optimality equations OE For every min vertex v ∈ VMin,
g(v) = min{g(u) : (v, u) ∈ E}. b(v) = min{r(v) − g(v) + b(u) : (u, v) ∈ E and g(v) = g(u)}.
Positional optimal strategy Solution to OE gives optimal value of the game. Solution to OE provides positional optimal strategy.

Ashutosh Trivedi

Average Payoff Games

Introduction Average payoff games (discrete-time) Average payoff games (continuous-time)
Conclusion Appendix

Average payoff games (discrete-time) Optimality equations Strategy improvement algorithm

Optimality equations – Contd.

Optimality equations
If solution for OE exists, then optimal-cycle(v) = Cycle reachable from vertex v, when players play optimal strategies. gain(v) = average of the optimal-cycle(v). bias(v) = optimal distance (fluctuations) from v to an arbitrary vertex in optimal-cycle(v).

g1 b1

v1 r1

g2 b2

v2 r2

g3 b3

v3 r3

g4 b4

v4 r4

g1 = g2 = g3 = g4 b1 = r1 − g + b2

=g

b2 = r2 − g + b3 b3 = r3 − g + b4 b4 = r4 − g + b2

Ashutosh Trivedi

Average Payoff Games

Preparing to load PDF file. please wait...

0 of 0
100%
Average Payoff Games