Policies.MusicalChair module¶

MusicalChair: implementation of the decentralized multi-player policy from [A Musical Chair approach, Shamir et al., 2015](https://arxiv.org/abs/1512.02866).

• Each player has 3 states, 1st is random exploration, 2nd is musical chair, 3rd is staying sit
• 1st step
• Every player tries uniformly an arm for $$T_0$$ steps, counting the empirical means of each arm, and the number of observed collisions $$C_{T_0}$$
• Finally, $$N^* = M$$ = nbPlayers is estimated based on nb of collisions $$C_{T_0}$$, and the $$N^*$$ best arms are computed from their empirical means
• 2nd step:
• Every player Choose an arm uniformly, among the $$N^*$$ best arms, until she does not encounter collision right after choosing it
• When an arm was chosen by only one player, she decides to sit on this chair (= arm)
• 3rd step:
• Every player stays sitted on her chair for the rest of the game
• $$\implies$$ constant regret if $$N^*$$ is well estimated and if the estimated N* best arms were correct
• $$\implies$$ linear regret otherwise
Policies.MusicalChair.optimalT0(nbArms=10, epsilon=0.1, delta=0.05)[source]

Compute the lower-bound suggesting “large-enough” values for $$T_0$$ that should guarantee constant regret with probability at least $$1 - \delta$$, if the gap $$\Delta$$ is larger than $$\epsilon$$.

Examples:

• For $$K=2$$ arms, and in order to have a constant regret with probability at least $$90\%$$, if the gap $$\Delta$$ is known to be $$\geq 0.05$$, then their theoretical analysis suggests to use $$T_0 \geq 18459$$. That’s very huge, for just two arms!
>>> optimalT0(2, 0.1, 0.05)     # Just 2 arms !
18459                           # ==> That's a LOT of steps for just 2 arms!
• For a harder problem with $$K=6$$ arms, for a risk smaller than $$1\%$$ and a gap $$\Delta \geq 0.05$$, they suggest at least $$T_0 \geq 7646924$$, i.e., about 7 millions of trials. That is simply too much for any realistic system, and starts to be too large for simulated systems.
>>> optimalT0(6, 0.01, 0.05)    # Constant regret with >99% proba
7646924                         # ==> That's a LOT of steps!
>>> optimalT0(6, 0.001, 0.05)   # Reasonable value of epsilon
764692376                       # ==> That's a LOT of steps!!!
• For an even harder problem with $$K=17$$ arms, the values given by their Theorem 1 start to be really unrealistic:
>>> optimalT0(17, 0.01, 0.05)   # Constant regret with >99% proba
27331794                        # ==> That's a LOT of steps!
>>> optimalT0(17, 0.001, 0.05)  # Reasonable value of epsilon
2733179304                      # ==> That's a LOT of steps!!!
Policies.MusicalChair.boundOnFinalRegret(T0, nbPlayers)[source]

Use the upper-bound on regret when $$T_0$$ and $$M$$ are known.

• The “constant” regret of course grows linearly with $$T_0$$, as:

$\forall T \geq T_0, \;\; R_T \leq T_0 K + 2 \mathrm{exp}(2) K.$

Warning

this bound is not a deterministic result, it is only value with a certain probability (at least $$1 - \delta$$, if $$T_0$$ is chosen as given by optimalT0()).

>>> boundOnFinalRegret(18459, 2)        # Crazy constant regret!  # doctest: +ELLIPSIS
36947.5..
>>> boundOnFinalRegret(7646924, 6)      # Crazy constant regret!!  # doctest: +ELLIPSIS
45881632.6...
>>> boundOnFinalRegret(764692376, 6)    # Crazy constant regret!!  # doctest: +ELLIPSIS
4588154344.6...
>>> boundOnFinalRegret(27331794, 17)    # Crazy constant regret!!  # doctest: +ELLIPSIS
464640749.2...
>>> boundOnFinalRegret(2733179304, 17)  # Crazy constant regret!!  # doctest: +ELLIPSIS
46464048419.2...
class Policies.MusicalChair.State

Bases: enum.Enum

Different states during the Musical Chair algorithm

InitialPhase = 2
MusicalChair = 3
NotStarted = 1
Sitted = 4
__module__ = 'Policies.MusicalChair'
class Policies.MusicalChair.MusicalChair(nbArms, Time0=0.25, Time1=None, N=None, lower=0.0, amplitude=1.0)[source]

MusicalChair: implementation of the decentralized multi-player policy from [A Musical Chair approach, Shamir et al., 2015](https://arxiv.org/abs/1512.02866).

__init__(nbArms, Time0=0.25, Time1=None, N=None, lower=0.0, amplitude=1.0)[source]
• nbArms: number of arms,
• Time0: required, number of step, or portion of the horizon Time1 (optional), for the first step (pure random exploration by each players),
• N: optional, exact or upper bound on the number of players,
• Time1: optional, only used to compute Time0 if Time0 is fractional (eg. 0.2).

Example:

>>> nbArms, Time0, Time1, N = 17, 0.1, 10000, 6
>>> player1 = MusicalChair(nbArms, Time0, Time1, N)

For multi-players use:

>>> configuration["players"] = Selfish(NB_PLAYERS, MusicalChair, nbArms, Time0=0.25, Time1=HORIZON, N=NB_PLAYERS).children
state = None

Current state

Time0 = None

Parameter T0

nbPlayers = None

Number of players

chair = None

Current chair. Not sited yet.

cumulatedRewards = None

That’s the s_i(t) of the paper

nbObservations = None

That’s the o_i of the paper

A = None

A random permutation of arms, it will then be of size nbPlayers!

nbCollision = None

Number of collisions, that’s the C_Time0 of the paper

t = None

Internal times

__str__()[source]

-> str

startGame()[source]

Just reinitialize all the internal memory, and decide how to start (state 1 or 2).

choice()[source]

Choose an arm, as described by the Musical Chair algorithm.

getReward(arm, reward)[source]

Receive a reward on arm of index ‘arm’, as described by the Musical Chair algorithm.

• If not collision, receive a reward after pulling the arm.
_endInitialPhase()[source]

Small computation needed at the end of the initial random exploration phase.

handleCollision(arm, reward=None)[source]

Handle a collision, on arm of index ‘arm’.

• Warning: this method has to be implemented in the collision model, it is NOT implemented in the EvaluatorMultiPlayers.
__module__ = 'Policies.MusicalChair'