Policies.MusicalChairNoSensing module

MusicalChairNoSensing: implementation of the decentralized multi-player policy from [[“Multiplayer bandits without observing collision information”, by Gabor Lugosi and Abbas Mehrabian]](https://arxiv.org/abs/1808.08416).

Note

The algorithm implemented here is Algorithm 1 (page 8) in the article, but the authors did not named it. I will refer to it as the Musical Chair algorithm with no sensing, or MusicalChairNoSensing in the code.

Policies.MusicalChairNoSensing.ConstantC = 1

A crazy large constant to get all the theoretical results working. The paper suggests \(C = 128\).

Warning

One can choose a much smaller value in order to (try to) have reasonable empirical performances! I have tried \(C = 1\). BUT the algorithm DOES NOT work better with a much smaller constant: every single simulations I tried end up with a linear regret for MusicalChairNoSensing.

Policies.MusicalChairNoSensing.parameter_g(K=9, m=3, T=1000, constant_c=1)[source]

Length \(g\) of the phase 1, from parameters K, m and T.

\[g = 128 K \log(3 K m^2 T^2).\]

Examples:

>>> parameter_g(m=2, K=2, T=100)  # DOCTEST: +ELLIPSIS
3171.428...
>>> parameter_g(m=2, K=2, T=1000)  # DOCTEST: +ELLIPSIS
4350.352...
>>> parameter_g(m=2, K=3, T=100)  # DOCTEST: +ELLIPSIS
4912.841...
>>> parameter_g(m=3, K=3, T=100)  # DOCTEST: +ELLIPSIS
5224.239...
Policies.MusicalChairNoSensing.estimate_length_phases_12(K=3, m=9, Delta=0.1, T=1000)[source]

Estimate the length of phase 1 and 2 from the parameters of the problem.

Examples:

>>> estimate_length_phases_12(m=2, K=2, Delta=0.1, T=100)
198214307
>>> estimate_length_phases_12(m=2, K=2, Delta=0.01, T=100)
19821430723
>>> estimate_length_phases_12(m=2, K=2, Delta=0.1, T=1000)
271897030
>>> estimate_length_phases_12(m=2, K=3, Delta=0.1, T=100)
307052623
>>> estimate_length_phases_12(m=2, K=5, Delta=0.1, T=100)
532187397
Policies.MusicalChairNoSensing.smallest_T_from_where_length_phases_12_is_larger(K=3, m=9, Delta=0.1, Tmax=1000000000.0)[source]

Compute the smallest horizon T from where the (estimated) length of phases 1 and 2 is larger than T.

Examples:

>>> smallest_T_from_where_length_phases_12_is_larger(K=2, m=1)
687194767
>>> smallest_T_from_where_length_phases_12_is_larger(K=3, m=2)
1009317314
>>> smallest_T_from_where_length_phases_12_is_larger(K=3, m=3)
1009317314

Examples with even longer phase 1:

>>> smallest_T_from_where_length_phases_12_is_larger(K=10, m=5)
1009317314
>>> smallest_T_from_where_length_phases_12_is_larger(K=10, m=10)
1009317314

With \(K=100\) arms, it starts to be crazy:

>>> smallest_T_from_where_length_phases_12_is_larger(K=100, m=10)
1009317314
class Policies.MusicalChairNoSensing.State

Bases: enum.Enum

Different states during the Musical Chair with no sensing algorithm

InitialPhase = 2
MusicalChair = 4
NotStarted = 1
Sitted = 5
UniformWaitPhase2 = 3
__module__ = 'Policies.MusicalChairNoSensing'
class Policies.MusicalChairNoSensing.MusicalChairNoSensing(nbPlayers=1, nbArms=1, horizon=1000, constant_c=1, lower=0.0, amplitude=1.0)[source]

Bases: Policies.BasePolicy.BasePolicy

MusicalChairNoSensing: implementation of the decentralized multi-player policy from [[“Multiplayer bandits without observing collision information”, by Gabor Lugosi and Abbas Mehrabian]](https://arxiv.org/abs/1808.08416).

__init__(nbPlayers=1, nbArms=1, horizon=1000, constant_c=1, lower=0.0, amplitude=1.0)[source]
  • nbArms: number of arms (K in the paper),

  • nbPlayers: number of players (m in the paper),

  • horizon: horizon (length) of the game (T in the paper),

Example:

>>> nbPlayers, nbArms, horizon = 3, 9, 10000
>>> player1 = MusicalChairNoSensing(nbPlayers, nbArms, horizon)

For multi-players use:

>>> configuration["players"] = Selfish(NB_PLAYERS, MusicalChairNoSensing, nbArms, nbPlayers=nbPlayers, horizon=horizon).children

or

>>> configuration["players"] = [ MusicalChairNoSensing(nbPlayers=nbPlayers, nbArms=nbArms, horizon=horizon) for _ in range(NB_PLAYERS) ]
state = None

Current state

nbPlayers = None

Number of players

nbArms = None

Number of arms

horizon = None

Parameter T (horizon)

chair = None

Current chair. Not sited yet.

cumulatedRewards = None

That’s the s_i(t) of the paper

nbObservations = None

That’s the o_i of the paper

A = None

A random permutation of arms, it will then be of size nbPlayers!

tau_phase_2 = None

Time when phase 2 starts

t = None

Internal times

__str__()[source]

-> str

startGame()[source]

Just reinitialize all the internal memory, and decide how to start (state 1 or 2).

choice()[source]

Choose an arm, as described by the Musical Chair with no Sensing algorithm.

getReward(arm, reward)[source]

Receive a reward on arm of index ‘arm’, as described by the Musical Chair with no Sensing algorithm.

  • If not collision, receive a reward after pulling the arm.

__module__ = 'Policies.MusicalChairNoSensing'
_endPhase2()[source]

Small computation needed at the end of the initial random exploration phase.

handleCollision(arm, reward=None)[source]

Handle a collision, on arm of index ‘arm’.

  • Here, as its name suggests it, the MusicalChairNoSensing algorithm does not use any collision information, hence this method is empty.

  • Warning: this method has to be implemented in the collision model, it is NOT implemented in the EvaluatorMultiPlayers.